**Chromosomal Di**ff**erentiation in Genetically Isolated Populations of the Marsh-Specialist** *Crocidura suaveolens* **(Mammalia: Soricidae)**

### **Francisca Garcia 1, Luis Biedma 2, Javier Calzada 2, Jacinto Román 3, Alberto Lozano 4,5, Francisco Cortés 1, José A. Godoy 6,\* and Aurora Ruiz-Herrera 4,5,\***


Received: 21 January 2020; Accepted: 27 February 2020; Published: 2 March 2020

**Abstract:** The genus *Crocidura* represents a remarkable model for the study of chromosome evolution. This is the case of the lesser white-toothed shrew (*Crocidura suaveolens*), a representative of the Palearctic group. Although continuously distributed from Siberia to Central Europe, *C. suaveolens* is a rare, habitat-specialist species in the southwesternmost limit of its distributional range, in the Gulf of Cádiz (Iberian Peninsula). In this area, *C. suaveolens* is restricted to genetically isolated populations associated to the tidal marches of five rivers (Guadiana, Piedras, Odiel, Tinto and Guadalquivir). This particular distributional range provides a unique opportunity to investigate whether genetic differentiation and habitat specialization was accompanied by chromosomal variation. In this context, the main objective of this study was to determinate the chromosomal characteristics of the habitat-specialist *C. suaveolens* in Southwestern Iberia, as a way to understand the evolutionary history of this species in the Iberian Peninsula. A total of 41 individuals from six different populations across the Gulf of Cádiz were collected and cytogenetically characterized. We detected four different karyotypes, with diploid numbers (2n) ranging from 2n = 40 to 2n = 43. Two of them (2n = 41 and 2n = 43) were characterized by the presence of B-chromosomes. The analysis of karyotype distribution across lineages and populations revealed an association between mtDNA population divergence and chromosomal differentiation. *C. suaveolens* populations in the Gulf of Cádiz provide a rare example of true karyotypic polymorphism potentially associated to genetic isolation and habitat specialization in which to investigate the evolutionary significance of chromosomal variation in mammals and their contribution to phenotypic and ecological divergence.

**Keywords:** *Crocidura suaveolens*; shrews; habitat specialist; chromosomes; chromosomal evolution; B-chromosomes; chromosomal polymorphism; mtDNA

#### **1. Introduction**

Large-scale chromosomal changes, such as inversions, translocations, fusions and fissions, contribute to the reshuffling of genomes, thus providing new chromosomal forms on which natural selection can work. In this context, genome reshuffling has important evolutionary and ecological implications, since gene flow can be reduced within the reorganized regions in the heterokaryotype, thus affecting co-adapted genes locked within the rearrangement that, if advantageous, increase in frequency in natural populations (reviewed in [1]). In fact, evidence on the role of large-scale chromosomal changes in adaptation and diversification has been reported, especially in the case of inversions [2–5]. As for chromosomal fusions, however, empirical studies are limited to the house mouse (*Mus musculus domesticus*) and shrews (*Sorex araneus*), two mammalian systems where the presence of chromosomal fusions (either fixed or in polymorphic state within populations) are widespread [6–13]. Understanding the genetic and mechanistic basis of these processes will provide insights into how biodiversity originates and is maintained.

Shrews (family Soricidae) represent a clear example of chromosomal diversification within mammals, with diploid numbers ranging from 2n = 19 (*Blarina hylophaga*) to 2n = 68 (*Crocidura yankariensis*), showing both inter- and intra-specific chromosomal diversity [14]. Within Soricidae, the genus *Crocidura* represents a remarkable model where karyotypic variation correlates with phylogenetic relationships within the group. Initial studies based on nuclear and mitochondrial sequences (mtDNA) suggested a common ancestry of all *Crocidura* species [15,16], with a clear dichotomy between Afrotropical and Palearctic taxa, which display contrasting patterns of chromosomal differentiation [17]. With the exception of *C. luna* (2n = 28 or 36 [18]) Afrotropical species are characterized by high diploid numbers (from 2n = 42 to 2n = 68), 2n = 50 being the most common chromosomal form. Palearctic species, on the other hand, present a tendency for low diploid numbers (from 2n = 22 to 2n = 42, [17], with 2n = 40 as the predominant karyotype. It has been suggested that this bimodal distribution of diploid numbers within the genus *Crocidura* is probably the result of two contrasting tendencies in chromosomal evolution: (i) chromosomal fissions in Afrotropical species, and (ii) chromosomal fusions (tandem fusions centric fusions, and/or whole arm translocations) in Palearctic species [19].

Among Palearctic species, the lesser white-toothed shrew (*C. suaveolens*) is one of the most widely distributed shrews in Eurasia that presents the characteristic 2n = 40 chromosomal form [20,21]. Chromosomal polymorphisms, mainly involving pericentric inversions and heterochromatin distribution, have been reported as rare for white-toothed shrews [21]. Although continuously distributed from Siberia to Central Europe [22], the lesser white-toothed shrew is less common and has a more fragmented distribution in Western Europe, including the Iberian Peninsula [23]. In the southwesternmost limit of its distributional range, in the Gulf of Cádiz, *C. suaveolens* is rare, and only occurs in restricted, isolated populations associated to the tidal marches of five rivers (Guadiana, Piedras, Odiel, Tinto and Guadalquivir) [24] (see Figure 1). It has been proposed that its restricted distribution, the strict tidal marsh association and partly the genetic isolation of the populations, could be the consequence of competitive exclusion by the greater white-toothed shrew (*C. russula*, 2n = 42) [25,26]. In fact, recent studies based on mtDNA revealed the presence of two differentiated sub-lineages in Southwestern Iberia that diverged from other Iberian lineages around 140 Ka (50–240 Ka), and between themselves around 110 Ka (30–190 Ka). One of these sub-lineages occupies four river mouths closely located to one another (from 1 to 12 km), in the province of Huelva (Guadiana, Piedras, Odiel and Tinto; sub-lineage C3 [25]), whereas a second sub-lineage was present in the Guadalquivir river mouth (sub-lineage C4 [25]). Subsequent studies using microsatellites revealed that *C. suaveolens* populations from Guadiana, Piedras, Odiel and Tinto rivers showed low differentiation among them, but high differentiation with both the distant Guadalquivir population and the closely located Estero Domingo Rubio (EDR) population [24]. Overall, genetic data suggest that the observed genetic patterns are the result of both historical isolation and current restrictions to gene flow imposed by an adverse landscape matrix [24]. How chromosomal reorganizations have contributed (if so) to this genetic diverge is currently unknown.

**Figure 1.** Study area and populations of *C. suaveolens* sampled in the tidal marsh (dark gray) of the Gulf of Cádiz, Southwestern Iberia. The individuals of Cáceres and Zamora are shown in the inset map. The sampling locations and diploid numbers are shown as pie charts. The size of each chart indicates the number of individuals sampled, which ranged between 5 and 8 individuals, in each location.

Despite their relevance, chromosomal data has often been neglected in phylogeographical studies (reviewed in [27]). The particular distributional range of the habitat-specialist *C. suaveolens* in Southwestern Iberia provides a unique opportunity to investigate whether the formation of genetically isolated populations and habitat specialization was accompanied by chromosomal differentiation. Given this context, in the present study, we aimed to do the following: (i) identify the extent of chromosomal variation *C. suaveolens* populations in Southwestern Iberia, and (ii) determine how chromosomal variation is distributed within and among genetic and phylogeographic groups. In this paper, we discuss the relative contribution of genetic drift and natural selection and the potential evolutionary significance of observed chromosomal differences.

#### **2. Materials and Methods**

#### *2.1. Sampling*

A total of 41 individuals from six different populations across the Gulf of Cádiz were collected in 2015 and 2016. This included different localities along the banks of the Guadalquivir, Guadiana, Odiel, Piedras and Tinto rivers (see Figure 1 and Table 1). The sample size ranged from two to six individuals per population. When possible, samples were obtained from both river banks, in order to evaluate whether rivers were acting as barriers to gene flow. Two additional individuals from the Northwest Iberian lineage (lineage B [24]) were also included in the study, for comparison (individuals from the Cáceres and Zamora populations, Table 1).


**Table 1.** Karyotype variants of *Crocidura suaveolens* found in the study area. Information includes population, region, locality, number of specimens (N), diploid chromosomal number (2n), autosomal fundamental number (FNa) and the presence of B-chromosomes (B-chr).

Tissue samples (ear and tail) were obtained from each specimen on site and transferred to the cell culture laboratory in transport medium (Dulbecco's Modified Eagle Medium—DMEM— supplemented with 2 mm L-glutamine, 10% fetal bovine serum, 1% 100× Penicillin/Streptomycin/Amphotericin B solution and 0.07 mg/mL Gentamicin). Animals were immediately released after sample collection. Captures were performed with official permits issued by the corresponding nature conservation institutions, and research was conducted with approval of the bioethics committee of the University of Huelva and Universitat Autònoma de Barcelona.

#### *2.2. Cell Culture and Chromosomal Harvest*

Tissue samples were mechanically and enzymatically disaggregated. Briefly, tissue was washed with 5 mL of DPBS (Dulbecco's Phosphate-Buffered Saline) solution (DPBS with 1% 100x Penicillin/Streptomycin/Amphotericin B solution and 1mg/mL of Gentamicin) for 10 min, at 37 ◦C, in an orbital shaker, at 200 rpm. Biopsies were then shredded into small pieces, using a scalpel, in a Petri dish with 1 mL of DMEM without supplements, and incubated for 45 min in DMEM with 0.25% Collagenase Type II at 37 ◦C and at 200 rpm. Cell suspension was then centrifuged for 10 min at 300 G. The reaming cells were resuspended in 5 mL of completed growth medium (DMEM, supplemented with 20% of fetal bovine serum, 2 mm L-Glutamine) seeded on 25 cm2 T-flasks and cultured at standard conditions (37 ◦C, 10% CO2) for four weeks. Cultivated cells proliferated as an adherent monolayer. Subcultures of the adherent cells at early passages (3rd and 4th) were used to obtain chromosomes.

Chromosomal harvest was conducted as previously described [28]. In order to enhance the dispersion of chromosomes, cells were incubated in a hypotonic solution (KCl 0.075 M) for 20 min at 37 ◦C, inverting every 5 min. Subsequently, the cells were centrifuged (5 min at 300 g) and transferred into 15 mL tubes. The cell pellet was washed twice by adding 5 mL of fixative solution (methanol, acetic acid at 3:1 concentration, freshly prepared) and centrifuged (5 min at 300 g). Cells were centrifuged again and diluted in 1 mL of fixative solution and stored at −20 ◦C until use.

#### *2.3. Chromosomal Characterization*

Chromosomal spreads were obtained by dropping 15 μL of cell suspension onto a clean dry slide. Slides were baked at 65 ◦C during one hour and kept at −20 ◦C until use. Metaphases were stained homogenously with Giemsa solution for the analysis of the modal karyotype and then G-banded for karyotyping, as previously described [28].

An optical microscope (model Zeiss Axioskop) equipped with a charged coupled device camera (ProgResR CS10Plus, Jenoptik Optical Systems, Jena, Germany) was used for the microscope analysis. A minimum of 25 good-quality metaphases were captured per specimen with the program Progress Capture 2.7.7 and analyzed in order to obtain the modal karyotype. In order to construct representative karyotype of each specimen analyzed, chromosomes were ordered by morphology and decreasing size, resulting in a representative karyotype.

#### *2.4. Karyotype Distribution across Lineages and Populations*

Several mitochondrial lineages in Iberia were defined previously by Biedma and collaborators [25], two of them (sub-lineages C3 and C4) are present in the study area. Populations were defined as genetic clusters, with each corresponding to one of the disjunct marshes associated to each of the five main rivers in the region (Guadalquivir, Tinto, Odiel, Guadiana and Piedras), and a sixth genetically differentiated population in EDR. Samples were grouped by mitochondrial lineage or geographical population for population cytogenetic analyses.

For the analysis of cytogenetic diversity and differentiation, the A diploid number (autosomes and sex chromosomes) and the presence/absence of B (supernumerary) chromosomes were treated as separate diploid and haploid traits, respectively. GENEPOP on the web (https://genepop.curtin.edu. au; [29]) was used to estimate allele (karyotype) frequencies, observed and expected heterozygosities, and to test for departure from Hardy–Weinberg (HW) expectations. We also tested for genotypic linkage disequilibrium between the two traits (A-chromosomal number and presence/absence of B-chromosomes), using the log likelihood ratio statistics. Differentiation between populations was assessed by the exact G or Fisher's tests and by estimating Wright's FST index.

#### **3. Results**

#### *3.1. Chromosomal Diversity in C. suaveolens in the Gulf of Cádiz*

Four karyotype variants were detected in the populations sampled: 2n=40, 2n=41 (i.e., 2n = 40 + B), 2n = 42 and 2n = 43 (i.e., 2n = 40 + B; Table 1). All nine individuals from the banks of the Guadalquivir River (sub-lineage C4) were characterized by presenting 2n = 40 (autosomal fundamental number, FNa = 46) (see Figure 2A), the same pattern found in the two individuals from the Northwest Iberian lineage (Cáceres and Zamora). The autosomes consisted of 15 pairs of acrocentric chromosomes and four pairs of bi-armed chromosomes, of which one pair was metacentric and three pairs sub-metacentric. The X-chromosome was a large sub-metacentric (see Figure 2A).

Specimens from three populations (Guadiana, Piedras and EDR) from sub-lineage C3 presented the same karyotype, consisting of 2n = 42 (FNa = 50) (see Figure 2B), with no polymorphic karyomorphs among the 18 individuals analyzed (see Table 1). The autosomes were 15 pairs of acrocentric chromosomes and five pairs of bi-armed chromosomes, of which one pair was metacentric and four pairs were sub-metacentric. The X-chromosome was a large sub-metacentric (see Figure 2B).

Interestingly, chromosomal polymorphisms were detected in both Odiel and Tinto river banks (Table 1), both populations also belonging to sub-lineage C3. In the case of the Odiel population, two distinct karyotypes were found in different proportion: 2n = 42 and 2n = 41. Three out of eight (37.5%) specimens presented the same 2n = 42 karyotype found in Guadiana, Piedras and EDR, whereas five individuals (62.5%) presented a karyotype consisting of 2n = 41 (FNa = 48) chromosomes (see Figure 2C). The 2n = 42 karyotype (FNa = 50) corresponded with the same one found in Guadiana, Piedras and EDR, and it was characterized by the presence of 15 pairs of acrocentric chromosomes and four pairs of bi-armed chromosomes, of which one pair was metacentric and three pairs were sub-metacentric. The 2n = 41 karyotype, however, corresponded to 15 pairs of acrocentric chromosomes and four pairs of bi-armed chromosomes, of which one pair was metacentric and three pairs were sub-metacentric, and the presence of one single B-chromosome was in all the individuals. Since the main difference with the 2n = 40 karyotype found in the Guadalquivir River was the presence of a single B-chromosome, we refer to the 2n = 41 karyotype from Odiel as 2n = 40 + B.

**Figure 2.** Examples of G-banded karyotypes of *C. suaveolens* found in the studied area. (**A**) Individual from the Guadalquivir River population (2n = 40). (**B**) Individual from the EDR population (2n = 42). (**C**) Individual from the Odiel River population (2n = 41). (**D**) Individual from the Tinto River population (2n = 43). Note the presence of a single B-chromosome in (**C**) and (**D**) karyotypes.

We also found two distinct karyotypes in the Tinto River population. Two out of six (33.3%) individuals surveyed in Tinto presented 2n = 43 (FNa = 52), whereas the rest (66.6%) were characterized by 2n = 42 (the same karyotype formula found in Guadiana, Piedras and EDR). The 2n = 43 karyotype was characterized by 15 pairs of acrocentric chromosomes and five pairs of bi-armed chromosomes, of which one pair was metacentric and four pairs were sub-metacentric, and the presence of a B-chromosome. The X-chromosome was a large sub-metacentric (see Figure 2D). Due to chromosomal G-banding homologies and the presence of a single B-chromosome, we refer to the 2n = 43 karyotype from Tinto as 2n = 42 + B.

G-banding comparison between 2n = 40 and 2n = 42 karyotypes suggests the presence of chromosomal fusion/fission events between bi-armed chromosomes.

#### *3.2. Karyotype Distribution and Diversity*

The four karyotypes observed were unevenly distributed across lineages and populations in the study area. The 2n = 40 A karyotype was the only one observed in the Guadalquivir population, where sub-lineage C4 occurs, and in Zamora and Cáceres samples, which are representative of the B lineage. In contrast, the 2n = 42 A karyotype was the most frequent in C3 populations, and the only one detected in EDR, Piedras and Guadiana (see Figure 3). The presence of B-chromosomes was restricted to the C3 lineage populations, where it occurred in the context of both 2n = 40 (five out of five occurrences) and 2n = 42 A karyotypes (two out of six occurrences). As a result, chromosomal differentiation between mtDNA subclades across both traits was highly significant (Fisher's exact test, Chi-Squared > 26.46, d.f. = 4, *p* < 0.00002), but it was high and significant for A-chromosome number

(FST = 0.763, N = 82, *p* < 0.00001) and low and not significant for B-chromosome frequencies (FST = 0.090, N = 41, *p* = 0.179).

**Figure 3.** Distribution of karyotypes across the mitochondrial lineages described in Biedma et al. (2019). The geographical distribution of the sub-lineages sampled in this study is indicated. Pie charts represent the frequencies of karyotypes in each mitochondrial lineage, with N referring to the number of individuals sampled.

Most of the chromosomal diversityin the regionis distributed among populations, asmost populations showed no karyotypic diversity. Notable exceptions are the neighboring populations of Odiel and Tinto, each showing two composite karyotypes (2n = 40 + B and 2n = 42 in Odiel; 2n = 42 and 2n = 42 + B in Tinto). When only diploid A-chromosomal number is considered, the only population showing both 2n = 40 and 2n = 42 A-complements is Odiel (KR = 2; HE = 0.47; Table 2); although our limited sample sizes do not allow us to exclude the occurrence of both chromosomal types in other populations, results suggest highly unbalanced frequencies if this is the case. Departure from Hardy–Weinberg expectations was significant in Odiel (FIS = 1, N = 8, p = 0.0074) due to the absence of heterokaryotypes despite rather equalized frequencies of the two karyotypes, although we cannot discard a spurious result due to small sample size. Odiel and Tinto were also variable with respect to the presence/absence of B-chromosomes (H = 0.47 and H = 0.44, respectively). The presence of B-chromosome appeared not completely independent of A-chromosome number in Odiel, as B-chromosome was observed there only on a 2n = 40 background (Chi-Squared = 8.02, d.f. = 2, *p* = 0.018).

Overall, karyotypic differentiation among populations was extremely high (Exact G test, N = 41, *<sup>p</sup>* <sup>&</sup>lt; 3.98×10<sup>−</sup>9), both for A-chromosome number (FST = 0.796, N = 41, *p* < 0.0001) and B-chromosome (FST = 0.408, N = 41, *p* = 0.0014; Figure 4 chromosome (FST = 0.408, N = 41, *p* = 0.0014; Figure 4).

**Table 2.** Distribution of karyotypes across populations and karyotypic diversity. The frequency distribution of karyotypes for the composite karyotype and the frequency and diversity statistics for the A-chromosome number and B-chromosome presence are shown for each population, mtDNA lineage and for the pool of samples from the Gulf of Cádiz. Diversity is measured as expected heterozygosity and haplotype diversity for A-chromosome number and B-chromosome, respectively.


**Figure 4.** Separation of populations according to frequency of 2n = 42 and B-chromosome.

Karyotypic differentiation for A-chromosome number was maximal (FST = 1, N = 14–17, p < 0.0001) between Guadalquivir and all other populations, except Odiel (FST = 0.308, N = 17, *p* = 0.0062), and high between Odiel and Tinto, EDR, Piedras or Guadiana (FST = 0.496–0.550, N = 13–15, *p* = 0.0014–0.0004; Table 3). Slightly different patterns were observed for B-chromosome, with highest differentiation between Guadalquivir and Odiel (FST = 0.591, N= 17, *p* = 0.0093) and lowest between Odiel and Tinto (FST = 0.013, N = 14, *p* = 0.591; Table 3).


**Table 3.** Karyotypic differentiation among pairs of populations. FST values are shown for A-chromosomal number (above the diagonal) and B-chromosome (below the diagonal). Asterisks indicate the significance of exact G tests (\* *p* < 0.01; \*\* *p* < 0.001). Pairwise comparisons indicated by "-" could not be estimated due to lack of variation in the pair.

#### **4. Discussion**

#### *4.1. Overview of Chromosomal Evolution in Crocidura*

Understanding how genomes are organized and which types of chromosomal rearrangements are implicated in macroevolutionary events are fundamental to understanding the dynamics and emergence of new species [30]. Because Crocidura is the largest and one of the most karyotypically diverse genera of the family Soricidae [19,31], it offers a unique opportunity to test the role of chromosomal reorganization in species' diversification and habitat-specialization.

It has been long assumed that the widespread and monophyletic Palearctic *C. suaveolens* group is characterized by a karyotype of 2n = 40 [15,16]. Exceptions to this rule were initially reported in isolated populations from the Czech Republic (2n = 41) and Switzerland (2n = 42) [21]. Here we extend these initial observations and describe the presence of previously unreported chromosomal variation in the southwesternmost limit of its distributional range, in the Gulf of Cádiz, emphasizing the uniqueness of these genetically isolated and marsh-specialist populations. Karyotypic variability was reflected by the presence of four different karyotypes (2n = 40, 2n = 40 + B, 2n = 42 and 2n = 42 + B), the latter (2n = 42 + B) being reported here for the first time for C. suaveolens. As 2n = 40 is considered the chromosomal form ancestral for *C. suaveolens*, the G-banding comparisons between karyotypes suggest that chromosomal fissions and/or inversions changes in centromeric position together with the emergence of supernumerary (B) chromosomes have originated the chromosomal variability detected in our study. Most notably, our results provide a rare example of a true intrapopulation chromosomal polymorphism in *C. suaveolens*.

Remarkably, two of the karyotypes detected (2n = 40+B and 2n = 42+B) were characterized by the presence of B-chromosomes. Although previous studies recorded the occurrence of B-chromosomes for *C. crossei* [17], *C.* cf. *malayana* [32], *C. poensis* [33] and *C. suaveolens* itself [21], the present study is the first report focusing on the southwesternmost limit of the distributional range of *C. suaveolens*. Despite their widespread distribution in wild populations of several animal, plant and fungi species, the evolutionary origin and function of B-chromosomes are largely unknown. These dispensable chromosomes present a particular behavior, thus not following Mendelian segregation laws [34]. Most B-chromosomes are mainly or entirely heterochromatic (i.e., largely non-coding), although in some cases, B-chromosomes can provide some positive adaptive advantage, as suggested by associations with particular habitats [35] or with increases of crossing over and recombination frequencies [36–38]. Interestingly, the presence of B-chromosomes in our study was associated to the C3 mt DNA clade, more particularly in Tinto and Odiel populations. This, together with its generalized rarity in the rest of the distribution, suggests a recent and derivative origin of supernumerary chromosomes. Pending further functional and genomic studies, we can only speculate on the evolutionary implications of B-chromosomes in the *C. suaveolens* populations of Southwestern Iberia.

#### *4.2. Karyotypic Diversity and Di*ff*erentiation of C. suaveolens Populations in the Gulf of Cádiz*

The karyotypic diversity detected in the *C. suaveolens* populations in the Gulf of Cadiz reveals the potential of chromosomal differentiation in the isolation of genetically distinct populations of the marsh-specialist *C. suaveolens* (Mammalia: Soricidae). In fact, we found an association between the two differentiated mtDNA sub-lineages (C3 and C4) and diploid numbers. All specimens from the Guadalquivir River's mouth (sub-lineage C4) were characterized by the ancestral chromosomal form for *C. suaveolens* (2n = 40), which was also present in the individuals sampled from the Northwest Iberian lineage (lineage B [24]). Remarkably, this ancestral karyotype was not detected in sub-lineage C3 populations (i.e., Guadiana, Odiel, Piedras, Tinto and EDR), which presented diploid numbers ranging from 2n = 41 to 2n = 43. Since it has been suggested that both mtDNA sub-lineages' ages diverged around 110 Ka (30–190 Ka) [25], the relationship between mtDNA divergence and chromosomal reorganizations adds support for the long-term isolation among C3 and C4 lineages.

Within populations of the C3 sub-lineage, the chromosomal form 2n = 42 was the most widespread, suggesting a common (and recent) origin in this lineage, most probably derived by a chromosomal fission from the ancestral form 2n = 40. The presence of 2n = 40 in these populations (always observed in combination with B-chromosome) is thus more parsimoniously explained as the retention of ancestral variation, especially since secondary contact with 2n = 40 populations in the Guadalquivir River is considered unlikely [24,25]. Chromosomal variation in Odiel and Tinto populations may thus constitute a transient "floating" polymorphism, as defined by [27], whose persistence may have been favored by different factors, such as its relatively recent origin (divergence of C3 sub-linage dated ca. 110 ka, [24]), relatively high population sizes and by the spatial structure within the Odiel–Tinto march complex. On the other hand, lack of detected polymorphisms in westernmost populations may be the consequence of a gradual diversity loss during westward colonization of more recent marshes, due to serial founder events [25].

Although chromosomal rearrangements have traditionally been considered to have a strong underdominance effect (e.g., [39]), there is increasing evidence to suggest that the small effect of certain types of rearrangements would favor their persistence within populations as polymorphisms [12]. This is the case, for example, of centric fusions/fissions (e.g., the physical joining of two acrocentric chromosomes by their centromeric regions and vice versa). Chromosomal fusions are particularly extended in nature (reviewed in [27]), occurring in as diverse taxa as mammals, reptiles, insects or mollusks [40,41]. Weak or null selection against centric fusions/fissions heterokaryotes due to mild meiotic pairing dysgenesis or reduction in meiotic recombination also might favor their presence in many lineages, as it has been previously suggested for Primates [42–44], Cetartiodactyla [45], the house mouse [12,46] and shrews [47]. Although the limited sample size included in our study calls for caution, the lack of observations of heterokaryotes, despite balanced frequencies of the two karyotypes detected in Odiel, could indicate underdominance in the case of *C. suaveolens* populations, a possibility that deserves further research.

#### **5. Conclusions**

Our observations for *C. suaveolens* provide an example of the persistence of chromosomal polymorphisms in mammals. The concurrence of chromosomal variation, recently diverged mitochondrial sub-lineages and habitat specialization in *C. suaveolens* populations in the Gulf of Cádiz provides a promising scenario to test the evolutionary significance of chromosomal variation in mammals and assess its contribution to phenotypic and ecological divergence. Future work should address the currently unresolved questions on the possible fitness differences among karyotypes, their role on postzygotic reproductive isolation and their association with morphological or ecological variation.

**Author Contributions:** Data curation, L.B., J.C., J.R., J.A.G. and A.R.-H.; formal analysis, F.G., L.B., J.C., J.R., J.A.G. and A.R.-H.; funding acquisition, J.A.G. and A.R.-H.; methodology, F.G., L.B., J.C., J.R., A.L., F.C., J.A.G and A.R.-H.; project administration, J.A.G and A.R.-H.; resources, J.A.G. and A.R.-H.; supervision, A.R.-H; writing—original draft, J.A.G. and A.R.-H.; writing—review and editing, F.G., L.B., J.C. and J.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially funded by the Ministerio de Economía y Competitividad (Spain), through projects CGL2014-54317-P and CGL2017-83802-P to ARH, and through the 'Microproyecto' "Evolutionary history of lesser white-toothed shrew, *C. suaveolens*, in the Iberian Peninsula and genetic status of populations in the Gulf of Cádiz" financed by Estación Biológica de Doñana (EBD) with funds from the Severo Ochoa Program for Centres of Excellence in R+D+I (SEV-2012-0262). LB benefited from an FPU fellowship from the Ministerio de Educación, Cultura y Deporte.

**Acknowledgments:** We acknowledge the valuable help in karyotyping of Ariadna Cortina, Marta Marín and Cinta Macià. We also thank the regional administrations of Andalucía, Castilla y León and Extremadura for the sampling permits, as well the managers of the protected natural areas sampled and environmental agents who assisted us in the field work.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Spatial and Temporal Dynamics of Contact Zones Between Chromosomal Races of House Mice,** *Mus musculus domesticus***, on Madeira Island**

### **Joaquim T. Tapisso 1,**†**, Sofia I. Gabriel 1,2,**†**, Ana Mota Cerveira 1,2, Janice Britton-Davidian 3,**‡**, Guila Ganem 3, Jeremy B. Searle 4, Maria da Graça Ramalhinho <sup>1</sup> and Maria da Luz Mathias 1,\***


Received: 22 April 2020; Accepted: 1 July 2020; Published: 6 July 2020

**Abstract:** Analysis of contact zones between parapatric chromosomal races can help our understanding of chromosomal divergence and its influence on the speciation process. Monitoring the position and any movement of contact zones can allow particular insights. This study investigates the present (2012–2014) and past (1998–2002) distribution of two parapatric house mouse chromosomal races—PEDC (Estreito da Calheta) and PADC (Achadas da Cruz)—on Madeira Island, aiming to identify changes in the location and width of their contact. We also extended the 1998–2002 sampling area into the range of another chromosomal race—PLDB (Lugar de Baixo). Clinal analysis indicates no major geographic alterations in the distribution and chromosomal characteristics of the PEDC and PADC races but exhibited a significant shift in position of the Rb (7.15) fusion, resulting in the narrowing of the contact zone over a 10+ year period. We discuss how this long-lasting contact zone highlights the role of landscape on mouse movements, in turn influencing the chromosomal characteristics of populations. The expansion of the sampling area revealed new chromosomal features in the north and a new contact zone in the southern range involving the PEDC and PLDB races. We discuss how different interacting mechanisms (landscape resistance, behaviour, chromosomal incompatibilities, meiotic drive) may help to explain the pattern of chromosomal variation at these contacts between chromosomal races.

**Keywords:** Robertsonian fusions; chromosomal evolution; distribution; clinal analysis

#### **1. Introduction**

Since the first study by Gropp et al. [1], numerous chromosomal races of the western house mouse (*Mus musculus domesticus*), resulting from centric or Robertsonian (Rb) fusions of pairs of acrocentric chromosomes, have been identified in Europe and North Africa [2,3]. Robertsonian races are defined by presence of metacentric chromosomes and chromosome numbers that can be reduced as low as 2n = 22, in contrast to the all-acrocentric 2n = 40 standard race [2]. Further chromosomal variability arises from the occurrence of whole-arm reciprocal translocations (WARTs) and through hybridisation [4,5]. Areas of contact between different chromosomal races have been extensively studied in the house mouse providing insights into the role of chromosomal variation on population divergence and speciation [3,5–9]. Specifically, it has been argued that chromosomal rearrangements may promote hybrid unfitness leading to reduced interracial gene flow close to the mutation breakpoint, as well as recombination suppression in the same genomic region. This can result in genetic divergence, ultimately leading to 'parapatric speciation', i.e., the situation where, in a geographical context, two neighbouring forms/races become separate species whilst in contact and hybridising [8,10–12]. Additionally, behavioural factors, such as divergence of mate recognition systems, environmental factors, such as barriers to dispersal, and stochastic factors, resulting from source/sink demographic population systems, may also contribute to decreased gene flow, thus enhancing reproductive isolation between the hybridising races [5,13–22].

The majority of chromosomal races in the house mouse are grouped into metacentric 'systems' occupying defined geographical areas. These enable contact and hybridisation between chromosomal races with different sets of metacentric chromosomes, but also between those metacentric races and the standard all-acrocentric race that surrounds the 'system' [2]. At present, about 20 hybrid zones have been described, which are discrete contacts between chromosomal races as well as some polymorphic areas referred to as unconfirmed hybrid zones, e.g. [3].

The Madeira 'system', characterising the western house mouse populations on the island of Madeira, involves an extraordinary chromosomal variability and includes six metacentric races with diploid numbers ranging from 22 to 38 [4,23]. So far, a single contact zone between two neighbouring parapatric races—the PEDC race (Estreito da Calheta) and the PADC race (Achadas da Cruz) [17]—has been found in a previous survey (1998–2002). The two races share seven Rb fusions (2n = 24) and differ by the occurrence of Rb (6.7) in the PEDC race and Rb (7.15) in PADC [23]. A notable characteristic of this zone is the staggered clines of the diagnostic fusions due to the presence of chromosome 7 in an acrocentric state, e.g. [5]. Human activities influencing the wider environment were proposed to be the main factors regulating the structure and location of this zone [17]. These two races occur in the westernmost part of Madeira, the PEDC race in the southwest plus in an apparently isolated area in the north coast, and the PADC race in a more north-westward area, located between the main distributional area of PEDC and its isolated population in the north (Figure 1).

In the present study, based on a recent survey (2012-2014), we reanalyse the distribution of the PADC and PEDC races and the extent and structure of the previously identified contact zone on the south coast of Madeira. We also extend the sampling area northwards to investigate the northern contact between the PEDC and PADC races. Furthermore, we include a third race (Lugar de Baixo, PLDB) in the analysis, occurring eastward of the southern range of PEDC and separated from this race by a 2 km-wide area of natural vegetation [17], that may represent a natural barrier for active dispersal of house mice. The PLDB race also shares a total of seven fusions (2n = 24) with the PEDC race, again only differing in one, Rb (15.18) vs. (5.18), respectively [23].

Most of the studies carried out on the structure of contact zones involving Rb races of *M. m. domesticus* have described the contact of two freely interacting populations, such as PADC and PEDC. Few studies have considered the influence of physical barriers on hybridising forms (e.g. [24] and references therein). Here we aimed to investigate: a) the structure of the contact zone between the PEDC and PADC races, by comparing the results from two surveys conducted over ten years apart, b) the chromosomal features of the northern populations of mice, that could not be analysed in the 1998–2002 survey because of unsuccessful trapping, and c) the effect of both environmental and physical barriers on mouse dispersal and the influence they may have on the chromosomal characteristics of the PEDC-PADC and PEDC-PLDB contact zones. Previously it was hypothesised that PEDC and PADC occupied habitats differed in quality, providing a basis for source-sink dynamics to emerge [17]. Spatial differentiation in habitat quality is the major factor stabilising source-sink systems, although other factors may also affect movements of individuals among populations, e.g. [25–27]. By attaining the above goals we expect to answer the following questions: i) has the contact zone

between the PEDC and PADC races moved or changed in width?, ii) do populations of the PADC race and the northern isolated PEDC population interact?, iii) can source-sink dynamics explain the distribution of PEDC and PLDB and the movement of mice between these populations? iv) is there a natural barrier that could block active dispersal of house mice in this system (therefore influencing chromosomal characteristics)?

By answering these questions, we hope to better understand to what extent the chromosomal races studied are physically isolated and the potential impact of interaction or hybridisation between the races on further reproductive isolation and divergence.

#### **2. Material and Methods**

#### *2.1. Study Area and Sampling of Mice*

A total of five field trips to Madeira Island, each lasting approximately three weeks, were carried out between 2012 and 2014. Sampling of house mice took place across the geographical range of three metacentric races, Lugar de Baixo (hereafter PLDB), Estreito da Calheta (hereafter PEDC) and Achadas da Cruz (hereafter PADC) [17,23]. The races were named following Ramalhinho et al. [28] and Piálek et al. [2] and are all characterised by eight Rb fusions differing from each other by single fusions [17,23]. PLDB differs from PEDC by the presence of Rb (15.18) and absence of Rb (5.18) fusions, while PEDC differs from PADC by the presence of Rb (6.7) and absence of Rb (7.15) fusions. The combination of chromosomal fusions displayed by each race is as follows:

PLDB: Rb (2.4) (3.14) (6.7) (8.11) (9.12) (10.16) (13.17) (15.18)

PEDC: Rb (2.4) (3.14) (5.18) (6.7) (8.11) (9.12) (10.16) (13.17)

PADC: Rb (2.4) (3.14) (5.18) (7.15) (8.11) (9.12) (10.16) (13.17)

Based on the habitat preferences of the house mouse in Madeira [17], sampling sites were mainly located in human dwellings and human-modified habitats (farms, cultivated and fallow fields). Altogether, we sampled 65 sites along a south-northwest-east transect c. 83 km long, between the villages of Ponta do Sol and Chão da Ribeira (Figure 1; Table S1). The sites were numbered sequentially from site 1 at the southern end, to site 65 at the north-eastern end of the transect. Sites 6 to 50 correspond approximately to the sites previously surveyed by Nunes et al. [17] over the period 1998–2002. The distance between sampling sites ranged between c. 300 m and 8 km, according to terrain (Figure 1). All animals were captured in Sherman live-traps baited with sardine paste. Traps were set for one to three nights depending on trapping success. Mice were transported to animal facilities in Funchal ("Estação de Biologia Marinha") for karyotyping and data gathering. All applicable international, national, and/or institutional guidelines for the care and use of animals were followed. Capturing and testing were conducted by researchers certified by Ministério da Agricultura, do Mar do Ambiente e do Ordenamento do Território (01/2014/CAPT) for Portugal.

#### *2.2. Chromosome Analysis*

Karyotypes were prepared from bone marrow cells, following standard protocols used in earlier studies [4], namely the 'air-drying' procedure [29] and G-banding of chromosomes [30]. A minimum of three metaphases per individual were analysed, making sure that all homologous pairs of chromosomes were identified in each. Metaphases were observed under an Olympus microscope and karyotyped using Leica Chantal software. Chromosomes were identified following Cowell [31] at the Faculty of Sciences in Lisbon and at the Institute of Evolutionary Sciences, Montpellier. Chromosomal analyses were performed using an Olympus BX41 microscope with an attached Leica DC 250 camera equipped with a Leica CW4000 Karyo image analysis system.

The distribution of metacentric races was recorded following the methodology described in Nunes et al. [17]. Accordingly, the sampling sites were distributed along the main road running around the western region of Madeira Island (ER222 and ER101, Figure 1), because most houses were there and it likely acts as the dispersal route for mice. Excluding two sampling points, located 1250 m and 1750 m from the road, all other sites were located c. 2m to 865 m away from the road. In order to fully compare the results from both surveys (1998–2002 vs. 2012-2014), the exact same methodology was employed. As such, the positions of sampling sites off the main road were orthogonally projected onto it, see [17]. Sites located less than 200 m apart after projection were pooled and treated as a single site in all analyses. For each site, we: i) determined the diploid number of all animals, ii) identified all fusions, and iii) determined the percentage of acrocentric and metacentric chromosomes for diagnostic chromosomes and fusions.

#### *2.3. Temporal Analysis*

For the temporal analysis, we compared our new dataset with the one obtained between 1998 and 2002 (Table S2). Previously sampled sites were re-projected orthogonally along the main road to avoid errors when comparing distances among sites between the two periods. The 1998–2002 dataset includes the results reported by Nunes et al. [17] and unpublished data (Table S2). Temporal comparisons included: i) the chromosomal characteristics of sites and groups of sites for the whole transect, ii) the percentage of acrocentrics and metacentrics in sites and groups of sites also for the whole extension of the transect, and iii) the clinal pattern for diagnostic fusions of PEDC (Rb 6.7) and PADC (Rb 7.15) along the previously sampled transect analysed by Nunes et al. [17], i.e., sites 6 to 50 (km 10.1 to 58.17). The software package C-Fit8 was used to compute eight regression cline models (Table S3). Model selection was then performed based on the Akaike's information criterion [32] (Table S3). Logistic or scaled logistic functions were selected to best fit the metacentric clines:

$$\mathbf{f}(\mathbf{x}) = (\mathbf{e}^{\mathbf{s}(\mathbf{x}-\mathbf{c})}/1 + \mathbf{e}^{\mathbf{s}(\mathbf{x}-\mathbf{c})}) \times \mathbf{h},\tag{1}$$

where *x* is the geographic distance, *s* and *c* are the maximum slope and the centre of the cline, respectively, and *h* is the height. The width (expressed as an inverse of the slope) and cline centres were compared between the pair of fusions Rb (6.7) and Rb (7.15) for both time periods. To check for differences between clines, we assumed that twice the difference in loge likelihood values between the constrained and unconstrained model (on cline centre for coincidence and on cline width for concordance) followed a χ<sup>2</sup> distribution with a *d.f.* equal to the difference of the parameters estimated, see [33].

#### **3. Results**

#### *3.1. Current Chromosomal Variation in Western Madeira*

After projecting all sampling sites on the road (Figure 1), 29 sites were pooled (in combinations of 2 to 4 sites). Pooling of sites restricted chromosomal analyses to a total of 48 sites. A total of 449 mice were trapped along the transect, 363 of which were successfully karyotyped. The number of karyotyped animals per site ranged from 1 to 31. Each site was allocated to a zone/race on the basis of frequency of the race-specific Rb fusions, i.e., Rb (15.18) vs. (5.18) and Rb (6.7) vs. (7.15) (Figure 1).

Fusion Rb (15.18) was identified along the first six sites of the transect (site 1 to 6 at km 0 to 10.1). From sites 1 to 4 (km 0 to 8.4) all mice carried this fusion in a homozygous state, and in sites 5 and 6 (km 8.48 and 10.1) in a heterozygous state as a result of hybridisation between PEDC and PLDB.

Fusion Rb (5.18), in a homozygous state, first occurred in a mouse at site 3 (km 7.16) and again at sites 5 and 6 (km 8.48 and 10.1), both in a homozygous state (5 out of 21 mice) and a heterozygous state (10 out of 21 mice). These heterozygous Rb (5.18) mice were the same that also carried the fusion Rb (15.18) in a heterozygous state mentioned above, again indicating the occurrence of hybridisation. From site 7 to site 48 (km 12 to 49.54) all mice carried the fusion Rb (5.18) in a homozygous state (211 mice). Exceptionally, at site 32 (km 32.95) chromosomes 5 and 18 were acrocentric in one of the sampled mice. From sites 49 to 57 (km 57.83 to 77.36) the fusion occurred in both homozygous (49 out of 58 mice) and heterozygous states (9 out of 58 mice). Along the last sites on the transect (site 58

to 65 at km 79.12 to 82.78) fusion Rb (5.18) was again homozygous (i.e., fixed) in all mice sampled (58 individuals) (Figure 2; Figure 3).

Fusion Rb (6.7) was homozygous in most animals collected from sites 1 to 18 (km 0 to 16.63) (96 out of 97 mice), with the exception of one heterozygous mouse for Rb (6.7) at site 6 (km 10.1). From site 19 (km 19.85) northwards the frequency of occurrence of this fusion gradually decreased, being absent between sites 35 to 55 (km 33.47 to 72.72) but occurring again between sites 56 and 65 (km 75.59 to 82.78) (51 out of 67 mice). In this section of the transect the fusion occurred both in a homozygous or heterozygous state.

**Figure 1.** Location of sampling sites (dark green dots) orthogonally projected on the roads ER222 and ER101 (red line) of Madeira. See Table S1 for details on sampling sites. Black lines represent the limits of different chromosomal zones, PLDB represents the Lugar de Baixo race, PLDB-PEDC the contact zone between the Lugar de Baixo and Estreito da Calheta races, PEDC the Estreito da Calheta race, PEDC-PADC the contact zone between the Estreito da Calheta and Achadas da Cruz races and PADC the Achadas da Cruz race.

Fusion Rb (7.15) occurred in mice sampled from sites 33 to 53 (km 33 to 68.39), usually in a heterozygous state. Mice with acrocentric chromosomes 6, 7, and 15 co-occurred with mice carrying both Rb (6.7) and Rb (7.15) fusions from site 25 (km 28.6) until the end of the transect (km 82.78).

Based on the above description, it was possible to classify different segments of the transect according to the relative frequency of the four diagnostic Rb fusions (Figures 1 and 2). Animals assigned to the PLDB race occurred in the first segment on the south coast of Madeira (sites 1 to 6 at km 0 to 10.1). The identification of PLDB-PEDC hybrids in the last two sites of this segment (sites 5 to 6 at km 8.48 to 10.1) confirmed a hybrid zone between the two races. The next segment is characterised by animals carrying the PEDC diagnostic fusions in a homozygous state (sites 7 to 18 at km 12 to 16.63), followed by a longer segment where animals carried the fusion Rb (6.7) in a heterozygous state (sites 19 to 34 at km 19.85 to 33.42). This segment is followed by a very small area, where PEDC and PADC overlap, although no hybrids between them were identified (sites 33 to 34 at km 33 to 33.42). The next segment occupies the entire north-west corner of Madeira Island (sites 35 to 53 at km 33.47 to 68.39), where individuals were either assigned to PADC, due to the presence of Rb (7.15), or carried the acrocentrics 6, 7, and 15. The last segment of the transect (sites 56 to 65 at km 75.59 to 82.78) was characterised by mice with Rb (6.7), either in a homozygous or heterozygous state. A few animals along this last section of the transect had acrocentrics 6, 7, and 15 (16 mice).

#### *3.2. Temporal Variation in the Distribution of Madeira Rb Races*

The distribution of races PLDB, PEDC and PADC shows no significant differences between the two periods analysed (1998–2002 versus 2012-2014). Animals assigned to the race PLDB were found until km 10.1 (Figures 2a and 3a). In this section of the transect, the main difference between temporal periods is the detection of hybrids between races PLDB and PEDC in the 2012-2014 sampling (Figure 2a).

The next segment of the transect is characterised by mice assigned to race PEDC, with fusion Rb (6.7) in a homozygous state (Figures 2b and 3b), in the period 1998–2002 between km 11.7 and 13.9 and in the period 2012-2014 between km 12 and 16.6.

The following segment, starting at km 16.9 in 1998–2002 and at km 19.8 in 2012-2014, was characterised by the presence of homozygous and heterozygous mice for Rb (6.7), and mice carrying acrocentric chromosomes 6 and 7. This segment ended at km 31.9 in both periods analysed. This was followed by a very short segment defined by the presence of mice carrying fusions Rb (6.7) or (7.15) either in homozygous or heterozygous states and mice acrocentric for the chromosomes involved in both fusions.

Despite the substantial sampling effort in this segment, hybrids between races PEDC and PADC were exclusively found in the period 1998–2002. This segment was located between km 32.2 and 33.5 for the period 1998–2002 and between km 33.0 and 33.4 for the period 2012-2014.

Adjacent to this short region, a longer segment, ranging from km 34.2 to km 63.2 in 1998–2002 and from km 34.2 to km 68.4 in 2012-2014, is characterised by the presence of mice homozygous or heterozygous for the fusion Rb (7.15) as well as mice acrocentric for chromosomes 7 and 15.

The last segment of the transect was characterised by the presence of mice either homozygous or heterozygous for Rb (6.7), and mice carrying acrocentric chromosomes 6 and 7. For the period 1998–2002 this segment was located between km 77.2 and km 81.7, while for the period 2012-2014 it was located between km 72.7 and km 82.8.

Following the above results we assessed the present extent and location of the different chromosomal zones (Figure 1): zone **PLDB**; zone **PLDB–PEDC,** a contact zone between the PLDB and PEDC races; zone **PEDC South**, including mice either homozygous or heterozygous for Rb (6.7) and mice carrying acrocentric 6 and 7; zone **PEDC–PADC,** a contact zone between the PEDC and PADC races; zone **PADC,** with mice either homozygous or heterozygous for Rb (7.15) plus mice acrocentric for 7 and 15; and zone **PEDC North,** with identical characteristics to zone PEDC South.

Cline model estimates are summarised in Table 1. Tests of concordance and coincidence of the fusion Rb (6.7) revealed similarities between the two periods analysed (Figure 4). The results of maximum likelihood ratio tests revealed no significant differences in the cline width between years 1998–2002 and 2012-2014 (12.48 vs. 10.61, respectively; maximum likelihood difference of 1.43, *p* = 0.232), nor between cline centres (19.07 vs. 18.26; maximum likelihood difference of 1.70, *p* = 0.192). Cline analysis of the Rb (7.15) fusion also showed a similarity between cline width in the two periods

analysed (2.09 vs. 1.96; maximum likelihood difference of 0.03, *p* = 0.862) but a significant difference between cline centres (24.71 vs. 23.89; maximum likelihood difference of 9.04, *p* = 0.003) was observed.


**Table 1.** Maximum likelihood estimates of cline centre position and cline width for Rb 6.7 (logistic regression model) and Rb 7.15 (scaled logistic regression model) and respective confidence intervals.

**Figure 2.** Frequency of diagnostic Rb fusions for the period 2012-2014. a) the upper panel shows frequencies of fusions Rb (15.18) and Rb (5.18) in homozygous [(15.18)2; (5.18)2] and heterozygous [(5.18)] state, frequency of hybrid mice carrying both fusions [(5.18)/(15.18)] and frequencies of mice

carrying none of the fusions (Acrocentric). b) the lower panel shows frequencies of fusions Rb (6.7) and Rb (7.15) in homozygous [(6.7)2; (7.15)2] and heterozygous [(6.7); (7.15)] states, frequency of hybrid mice carrying both fusions [(6.7)/(7.15)] and frequencies of mice carrying none of the fusions (Acrocentric).

**Figure 3.** Frequency of diagnostic chromosomal fusions for the period 1998–2002. a) the upper panel shows frequencies of fusions Rb (15.18) and Rb (5.18) in homozygous [(15.18)2; (5.18)2] and heterozygous [(5.18)] states, frequency of hybrid mice carrying both fusions [(5.18)/(15.18)] and frequencies of mice carrying none of the fusions (Acrocentric). b) the lower panel shows frequencies of fusions Rb (6.7) and Rb (7.15) in homozygous [(6.7)2; (7.15)2] and heterozygous [(6.7); (7.15)] states, frequency of hybrid mice carrying both fusions [(6.7)/(7.15)] and frequencies of mice carrying none of the fusions (Acrocentric).

**Figure 4.** Comparison of clinal patterns for fusions Rb (6.7) (blue) and Rb (7.15) (orange) between the periods 1998–2002 (circles and full line) and 2012-2014 (triangles and dashed line).

#### **4. Discussion**

#### *4.1. Current Distribution of Estreito da Calheta (PEDC) and Achadas da Cruz (PADC) Races*

The possible involvement of chromosomal rearrangements in species formation has engendered much interest. Several models of speciation have been proposed, emphasising the role of hybrid zones between chromosomal races, e.g. [34,35]. The western house mouse, due to the accumulation and fixation of Robertsonian fusions, has been considered an excellent model to study how chromosomal rearrangements may be implicated in population divergence, race formation, and speciation, including the analysis of chromosomal hybrid zones [9]. Here, we compared the structure, position and width of the hybrid zone between two chromosomal races (PEDC and PADC) on Madeira Island at two time periods 10 years apart, and expanded the sampling efforts at both ends of the previously analysed transect.

Our results confirm the previous findings by Nunes et al. [17] of four chromosomal zones occurring along the south-western and northward distribution of the PEDC-PADC contact. As described in the previous study, the frequency of diagnostic fusions Rb (6.7) and Rb (7.15) varied along the transect. One of the main differences between the two time periods was the broadening of the PADC chromosomal zone, i.e., a wider geographic detection of mice assigned to this race, reflecting the increase of the sampling area during the 2012-2014 survey in the north-western part of the island. The increase of the transect along the north-western coast towards the northern side of the island also allowed another contact zone – PEDC North – to be identified between PADC and the previously described (and very geographically circumscribed) PEDC isolate on the north coast, characterised by the presence of mice mostly homozygous or heterozygous for Rb (6.7), with a few occasional mice exclusively acrocentric for these chromosomes.

Our 2012-2014 survey, particularly along the northern side of Madeira also revealed the occurrence of fusion Rb (1.15) whose distribution had not previously been identified on the island. This fusion, always found in a heterozygous state, appears to be more frequent in the PEDC North zone, having been detected in a total of 12 mice from 6 sampling sites. However, it was also very occasionally found outside this range, namely in the PEDC South zone (in two mice in a single location) and in the northernmost end of the PADC zone in a single mouse, heterozygous for Rb (7.15). In nature, chromosome 1 has been involved in several fusions, however Rb (1.15) is not a common one [2,36]. In Madeira, we suggest that Rb (1.15) may result from a recent mutation event, considering the fusion's somewhat restricted geographic distribution and its exclusive heterozygous state. This fusion seems to have emerged multiple times in the house mouse in situations where chromosomes 1 and 15 are available to fuse but, so far, it has never been found in a fixed state. A similar situation has been suggested by

Sans-Fuentes et al. [37] for the appearance of a rare fusion, Rb (7.17) in the polymorphic Barcelona 'system', and by Adolph & Klein [38] for several rare Rb fusions in mice from southern Germany.

Our results suggest that some temporal change has occurred in the spatial distribution of the diagnostic fusions in the PEDC-PADC parapatric area, resulting in the narrowing of the contact zone between them. Comparisons involving Rb (6.7) indicate that the general clinal pattern for this fusion did not change significantly between 1998–2002 and 2012–2014, i.e., no significant variation was observed in either the cline centre or width. However, for Rb (7.15), a statistically significant shift in the cline centre was detected towards the PEDC-PADC contact zone. This encompasses the main area of polymorphism, exhibiting an increased number of homozygous mice for the Rb (7.15) fusion over the last decade. Nevertheless, despite the change of the cline centre of Rb (7.15), the absence of a major geographical shift of this zone seems to exclude a possible numerical imbalance between the races, possibly related with a lower density of mice at the hybrid zone [17], allowing each population to evolve independently and 7.15 to increase towards fixation. A similar situation has been already described by Castiglia & Capanna [14] in a contact zone between two chromosomal races in Italy. Considering the observed narrowing of the contact zone after about 10 years, it would be interesting to address the dynamics of both chromosomal races and their hybrid zone after a longer timeframe.

#### *4.2. Dynamics of the Contact Zone Between Estreito da Calheta (PEDC) and Achadas da Cruz (PADC) Races*

Contact or hybrid zones are narrow regions where two more or less homogeneous parental forms meet and interbreed [39,40]. These zones are maintained by the influence of diverse factors, e.g., a balance between incompatibility of chromosomal races and dispersal [3,8,10,41–44]. This definition clearly fits the hybrid zone between PEDC and PADC chromosomal races. Two main mechanisms may determine patterns of gene flow between the hybridising taxa: i) 'suppressed recombination', contributing to the accumulation of genetic incompatibilities between parental types, and ii) 'hybrid dysfunction', suggesting that hybrids or heterozygotes are less fit, and thus are selected against [8,44,45]. The role of both mechanisms has been discussed for different geographic metacentric 'systems' of the house mouse [8,44,46,47].

A single study on the fertility of hybrids between PEDC and PADC discussed the more traditional concept model of 'hybrid dysfunction'. Results revealed that hybrids between these two races obtained under experimental conditions exhibited moderate subfertility (approximately a 50% decrease), suggesting that, under natural conditions, this level of underdominance could contribute to limit gene flow between parental populations [48].

A tension zone may move if there is a difference in population size (affecting dispersal rate) for the two races that are hybridising [42]. Nunes et al. [17], on the basis of the results of the 1998–2002 survey and on the distribution and abundance of crops and agricultural areas over the studied transect, suggested that potential habitats for mice are of better quality, more abundant and more evenly distributed over the area occupied by PEDC, while potential habitats for PADC are of poorer quality, fewer and more dispersed. This environmental contrast defines a source-sink continuum and could result in an asymmetrical dispersal rate into the contact zone between the two races [49,50]. In a source-sink system the contribution of sinks is indeed relevant for the persistence of individuals but a stable equilibrium between populations can be maintained in invariant environments [26]. According to this hypothesis, PEDC mice are expected to move more into/across the hybrid zone when compared with PADC mice. Despite a slight (non-significant) shift of the Rb (6.7) cline in that direction, further contributing to the observed narrowing of the contact zone, it was the Rb (7.15) fusion that significantly increased in frequency in the contact zone. Also, this increase may also be favoured by sexual selection as predicted in Nunes et al. [20], where patterns of preference of both males and females in the contact zone would favour Rb (7.15) individuals and thus contribute to the observed cline shift. Genomic analysis would be valuable in assessing the level and direction of gene flow across the hybrid zone, helping clarify the putative contribution of environmental quality on the dynamics of mouse movement across this area.

The relative stability of the zone may also be explained by the fact that environmental conditions in this area did not change significantly. In fact, during the period of approximately 10 years, between the past and present survey, no marked changes were noticed. The types of crops and land use were almost the same in 2012-2014 as before, with no notable reduction in agricultural areas associated with urbanisation, suggesting environmentally stable conditions. Hence, we did not expect demographical changes both within and at the border of the hybrid zone.

The role of commensalism of mice in combination with a fragmented landscape matrix and the extreme topography of Madeira has restricted the movement of mice and contributed to population isolation and fixation of fusions, namely the fixation of Rb (7.15) in the northernmost part of the transect [17,23].

The maintenance of the contact zone, although narrower, can also be explained by the persistence of an 'acrocentric peak', where mice carry acrocentric chromosomes 6, 7, and 15, limiting the production of hybrids (only two PEDC-PADC hybrids were identified in 1998–2002 and none in 2012-2014). There is an expectation that individuals that are acrocentric for chromosomes 6, 7, and 15 will be favoured because they cannot produce PEDC x PADC F1 hybrids and therefore will have more grand-offspring than other karyotypic categories. This situation is described by Nunes et al. [46 and references therein] and Gündüz et al. [5]. These authors suggest that the comparatively lower fitness of PEDC x PADC F1 hybrids, characterised by a chain-of-four configuration at meiosis I (6-6.7-7.15-15), will favour individuals homozygous for chromosomes 6, 7, and 15 thus leading to a high frequency of acrocentrics in the contact zone. On the other hand, meiotic drive may help to explain the slight but significant narrowing of the hybrid zone by maintaining metacentrics involving the fusions Rb (6.7) and Rb (7.15) in the area [51]. Natural variation in centromere strength has been found in wild populations of the house mouse, including in Madeira, where relatively stronger centromeres of metacentric chromosomes are preferentially retained in the egg in opposition to centromeres of acrocentrics, more likely to be segregated to the polar body [51]. As such, in these populations, where metacentrics tend to naturally accumulate, hybrid zones characterised by an 'acrocentric peak' may persist but become reduced in width, as observed in our study, reflecting the balancing forces in action as discussed above.

Thus, it seems reasonable to propose that multiple selective pressures (e.g., hybrid underdominance coincident with a peak of acrocentrics, meiotic drive, differences in habitat quality, habitat fragmentation) may be acting together resulting in the observed changes of the position and structure of the PEDC-PADC contact zone.

#### *4.3. Contact Zone Between Races Estreito da Calheta (PEDC) and Lugar de Baixo (PLDB)*

In the present study, we also analysed to what extent landscape discontinuities constrain mouse migration and the karyotypic structure of the contact zone between PEDC and PLDB. These two races have very limited distributions, with the PLDB occupying a restricted area east of PEDC which in turn is parapatric with PADC as discussed above.

The chromosomal analysis of mice across the distributional areas of the two races shows a contact zone between them encompassing approximately 3 km, i.e., from km 7.2 to km 10.1 along the transect, with all hybrid individuals detected between both races located between km 8.48 and 10.1.

It is worth pointing out that similarly to metacentrics Rb (6.7) and Rb (7.15), there was a possible role of WARTs in generating the diagnostic metacentrics Rb (5.18) and Rb (15.18) [4]. As such, it was not expected that chromosome 18, in particular, could occur here in an acrocentric form unless by migration of acrocentric-bearing mice, as inferred for the PEDC-PADC contact zone [5] or for contact zones between chromosomal races in Italy [7,14] and Spain [52].

The offspring resulting from the reproduction between mice of the PEDC and the PLDB races will most likely exhibit hybrid unfitness. Because races have metacentrics with monobrachial homology, the F1 between Rb (5.18) and Rb (15.18) form a chain-of-four at meiosis I, therefore, as for PEDC x PADC offspring, underdominance in hybrids can be anticipated [46]. Nevertheless, a surprisingly large number of hybrids were detected between both races across a much wider area when compared

to the PEDC-PADC contact zone. Assuming that the 2 km-wide band of non-commensal habitat is not an unbridgeable obstacle for active dispersal between the two races, in particular for PEDC mice, in addition to the possibility of passive dispersal movements, one can ask what mechanisms could have hampered the movement of PLDB mice across the valley. It would be interesting to further understand whether there is a sex-biased dispersion (expectedly by young males, [53]) into the PLDB race acting as a vehicle of gene flow contributing to the width of the contact zone between both races. Although the area inhabited by the PLDB race is occupied by sparse commensal habitat mostly surrounded by forests and woodlands, potentially offering fewer opportunities for mice to find shelter and food [54], whether this area can function as a sink as opposed to the PEDC source-area should be better investigated. In support of this hypothesis, a previous study showed that the amount of energy spent for maintenance is different between races and slightly lower in PLDB (energy intake: PLDB 28.96 <sup>±</sup> 7.59 KJ day−1; PEDC 32.80 <sup>±</sup> 2.64 KJ day−1) [55]. These differences in energy balance may reflect typically lower food availability for PLDB mice as a proxy for lower habitat quality [55,56]. This is again consistent with previous findings by Nunes et al. [17] regarding the PADC and PEDC races. Future studies will allow us to focus on a better understanding of the formation, maintenance, and evolution of this contact zone.

Most of the previously described hybrid zones in house mice involve a metacentric race and the all-acrocentric standard race [3]. The present data on the house mouse metacentric 'system' in Madeira reinforce the knowledge on the contact zones between different metacentric races. Further behavioural experiments coupled with genome-wide analysis will allow a better understanding of the underlying mechanisms maintaining the equilibrium of contact zones between Rb races. As such, it will be possible to assess levels of gene flow, its preferential direction (if any), identification of loci under selection, loci presenting clines across the contact zone, and its potential role in survival and reproduction [57,58].

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/7/748/s1, Table S1: Description of localities sampled between 2012 and 2014 along a transect, including location name, identification code (ID), site number, GPS coordinates (Latitude and Longitude), distance along the transect (km) and number of house mice karyotyped (N), Table S2: Description of sites sampled between 1998 and 2002 including location name, identification code (ID), site number, GPS coordinates (Latitude and Longitude), distance along the transect (km) and number of animals karyotyped (N), Table S3: Maximum likelihood estimates and Akaike values computed on CFit8 considering eight cline models.

**Author Contributions:** Conceptualization, J.T.T., S.I.G, A.M.C., J.B.S. and M.d.L.M.; Data curation, J.T.T., S.I.G. and A.M.C.; Formal analysis, J.T.T. and A.M.C.; Funding acquisition, S.I.G, A.M.C., G.G., J.B.S. and M.d.L.M.; Investigation, J.T.T., S.I.G, A.M.C., J.B.-D., G.G., M.d.G.R., J.B.S. and M.d.L.M.; Methodology, J.T.T., S.I.G, A.M.C., J.B.-D. and M.d.G.R.; Project administration, J.T.T., S.I.G, A.M.C. and M.d.L.M.; Resources, M.d.L.M.; Supervision, S.I.G; M.d.L.M.; Validation, J.T.T., S.I.G., A.M.C., J.B.-D., M.d.G.R. and J.B.S.; Visualization, J.T.T., A.M.C. and M.d.L.M.; Writing—Original draft, J.T.T. and M.d.L.M.; Writing—Review & editing, J.T.T., S.I.G, A.M.C., G.G., J.B.S. and M.d.L.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by Fundação para a Ciência e a Tecnologia (FCT) project PTDC/BIA-EVF/116884/2010 'Speciation or despeciation? Zooming in on a chromosomal hybrid zone in the Madeiran house mouse', involving European FEDER funds and FCT post-doc fellowships to SIG (SFRH/BPD/88854/2012) and AMC (SFRH/BPD/47070/2008). SIG and AMC were also funded by national funds (OE), through FCT in the scope of the framework contract foreseen in the numbers 4, 5 and 6 of the article 23, of the Decree-Law 57/2016, of August 29, changed by Law 57/2017, of July 19. Thanks are also due to FCT/MCTES for the financial support to (UIDP/50017/2020+UIDB/50017/2020) through national funds.

**Acknowledgments:** We here pay tribute to our colleague Janice Britton-Davidian - for her dynamism and dedicated scientific contribution that was absolutely central to our studies of the Madeira Rb system - and for her unforgettable friendship. We would like to thank Manuel Biscoito, Director of Estação de Biologia Marinha do Funchal for all the support in Madeira, in particular for the mice maintenance and laboratory facilities, Tomé Neves for his precious help with the Madeira map and statistical analysis, and Sophie von Merten for additional statistical advice. Finally, we would like to thank the three anonymous reviewers whose comments greatly improved the final version of the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Meiotic Chromosome Contacts as a Plausible Prelude for Robertsonian Translocations**

**Sergey Matveevsky 1,\*, Oxana Kolomiets 1, Aleksey Bogdanov 2, Elena Alpeeva <sup>2</sup> and Irina Bakloushinskaya <sup>2</sup>**


Received: 28 February 2020; Accepted: 31 March 2020; Published: 2 April 2020

**Abstract:** Robertsonian translocations are common chromosomal alterations. Chromosome variability affects human health and natural evolution. Despite the significance of such mutations, no mechanisms explaining the emergence of such translocations have yet been demonstrated. Several models have explored possible changes in interphase nuclei. Evidence for non-homologous chromosomes end joining in meiosis is scarce, and is often limited to uncovering mechanisms in damaged cells only. This study presents a primarily qualitative analysis of contacts of non-homologous chromosomes by short arms, during meiotic prophase I in the mole vole, *Ellobius alaicus,* a species with a variable karyotype, due to Robertsonian translocations. Immunocytochemical staining of spermatocytes demonstrated the presence of four contact types for non-homologous chromosomes in meiotic prophase I: (1) proximity, (2) touching, (3) anchoring/tethering, and (4) fusion. Our results suggest distinct mechanisms for chromosomal interactions in meiosis. Thus, we propose to change the translocation mechanism model from 'contact first' to 'contact first in meiosis'.

**Keywords:** *Ellobius alaicus*; translocation; non-homologous chromosome connections; meiosis; synaptonemal complex

#### **1. Introduction**

Chromosomal stability, number and positioning are essential factors for correct genome functionality and inheritance. At the end of 19th century, Rabl hypothesized the non-random, three-dimensional location of chromosomes in the nucleus, whereas in the last two decades, these observations have been advanced with data from advanced technological methods, including fluorescence in situ hybridization (FISH), immunocytochemistry and others [1–3]. Information on the tissue-specific positioning of chromosomes has revealed functional nuclear regulation [4,5], or altered states in cancer cells [6,7]. Chromosomal changes at the individual development are usually highlighted as catastrophic genomic events, while such chromosomal alterations often result in carcinogenesis and infertility [8,9]. Genomes of carcinogenetic cells are highly dynamic, and in some cases, chromosomal changes, either induced or spontaneous, lead to therapeutic resistance [10]. Chromoanagenesis, encompasses chromothripsis, chromoanasynthesis and chromoplexy, and was recently described as a possible mechanism contributing to chromosomal evolution [11–13]. Even though a rearranged genome may suffer maladaptive modifications, some variations are beneficial for organisms or species in terms of advantageous natural selection. Karyotypic diversity, structural variations of autosomes and sex chromosomes exemplify the importance of chromosomal changes throughout evolution [14–19].

Distinct drivers for chromosomal change have been identified, e.g., mobile Penelope elements implicated in *Drosophila* speciation [20], LINE-1 elements in mammals [21], and noncoding RNAs [22] etc. These factors destabilize genomes, initiate DNA damage, and provoke the abnormal linking of chromosomes. At least two translocation models based on non-random chromosome distribution in the nucleus have been proposed [23].

An initial step in "breakage-first" models are double-strand breaks (DSBs). Then potential partners occasionally tie up and produce changed chromosomes, which can obtain distinct fragments of non-homologous chromosomes, up to the whole arm, e.g., Robertsonian translocations (Rbs) or Whole-Arm Reciprocal Translocations (WARTs) [23]. Translocation probability rates could be higher if chromosomes were located close to each other in the nuclear space [24]. For intermingling chromosomes, DSBs in contact zones can lead to non-homologous linking and translocations [25,26]. In this context, the evolutionary integrative breakage model [27] stressed determining the genomic distribution of evolutionary breakpoints due to particular DNA sequence composition and the nucleome, combined with alteration of gene expression due to genome reshuffling.

In the 'contact-first' models, chromosomes should be broken, but DSBs start in colocalized chromatids inside specific protein complexes [28]. Firstly, chromosomes come together, then undergo DSBs and join with other partners. Chromosome region mobility is different during the cell cycle; it is higher in the early G1 phase, but decreases in the S phase. In an interphase nucleus, chromatin status, positioning, and cytoskeleton mechanical forces, all influence chromosome movement [29]. The combined variant of both models was also proposed. If DSBs occur in G1 and are not repaired until G2, they may cluster and form translocations [30].

In all models, the fate of small acrocentric chromosomal arms is uncertain; but most probably, they are eliminated. It is important to highlight these models were developed for interphase nuclei when chromosomes were decondensed and occupied specific chromosomal territories. Moreover, the tissue-specific positioning of chromosomes in interphase nuclei [31], enables the formation of distinctive carcinogenic translocations [8], which better fit the first model.

When we investigate the altered three-dimensional organization of somatic cells with re-arranged chromosomes, we cannot immediately determine the evolutionary consequences. How will such translocations pass to the next generations? Therefore, we must look for genome rearrangements in the germline. De novo chromosome rearrangements can arise during germ cell proliferation, meiosis, and in haploid sperm or eggs [32]. Any rearrangement provoking genomic instability may be beneficial for diversification and genetic speciation.

A specific fusion between acrocentric chromosomes, ended by metacentric chromosomes was first described by Robertson [33], and later named Robertsonian translocations (Rbs). The frequency of such translocations in humans is high: approximately 1 in 1000 individuals [34]. The most common rob(13q14q) and rob(14q21q) translocations originate during oogenesis [35]. A breakpoint diversity exemplified distinct ways for Rb formation, the significant input of the pre-meiotic replication and proper meiotic recombination [36,37]. Probable mechanisms for the formation of Rbs involving telomere changes were suggested [38]. One of the potential mechanisms may be a loss of p-arm telomeres when chromosome breakage occurs within minor satellite sequences [39,40]; another way is a fusion without any losses, and inactivation of telomeres [41]; the one more way may operate via the deletion/inactivation of the telomerase RNA gene which induces telomere shortening [42]. Data on meiosis are scarce and are limited to non-homologous end joining (NHEJ) in mouse spermatocytes after gamma radiation [43], however telocentric chromosome associations in the pachytene are observed for *Mus domesticus* [44].

The high frequency of translocations in humans and the evolutionary input of Rbs requires exploration of non-model species to reveal origin and maintenance mechanisms. Several mammalian species have demonstrated natural variability's in chromosomal numbers, including Rbs. *Mus, Sorex, Ellobius* species, and some others exhibit changes in diploid numbers, along with stable fundamental numbers due to whole branch fusions [45–48]. Recently, using chromosome painting, we described karyotype structures in three cryptic *Ellobius* species, *E. talpinus*, *E. tancrei,* and *E. alaicus*; we demonstrated a homology of re-arranged chromosomes, and showed the existence of XX sex

chromosomes in males and females [49–51]. We hypothesized that a neocentromere origin in one pair of chromosomes was an initial disturbance event for the *E. tancrei* (2n = 54–30) genome, in contrast to stable *E. talpinus* (2n = 54) [49,52]. *E. alaicus* is very close to *E. tancrei*, a translocation Rb(2.11) emerged in both species; other Rbs are species-specific ones. *E. alaicus* demonstrates rapid chromosomal changes in nature, such as the fixation of Rbs in the large population in the Pamir-Alay [50]. In this study, we analyzed the meiotic sustainability of species with rapidly evolving genome.

#### **2. Materials and Methods**

#### *2.1. Material and Mitotic Chromosomes*

We used samples of 6 specimens of *E. alaicus*, kept in the cytogenetic collection (a part of the Joint collection of wildlife tissues for fundamental, applied and environmental researches of the Koltzov Institute of Developmental Biology RAS, Core Centrum of the Koltzov Institute of Developmental Biology RAS, state registration number 6868145). Two males and two females (collection numbers; 27353, 27357, 27354, 27356) were karyotyped using bone marrow suspensions [53]. Tissue from another male (27532) and female (27351) were also used to derive somatic cell cultures. C-band staining of mitotic metaphase plates was performed according to Sumner [54].

We followed international, national, and institutional guidelines for animal care. Studies were approved by the Ethics Committee for Animal Research of the Koltzov Institute of Developmental Biology RAS and the Vavilov Institute of General Genetics RAS.

#### *2.2. Cell Culture*

Chondrocyte and fibroblast cell lines were obtained from the Cell culture collection of the Koltzov Institute of Developmental Biology RAS. Chondrocytes were cryopreserved after the first subcultivation and fibroblasts after the third subcultivation and were stored at −196 ◦C in liquid nitrogen using cultivation media with 10% DMSO for cryopreservation. Karyotype evaluation was performed after the cells were recovered after freezing.

#### *2.3. Meiotic Chromosome Studies and Immunostaining*

Samples from three *E. alaicus* (27352, 27353 and 27357) adult males were used for the meiotic study. Synaptonemal complex (SC) preparations were made and fixed according to Peters et al. [55], with some modifications [56].

Primary antibodies used for immunostaining: rabbit anti-synaptonemal complex protein 3 (SYCP3) antibody (diluted 1:250, Abcam, Cambridge, UK); human anti-centromere Calcinosis Raynaud's phenomenon, Esophageal dysmotility, Sclerodactyly, and Telangiectasia (CREST) antibody (CREST, 1:250, Fitzgerald Industries International, USA); mouse anti-phospho-histone H2AX (diluted 1:250–500, Abcam) (also known as γH2AFX); rabbit anti-H3K9me3 antibody (1:100, Abcam; kindly provided by Dr. Jesus Page).

As secondary antibodies we used goat anti-rabbit IgG, Alexa Fluor 488-conjugate (Invitrogen, Carlsbad, CA, USA); goat anti-human IgG, Alexa Fluor 546-conjugate (Invitrogen); goat anti-mouse IgG, Alexa Fluor 546-conjugate (Invitrogen, USA) (diluted 1:250–500). Slides were washed in phosphate-buffered saline (PBS) and placed into Vectashield, with 4 ,6-diamidino-2-phenylindole (DAPI) (Vector Laboratories, USA). Slides were analyzed using a fluorescence light microscope, Axio Imager D1 (Carl Zeiss, Jena, Germany). Immunostaining was described previously [57,58]. We immunostained H3K9me3 histones using two approaches; 1) the first round of SYCP3 staining, then a second round—H3K9me3 or 2) the first round of H3K9me3, then a second round of SYCP3 staining.

SCs measurements in 41 spermatocytes were performed using the MicroMeasure program (Colorado State University, CO, USA).

#### **3. Results**

#### *3.1. Mitotic Chromosomes and SC Karyotype of E. alaicus*

All animals demonstrated normal for *E. alaicus* karyotypes; 2n = 52. the fundamental number of chromosome arms (NF) was 56 (Figure S1a–f), consisting of one pair of submetacentrics (№7), characteristic of *E. tancrei* and *E. alaicus*, and one pair of large Robertsonian metacentrics 2(Rb2.11), typical of *E. alaicus* [50]. Mitotic metaphases from bone marrow suspensions showed no visible alterations (Figure S1b). Mitotic metaphases from fibroblast cultures demonstrated a stable karyotype in one male (Figure S1a), and a small number of deviations in one female, i.e., associations and polyploid cells. In chondrocyte cultures of the same specimens, more associations and polyploid cells were identified in female cells (Figure S1d–f), alongside with a large number of micronuclei, disturbed anaphases and massive chromosomal changes (Figures S1g and S2).

A total of 302 spermatocytes at different prophase I stages in three males were analyzed. As expected, in the pachytene stage we revealed 25 fully synapsed autosomal bivalents (large Robertsonian metacentric Rb(2.11)), one mid-size submetacentric №7 with neocentromere [49,52]; 23 acrocentrics and an XX sex bivalent (Figure 1), formed by two large acrocentrics. Analyzing large cell numbers made it possible to distinguish a range of gradually decreasing in size acrocentrics.

**Figure 1.** Pachytene spermatocyte (**a**–**c**) and synaptonemal complex (SC) karyotype (**d**) of the Alay mole vole, *E. alaicus*. SCs were immunostained with antibodies against synaptonemal complex protein 3 (SYCP3) (green) and centromeres—with antibodies to kinetochores (Calcinosis Raynaud's phenomenon, Esophageal dysmotility, Sclerodactyly, and Telangiectasia (CREST), red). The 4 ,6-diamidino-2-phenylindole (DAPI)-stained the chromatin (blue). SC numbers corresponded to the metaphase karyotype. Rb(2.11)—Robertsonian submetacentric. Chromosome №7 is non-Robertsonian submetacentric. XX—male sex chromosomes. Bar (**a**–**c**) = 5 μm.

#### *3.2. Prophase I Stages in E. alaicus*

All prophase I stages were demonstrated for *E. alaicus*. Thin short SYCP3 fragments and SYCP3 conglomerates were formed in the leptotene stage (Figure 2a). In the early zygotene stage, long axial elements were visible, and SYCP3 blocks were kept (Figure 2b). As the zygotene progressed, the axial elements of homologous chromosomes began to contact each other (Figure S3), and synapsed more frequently from telomere areas to the central part (Figure 2c), less often in the opposite way (Figure 2d). In the pachytene stage, chromosomes were completely synapsed (Figure 2f). In the diplotene stage, chromosome desynapsis might be different within a single cell, from telomere sites to the center, or vice versa (Figure 2g). Then SYCP3 degraded (Figure 2h), and it was preserved as dots in the diakinesis stage (Figure 2i).

Sex (XX) chromosomes from early to mid zygotene were detected as separate axial elements (Figure S3). From the mid-late zygote, the XX chromosomes became well visible (Figure 2c). The sex bivalent is similar to those of the cryptic species, *E. talpinus* [59] and *E. tancrei* [60,61]. In the middle pachytene, the XX-bivalent was usually shifted to the periphery of the meiotic nucleus and had two telomeric synaptic regions, a wide asynaptic region, and chromatin bodies on axial elements (Figure 2f). It should be noted that sometimes chromatin (nucleolus-like) bodies in sex bivalents were SYCP3-positive (Figure 1, Figure 2e,f), which has not been previously observed for the other two cryptic species and *E. alaicus* (2n = 48) from the Pamir–Alay [50].

**Figure 2.** Prophase I stages in *E. alaicus* males. Axial/lateral elements were identified using anti-SYCP3 antibodies (green) and kinetochores (red) using CREST. During the leptotene stage, numerous thin SYCP3 fragments and SYCP3 conglomerates were formed (**a**), formed axial elements (**b**) were visible in the zygotene. Chromosome synapsis began with either distal (telomeric) (**c**) or central chromosome parts (**d**). At the pachytene stage, 25 autosomal SCs and sex (XX) bivalents were formed (**e**,**f**). The XX bivalent demonstrated synapsis at telomeric areas and asynapsis at the central region (**e**,**f**). In the mid-late diplotene, desynapsis progressed from both the central and telomeric areas (**g**). SYCP3 was degraded (**h**) and retained as dots (**i**). Rb—Robertsonian submetacentric. Chromosome 7 is non-Robertsonian submetacentric. Bar (**a**–**i**) = 5 μm.

#### *3.3. Types of Non-Homologous Acrocentric Connections*

Various non-homologous chromosome contacts were identified in meiotic prophase I, during zygotene-early pachytene stages. We revealed at least four sequential contact and link types of chromosomes (Figure 3, Figure 4, Figure 5, Figure 6, Figure 7).

#### 3.3.1. Proximity

Acrocentrics occupied spatial positions, intending to get their centromeres closer (pink squares and pink numbers of chromosomes in Figure 3, Figure 7, Figures S4 and S5). The distance between the chromosomes was about 1–2 microns. Centromere regions of chromosomes were located around the H3K9me3-domain (Figure 6 and Figures S5–S7). This type precedes true meiotic contacts (see the following types). This 'proximity' type was somewhat subjective, because it can easily be confused with closely located chromosomes.

#### 3.3.2. Touching

Acrocentrics moved closer to each other. In each contact acrocentric, one axial element of the short arm was extended, reaching out to one another as if touching each other with their ends (blue squares and blue numbers of chromosomes in Figure 3, Figure 5, Figure 7, Figures S4 and S5). Usually, the distance between the chromosomes is less than 1 micron.

#### 3.3.3. Anchoring/Tethering

One of the axial elements in the short arms of non-homologous acrocentrics were linked to each other by SYCP3-filament (yellow squares and yellow numbers of chromosomes in Figure 3, Figure 4, Figure 5, Figure 7, Figures S4 and S5). Other axial elements of the short arms of the two non-homologous partners were not connected (see blue arrowheads in Figures 4 and 5c).

#### 3.3.4. Fusion

The other two axial elements in the short arm of the non-homologous acrocentrics were tightly adjacent to each other, or possibly connected entirely. The two acrocentrics likely represented a single bivalent with two centromeres (Figure 4e,f, Figure 5c). Centromeric regions were closer to each other compared to other connection types. Such dicentric bivalents were observed in some low-chromosomal forms in *E. tancrei* (Figure 7 and Figure S8).

Such linkage was evidenced by immunostaining; two thin SYCP3-positive filaments (Figure 4e,f). The difference between 'anchoring/tethering' (see filaments between chromosomes 5 and 6 in Figure S4e) and 'fusion' (Figure 4e,f; see chromosomes 14 and 18 in Figure S4e) can be determined by the presence of a complete bridge, with a short distance between two contacting non-homologous chromosomes. In all cases, the contact was made by the short arms of acrocentric chromosomes by SYCP3-filaments (see enlarged fragments of cells; Figure 4).

51% of pachytene cells had meiotic contacts including 'touching', 'anchoring/tethering' and 'fusion' types and excluding 'proximity' (Figure 5a). The patterns of proportion of different connection types were similar in three males: the number of 'touching' was greater than the number of 'anchoring/tethering', the number of 'fusion' was less than two other types (Figure 5b). The 'fusion' type was not found in one male, №27353 (Figure 5b). The rarity of 'fusion' type may be due to the fact that more molecular events should precede the joining/fusion of the axial elements of the short arms of two non-homologous acrocentrics.

Chromosome combinations in pachytene spermatocytes of *E. alaicus* were numerous. We were unable to determine the trend in the frequencies of contact between certain chromosomes. However, chromosomes 1,4,5,20–22 came into contact more often. For example, chromosome №1, which was the most regularly seen in all three types of true interactions ('touching', 'anchoring/tethering', and 'fusion'). Rb(1.3) was described in *E. alaicus* with 2n = 50 [50].

#### *3.4. Histone H3K9me3 in Prophase I and Meiotic Chromosomes Contacts*

We investigated H3K9me3 (trimethylation of H3 lysine 9) immunolocalization and distribution in contacting chromosomes. H3K9me3 is an epigenetic marker for heterochromatin allocation in prophase I [62].

During the zygotene stage, large clouds of H3K9me3 were localized at pericentromeric regions of axial elements, or at fully formed SCs (Figure S6a–c). In the pachytene stage, clouds of H3K9me3 were reduced in size and were clearly localized to SC pericentromeric regions (Figure 6a–c, Figure S6d–i). Rb-metacentrics had minimal H3K9me3 levels in the centromeric region (Figure 6a–c, Figure S6d–i), or were absent. Non-Robertsonian submetacentric №7 did not demonstrate clear H3K9me3 signals in all studied cells (Figures S6e–h and S7a–c). H3K9me3 usually shrouded one of the axial elements, and a chromatin body inside the sex bivalent (XX) (Figure 6a–d); less often H3K9me3 covered it entirely (Figure S5g,h), whilst γH2AFX totally enclosed XX (Figures 6c and 3b,e). A more detailed description of XX sex chromosomes in *E. alaicus* will be discussed in future work.

**Figure 3.** Different chromosome combinations in pachytene spermatocytes of *E. alaicus* (**a**–**f**). Axial elements were identified using anti-SYCP3 antibodies (green), kinetochores using CREST antibodies (red), and anti-γH2AFX (violet) was used as a marker of chromatin inactivation. Pink, blue, and yellow squares correspond to different types of non-homologous acrocentric connections (**f**). Bar (**a**–**e**) = 5 μm.

**Figure 4.** Anchoring/tethering and fusion of non-homologous acrocentric chromosomes in *E. alaicus* spermatocytes (**a**–**l**). Lateral elements were identified using anti-SYCP3 antibodies (green), and CREST antibodies for kinetochores (red). A1 and A2—contacting non-homologous acrocentrics. Blue arrowheads show the free ends of lateral elements of non-homologous acrocentrics.

In non-homologous acrocentric contacts, H3K9me3 was involved in varying volumes, usually in centromeric regions. We detected a large H3K9me3 cloud between chromosomes at the 'proximity' type (pinks points in Figure 6a,c,e–g,i). For closer chromosome contacts, i.e., 'touching' and 'anchoring/tethering', H3K9me3 distribution ranged from average to insignificant (small) (yellow points in Figure 6a,d,e,g,i). If contacts involved shorter and longer acrocentrics, H3K9me3 signals were located closer to centromeric regions of shorter acrocentrics (Figure 6g,i). For other cases, H3K9me3 signals were clearly identified between the two centromeres of contacting chromosomes (Figure 6d,h).

#### *3.5. Mitotic Cells*

In contrast to meiotic cells demonstrating 'touching', 'anchoring', and 'fusion' of different acrocentrics during meiotic prophase I (Figure 5), in mitotic metaphases, we detected a single association in most cells, which were similar to the 'fusion' type. In total, we checked 693 cells: 532 cells had normal karyotype, 32 cells demonstrated chromosomal associations (5.6%), and 126 were aberrant ones (micronuclei, polyploid cells, disturbed anaphases, chromothripsis, etc.). C-banding exposed small blocks of pericentromeric heterochromatin and several intercalary blocks (for two pairs of acrocentrics) (Figure S1c). Small blocks of C-heterochromatin were visible in associated chromosomes, but such blocks were not always merged.

**Figure 5.** Quantitative characteristics of meiotic contacts in *E. alaicus*. (**a**). The diagram shows the percentage of pachytene cells with or without meiotic connections in three mole vole males (№27352, 27353, 27357). 'Touching', 'anchoring/tethering' and 'fusion' types were counted in cells with connections only. (**b**). Proportions of connection types. Blue color corresponds to 'touching', yellow—to 'anchoring/tethering' and gray—to ''fusion". Red numbers are counted pachytene spermatocytes (**a**,**b**). (**c**). Types of contacts that are included in the diagrams. The color of the frames corresponds to the types of contacts (**b**).

**Figure 6.** Location of SYCP3 (green), H3K9me3 (red), CREST (white) and gamma-H2AFX (violet) in the pachytene stage of prophase I of *E. alaicus* spermatocytes (**a**–**i**). Size bar = 5 μm (**a**–**c**). (**d**) an enlarged fragment of (**a**), (**e**–**i**) enlarged fragments of distinct cells. Designations A1–A6 do not correspond to numbers of chromosomes as in Figure 1. See the explanation in the text.

**Figure 7.** Different types of chromosome connections in meiotic prophase I. The scheme is based on observations of chromosome behaviors during *E. alaicus* meiosis. Green corresponds to SC axial/lateral elements, red dots refer to centromeres. Non-homologous acrocentrics (A1 and A2) occupy positions in the nucleus, intending to get closer (proximity, position 1). One of two lateral elements in the short arm A1 and A2 are extended (position 2), touching each other (position 3) (touching) and eventually bind together (anchoring/tethering, position 4). The other two axial elements are tightened to each other. (Position 5) and probably joined together (position 6) (fusion). After the probable binding of the two axial elements, the centromeres are located somewhat closer to each other (compare positions 5 and 6). Such dicentric bivalents were found in all oocytes of low-chromosomal forms of *E. tancrei,* 2n = 34 (Figure S8). In such dicentric chromosomes, one centromere may be inactivated, with a new chromosome emerging (position 7).

#### **4. Discussion**

In this work, we describe the variety of chromosomal interactions in male meiosis of *E. alaicus*. Immunocytochemical staining of spermatocytes demonstrated at least four contact types for non-homologous acrocentrics in meiotic prophase I. Starting from clustering inside the heterochromatic cloud, chromosomes demonstrate touching by the elongated axial element of the short arms, then tether by SYCP3 filaments and complete the process when tightly adjacent to each other. As a result, two acrocentrics likely represents a single bivalent with two centromeric regions.

Recently, the first report on connections between non-homologous chromosomes and SC structures was demonstrated by investigating meiosis in CD-1 male mice after irradiation [43]. The authors of this work stressed that the formation of chromosome bridges between non-homologous chromosomes differed from normal endogenous DSB interactions, which did not require connections between axial elements of homologous chromosomes. The wild ancestors of CD-1 mice were bred in a Swiss laboratory in the 1920s. Genomic studies revealed that CD-1 was mostly derived from *M. domesticus* [63], a species with enormous chromosome variability [64]. In this species, specific major and minor satDNA tandem repeats, which are oriented head-to-tail at centromeres [65], may facilitate fusion of mono-armed chromosomes, and build up bi-armed ones. Repeat polarity occurs in telocentric and Rb chromosomes, therefore it was assumed that tandem repetitive satDNA in *M. domesticus* may have been a universal pattern of chromosomal evolution. Numerous Rbs, characteristic for the species, and presumed

random associations of bivalents in the wild type, *M. domesticus* (2n = 40) spermatocytes, exemplified a wide spectrum of fusions due to the universal structure of pericentromeric satDNA in this species.

We did not analyze pericentromeric heterochromatin in *Ellobius*, although the numerous associations (at 'proximity' type) were similar to *M. domesticus* [43]. Pericentromeric heterochromatin evidently participates in chromosome contacts and fusions. A prelude at the zygotene (Figure S3) may be assigned as an example. Formation of Rbs may depend on telomere changes, especially shortening and inactivation [38]. Recently, the study of wild house mice demonstrated that telomere shortening and the number of critically short telomeres are likely to result in Rb formation [66]. Earlier, we did not detect the telomeric sequences in centromeric regions of numerous Rbs, including Rb(2.11), in *E. tancrei* and *E. talpinus* hybrids [49]. The lack of signal may be explained by elimination as well as inactivation of telomere fragments. The study should be continued in *E. alaicus*.

Contrary to mouse experiments [43], where chromosome contacts were discovered after gamma radiation, *E. alaicus* came from intact natural habitats, close to terra typica of the species. Numerous and various chromosomal contacts, the enormous number of cells with such chromosome 'dance' in all studied males demonstrated an internal background for evolutionary changes. In natural *E. alaicus* populations, we detected [50] four different Rb translocations Rb(2.11), Rb(1.3), Rb(4.9), Rb(3.10) in distinct combinations, with 2n from 52 to 48. A small number of chromosomal associations in mitotic metaphases, which we demonstrated now for animals with a single pair of Rb(2.11), indirectly confirmed the leading role of changes in the meiotic prophase I. We revealed differences in mitotic cell divisions for fibroblast and chondrocyte cultures. Cell culture often demonstrates unstable karyotype structures and distinct abnormality types and frequencies [67,68]. This assumption was our rationale for comparing data on chromosome numbers and behaviors using bone marrow slides, and two different cell culture approaches. Fibroblasts mostly retained 2n = 52, and numerous polyploid cells. Chondrocyte cultures appeared to be more fragile and demonstrated distinct changes as micronuclei, disturbed anaphases, massive chromosome changes, and associations (Figures S1 and S2). The evolutionary significance of such mutations is incredible if such events appear in the germline [32,69]. The picture of chromosomal re-assemblage (Figure S1) may be an example of chromothripsis, which combines chromosome shattering and fusion [70].

In mitotic cells, we observed two centromeres and two blocks of heterochromatin in the association of two small acrocentrics (Figure S1c,e). Previously, in the pachytene, we observed a small metacentric with two centromeres in all studied animals from natural *E. tancrei* populations, 2n = 34 (as in Figure S8). *E. tancrei* demonstrates a wide spectrum of homologous and non-homologous Robertsonian metacentrics in natural populations [51]. We suggest that dicentric chromosome formation in natural populations of this species, confirms the 'contact first in meiosis' model.

Although we had no meiotic data to prove double-stranded DNA breaks and chromosome reassembling, we demonstrated that the 'contact-first' scheme may be applicable to meiotic translocation mechanisms. Another question—the fate of small arms of acrocentric chromosomes, which are apparently kept in fused chromosomes (Figure 4e,f and Figure 5c)—is closely related to the role and evolution of the centromere. As previously mentioned, we supposed the neocentromere origin was a crucial step for *E. tancrei* and *E. alaicus* genome evolution. In cases of 'fused' chromosomes of *E. alaicus,* we revealed two distinct centromere signals using CREST immunostaining. The question of the further fate of two centromeres and chromosome fragments between them is now open, and the study should be continued.

Considering that mitosis and meiosis data are consistent, we can argue for the essential role of heterochromatin. In mouse spermatocytes, heterochromatin preserves the association of homologous centromeres and promotes faithful chromosome segregation in meiosis I [71]. Heterochromatin, in maintaining genome stability [72], may be a focal point for changes. We demonstrated that a 'proximity' to fuse correlated with the heterochromatin affinity of non-homologous chromosomes. Chromosomes were located around the H3K9me3-heterochromatin cloud (Figure 6f,i), similar to telocentric associations in mice [44]. Clustering of heterochromatic regions inside the H3K9me3 clouds

was demonstrated for different sets of acrocentrics in *Mus musculus domesticus* either for the wild type 2n = 40 or for different forms with Rbs [44,73,74]. This clustering could be the background of the translocations. However, no data on contacting chromosomes were published, except the gamma radiation treatment caused the formation of bridges between chromosomes [43]. Earlier, we described the participation of heterochromatin of short chromosome arms in the formation of SC chains in mole voles heterozygous for multiple Rbs [75].

High natural variability along with the rapid fixation of new translocations and specific contacts of non-homologous chromosomes in meiosis, which we demonstrated for *E. alaicus*, built a promising background to change the model for the mechanism of translocations from the 'contact first' to the 'contact first in meiosis'.

The evolutionary significance of meiosis as a tool to increase recombinational diversity is balanced by its function as a mechanism for purifying selection [76]. Meiotic checkpoints block cell cycle progression in response to defects and preclude abnormal chromosome segregation. Nevertheless, creative meiosis became apparent if we apply the 'contact-first' model for meiotic prophase I, as we demonstrated for a unique rodent species, which entered the stage of Robertsonian translocations emergence.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/4/386/s1, Figure S1: Mitotic chromosomes of *Ellobius alaicus*, Figure S2: Aberrations in mitosis of *Ellobius alaicus*, Figure S3: Zygotene spermatocyte of *E.alaicus*, Figure S4: Different chromosome combinations in pachytene spermatocytes of *E.alaicus*, Figure S5: Different chromosome combinations in pachytene spermatocytes of *E.alaicus*, Figure S6: Location of SYCP3, H3K9me3, CREST and chromatin configuration (DAPI) in zygotene—pachytene stages of prophase I of *E. alaicus* spermatocytes, Figure S7: Location of SYCP3, H3K9me3, CREST, and chromatin configuration (DAPI) in pachytene—diakinesis stages of the prophase I of *E. alaicus* spermatocytes, Figure S8: A pachytene oocyte of *E. tancrei,* 2n = 34, NF = 56.

**Author Contributions:** Conceptualization, S.M., O.K., I.B.; methodology, O.K., I.B., S.M.; investigation, S.M., I.B., E.A. and A.B.; writing, I.B., S.M., E.A., A.B., and O.K.; visualization, S.M., and I.B.; funding acquisition, S.M., I.B.; supervision, O.K., and I.B. All the authors read and approved the final text of the submitted manuscript and the supporting information. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially supported by the research grants of the Russian Foundation for Basic Research Nos. 20-34-70027, 20-04-00618; VIGG RAS State Assignment Contract (S.M, O.K), and IDB RAS State Assignment for Basic Research (A.B., E.A., I.B.).

**Acknowledgments:** We thank the Genetic Polymorphisms Core Facility of the Vavilov Institute of General Genetics of the Russian Academy of Sciences, Moscow, for the possibility to use their microscopes. We are grateful to Core Centrum of the Koltzov Institute of Developmental Biology RAS for the possibility of using samples from the Cell culture collection and the Joint collection of wildlife tissues for fundamental, applied, and environmental researches.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Structural Variation of the X Chromosome Heterochromatin in the** *Anopheles gambiae* **Complex**

**Atashi Sharma 1, Nicholas A. Kinney 2, Vladimir A. Timoshevskiy 1, Maria V. Sharakhova 1,3,4 and Igor V. Sharakhov 1,2,3,5,\***


Received: 19 February 2020; Accepted: 16 March 2020; Published: 19 March 2020

**Abstract:** Heterochromatin is identified as a potential factor driving diversification of species. To understand the magnitude of heterochromatin variation within the *Anopheles gambiae* complex of malaria mosquitoes, we analyzed metaphase chromosomes in *An. arabiensis*, *An. coluzzii*, *An. gambiae*, *An. merus*, and *An. quadriannulatus*. Using fluorescence *in situ* hybridization (FISH) with ribosomal DNA (rDNA), a highly repetitive fraction of DNA, and heterochromatic Bacterial Artificial Chromosome (BAC) clones, we established the correspondence of pericentric heterochromatin between the metaphase and polytene X chromosomes of *An. gambiae*. We then developed chromosome idiograms and demonstrated that the X chromosomes exhibit qualitative differences in their pattern of heterochromatic bands and position of satellite DNA (satDNA) repeats among the sibling species with postzygotic isolation, *An. arabiensis*, *An. merus*, *An. quadriannulatus*, and *An. coluzzii* or *An. gambiae*. The identified differences in the size and structure of the X chromosome heterochromatin point to a possible role of repetitive DNA in speciation of mosquitoes. We found that *An. coluzzii* and *An. gambiae*, incipient species with prezygotic isolation, share variations in the relative positions of the satDNA repeats and the proximal heterochromatin band on the X chromosomes. This previously unknown genetic polymorphism in malaria mosquitoes may be caused by a differential amplification of DNA repeats or an inversion in the sex chromosome heterochromatin.

**Keywords:** *Anopheles*; heterochromatin; mosquito; mitotic chromosome; sex chromosome; satellite DNA; X chromosome

#### **1. Introduction**

The major malaria vector in Africa *Anopheles gambiae* has been a subject of extensive research over the past few decades. Initially described as a single species, *An. gambiae* was later subdivided into a complex of morphologically indistinguishable species by crossing experiments [1,2], fixed differences in polytene chromosome arrangement and distinct adaptations [3,4], differences in Intergenic Sequence (IGS) and Internal Transcribed Spacer (ITS) regions in the ribosomal DNA (rDNA) [5,6], and by whole-genome divergence [7–9]. Members of the *An. gambiae* complex differ in genetic diversity and they include major vectors, minor vector, and nonvectors [3,4,10]. The current list of the species includes *An. arabiensis, An. amharicus, An. bwambae, An. coluzzii, An. fontenillei, An. gambiae, An. merus, An. melas*, and *An. quadriannulatus* [4,7,9].

The majority of interspecies crosses in the *An. gambiae* complex produce fertile females and sterile males [2,11–14] in agreement with Haldane's rule [15]. However, the most closely related species *An. coluzzii* and *An. gambiae* do not have postzygotic reproductive barriers as in laboratory conditions they readily mate and lay viable eggs that produce fertile hybrid males and females [16,17]. If the *An. gambiae* complex originated as recently as 526 thousand years ago, the *An. coluzzii* and *An. gambiae* lineages split from the common ancestor only ~61 thousand years ago [18]. The recently diverged species *An. gambiae sensu stricto* and *An. coluzzii* have been initially considered as the S and M molecular forms of *An. gambiae* [5,6] based on specific single nucleotide polymorphism (SNP) differences of ITS2 sequences [19,20]. Later studies demonstrated that the two forms also differ in genome sequence [8], gene expression [21], larval ecology [22], larval behavior in the presence of predators [23,24], adult swarming behavior [25], and adult mate recognition [17,26]. Thus, these data show evidence for ecological, behavioral, and genome-wide differentiation between *An. coluzzii* and *An. gambiae*.

Cytogenetic analysis of the polytene chromosomes banding patterns, including fixed chromosomal inversions, is an established tool for distinguishing species of malaria mosquitoes [3,4,27–29]. In addition, cytogenetic analysis of mitotic chromosomes demonstrated interspecific differences in sex chromosome heterochromatin between *An. gambiae* and *An. arabiensis* (named *An. gambiae* species a and B at that time) [30]. Staining with the Hoechst fluorescent dye has shown that the presence and brightness of X-chromosome heterochromatic bands differ between the two species [31]. More recently, heterochromatin variation of mitotic Y chromosomes among species of the *An. gambiae* complex has been clearly demonstrated [32]. Intraspecific polymorphism in the X and Y chromosome heterochromatin has also been observed in both natural and laboratory populations of *An. gambiae* [30,31], but is unclear if the previously observed variations can be related to possible differences between the more recently described incipient species *An. coluzzii* and *An. gambiae s.s.* [7]. This information can be useful because vector control to be successful must consider full spectrum of genetic and phenotypic variation within vector species. Further, it has been suggested that polymorphism in sex chromosome heterochromatin may affect fertility and sexual behavior of mosquitoes [33], but the lack of understanding of heterochromatin structure and function prevent researchers from mechanistic understanding of the phenotypic effects.

Sequencing of the *An. gambiae* genome provided important information regarding its organization [34]. It has been demonstrated that repetitive DNA component represent a substantial portion in of the *An. gambiae* genome (33%) that is higher than in *Drosophila melanogaster* genome where repeats make up approximately 24% of the genome [35,36]. The majority of repetitive DNA in the *An. gambiae* genome is tightly packed in heterochromatin around the centromeres [37]. Difficulty with sequencing of heterochromatin led to underrepresentation of heterochromatic sequences in the *An. gambiae* genome assembly [34]. Moreover, the repetitive nature of these sequences poses an impediment in mapping them correctly to chromosomes. Subsequent attempts to map heterochromatic genomic scaffolds led to an addition of ~16 Mb of heterochromatin to the genome of the *An. gambiae* PEST strain [38]. Further progress was made by predicting functions of 232 heterochromatin genes [39] and by mapping genes on the heterochromatin-euchromatin boundary of the polytene chromosome map [40]. Bioinformatics analysis of so called "unknown chromosome" or ~42 Mb of unmapped sequences in *An. gambiae* PEST genome demonstrated that it has characteristics of heterochromatin [39]. Although both *An. gambiae* and *An. coluzzii* genome assemblies are now available [39,41], corresponding information of their entire heterochromatin on a chromosome map is still missing. Likewise, sequencing of five other species from the *An. gambiae* complex also excluded the majority of heterochromatic sequences from the genome assembly [9,42].

Comparison of *An. gambiae* (the former S form) and *An. coluzzii* (the former M form) genomes identified pericentromeric autosomal and X-chromosome regions of high differentiation, termed "speciation islands," or "islands of genomic divergence" [43,44]. These regions largely correspond to pericentromeric heterochromatin. Overlaps between the heterochromatin and the

islands of genomic divergence are 91% in the X chromosome, 97% in the 2L arm, and 94% in the 3L arm [39]. Other studies emphasized that the highest genomic divergence between these nascent species occurred within the 4 Mb of mapped heterochromatin on the X chromosome [8,45,46]. A recent population genomic analysis of 765 field-collected mosquitoes across Africa has demonstrated that sequences in the pericentromeric X heterochromatin show the greatest separation between *An. coluzzii* and *An. gambiae* populations on a neighbor-joining tree [47]. The study also analyzed the *An. coluzzii* and *An. gambiae* genomes for CRISPR/Cas9 target sites; it found that 5474 genes had at least one viable target after excluding target sites with nucleotide variation. Interestingly, targetable genes are spread non-uniformly across the genome, falling predominantly in pericentromeric heterochromatic regions, where levels of variation are lower [47]. However, the full spectrum of heterochromatin variation and molecular composition of the X-chromosome heterochromatin in malaria mosquitoes remains unknown. Because repetitive DNA is underreplicated [48], polytene chromosomes have limited applications in studies of heterochromatin. In comparison, mitotic chromosomes are a promising system for evolutionary studies of heterochromatin in malaria mosquitoes.

In this study, we examined similarities and differences in heterochromatin patterns within X mitotic chromosomes among the major malaria vectors *An. gambiae*, *An. coluzzii, An. arabiensis,* minor vector *An. merus,* and zoophilic non-vector *An. quadriannulatus*. The correspondence of pericentric heterochromatin between the metaphase and polytene X chromosome was established for *An. gambiae*. We report quantitative differences in molecular organization of heterochromatin among members of the *An. gambiae* complex and highlight shared polymorphism of the heterochromatin structure among strains of the incipient species *An. gambiae* and *An. coluzzii.* The identified variation in heterochromatin is discussed with respect to the phylogenetic relationships among the species and a possible role of heterochromatin in speciation.

#### **2. Materials and Methods**

#### *2.1. Mosquito Strains and Colony Maintenance*

Laboratory colonies examined for this study were provided by the Biodefense and Emerging Infections Research Resources Repository and included colonies of *An. gambiae* PIMPERENA (MRA-861), *An. gambiae* KISUMU1 (MRA-762), *An. gambiae* ZANU (MRA-594), *An. coluzzii* SUA2La (MRA-765), *An. coluzzii* MOPTI (MRA-763), *An. coluzzii* MALI-NIH (MRA-860), *An. arabiensis* DONGOLA 2Ra2Rb3R (MRA-1235), *An. quadriannulatus* SANGWE (MRA-1155), and *An. merus* MAF (MRA-1156). The species and strains used in this study are presented in Table S1. All mosquitoes were reared at 27 ◦C, with 12:12 light:dark cycle and 70% relative humidity. Authentication of the species was done using PCR diagnostics [49,50]. Larvae were fed fish food and adult mosquitoes were fed 1% sugar water. To induce oviposition, females were fed defibrinated sheep blood (Colorado Serum Co., Denver, Colorado, USA) using artificial blood feeders. To perform interspecies crosses between *An. gambiae* and *An. coluzzii*, male and female pupae were separated to ensure virginity of adult mosquitoes. We differentiated males and females at the pupal stage using sex-specific differences in the shape of their terminalia [51]. After the emergence of adults, crossing experiments were performed by combining 30 females and 15 males in a single cage. Five days after random mating, the females were fed sheep blood. Two days later, an egg dish, covered with moist filter paper to keep the eggs from drying out, was put into the cage to obtain F1 hybrid larvae for cytogenetic analyses. Any possible colony contamination during the experiment was ruled out by verifying all the strains for their respective species using intentional mismatch primers (IMP) as described elsewhere [19]. All the strains showed the expected band size, with *An. gambiae* strains around 330 bp and *An. coluzzii* strains around 460 bp.

#### *2.2. Chromosome Preparation*

Preparations from early 4th instar larvae of lab colonies were made from leg and wing imaginal discs, as previously described [52]. Both male and female larvae were immobilized on ice for 10 min, and then dissected in a drop of cold, freshly prepared, hypotonic solution (0.075 M potassium chloride). After decapitating the head, the thorax was opened using dissecting scissors (Fine Science Tools, Foster City, CA, USA), followed by removal of the gut and fat from the body. a fresh drop of hypotonic solution was added to the preparation for 10 min, followed by fixation in a drop of modified Carnoy's solution (ethanol:glacial acetic acid, 3:1) for 1 min. Next, a drop of freshly prepared 50% propionic acid was added, and the imaginal discs were covered with a 22 × 22 mm coverslip. After 5 min, the preparations were squashed using the flat rubber end of a pencil and dipped in liquid nitrogen until the bubbling stopped. Coverslips were removed using a sharp blade and slides were transferred to cold 50% ethanol and stored at −20 ◦C. After 2 h, slides were dehydrated in a series of 70%, 80%, and 100% ethanol. Preparations with the highest number of metaphase plates were chosen for *in situ* hybridization.

#### *2.3. C0t DNA Preparation and DNA Probe Labeling*

To identify the position of the centromere in chromosome X, we prepared the repetitive DNA fraction using a previously described method [52]. Genomic DNA was isolated from 500 g of non-bloodfed adult *An. coluzzii* MOPTI mosquitoes using a Qiagen Blood and Cell culture DNA Maxi kit (Qiagen, Hilden, Germany). Isolated C0t1 repetitive fraction was precipitated with isopropanol and labelled by nick-translation in 50 μl, containing 1 μg DNA, 0.05 mM each of unlabeled dATP, dCTP, and dGTP, and 0.015 mM of dTTP, 1 μl of fluorescein-dUTP, 0.05 mg/ml of BSA, 5 μl of 10× nick-translation buffer, 20 U of DNA-polymerase I, and 0.0012 U of DNase, at 15 ◦C for 2.5 h. Primers for 18S rDNA and satDNA repeats from *An. gambiae* AgY53A, AgY477-AgY53B, AgY477, and AgY53C were obtained, as previously described [14,53] (Table S2). An Immomix PCR kit (Bioline, Inc., Taunton, MA, USA) was used to label satellites by incorporating Cy3 and Cy5 fluorescently labeled nucleotides (Enzo Life Sciences, Inc., Farmingdale, NY, USA) directly into the PCR reaction. Each 25 μl PCR mix consisted of 35–40 ng genomic DNA, 0.3 U Taq polymerase, 1× PCR buffer, 200 μM each of dATP, dCTP, and dGTP, and 65 μM dTTP, and 0.5 μl Cy3-dUTP or 0.5 μl Cy5-dUTP (Enzo Life Sciences, Inc., Farmingdale, NY, USA). Thermocycling was performed using ImmoMix TM (Bioline Inc., Taunton, MA, USA) beginning with a 95 ◦C incubation for 10 min followed by 35 cycles of 95 ◦C for 30 sec, 52 ◦C for 30 sec, 72 ◦C for 45 sec, 72 ◦C for 5 min, and a final hold at 4 ◦C. Bacterial Artificial Chromosome (BAC) clones 05F01 (GenBank accession: AL142298, AL142299), 179F22 (GenBank accession: BH373300, BH373306), and 01K23 (GenBank accession: AL607293, AL607294) [34] were labelled by nick-translation in 50 μl, containing 1 μg DNA, 0.05 mM each of unlabeled dATP, dCTP, and dGTP and 0.015 mM of dTTP, 1 μl of Cy3-dUTP (or another fluorochrome), 0.05 mg/ml of BSA, 5 μl of 10× nick-translation buffer, 20 U of DNA-polymerase I, and 0.0012 U of DNase at 15 ◦C for 2.5 h.

#### *2.4. Fluorescence in situ Hybridization (FISH)*

Suitable slides with >10 metaphase plates were selected for FISH, which was performed as previously described [52,54]. Briefly, slides with good preparations were treated with 0.1 mg/ml RNase at 37 ◦C for 30 min. After washing twice with 2× saline-sodium citrate (SSC) for 5 min, slides were digested with 0.01% pepsin and 0.037% HCl solution for 5 min at 37 ◦C. After washing slides twice in 1× phosphate-buffered saline (PBS) for 5 min at room temperature, preparations were fixed in 3.7% formaldehyde for 10 min at RT. Slides were then washed in 1× PBS and dehydrated in a series of 70%, 80%, and 100% ethanol for 5 min at RT. Then, 10 μl of probes were mixed, added to the preparations, and incubated overnight at 37 ◦C. After washing slides in 1× SSC at 60 ◦C for 5 min, 4× SSC/NP40 solution at 37 ◦C for 10 min, and 1× PBS for 5 min at room temperature (RT), preparations were counterstained with a DAPI-antifade solution (Life Technologies, Carlsbad, CA, USA) and kept in the dark for at least 2 h before visualization with a fluorescence microscope.

#### *2.5. Image Acquisition and Chromosome Measurements*

Chromosome slide preparations were viewed with an Olympus BX61 fluorescence microscope (Olympus, Tokyo, Japan) using BioView software (BioView Inc., Billerica, MA, USA) at 1000× magnification. For idiogram development, metaphase plates were measured and analyzed from 5 larvae per strain. a total of 40 metaphase plates were used to measure chromosome lengths in *An. coluzzii*. For sibling species *An. arabiensis*, *An. quadriannulatus*, and *An. merus*, 11–13 metaphase plates were used to measure chromosome lengths (Table S3). Chromosome images were inverted using Adobe Photoshop CS6 (Adobe Inc., San Jose, CA, USA) and measured using the ruler tool at 1000× magnification. a statistical Tukey's test or a nonparametric Kruskal–Wallis rank sum test followed by a Dunn's test were performed to compare chromosome length between the species using JMP13 software (SAS Institute Inc., Cary, NC, USA) or R software (RStudio, Boston, MA, USA). Fluorescence intensities of proximal and distal X chromosome heterochromatin bands were measured using Adobe Photoshop CS6 (Adobe Inc., San Jose, CA, USA) and compared between three strains of *An. gambiae* and three strains of *An. coluzzii*. About forty measurements for each band were taken from forty sister chromatids of twenty X chromosomes for each strain. Pairwise comparisons between strains were performed using Student's t-test.

#### *2.6. Quantative Analysis of Fluorescent Signal Positions*

Chromosomal positions of satDNA AgY477–AgY53B and AgY477, and the proximal DAPI band were identified and compared in three strains of *An. gambiae* and three strains of *An. coluzzii*. a custom MATLAB script [55] was written and used to measure the position of fluorescence peaks in individual channels [56]. The program was automated to measure the position of each satDNA and DAPI peak along the length of the X chromosome. The output provided the maximum likelihood of a combination of probe positions. The results were compared within and between species. The MATLAB program consisted of two user-guided steps followed by a third step of automated analysis. In the first step, sex chromosomes were identified by the user on mitotic slide preparations from *An. gambiae* and *An. coluzzii* strains. In the second step, the boundary of each identified sex chromosome was traced by a trained user. In an automated step, the pattern of heterochromatin and satDNA fluorescence was averaged longitudinally along the user-defined boundary. Program output displayed fluorescence intensity graphically from chromosome centromere to telomere. Peak fluorescence intensity was used to automatically infer the order of the heterochromatin and satDNA probes. User guidance was deliberately introduced to eliminate the pitfalls of full automation. In particular, user guidance was critical for accurate identification of sex chromosomes in the imaged mitotic slide preparations. Longitudinal averaging of fluorescence intensity during the program's automated analysis was robust to the boundary of each chromosome traced by the user.

#### **3. Results**

#### *3.1. Correspondence Between Heterochromatin of the Polytene and Mitotic X Chromosome in Anopheles gambiae*

Using FISH with a C0t1 fraction of repetitive DNA, 18S rDNA, and heterochromatic BAC clones, we attempted to establish the correspondence of pericentric heterochromatin between the metaphase and polytene X chromosome in *An. gambiae*. The most repetitive C0t1 DNA fraction should presumably represent the centromeric satellites of the mitotic chromosomes corresponding to the most proximal regions of polytene arms. C0t analysis is the process of renaturation of single stranded DNA to its complementary sequence [57]. The analysis is based on the observation that the more repetitive DNA sequences require a shorter time to re-anneal following denaturation. DNA fractions with C0t1 values equal to 1.0 <sup>×</sup> 10−4–1.0 <sup>×</sup> 10−<sup>1</sup> are considered as highly repetitive DNA fractions [57]. We expected C0t1 to map to narrow regions corresponding to centromeres in mitotic chromosomes. While that was the case for autosomes, in X chromosomes the C0t1 DNA

probe localized to a wide region, including the entire proximal heterochromatin band adjacent to the rDNA locus (Figure 1A). Thus, hybridization of the C0t1 DNA fraction could be indicative of the centromere positions in autosomes, but we could not precisely determine the location of the centromere on the X chromosome in *An. gambiae*. We mapped three heterochromatic BAC clones, 05F01, 179F22, and 01K23, to the distal heterochromatin band of mitotic chromosomes in *An. coluzzii* (Figure 1B). According to the *An. gambiae* PEST genome assembly [38]*,* these BAC clones are located in the most proximal heterochromatic region 6 in the X polytene chromosome map (Figure 1C). BAC clone 05F01 is mapped to scaffold AAAB01008973 (X: 20,698,730–20,818,594), BAC clone 01K23 is mapped to scaffold AAAB01008967 (X: 23,512,846–23569864), and BAC clone 179F22 is mapped to scaffold AAAB01008976 (X: 23,930,050–24,043,835) [38]. These results demonstrate that the assembled euchromatic part of the X chromosome occupies no more than half of the total X chromosome length (Figure 1C). Although polytene chromosomes are significantly larger than mitotic chromosomes, heterochromatin is under-replicated [48] and represented by diffuse structures without clear banding patterns [39], while heterochromatin in mitotic chromosomes is much more prevalent and detailed (Figure 1D). Because the *An. gambiae* PEST X chromosome genome assembly ends at the coordinate 24,393,108, our data indicate that the assembled heterochromatin largely corresponds to the distal heterochromatic band of the mitotic X chromosome. Only few copies of 18S rDNA sequences are found in the assembled X chromosome. Thus, most of the rDNA repeats and more proximal regions of the X chromosome heterochromatin are absent from the mapped genome assembly.

**Figure 1.** Comparison of the X heterochromatin structure between mitotic and polytene chromosomes. (**A**) Multicolor fluorescence *in situ* hybridization (FISH) of a C0t1 fraction of repetitive DNA and 18S rDNA with mitotic chromosomes of *An. gambiae* PIMPERENA; scale bar is 2 μm. (**B**) FISH mapping of BAC clones 05F01 (yellow), 179F22 (red), and 01K23 (green) to the distal heterochromatin block on the X chromosome of the *An. coluzzii* MOPTI strain. Chromosomes are counter-stained with DAPI. (**C**) Relative positions of heterochromatic blocks and DNA probes of the polytene and mitotic X chromosome idiograms. rDNA = 18S rDNA probe. Repetitive DNA = C0t1 fraction of repetitive DNA. C = putative centromere. (**D**) Relative sizes of mitotic and polytene chromosomes in the KISUMU strain of *An. gambiae.* Chromosomes are counter-stained with YOYO-1. Polytene chromosome physical map is from [38].

#### *3.2. Variation in Heterochromatin Morphology Among Sibling Species of the An. gambiae Complex*

We analyzed the heterochromatin morphology in species of the *An. gambiae* complex originating from different parts of Africa (Figure 2A) and diverging from each other from ~61,000 to ~526,000 years ago [18] (Figure 2B).

**Figure 2.** Geographical distribution of species and phylogeny of the *An. gambiae* complex. (**A**) a map of Africa with approximal distribution of species and places of origin of the laboratory strains. (**B**) Species phylogeny based on the X chromosome genomic sequences (redrawn from [18]). Times of species divergence in million years (Myrs) are shown at the tree nodes.

We studied mid-metaphase chromosomes obtained from imaginal discs of the 4th instar larvae of *An. gambiae*, *An. coluzzii*, *An. arabiensis, An. merus,* and *An. quadriannulatus*. Chromosomes at this stage provide reproducible heterochromatin patterns verified across multiple individuals. DAPI, a counter-stain that preferentially stains AT-rich DNA, was utilized to augment the natural banding patterns resulting from heterochromatin variation between and within species. To simplify the comparison, all fluorescence images were inverted, and converted to gray-scale images (Figure 3). The number and size of the X chromosome heterochromatin bands varied across the sibling species. Differences in the X chromosome heterochromatin pattern were clearly visible between all species with postzygotic reproductive isolation (Figure 3). Two relatively small heterochromatin bands can be seen in *An. gambiae* and *An. coluzzii* (Figure 3A,B). The KISUMU strain of *An. gambiae* often shows polymorphism in the size of the heterochromatin between different X chromosomes, even within one individual (Figures 1D and 3A) Extended heterochromatin is observed in X chromosomes of *An. arabiensis, An. quadriannulatus*, and *An. merus* (Figure 3C). The pattern of fluorescence intensity of the heterochromatic bands differs among species as well. *An. gambiae* and *An. coluzzii* possess a dense heterochromatin block on the end proximal to the putative centromere. This is followed by a dull fluorescence area to which the 18S rDNA locus is mapped. a light distal band is present immediately next to the rDNA locus, marking the assembled heterochromatin in the current *An. gambiae* PEST genome assembly [38]. a large dark heterochromatin band is visible in the *An. arabiensis* X chromosome. Our results for *An. gambiae* and *An. arabiensis* corroborated with those for the natural populations of these species in a previous study [38]. In contrast, *An. quadriannulatus*, a zoophilic non-vector species of the *An. gambiae* complex, has one light and two dark heterochromatin bands. The longest X chromosome was found in *An. merus*, the marshy saltwater resident of the *An. gambiae* complex. The *An. merus* X chromosome contains three large heterochromatin blocks and a large rDNA locus making it comparable in length to the autosomes.

**Figure 3.** Metaphase karyotypes of females in species from the *An. gambiae* complex. (**A**) *An. gambiae* strains. (**B**) *An. coluzzii* strains. (**C**) *An. arabiensis*, *An. quadriannulatus*, and *An. merus* strains. Black and dark gray blocks correspond to compact and diffuse heterochromatin, respectively. X chromosomes are labeled. Species names are indicated in italics and strain names are capitalized.

#### *3.3. Molecular Variation of Heterochromatin Among Species of the An. gambiae Complex*

We analyzed similarities and differences in the molecular content of the heterochromatin among sibling species. We performed FISH with 18S rDNA and previously identified satDNA sequences AgY53A, Ag53C, AgY477, and the genomic fragment AgY477–AgY53B that contains partial sequences of satellites AgY477 and AgY53B as well as a junction region [53] (Table S2). In *An. gambiae* PIMPERENA, satellites AgY53A and AgY477–AgY53B hybridized to single locations in the tip and base of the X chromosome proximal heterochromatin band, respectively (Figure 4A,B). This is in sharp contrast to *An. merus*, where AgY477–AgY53B hybridized with four distinct regions of the X chromosome heterochromatin (Figure 4C). Interestingly, AgY477–AgY53B hybridized differently in *An. coluzzii* SUA compared with *An. gambiae* PIMPERENA; AgY477–AgY53B is found in the base of the X chromosome proximal heterochromatin band of *An. coluzzii* SUA (Figure 4D). Satellite AgY477 has also been mapped to the base of the X chromosome proximal heterochromatin band of *An. gambiae* ZANU [32]. As described in the following results section, localization of these satellites represents shared variation among strains of *An. gambiae* and *An. coluzzii.* Unlike AgY53A and AgY477–AgY53B, satellite Ag53C is not sex chromosome-specific [53]; it hybridized with pericentromeric regions of autosomes in *An. coluzzii* MOPTI (Figure 4E). The 18S rDNA probe was mapped to the space between two dense heterochromatic bands on the X chromosome in *An. quadriannulatus* (Figure 4F). Previously, it was demonstrated that the AgY477–AgY53B sequence is immediately adjacent to the rDNA locus on the X chromosome of *An. quadriannulatus* [32]. Both AgY477–AgY53B and AgY477 co-localize at the very tip of the X chromosome outside the major dense heterochromatin band in *An. arabiensis* (Figure 4G). Hybridization of each of these satellites, together with 18S rDNA, demonstrated that, unlike in *An. gambiae* or *An. coluzzii*, AgY477 (Figure 4H) and AgY477–AgY53B (Figure 4I) are separated from the rDNA locus by the large band of dense heterochromatin in *An. arabiensis.*

**Figure 4.** Mapping of repetitive DNA elements to mitotic chromosomes of species from the *An. gambiae* complex. (**A**) FISH of AgY477–AgY53B (green) and 18S rDNA (red) in *An. gambiae* PIMPERENA. (**B**) FISH of satellite AgY53A (green) and 18S rDNA (red) in *An. gambiae* PIMPERENA. (**C**) FISH of AgY477–AgY53B (green) and 18D rDNA (red) in *An. merus*. (**D**) FISH of AgY477–AgY53B (green) and 18S rDNA (red) in *An. coluzzii* SUA. (**E**) FISH of satellite Ag53C (yellow) and 18D rDNA (red) in *An. coluzzii* MOPTI. (**F**) FISH of 18S rDNA (red) in *An. quadriannulatus*. (**G**) FISH of satellite AgY477 (red) and AgY477–AgY53B (green) in *An. arabiensis*. (**H**) FISH of satellite AgY477 (red) and 18D rDNA (green) in *An. arabiensis*. (**I**) FISH of 18S rDNA (green) and AgY477–AgY53B (red) in *An. arabiensis*. Scale bar = 2 μm.

#### *3.4. Shared Heterochromatin Variation Between An. coluzzii and An. gambiae*

To visualize the rDNA locus, we performed FISH of the 18S rDNA probe with mitotic chromosomes. Within all strains of *An. gambiae* and *An. coluzzii*, the rDNA locus was mapped between the proximal and distal heterochromatin bands (Figure S1). We noticed variation in the size of the rDNA locus both among the strains of *An. gambiae* and *An. coluzzii* and between homologous chromosomes within individual mosquitoes. The intra-strain polymorphism in the size of the rDNA locus was especially obvious in KISUMU, ZANU, and MOPTI. This polymorphism may suggest variation in the rDNA copy number.

To better resolve the relative satDNA positions among multiple mosquito strains, we generated F1 hybrids between laboratory strains of *An. gambiae* and *An. coluzzii*. F1 hybrid larvae resulting from the *An. gambiae* × *An. coluzzii* crosses were analyzed for heterochromatin morphology and satDNA location. Qualitative and quantitative differences in the pattern of heterochromatin blocks, rDNA loci, and satDNA location were clearly observed between homoeologous X chromosomes within the F1 hybrids. Different positions and sizes of the AgY477–AgY53B and AgY477 FISH signals with respect to the proximal heterochromatin band can be seen between homoeologous X chromosomes

in F1 ♀*An. gambiae* PIMPERENA × ♂*An. coluzzii* MOPTI (Figure 5A). Different sizes of the proximal heterochromatin band and the rDNA locus can be seen between homoeologous X chromosomes in F1 ♀*An. gambiae* ZANU × ♂*An. coluzzii* MALI (Figure 5B). Also, different sizes of the hybridization signals from AgY477 and 18S rDNA can be seen between homologous X chromosomes in F1 ♀*An. coluzzii* MOPTI × ♂*An. gambiae* KISUMU (Figure 5C).

**Figure 5.** Variation in the pattern of heterochromatin blocks, rDNA loci, and satDNA location on the X chromosomes in F1 female hybrids between *An. gambiae* and *An. coluzzii.* (**A**) FISH of AgY477–AgY53B (green) and AgY477 (red) in F1 ♀*An. gambiae* PIMPERENA × ♂*An. coluzzii* MOPTI. (**B**) FISH of AgY477 (red) and 18S rDNA (green) in F1 ♀*An. gambiae* ZANU × ♂*An. coluzzii* MALI. (**C**) FISH of AgY477 (red) and 18S rDNA (green) in F1 ♀*An. coluzzii* MOPTI × ♂*An. gambiae* KISUMU. Scale bar = 2 μm.

Using a custom MATLAB script [56], the position of the satDNA repeats with respect to the proximal heterochromatin band was mapped on the X chromosomes in *An. gambiae* and *An. coluzzii* strains. Because sequences of AgY477–AgY53B and AgY477 overlap, it is technically unfeasible to distinguish between the signals when mapping their distance from the proximal heterochromatin band. Therefore, we combined the fluorescence signals of AgY477–AgY53B and AgY477 and mapped them with respect to the fluorescence DAPI peak of the heterochromatin. The advantages of our automated analysis are twofold. First, fluorescent signals in each image possess a degree of overlap, which masks the order of fluorescent peaks from centromere to telomere; however, the order of peaks is easily detected using our automated analysis. Second, observation alone is insufficient to determine the distance between fluorescent peaks, which are generally separated by fewer than 10 pixels. Determining the distance between nearby peaks is made tractable in our pipeline; however, even this computational approach remains limited by microscopic resolution. Images of the mitotic sex chromosomes were captured using 1000× magnification; neighboring pixels at this resolution represent a distance of 0.05 μm. Thus, peaks spaced closer than 0.05 μm were prone to falsely inverted order. Nonetheless, peak orders were generally resolvable by aggregating multiple images averaged for multiple individuals of each strain. We measured the distance from satellites to the fluorescence DAPI peak and the linear order of the peaks. For example, the order of the satDNA and DAPI peaks is switched between *An. gambiae* PIMPERENA and *An. coluzzii* MOPTI (Figure 6A). Distribution patterns of signal intensities for satellites and DAPI along the X chromosomes also differ between *An. gambiae* PIMPERENA and *An. coluzzii* MOPTI (Figure 6B). Our mapping results revealed that

the strains differed in the pattern of satDNA and DAPI peak relative positions and distance on the X chromosome (Figure 6C). Clustering analysis showed that *An. gambiae* PIMPERENA and *An. coluzzii* MALI depicted the same pattern of satDNA, followed by the DAPI band, while *An. coluzzii* MOPTI, *An. coluzzii* SUA, and *An. gambiae* ZANU clustered together showing the DAPI peak followed by the satDNA locus (Figure 6D). Interestingly, *An. gambiae* KISUMU had an almost 1:1 ratio of the direct and the reverse satDNA-DAPI band order indicating that this strain is maintaining high polymorphism in their X chromosome heterochromatin. Thus, our data shows the dynamism of the heterochromatin at the molecular level in the important malaria vectors. We found no evidence for clustering the strains by species, indicating that this heterochromatin variation is shared between *An. coluzzii* and *An. gambiae* either as ancestral polymorphism or by ongoing hybridization between the incipient species.

**Figure 6.** Polymorphism of the X heterochromatin structure among the *An. gambiae* and *An. coluzzii* strains. (**A**) Distance and order frequency of FISH signal and DAPI band intensity peaks in *An. gambiae* PIMPERENA and *An. coluzzii* MOPTI. (**B**) Distribution patterns of signal intensities for satellites and DAPI along the X chromosomes in *An. gambiae* PIMPERENA and *An. coluzzii* MOPTI. (**C**) Relative positions of satellite peak distance and width with respect to the proximal heterochromatin band for six strains of *An. gambiae* and *An. coluzzii*. (**D**) Clustering of the *An. gambiae* and *An. coluzzii* strains based on the relative positions of satellite peak distance and width with respect to the proximal heterochromatin band.

We also compared the average fluorescence intensities of the proximal and distal X chromosome heterochromatin between three strains of *An. gambiae* and three strains of *An. coluzzii*. a statistical analysis, using Student's t-test, revealed no specific pattern when strains were compared with respect to the species. Most strains were significantly different from each other in their fluorescence intensities of the proximal band (*P* < 0.05), except for SUA-PIMPERENA (*P* = 0.06), SUA-ZANU (*P* = 0.16), ZANU-PIMPERENA (*P* = 0.61), and MALI-KISUMU (*P* = 0.74). MOPTI was significantly different from every other strain (Figure S2A). Comparison of fluorescence intensities of the distal heterochromatin band revealed a similar pattern for pairwise comparison between strains of both species. Most strains were significantly different from each other (*P* < 0.05), except for PIMPERENA-KISUMU (*P* = 0.07), ZANU-MALI (*P* = 0.11), and KISUMU-SUA (*P* = 0.89) (Figure S2B). Overall, the fluorescence intensities of both X chromosome heterochromatin bands vary among the strains, regardless of their species identity.

#### *3.5. X Chromosome Idiograms for Species of the An. gambiae Complex*

To construct idiograms, we measured the lengths of the chromosomes (Table S3) and conducted a statistical pairwise Tukey's test between the sibling species. Our analysis revealed that the total X chromosome lengths were significantly different for all species pairs, except between *An. quadriannulatus* and *An. arabiensis* (*P* = 0.64). Due to the possibility that the X chromosome length data violated the assumption of equal variance, we repeated statistical analyses using a nonparametric Kruskal–Wallis rank sum test followed by a Dunn's test. These analyses resulted in qualitatively identical results (Figure 7, Table S4). The polytene X chromosomes in all these species are similar in length; hence, the difference in mitotic X chromosomes can be attributed to the variation in heterochromatin.

**Figure 7.** Lengths of the metaphase chromosomes in species from the *An. gambiae* complex.

We also conducted a statistical pairwise Tukey's test for autosomes between the sibling species. We found that, unlike the X chromosome, the autosomal lengths do not significantly differ between the species (Figure 7, Table S5). This finding demonstrates the contrasting patterns of the sex chromosome and autosome evolution and highlights the rapids evolutionary changes in the X chromosome heterochromatin.

Based on mitotic chromosome measurements, banding patterns, and repeat mapping, we constructed X chromosome idiograms for *An. gambiae, An. coluzzii, An. arabiensis, An. quadriannulatus,* and *An. merus*. FISH with the genomic fragment AgY477–AgY53B revealed its multiple locations in the X chromosome heterochromatin bands in *An. merus* but a single location in the pericentric heterochromatin in other sibling species. These features were placed on respective idiograms for easy comparison of molecular features within heterochromatin among species (Figure 8). The identified differences in the X chromosome size and heterochromatin structure are consistent with *An. gambiae, An. coluzzii, An. arabiensis, An. quadriannulatus* grouping together while *An. merus* being a separate lineage in the species phylogeny [18] (Figure 2B). The *An. merus* X chromosome has the largest amount of heterochromatin among the X chromosomes of sibling species. Moreover, *An. merus* is the only species in this study with the rDNA locus located in the L arm of the X chromosome suggesting a fixed pericentric inversion that differentiate *An. merus* from the other species. The developed

idiograms will facilitate future studies of genetic variation in repeat-rich heterochromatic regions of malaria mosquitoes from the *An. gambiae* complex.

**Figure 8.** Idiograms of metaphase karyotypes of species from the *An. gambiae* complex. (**A**) An idiogram of metaphase karyotypes of *An. gambiae* and *An. coluzzii*. (**B**) Comparison of the X chromosome idiograms among species from the *An. gambiae* complex. Left and right arms are labeled with L and R, respectively. Black bands correspond to condensed heterochromatin. Dark gray bands correspond to diffuse heterochromatin. White areas represent the rDNA locus. Light gray areas show euchromatin. The area of constriction represents the putative centromere. Mapping of the BAC clones and satDNA repeats AgY53A and AgY53C were performed only with *An. coluzzii* or *An. gambiae*. Two different positions of AgY477–AgY53B on the X chromosome of *An. gambiae* and *An. coluzzii* represent polymorphism among strains.

#### **4. Discussion**

Mosquitoes, with their small number of diploid chromosomes, 2*n* = 6, present a convenient model to investigate the molecular organization of their genomes and chromosome evolution [58]. Since polytene chromosomes are well-developed in *Anopheles* mosquitoes, they are often preferred over mitotic chromosomes for gene mapping [40,59]. Analysis of polytene chromosomes has been useful for identifying genomic variations in euchromatin. Comparative genome mapping within the *Anopheles* genus has demonstrated that the euchromatic portion of the X chromosome is the fastest evolving by genomic rearrangements [42,60]. However, under-replication of heterochromatin in polytene chromosomes makes them less suitable for mapping repetitive elements of the genome. In this study, we utilized metaphase mitotic chromosomes for repeat mapping and for studying heterochromatin variation among species in the *An. gambiae* complex. Since the genome of *An. gambiae* was first published [34], polytene chromosomes have been used to improve the heterochromatin assembly and to characterize its genetic content [38–40]. Here, we established the correspondence of pericentric heterochromatin between the metaphase and polytene X chromosome of *An. gambiae*, and demonstrated that at least half of the *An. gambiae* X chromosome length is heterochromatic. Future long-read sequencing and assembly of the heterochromatin in malaria mosquitoes may yield important insights into structural and functional organization of the sex chromosomes, as demonstrated for *Drosophila* [61].

Our comparative analysis of the X chromosome pericentric heterochromatin identified qualitative differences among sibling species, *An. arabiensis*, *An. merus, An. quadriannulatus,* and *An. coluzzii* or *An. gambiae*, with postzygotic isolation. The identified fixed variations included the number and pattern of heterochromatic bands and chromosomal location of the satDNA repeats. A common emerging theme from studies performed in *Drosophila* suggests that heterochromatin plays an important role in hybrid incompatibility [62–66]. Rapidly evolving heterochromatic repeats and heterochromatin-binding proteins are considered major players in the evolution of intrinsic reproductive isolation. Reproductively isolated *Drosophila* species have different satDNA content and localization within heterochromatin [64,67]. Evolutionary forces lead to a great variation in the copy number and types of heterochromatin repeats between closely related species. As repetitive DNA elements replicate, mechanisms such as unequal crossover, rolling circle replication, and segmental duplication become sources of genome evolution [68,69]. Thus, our study supports observations from other organisms that sex chromosomes have a propensity to accumulate species-specific differences in heterochromatin [70].

SatDNA repeats are seen as prime candidates to trigger genome instability in interspecies hybrids [71]. They have been implicated in disruption of chromosome pairing, abnormal heterochromatin packaging, and selfish post-meiotic drive substitutions in *Drosophila* hybrids [67]. Disruption of chromosome pairing and synapsis is observed frequently in interspecies hybrids of various organisms. Interestingly, sex chromosomes more often than autosomes drive speciation and hybrid incompatibilities [70]. For example, pairing of X and Y chromosomes is more adversely affected than that of autosomes during the prophase I in male hybrids between Campbell's dwarf hamster and the Djungarian hamster [72]. a recent work has clarified that the autosomes of male hybrids between these hamster species undergo pairing and recombination normally as their parental forms do, but the heterochromatic arms of the X and Y chromosomes show a high frequency of asynapsis and recombination failure [73]. It was proposed that asynapsis of heterospecific chromosomes in prophase I may provide a recurrently evolving trigger for the meiotic arrest of interspecific F1 hybrids of mice [74,75]. Our recent study showed that meiotic failures in sterile F1 ♀*A. merus* × ♂*A. coluzzii* hybrids are accompanied by the disruption of sex chromosome pairing and insufficient chromosome condensation. These cytogenetic abnormalities lead to a premature equational division and formation of diploid immotile sperm [14]. Demonstrated here, qualitative inter-species differences in heterochromatin suggest that they may be responsible for incompatibilities of sibling species from the *An. gambiae* complex.

Unlike species with postzygotic isolation, *An. coluzzii* and *An. gambiae* do not show fixed differences in the pattern of the X chromosome pericentric heterochromatin. An early study observed an intra-specific polymorphism of the X chromosome heterochromatin among lab strains of *An. gambiae* [31]. The described polymorphism was based on the Hoechst staining of the heterochromatin. Here, we studied the pattern of heterochromatic bands or location of satDNA in multiple strains of *An. coluzzii* and *An. gambiae*, incipient species with prezygotic isolation. We found that, although the strains may differ from each other, the species do share variation in the relative positions of the fluorescence intensity peaks for the satDNA repeats and the proximal DAPI-positive heterochromatin band of the X chromosomes. The mechanism behind this inversion-like genetic polymorphism in malaria mosquitoes could be a rapid differential amplification of satellite repeats or actual structural rearrangement in the sex chromosome heterochromatin. It is possible that further evolution of heterochromatin in *An. coluzzii* and *An. gambiae* will contribute to higher genetic differentiation, and will eventually lead to postzygotic reproductive isolation between some populations of these important disease vectors. Extensive studies of *Drosophila* species show that heterochromatin, in general, and satellite DNA, in particular, can be a source of tremendous genetic variation [76–80]. Heterochromatin can also modulate differential gene expression and cause variable phenotypes, including differences in immune response [81,82]. It is possible that changes in repetitive sequences can have fitness consequences upon which selection will act. a population genomics study in *D. melanogaster* found that satDNA repeats often show population differentiation, but the population structure inferred from overall satellite quantities does not recapitulate the expected population relationships based on the demographic history of this species [79]. Moreover, some satellites have likely been involved in antagonistic interactions, as inferred from negative correlations among them.

The rDNA locus is immediately adjacent to the heterochromatin bands of the X chromosome. Its variation in size, among and within species, suggests polymorphism of the rDNA copy number. Natural polymorphism in the number of rDNA copies has been recorded in vertebrate and invertebrate animals, both among and within species [83]. Associations between the heterochromatin or rDNA variation and organismal phenotypes have also been demonstrated. a study in *Drosophila* showed that heterochromatin formation prolongs lifespan and controls ribosomal RNA synthesis [84]. Another study demonstrated that rDNA copy number decreases during aging, and that this age-dependent decrease in rDNA copy number is transgenerationally heritable [85]. In humans, the number of rDNA repeats can vary from 250 to 670 copies per diploid genome [86], and the changes in rDNA copy number can affect genome-wide gene expression [87]. Natural variation in rDNA and heterochromatin may contribute to genome evolution, formation of reproductive barriers, and eventually to diversification of species [88–90]. Genomics studies of the heterochromatin in natural populations of *An. coluzzii* and *An. gambiae* can determine the actual levels of polymorphism and divergence in satDNA and rDNA copy number variation. These studies may also identify heterochromatin variations that correlate with specific environmental adaptations or behaviors of the malaria vectors.

#### **5. Conclusions**

We discovered a new type of shared cytogenetic polymorphism in the incipient species, *An. gambiae* and *An. coluzzii*—an inversion of the satDNA location with respect to the proximal heterochromatin band. This finding suggests mechanisms of rapid differentiation of the sex chromosome heterochromatin during evolution: genomic rearrangement or differential amplification of heterochromatic repeats. Future genome sequencing using long-read technologies will be crucial for determining the precise nature of the observed polymorphism. Our study also demonstrated X chromosome heterochromatin divergence among mosquito species with post-zygotic isolation, which manifests in species-specific localization of repetitive DNA as well as size and number of heterochromatin bands. The identified differences in heterochromatin support the basal phylogenetic position of *An. merus* and point to the role of heterochromatin in speciation. The differences in molecular organization of the sex chromosome heterochromatin may impair meiotic synapsis during meiosis in sterile inter-species male hybrids. We suggest that inter-species divergence of heterochromatin structure represents a cytogenetic threshold that triggers the evolutionary shift from prezygotic to postzygotic isolation. The new idiograms developed here for the sibling species will aid in future studies of heterochromatin organization and evolution in these malaria mosquitoes.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/3/327/s1, Figure S1: FISH of 18S rDNA with the X chromosomes in strains of *An. gambiae* and *An. coluzzii*. (**A**) *An. gambiae* strains: PIMPERENA, ZANU, KISUMU. (**B**) *An. coluzzii* strains: MALI, SUA, MOPTI. Scale bar = 1 μM. Red = 18S rDNA, Blue = DAPI., Figure S2: Fluorescence intensities of X chromosome heterochromatin in three strains of *An. gambiae* and three strains of *An. coluzzii*. (**A**) Fluorescence intensities of the proximal heterochromatin band. (**B**) Fluorescence intensities of the distal heterochromatin band., Table S1: Mosquito strains used in the study of the X chromosome heterochromatin., Table S2: Probes and primer sequences used for FISH., Table S3: Length of mitotic chromosomes in species of the *An. gambiae* complex. Table S4: Statistical analyses of the X chromosome lengths using a nonparametric Kruskal-Wallis rank sum test followed by a Dunn's test., Table S5: Statistical analyses of the chromosome lengths using Tukey's honestly significant difference test.

**Author Contributions:** Conceptualization, M.V.S., and I.V.S.; methodology, A.S., N.A.K., V.A.T., and I.V.S.; software, N.A.K.; validation, A.S., M.V.S., and I.V.S.; formal analysis, A.S., N.A.K. and I.V.S.; investigation, A.S., N.A.K., and V.A.T.; resources, I.V.S.; data curation, A.S., N.A.K, M.V.S., and I.V.S.; writing—original draft preparation, A.S., M.V.S., and I.V.S.; writing—review and editing, I.V.S.; visualization, A.S., N.A.K., V.A.T.; supervision, M.V.S., and I.V.S.; project administration, I.V.S.; funding acquisition, M.V.S. and I.V.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The chromosome mapping and analyses were supported by the US NIH NIAID grants R21AI099528 and R21AI135298 to I.V.S., the graduate program in Genomics Bioinformatics and Computational Biology (GBCB) of Virginia Tech to N.A.K., the Fralin Life Sciences Institute, and the USDA National Institute of Food and Agriculture Hatch project 223822 to I.V.S. The development of chromosome idiograms was supported by the Russian Science Foundation grant No. 19-14-00130 to M.V.S.

**Acknowledgments:** The following reagents were obtained through the Biodefense and Emerging Infections Research Resources Repository, NIAID, NIH: *An. arabiensis*, Strain DONGOLA 2Ra, 2Rb/b and 3R, MRA-1235, contributed by Ellen M. Dotson; *An. coluzzii*, Strain MALI-NIH, Eggs, MRA-860, contributed by Nora J. Besansky; *An. coluzzii*, Strain MOPTI, Eggs, MRA-763, contributed by Gregory C. Lanzaro; *An. coluzzii*, Strain SUA2La, Eggs, MRA-765, contributed by Alessandra della Torre; *An. gambiae*, Strain KISUMU1, Eggs, MRA-762, contributed by Vincent Corbel; *An. gambiae*, Strain PIMPERENA (S form), Eggs, MRA-861, contributed by Nora J. Besansky; *An. gambiae*, Strain ZANU, MRA-594, contributed by Hilary Ranson and Frank H. Collins; *An. merus*, Strain MAF, MRA-1156, contributed by Maureen Coetzee; and *An. quadriannulatus*, Strain SANGWE, Eggs, MRA-1155, contributed by Willem Takken. We thank Semen Bondarenko for his help with statistical analysis using R, Jiangtao Liang for useful discussions, as well as Jean Clarke, Kristin Rose, and Janet Webster for proofreading the text.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Turtle Insights into the Evolution of the Reptilian Karyotype and the Genomic Architecture of Sex Determination**

#### **Basanta Bista \* and Nicole Valenzuela \***

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA

**\*** Correspondence: bbista@iastate.edu (B.B.); nvalenzu@iastate.edu (N.V.); Tel.: +1-515-294-1285 (N.V.); Fax: +1-515-294-1337 (N.V.)

Received: 1 April 2020; Accepted: 8 April 2020; Published: 11 April 2020

**Abstract:** Sex chromosome evolution remains an evolutionary puzzle despite its importance in understanding sexual development and genome evolution. The seemingly random distribution of sex-determining systems in reptiles offers a unique opportunity to study sex chromosome evolution not afforded by mammals or birds. These reptilian systems derive from multiple transitions in sex determination, some independent, some convergent, that lead to the birth and death of sex chromosomes in various lineages. Here we focus on turtles, an emerging model group with growing genomic resources. We review karyotypic changes that accompanied the evolution of chromosomal systems of genotypic sex determination (GSD) in chelonians from systems under the control of environmental temperature (TSD). These transitions gave rise to 31 GSD species identified thus far (out of 101 turtles with known sex determination), 27 with a characterized sex chromosome system (13 of those karyotypically). These sex chromosomes are varied in terms of the ancestral autosome they co-opted and thus in their homology, as well as in their size (some are macro-, some are micro-chromosomes), heterogamety (some are XX/XY, some ZZ/ZW), dimorphism (some are virtually homomorphic, some heteromorphic with larger-X, larger W, or smaller-Y), age (the oldest system could be ~195 My old and the youngest < 25 My old). Combined, all data indicate that turtles follow some tenets of classic theoretical models of sex chromosome evolution while countering others. Finally, although the study of dosage compensation and molecular divergence of turtle sex chromosomes has lagged behind research on other aspects of their evolution, this gap is rapidly decreasing with the acceleration of ongoing research and growing genomic resources in this group.

**Keywords:** sex chromosome evolution; karyotypic and molecular evolution; genomic architecture of sexual development; adaptation and natural selection; genome organization and function; nucleolar organizing region; dosage compensation; faster-X and faster-Z; climate change and global warming; reptilian vertebrates

#### **1. Introduction**

A paramount event in the history of life is the early evolution of sexual reproduction. Sex, which is nearly universal in eukaryotes, joins half of two parental genomes from gametes produced by meiosis into unique combinations, contributing to the phenotypic diversity that is naturally selected during adaptive evolution [1]. However, the evolution of gametes into two types (small and mobile versus large and immobile) that led to the evolution of male and female functions meant that a developmental decision is necessary for individuals to become sperm-producing males, egg-producing females, or hermaphrodites that produce both. In multicellular organisms, where separated sexes evolved often, and particularly in animals, this decision is controlled most frequently by a pair of specialized chromosomes (the sex chromosomes) [2]. In other animals, this decision is plastic and occurs in

response to environmental cues (ESD), of which the most common in vertebrates is temperature (TSD) [3]. However, a clear explanation for why some vertebrate lineages rely exclusively on sex chromosomes, some exclusively on TSD, and some combine both [4–6], remains elusive, as are the mechanistic trajectories by which species transition between TSD and sex chromosomes. Reptiles are ideal to answer these questions because these evolutionary transitions have occurred repeatedly in this group, leading to varied sex-determining mechanisms. Namely, reptilian sex determination encompasses TSD in crocodilians and tuatara, genotypic sex determination (GSD) in snakes, and either TSD or GSD in turtles and lizards (some lizards even combine GSD and TSD) [3]. Furthermore, TSD and GSD is not the same in all reptiles [3,7]. Rather, TSD reptiles respond to temperature in one of three patterns: (a) by producing males at colder and females at warmer conditions, (b) the opposite, or (c) by producing males at intermediate values and females above and below (reviewed in [7]). GSD reptiles also vary, as some possess female- and others male-heterogametic sex chromosomes, including homomorphic or heteromorphic ZZ/ZW, XX/XY, X1X2Y, Z1Z2W, and ZW1W2 systems [3,8]. Here, we concentrate on turtles, an emerging model group with growing genomic resources, and explore the karyotypic changes that accompanied the evolution of the genomic architecture of sexual development during transitions between TSD and GSD in this group. Hereafter, we will refer to XX/XY ZZ/ZW systems as XY and ZW, respectively.

#### **2. Sex Chromosome Evolution**

Theoretical models suggest that vertebrate sex chromosomes evolved from ESD or from polygenic sex determination [2] by co-opting an autosomal pair of chromosomes harboring a sex-determining locus. This initial step is expected to trigger a cascade of events that leads to the eventual degeneration of the heterogametic Y or W due to their unusual mode of transmission through a single sex [2]. The model proposes that first, recombination is reduced adaptively via chromosomal inversions or via selection on a modifier locus favoring recombination suppression [9], to preserve the linkage disequilibrium between the sex-determining locus and sexually antagonistic genes [10–12] (e.g., Y genes that favor males but are harmful if they were expressed in females [13]). Consequently, mildly deleterious mutations accumulate in the non-recombining region via Muller's ratchet or genetic drift, causing Y or W genes to lose their function and ultimately disappear altogether [14]. Moreover, strong selection acting on the sex-determining region can induce background selection, genetic hitchhiking and selective sweeps that reduce genetic diversity in adjacent regions [2]. This degenerative process is prevented in the pseudo-autosomal regions of the sex chromosome where recombination remains intact [2]. Such extensive degeneration of the W and Y may constitute an evolutionary trap from which TSD evolution is difficult [15], because this transition requires traversing a valley in the fitness landscape (sensu [16]) where individuals are produced that carry suboptimal or lethal WW or YY genotypes. This problem may be averted when sex chromosomes are virtually homomorphic [2,17]. But do turtles follow this theoretical evolutionary trajectory? Valuable existing information sheds light on the evolutionary history of turtle sex chromosomes, despite how relatively little is known about their content compared to mammals or birds, partly because only a single GSD turtle genome assembly has been published (*Pelodiscus sinensis* [18]), and its sex chromosome scaffolds remain unmapped.

#### **3. Sex Chromosomes were Gained and Lost Multiple Times in Turtles**

Most turtles possess TSD, a system that appears ancestral to turtles, reptiles, and likely to all amniotes based on the most complete phylogenetic comparative analyses possible to date given the existing information [15,19–21]. These species-level phylogenetic analyses revealed that the history of turtle sex determination is marked by the retention of an ancestral TSD mechanism in most chelonian lineages, punctuated by few transitions to sex chromosomes (five so far identified), and two potential reversals from GSD back to TSD where sex chromosomes may have been lost [19,22] (Figure 1). However, these evolutionary transitions are not all created equal, and instead, they have followed unique trajectories accompanied by profound genomic modifications, as will be described.

**Figure 1.** Phylogenetic relationships of turtles with a known sex-determining mechanism (SDM) and diploid number (2N). Timing of hypothesized gains and losses of sex chromosomes correspond to the timing of split of the colored branches. Diamonds indicate hypothesized transitions in sex determination. Transitions 1–5 are the most parsimonious. Transitions 6–7 and 8–9 represent alternative hypotheses to transitions 4 and 5, respectively, proposed based on ancestral reconstruction using maximum likelihood [19]. Data from [3,19,23–26].

Indeed, of the >355 recognized species of turtles to date [27], sex determination is known in only 101 of them. These include 31 GSD species, of which 27 have a characterized sex chromosome system that vary in age, heterogamety, homology, shape, and size (turtles possess macro and micro chromosomes [28] (Figures 1 and 2). Four other GSD turtles were identified as such because they produce 1:1 sex ratios across incubation temperatures, thus ruling out TSD [4], but their heterogamety remains unknown. This gap exists because few studied turtles have large heteromorphic sex chromosomes easily visualized using classical cytogenetic techniques [29–31]. Thus, the detection of virtually homomorphic sex chromosomes in other turtles requires higher-resolution molecular cytogenetic approaches, such as comparative genome hybridization (CGH), which only became available recently [25,26,32–36]. Male or female heterogamety has been identified recently in some GSD turtles by PCR or qPCR amplification of molecular markers first developed in closely related taxa [23,24,37].

From the distribution of these sex chromosome systems in the turtle tree of life, male heterogamety was inferred to have evolved at least three times in the suborder Cryptodira (the turtles who hide their necks inside their shell) and at least once in the suborder Pleurodira (the turtles who bend their necks to the side outside their shell) (Figure 1). These events gave rise to the 16 turtle XY systems identified thus far: 11 in the pleurodirans within a single family Chelidae—*Acanthochelys radiolata, Elseya novaeguineae*, six in the genus *Chelodina,* and two in *Emydura*, [25,31–33], plus five from three cryptodiran families—*Siebenrockiella crassicollis*, two in *Glyptemys*, and two in *Staurotypus* [23,29,30,36,38].

Curiously, unlike the repeated independent evolution of XY in chelonians, a single origin of female heterogamety is known, in the cryptodiran family of softshell turtles (Tryonichidae). Indeed, ZW has been documented in 10 soft-shell turtles, two cases (*P. sinensis* and *Apalone spinifera*) by molecular cytogenetic data [35,39], and eight others by PCR/qPCR— three species in the genus *Lissemys*, three in *Nilssonia*, plus *Chitra indica* and *Amyda cartilaginea* [24]. The report of a non-softshell turtle (*Pangshura smithii*) having an independently evolved heteromorphic ZW was debunked recently, as it was due to a karyotyping error [40,41]. Therefore, it is unclear if *P. smithii* has GSD with homomorphic sex chromosomes or TSD. Similarly, the co-existence of sex chromosomes and TSD was empirically refuted in *P. sinensis* and *Chrysemys picta* [42,43], whereas thermal sex reversals occur in several lizards [44–48]. However, why XY systems are more likely to evolve in turtles than ZW systems is unclear, although this pattern matches theoretical models (reviewed in [2]). Some of these models predict that when sex chromosomes arise during the evolution of separate sexes in a population of hermaphrodites, selection favors first the spread of a recessive male-sterility mutation that produces females when homozygous, and then selection favors the spread of a dominant female sterility mutation in the hermaphrodites that produces males, leading to the formation of a XY system (reviewed in [2]). Alternatively, sexual selection, which is stronger in males in general, may favor the fixation of a major sex-determining factor beneficial to the heterogametic sex, thus favoring the evolution of an XY system (reviewed in [2]).

From a karyotypic perspective, the first difference that is noticeable between the 13 turtle sex chromosome systems for which cytogenetic data are available (Figure 2), is that five are micro-sex-chromosomes (μSC) (in the softshells *A. spinifera, P. sinensis* and in the Australian chelids of the genus *Chelodina*), whereas the other eight are macro-sex-chromosomes (MSC). The second notable karyotypic characteristic, is that four independent evolutionary events resulted in heterogametic sex chromosomes (Y or W) in turtles that are larger than the X or Z (in the genera *Emydura*, *Glyptemys*, *Siebenrockiella*, and in the family Trionychidae), whereas only two instances are known that resulted in a smaller Y than X (in the genera *Acanthochelys* and *Staurotypus*), and one case is known of the evolution of virtually homomorphic XY (in the genus *Chelodina*) (Figure 2). Cytogenetic data revealed that the larger size of the Y or W compared to the X or Z is due to the accumulation of repeat sequences, but the particular details vary among lineages. For instance, the repeats involved in this enlargement of the Y or Z encompass an expanded Z-linked NOR in trionychids [35], expanded heterochromatin (C- and G-bands) in the Y of *Glyptemys* [36] and *Siebenrockiella* [30], and accumulation of microsatellites in the Y of *Emydura* and *Elseya* [25,26,49]. Notably, the male-specific region of the *Emydura* Y likely evolved by an Y-autosome fusion that resulted in the translocation of an ancestral micro-Y chromosome present in *Chelodina* [25], which contributes to the larger Y in *Emydura*. This is the only case of Y-autosome fusion known in chelonians, whereas Y-autosome fusions are well documented in squamate reptiles and fish, and tend to be more common than X-autosome, Z-autosome, or W-autosome fusions [8]. Also remarkably, the larger size of *Staurotypus* X is not due to the degeneration of the Y but to the

translocation of the NOR to the X, such that the Y in this turtle represents the ancestral condition and the X is the derived sex chromosome [50]. Taken together, the current data indicate that turtle sex chromosome differentiation does not always involve the shrinkage of the Y or W proposed in classical models of sex chromosome evolution, and add support to the alternative that sex chromosomes of varying degrees and type of differentiation are evolutionary stable states [1].


**Figure 2.** Karyotypic characteristics of turtle sex chromosomes (SC) from species studied cytogenetically. Macro = macro-chromosomes; Micro = micro-chromosomes; Streaked regions = centromeres; blue = nucleolar organizing region (NOR); red = male-specific regions in the Y or female-specific region in the W, detected via CGH; as described in the text and reviewed in [28].

Despite the multiple gains of sex chromosomes identified in turtles, some evidence suggests that reversals back to TSD from GSD may have occurred also where sex chromosomes would have been lost, once in each chelonian suborder. In particular, two putative evolutionary reversals from GSD to TSD were identified by a maximum likelihood reconstruction of ancestral sex-determining mechanisms (SDMs): (1) one in the monotypic family Carettochelyidae (suborder Cryptodira) which split from the ZW softshell family Trionychidae in the Mesozoic ~148 Mya (http://www.timetree.org/), and (2) another in the superfamily Pelomedusoidea (suborder Pleurodira) which split from the XY

family Chelidae at around a similar time ~144 Mya (http://www.timetree.org/) [19] (Figure 1). If true, this would mean that the TSD system in *Carettochelys insculpta* and Pelomedusoidea are secondary and independently derived traits, and might be expected to differ mechanistically from the ancestral TSD retained in most other turtles. However, these reversals are questionable given that their inference is not parsimonious [20], and that reanalysis of the same dataset by maximum likelihood reconstruction of ancestral SDMs using an updated software package did not recover the same result [21]. Yet, the significant intensification of the rate of molecular evolution of sexual development genes observed in both Carettochelyidae and Podocnemididae [22], above and beyond the already accelerated rate seen in their GSD sister lineages, supports the notion that GSD-to-TSD reversals might have taken place. Moreover, the branching of these lineages suggests that these reversals might have occurred soon after GSD evolved in the common ancestor of Carettochelyidae and Trionychidae, and in the common ancestor of Podocnemididae and Chelidae, likely before extensive differentiation of their sex chromosomes accrued that would have been harder to overcome [1,15]. Nonetheless, the support for these putative reversals remains relatively scant and stronger inferences require further research.

#### **4. Independent and Convergent Evolution of Turtle Sex Chromosomes**

The existing data indicate that not all turtle sex chromosomes are homologous to each other [28]. Indeed, partial data on the gene content of the sex chromosomes of some cryptodiran turtles for which information is available (*Staurotypus triporcatus*, *Glyptemys insculpta*, *Siebenrockiella crassicollis,* and softshells such as *A. spinifera* and *P. sinensis*) revealed that the evolution of their XY and ZW chromosomes occurred by the co-option of different ancestral reptilian autosomes [28]. Yet, these turtle sex chromosomes may share a deeper homology with blocks of a more ancient proto sex chromosome [28]. In particular, gene content indicates that *S. triporcatus* XY share homology with chicken's Z (GGA-Z) and the ZW of *Gekko hokouensis*lizards, and with a block of the frog *Xenopus tropicalis* chromosome 1 (XTR1) [28,34,51]. This XTR1 block contains *Dmrt1*, chicken's sex-determining gene, whose action is dosage-dependent [28,51]. On the other hand, the ZW of softshells (e.g., *A. spinifera* and *P. sinensis*), which derive from a single gain in the common ancestor of the softshell family *Trionychidae* [24,35], are homologous to each other, to chicken GGA15, and to the X of *Anolis carolinensis* lizards [28,38], and share partial homology with a second block of XTR1 [28].

In yet another evolutionary twist, the only two turtle lineages known to have recruited the same pair of ancestral autosomes, *G. insculpta* and *S. crassicollis*, did so independently [28]. Indeed, the XY in these two species are homologous to each other, to GGA5, and to XTR5, which contains the male development gene *Wt1* [28,36]. The notion that these two XY systems represent convergent evolution and followed independent trajectories is also supported by a secondary homology shared between the short arm of *G. insculpta* (but not *S. crassicollis*') XY and GGA26, which surprisingly, is homologous to a third block of XTR1 [28,36]. To make matters even more intriguing, another (fourth) block of XTR1 shares homology with the sex chromosomes of other vertebrates, namely, the X of therian mammals including humans, and the Z of the lizard *Takydromus sexlineatus* [28,52]. Moreover, XTR1 is also homologous to the sex chromosomes of three other anuran amphibians [53].

Thus, the origin of these independently derived sex chromosomes appears to be non-random. Instead, all the data combined suggest that certain ancestral chromosomes constitute proto-sex chromosomes that are more likely to take on a role as sex chromosomes in various derived taxa (e.g., XTR1 or a putative ancestral XTR1 + XTR5 from which many turtle chromosomes derive) [28]. It is noticeable that a common ancestral chromosome can follow evolutionary trajectories leading to both male or female heterogamety (e.g., softshell turtles ZW versus *Anolis* lizard XY; *Staurotypus* turtles XY versus chicken's and *Gekko* lizards' ZW, etc.). Perhaps chromosomes which are rich in genes involved in sexual development are better suited to take on a role as sex chromosomes, and perhaps both the evolution of these sex chromosomes and their fixation in populations may be facilitated by genomic rearrangements [12]. Additionally, chromosomes with genes resilient to changes in dosage may also

be favored for co-option as sex chromosomes as they may be buffered naturally from the degeneration and gene content depletion that may be inevitable in the evolving Y or W [28,54,55].

Taken together, all evidence permits drawing parallels and contrasts between the sex chromosomes of chelonians and other reptilian lineages. Namely, like other reptiles, turtle sex chromosomes vary in the degree of heteromorphism (Figure 2), and some of them carry the genes of the nucleolar organizing region (NOR) [28]. Just like turtles, both lizards and snakes co-opted various ancestral autosomes as sex chromosomes independently [56–58]. Unlike turtles, however, the repeated evolution of sex chromosomes in squamates (lizards and snakes) occurred not only as transitions from TSD but also as transitions between male and female heterogamety, a process that has not been detected in turtles. For instance, the recent discovery of convergent XY sex chromosomes in pythons and boas refuted the long-held notion that they share the ZW system previously thought to be ubiquitous and homologous across snakes [57]. Also, no case of multiple sex chromosomes is known in turtles, whereas examples exist in squamates [3,58].

#### **5. Ecological and Karyotypic Correlates of the Birth and Death of Turtle Sex Chromosomes**

What are the causes and consequences of the gain or loss of turtle sex chromosomes? The answer to this question is not fully resolved, yet significant strides have been made towards solving this mystery. Some ecological and karyotypic correlates of sex chromosome evolution exist in turtles, including diploid number, climate change, population sex ratios, and longevity. In particular, turtles exhibit high variance in the diploid number of chromosomes ranging from 2N = 28 to 68 (Figure 1). The genomic rearrangements responsible for generating this diversity occurred at a rate 20× higher in turtle lineages that underwent an evolutionary transition in their sex-determining mechanism than in lineages whose sex determination remained static [19]. However, which is the cause and which the consequence remains unclear. For instance, perhaps the chromosomal rearrangements responsible for changes in chromosome number are fertile grounds for changes in sex determination, as they could affect gene expression through direct disruption of genes, changes in regulatory elements or altered positional effects [59]. Further, while chromosomal fusion or fission are large-scale changes that alter chromosome number, smaller-scale rearrangements can also have significant ramifications. For example, inversions induce the repression of recombination between homologs of a chromosomal pair, which is a hallmark of sex chromosome evolution [60]. Such inversions are known in turtles, and include multiple parts of the male-specific region of Y chromosome in *G. insculpta* that contains the male development gene *Wt1*, which might have facilitated the transition from ESD to GSD in this lineage [36]. But some other identified inversions and translocations involving sexual development genes (*Dax1*, *Fhl2*, *Fgf9*, *Sf1* and *Rspo1*) across TSD and GSD turtles do not correlate with transitions in sex determination [61]. Alternatively, perhaps transitions in sex determination trigger molecular changes that render the genome unstable and more susceptible to chromosomal fusion or fission which, in turn, will alter the chromosome number [19]. These pending questions remain the foci of active research.

Climate change is an ecological factor that may trigger such transitions in sex determination in turtles, particularly, the evolution of sex chromosomes from TSD, as an adaptive response to alleviate extreme biased sex ratios caused by steady warming or cooling events [19,62]. Specifically, a theoretical model proposes that ESD could be adaptive in species inhabiting patchy or heterogenous environments where some environments increase the fitness of one sex and other environments increase the fitness of the other sex (provided that offspring cannot predict the environment they enter – nor can their parents, and that individuals from all environments mate at random) [63]. Under these circumstances, the plasticity provided by ESD permits individuals to develop into the sex that is better suited in each environment. However, if environments become homogenous, or in the case of TSD, if climate change becomes directionally warmer or cooler over time, population sex ratios may be skewed. Then, natural selection could favor the evolution of sex chromosomes via the evolution of a masculinizing or a feminizing locus, resulting in a XY or ZW system, respectively [2], which will restore sex ratios closer

to parity. (This model assumes that 1:1 sex ratio is optimal, which is not always the case [62,64,65]. This scenario is supported in turtles by the observation that some lineages known to have experienced a transition in sex determination split from their sister lineages at geological times when global temperatures were near a peak [19]. However, relatively few transitions in sex determination are documented in turtles across repeated bouts of past climate change compared to squamates, the other reptilian clade with labile sex determination. This difference may be attributed to their contrasting life history, particularly their differences in lifespan [21]. Indeed, the greater longevity of turtles would enable them to withstand longer stretches of biased sex ratios caused by directional climate change compared to lizards and snakes, thus explaining the frequent retention of TSD in turtles and the evolution of GSD more readily in squamates [21]. However, the speed of contemporary climate change and the predicted warmer average temperatures that fluctuate more widely pose a challenge to extant TSD turtles in unprecedented ways [66,67].

#### **6. The Architecture of Sex Determination with and without Sex Chromosomes**

The regulatory gene network that underpins the development of males and females is composed of many common elements across vertebrates, but TSD and GSD lineages differ in the level of plasticity of this regulation to environmental inputs [68,69]. Indeed, the signal which initiates the cascade of events leading to sex differentiation is hard coded in the genome within the sex chromosomes in the form of a master sex-determining gene(s), whereas in TSD this developmental decision is triggered by an environmental factor (i.e., temperature) [4]. Thus, because sex chromosomes contain the only consistent genomic differences between the sexes, and reptilian sex chromosomes are diverse, their study illuminates the evolution of master sex-determining genes, and more generally, our understanding of the genetic architecture underlying sexual development in vertebrates. For instance, in therian mammals, the Y-linked *Sry* initiates male differentiation [70], whereas sex determination in birds relies on the dosage of the Z-linked gene *Dmrt1* in ZZ-males versus ZW-females [71]. However, no master sex-determining gene(s) has been discovered in reptiles. Some candidates exist in reptiles, including in chelonians [72–77], and the list is growing thanks to the advent of third generation sequencing and advanced molecular cytogenetics (e.g.,) [28,61,74–78]. Namely, the XY of *S. triporcatus* turtles contains *Dmrt1* [28,78], a gene whose molecular evolution is linked to transitions in sex determination in reptiles [79] and which displays sexually dimorphic expression in TSD turtles ([80] and references therein). Notably, while *Sry* is exclusive to therian mammals, *Dmrt1* or its orthologs or paralogs are master triggers of sex determination in various vertebrates (reviewed in [80]). However, whether *Dmrt1* is the master sex-determining gene in *Staurotypus* is untested. Nonetheless, *Dmrt1* retention in both the X and Y in *S. triporcatus* and in the Z and W of *G. hokouensis* lizards [28] indicates that *Dmrt1* s mode of action likely differs between birds and these reptiles. *Dmrt1* was demonstrated to be important in testicular differentiation of the GSD softshell *P. sinensis* [81], but its autosomal nature [61] invalidates its potential role as a sex-linked master sex-determining gene. Another example of a putative master sex-determining gene in GSD turtles is *Sf1*, a male-development gene that translocated to the ZW of *Apalone* softshells lineage from its ancestral autosomal location [61]. *Sf1* was proposed as a potential candidate for this role based on its monomorphic expression by temperature in *Apalone* softshell turtles, in contrast to its differential expression at male- versus female-producing temperature in the TSD turtle *C. picta* [72,73,76]. A third candidate comes from the XY of *G. insculpta* and *S. crassicollis* turtles, which contain the male development gene *Wt1* [28,36], a transcription factor whose relic thermosensitive expression in early embryos of *Apalone* is countered by the immediate downstream action of *Sf1* [82,83]. Noticeably, the relic thermosensitive expression in *Apalone* of another important sexual development gene, *Dax1*, is also countered by the immediate downstream action of *Sf1* [83].

The identification of top regulators of sexual development remains even more elusive in TSD turtles, because no genomic differences exist between the sexes, and finding candidates requires the analysis of the response of gene regulatory pathways and networks to incubation temperature [82]. Candidate gene approaches, along with transcriptome and methylome analyses, plus initial functional

assays have provided a number of candidates in TSD turtles for an upstream function in sexual development (e.g., [74–77], ruling out elements whose sexually dimorphic expression occurs too late in development to act as top master regulators (e.g., [73,76,82–89]). To complicate matters further, this body of evidence has uncovered significant divergence in the transcriptional patterns of genes in this regulatory network [73,80]. As this growing body of work cannot be summarized properly in the limited space available here, we direct the reader to some excellent reviews on this area (e.g., [68,90,91]), and highlight here only some salient results. Comparative data from *C. picta* (TSD) and *Apalone mutica* or *A. spinifera* (GSD) have provided important insights. In particular, early dimorphic transcription in *C. picta* of genes responsible for the formation of the bipotential gonad and later testicular development in vertebrates (*Sf1* and *Wt1*), rendered these elements as potential activators of *C. picta*'s thermosensitive period (when temperature exerts its strongest effect on population sex ratios) [72,73,82]. Their early expression in *C. picta* coincides with the time when the expression of genes involved in chromatin organization and chromatin modification is enriched [92,93] but also with the dimorphic expression of female-development regulators, such as *Ctnnb1* (β*-catenin*) and its downstream target *Fst* [93]. On the contrary, the later differential expression of *Dax1* [83], *Dmrt1* [80], *Sox9* and *Aromatase* [73] ruled them out for a role as upstream TSD thermal sensors/activators in *C. picta*. An interesting candidate TSD gene is *CIRBP*, which exhibits allelic-specific expression at male- and female-producing temperature in *Chelydra serpentina* turtles (TSD) [75], but which shows thermo-insensitive expression in *C. picta* and *A. spinifera* [93]. Epigenetic regulation of TSD also appears important. For instance, DNA methylation regulates *Aromatase* in *Trachemys scripta* turtles, a key ovarian development gene that is also affected by histone modifications [94,95]. Methylome analysis indicated that some of the thermosensitive responses of the regulatory network of sexual development in TSD turtles is mediated by DNA methylation of additional components other than *Aromatase* [92,96]. And other epigenetic modifications also influence sexual development. For instance, the histone demethylase KDM6B induces the transcription of *Dmrt1* in *T. scripta* [89], a gene important in testicular formation during the thermosensitive period in this turtle [87]. Furthermore, transcriptomic analysis of epigenetic machinery genes suggests that TSD, at least in *C. picta*, is potentially mediated by hormonally controlled epigenetic processes, or by epi-genetically controlled hormonal pathways (via acetylation, methylation, and ncRNAs) [77]. Importantly, the response of the epigenetic machinery genes to temperature indicate that differences in key epigenetic events before the onset of the thermosensitive period may define the divide between TSD and GSD, as represented by *C. picta* and *A. spinifera* [77]. It should be noted that while these growing efforts are helping to resolve the position of these factors in the sexual development cascade, further research is still needed.

#### **7. Consequences of Sex Chromosome Evolution—Dosage Compensation and Faster Molecular Evolution**

The degeneration of the heterogametic sex chromosomes Y and W over evolutionary time [97,98] described earlier can have ill fitness consequences unless these effects are balanced by the evolution of counter mechanisms. For instance, in human, chicken, and multiple animal species with sex chromosomes, only a small fraction of genes remain active in Y or W compared to X or Z [99,100]. This loss of function or physical loss of genes in the Y or W generates a gene dosage imbalance between autosomes and sex chromosomes, and between males and females (i.e., the homogametic sex will have twice as much dosage of X- or Z-genes compared to the heterogametic sex) [2]. However, the balance of dosage is very important for genes to maintain their proper function as they are part of genetic networks. An adaptive solution to this problem evolved in the form of dosage compensation mechanisms that modify the transcription of genes with differential dosage so that their expression is balanced fully or partially, either along the entire sex chromosome (global dosage compensation) or only for some important genes (local dosage compensation) [2,14,101,102].

Earlier animal data suggested that global and complete dosage compensation is more predominant in XY taxa (such as therian mammals), while local and partial dosage compensation is more common in ZW taxa (such as birds), but numerous exceptions discovered over time called this conclusion into question [101–103]. Very little is known about dosage compensation in reptiles, with only three studies published on squamates and none on turtles thus far, despite the fact that the diversity of independently evolved sex chromosomes in closely related reptiles renders them an ideal group to address this issue. Existing reptilian data indicate that *A. carolinensis* lizards utilize a mixed pattern of dosage compensation [104]. Namely, some regions of the X chromosome in *A. carolinensis* show complete dosage compensation and some regions lack complete compensation, indicating that the evolution of complete dosage compensation might be an ongoing process [104]. On the other hand, incomplete dosage compensation is seen in two ZW squamates, i.e., snakes and the Komodo dragon [56,105].

Sex chromosomes and autosomes also differ in ways that influence the rate of divergence of sex-linked genes compared to their autosomal counterparts. Specifically, the Y or W are hemizygous, experience reduced recombination and are inherited through one sex only, whereas the X or Z are inherited through both sexes but spend twice as much time in one sex [1,2]. These differences lead to striking differences in their molecular evolution. Indeed, faster-Z and faster-X divergence is expected because reduced recombination of sex chromosomes facilitates the accumulation of beneficial mutations and the removal of deleterious mutations at higher rates than in autosomes, and because their smaller population size renders them more susceptible to genetic drift [106–108]. While faster rates of divergence of coding sequences are documented in X and Z of many species, including birds (e.g., *Gallus gallus*, *Taeniopygia guttata*), mammals (*Mus castaneus* and *Homo sapiens*) and fruit flies (*Drosophila melanogaster*) [107–112], research in reptiles is in its infancy. Namely, initial studies in turtles examining a very small set of sex-linked genes detected faster-Z in *P. sinensis* but not in *A. spinifera* (both softshell turtles), and slower-X in *S. triporcatus* turtles compared to orthologs in sister taxa where these genes are autosomal [76]. Additionally, a comparison of turtles, crocodilians, squamates, birds and mammals, detected three genes (*Dmrt1*, *Ctnnb1, Ar*) with faster amino acid substitution rates when they are Z-linked (*Dmrt1* and *Ctnnb1*, Z-linked in birds and snakes) but not when they are X-linked (*Ar*, X-linked in mammals), compared to when they are autosomal [22]. Similar to turtles, faster-Z is observed in chicken and other birds [107] whose sex chromosomes are homologous to *Staurotypus* X, and faster-X is supported in *A. carolinensis* [104] whose sex chromosomes are homologous to *Apalone*'s Z. On the contrary, the report of faster-Z in snakes [56] needs revisiting, since it is based on data from *Boa* whose purported ZW chromosomes are now known to be autosomes [57]. The major roadblock to study molecular evolution of sex chromosomes in turtles and other reptiles should be alleviated with the publication of additional chelonian genomes with mapped sex chromosomes.

#### **8. Conclusions**

Reptiles have diverse systems of sex determination as well as sex chromosomes which are unmatched in mammals and birds. Although sex chromosomes and sex determination in many species of reptiles have been studied, the genomic basis of sexual development has yet to be fully characterized in reptiles. Turtles represent a clade where multiple sex chromosomes evolved independently (XY and ZW) from ancestral TSD systems (or were perhaps lost during GSD to TSD reversals), giving rise to sex chromosomes of with varying size, age and homology. Unlike lizards, there is no evidence reported for the influence of environmental factors overriding sex chromosomes in GSD turtles. We argue that the characterization of sex chromosomes and sexual development in GSD turtles might be the key to identifying major players of sexual development including the master sex determining gene(s) in reptiles. Convergent evolution of sex chromosome in *G. insculpta* and *S. crassicollis* where XY systems evolved independently from the same ancestral autosome would offer a great opportunity to identify genes that are co-opted as sex determining genes. Taken together, existing data indicate that turtles support some tenets of classic theoretical models of sex chromosome evolution, while other tenets are countered. For instance, the evolution of some turtle sex chromosomes involves chromosomal rearrangements such as the translocation of sexual development genes, or inversions that may contribute to their divergence, yet that divergence does not always result in a morphologically degenerate Y or W. Finally, advanced technologies like whole genome sequencing, transcriptomics, methylomics, and other epigenetic approaches should improve our understanding of chromosomal structure and content, global and local gene expression and epigenetic signatures, all of which will be vital to decipher the enigmatic evolutionary trajectories of sex chromosomes and to develop a model of sexual development in turtles, reptiles, and vertebrates.

**Author Contributions:** Conceptualization, investigation, writing, review and editing, B.B. and N.V.; visualization, B.B.; supervision and funding acquisition, N.V. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work and the APC were funded in part by National Science Foundation of USA grant IOS-1555999 to N.V.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Complex Structure of** *Lasiopodomys mandarinus vinogradovi* **Sex Chromosomes, Sex Determination, and Intraspecific Autosomal Polymorphism**

**Svetlana A. Romanenko 1,\*, Antonina V. Smorkatcheva 2, Yulia M. Kovalskaya 3, Dmitry Yu. Prokopov 1, Natalya A. Lemskaya 1, Olga L. Gladkikh 1, Ivan A. Polikarpov 2, Natalia A. Serdyukova 1, Vladimir A. Trifonov 1,4, Anna S. Molodtseva 1, Patricia C. M. O'Brien 5, Feodor N. Golenishchev 6, Malcolm A. Ferguson-Smith <sup>5</sup> and Alexander S. Graphodatsky <sup>1</sup>**


Received: 25 February 2020; Accepted: 27 March 2020; Published: 30 March 2020

**Abstract:** The mandarin vole, *Lasiopodomys mandarinus*, is one of the most intriguing species among mammals with non-XX/XY sex chromosome system. It combines polymorphism in diploid chromosome numbers, variation in the morphology of autosomes, heteromorphism of X chromosomes, and several sex chromosome systems the origin of which remains unexplained. Here we elucidate the sex determination system in *Lasiopodomys mandarinus vinogradovi* using extensive karyotyping, crossbreeding experiments, molecular cytogenetic methods, and single chromosome DNA sequencing. Among 205 karyotyped voles, one male and three female combinations of sex chromosomes were revealed. The chromosome segregation pattern and karyomorph-related reproductive performances suggested an aberrant sex determination with almost half of the females carrying neo-X/neo-Y combination. The comparative chromosome painting strongly supported this proposition and revealed the mandarin vole sex chromosome systems originated due to at least two *de novo* autosomal translocations onto the ancestral X chromosome. The polymorphism in autosome 2 was not related to sex chromosome variability and was proved to result from pericentric inversions. Sequencing of microdissection derived of sex chromosomes allowed the determination of the coordinates for syntenic regions but did not reveal any Y-specific sequences. Several possible sex determination mechanisms as well as interpopulation karyological differences are discussed.

**Keywords:** aberrant sex determination; chromosome painting; comparative cytogenetics; genome architecture; mandarin vole; microdissection; high-throughput sequencing; rearrangements; rodents; sex chromosomes

#### **1. Introduction**

Most therian mammals have a conventional XX/XY sex chromosome system with the Y-borne testis-determining *SRY* gene. Nevertheless, several dozen species with nonstandard systems of chromosomal sex determination have been described among mammals [1]. There are species with isomorphic sex chromosomes in males and females (three species of *Ellobius* genus), with the absence of the regular Y chromosome (e.g., *Dicrostonyx torquatus*) or the *SRY* gene (e.g., *Ellobius lutescens, Tokudaia*), with the Y chromosome in females (e.g., *Myopus schisticolor*), with heteromorphism of the X chromosomes or multiple sex chromosomes (see more examples in [2]). Most species of mammals with aberrant sex chromosome systems belong to the subfamily Arvicolinae (Myomorpha, Rodentia). One such example is the mandarin vole, *Lasiopodomys mandarinus*.

The first karyotype descriptions of *L. mandarinus* made in the 1970s and further works showed the variability of chromosomal numbers among and within populations of this species. In the mandarin voles from Mongolia and Buryatia (*Lasiopodomys mandarinus vinogradovi*) the diploid chromosome number (2n) is 47–48 [3], whereas Chinese populations display 2n = 49–52 (*L. m. mandarinus*, Henan province [4–6]), 2n = 48–50 (*Lasiopodomys mandarinus mandarinus*, Shandong province [7]), or 2n = 47–50 (*Lasiopodomys mandarinus faeceus*, Jiangsu province [8]). Comparative cytogenetic studies made with G-banding and routine staining indicated intrapopulation variability in morphology of some chromosome pairs in karyotypes of *L. mandarinus,* specifically, two pairs of autosomes (No. 1 and No. 2) and sex chromosomes. Each of the studied populations is characterized by large heteromorphic X chromosomes that differ both in shape and size. Wang et al. [7] suggested that the unusual X chromosome variability in *L. mandarinus* originated through translocation of autosomes onto sex chromosomes. The autosomal polymorphism is subspecies-specific, not associated with sex and sex chromosomes, and caused by presumed inversions based on G-banding analysis [7,9]).

Sex chromosome systems of *L. m. vinogradovi* have been investigated first with G-banding and routine staining [3] and recently by cross-species chromosome painting [10]. Using the last method, Gladkikh et al. [10] demonstrated the origin of neo-X chromosomes by at least two independent autosome-sex chromosome translocation events. The complex of sex chromosomes in the only female (2n = 47) studied by these authors consisted of one metacentric chromosome (neo-X1), one submetacentric chromosome (neo-X2), and one small acrocentric (neo-X3). But at least two other sex chromosome systems exist in *L. m. vinogradovi*. In some females (2n = 47) the system is represented by the neo-X2 and two small acrocentrics. A male combination (2n = 48) is represented by the neo-X1 plus three small acrocentrics, one of which is considered to be the Y chromosome [3]. Both of these karyomorphs were described based on the examination of a relatively small sample with traditional methods unable to determine homology among small acrocentrics [3].

All studied males from the Chinese population had an unpaired acrocentric chromosome that could be a Y chromosome [7,8,11]. Analysis of the synaptonemal complex of *L. mandarinus* from China showed that there was indeed a chromosome that could pair with the X chromosome [12,13]. It was also shown that the sex chromosomes of the male *L. m. vinogradovi* pair and recombine at pachytene [14]. Studies on *L. m. mandarinus* demonstrated that sex determination in the subspecies is independent of *SRY* or *R-spondin 1* [13]. Chen et al. [15] also excluded the *Sall 4* gene as a potential testis-determining factor in this subspecies. Up to now, all attempts to find a chromosome carrying any Y chromosome-specific genes or regions (*SRY*, *Rbm*-gene family, PAR) in *L. mandarinus* using molecular approaches failed [5,15,16]. Thus, the question about the presence of a Y chromosome in karyotypes of male mandarin voles is actually controversial.

Despite the unusual heteromorphism of X chromosomes and failure to detect any testis-determining gene, the mandarin vole was, by default, considered as a species with a standard, XY males/non-Y females, sex determination system. Within the framework of this hypothesis, the absence of several predicted sex chromosome combinations (specifically, neo-X1/neo-X1 females expected in the progeny of males and females carrying neo-X1 chromosome, and neo-X2/Y males, expected in the progeny of males and females carrying neo-X2 chromosome) needs explanation. The failure to reveal these

combinations may be either the consequence of small sample size or low viability of their carriers. In the latter case, the reduced fertility is expected for the neo-X2 females because three-quarters of their offspring from crossing with neo-X1/Y males (Y/0, neo-X1/0, and neo-X2/Y) should be nonviable. Also, under conventional sex determination with normal meiotic chromosome segregation, female carriers of a single neo-X2 should deliver only daughters. These predictions can be tested by the crossbreeding experiments.

To elucidate the sex determination system in *L. mandarinus vinogradovi,* we carried out a comprehensive study which combined several different conventional and molecular cytogenetic methods, single chromosome DNA sequencing, and breeding experiments revealing the chromosome segregation pattern as well as the reproductive performance of different karyomorphs. Comparative molecular cytogenetic research methods have been applied to achieve a more detailed description of the karyotype of this species and a deeper study of the autosomal polymorphism.

#### **2. Materials and Methods**

#### *2.1. Ethics Statement*

All applicable international, national, and/or institutional guidelines for the care and use of animals were followed. All experiments were approved by the Ethics Committee on Animal and Human Research of the Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences (IMCB, SB RAS), Russia (order No. 32 of 5 May 2017). This article does not contain any studies with human participants performed by any of the authors.

#### *2.2. Specimens Sampled*

In total, the karyotypes of 205 voles (163 females and 42 males) were examined with conventional cytogenetic methods. Of them, 27 individuals were captured in Selenginskii and Dzhidinskii districts of Buryatia in 2002–2017. The rest were captive-born descendants of these voles. Twelve animals (7 voles from the same laboratory colony and 5 voles captured in Selenginskii districts of Buryatia in 2017) were chosen for molecular cytogenetic study and chromosome sequencing.

#### *2.3. Chromosome Preparation and Chromosome Staining*

For karyotyping, chromosome suspensions were obtained from bone marrow and/or spleen by a standard method with preliminary colchicination of animals [17]. For some individuals, short-term culture of bone marrow was used. For molecular cytogenetic study, metaphase chromosome spreads were prepared from primary fibroblast cultures as described previously [18,19]. The fibroblast cell lines were derived from biopsies of skin, lung, and tail tissues in the case of laboratory animals and from finger biopsy in the case of wild animals as described previously [10]. All cell lines were deposited in the IMCB, SB RAS, cell bank ("The general collection of cell cultures", No. 0310-2016-0002). Cell cultures and chromosome suspensions were obtained in the Laboratory of animal cytogenetics, the IMCB, SB RAS, Russia.

G-banding was performed on chromosomes of all animals prior to fluorescence in situ hybridization, using the standard trypsin/Giemsa treatment procedure [20]. C-banding has followed the classical method [21] or the method with some modifications [10,21].

#### *2.4. Crossbreeding Experiments*

We sexed 327 offspring delivered by 38 females and surviving to at least 24 days of age. The dams were karyotyped, and offspring sex ratios in the pooled progeny obtained from dams of each karyomorph were compared. For each female karyomorph, the observed ratio of male to female offspring in the pooled progeny was compared with an even sex ratio with Chi-square goodness-of-fit test or, in case of a small sample, with Fisher's exact test. Female offspring (*n* = 64) born to 19 of the same dams were karyotyped, and the proportions of daughters holding different karyomorphs in the pooled progeny were calculated and compared between karyomorphs using2X3 Fisher's exact test.

#### *2.5. Female Reproductive Success in Relation to Karyomorphs*

Thirty-two virgin females older than 70 days were paired with unrelated unfamiliar males. All pairs were maintained under standard conditions (see [22] for details). The females were weighed weekly until the detection of pregnancy, after which the nests were checked every two days until delivery, and then again once a week. Thus, the litter sizes were determined no later than the second day after birth. The number of surviving offspring was determined at weaning (on Day 24 after birth). The pairs were monitored for three months. The dams were karyotyped immediately or within a few months after the end of this experiment. We estimated the effects of the dam's karyomorph on the following characteristics of reproductive success over a three-month period: Proportion of females who gave birth (Fisher's exact test), number of litters, total number of the delivered offspring, and total number of the weaned offspring; the last three parameters were determined and compared for those females who gave birth (Student's *t*-test).

All tests were two-tailed and the α level of significance was 0.05.

#### *2.6. Microdissection and Probe Amplification*

G-banding by trypsin using Giemsa (GTG-banding) was performed before microdissection to accurately identify chromosomes. Glass needle-based microdissection was performed as described earlier [23]. One copy of each sex chromosome was collected. Chromosome-specific libraries were obtained using whole genome amplification (WGA) kits (Sigma). After amplification, DNA was purified using nucleic acid purification kits for DNA (BioSilica). DNA libraries were labeled using WGA. Chromosome-specific probes were obtained for chromosomes neo-X1 (probe L2), neo-X2 (probe L3) from LMAN19f, neo-X2 (probe L33) and neo-Y (probes L11 and L13 (distal part)) from LMAN14f, neo-X1 (probe L8) and neo-Y (probe L5) from LMAN15m, and neo-X2 (probe L31) from LMAN5f. Here, LMAN is an individual of *L. m. vinogradovi* (f, female; m, male).

#### *2.7. Fluorescence in Situ Hybridization (FISH)*

The sets of flow-sorted chromosomes of field vole (*Microtus agrestis,* MAG) and Arctic lemming (*Dicrostonyx torquatus,* DTO) painting probes were described previously [10,24–27]. The telomeric DNA probe was generated by PCR using the oligonucleotides (TTAGGG)5 and (CCCTAA)5 [28]. Clones of human ribosomal DNA (rDNA) containing partial 18S, full 5.8S, and a part of the 28S ribosomal genes and two internal transcribed spacers were obtained as described in Maden et al. [29]. FISH was performed following previously published protocols [30,31]. Images were captured using VideoTest-FISH software (Imicrotec) with a JenOptic charge-coupled device (CCD) camera mounted on an Olympus BX53 microscope. Hybridization signals were assigned to specific chromosome regions defined by G-banding patterns previously photographed and captured by the CCD camera. All images were processed using Corel Paint Shop Pro X3 (Jasc Software).

#### *2.8. Sequencing*

Libraries for sequencing were prepared according to the TruSeq Nano Library Preparation Kit (Illumina). Size selection was performed using the Pippin Prep. Quantification of the libraries before sequencing was performed using real-time PCR with SYBR GREEN. Then, 300-base pair paired-end reads were generated on Illumina MiSeq using the Illumina MiSeq Reagent Kit v3, according to the manufacturer's instructions. Raw reads were deposited in the Sequence Read Archive of the National Center for Biotechnology Information under accession PRJNA613194.

#### *2.9. Bioinformatic Analysis*

The reads obtained by sequencing were used in the DOPseq\_analyzer pipeline (https://github. com/ilyakichigin/DOPseq\_analyzer) to search for syntenic regions in the mouse genome assembly GRCm38. The operation of this pipeline was reported by Makunin et al. [32]. It can be briefly described as follows. First, the cutadapt 1.18 tool [33] removes the sequences of Illumina adapters and primers used for amplification. The purified reads are aligned on the mouse genome GRCm38 (to search for target regions) and the human genome GRCh38 (to remove contamination reads) using Burrows-Wheeler Aligner 0.7.17 [34], low-quality alignments (alignment length <20, mapping quality <20) were discarded. Then, by calculation, the density of alignment and the identification of target regions occurs using DNAcopy package [35]. The resulting coordinates are then checked manually in the UCSC (University of California, Santa Cruz) genome browser (https://genome.ucsc.edu).

The obtained coordinates for syntenic blocks are slightly different between libraries since they contain different amounts and diversity of the target DNA. To establish more accurate averaged boundaries of the evolutionary breakpoints, the reads obtained for all libraries were combined and reused in DOPseq\_analyzer.

#### **3. Results**

#### *3.1. Sex Chromosome Combinations Revealed by Extensive Karyotyping*

The karyotypes of the studied individuals included, in addition to 22 pairs of autosomes common to males and females, four combinations of large heteromorphic sex chromosomes and small acrocentric chromosomes unidentifiable with conventional cytogenetic methods.

All studied males (42 individuals) had 2n = 48 and an identical system of sex chromosomes: Neo-X1 and three small acrocentrics (karyomorph I, thereafter KI). Males with neo-X2 were not found. Among 163 females, three sex chromosome combinations were revealed. Karyomorph II (KII, 47% of the studied females) corresponded to the sex chromosome system described by Gladkikh et al. [10] and had 2n = 47 with neo-X1, neo-X2, and neo-X3. Karyomorph III (KIII, also 47% of the studied females) had 2n = 47 with neo-X2 and two small acrocentrics. Karyomorph IV (KIV, 6% of females) had 2n = 48 with neo-X1, neo-X1, and two small acrocentrics.

#### *3.2. Hybridization Experiment*

Males were presented in the pooled progeny of all female karyomorphs. Sex ratio was female-biased in offspring of KII and KIII females. To the contrary, only sons were born to the few breeding KIV females (Table 1). The differences in the offspring sex ratio between KIV and the other two karyomorphs were significant (Fisher's exact test: KII vs. KIV, *p* = 0.001; KIII vs. KIV, *p* = 0.002). Sex ratio in the pooled sample, including the offspring of all females, was significantly female-biased (40% of males, χ<sup>2</sup> = 12.96, df (degrees of freedom) = 1, *p* < 0.001).



Karyotyping of 19 dams and their 64 daughters showed that KII and KIII females produced mainly KIII and KII daughters, respectively. As one can expect, KIV females were not found among the daughters of KIII dams. This variant was very rare in the progeny of KII females. There was a significant difference between the two most common karyomorphs in the proportions of KII:KIII:KIV daughters in progeny (KII dams: 9:25:3; KIII dams: 21:6:0; *p* < 0.001).

#### *3.3. Female Reproductive Success Related to Their Karyomorphs*

Of the 32 females participating in the experiment, 13 (41%) belonged to KII, 15 (47%) to KIII, and four (13%) to KIV. Female carriers of the two most common karyomorphs did not differ in any measure of reproductive success (Table S1). At the same time, none of the rare KIV females produced offspring during a three-month period. This karyomorph significantly differed from the other two in the proportion of carriers that gave birth (Fisher's exact test for KII vs. KIV: *p* = 0.002; KIII vs. KIV: *p* = 0.009) (Table S1).

#### *3.4. Comparative Molecular Cytogenetic Investigation of Di*ff*erent L. m. vinogradovi Karyomorphs*

Comparative chromosome painting with two sets of painting probes was used for the analysis of karyotypes of 12 animals (Table 2). Since the set of *M. agrestis* probes showed almost complete identity of the autosomal sets in various individuals of *L. m. vinogradovi*, only partial localization of the *D. torquatus* probes was carried out on the chromosomes of most individuals. Application of comparative chromosome painting allowed us to establish that the acrocentric chromosomes participating in formation of complex sex chromosome systems in *L. m. vinogradovi* are homologous to MAG13/X/13 (designated here as neo-Y) and MAG17/19 (designated here as neo-X3, according to [10]) (Figures 1 and 2).

**Table 2.** The list of investigated individuals of *Lasiopodomys mandarinus vinogradovi* with abbreviated names, diploid numbers (2n), origin, systems of sex chromosomes, and types of autosome LMAN2; f, female; m, male; 2a, the acrocentric with the order of syntenic blocks MAG1/5; 2b, the acrocentric with the order of syntenic blocks MAG1/5/1/5/1/5; 2c, the submetacentric with the order of syntenic blocks MAG5/1/5/1. See comments in the text.


Sex chromosomes of KI (males) were represented by the largest metacentric (neo-X1) and a small-sized acrocentric (putative neo-Y) (Figure 1a). The autosomes homologous to MAG17/19 (neo-X3) should also be included in the complex of male sex chromosomes as they were present in single copy in KII. The MAGX probe hybridized to the p-arm of the neo-X1 chromosome and to the interstitial part of neo-Y chromosome (Figure 2e). MAG13 labeled the q-arms of neo-X and neo-Y (Figure 1a). The neo-X1 chromosome had three C-positive blocks on the q-arm (Figure 3a,b).

**Figure 1.** GTG-banded karyotype of *L. m. vinogradovi*: (**a**) LMAN10m, (**b**) LMAN0f [10], (**c**) LMAN5f. Black dots mark the position of centromeres. Vertical black bars mark the localization of *Microtus agrestis* (MAG) chromosome painting probes, vertical grey bars mark the localization of *Dicrostonyx torquatus* (DTO) painting probes. Numbers along the vertical lines correspond to chromosome numbers of *M. agrestis* and *D. torquatus*. Black triangles indicate sites of localization of ribosomal DNA clusters. Grey triangles indicate localization of the largest interstitial telomeric block.

**Figure 2.** Examples of fluorescent in situ hybridization: (**a**) MAG5 (green) and MAG1 (red) onto LMAN1f, (**b**) MAG5 (green) and MAG1 (red) onto LMAN10m, (**c**) MAGX (green) and MAG13+14 (red) onto LMAN10m, (**d**) MAG17 (green) and MAGX (red) onto LMAN6f, (**e**) MAG23 (green) and MAG13+14 (red) onto LMAN1f, (**f**) MAG1 (red) and MAG5 (green) onto LMAN15m. Scale bar is 10 μm.

**Figure 3.** Polymorphic sex chromosomes and a pair of autosomes 2 in karyotypes of *Lasiopodomys mandarinus vinogradovi*: (a) Males (karyomorph I), (b) females, karyomorph II, (c) females, karyomorph III. From top to bottom: Sex chromosome system in the karyomorph, C-banding of metaphase chromosomes (chromosomes of LMAN10m (a), LMAN19f (b), and LMAN14f (c) are given as an example for each karyomorph), a pair of autosomes 2 of different individuals with localization of *Microtus agrestis* samples. Black arrows marked possible neo-Y.

Sex chromosomes of KIII (females) were represented by the large neo-X2, one small unpaired acrocentric corresponding to neo-X3, and another small unpaired acrocentric homologous to MAG13/X/13 (neo-Y) (Figure 1c). As in the case of KII the neo-X2 chromosome had a block of grey heterochromatin in the area homologous to MAGX (Figure 3b,c)

KIV (females) have not been found among the animals analyzed by molecular cytogenetic methods but the structure of their karyotype can be unequivocally reconstructed based on the analysis of other karyomorphs. Their sex chromosome complex can be described as a pair of neo-X1 chromosomes and two acrocentrics homologous to MAG17/19 (neo-X3).

MAGY probe labeled heterochromatic, C-positive parts of neo-X1 and neo-X2 chromosomes of male and female *L. m. vinogradovi*.

The field vole and the Arctic lemming painting probes showed the conservatism of autosomal sets in the species and the same localization of the probes as was shown previously for LMAN0f with the only exception the chromosome 2 [10]. The chromosome is designated here as LMAN2 without an additional letter. We described four variants of the distribution of syntenic blocks homologous to MAG1 and MAG5 onto LMAN2 due to inversions: Both homologs are acrocentrics with the order of probes MAG1/5 (LMAN6f, 10m, 16f) or MAG1/5/1/5/1/5 (LMAN0f, 1f), or MAG1/5 and MAG1/5/1/5/1/5 (LMAN2f, 3f, 5f). Both homologs are submetacentric with the order of probes MAG5/1/5/1 (LMAN19f). The LMAN2 is heteromorphic, and this heteromorphism is formed by an acrocentric chromosome homologous to MAG1/5 and a submetacentric chromosome with a derived structure MAG5/1/5/1 (LMAN14f, 15m, 17f) (Figures 1 and 2, Table 2).

Noncentromeric interstitial telomeric sequences (ITS) were localized on the autosome 1 and on the neo-X2 chromosome of all studied specimens. Each individual carried three rDNA clusters located on chromosomes 2, 18, and 22 (Figure 2). The size of rDNA clusters was different on homologs of autosome 2.

In general, no correlation was observed between sex chromosome systems and the morphology of chromosome pair 2, both among wild-caught and captive-born animals (Table 2).

#### *3.5. Sequencing and Bioinformatic Analysis*

Eight sex chromosome libraries were sequenced on the Miseq Illumina platform (Table S2). All sequencing and alignment statistics are presented in Table S3. Based on the low-coverage sequencing data obtained, the coordinates of large syntenic regions and the boundaries of evolutionary rearrangements relative to the mouse (*Mus musculus*, MMU) genome were identified (Table S2).

False-positive regions were detected on the chromosome neo-Y where the painting probe of MAGX chromosome labeled a small interstitial region (Figure 2a,c). The sequencing data did not show homology of the chromosome to the MMUX but indicated the synteny of the neo-Y chromosome with a region of the MMU18 only.

#### **4. Discussion**

The fascinating sex chromosome polymorphism of the mandarin vole raises three closely related questions: (1) How is such an unusual system maintained in a population? (2) How did the system evolve, that is, what chromosome rearrangements led to the polymorphism of sex chromosomes? (3) What is the specific (molecular) mechanism of sex determination in this species?

Up to now, the investigators mainly addressed the latter two problems. Here, on the contrary, we found it useful to focus on the first two questions, and our results provided the premise to approach solving the third, and perhaps, the main question for many evolutionary biologists.

#### *4.1. How Is Such an Unusual System Maintained in Populations of L. m. vinogradovi?*

If one assumes the presence of a single neo-Y chromosome to be both necessary and sufficient to initiate testis development, the described variation of sex chromosome combination is hard to explain. Why some of the expected karyomorphic variants are missing, and where do the females with a single X chromosome (KIII) come from?

Theoretically, the offspring resulting from a cross between the females with the neo-X1 chromosome and males also carrying the neo-X1 chromosome should include neo-X1/neo-X1 females. These females have not been revealed in previous studies, but here we did find this rather infrequent karyomorph (KIV) due to comprehensive sampling. At the same time, no males with neo-X2/neo-Y, which are expected to occur among both KII and KIII females' progeny, were detected among more than 40 karyotyped male voles. From the "standard sex determination" point of view, their absence may be explained by the lethality of male embryos with neo-X2/neo-Y combination. However, this means that the KIII females produce three-quarters of nonviable embryos (neo-Y/0, neo-X1/0, and neo-X2/neo-Y), and KII females produce one-quarter of the nonviable embryos (neo-X2/neo-Y). According to this hypothesis, only KIV females are lucky to have no costs associated with nonviable offspring and, therefore, are predicted to have the highest reproductive success. This prediction does not seem to be supported by the results of our experiments. Further and most important, under the standard sex determination and normal sex chromosome segregation, KIII dams should produce only KII daughters. In fact, they also produce KIII daughters and KI sons. Finally, the finding that KII dams delivered a large proportion of KIII daughters also needs explanation. Within the framework of the standard sex determination hypothesis, the observed patterns require both a nondisjunction of Y chromosome in the second division and nonviability of most karyomorphs. This scenario appears to be very unlikely. On the other hand, the chromosome segregation pattern inferred from our results appears to be in good agreement with the "XY female hypothesis" [36–38]. This hypothesis requires neither chromosome nondisjunction nor the lethality of large proportions of offspring; the only one nonviable combination (neo-Y/neo-Y) is implied. The unequal proportions of different karyomorphs in progeny of each type of females may be explained by two phenomena. The first one is apparent lower viability of neo-X1/neo-X1 females due to, for example, a violation in gene dosage compensation, or some other unknown causes. Unfortunately, the mechanisms of gene dosage compensation in this species are unexplored. The second plausible phenomenon is that the relative success of male neo-X1 and neo-Y spermatozoa depends on the karyomorph of a fertilized female. From our chromosome segregation data, it looks like male gametes carrying neo-Y are favored in neo-X1/neo-X2 females, resulting in higher than expected proportions of both sons (neo-X1/neo-Y) and neo-X2/neo-Y daughters (each about 40% vs. the expected 25%). To the contrast, an excess of sons (40% vs. 33% expected) and neo-X1/neo-X2 females (40% vs. 33% expected) in a progeny of neo-X2/neo-Y dams and predominance of sons in a progeny of neo-X1/neo-X1 (100% in our small sample) dams suggest the spermatozoa with neo-X1 chromosome to be favored in these crossbreeding combinations. The cytogenetic mechanisms underlying these phenomena are unclear but it is noteworthy that they all reduce a sex ratio distortion. Thus, in terms of ultimate causes, these mechanisms might be selected as they increased the investment to sons which have higher reproductive value than daughters in a population with a female-biased primary sex ratio [39].

Our proposition that neo-X2/neo-Y mandarin voles are actually present but have a female phenotype received strong support from the results of fluorescence in situ hybridization and the comparative molecular sex chromosome investigation.

#### *4.2. Complex Systems of Sex Chromosomes in L. m. vinogradovi and Their Origin*

The comparative chromosome painting convincingly showed that at least two autosomal translocations on sex chromosomes took place in the evolution of the mandarin vole karyotypes forming the neo-X and neo-Y chromosomes in *L. m. vinogradovi*.

Comparing our sex chromosome sequencing data to previous comparative chromosome painting data [26] confirms that MAG13=MMU18, MAG17=MMU13/15, and MAG19=MMU15. However, our neo-Y sequencing data did not show homology of this chromosome to the mouse X chromosome, and this contrasts to the clear detection of X chromosome signal on the neo-Y chromosome by FISH. Therefore, it is likely that the FISH-signal represents shared repetitive sequences that are not included in the bioinformatic analyses. The fact that an unpaired chromosome with a small interstitial block of heterochromatin was detected by C-banding in karyotypes of both males and females further confirmed the presence of a block of repeated sequences on the chromosome (Figure 3a,c).

The regular Y chromosome has not been revealed in comparative chromosome painting experiments based on localization of the *M. agrestis* Y chromosome probe. Previously Zhao et al. [40] also failed to find a regular Y by FISH experiments with localization of partial human and whole mouse Y. Detection of signals from the MAGY probe in the heterochromatic, C-positive, parts of neo-X1 and neo-X2 chromosome of *L. m. vinogradovi* males and females may be caused by repeated sequences. As the Y chromosome of *M. agrestis* carries a huge block of heterochromatin, it can be assumed that both these arvicoline species have similar repeated sequences on their sex chromosomes. This assumption does not exclude the presence of sequences responsible for masculinization function in this area. This phenomenon requires a thorough study.

By analyzing the low-coverage chromosome sequencing data, we also failed to identify any Y chromosome-specific genes or regions. This may indicate either the elimination of the regular Y chromosome in this species or the insufficiency of our approach of aligning the reads of chromosomes on the mouse genome to search for the Y chromosome due to its rapid evolution and complex repetitive structure. To date, there are no Y chromosome assemblies for the representatives of Cricetidae family; it is possible that the forthcoming release of such genomic assemblies will allow us to answer the question about the presence of a regular Y chromosome using bioinformatic comparative genomic analysis methods.

It should be noted here that *L. mandarinus* is one of the most unusual species in terms of the synaptic behavior of its sex chromosomes. The nature of the XY pairing observed in this species differs markedly from that revealed in all other arvicolines. It was proposed based on the recombination pattern detected in pachytene that the XY synapsis in *L. m. vinogradovi* is a derivative condition resulting from de novo translocated autosomal material [14]. The FISH results obtained here completely confirm the suggestion. Preservation of ITS at the confluence sites of ancestral autosomes and sex chromosomes also indicates that the translocation has occurred recently.

#### *4.3. Possible Mechanisms of Sex Determination in L. m. vinogradovi*

The pattern of association between phenotypic sex and sex chromosome combinations found in *L. m. vinogradovi* is similar to that in the wood lemming (*Myopus schisticolor*), collared lemmings (genus *Dicrostonyx)*, and the African pygmy mouse (*Mus minutoides*) [41–44]. It suggests an X-linked mutation (in *L. m. vinogradovi*, the mutation on the neo-X2 chromosome) to prevent masculinization of neo-X2/neo-Y individuals, but exact genetic bases of the male-to-female sex reversal are unknown. In the case of the mandarin vole, the problem of sex determination mechanism is additionally complicated by the failure to reveal any Y-specific genes or regions. The following scenarios can be suggested.

(1) The neo-X2 chromosome contains an epistatic locus (B) suppressing the dominant male development trigger (A) located on the neo-Y (Figure 4a).

**Figure 4.** Possible mechanisms of sex determination in *Lasiopodomys mandarinus vinogradovi*: (**a**) a mechanism suggesting the presence of an epistatic locus on neo-X2 (X2) chromosome suppressing the dominant male development trigger on neo-Y (Y) chromosome, (**b**) a mechanism suggesting the presence of a locus on neo-X1 (X1) chromosome complementing the dominant male development trigger on neo-Y, (**c**) a mechanism suggesting the presence of male development trigger on neo-X1 chromosome and nonrandom inactivation of neo-X chromosomes. See comments in the text.


This scenario, however, appears to be the least plausible as it requires the dominant male development trigger to be somehow inactive in double doses to produce neo-X1/neo-X1 females.

Whichever of the proposed scenarios is true, we believe that it should be the same in *L. m. vinogradovi* and *L. m. faeceus*. This assumption is based on the fact that the studied sample of voles from this Chinese population (Henan province) was represented by the same combinations of large X chromosomes and small acrocentrics, and in approximately the same proportions. In this population, a male karyomorph, KI, and female karyomorphs, KII and KIII, were common while KIV females were found as a rare variant [5]. In our opinion, information on the karyotypes of the mandarin voles from another Chinese population, Shandong province, deserves special attention. According to Wang et al. [7], females with a single X chromosome (KIII), common in other populations, were not found here, whereas an additional male karyomorph corresponding KIII has been reported. Thus, we assume that the aberrant sex determination system, in which some of the carriers of the Y chromosome display a female phenotype, has either not yet appeared or has already disappeared in this population. In our opinion, a detailed study of the sex chromosomes of voles from Shandong could shed light on the evolution of molecular mechanisms of sexual determination in this species.

#### *4.4. Chromosomal Di*ff*erences of L. mandarinus from Di*ff*erent Populations*

Comparison of *L. mandarinus* showed clear differences in the karyotypes of individuals from different populations. So, the mandarin voles from Mongolia and Buryatia (*L. m. vinogradovi*), having diploid chromosome number 2n = 47–48, carried three pairs of metacentric chromosomes corresponding to pairs 1, 4, and 18 described in this work ([3], present data). The sizes of two types of chromosomes bearing regions homologous to MAGX, were approximately the same. All animals analyzed in the current work had stable and identical pairs of chromosome 1 (LMAN1), presented by two metacentric chromosomes. Both homologs carried interstitial telomeric sequences (ITS) in q-arms, separating synteny MAG8/21 and MAG11/13 (Figure 2). As previously suggested, that the association MAG11/13 is ancestral for the subgenus *Lasiopodomys*, the presence of ITS showed that the first pair of metacentric chromosomes was formed by the evolutionary recent fusion of the two ancestral pairs of chromosomes [10].

LMAN2 is homologous to MAG1 and MAG5. This fusion is characteristic of *L. m. vinogradovi* and it has never been found in any other arvicolines. The pair was polymorphic in the animal analyzed here due to multiple para- and pericentric inversions. It is important to notice here that both homologs of LMAN2 carry big clusters of ribosomal genes in the distal part of q-arms. It is possible that the fusion MAG1/5 is also characteristic of mandarin voles from other populations, but molecular cytogenetic methods must be used to verify this assumption.

In the karyotypes of all the studied individuals from China, there are only two stable pairs of bi-armed autosomes (corresponding to LMAN1 and 4 described in this work). Polymorphism of LMAN1 revealed in the Chinese population is not characteristic for individuals from the Buryatia. Zhang and Zhu [46] postulated that a Robertsonian fission is the main reason for the polymorphism of chromosome 1.

*L. m. faeceus* inhabiting the Jiangsu province in China has 2n = 47–50. The pair of largest autosomes is formed by two submetacentrics [8]. In the absence of molecular cytogenetic studies, it is difficult to state unequivocally, but it seems that sex chromosome systems are similar to those described in this work. Nevertheless, the relative sizes and ratio of lengths of arms of the submetaand metacentric chromosomes attributed to the sex chromosomes are different, which may indicate a different accumulation of repeated sequences, or the presence of intrachromosomal rearrangements. It cannot be ruled out that other pairs of autosomes participated in the translocation of X chromosomes to autosomes. However, there is a polymorphism in one pair of autosomes in *L. m. faeceus*, apparently smaller than a pair of chromosomes 2 in *L. m. vinogradovi*. So, the autosomal polymorphism, previously described only for pairs of chromosomes 1 and 2, possibly affects other pairs of autosomes in karyotypes of the mandarin voles from different populations.

It is shown that *L. m. mandarinus* from Henan province (China) has a diploid number 2n = 49–52 [4–6], while the same subspecies in the Shandong province (China) has 2n = 48–50 [7]. Based on G-banding comparison we propose that the pair of chromosomes 1 in [7] is homologous to our LMAN1q. We should also note that the number of chromosomes bearing nucleolus organizer regions identified by Wang et al. [7] and in this work is different (4 vs. 3). Moreover, clusters of ribosomal genes are located on a pair of chromosomes 1 [7] (which corresponds to localization on LMAN1q). The morphology of sex chromosomes in *L. m. mandarinus* is similar to that described for *L. m. faeceus*. Surprisingly, among *L. m. mandarinus*, some males with a sex chromosome system morphologically similar to the karyomorphs III (females) described in this work were found [7], whereas among *L. m. faeceus*, females were discovered whose sex chromosome system was represented by two large submetacentric chromosomes [5].

It is known that the number of rDNA clusters and their localization can vary on various chromosomes even between closely related species [47]. This instability can be caused by a clustered structure of ribosomal genes that facilitate translocations by illegitimate recombination between nonhomologous chromosomes. Among mammals, multiple cases of interspecific variation in localization of rDNA clusters were described including presence on sex chromosomes [48]. Two of the three pairs of chromosomes carrying clusters of ribosomal genes in *L. m. vinogradovi* had stable morphology and localization of probes. At the moment, it remains unclear whether the polymorphism of the pair of autosomes 2 in *L. m. vinogradovi* and pair of autosomes 1 in *L. m. mandarinus* is associated with the location of the cluster of rDNA on them. The reasons for the significant polymorphism of these particular pairs of autosomes remain unclear.

In addition to the differences described above, individuals from different populations exhibit a different amount and distribution of heterochromatin ([5,7]; this work).

Thus, the karyotypes of the mandarin vole from all currently studied geographical populations are significantly different. In order to give a taxonomic assessment of these differences, it is necessary to study the karyotypes of *L. mandarinus* from different populations by molecular cytogenetic methods as well as applying molecular genetic data for the establishment of phylogenetic relationships between populations.

#### **5. Conclusions**

Euchromatic parts of mammalian sex chromosomes are highly conserved. Only rare cases of their involvement in rearrangements have been described in myomorph rodents, bats, carnivores, primates, and cetartiodactyls. Mandarin voles undoubtedly represent a unique species even among myomorphs and arvicolines. Their karyotypic features, such as the presence of different polymorphic pairs of autosomes and nonstandard sex chromosome systems, indicate significant plasticity of their genome, as well as ongoing processes of karyotypic evolution within the species. Such a diversity of sex chromosome systems as found in mandarin voles (within the same species) seems unique and has not been described yet in any other mammalian species. Such factors as modifications of the epigenetic state of DNA and accumulation of a large number of repeats may be required to trigger evolutionary plasticity [49].

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/4/374/s1, Table S1: Comparison between KII and KIII females by the reproductive success over a three-month period for the two most common female karyomorphs. Table S2: Chromosome-specific microdissected probes of *L. m. vinogradovi* and syntenic regions in mouse genome assembly GRCm38. Table S3: Sequencing and alignment statistics.

**Author Contributions:** Conceptualization, A.S.G. and A.V.S.; methodology, A.S.M., A.V.S., Y.M.K., S.A.R., O.L.G., N.A.L., and P.C.M.O.; software, D.Y.P.; validation, A.V.S., Y.M.K., S.A.R., and V.A.T.; formal analysis, S.A.R., A.V.S., Y.M.K., N.A.L., O.L.G., D.Y.P., and V.A.T.; investigation, A.V.S., Y.M.K., S.A.R., N.A.L., O.L.G., P.C.M.O., V.A.T., and N.A.S.; resources, A.V.S., Y.M.K., I.A.P., and F.N.G.; data curation, A.V.S. and S.A.R.; writing—original draft preparation, S.A.R. and A.V.S.; writing—review and editing, V.A.T., D.Y.P., M.A.F.-S., and P.C.M.O.; visualization, S.A.R. and A.V.S.; supervision, A.S.G.; project administration, A.S.G. and A.V.S.; funding acquisition, A.S.G. and A.V.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Russian Science Foundation No. 19-14-00034 (A.S.G.) and Russian Foundation for Basic Research No. 19-04-00538 (A.V.S.) and 19-04-00557 (F.N.G.).

**Acknowledgments:** The authors gratefully acknowledge the resources provided by the "Molecular and Cellular Biology" core facility of the IMCB, SB RAS.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

#### *Article*
