**Chromosomics: Bridging the Gap between Genomes and Chromosomes**

**Janine E. Deakin 1,\*, Sally Potter 2,3,†, Rachel O'Neill 4,†, Aurora Ruiz-Herrera 5,6,†, Marcelo B. Cioffi 7, Mark D.B. Eldridge 3, Kichi Fukui 8, Jennifer A. Marshall Graves 1,9, Darren Griffin 10, Frank Grutzner 11, Lukáš Kratochvíl 12, Ikuo Miura 13, Michail Rovatsos 11, Kornsorn Srikulnath 14, Erik Wapstra 15 and Tariq Ezaz 1,\***


Received: 18 July 2019; Accepted: 13 August 2019; Published: 20 August 2019

**Abstract:** The recent advances in DNA sequencing technology are enabling a rapid increase in the number of genomes being sequenced. However, many fundamental questions in genome biology remain unanswered, because sequence data alone is unable to provide insight into how the genome is organised into chromosomes, the position and interaction of those chromosomes in the cell, and how chromosomes and their interactions with each other change in response to environmental stimuli or over time. The intimate relationship between DNA sequence and chromosome structure and function highlights the need to integrate genomic and cytogenetic data to more comprehensively understand the role genome architecture plays in genome plasticity. We propose adoption of the term 'chromosomics' as an approach encompassing genome sequencing, cytogenetics and cell biology, and present examples of where chromosomics has already led to novel discoveries, such as the sex-determining gene in eutherian mammals. More importantly, we look to the future and the questions that could be answered as we enter into the chromosomics revolution, such as the role of chromosome rearrangements in speciation and the role more rapidly evolving regions of the genome, like centromeres, play in genome plasticity. However, for chromosomics to reach its full potential, we need to address several challenges, particularly the training of a new generation of cytogeneticists, and the commitment to a closer union among the research areas of genomics, cytogenetics, cell biology and bioinformatics. Overcoming these challenges will lead to ground-breaking discoveries in understanding genome evolution and function.

**Keywords:** cytogenetics; sex chromosomes; chromosome rearrangements; genome plasticity; centromere; genome biology; evolution

#### **1. Introduction**

Advances in technology have made sequencing the entire genome of an organism essentially routine. However, DNA sequence is only one relatively static component of the highly dynamic entities within the nucleus of a cell—chromosomes. Where a particular sequence is located on a chromosome and how it interacts with other parts of the genome are important aspects of genome biology often overlooked in genome sequencing projects. We propose a new framework for studying genome biology that integrates approaches in genome sequencing, cytogenetics and cell biology, as well as a renewed focus on training the next generation of genome biologists in the skills required for the integration of these data. We propose the adoption of the term 'chromosomics', which combines the original definition of cytogenetics (chromosomes and cytology) with genomics (gene content, structure and function for an entire organism), to ensure a closer integration of these fields. The term chromosomics was originally proposed by Uwe Claussen to introduce the branch of cytogenetics that deals with the three-dimensional structure of chromosomes and their associated gene regulation [1]. However, we propose that this term encompass the integration of the latest advances in cytogenetics, genome sequencing, epigenomics and cell biology. The adoption of a chromosomics approach to answering the big fundamental questions in biology will undoubtedly lead to major discoveries that were previously beyond reach.

In recent times, the field of genomics has largely distanced itself from cytogenetics, the field providing insight into chromosome structure, function and evolution. This separation of the fields has been to the detriment of a full understanding of how the genome works in the cell. These two fields were never intended to work in isolation. In 1920, Hans Winkler coined the term 'genome' to combine the study of genes and chromosomes [2], yet in modern interpretations of 'genome', chromosomes are often forgotten and the focus is solely on the DNA sequence. Similarly, Walter Sutton in 1902 (no published record) used the term 'cytogenetics' to combine cytology (the study of cell structure and function) with genetics (the study of genes, genetic variations and heredity). However, the cytological aspects of cytogenetics are largely ignored by most modern cytogenetic studies. As these respective fields have narrowed their focus, the result has been the development of technological and methodological advancements (examples in Table 1) that could allow us to more fully capture the dynamic nature and evolution of chromosomes from potentially any species to provide insight into fundamental biological questions.


**Table 1.** Recent Technological Advances in Cytogenetics and Genomics.



Chromosomes play a vital role in the nucleus, as they are essential for DNA to replicate and segregate during cell division. They are not randomly positioned in the nucleus, but organised into specific areas called chromosomal territories [20] that change during the cell cycle [21,22] and development [23–25]. Maintenance of these territories is important for proper cell functioning, replication, and the accurate division and differentiation of cells. If we repack the DNA into a chromosome, we see that the DNA is wrapped around a nucleosome consisting of eight histone proteins to produce a chromatin fibre, which is attached to a backbone of non-histone proteins called the chromosome scaffold (Figure 1). The dynamic nature of the chromosome throughout the cell cycle and in response to environmental influences is enabled by the ability of the chromatin fibre to vary the level of DNA compaction and histone composition (epigenetics), and the ability of the scaffold proteins to follow the changes of the chromatin fibre [26]. Chromatin remodelling and changes in chromatin conformation affect interactions between sequences in different genomic regions and can influence gene regulation. The close connection between DNA, chromosome structure and the position of chromosomes in the nucleus highlights the need to integrate genomic data to better understand chromosome architecture and function. This information will provide a more comprehensive understanding of the evolutionary plasticity and organisational functions of genome architecture and mechanisms of faithful transmission of the genome to offspring.

**Figure 1.** Repacking the DNA into a chromosome. The double-stranded DNA helix is wrapped around a nucleosome consisting of eight histone proteins to produce a chromatin fibre, which is attached to a backbone of non-histone proteins called scaffold proteins which form the chromosome scaffold.

In the past, incremental advances in understanding genome biology have been made through combining information from cytology, cytogenetics and genomics, often from data gathered by different groups focused on one particular question and over many years (e.g., the discovery of the Philadelphia chromosome causing chronic myelogenous leukemia or the discovery of the sex-determining gene, *SRY*; Figure 2). Now is the time to reunite cytogenetic and sequencing approaches. Not only has the resolution of chromosomes under various forms of microscopy greatly improved (e.g., deconvolution system [27], structured illumination microscopy [26] and super-resolution microscopy [28]), but new sequencing technologies are now promising to make genome assemblies close to chromosome level a reality. In addition, new sequence-based techniques for chromosome conformation capture promise to fill in the details in our cytological picture of how active and inactive chromatin is assembled and arranged into functional units in the interphase nucleus. Collectively, these advances afford the capability to answer key fundamental questions in genome biology.

**Figure 2.** The incremental advances made through combined cytogenetic and genomic information in the discovery of the Philadelphia chromosome causing chronic myelogenous leukemia [29–33] and the discovery of the sex-determining gene *SRY* [34–38].

#### **2. Development of Genome Sequencing from Cytogenetics**

The proposal to sequence the human genome led to the start of the genomics era. When we consider the Human Genome Project, it was approached from an understanding that the position of the sequence on the chromosome was important [39]. Indeed, the forerunner of the Human Genome Project was a series of meetings of an international Human Gene Mapping consortium, which met annually to put together increasingly detailed physical maps of all the human chromosomes, combining data from linkage analysis, somatic cell genetics and radiation hybrid analysis, and in situ hybridisation. This consortium was organised into separate committees for each human chromosome, as well as committees for the mitochondrial genome. Moreover, the consortium included a comparative gene mapping committee, which started out largely focused on mouse but grew to encompass many other mammals, birds and fishes. Slowly, the physical maps developed by these working groups expanded and were filled in with other markers. Sequencing crept in to offer the ultimate detail of individual genes (or at least exomes). The advent of large insert clones like BACs (bacterial artificial chromosomes) greatly aided the extension of DNA sequence to encompass larger genomic intervals beyond individual genes [39].

Physical BAC or yeast artificial chromosome (YAC) maps of each chromosome were constructed and sequenced, resulting in a chromosome-based genome assembly and enabling the integration of gene and genetic mapping data accumulated over many years and by many different researchers [40]. The subsequent ENCODE (Encyclopedia of DNA elements) project saw the integration of sequence data with information on chromatin states, which provided an exceptional insight into dynamic gene regulation [41]. However, just as genome sequences for other species were needed to help interpret the human genome [42], comparative data from a broad range of species are required to fully understand the role many chromatin modifications play in genome function. The interpretation of the ENCODE data for the human genome was only made possible by the chromosome-based genome assembly, affording an appreciation for regional transcriptional control, dynamic chromatin states and long-range interlocus interactions. At present, a challenge for genomes from non-traditional model species is the difficulty in overlaying chromatin remodelling data when genome assemblies are not yet at the chromosome level. The platypus (*Ornithorhynchus anatinus*) genome is an excellent example of the difficulty in accurately overlaying and interpreting DNA methylation data on a fragmented genome assembly. The platypus genome was sequenced to approximately six-fold coverage by a whole genome shotgun approach using Sanger sequencing, and only around 21% of this genome assembly was anchored to platypus chromosomes [43]. Although a valuable resource, the low percentage of the genome anchored to chromosomes greatly reduced the number of genes that could be examined in a recent comparative study of reduced representation bisulphite sequencing data [44].

#### **3. The Integral Role of Cytogenetics in Genome Projects**

With increasingly cost-effective high throughput sequencing, most recently assembled genomes feature short contigs and often lack even a basic physical map or chromosome number and morphology information. While chromosome-level assemblies might not be feasible for all genomes targeted for sequencing, they should be well represented across all lineages to allow comparative genome biology studies that, by their very nature, rely on knowing the position of orthologous sequences among genomes. All genome sequencing projects should incorporate some level of cytogenetic analysis from the very start. For example, a logical first step in any whole genome sequence project would be to ensure that the individual being sequenced is not carrying chromosomal aberrations, particularly if the genome assembly is to be used as a reference for population-level sequencing. Basic karyotyping would establish the ploidy level of the species, confirm the absence of aneuploidy and confirm the genetic sex of the individual when cytogenetically distinguishable sex chromosomes are present, a particularly important consideration in species subject to environmental sex reversal (e.g., *Pogona vitticeps* [45]). Karyotyping will also determine if there are large heterochromatic chromosomes or regions. The flow sorting of chromosomes can be used for gross assessment of aneuploidy. Flow cytometry with appropriate standards is a reliable and fast method for estimating genome size [46], an important consideration in determining the amount of sequencing required to achieve the desired level of genome assembly.
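The link between genome size and sequencing effort can be illustrated with a little arithmetic. The sketch below is only a rough planning aid, not a method from the studies cited here: the function names are hypothetical, the 2 Gb genome in the usage note is invented, and the conversion of 1 pg of DNA to roughly 978 Mb is a commonly used approximation that we assume is adequate for this purpose.

```python
# Rough planning arithmetic: how many bases must be sequenced to reach a
# target average depth, given a genome size estimate (e.g., from flow cytometry).
# All names here are hypothetical and chosen for illustration only.

BP_PER_PICOGRAM = 978_000_000  # assumed conversion: 1 pg DNA is about 978 Mb


def picograms_to_bp(c_value_pg: float) -> int:
    """Convert a haploid DNA content in picograms to an approximate size in bp."""
    if c_value_pg <= 0:
        raise ValueError("DNA content must be positive")
    return round(c_value_pg * BP_PER_PICOGRAM)


def required_bases(genome_size_bp: int, target_coverage: float) -> int:
    """Total bases to sequence so that average depth equals target_coverage."""
    if genome_size_bp <= 0 or target_coverage <= 0:
        raise ValueError("genome size and coverage must be positive")
    return round(genome_size_bp * target_coverage)
```

For instance, `required_bases(2_000_000_000, 6)` gives the raw yield needed for six-fold average coverage of a hypothetical 2 Gb genome. Real projects would also budget for read filtering, uneven coverage and the requirements of the chosen sequencing platform.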

A whole genome sequence is much more informative if it is assigned and oriented onto chromosomes, and is far more intuitive to visualise as chromosomes than unconnected and unordered scaffolds. When whole genome sequences fall short of this 'chromosome level assembly', their use for critical aspects of evolutionary and applied biology is significantly limited. Assigning sequence contigs to chromosomes has most often been achieved by integrating sequence data with molecular cytogenetic mapping data. This can be achieved by determining the location of a large-insert clone by fluorescence in situ hybridisation (FISH) on metaphase chromosomes or even extended chromatin fibres (fibre FISH), facilitating physical fine mapping of contigs. In Sanger sequenced genomes, this was accomplished by assigning BAC clones corresponding to individual, large sequence scaffolds. For example, the opossum genome, with a scaffold N50 (a measure of assembly contiguity: scaffolds of this length or longer contain 50% of the assembled sequence) of 59.8 Mb and 97% of the sequence contained in 216 scaffolds, was anchored onto the eight opossum autosomes and the X chromosome by mapping 415 BAC clones [47,48]. However, the proportion of the genome assembly assigned to chromosomes is dependent on the quality of the genome (i.e., N50 size and number of scaffolds). The platypus genome is a prime example, where the high repeat content resulted in a scaffold N50 of 957 kb, and thus only about 21% of the genome was chromosome-anchored [43]. An excellent example of where a cytogenetics approach vastly improved the accuracy of genome assembly is the tomato genome. The tomato genome, sequenced by a combination of Sanger and next generation sequencing technologies [49], benefitted greatly from the physical assignment of sequence scaffolds of BACs by FISH and confirmation by optical mapping [50]. The original tomato assembly was ordered based on a high-density linkage map. Differences in arrangement between the linkage and cytogenetic/optical maps were detected for one-third of these scaffolds, mainly in pericentric regions where a reduced level of recombination renders linkage mapping less reliable [50]. The benefits gained from assigning even a portion of the sequence to chromosomes are immense, as highlighted by the chromosomics successes listed in Table S1.
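The two assembly statistics used in this section, scaffold N50 and the proportion of sequence anchored to chromosomes, can be computed from a list of scaffold lengths. The following is a minimal sketch using the standard length-weighted definition of N50; the function names and the toy inputs in the usage note are hypothetical and are not taken from any of the assemblies discussed.

```python
# Minimal sketch of two assembly summary statistics.
# Function names and inputs are illustrative assumptions.


def n50(lengths):
    """Return the scaffold N50: the length L such that scaffolds of
    length >= L together contain at least half of the assembled bases."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def anchored_fraction(scaffold_lengths, anchored_names):
    """Fraction of assembled bases in scaffolds assigned to a chromosome.

    scaffold_lengths: dict mapping scaffold name -> length in bp
    anchored_names:   set of scaffold names placed on chromosomes
                      (e.g., by FISH mapping of BAC clones)
    """
    total = sum(scaffold_lengths.values())
    anchored = sum(
        length for name, length in scaffold_lengths.items()
        if name in anchored_names
    )
    return anchored / total
```

With toy lengths `[80, 70, 50, 40, 30, 20]`, the two longest scaffolds already hold more than half of the 290 bases, so `n50` returns 70. The anchored fraction is length-weighted, which is why a few large anchored scaffolds can account for most of an assembly while many small unplaced scaffolds account for little.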

Many genomes sequenced over the past decade have used a 'shotgun' approach based on short read sequence technologies to produce a series of scaffolds, often several hundred per chromosome, which are neither anchored to, nor ordered on, the chromosomes. Anchoring every scaffold to a chromosome would be a labour-intensive task, particularly if the assembly has a large number of scaffolds. However, by combining computational approaches to merge scaffolds with cytogenetic mapping and/or PCR-based scaffold verification, chromosome-level assemblies become more achievable [9,51]. In addition, the development of universal BAC clone probe sets that can be used in a high-throughput, cross-species, multiple hybridisation approach is speeding up the process of developing cytogenetic maps [9]. The advances in sequencing technology that are producing more contiguous genome assemblies, such as the contact sequencing approach of HiC-seq (e.g., Dovetail) [17], linked-read sequencing approach (10X Genomics) [12,16], long read technologies like PacBio [13] and Oxford Nanopore [15], and optical mapping (BioNano) [12], combined with high-throughput cytogenetic methods, will place chromosome-level assemblies within reach for many species.

#### **4. The Big Questions in Genome Biology Requiring a Chromosomics Approach**

Despite the meticulous approach taken for the human genome, gaps remained in the genome sequence when the 'finished' euchromatic sequence of the human genome was published in 2004 [52]. These 'black holes' of the genome corresponded to the most repetitive regions such as, but not limited to, the critically important centromeres [53], nucleolar organiser regions (NORs) [54] and the Y chromosome [36]. Repetitive regions are some of the most rapidly evolving sequences and therefore, are among the most interesting regions of the genome. By employing a chromosomics approach, these hot spots of evolution are beginning to lose their black hole status in the human genome, as well as in other species. We discuss the fundamental questions arising from these evolutionary dynamic regions in relation to two overarching themes associated with genome evolution: genome plasticity and sex chromosome evolution.

#### *4.1. Genome Plasticity and Chromosome Evolution*

Why do species have specific karyotypes? Why do chromosome numbers vary greatly within some groups, but are largely the same in others? Why are some of the regions of the genome so well conserved? Why are some genomes extensively rearranged among closely related species while others are strongly conserved? Why do some chromosome rearrangements appear to lead to speciation, yet others are tolerated within a species or population? Is the underlying mechanism responsible for chromosomal speciation the same as that leading to chromosomal rearrangements in a disease context (i.e., cancer)? These are fundamental questions regarding genome plasticity that remain unanswered, and a chromosomics approach is essential for major breakthroughs. The answers to these questions will have wide-ranging impacts in the field of biology. Unlocking the genomic basis of speciation is a biological research priority, fuelled by the ongoing debate on species concepts and facilitated by the availability of an unprecedentedly large number of genomic resources.

The concept of chromosomal speciation, at one stage considered a major contributor in separating populations that differ by a structural rearrangement, was virtually abandoned in favour of theories of a gradual accumulation of mutations in 'speciation genes' (e.g., Reference [55]). The implementation of the most recent 'suppressed recombination model' [56,57] has now fuelled the field using a combination of sequence and cytogenetics [58–60]. In this context, chromosome rearrangements could have a minimal influence on fitness, but would suppress recombination, leading to the reduction of gene flow across genomic regions and to the accumulation of incompatibilities.

Understanding chromosomal speciation is also critical to determining the mechanism(s) underlying genome adaptation to environmental factors and how biodiversity is generated and transmitted to subsequent generations. With so many threatened species across the globe, understanding why some structural variants are tolerated within a population while others lead to reproductive isolation could prove important for the management of breeding programs for species conservation. In a disease context, a greater knowledge of the drivers of genome instability will aid research into human and animal diseases, particularly cancers.

Regions of genome instability can have dramatic effects for an organism. Despite being the subject of many studies using a range of species, the underlying molecular mechanisms resulting in genome restructuring/reshuffling are relatively poorly understood. For example, it remains unclear if chromosomal changes associated with speciation arise because there is an adaptive value to a specific chromosomal configuration, and what causes the genomic instability in the first place. The combined use of comparative genomics and cytogenetics of both closely and distantly related mammalian species has been extremely useful in defining models that explain genome structure and evolution [61–67]. Such reconstructions have revealed that the genomic regions implicated in structural evolutionary changes disrupt genomic synteny (evolutionary breakpoint regions, EBRs) and are clustered in regions more prone to breaking and reorganisation [61–63]. In searching for the origin (and consequences) of this evolutionary instability, approaches based purely on genome sequence have only revealed that EBRs are enriched for repetitive sequences, including segmental duplications and transposable elements, which provide the templates for non-allelic homologous recombination, resulting in inversions and additional structural changes [68,69]. Likewise, repetitive sequences in centromeric regions have been implicated in illegitimate recombination events forming Robertsonian fusions [62,70,71]. EBRs also typically occur in gene-dense regions, enriched with genes involved in adaptive processes, where changes to gene expression caused by a chromosomal rearrangement may provide a selective advantage [60,63,72]. Consequently, given the diversity of factors associated with EBRs, it is unlikely that the sequence composition of genomes is solely responsible for genomic instability during evolution and speciation.

As chromosomes are more than just DNA sequence, a more comprehensive approach that incorporates global genomic information on recombination rates, chromatin accessibility, gene function data and nuclear architecture is providing more insight into the factors underpinning genome instability. Of course, chromosome-level assemblies are an essential resource for accurate interpretation of the combination of all these data because, without such assemblies, we have a very limited (if any) understanding of the extent of the genomic restructuring that may have occurred between species, or between normal and disease states, that facilitated changes in global genomic features.

Furthermore, it has become clear that the interplay between the organisation of the genome and nuclear architecture is central to genome function [73]. We have seen a rapid evolution of methods by which to analyse genome organisation and nuclear architecture, moving from cytogenetic approaches, providing a direct measurement within individual cells of distances between loci, to chromosome conformation capture approaches (3C, 4C, 5C and Hi-C), which infer the contact among loci, typically in populations of cells rather than single cells [74,75]. However, comparison of the results obtained from chromosome conformation capture methods and FISH analyses demonstrates that care needs to be taken when interpreting data obtained solely by one method, suggesting that the use of a combined molecular and cytogenetic approach will lead to more accurate 3D models of genome organisation [76,77].
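To make concrete what it means for conformation capture to infer contacts from a population of cells, the sketch below bins pairs of ligated positions into a symmetric contact matrix for a single chromosome. It is a deliberately simplified illustration under our own assumptions: the function name and input format are hypothetical, positions are taken to be 0-based coordinates on one chromosome, and the read mapping, filtering and normalisation steps that real Hi-C analyses depend on are omitted.

```python
# Simplified illustration: binning ligation pairs into a contact matrix.
# Names and input format are assumptions, not an established pipeline.


def contact_matrix(pairs, chrom_length, bin_size):
    """Count contacts between fixed-size bins along one chromosome.

    pairs:        iterable of (pos_a, pos_b) 0-based coordinates, each pair
                  representing one ligation product pooled across many cells
    chrom_length: chromosome length in bp
    bin_size:     bin width in bp
    """
    if chrom_length <= 0 or bin_size <= 0:
        raise ValueError("chromosome length and bin size must be positive")
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix
```

Each cell of the matrix is a count summed over all cells in the sample, so a high value may reflect a frequent contact in a subset of nuclei rather than a fixed distance in every nucleus. This is one reason why direct measurements in individual cells by FISH remain a useful cross-check.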

A new model for genome rearrangements, referred to as the integrative breakage model, has recently been proposed, bringing together all of these features [64]. It posits that genome reshuffling permissiveness is influenced by (i) the physical interaction of genomic regions inside the nucleus, (ii) the accessibility of chromatin states and (iii) the maintenance of essential genes and/or their association with long-range cis-regulatory elements (Figure 3). An initial test of the integrative breakage model using rodents has supported this model [65]. EBRs were found to not only coincide with regions enriched for repetitive sequences and genes, especially genes involved in reproduction and pheromone detection, but possessed the characteristics of open, actively transcribed chromatin. The challenge remains to use a broader spectrum of species to fully test this model and dissect the underlying mechanism for chromosomal rearrangements. Such studies will now be possible with the ability to achieve chromosome-level assemblies and obtain information on chromatin modifications and nuclear architecture for non-traditional model species. A similar approach could be extended to intraspecific comparisons, such as normal versus disease samples or samples across a population where structural variants are known.

**Figure 3.** The integrative breakage model, a multilayer framework for the study of genome evolution that takes into account the high-level structural organisation of genomes and the functional constraints that accompany genome reshuffling [64]. Genomes are compartmentalised into different levels of organisation that include: (i) chromosomal territories, (ii) 'open' (termed 'A')/'closed' (termed 'B') compartments inside chromosomal territories, (iii) topologically associated domains (TADs) and (iv) looping interactions. TADs, which are delimited by insulating factors such as CTCF and cohesins, harbour looping topologies that permit long-range interactions between target genes and their distal enhancers, thus providing 'regulatory neighbourhoods' within homologous syntenic blocks (HSBs). In this context, the integrative breakage model proposes that genomic regions involved in evolutionary reshuffling (evolutionary breakpoint regions, EBRs) which will likely be fixed within populations are (i) those that contain open chromatin DNA configurations and epigenetic features that could promote DNA accessibility and therefore genomic instability, and (ii) that do not disturb essential genes and/or gene expression.

#### *4.2. Sex Chromosome Evolution: Genetics and Epigenetics*

Sex chromosomes represent one of the most dynamic parts of any genome, as they are highly variable in morphology and sequence content across the plant and animal kingdoms. The special evolutionary forces experienced by sex chromosomes have rendered them highly complex entities within the genome; thus, it remains a challenge for evolutionary biologists to disentangle the varied mechanisms involved in their evolution. There are still many fundamental questions that remain unanswered because of the genomic black hole status of sex chromosomes. Why do sex chromosomes evolve and degenerate in some species but not in others? Why do sex chromosomes have a propensity to accumulate repetitive sequences that, in most cases, are species-specific? How do sex chromosomes drive speciation and hybrid incompatibilities? Why do sex chromosomes vary within a species? Why do some species have complete dosage compensation mechanisms while others do not? The complexity of sex chromosome origin, evolution and gene organisation is multilayered and cannot be understood by studying a single aspect of its biology alone. Therefore, a chromosomics approach, taking into account all aspects of cellular and molecular biology, will be essential to answering these questions.

Many genome projects were undertaken without considering the sex chromosomes, with most projects intentionally choosing to sequence the homogametic sex in order to obtain higher sequence coverage and better assembly of the X or Z chromosome, completely neglecting to obtain sequence for the Y or W, thus ignoring the complexities of sex-delimited sex chromosome variation. Simple karyotyping with basic banding analysis, or painting one sex chromosome onto the other, can be very informative about the DNA content of the sex chromosomes. Such experiments can provide valuable information to support the adoption of appropriate sequencing technologies to obtain sequences from those unique but difficult to sequence regions of the genome. In a recent review, Tomaszkiewicz et al. [78] highlighted the need to sequence sex chromosomes, and elegantly described challenges and opportunities for combining new and emerging technologies to sequence these difficult regions of the genome. Only a chromosomics approach, combining cytogenetics and appropriate sequencing platform(s), can answer the fundamental questions regarding sex chromosome evolution. As an example, a human Y chromosome of African origin was recently assembled by flow sorting nine million Y chromosomes and sequencing using the Oxford Nanopore MinION platform, resulting in a Y chromosome assembly with an N50 of 1.46 Mb [79]. This method was much more time- and cost-efficient than that used to obtain the original human Y chromosome sequence [36,80].

Determining the epigenetic status of the sex chromosomes and the genes they contain is also extremely valuable. For example, the Chinese half-smooth tongue sole (*Cynoglossus semilaevis*) is a species with genetic sex determination (ZZ males and ZW females), but with a temperature override mechanism, where exposure of developing embryos to high temperatures causes genetic ZW females to develop as males (sex reversal). The sex determining gene *dmrt1* [81] is epigenetically silenced by DNA methylation in ZW females, but not in sex-reversed ZW males, where *dmrt1* expression is upregulated, leading to initiation of the male development pathway [82].

Dosage compensation, a mechanism equalizing the expression of genes on the sex chromosomes between males and females, is epigenetically controlled. A comparison of the gene content of the X chromosomes of eutherians and marsupials would suggest that a dosage compensation mechanism, in the form of X chromosome inactivation, may be shared between these two mammalian groups, as the X chromosome of marsupials is homologous to approximately two-thirds of the X of eutherians [83]. However, epigenetic analyses point to an independent evolutionary origin of X chromosome inactivation in marsupials and eutherians [84]. Similarly, there are striking differences in the extent and mechanisms of dosage compensation between more divergent taxa. For example, *Drosophila melanogaster* increases X chromosome transcription by the binding of Male Specific Lethal (MSL) complex to the single X chromosome in males to achieve dosage compensation [85]. In contrast, many species, including insects, fishes, birds, reptiles and platypus, have incomplete dosage compensation [86]. Reports of incomplete dosage compensation have most often relied purely on a sequence-based approach to measure the average transcriptional output of the X or Z chromosome between males and females for a population of cells, which does not afford an understanding of the mechanisms that impact differential transcription. However, examination of individual cells and measures of gene copies from the two Z or X chromosomes using a technique detecting nascent transcription (RNA-FISH) provides information on a single cell basis. For example, in the homogametic sex of chicken (*Gallus gallus*) and platypus, RNA-FISH detected a portion of cells expressing a gene from one copy of the X/Z, while another portion expressed it from both copies, which explains the incomplete dosage compensation pattern observed by transcriptome approaches measuring populations of cells [87–89].
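A simple calculation shows how a mixture of cells expressing one or both copies can produce the intermediate ratios seen in population-level data. The sketch below is an idealised model under assumptions of our own, not an analysis from the cited studies: each active gene copy is assumed to contribute equally, the heterogametic sex is assumed to have a single active copy with no upregulation, and the function names are hypothetical.

```python
import math

# Idealised model of how single-cell heterogeneity produces an intermediate
# population-average expression ratio. All names are illustrative assumptions.


def mean_active_copies(frac_one_copy):
    """Average number of active gene copies per cell in the homogametic sex.

    frac_one_copy: fraction of cells expressing the gene from only one of the
                   two X/Z copies; the remainder express it from both.
    """
    if not 0.0 <= frac_one_copy <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return frac_one_copy * 1 + (1 - frac_one_copy) * 2


def log2_expression_ratio(homogametic, heterogametic):
    """log2 ratio of homogametic to heterogametic expression.

    0 indicates equal output (full compensation under this model);
    1 indicates a two-fold difference (no compensation).
    """
    return math.log2(homogametic / heterogametic)
```

Under these assumptions, if every cell expresses one copy the ratio is 0, if every cell expresses both it is 1, and a mixed population gives a value in between. A population-averaged measurement therefore cannot distinguish a uniform partial reduction in every cell from a mixture of cell states, which is the information that RNA-FISH adds.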

Sex chromosomes also have an impact beyond simply facilitating sex determination. In *Drosophila*, for example, polymorphisms in repetitive sequences on the Y chromosome influence gene expression of genes across the genome, particularly those involved in chromosome organisation and chromatin assembly [90,91]. Essentially, the polymorphic Y chromosome is a source of epigenetic variation in *Drosophila*. This epigenetic variation has implications for speciation. Engineered species hybrids showed either reduced fertility or rescued fertility depending on the origin of the Y chromosome and grandparental genetic background of the hybrid, suggesting that the Y chromosome may contribute to reproductive isolation [92]. The regulatory effect of the Y chromosome on gene expression is not limited to *Drosophila*, but has been demonstrated, at least for immune-related genes, in humans and mice [93,94]. This regulatory role for the Y highlights the importance of ascertaining not only the DNA sequence, but also the epigenetic status of sex chromosomes, in addition to, and in the context of, the rest of the genome.

#### **5. Challenges Ahead for Chromosomics**

Chromosomics approaches can have far greater success in answering fundamental biological questions than either genomics or cytogenetics approaches alone. This raises the question: why haven't these two fields merged more extensively? We have identified three major challenges that may be preventing a closer union of these fields.

The biggest challenge for chromosomics is the dwindling number of researchers worldwide with expertise in cytogenetics. Rejuvenating the training of cytogeneticists is essential if chromosomics is to reach its full potential as a field. At a "Cytogenetics in the Genomics Era" workshop held in 2017 at the University of Canberra, we identified a need to renew excitement in chromosomes among undergraduate and graduate students worldwide. Genetics and genomics courses are often taught by those who have little experience in or appreciation for chromosomes, perhaps leading to anxiety in students around the study of chromosome biology. The origin of this mismatch can also be found in the backgrounds of leaders in the fields of DNA sequencing and of cytogenetics. Genome sequencing researchers often have backgrounds in biochemistry, and their training may be entirely devoid of exposure to genetics. In contrast, those who gravitate to chromosome work often have a background in zoology or botany, and may have had little exposure to biochemistry. At a time when we are more aware than ever before of the important role genome organisation and nuclear architecture play in genome function, it is imperative that students gain an understanding of, and appreciation for, the basics of chromosome biology from their first introduction to the world of genetics and genomics. This early introduction needs to be followed by reinforcement throughout their studies. Good courses in integrated cell and molecular biology would be a step in the right direction.

Furthermore, graduate students have been attracted to the rapidly advancing world of genome sequencing, where mountains of data are now being rapidly and cheaply generated, as opposed to cytogenetics projects, where data are accumulated more slowly. We need to instil in students the incredible experience of observing the amazing structure of chromosomes under a microscope, and the importance of understanding chromosome structure and of continuing to develop more high-throughput approaches to cytogenetics to keep pace with the rapidly advancing world of genome technology. The genomics field over the past decade has made major technological advances in obtaining genome sequences faster and more cheaply, driven mostly by the large community in this field. Advancements of a similar level in cytogenetics will require a larger community of researchers to drive the need for the technology. By increasing the training of researchers in cytogenetics, we not only increase the uptake of chromosomics, but also generate a potential pool of people able to develop technologies that make cytogenetic analyses faster and cheaper.

A challenge for cytogeneticists is that chromosome work, which used to be the cheapest aspect of a genome project, is now the most expensive. New sequencing technology has brought down the cost of sequencing by six orders of magnitude, and the speed at which data are generated has dramatically increased. Technical innovation in cell biology, in contrast, has greatly magnified costs. High throughput really does not apply to chromosome observation or experimentation.

The second most challenging skills area for chromosomics is the need for sophisticated bioinformatics. Whole genomes can now be rapidly sequenced, but assembling the sequence is much slower, and more labour- and computationally-intensive. Likewise, overlaying genome sequence with 3D chromatin structure data often presents a computational challenge [95]. More importantly, the incorporation of cytogenetic information with genomic data is not commonly attempted. For chromosomics approaches to be more readily applied in the future, we require more bioinformaticians to be trained generally in genome assembly, as well as with an appreciation for cytogenetics.

Another challenge is ensuring samples, whether from wild species, laboratory species or clinical specimens, are collected appropriately for cytogenetic analysis. The collection of samples for DNA analysis is now routine, but the collection of material for the combination of cytogenetic and genomic analysis is not. The special requirements of samples collected for cytogenetic analysis need to be disseminated more widely. This is a relatively easy challenge to address by making a 'field guide to chromosomics' available to field researchers and a similar one to those working in a clinical setting, detailing how samples should be collected and stored for the implementation of chromosomic techniques. With the appropriate samples, a chromosomics approach could be employed to study structural and epigenetic variation at population or biogeography levels, where there is the potential to uncover the underlying genetic or epigenetic basis for adaptation to a particular environment. These data could have an impact on the conservation and management of threatened species and lead to a greater understanding of the factors underlying disease phenotypes.

#### **6. Future Opportunities**

#### *6.1. Sequencing Genome Black Holes*

Resolving highly repetitive regions of genomes, i.e., the black holes, is now possible, and we will soon have the capability to explore these previously under-represented regions of the genome to more fully understand their evolution and function. An important step towards understanding the evolution and function of repetitive regions has been the development of tools able to analyse and visualise these regions of the genome, such as RepeatExplorer [96]. Furthermore, sequencing of highly repetitive regions is now possible with long-read sequencing technology. For example, the centromeric sequence of the human Y chromosome has recently been assembled [97], demonstrating that nanopore long-read technology has the potential to fill in genome black holes. Likewise, combinations of different approaches where individual sex chromosomes are sequenced are proving successful in resolving at least the non-repetitive regions of sex chromosomes and identifying candidate sex-determining genes [98].

#### *6.2. Spatial Chromosome Organisation*

The importance of the territorial organisation of chromosomes (chromosome territories) in plant and animal cells was proposed over a century ago by several cytologists (reviewed in Reference [3]). We are now at a stage where it is possible to study changes in spatial organisation in a population of cells, or even in a single cell. The recent advances in chromatin analysis, coupled with next generation sequencing (e.g., Hi-C, Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET)) and 3D and 4D FISH, live-cell and super-resolution microscopy, provide opportunities to garner a more comprehensive understanding of chromosomal activities within the nucleus contributing to gene regulation, expression and the ultimate phenotypic outcomes of an individual [22–25,99]. Such a chromosomics approach is already underway for the human genome with the launch of the 4D Nucleome project to understand how the changes in chromosome dynamics contribute to gene regulation in different cell types and biological states [99]. We will gain unprecedented insight into the role of the spatial organisation of chromosomes in genome evolution by extending this same approach to many more species.
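To make concrete how sequencing-based chromatin data of the Hi-C type describe spatial organisation, the sketch below bins pairs of genomic positions into a symmetric contact matrix for a single chromosome. It is a toy illustration only: the function name, the input format (0-based position pairs on one chromosome) and the example coordinates are assumptions made here, and real pipelines add read mapping, filtering and normalisation steps that are omitted.

```python
from collections import Counter


def contact_matrix(pairs, chrom_length, bin_size):
    """Bin intra-chromosomal contact pairs into a symmetric count matrix.

    pairs: iterable of (pos_a, pos_b), 0-based positions on one chromosome.
    chrom_length and bin_size are in base pairs.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter()
    for pos_a, pos_b in pairs:
        # Order the two bins so (i, j) and (j, i) are counted together.
        i, j = sorted((pos_a // bin_size, pos_b // bin_size))
        counts[(i, j)] += 1
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for (i, j), n in counts.items():
        matrix[i][j] = n
        matrix[j][i] = n
    return matrix


# Hypothetical contacts on a 3 kb toy chromosome, binned at 1 kb.
toy_pairs = [(100, 900), (150, 2500), (2600, 120), (1200, 1300), (2900, 2950)]
toy_matrix = contact_matrix(toy_pairs, chrom_length=3000, bin_size=1000)
for row in toy_matrix:
    print(row)
```

Off-diagonal counts, such as those between the first and last bins in this toy example, are the kind of signal that indicates two distant segments of a chromosome lying close together in the nucleus.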

#### **7. Concluding Remarks**

We are on the verge of an exciting new revolution in biology, with a change from thinking of genomes as one-dimensional entities to defining the ways every component of the genome is packaged and changes through space and time. Chromosomics is the best path forward, providing one comprehensive analysis to answer complex questions in evolution and disease contexts. However, this can only be achieved if genomicists, cytogeneticists, cell biologists and bioinformaticians commit to forming a closer union for advancing this new era in genome biology.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/10/8/627/s1. Table S1: Examples of research questions answered with a chromosomics approach.

**Author Contributions:** The concept and idea of this review was developed during the workshop "Cytogenetics in the Genomics Era" organised by T.E. and J.E.D. at the Institute for Applied Ecology, University of Canberra, February 2017. J.E.D. and T.E. prepared the first draft after S.P., R.O., A.R.-H., M.B.C., M.D.B.E., K.F., J.A.M.G., D.G., F.G., L.K., I.M., M.R., K.S. and E.W. contributed to drafting the manuscript outline. J.E.D., S.P., R.O., A.R.-H., M.B.C., M.D.B.E., K.F., J.A.M.G., D.G., F.G., L.K., I.M., M.R., K.S., E.W. and T.E. contributed to writing and revising drafts of the manuscript. J.E.D., S.P. and A.R.-H. prepared figures. All authors approved the final version.

**Funding:** The workshop was funded by Institute for Applied Ecology, University of Canberra strategic funds awarded to T.E. and J.E.D.

**Acknowledgments:** We thank Arthur Georges, Craig Moritz and Stephen Sarre, who contributed to discussions as part of the "Cytogenetics in the Genomics Era" workshop.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Decoding the Role of Satellite DNA in Genome Architecture and Plasticity—An Evolutionary and Clinical Affair**

**Sandra Louzada 1,2, Mariana Lopes 1,2, Daniela Ferreira 1,2, Filomena Adega 1,2, Ana Escudeiro 1,2, Margarida Gama-Carvalho <sup>2</sup> and Raquel Chaves 1,2,\***


Received: 16 December 2019; Accepted: 8 January 2020; Published: 9 January 2020

**Abstract:** Repetitive DNA is a major organizational component of eukaryotic genomes, being intrinsically related with their architecture and evolution. Tandemly repeated satellite DNAs (satDNAs) can be found clustered in specific heterochromatin-rich chromosomal regions, building vital structures like functional centromeres and also dispersed within euchromatin. Interestingly, despite their association to critical chromosomal structures, satDNAs are widely variable among species due to their high turnover rates. This dynamic behavior has been associated with genome plasticity and chromosome rearrangements, leading to the reshaping of genomes. Here we present the current knowledge regarding satDNAs in the light of new genomic technologies, and the challenges in the study of these sequences. Furthermore, we discuss how these sequences, together with other repeats, influence genome architecture, impacting its evolution and association with disease.

**Keywords:** satellite DNA; genome architecture; chromosome restructuring; Robertsonian translocations; satellite DNA transcription

#### **1. Introduction**

The linear organization of DNA sequences in the genome and how these sequences are packed into chromosomes define its architecture and influence its evolution. Repetitive DNA represents a major organizational component of eukaryotic genomes and includes sequences dispersed throughout the genome, like transposable elements (TEs), and tandemly repeated sequences, such as satellite DNA (satDNA) [1,2]. Together with TEs, satDNAs contribute significantly to the differences in genome size between species, accounting for more than 50% of the total DNA of some species [3]. SatDNAs can be found in varied locations in the chromosomes, such as pericentromeric, subtelomeric and interstitial regions, forming blocks of constitutive heterochromatin (CH) [2–7] that are part of vital structures like centromeres and telomeres [2]. However, satDNA location is not restricted to CH, with some satDNAs also being found dispersed throughout euchromatic regions in different species [5,8]. Multiple lines of evidence show that satDNAs have key roles in centromere function, heterochromatin formation and maintenance and chromosome pairing [9–12]. Interestingly, despite their association with critical chromosomal structures, satDNA families can display an astounding sequence variation even among closely related species. This results from their highly dynamic behavior, leading to rapid changes in sequence composition and array size within short evolutionary periods, which can lead to speciation (reviewed in [13]). Moreover, these sequences have been consistently correlated with fragile sites and evolutionary breakpoint regions in diverse species [14–19] and are intrinsically involved in frequent chromosomal rearrangements like Robertsonian translocations [20,21]. SatDNA dynamics has been shown to promote genome plasticity and to have an active involvement in the modulation of genomic architecture by promoting rearrangements.

Nevertheless, some satDNAs seem to have been preserved or "frozen" across different taxa during long evolutionary periods [22–24], with some of them being transcribed into satellite non-coding RNAs (satncRNAs). Indeed, transcripts of satDNAs have been reported in different species, highlighting a possible role for satncRNAs in the regulation of gene expression, cancer outcomes and aging [25–27]. This suggests that functional constraints may be causing the preservation of these sequences over time [24,28]. Accordingly, the centromeric satDNAs of some species have been found to share a 17 bp motif known as the CENP-B box, which represents the binding site for centromere protein B (CENP-B) [29,30]. It has been demonstrated that the CENP-B box is required for de novo centromere chromatin assembly and that the CENP-B protein is involved in centromere functions [31]. In this case, the conservation of a sequence motif across the satDNAs of diverse mammalian species [32] seems to be related to a specific function.
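A conserved motif of this kind can be searched for with a simple pattern scan, sketched below. The IUPAC pattern `NTTCGNNNNANNCGGGN` is the 17 bp core consensus commonly quoted for the CENP-B box; it is supplied here as an assumption rather than taken from this review, and should be checked against the primary literature before use. The function names and the toy sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# Subset of IUPAC nucleotide codes needed for this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Assumed core consensus of the 17 bp CENP-B box (verify before relying on it).
CENP_B_BOX_CORE = "NTTCGNNNNANNCGGGN"


def motif_to_regex(motif):
    """Translate an IUPAC motif into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def find_motif(sequence, motif):
    """Return 0-based start positions of all forward-strand motif matches."""
    pattern = motif_to_regex(motif)
    seq = sequence.upper()
    last_start = len(seq) - len(motif)
    # Pattern.match(seq, i) anchors the match at position i, so overlapping
    # occurrences are reported as well.
    return [i for i in range(last_start + 1) if pattern.match(seq, i)]


# Toy sequence: one motif-like stretch flanked by arbitrary bases.
toy_sequence = "GGG" + "CTTCGTTGGAAACGGGA" + "TTT"
print(find_motif(toy_sequence, CENP_B_BOX_CORE))
```

A fuller analysis would also scan the reverse complement and tolerate mismatches, since satDNA monomers diverge between and within arrays.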

Over the years, different techniques have been used to address satDNA sequences. The advances in sequencing technology and computational approaches have revolutionized the study of these regions, known as the "black holes" of the genome. The increasing number of studies assessing the genomic abundance and sequence variation of satDNAs in different species has led to the coining of new terms to describe the whole collection of repeats (repeatome) and satDNAs (satellitome) in a species' genome [33,34], and has contributed to improving our knowledge regarding the evolution and function of these sequences [35].

In this review, we contextualize satDNA sequences in the genomes/chromosomes of different species in the light of recent data provided by new technologies and bioinformatic tools and the challenges of studying these DNA sequences and their associated non-coding RNAs. We also discuss the contribution of repetitive sequences to the organization of genomes and their participation in the restructuring of species karyotypes during evolution, focusing on their involvement in rearrangements with evolutionary and clinical significance: Robertsonian translocations. Finally, we address the structural role of satDNA transcripts in the genome.

#### **2. SatDNA Features and Organization in the Genome and Chromosomes: Emerging Technologies and Changing Concepts**

The concept of satDNA has undergone considerable changes through time. Early experiments coined the term "satellite DNA" to refer to tandemly arranged sequences that formed satellite bands separate from the rest of the genomic DNA during density gradient centrifugation [36]. Given that no function was initially attributed to these sequences, they were considered genomic "junk", representing parasites proliferating independently in the genomes [37]. Today, satDNAs are viewed as important functional components of the genome. In order to understand the participation of these sequences in genome architecture and evolution, we need to briefly address their organizational features, localization and mode of evolution.

SatDNA is typically organized as long arrays of head-to-tail linked repeats and is usually present in genomes in several million copies [1]. The length of the repeating unit (monomer) can range from a few base pairs up to more than 1 kb, forming arrays that may reach 100 Mb in length (reviewed by [38]), and that can form higher-order repeat (HOR) units (e.g., [39–41]). Human chromosome centromeres are populated by α satDNA (α*SAT*) organized in HORs that are structurally distinct and confer chromosome specificity [39,42]. Complex HORs have been found in non-human animals such as insects, mouse, swine, bovids, horse, dog and elephant (reviewed in [43]), and more recently in Callitrichini monkeys [44] and Teleostei fish [45]. SatDNA arrays are mainly found clustered in heterochromatin, although studies also report the presence of short satDNA arrays dispersed along euchromatic regions [2–7]. These sequences can be found in varied locations in the chromosomes, such as pericentromeric, subtelomeric and interstitial regions [2,46–48], as well as being part of vital structures like centromeres and telomeres [2].
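The head-to-tail organization described above means that a satDNA array is nearly identical to itself when shifted by one monomer length. The sketch below uses this property to estimate the monomer length of a toy array by self-comparison. It is a deliberately naive illustration: the function name and the example monomer are hypothetical, and dedicated tools handle divergence between monomers, indels and HOR structure far more robustly.

```python
def best_repeat_period(sequence, max_period):
    """Return (period, identity) for the shift at which the sequence best
    matches itself; ties are resolved in favour of the shortest period."""
    best_period, best_identity = 0, 0.0
    upper = min(max_period, len(sequence) - 1)
    for period in range(1, upper + 1):
        # Compare each base with the base one candidate period downstream.
        matches = sum(a == b for a, b in zip(sequence, sequence[period:]))
        identity = matches / (len(sequence) - period)
        if identity > best_identity:
            best_period, best_identity = period, identity
    return best_period, best_identity


# Toy array: six perfect head-to-tail copies of a hypothetical 8 bp monomer.
toy_monomer = "ACGTTGCA"
toy_array = toy_monomer * 6
print(best_repeat_period(toy_array, max_period=20))
```

In a real array, the identity at the monomer period falls below 1 as copies diverge, and an HOR would appear as a second, higher peak at a multiple of the monomer length.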

Usually more than one family of satDNAs can be found in the same genome, thus forming a library, which can be shared among closely related species. The satDNAs within the library may differ in monomer sequence, size, abundance, distribution and location (reviewed in [12]). Expansions and contractions of satDNA arrays can dramatically change the landscape of repetitive sequences, leading to significant differences in satDNA copy number among related species [49,50]. That is the case in the *Drosophila* genus, in which satDNA abundance is very dissimilar between species, varying from 0.5% of the genome in some species to as high as 50% in others [51,52]. Such striking differences in satDNA abundance in *Drosophila* sp. were proposed to result predominantly from lineage-specific gains accumulated over the past 40 MY of evolution [53], ultimately causing reproductive barriers between species [54,55].

The mechanisms proposed to be responsible for the amplification/deletion of repetitive DNA, consequently leading to their rapid evolutionary turnover, are unequal crossing over, replication slippage and rolling circle amplification [56]. SatDNA sequence divergence among species is quite variable, as some repeats are species-specific, while others are widely conserved, being shared across distantly related species [22,24,57]. SatDNAs have a unique mode of evolution, known as concerted evolution, a two-level process in which mutations are homogenized throughout monomers of a repetitive family and concomitantly fixed within a group of reproductively linked organisms [58,59].

The study and characterization of satDNA has lagged behind that of other genomic sequences. Over time, different methodological approaches have generated insights into the structure, organization, function and evolution of these sequence elements, although this characterization has been significantly hampered by their highly repetitive nature. The advent of high-throughput sequencing technologies and associated bioinformatics tools opened the door to whole genome sequencing projects, and as the technology became more robust and cheaper, the number of sequenced species increased exponentially. In 2018, the Earth BioGenome project was launched, aiming to increase the number of sequenced eukaryotic genomes from 2534 species (of which only 25 comply with the standard for contig and scaffold N50 established by the Genome 10K organization) to the 1.5 million known species within a 10-year time frame [60]. Of note, satDNA, as well as other repetitive sequences, have been systematically omitted from genome projects due to difficulties in sequence alignment and assembly, given that the read length of current sequencing technologies is unable to span the longer repeats and tandem arrays [61,62]. Nevertheless, high-throughput sequencing has contributed significantly to increasing our knowledge regarding satDNA sequences [63]. Next generation sequencing (NGS; e.g., Illumina), allied to newly developed bioinformatics tools capable of identifying satDNA sequences in unassembled data (e.g., RepeatExplorer) [64–66], helped uncover the extent of satDNAs present in the genome of different species, revealing unpredicted levels of satDNA diversity (e.g., [34,67–71]). For instance, 62 satDNA families were identified in the genome of the migratory locust, leading to the coining of the term 'satellitome' to refer to the whole collection of satDNA families found in a single genome [34], a part of the 'repeatome', a term proposed previously [33] to refer to the collection of all repetitive sequences in a genome (TEs, satDNAs, etc.). This number has been surpassed by a recent study in which 164 satDNA families were identified in Teleostei fish, this being the largest satellitome characterized for a given species so far [70]. The availability of a methodology capable of assessing satDNA array abundance and diversity led to an explosion of comparative studies across a wide range of clades, including mammals, insects and plants (e.g., [44,45,69,71–73]), providing insights into these sequences.
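The contig and scaffold N50 mentioned above is a standard summary of assembly contiguity: it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of the calculation follows; the function name and the example lengths are hypothetical, and no specific threshold from the Genome 10K standard is implied.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in kb) for a toy assembly of 290 kb.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because unresolved tandem arrays break assemblies into many short contigs, collapsed or missing satDNA regions tend to lower N50, which is one reason repeat-rich genomes struggle to meet contiguity standards.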

The development of sequencing technologies that generate long-range data has allowed the community to overcome some of the limitations imposed by NGS and is fueling the study of repeats. Single-molecule real-time sequencing and nanopore sequencing technologies (commercialized by PacBio and Oxford Nanopore Technologies (ONT), respectively) can generate longer reads capable of spanning repetitive regions, thus enabling their assembly into contigs (reviewed in [62]). For instance, ONT nanopore sequencers have been shown to generate unprecedented ultra-long reads that can reach megabase lengths, leading to significant improvements in the human genome assembly [74–77], with some of the repeat-containing gaps being closed [78,79]. By using long-read methods we are gaining access to important repeat-rich structures, like centromeres, revealing further insights into their sequence content and structure [80]. For instance, *Drosophila* centromeric satDNAs were recently shown to be intermingled with TEs [81]. Other recent studies report the improvement of the human Y chromosome centromere assembly [78] and the reconstruction of a 2.8 megabase centromeric satDNA array, with the potential to achieve, for the first time, telomere-to-telomere sequencing of the X chromosome [79].

Several studies demonstrate that the combination of different high-throughput sequencing methods (e.g., Illumina, ONT and PacBio) with other techniques, such as optical mapping, cytogenetics and molecular techniques, is beneficial and sometimes essential to determine satDNA features. The use of PacBio long-read sequencing together with optical mapping proved to be helpful in the assembly of satDNA arrays with large monomers and provided insights regarding recombination rates in the Eurasian crow [82]. Positional data derived from fluorescent in situ hybridization (FISH) remain vital to determine the physical location of satDNAs, since such information cannot be obtained for genomes that have not yet been properly assembled (e.g., [34,44,71,81,83]), and sequence mapping by FISH on extended DNA fibers can provide significant assistance to the process of genome assembly, aiding in contig ordering (e.g., [84,85]). Improved techniques based on FISH have helped shed light on repeat-rich chromosome regions with centromeric function (e.g., [86]). Other methods have also been shown to provide valid and rapid analysis of repetitive sequence profiles, such as PCR-based approaches, which have been used to determine satDNA copy number differences between healthy and cancer cells/tissues [87]. In particular, the use of droplet digital PCR (ddPCR) combined with other methodologies has contributed to the validation and quantification of rare retrotransposon insertion events in different tissues, including tumors [88], and to the detection and accurate quantification of human *SATII* ncRNA in cancer patients [89]. The integration of genomic, cytogenetic and cell biology data helps to establish a connection between sequence information, its localization in the chromosomes and its interaction with other components of the genome, defining the field of chromosomics [90]. We believe that this approach is essential to fully understand the organization of repetitive sequences.

Other aspects of satDNA biology are also becoming accessible through the use of recent methodologies, such as the characterization of their expression and chromatin state, namely by using RNA sequencing (RNA-seq) and chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) [91,92]. In particular, for ChIP-seq experiments several studies report the use of a specific antibody against the DNA-binding centromere-specific histone H3 (CENH3), which is an ortholog of human CENP-A. This methodology has proven to be useful for clarifying the satDNA content of the centromere, improving the reference sequences of some organisms and uncovering satDNA variability (e.g., [93,94]).

The data generated are now being used to determine the organization of satDNA sequences in the genome [95], to explore predicted evolutionary patterns and hypotheses (e.g., [35,68,96,97]), and to shed light on the function of these sequences [81,98]. We are now closer than ever to fully accessing the sequence information hidden within repeat-rich chromosome structures like centromeres and telomeres. However, we still need to further develop and adapt currently available approaches to achieve a combination of genomic, cytogenetic and molecular techniques that optimally addresses these regions, which we propose could be referred to as centrOMICs and telOMICs (Figure 1). SatDNAs represent one of the most intriguing components of the genome, and their full characterization will help us to better understand genome organization, architecture and evolution.

**Figure 1.** Challenges in the study of satellite DNA (satDNA) sequences and the importance of fully understanding the repetitive genomic fraction. SatDNAs can be found clustered at the centromeres and telomeres and forming interstitial constitutive heterochromatin (CH) blocks, as well as scattered (interspersed) throughout the chromosomes. The full characterization of satDNAs needs to be addressed at two levels: (1) Disclose the linear sequence of satDNAs and improve their representation in genome assemblies. Although currently used sequencing strategies (e.g., next generation sequencing (NGS)) have contributed to satDNA studies, the full characterization of these sequences will only be achieved by using sequencing technologies capable of long reads and bioinformatics pipelines suitable for highly repetitive sequences, together with other techniques (e.g., FISH, optical mapping). These strategies need to be directed to specific chromosome structures such as centromeres (centrOMICs) and telomeres (telOMICs), which harbor large amounts of satDNA. Also important is the integration of genomic data with sequence localization in the chromosomes, and their interaction with other components of the genome (chromosomics); (2) Clarify the function(s) of satDNAs in the genome by studying the satellite non-coding RNAs (satncRNAs) and their interaction with other components and structures in the genome. In this field there is a need to develop adequate biological techniques to study the transcription of repetitive sequences. The disclosure of satDNA sequences will help to better understand their genomic architecture and their role in genome restructuring in evolution and disease.

#### **3. Modulating Genome Architecture with SatDNAs**

The architecture of genomes confers identity to species. From a general point of view, the genomic architectural configuration is the product of a series of sequential molecular events that occurred during the evolutionary process. The impact of these events on genome organization is reflected in chromosome size, number and morphology. Eukaryotic genomes, and particularly karyotypes, can be viewed as a set of homologous chromosomes, each harboring a combination of syntenic blocks, i.e., conserved blocks that can be differently assembled between species [99]. The events capable of shaping genomes are based on structural and quantitative chromosomal alterations (e.g., [100]) of variable dimensions, from small to large regions, that may completely change the morphology and number of chromosomes in a species' karyotype. Amongst these, chromosome fusions (i.e., Robertsonian translocations), fissions (reviewed in [99]) and inversions [101] are perhaps the ones with the strongest impact on the architectural appearance of genomes during species evolution.

Chromosome structural variation may originate from illegitimate non-homologous recombination between different chromosome regions, such as centromeres, chromosome arms and telomeres, during meiosis, requiring double strand breaks in at least two chromosomes or chromosome regions [102–105]. The resulting rearranged chromosomes are transmitted either as potentially harmful alterations, or as new variants associated with a selective advantage that may eventually lead to speciation [99,106,107].

Even before the routine use of advanced molecular technologies, cytogeneticists had recognized that the regions where chromosomes break and rearrange (the so-called chromosomal breakpoints) were enriched in constitutive heterochromatin, as evidenced by C-bands [57]. Molecular technologies later demonstrated that evolutionary breakpoint regions are composed of repeats [107–111]. The involvement of repetitive sequences, including TEs (e.g., [112–115]), segmental duplications (e.g., [108,110,116]) and tandem repeats (e.g., [14–17]), in genome restructuring and evolution is now widely recognized.

The evolutionary rate of tandemly repeated satDNA has been shown to be higher than that of other genomic sequences, with significant changes occurring over short evolutionary times. It is thought that the mechanisms leading to the rapid turnover of these sequences promote chromosome rearrangements and consequently contribute to the re-shaping of genomes. Unequal crossing-over events seem to be responsible for the rapid evolution and divergence found among satDNA families, specifically at the levels of monomer length, nucleotide sequence, complexity and copy number [1,14,49,117,118]. DNA polymerase slippage during DNA replication and recombination in meiosis caused by faulty alignment of repetitive elements further contribute to the instability of these repeat-rich regions and to chromosome rearrangements (e.g., [107,119,120]).

SatDNAs can display complex structural organization resulting from the formation of secondary DNA structures, including hairpins, triplexes [121] and even tetraplexes (G-quadruplexes) [122,123]. The formation of such structures can cause problems during genome duplication in the S-phase by slowing down or even stalling the replication fork, resulting in double-strand breaks [124,125]. This damage is then targeted for repair by means of homologous recombination-based mechanisms, which may lead to chromosome and genome architecture alterations due to the selection of identical sequences in non-homologous regions as the template for repair [1,19,126].

Several studies document the presence of TEs intermingled with centromeric satDNA [127,128], in some cases forming complex structures [24,129–131]. TEs are highly represented in some vertebrate species, making up 60% or more of their genomes. They are characterized by their mobility within genomes using either a direct cut-and-paste mechanism to alter their position (transposons) or requiring an RNA intermediate (retrotransposons) [127,132]. This intrinsic feature makes them active elements of the genome and has been associated with genomic instability. TEs may cause double strand breaks, not only during the transposition process itself, but also by TE–TE ectopic recombination, which may lead to chromosomal rearrangements and consequently to alterations in the genome architecture [133–136]. The integration of TEs in the genome may also result in the disruption of a functional DNA sequence (reviewed in [128]), which can have adverse consequences. Together with segmental duplications, these elements share a high degree of similarity between different intra- and inter-chromosomal regions, making them the perfect templates for non-allelic homologous recombination [137–139]. TE dynamics have been shown to be linked with satDNA origin and evolution. Evidence suggests that some mobile elements may lead to the generation of new repetitive sequences that can be amplified into long arrays of satDNAs [140]. Moreover, it has been suggested that the autonomous LINE-1 retrotransposons could enable amplification and intragenomic movements of satDNA sequences throughout the genome [141]. It thus seems plausible that TEs, especially retrotransposons, may in fact be an adjuvant for satDNA evolution and consequently lead to the creation of genomic innovations.

The dynamic nature of repetitive elements is clearly a fundamental driver of genomic plasticity (e.g., [14,102,142,143]) and it is in fact a way of having a low impact on the euchromatic genome [14,97]. Today, an increasing body of evidence strongly supports the involvement of satDNA in the modulation of genomic architectures of a large number of taxa, as in the case of bovids [21,104,144,145], rodents [17,111,143,146], suiformes [57] or genets [147]. This extends well beyond mammalian evolution, as it can also be observed in insects (e.g., [54,148]), reptiles (e.g., [149]), plants [150] and many other lineages. SatDNAs and TEs can thus be considered the 'engine' triggering genome evolution [14,107], with the regions harboring these sequences functioning as 'hotspots' or fragile sites for structural chromosome rearrangements, leading to species-specific genome architectures [14,17,57,107,139,143,151] and contributing to the generation of key variations responsible for the success of vertebrates [152].

#### *3.1. Repetitive Sequences, Chromosome Instability and Disease*

Alterations of genomic architecture can also be pathogenic and have a detrimental effect on organisms, whether they occur in the germline or somatically. This is the case for many diseases caused or promoted by genomic instability that affects nuclear architecture, such as cancer, neurodegenerative disorders and other genetic diseases [153–155]. In fact, alterations in genome architecture can interfere both with chromosomal territories and with topological positioning of chromosomes and genes in the nucleus. Due to the constraints in the regulation of genes and gene networks and to differences in somatic mutation frequencies between genome regions located at the nuclear periphery or core (higher in the periphery) [156], structural variations of critical genome regions can in fact threaten normal cell function. Again, in these situations, repeats seem to be the main actors at play [125,128,153,157]. The repetitive fraction of the eukaryotic genome, and in particular of the mammalian genome, is usually methylated and repressed by a highly condensed chromatin state, which seems to be essential to maintain genome integrity (reviewed in [158,159]). When perturbation of the epigenetic landscape of specific genomic regions occurs (e.g., [160]), repeats that are usually silenced can become active and unconstrained, which may lead to mobilization (in the case of TEs) and an open chromatin state that allows the occurrence of double-strand breaks at fragile or hotspot regions. This results in chromosome rearrangements that affect the three-dimensional genome architecture and gene expression regulation [100], which may lead to disease onset and progression.

#### 3.1.1. Remodeling Genome Architecture Through Robertsonian Translocations from a SatDNA Perspective

The most frequent rearrangements occurring in genomes are Robertsonian translocations (rob). These rearrangements are commonly found in two different genomic scenarios: as an evolutionary rearrangement involved in mammalian karyotypic evolution; and as a chromosomal abnormality with clinical/polymorphic significance [161,162]. The occurrence of Robertsonian translocations involves a break near or at the centromeric region, followed by the fusion of the entire long (q) arms of two acrocentric chromosomes, forming a dicentric or monocentric chromosome. The associated breakpoints, as well as the subsequent mechanistic steps, have been shown to involve reorganization of satDNA sequences at the centromere level [14,20]. The illegitimate recombination between homologous sequences, such as satDNA on non-homologous chromosomes, has been suggested as a possible path for the occurrence of Robertsonian translocations in mice and humans [163,164]. In fact, the high frequency of rob chromosomes linked to genome remodeling events can be caused not only by the homology of the satDNA sequences shared by the acrocentric chromosomes involved in each translocation, but also by the nicking activity of the centromere protein B (CENP-B), originating the double-strand breaks that precede the fusion events [165]. Robertsonian translocations are complex rearrangements that require, in addition to the double-strand breaks, mechanisms of repair, the silencing of possible additional centromeric sequences and the adjustment of the amount of CH/satDNA over time, in order to maintain chromosome viability [20,162]. This assigns a fundamental role to satDNA in the control, success and viability of Robertsonian translocation events [14,162].
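The karyotypic bookkeeping behind a Robertsonian fusion can be sketched in a few lines of Python. This is only an illustration, not a model taken from the cited studies: the `Karyotype` class, its field names and the all-acrocentric starting complement of 40 chromosomes are assumptions chosen for the example. Each fusion replaces two one-armed (acrocentric) chromosomes with a single two-armed chromosome, so the chromosome number drops by one while the number of long arms is unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Karyotype:
    """Hypothetical minimal karyotype: counts of two-armed and one-armed chromosomes."""
    biarmed: int   # meta-/submetacentric chromosomes
    uniarmed: int  # acrocentric chromosomes (short arms ignored)

    @property
    def chromosome_number(self) -> int:
        return self.biarmed + self.uniarmed

    @property
    def arm_number(self) -> int:
        return 2 * self.biarmed + self.uniarmed

    def robertsonian_fusion(self) -> "Karyotype":
        """Fuse the long arms of two acrocentrics into one biarmed chromosome."""
        if self.uniarmed < 2:
            raise ValueError("a Robertsonian fusion needs two acrocentric chromosomes")
        return Karyotype(self.biarmed + 1, self.uniarmed - 2)


# Assumed starting point: 40 acrocentric chromosomes, no biarmed elements.
start = Karyotype(biarmed=0, uniarmed=40)
heterozygote = start.robertsonian_fusion()       # one rob chromosome
homozygote = heterozygote.robertsonian_fusion()  # the homologous pair fused as well

for label, k in [("start", start), ("heterozygote", heterozygote), ("homozygote", homozygote)]:
    print(label, k.chromosome_number, k.arm_number)
```

The sketch deliberately ignores the short arms, the fate of the second centromere and the satDNA reorganization discussed above; it only captures why such fusions change chromosome number without changing arm number.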

One of the well-known examples of the dual character of these rearrangements is the rob (1;29), which assumes a special relevance as it is the most widespread chromosome rearrangement occurring in domestic cattle with clinical significance [166–169]. In parallel, the rob (1;29) is also a constitutional chromosome rearrangement fixed in several wild bovid species, such as most of the Tragelaphini [170].

The analyses of the sequences at the breakpoint regions preceding a translocation are of great importance in understanding the translocation event [21,131]. These sequences are essentially centromeric satDNAs, whose detailed physical and organizational analysis contributed much to better comprehend the chromosomal mechanism behind the rob (1;29) translocation [20,145]. In 2000, Chaves and colleagues suggested that this chromosomal abnormality might not be a single event [144] and in 2003, using centromeric satDNA sequences, the same group proposed, for the first time, a two-step mechanism for this rearrangement [20]. This translocation mechanism involved, besides the centric fusion of the two acrocentric chromosomes, the loss and reorganization of specific satDNA families that were retained in the translocated chromosome [20]. Later, Di Meo and colleagues [145], using both satDNA and BAC probes, validated the pericentric inversion previously proposed [20]. This event would probably be necessary for satDNA reorganization at the centromeric level, highlighting the active role of satDNA sequences in the translocation mechanism and reinforcing their functional relevance in genome reorganization [14,20].

In humans, Robertsonian translocations are also the most common structural chromosome abnormality [171,172], with rob (13;14) and rob (14;21) being the most frequent examples [163]. For several decades, aspects such as the high frequency of de novo robs in the human population, their origin during oogenesis, and the non-random participation of the acrocentric chromosomes, have supported the hypothesis that there must be a specific mechanism leading to the formation of these robs [163,164,173]. However, despite the high frequency of these rearrangements and their clinical implications, there is still insufficient information on the molecular mechanism and exact genomic location of the breakpoints [174]. The rob translocation event has been closely connected with satDNA sequence homology and consequent recombination [174], giving rise to two alternative explanations: (i) the presence of a homologous inversely-oriented segment on chromosome 14 shared with chromosomes 13 and 21 [163,175]; (ii) the ability of the human satellite DNA *SATIII* to form uncommon DNA structures that could facilitate the illegitimate recombination [176,177]. However, these hypotheses need further research to be validated. Indeed, in the study of Robertsonian translocations, finding the breakpoint location is a problematic task due to the low resolution of the physical maps at the centromere and short arms of the acrocentric chromosomes [174]. Highly repetitive satDNA undoubtedly represents a major gap in the current human genome assemblies, significantly contributing to the lack of high-resolution sequencing studies in the field of centromere genomics [74,178].

#### **4. Transcribing SatDNAs: Targeting Genomic Functions**

The previous sections highlight the role of satDNA sequences in genome architecture and in specific chromosomal rearrangements. However, the participation of these sequences in shaping genome architecture goes beyond their DNA molecule. Currently, the transcription of satDNA is a widely accepted feature across species. Different functions have been assigned to satellite non-coding RNAs (satncRNAs) in several cellular contexts, such as cell proliferation, stress response, development or cancer [27] (Figure 2). In fact, satDNA transcripts seem to participate in the most fundamental aspects of genomic function, being related to centromere structure, chromosome pairing/segregation and kinetochore assembly [2,27,179]. Moreover, satDNA transcripts have recently been shown to be associated with male fertility in *Drosophila* sp. [180]. Unfortunately, the function of most satncRNAs remains unknown or unclear due to the inefficient methodologies currently available to analyze molecules of such repetitive nature.

**Figure 2.** Summary of current knowledge regarding satellite non-coding RNAs (satncRNAs) and how they can contribute to genome remodeling. Even though satDNAs present in the heterochromatin and euchromatin can be transcribed, the most studied satncRNAs are those originating from pericentromeric and centromeric satDNA families. For some reported satncRNAs, the chromosome location of the satDNA of origin cannot be determined. SatDNA transcription has been shown to be associated with the cellular response to stress, cancer progression and particular developmental stages, and some satncRNAs are differentially expressed in specific cell types, tissues and organs. Generally recognized functions attributed to satncRNAs are listed. The aberrant expression of satncRNAs may result in abnormal chromosome segregation and in chromosome rearrangements that re-shape the genome and can lead to cancer progression or be fixed during species evolution. Further effort is needed to identify and better characterize satncRNAs and their involvement in cellular functions and disease.

Concerning centromeric satDNAs, the human α satellite transcripts (α*SAT*) have been shown to be crucial for cell cycle progression, as depletion of α*SAT* resulted in defective centromeric protein A (CENP-A) loading and cell cycle arrest [181]. α*SAT* ncRNAs also seem to regulate spindle microtubule attachment and sister chromatid disjunction through association with AURORA B proteins [182]. These molecules were further shown to be associated with the SUV39H1 histone methyltransferase, thereby suggesting a regulatory function in heterochromatin maintenance [183–185].

Contrary to what is believed for most condensed genomic regions, centromeric sequences remain transcriptionally active during mitosis [186,187], essentially promoting kinetochore stabilization and centromere cohesion [188,189]. These functions have been similarly attributed to transcriptionally active centromeric satDNAs from other species [190–192], in spite of the observed sequence differences. This suggests that satncRNAs are involved in critical functions, which appear to be associated with their intrinsic molecular characteristics and most probably also with the genomic location of their satDNA sequence.

Pericentromeric satDNA transcripts have been linked to pericentric chromatin formation [193–195], acting as molecular scaffolds for the accumulation of HP1 [194]. The presence of human *SATIII* ncRNA can be closely associated with the cellular response to stress. In particular, heat shock can trigger *SATIII* transcription by the action of HSF1 (Heat Shock Factor 1) [196,197], giving rise to nuclear stress bodies (nSBs) close to *SATIII* DNA regions [198]. The splicing of relevant genes for stress response may be influenced by *SATIII* ncRNAs, which have been proposed to sequester RNA processing factors and downregulate global transcription [199], providing protection against stress-induced cell death [200]. However, *SATIII* transcription is not thermal stress-exclusive, as a basal level of expression is detectable even in the absence of cellular stress [201]. The same satDNA family can exhibit different genomic locations and its transcripts can be involved in multiple functions, making their study even more difficult. *SatDNA III* from *Drosophila melanogaster* is located at the centromere and pericentromere of the X chromosome and at the pericentromere of chromosomes 2 and 3 [202]. Its transcripts have been shown to play different roles in chromatin silencing/heterochromatinization, centromeric function and upregulation of X-linked genes [191,202–204].

Another interesting case is the *FA-SAT*, the major satDNA sequence of *Felis catus* (cat) genome, located at the (sub)telomeres and (peri)centromeres of chromosomes [153] and also in an interspersed fashion [24]. This satDNA is highly conserved in its primary sequence among Bilateria species (e.g., human, *Drosophila*, oyster, cattle, among others), a rare event observed in satDNA sequences [24]. In these species (non-*Felis* species), an interspersed distribution of this satDNA was proposed, with the exception of the other carnivore analyzed, *Genetta genetta*, in which it also presents a centromeric location. Of note, *FA-SAT* is transcribed in all these species [24], and an important conserved function (in cat and human) was ascribed to this ncRNA as a PKM2 interactor involved in the cross-talk between proliferation and apoptosis [205]. In fact, the absence of this satncRNA in both species results in cell death [205]. These transcripts possibly originate from the transcription of *FA-SAT* interspersed DNA (at current knowledge, the common location among these species). A putative connection between *FA-SAT* ncRNAs with cancer was also recently hypothesized [205].

With the progressive acceptance of satDNA transcriptional activity, the aberrant expression of satncRNAs has been increasingly associated with cancer progression (reviewed in [27]). In fact, the observed overexpression of satncRNAs in stress conditions may be comparable to cancer, since loss of sister chromatid cohesion, incorrect chromosome segregation or aneuploidy are common features of both states [206]. Overexpression of satncRNAs in cancer has been reported alongside decondensation and hypomethylation of pericentromeric DNA [154,207]. The transcription of satDNAs, the change in nuclear architecture and the altered sequestration of transcription factors may all be related to gene expression deregulation induced by the hypomethylation of *SATII* and *SATIII* satDNA sequences [208,209]. SatncRNAs may thus be involved in more general disease contexts associated with chromatin decondensation, DNA breaks and subsequent genomic rearrangements [209] (Figure 2). However, the value of satncRNAs as cancer biomarkers is still an unexplored field [27].

Due to their repetitive nature, high copy number and multiple genomic locations (different chromosomes and/or genomic regions), the study of satncRNAs remains a difficult challenge, namely regarding the original genomic location of their DNA sequence and the determination of their primary sequence(s) (Figure 2). Indeed, the most common next-generation sequencing platforms present significant limitations in the analysis of satncRNA sequences in RNA-seq libraries, namely their inability to assemble large repetitive transcripts from very short reads. This could be overcome in the future with the application of ultra-long read sequencing technology. Because most currently available methods are directed mainly towards the analysis of gene coding sequences, essential improvements or adjustments are required to support the efficient study of satncRNAs (Figure 1).
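The short-read limitation can be made concrete with a toy example. The sketch below is an idealized illustration under stated assumptions, not an analysis of real data: the 8-nt monomer, the flanking sequences and the read lengths are all hypothetical, and reads are assumed to be error-free and to cover every position. Two transcripts that differ only in repeat copy number yield exactly the same set of short reads, whereas reads long enough to span the whole array tell them apart.

```python
def read_set(sequence: str, read_length: int) -> set:
    """Return every distinct error-free read of a fixed length from a sequence."""
    return {
        sequence[i:i + read_length]
        for i in range(len(sequence) - read_length + 1)
    }


MONOMER = "ACGTTGCA"               # hypothetical 8-nt repeat unit
LEFT, RIGHT = "G" * 10, "C" * 10   # hypothetical non-repetitive flanks

transcript_5 = LEFT + MONOMER * 5 + RIGHT  # 5 repeat copies, 60 nt
transcript_8 = LEFT + MONOMER * 8 + RIGHT  # 8 repeat copies, 84 nt

# Short reads (20 nt): no read spans the array, so copy number is invisible.
short_reads_identical = read_set(transcript_5, 20) == read_set(transcript_8, 20)

# Long reads (60 nt): a read can span the shorter array together with both flanks.
long_reads_identical = read_set(transcript_5, 60) == read_set(transcript_8, 60)

print("short reads identical:", short_reads_identical)
print("long reads identical:", long_reads_identical)
```

Real libraries add sequencing errors, uneven coverage and sequence variation between monomers, so this is a best-case picture of why read length matters for repetitive transcripts.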

Although the importance of satncRNA in normal cell function and disease states is becoming increasingly accepted by the scientific community, in the wake of recent studies on these transcripts, much remains to be understood about their functions in different contexts. This will only be overcome through the development of improved methods for the study of repetitive sequences, as well as the commitment of the scientific community to this field of research.

#### **5. Concluding Remarks**

In this review we outlined the critical importance of satDNA sequences in driving karyotype evolution and genomic architecture, as well as their involvement in various basic genomic functions. However, to this day, significant technological limitations hinder the progress of this important field in biology and medicine, in particular in the study of diseases involving this genomic fraction. It is imperative to boost the study of satDNA sequences and their transcripts by adapting and developing sequencing technologies and bioinformatics pipelines capable of assembling chromosomes from telomere-to-telomere as well as focused approaches that follow the concept of chromosomics. Significant effort is needed from the entire scientific community to value these important genomic elements, which have been so neglected over time. Only then can we begin to fully understand the largest fraction of our genome.

**Author Contributions:** The concept and idea and the drafting of this review was done by R.C. and S.L. S.L., M.L., D.F., F.A., A.E. contributed to writing and M.G.-C. and R.C. revised the drafts of the manuscript. R.C. and M.L. prepared figures. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Ph.D. grant (SFRH/BD/147488/2019), by a Scientific Employment Stimulus 2017 junior research contract in the biological sciences field and from the BioISI project with the reference UID/MULTI/04046/2019 from FCT, all from the Science and Technology Foundation (FCT) from Portugal.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Taxonomic Diversity Not Associated with Gross Karyotype Differentiation: The Case of Bighead Carps, Genus** *Hypophthalmichthys* **(Teleostei, Cypriniformes, Xenocyprididae)**

**Alexandr Sember 1,\*, Šárka Pelikánová 1, Marcelo de Bello Cioffi 2, Vendula Šlechtová 1, Terumi Hatanaka 2, Hiep Do Doan 3, Martin Knytl <sup>4</sup> and Petr Ráb <sup>1</sup>**


Received: 26 February 2020; Accepted: 24 April 2020; Published: 28 April 2020

**Abstract:** The bighead carps of the genus *Hypophthalmichthys* (*H. molitrix* and *H. nobilis*) are important aquaculture species. They were subjected to extensive multidisciplinary research, but with cytogenetics confined to conventional protocols only. Here, we employed Giemsa-/C-/CMA3- stainings and chromosomal mapping of multigene families and telomeric repeats. Both species shared (i) a diploid chromosome number 2*n* = 48 and the karyotype structure, (ii) low amount of constitutive heterochromatin, (iii) the absence of interstitial telomeric sites (ITSs), (iv) a single pair of 5S rDNA loci adjacent to one major rDNA cluster, and (v) a single pair of co-localized U1/U2 snDNA tandem repeats. Both species, on the other hand, differed in (i) the presence/absence of remarkable interstitial block of constitutive heterochromatin on the largest acrocentric pair 11 and (ii) the number of major (CMA3-positive) rDNA sites. Additionally, we applied here, for the first time, the conventional cytogenetics in *H. harmandi*, a species considered extinct in the wild and/or extensively cross-hybridized with *H. molitrix*. Its 2*n* and karyotype description match those found in the previous two species, while silver staining showed differences in distribution of major rDNA. The bighead carps thus represent another case of taxonomic diversity not associated with gross karyotype differentiation, where 2n and karyotype structure cannot help in distinguishing between genomes of closely related species. On the other hand, we demonstrated that two cytogenetic characters (distribution of constitutive heterochromatin and major rDNA) may be useful for diagnosis of pure species. The universality of these markers must be further verified by analyzing other pure populations of bighead carps.

**Keywords:** comparative fish cytogenetics; cytotaxonomy; chromosome banding; East Asian cypriniform fishes; FISH; rDNA; snDNA

#### **1. Introduction**

The bighead carps of the genus *Hypophthalmichthys* (Bleeker, 1860) represent a small, well-defined group of morphologically highly distinct and ecologically unique cyprinoid fishes [1] formerly recognized as the cyprinid subfamily Hypophthalmichthyinae. Recent formal taxonomy places this genus in the family Xenocyprididae (sensu [2]) and, at the same time, the genus is a member of a monophyletic clade harboring several East Asian morphologically distinctly differentiated genera [3]. Collectively, the bighead carps once consisted of the monotypic genus *Aristichthys* (Oshima, 1919) with the species *Aristichthys nobilis* (Richardson, 1844) (bighead carp) and the genus *Hypophthalmichthys* (Bleeker, 1860) with two recognized species: silver carp, *Hypophthalmichthys molitrix* (Valenciennes, in Cuvier & Valenciennes, 1844) and Harmand's silver carp (or large-scaled silver carp), *Hypophthalmichthys harmandi* (Sauvage, 1884). However, Howes [1] synonymized the genus *Aristichthys* with *Hypophthalmichthys* based on morphological characteristics, a taxonomic action not always accepted [4]. The systematic status of *H. harmandi* is not well understood at present, and while some authors [4] recognized it as a species distinct from *H. molitrix*, others [5] consider it only a subspecies of silver carp; nevertheless, the two differ in a number of morphological, physiological and reproductive characters (for details, see Supplementary File 1: Text S1).

In their native range (from the Amur R. in the north to the Red R. basin in Vietnam and Hainan Island in the south) and elsewhere in temperate regions in Eurasia, they are highly economically important fishes as objects of both lacustrine and riverine fishery and aquaculture [6]. However, bighead carps have been introduced and/or stocked into rivers and lakes outside their native range, e.g., in North America (see [7] and references therein), India [8], South Africa [9], and a number of other countries [10], where they consequently became invasive aliens that degraded aquatic ecosystems and significantly changed the food webs (see, e.g., [10–14]). Bighead carps have been and still are objects of intense investigation in various types of studies; for instance, a search on 25 April 2020 showed 1155 records on Web of Science and ~19,200 records on Google Scholar when using the term 'Hypophthalmichthys'. Similarly, the chromosomes of bighead and silver carp have been studied by a relatively high number of authors (reviewed in Table 1), although mostly just at the level of conventionally Giemsa-stained chromosomes.


**Table 1.** Summary of reported data on diploid chromosome number (2*n*), numbers of chromosomes in particular morphological categories (m—metacentric, sm—submetacentric, st—subtelocentric, a—acrocentric) and number of chromosome arms (NF value).

Note: During the search for data on cytogenetics of bighead carps, we found also eight other studies (published between years 1976–1985) but we did not include them in this summary because they provided 2*n* only and/or were found methodically very problematic. Their list is available upon request from the corresponding author.

All those studies identically reported 2*n* = 48 but differed markedly in the karyotype description, evidently due to the low quality of chromosome preparations, except the reports of Liu [26,30], where mitotic chromosomes from leukocyte cultures were successfully prepared. Only a few of those studies tried to investigate other chromosomal characteristics using silver staining of nucleolar organizer regions (NORs; Ag-NOR technique) [27], C-banding [27,34], G-banding [35], or BrdU replication banding [33], all with highly ambiguous and unreliable results except that of Almeida-Toledo et al. [27], who evidenced multiple NOR regions on chromosomes of both bighead carp species. However, the chromosomes of *H. harmandi* have not yet been studied.

Aiming to more deeply examine the karyotype organization in *H. molitrix* and *H. nobilis*, we combined conventional cytogenetics (Giemsa-, C-, and CMA3- stainings) with the chromosomal mapping of 5S and 18S rDNA, U1 and U2 snDNA, and (TTAGGG)*<sup>n</sup>* tandem repeats. In addition, we have undertaken Giemsa karyotyping and Ag-NOR analysis in a third species, *H. harmandi*, which is considered extinct in the wild and/or extensively cross-hybridized with *H. molitrix*. We analyzed individuals of *H. harmandi* from a unique gene pool strain, not hybridized with silver carp.

#### **2. Material and Methods**

#### *2.1. Sampling*

We analyzed four juveniles of *H. molitrix* and five juveniles of *H. nobilis* originating from Fishery Farm, Pohořelice, Czech Republic. The geographical origin of the stock of the former is unknown (original brood fishes were imported from Hungary), while the stock of the latter has been derived from imports from the U.S.S.R. that originated in the Amur R. Nine juveniles of *H. harmandi* belonged to a pure line maintained at the Research Institute of Aquaculture No. 1, Dinh Bang, Tu Son, Bac Ninh, Vietnam; it originates from the Red River in Vietnam and was derived from the wild population in the late 1950s, i.e., before silver carp introductions from China. These fishes were imported into the Laboratory of Fish Genetics in 1991. Individuals of *H. molitrix* and *H. nobilis* used for the cytogenetic analysis were tested biochemically to confirm the species identity according to the method of Šlechtová et al. [36], who found species-specific alleles in eight allozyme loci. As the analyzed fishes were juveniles, the sex could not be determined. Samples came from the Czech Republic (Petr Ráb) and Vietnam (Hiep Do Doan) in accordance with the national legislation of the countries concerned. To prevent fish suffering, all handling of fish by collaborators followed European standards in agreement with §17 of the Act No. 246/1992 coll. The procedures involving fish were also supervised by the Institutional Animal Care and Use Committee of the Institute of Animal Physiology and Genetics CAS, v.v.i., the supervisor's permit number CZ 02361 certified and issued by the Ministry of Agriculture of the Czech Republic. All fishes were euthanized using 2-phenoxyethanol (Sigma-Aldrich, St. Louis, MO, USA) before being dissected.

#### *2.2. Chromosome Preparation and Conventional Cytogenetics*

Chromosome preparations were produced using leukocyte cultures in the case of juveniles of *H. molitrix* and *H. nobilis* [37,38], while those of *H. harmandi* were obtained by a direct preparation from the cephalic kidney [39,40]. The quality of chromosomal spreading was enhanced by a dropping method described by Bertollo et al. [40]. Chromosomes were stained with 5% Giemsa solution (pH 6.8) (Merck, Darmstadt, Germany) for a conventional cytogenetic analysis or kept unstained for other methods. For sequential stainings, selected Giemsa-stained slides were destained in a cold fixative of methanol:acetic acid 3:1 (v/v) before another technique was applied. For FISH, slides were dehydrated in an ethanol series (70, 80, and 96%, 3 min each) and stored at −20 °C.

Constitutive heterochromatin was visualized by C-banding according to Haaf and Schmid [41]; chromosomes were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) (Sigma-Aldrich). Fluorescence staining was done with the GC-specific fluorochrome Chromomycin A3 (CMA3) and the AT-specific fluorochrome DAPI (both Sigma-Aldrich), following Mayr et al. [42] and Sola et al. [43]. The banding protocols were performed either separately or sequentially on metaphases previously treated by other method(s). In *H. harmandi*, only silver-nitrate impregnation of NORs (i.e., Ag-NOR staining) was performed, according to Howell and Black [44].

#### *2.3. DNA Isolation and Preparation of FISH Probes*

Total genomic DNA was extracted from fin and blood tissue using the Qiagen DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany). 5S and 28S rDNA fragments were obtained by polymerase chain reaction (PCR) with primers and thermal profiles described in Sember et al. [45]. Amplification of 18S rDNA and U1 snDNA was done by PCR with the primers 18SF (5′-CCGAGGACCTCACTAAACCA-3′) and 18SR (5′-CCGCTTTGGTGACTCTTGAT-3′) [46]; U1F (5′-GCAGTCGAGATTCCCACATT-3′) and U1R (5′-CTTACCTGGCAGGGGAGATA-3′) [47], using the thermal profiles described in Yano et al. [48] and Silva et al. [47], respectively. The obtained PCR products were purified using NucleoSpin Gel and PCR Clean-up (Macherey-Nagel GmbH, Düren, Germany) according to manufacturer's instructions. The subsequent procedures involving cloning of the purified products and a plasmid isolation, sequencing (in both strands) of selected positive clones, assembly of chromatograms from obtained sequences and sequence alignment followed essentially the same workflow as described in Sember et al. [49]. Some portion of obtained products was sequenced (in both strands) by Macrogen company (Netherlands). The content of resulting consensus sequences was verified using NCBI BLAST/N analysis [50] and selected clones were used for a FISH probe preparation. For the chromosomal mapping of U2 snDNA, we used the probe obtained previously from a botiid fish *Leptobotia elongata* (for details, see Sember et al. [49]). Furthermore, the FISH results from the mapping of *Hypophthalmichthys*-derived 28S rDNA probe were verified by 28S rDNA probes generated from the nemacheilid loach *Schistura corica* [45] and botiid loach *Botia almorhae* [49].
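As a quick sanity check on the four primers listed above, their length, GC content and a rough melting temperature can be computed directly from the sequences. The sketch below is illustrative only: the helper names are hypothetical, and the Wallace rule (2 °C per A/T plus 4 °C per G/C) is a coarse rule of thumb chosen here for simplicity, not the basis of the thermal profiles used in the cited protocols.

```python
# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "18SF": "CCGAGGACCTCACTAAACCA",
    "18SR": "CCGCTTTGGTGACTCTTGAT",
    "U1F": "GCAGTCGAGATTCCCACATT",
    "U1R": "CTTACCTGGCAGGGGAGATA",
}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the primer."""
    gc = seq.count("G") + seq.count("C")
    return 100 * gc / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule (an approximation)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.0f}%, Tm ~{wallace_tm(seq)} C")
```

Nearest-neighbor thermodynamic models give more realistic estimates; the simple rule is used only to show that both primers within each pair have similar composition.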

DNA probes were labeled mostly by PCR, either with biotin-16-dUTP or with digoxigenin-11-dUTP (both Roche, Mannheim, Germany). Due to its long size, the 18S rDNA probe was generated in two steps: (i) non-labeling PCR amplification from a verified 18S rDNA clone and (ii) nick translation (2 h) of the amplified 18S rDNA product using Nick Translation Mix (Abbott Molecular, Illinois, USA). A portion of U1 and U2 snDNA probes was also labeled by Nick Translation Mix (Abbott Molecular); the template DNA was in this case the entire plasmid DNA containing U1 or U2 snDNA insert. A dual-color FISH for each slide involved 200 ng of each probe and 25 μg of sonicated salmon sperm DNA (Sigma-Aldrich). The final hybridization mixtures were prepared according to Sember et al. [45].

#### *2.4. FISH Analysis*

Dual-color FISH experiments were conducted essentially according to Sember et al. [45]. Briefly, chromosome preparations were thermally aged (overnight at 37 °C and 1 h at 60 °C), then pre-treated in RNase A (200 μg/mL in 2× SSC, 60–90 min, 37 °C) (Sigma-Aldrich) and pepsin (50 μg/mL in 10 mM HCl, 3 min, 37 °C), and finally denatured in 75% formamide in 2× SSC (pH 7.0) (Sigma-Aldrich) for 3 min at 72 °C. Probes were denatured at 86 °C for 6 min, cooled on ice, and dropped on the chromosome slides. Hybridization took place in a moist chamber at 37 °C overnight. A post-hybridization washing was done under high stringency, i.e., two times in 50% formamide/2× SSC (42 °C, 10 min) and three times in 1× SSC (42 °C, 7 min). Prior to the probe detection, 3% bovine serum albumin (BSA) (Vector Labs, Burlington, Canada) in 0.01% Tween 20/4× SSC was applied to the slides to block unspecific binding of antibodies. Hybridization signals were detected by Anti-Digoxigenin-FITC (Roche; dilution 1:10 in 0.5% BSA/PBS) and Streptavidin-Cy3 (Invitrogen Life Technologies, San Diego, CA, USA; dilution 1:100 in 10% NGS (normal goat serum)/PBS). Experiments with altered labeling (e.g., biotin for 18S and digoxigenin for 5S rDNA) were included to verify the observed patterns. All FISH images presented here have a unified system of pseudocolored signals—red for the 18S rDNA and U2 snDNA probes, and green for the 5S rDNA and U1 snDNA probes. Finally, all FISH slides were mounted in antifade containing 1.5 μg/mL DAPI (Cambio, Cambridge, UK).

Telomeric (TTAGGG)*<sup>n</sup>* repeats were detected by FISH using a commercial telomere PNA (peptide nucleic acid) probe directly labeled with Cy3 (DAKO, Glostrup, Denmark) according to the manufacturer's instructions, with a single modification concerning the prolonged hybridization time (1.5 h).

#### *2.5. Microscopic Analyses and Image Processing*

Giemsa-stained chromosomes and FISH images were inspected using a Provis AX70 Olympus microscope equipped with a standard fluorescence filter set. FISH images were captured under immersion objective 100× with a black and white CCD camera (DP30W Olympus) for each fluorescent dye separately using DP Manager imaging software (Olympus). The same software was used to superimpose the digital images with the pseudocolors. Karyotypes from Giemsa-stained chromosomes were arranged in Ikaros (Metasystems) software. Final images were optimized and arranged using Adobe Photoshop, version CS6.

At least 15 metaphases per individual and method were analyzed, some of them sequentially. Chromosomes were classified according to Levan et al. [51], but modified as m—metacentric, sm—submetacentric, st—subtelocentric, and a—acrocentric, where st and a chromosomes were scored as uniarmed to calculate NF value (Nombre Fondamental, number of chromosome arms sensu Matthey [52]). Chromosome pairs were arranged according to their size in each chromosome category.
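The classification and NF scoring described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function names are hypothetical, the arm-ratio boundaries (1.7, 3.0, 7.0) are the commonly cited Levan et al. values and are assumed here, and the pair counts are those reported in Section 3.1.

```python
# Illustrative sketch only; names are hypothetical and not taken from the study.

# Arms scored per chromosome: st and a chromosomes are treated as uniarmed.
ARMS_PER_CHROMOSOME = {"m": 2, "sm": 2, "st": 1, "a": 1}


def classify(arm_ratio):
    """Classify a chromosome by its long-arm/short-arm ratio.

    Assumes the commonly cited Levan et al. boundaries (1.7, 3.0, 7.0).
    """
    if arm_ratio < 1.0:
        raise ValueError("arm ratio (long/short) must be >= 1")
    if arm_ratio <= 1.7:
        return "m"
    if arm_ratio <= 3.0:
        return "sm"
    if arm_ratio <= 7.0:
        return "st"
    return "a"


def diploid_number(pairs):
    """2n from a mapping of category -> number of chromosome pairs."""
    return 2 * sum(pairs.values())


def nf_value(pairs):
    """NF (number of chromosome arms) from category -> number of pairs."""
    return sum(2 * n * ARMS_PER_CHROMOSOME[cat] for cat, n in pairs.items())


# Pair counts from Section 3.1. The st and a pairs are pooled in the text;
# both score one arm, so labelling them all "st" here does not change NF.
karyotype = {"m": 4, "sm": 12, "st": 8}

print(diploid_number(karyotype))  # 48
print(nf_value(karyotype))        # 80
```

Because st and a chromosomes both count as one arm, the uncertain boundary between these two categories does not affect the NF obtained this way.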

#### **3. Results**

#### *3.1. Karyotypes and Chromosome Banding Characteristics*

All analyzed individuals of the three species invariably possessed 2*n* = 48 (Figure 1a,c,e), confirming previous reports (Table 1). They also shared the same karyotype composition: four pairs of m, 12 pairs of sm, and eight pairs of st-a chromosomes (Figure 1). Chromosomes of *H. molitrix* and *H. nobilis* displayed a very low content of constitutive heterochromatin, concentrated in the pericentromeric chromosome regions, except for the markedly heterochromatinized short (*p*) arms of the largest st chromosome pair in *H. molitrix* and an additional interstitial block of heterochromatin on this pair in *H. nobilis* only (Figure 1b,d). CMA3 fluorescence revealed six positive signals in the karyotype of *H. molitrix* (*p*-arms of the largest and middle-sized st chromosome pairs; Figure 2a) and a total of 10 signals in *H. nobilis* (all in *p*-arms of st chromosome pairs, including the largest st element; Figure 2b). In the karyotype of *H. harmandi*, four Ag-positive signals were observed in the *p*-arms of st chromosome pairs (likely Nos. 17 and 18) (Figure 1f).

#### *3.2. Sequence Analysis of Repetitive DNA Fragments*

PCR amplification resulted consistently in approximately 150 bp (U1 snDNA), 200 bp (5S rDNA), 300 bp (28S rDNA), and 1800 bp (18S rDNA) long fragments. Searches with the BLAST/N program at NCBI yielded the following results; 18S rDNA (*H. molitrix*)—sequenced 1380 bp long part showed 96–99% identity with 18S rDNA fragments of many fish species; 28S rDNA (both from *H. molitrix* and *H. nobilis*) displayed high similarity results (96–98% identity) with 28S rDNA sequences of many teleosts; 5S rDNA (both from *H. molitrix* and *H. nobilis*): 176–178 nt of our sequenced fragment was subjected to BLAST/N and showed 87–88% identity with sequence of 5S rDNA and non-transcribed spacer of *Megalobrama amblycephala* (Sequence ID: KT824058.1), *Cyprinus carpio* (Sequence ID: LN598602.1) and *Danio rerio* (Sequence ID: AF213516.1), and further 97% identity was shown in 104–114 nt long part of our PCR fragment with the coding region of 5S rDNA of many fishes. Finally, 123 nt of our U1 snDNA fragment showed 97% identity with the predicted U1 snRNA gene region of many fish species. Sequences for 18S rDNA and U1 snDNA (from *H. molitrix*) and for 5S rDNA (from both *H. molitrix* and *H. nobilis*) were deposited in GenBank under the accession numbers MT165584-MT165587. We have not investigated U2 snDNA genes from *Hypophthalmichthys* as the U2 snDNA probe from *Leptobotia elongata* has proven to be fully sufficient for FISH.


**Figure 1.** Karyotypes of three *Hypophthalmichthys* species arranged from mitotic metaphases after Giemsa staining, C-banding or Ag-NOR staining. (**a**,**b**) *H. molitrix* (individual HM3), (**c**,**d**) *H. nobilis* (individual HN4), and (**e**) *H. harmandi* (individual HH1). (**a**,**c**,**e**) Giemsa staining; (**b**,**d**) C-banding. Note two distinct blocks of constitutive heterochromatin on pair No. 11 in *H. nobilis* (**d**). (**f**) Ag-NOR staining in *H. harmandi* (individual HH3). The metaphase is incomplete (one chromosome missing; 2*n* = 47), but it is the most representative one in terms of spreading quality and signal strength, and it is sufficient to present the required features (i.e., note the lack of an Ag-NOR signal on the largest acrocentric chromosome pair No. 11). Scale bar = 10 μm.

**Figure 2.** Mitotic metaphases of *Hypophthalmichthys molitrix* and *H. nobilis* after CMA3/DAPI staining. (**a**) *H. molitrix*, individual HM3, (**b**) *H. nobilis*, individual HN4. For better contrast, images were pseudocolored in red (for CMA3) and green (for DAPI). Arrows indicate CMA3-positive sites. Scale bar = 10 μm.

#### *3.3. Hybridization Patterns of Repetitive DNA Probes*

The 5S rDNA probe mapped consistently to the proximal region of the largest acrocentric pair No. 11 in both species (Figure 3a,b). On the same chromosome pair, adjacent to the 5S rDNA cluster, tandem arrays of 18S rDNA were found to cover the entire *p*-arms (Figure 3a,b). Additional 18S rDNA loci resided in the terminal part of *p*-arms or encompassed entire *p*-arms of several chromosomes. The total number of 18S rDNA signals was eight in *H. molitrix* (chromosome pairs 11, 14, 20, and 21) and ten in *H. nobilis* (chromosome pairs 11, 14, 15, 20, and 21) (Figure 3a,b). In contrast, 28S rDNA probes (generated from the species studied herein or taken from other cypriniforms formerly analyzed by us [45,49]) did not generate any hybridization signals, suggesting that a 300 bp long probe is too short to visualize the small rDNA clusters present in *Hypophthalmichthys*, while the 1800 bp 18S rDNA probe can produce signals of sufficient intensity. Although CMA3-positive signals corresponded with 18S rDNA sites, some 18S rDNA clusters in *H. molitrix* were not revealed by this GC-specific fluorochrome (compare Figures 2a and 3a), again probably reflecting the small size (i.e., relatively low copy number of tandem arrays) of major rDNA cistrons.
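The interspecific comparison of 18S rDNA-bearing pairs can be made explicit with simple set operations. The following is a minimal sketch using only the pair numbers listed above; the variable and function names are hypothetical, and the code assumes that every listed pair carries a signal on both homologs, as implied by the reported totals.

```python
# Chromosome pairs bearing 18S rDNA signals, as listed in the text.
rdna_18s_pairs = {
    "H. molitrix": {11, 14, 20, 21},
    "H. nobilis": {11, 14, 15, 20, 21},
}


def signal_count(pairs):
    """Number of FISH signals, assuming one signal per homolog of each pair."""
    return 2 * len(pairs)


shared = rdna_18s_pairs["H. molitrix"] & rdna_18s_pairs["H. nobilis"]
nobilis_only = rdna_18s_pairs["H. nobilis"] - rdna_18s_pairs["H. molitrix"]

for species, pairs in rdna_18s_pairs.items():
    print(species, signal_count(pairs))
# H. molitrix 8
# H. nobilis 10

print(sorted(shared))        # [11, 14, 20, 21]
print(sorted(nobilis_only))  # [15]
```

The only pair separating the two lists is No. 15, which is why this single additional locus carries the whole interspecific difference in 18S rDNA site number.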

U1 and U2 snDNA probes co-localized in both species in the pericentromeric region of a small sm chromosome pair (No. 7) (Figure 3c,d). Neither co-localization between snDNA and rDNA (Figure 4a,b and Figure 5) nor intraspecific variability in the number of hybridization signals of any multigene family was observed among the analyzed individuals of both species. Telomere FISH marked only the ends of all chromosomes, with no additional interstitial sites (Figure 4c,d).

Because we analyzed unsexed juvenile individuals, we could not directly assess possible sex-related differences in the karyotypes or in the patterns of the analyzed cytogenetic markers. Nonetheless, we did not observe any type of within-species polymorphism in our sampling, and it has been shown previously that both *Hypophthalmichthys* species display a sex ratio around 1:1 due to genetic sex determination governed most likely by a homomorphic (i.e., cytologically indistinguishable) XX/XY sex chromosome system [53].

**Figure 3.** Karyotypes of *Hypophthalmichthys molitrix* and *H. nobilis* arranged after 5S/18S rDNA and U1/U2 snDNA FISH. (**a**,**b**) 18S rDNA (red) and 5S rDNA (green) probes. Insets show separately 5S and 18S rDNA signals on the largest acrocentric pair. Note the adjacent position of 5S and 18S rDNA signals on chromosome pair No. 11 in both species. (**c**,**d**) U1 (green) and U2 (red) snDNA probes mapped on mitotic chromosomes of (**c**) *H. molitrix* and (**d**) *H. nobilis*. Note the co-localization of a single pair of U1 and U2 snDNA signals in small sm chromosome pair No. 7. Insets show separate hybridization signals for each individual probe. Chromosomes were counterstained with DAPI (blue). Identification codes of individuals: (**a**) *H. molitrix* HM2, (**b**) *H. nobilis* HN1, (**c**) *H. molitrix* HM4, and (**d**) *H. nobilis* HN3. Scale bar = 10 μm.

**Figure 4.** Mitotic metaphases of *Hypophthalmichthys molitrix* and *H. nobilis* after different cytogenetic treatments. (**a**,**c**) *H. molitrix*, individual HM4 in both methods; (**b**,**d**) *H. nobilis*, individuals HN4 and HN3, respectively. Images (**a**,**b**) demonstrate the independent locations of distinct cytogenetic markers. (**a**) FISH with U1 snDNA (green, arrowheads) and 18S rDNA (red, arrows) probes. (**b**) FISH with U2 snDNA (red, arrowheads) and 5S rDNA (green, arrows) probes. Chromosomes were counterstained with DAPI (blue). (**c**,**d**) PNA FISH with telomeric probe; for better contrast, pictures were pseudocolored in green (telomeric repeat probe) and red (DAPI). Scale bar = 10 μm.

#### **4. Discussion**

The chromosomes of the two species of bighead carps, *H. molitrix* and *H. nobilis*, have been extensively studied (Table 1), evidently due to their high aquacultural value. On the other hand, the 2*n* and karyotype of the third species of the genus, *H. harmandi*, are reported in our study for the first time. Our current assessment of the karyotype structure and the hybridization patterns of selected multigene families in *H. molitrix* and *H. nobilis* is summarized in Figure 5. Our study confirmed 2*n* = 48 for these two species and revealed the same chromosome count for *H. harmandi*. The karyotype structures in *H. molitrix* and *H. nobilis*, however, differed markedly among various studies. The reason for these discrepancies might be linked with the following facts: (i) chromosomes of cypriniform fishes generally exhibit very small size when compared to other teleosts (see, e.g., in [45,54–56]); (ii) furthermore, cyprinoid chromosomes also exhibit a gradual decrease in size, with the centromere positions ranging stepwise from median to nearly terminal, making it difficult to assess the chromosomal categories with accuracy; and (iii) inspection of published chromosome pictures showed that previous reports were based on highly condensed chromosomes, which also made it impossible to describe the karyotype accurately. However, careful analysis of a number of metaphase cells with less condensed chromosomes demonstrated that karyotypes of all three species of bighead carps at the level of conventionally Giemsa-stained chromosomes are in fact identical.

**Figure 5.** Representative idiograms of two *Hypophthalmichthys* species highlighting the distribution of analyzed multigene families. 18S (red) and 5S (green) rDNA sites and U1 (blue) and U2 (pink) snDNA sites on the chromosomes of *H. molitrix* and *H. nobilis*. Note the co-localization of snDNA sites on the chromosome pair 7 and the adjacent arrangement of 5S and 18S rDNA sites on the chromosome pair 11. Moreover, notice the additional 18S rDNA site on chromosome pair 15 in *H. nobilis* (marked by arrow) in comparison to the karyotype of *H. molitrix*. Finally, an asterisk denotes the location of the differential interstitial C-band, which is present in *H. nobilis* but absent in *H. molitrix*. Insets with the chromosome pair 11 (right) display the chromosomes dissected from prometaphase plates after rDNA FISH, where the adjacent arrangement of both rDNA classes is clearly visible.

We thus show that potential interspecific hybrids between *H. harmandi* and *H. molitrix* cannot be revealed by basic karyotype analysis alone. Nonetheless, we observed that the karyotypes of *H. molitrix* and *H. nobilis* differ in two other cytogenetic characters, one of which also displays a distinctive pattern in *H. harmandi*. First, the presence of an additional interstitial C-band on the largest acrocentric pair in all individuals of *H. nobilis* clearly distinguishes this species from *H. molitrix*, at least in our sampled populations. This additional location of constitutive heterochromatin in *H. nobilis* might have emerged after a pericentric inversion which did not affect the general morphology of the chromosome but relocated part of the heterochromatic block from the *p*-arm to the proximal region of the long (*q*) arm. Our data, however, cannot rule out the involvement of other mechanisms such as centromere repositioning [57,58]. For the third species, *H. harmandi*, data from C-banding are not available; therefore, we cannot confirm whether this method alone provides enough information to discriminate the karyotypes of all three species. Even if such data were available, we would have to take into account that constitutive heterochromatin might display a polymorphic distribution among populations of diverse taxa (including teleosts; exemplified in [59–62]), and thus this feature might limit the resolution power of C-banding for interspecific diagnosis. Second, we found a difference in the number and position of major rDNA sites: four loci in *H. harmandi*, eight in *H. molitrix* and ten in *H. nobilis*. Our results are only partially consistent with those of Almeida-Toledo et al. [27], who also reported four pairs bearing Ag-NORs in *H. molitrix*, but only three pairs in *H. nobilis* (in contrast to five pairs revealed by us via FISH). Our view on this discrepancy is that either (i) the Ag-NOR method detected only clusters active in the preceding interphase, while our FISH analysis showed all major rDNA sites irrespective of their transcriptional activity, or (ii) Almeida-Toledo et al. [27] examined hybrid individuals which remained undetected due to a lack of testing for genome admixture, i.e., the step that we included in our present study. In either case, both studies collectively suggest that the pattern of major rDNA distribution might be stable at least in *H. molitrix* and that it differs from the one found in *H. nobilis*, strengthening the possibility that this marker may also be useful for species diagnosis in other *Hypophthalmichthys* populations.

Major (45S; NOR-forming) and minor (5S; located outside NOR) rDNA clusters are by far the most utilized cytotaxonomic markers in fishes [63–65]. Major rDNA is usually visualized by 18S or 28S rDNA probes. Despite the ever-growing number of studies showing lability of their site number and patterns of distribution in fish genomes (with many cases documenting intra- and inter-populational variability) (see, e.g., in [66–68]) and even their tendency to change rapidly under different environmental conditions [69] or hybridization [70], certain arrangements of rDNA classes can help to reveal the presence of species complexes or cryptic species (see, e.g., in [71–73]), to uncover the genome composition in hybrid specimens [74,75], to confirm the ploidy level, and to deduce the mechanism of polyploidy [76–79]. It has been repeatedly documented that even closely related species may possess dramatically different numbers of rDNA loci [45,71,80]. The difference in the number of 5S rDNA clusters between emerald and darter goby (two vs. 42) [81] may serve as an illustrative example. Besides the difference in the number and position of positive signals, the linkage between 45S and 5S rDNA, or between either of them and other multigene families, may also represent a valuable cytotaxonomic marker (see, e.g., in [82–85]).

Among Cypriniformes, many studies have been conducted on polyploid species, especially those of high aquacultural importance such as the genera *Cyprinus* and *Carassius* (see, e.g., in [55,74,86,87]), on unisexually reproducing taxa such as *Squalius*, *Cobitis*, and *Misgurnus*, and on species closely related to them [45,77,79,88–92]. Some reports revealed an amplified number of either 5S or 18S rDNA signals [45,55], different types of inter-individual/inter-populational polymorphisms in the number and location of rDNAs [70,88–91,93], or high interspecific variability in this character [91,92,94], while still other studies found rather standard patterns, with just one locus of one or both rDNA classes per haploid genome [45,77,95] or only a slight elevation in the number of sites [56,95,96]. In the two *Hypophthalmichthys* species analyzed herein, a single pair of 5S rDNA loci occupied apparently homeologous chromosomes and was found adjacent to one of the multiple 18S rDNA clusters. Similar links between 5S and 18S rDNA sites provided valuable cytotaxonomic markers in some cyprinids (see, e.g., in [86,92,96]). In our study, however, as this arrangement is shared by both species, it cannot be considered a useful cytotaxonomic marker. Nonetheless, Ag-NOR analysis in *H. harmandi* clearly showed that NORs are not present on the largest acrocentric pair; hence, potential hybrids containing the *H. harmandi* genome could be identified this way. What is further evident is the interspecific difference in the number of 18S rDNA sites, which could be helpful as a cytotaxonomic marker, but its intraspecific stability must be further verified in other pure populations of both species. In this respect, it may be difficult to discriminate all 18S rDNA loci due to their tiny size; therefore, such analysis should be treated with caution.

Genes for small nuclear RNA (snRNA) are not yet widely used for chromosome mapping in fishes, though the number of studies employing U2 snDNA as a cytogenetic marker has been steadily growing in recent years ([84,97–101], to name a few). On the other hand, U1 snDNA has so far been chromosomally mapped only in a cichlid *Oreochromis niloticus* [102], several South American characiforms of the genera *Astyanax* [47] and *Triportheus* [85], the African characiform representative *Hepsetus odoe* [103], one species from Gadiformes [104] and one taxon (suspected species complex) belonging to Mugiliformes [73]. Among cypriniforms, only a single recent work mapped U2 snDNA sites, namely in diploid and tetraploid loaches of the family Botiidae [49]; therefore, our present study is the first one showing the position of U1 snDNA on cypriniform chromosomes. In botiids, perhaps surprisingly, the mapping of U2 snDNA showed mostly a conserved single pair of U2 snDNA signals irrespective of the ploidy level. What is more, the location of U2 snDNA arrays in the pericentromeric/interstitial region as revealed in botiids was also found herein in both *Hypophthalmichthys* species and, interestingly, the same or a similar pattern has been encountered in approximately half of the fish species inspected for U2 snDNA distribution to date (see [84,101] and examples listed in Yano et al. [99]). It seems that a strong selective pressure operates to maintain such a location for this gene. Moreover, in botiids [49] as well as in the two *Hypophthalmichthys* species studied herein and in some other fish species [47,101,105,106], snDNA clusters are located on rather small-sized chromosomes. It is tempting to hypothesize that this location may facilitate more efficient expression, as small chromosomes tend to occupy the rather interior, transcriptionally active part of the interphase nucleus (see, e.g., in [107]). What is less conserved is the association of U2 snDNA with other multigene families known so far. Several combinations of syntenic/adjacent or intermingled arrangements can be found among fishes, such as between 5S rDNA and U1 snDNA [47,85], 5S rDNA and U2 snDNA [84,98,100,105], 18S rDNA with U2 snDNA [99,106,108], 5S and 18S rDNA together with U2 snDNA [99] and even with several histone genes [109]; further U1 and U2 snDNA [97,104] or U1 and U2 snDNA together with 5S rDNA [110]. Therefore, these arrangements may potentially serve as useful cytotaxonomic markers. In our study, both investigated *Hypophthalmichthys* species shared the co-localization of U1 and U2 snDNA cistrons along with an independent location of these sites with respect to rDNA classes.

FISH aimed to map the vertebrate telomeric (TTAGGG)*n* repeat motif showed signals only in their usual location at the termini of all chromosomes. No interstitial telomeric sequences (ITSs), which might point to previous structural chromosomal rearrangements (see, e.g., in [111]), were detected in either *H. molitrix* or *H. nobilis*. More importantly, this type of analysis did not reveal any differences between the analyzed species that would be helpful in their discrimination.

Currently, all three species studied herein are included in the genus *Hypophthalmichthys* (Bleeker, 1860) [1], but Kottelat [4] noted that not all authors agree with the synonymization of the genus *Aristichthys* (Oshima, 1919). From the cytotaxonomic point of view, it is not possible to contribute to this problem due to the lack of significant karyotype differences. Table 2 further summarizes all available data for members of the monophyletic East Asian clade of the family Xenocyprididae (sensu Tan and Armbruster [2]). Though the quality of such data was affected by the facts discussed above (i.e., the characteristics of cyprinoid chromosomes), their critical assessment demonstrates that these species possess (i) the same 2*n* = 48; (ii) very similar karyotype structures; and, where studied [112,113], also (iii) multiple NOR sites, thus supporting the molecular phylogeny of the clade [3].


**Table 2.** Review of reported cytogenetic data for members of the monophyletic clade of several East Asian morphologically distinct genera.

The stability of 2*n* (with either 48 or 50 chromosomes) is widely documented for the majority of non-polyploid cyprinoids [91,92,95,120] as well as in other related cypriniforms (see, e.g., in [45,89]), indicating its high conservatism. These signs of the so-called karyotype stasis, in which identical or almost identical karyotypes are maintained within a certain taxonomic group even over a considerably long evolutionary time, are observable also in other teleost lineages such as in the pikes of the genus *Esox* [121,122], several lineages of salmonid fishes with A-type karyotype [123,124], and further especially in knifefishes of the family Notopteridae (see [125] and references therein) and many percomorph groups [126–130]. Karyotype stasis has also been documented in diverse clades across the tree of life (e.g., typically in birds [131] and in feline lineages [132]). The underlying evolutionary mechanisms for this mode of karyotype evolution have not been identified so far, but they may be at least partially linked with the functional arrangement of chromatin within the interphase nucleus and the degree of tolerance to its change [133,134]. Nonetheless, it is highly probable that such a high degree of karyotype similarity may significantly contribute to the rate of interspecific hybridization [135], which has been repeatedly documented among many cyprinids [19,55,70,75] as well as between the *Hypophthalmichthys* species [18–21,36].

#### **5. Conclusions**

Our cytogenetic study of all three species of the genus *Hypophthalmichthys* documented that their karyotype macrostructure, i.e., the number of chromosomes in the respective morphological categories, is identical; therefore, these characteristics alone may not help in the identification of pure species and interspecific hybrids. A brief overview of available cytogenetic data of other members of the monophyletic clade of East Asian fishes, to which *Hypophthalmichthys* belongs, shows identical 2*n* = 48, very similar karyotypes and, in a subset of analyzed species, also multiple NOR sites, thus supporting the molecular phylogeny of the clade. The bighead carps thus belong to the teleost lineages where taxonomic diversity is not associated with extensive karyotype repatterning. However, an important difference between *H. molitrix* and *H. nobilis* has been revealed in the present study, as the latter species exhibits an additional interstitial band of constitutive heterochromatin on the largest acrocentric pair 11. Lack of data for *H. harmandi* did not allow us to assess the usefulness of this marker in this practically extinct species. On the other hand, a combined set of FISH and Ag-NOR results showed that the karyotypes of all three species differ from each other in the number and position of major rDNA sites: four in *H. harmandi*, eight in *H. molitrix*, and ten in *H. nobilis*. Particularly important is the absence of major rDNA on the largest pair 11 in the karyotype of *H. harmandi*, which may distinguish this species from the other two. Therefore, the combination of both cytogenetic methods may be useful for species diagnosis within *Hypophthalmichthys*. Testing the universality of these markers across different pure *Hypophthalmichthys* populations, together with the concomitant generation of additional cytogenetic markers (such as species-specific satellite DNA classes), is a necessary further research step.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/5/479/s1, Supplementary File 1: Text S1. Morphological differences between *Hypophthalmichthys molitrix* and *H. harmandi*, supplemented with a photographical documentation.

**Author Contributions:** H.D.D. and P.R. managed the fish material. A.S. and P.R. conceived and designed the experiments. A.S., Š.P., M.d.B.C., T.H. and M.K. conducted the experiments. A.S., M.d.B.C., M.K., Š.P., V.Š., P.R. analyzed the data. M.K. and P.R. contributed reagents, materials and analytical equipment. A.S. and P.R. wrote the draft of the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** P.R. was supported by the project EXCELLENCE CZ.02.1.01/0.0/0.0/15\_003/0000460 OP RDE. A.S., P.R., and V.Š. were supported by RVO: 67985904 of IAPG CAS, Liběchov. M.d.B.C. was supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (401962/2016-4 and 302449/2018-3) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (2018/22033-1). T.H. was supported by CAPES.

**Acknowledgments:** The authors are grateful to the fishery manager Vladimír Chytka, Fishery Farm, Pohořelice, Czech Republic, for providing bighead carp individuals for this study and information about stock origins. Thanks are also due to Věra Šlechtová, who biochemically analyzed individuals of *H. molitrix* and *H. nobilis* to confirm pure species. Zuzana Majtánová is deeply acknowledged for help in arranging the *H. harmandi* karyotype from ancient photographs.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Centric Fusions behind the Karyotype Evolution of Neotropical** *Nannostomus* **Pencilfishes (Characiforme, Lebiasinidae): First Insights from a Molecular Cytogenetic Perspective**

**Alexandr Sember 1, Ezequiel Aguiar de Oliveira 2,3, Petr Ráb 1, Luiz Antonio Carlos Bertollo 2, Natália Lourenço de Freitas 2, Patrik Ferreira Viana 4, Cassia Fernanda Yano 2, Terumi Hatanaka 2, Manoela Maria Ferreira Marinho 5, Renata Luiza Rosa de Moraes 2, Eliana Feldberg 4 and Marcelo de Bello Cioffi 2,\***


Received: 9 December 2019; Accepted: 8 January 2020; Published: 13 January 2020

**Abstract:** Lebiasinidae is a Neotropical freshwater family widely distributed throughout South and Central America. Due to their often very small body size, Lebiasinidae species are cytogenetically challenging and hence largely underexplored. However, the available but limited karyotype data already suggested a high interspecific variability in the diploid chromosome number (2*n*), which is pronounced in the speciose genus *Nannostomus*, a popular taxon in ornamental fish trade due to its remarkable body coloration. Aiming to more deeply examine the karyotype diversification in *Nannostomus*, we combined conventional cytogenetics (Giemsa-staining and C-banding) with the chromosomal mapping of tandemly repeated 5S and 18S rDNA clusters and with interspecific comparative genomic hybridization (CGH) to investigate genomes of four representative *Nannostomus* species: *N. beckfordi*, *N. eques*, *N. marginatus*, and *N. unifasciatus*. Our data showed a remarkable variability in 2*n*, ranging from 2*n* = 22 in *N. unifasciatus* (karyotype composed exclusively of metacentrics/submetacentrics) to 2*n* = 44 in *N. beckfordi* (karyotype composed entirely of acrocentrics). On the other hand, patterns of 18S and 5S rDNA distribution in the analyzed karyotypes remained rather conservative, with only two 18S and two to four 5S rDNA sites. In view of the mostly unchanged number of chromosome arms (FN = 44) in all but one species (*N. eques*; FN = 36), and with respect to the current phylogenetic hypothesis, we propose Robertsonian translocations to be a significant contributor to the karyotype differentiation in (at least herein studied) *Nannostomus* species. Interspecific comparative genome hybridization (CGH) using whole genomic DNAs mapped against the chromosome background of *N. beckfordi* found a moderate divergence in the repetitive DNA content among the species' genomes. 
Collectively, our data suggest that the karyotype differentiation in *Nannostomus* has been largely driven by major structural rearrangements, accompanied by only low to moderate dynamics of repetitive DNA at the sub-chromosomal level. Possible mechanisms and factors behind the elevated tolerance to such a rate of karyotype change in *Nannostomus* are discussed.

**Keywords:** comparative genomic hybridization; karyotype variability; repetitive DNAs; Robertsonian translocation

#### **1. Introduction**

The Neotropical region harbors the richest freshwater ichthyofauna in the world, with approximately 5200 species belonging to 17 orders, thus representing about 40% of the freshwater biodiversity worldwide [1–3]. Moreover, the amount of cryptic and until now morphologically undistinguishable species suggests much higher species diversity (e.g., [4–9]). Fueled by these discoveries, the knowledge about the karyotype differentiation in Neotropical fishes has been rapidly growing (especially during the last few decades) and several important models for studying both sympatric and allopatric speciation, species complexes, and sex chromosome evolution have emerged [9–11]. As a prominent example, a remarkable cytogenetic variability has been found in the Erythrinidae family (Characiformes) and especially in *Erythrinus erythrinus* and *Hoplias malabaricus*, where several cases of multiple karyotype forms per species, high dynamics of repetitive DNA distribution, and intriguing diversity of chromosomal sex determination have been reported [10,12].

The family Lebiasinidae, which contains at least 72 valid species widely distributed throughout Central and South America, is divided into two subfamilies: The Lebiasininae (genera *Lebiasina*, *Piabucina,* and *Derhamia*) and the Pyrrhulininae (*Pyrrhulina*, *Nannostomus*, *Copeina*, and *Copella*) [13]. The latter represents the most diverse clade and it is also characterized by an extreme reduction of body size in some of its representatives. The most speciose genera in the subfamily are *Nannostomus* and *Pyrrhulina*, as each of them involves 19 species. Members of the genus *Nannostomus*, commonly referred to as pencilfishes, inhabit typically the flooded forests of the Amazon basin and they are valuable for the aquarist pet trade due to their colorful pigmentation. However, from the taxonomic viewpoint, it is one of the most challenging Lebiasinidae genera; therefore, a suite of complementary methodologies, such as cytogenetic comparisons and molecular analyses, including recently applied DNA barcoding, are highly valuable for clarifying this issue ([14,15] and references therein).

The small size of most Lebiasinidae fishes (i.e., ranging from 16 to 70 mm in length) makes the cytogenetic investigation of this group challenging and labor intensive, which may explain the large gaps in their cytogenetic data [16–18]. Nevertheless, a steadily growing body of information on karyotype characteristics in Lebiasinidae has been generated within recent years, using both conventional and molecular cytogenetic techniques, bringing important new pieces into the puzzle of lebiasinid karyotype differentiation and its underlying evolutionary mechanisms. More specifically, a high rate of repetitive DNA dynamics and the occasional emergence of neo-sex chromosomes were found among four *Pyrrhulina* taxa; one of them may represent a new, as yet undescribed species [19,20]. Furthermore, contrasting patterns of repetitive DNA content and distribution as well as a putative nascent sex chromosome system were also reported for *Lebiasina* species, supporting at the same time relationships between the Lebiasinidae and Ctenoluciidae families [21]. In addition, the first molecular cytogenetic report on *Copeina* species is filling another gap in this research [22]. Hence, for comparative purposes, similar data need to be gathered for the remaining four lebiasinid genera (i.e., *Copella*, *Derhamia*, *Piabucina*, and *Nannostomus*). A proper comparative cytogenetic survey might further contribute to cytotaxonomic comparisons between Lebiasinidae and evolutionarily related lineages.

In contrast to the relative stability of the 2*n* in *Copeina*, *Lebiasina*, and *Pyrrhulina* species karyotyped to date [16–22], representatives of *Copella* and *Nannostomus* display remarkable karyotype variability [16,18]. Indeed, even from limited karyotype data, it can be inferred that *Nannostomus* exhibits a wide range of 2*n*, from 22 (in *N. unifasciatus*) to 46 (in *N*. *trifasciatus*) [16,18,23], suggesting an important role of Robertsonian rearrangements in its karyotype differentiation.

The aim of the present study was to provide the first finer-scale cytogenetic investigation in the genus *Nannostomus*, performed both by conventional (Giemsa staining and C-banding) and molecular (fluorescence in situ hybridization (FISH) with 5S and 18S rDNA probes and comparative genomic hybridization (CGH)) methods in four species, namely *N. beckfordi*, *N. eques*, *N. marginatus,* and *N. unifasciatus*.

#### **2. Materials and Methods**

#### *2.1. Sampling*

The collection sites, numbers, and sex of the individuals investigated are presented in Figure 1 and Table 1. All the specimens were collected under the appropriate authorization of the Brazilian environmental agency ICMBIO/SISBIO (License number 48628-2) and SISGEN (A96FF09). The specimens were taxonomically identified and sexed based on morphological characters and they were deposited in the fish collection of the Museu de Zoologia da Universidade de São Paulo (MZUSP) under the voucher numbers 123071, 123079, 123083, and 123084. The experiments followed ethical and anesthesia procedures in accordance with the Ethics Committee on Animal Experimentation of the Universidade Federal de São Carlos (Process number CEUA 1853260315).

**Figure 1.** Map of Brazil with highlighted collection sites of *Nannostomus beckfordi* (blue circle), *N. eques*, *N. unifasciatus* (orange circle), and *N. marginatus* (red circle). The map was created using the following software: QGis 3.4.3, Inkscape 0.92, and Photoshop 7.0.

**Table 1.** Collection sites, 2*n* and the sample sizes (N) of the investigated *Nannostomus* species.


#### *2.2. Chromosome Preparation and C-Banding*

Mitotic chromosomes were obtained from kidney tissue using the air-drying technique according to Bertollo et al. [24]. Constitutive heterochromatin was visualized by C-banding following Sumner [25].

#### *2.3. Repetitive DNA Mapping with Fluorescence In Situ Hybridization (FISH)*

We mapped 5S and 18S rDNA tandem repeats using probes generated from the genomic DNA of the wolf fish *Hoplias malabaricus* by PCR amplification [26,27]. In the case of 5S rDNA, the resulting amplification product contained 120 base pairs (bp) of the 5S rRNA encoding region and 200 bp of the non-transcribed spacer (NTS). The second amplified fragment encompassed a 1400-bp-long segment of the 18S rRNA gene. The 5S rDNA probe was labeled with digoxigenin-dUTP and the 18S rDNA probe with biotin-dUTP, both using a nick translation kit according to the manufacturer's recommendations (Roche, Mannheim, Germany). Fluorescence in situ hybridization (FISH) was performed under high stringency conditions, essentially following Yano et al. [28]. The hybridization mixture for each slide contained 100 ng of each probe, 50% deionized formamide, and 10% dextran sulphate (pH = 7.0), and it was denatured at 86 °C for 6 min prior to application. Chromosome preparations were denatured in 70% formamide in 2× SSC (pH = 7.0) for 3 min at 70 °C. Following overnight incubation at 37 °C in a moist chamber, post-hybridization washes were performed once in 2× SSC (5 min at 42 °C) and once in 1× SSC (5 min at room temperature, RT). Prior to the probe detection, 3% non-fat dried milk (NFDM) in 2× SSC was applied on each slide (5 min, RT) to avoid the non-specific binding of antibodies. Probes were then detected using Avidin-FITC (Sigma, St. Louis, MO, USA) and Anti-Digoxigenin-Rhodamine (Roche, Basel, Switzerland). Finally, chromosomes were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) (1.2 μg/mL) and mounted in an antifade solution (Vector, Burlingame, CA, USA).

#### *2.4. Comparative Genomic Hybridization (CGH)*

We designed a set of experiments aimed at inter-specific genomic DNA comparison among all studied *Nannostomus* species. For this purpose, genomic DNAs (gDNA) from males and females of all species were isolated from liver tissue using a standard phenol/chloroform/isoamyl alcohol extraction [29]. We performed a set of separate experiments, where the *N. beckfordi* genomic probe was co-hybridized with the gDNA of one of the remaining species under study, against the chromosome background of *N. beckfordi*. The probes were again generated by nick translation (Roche) as described above, with a differential labeling system employing biotin-dUTP (for *N. beckfordi*) and digoxigenin-dUTP (for *N. eques*, *N. marginatus*, and *N. unifasciatus*). Besides 500 ng of each labeled probe, the final hybridization mixture also contained 10 μg of unlabeled C<sub>0</sub>t-1 DNA generated from a *N. beckfordi* female and 10 μg of unlabeled C<sub>0</sub>t-1 DNA from the female of the compared species, in order to outcompete the excess of shared repetitive sequences. C<sub>0</sub>t-1 DNA was prepared according to Zwick et al. [30]. The probes were precipitated with 100% ethanol and the air-dried pellets were mixed with a hybridization buffer containing 50% formamide, 10% SDS, 10% dextran sulfate, 2× SSC, and Denhardt's buffer (pH 7.0). Hybridization was performed according to Sember et al. [31] and took place in a moist chamber at 37 °C for 72 h. After post-hybridization washes, done twice in 50% formamide in 2× SSC, pH 7.0 (44 °C, 10 min each) and three times in 1× SSC (44 °C, 7 min each), the probes were detected using Anti-Digoxigenin-Rhodamine (Roche, Basel, Switzerland) and Avidin-FITC (Sigma, St. Louis, MO, USA). Chromosomes were then counterstained with DAPI in antifade solution, as described above.

#### *2.5. Microscopy and Image Processing*

In total, 10 to 20 metaphases per individual were analyzed to confirm the 2*n*, chromosome morphology and FISH results. Images were captured using an Olympus BX50 microscope (Olympus Corporation, Ishikawa, Japan) with CoolSNAP and the images were processed using Image-Pro Plus 4.1 software (Media Cybernetics, Silver Spring, MD, USA). Chromosomes were classified as metacentric (m), submetacentric (sm), subtelocentric (st), or acrocentric (a) according to their centromere positions [32]. Karyotypes were arranged according to the chromosome size within each chromosome category.
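The centromere-based classification can be illustrated with a short sketch. The arm-ratio cut-offs used below (long arm/short arm of at most 1.7 for m, 3.0 for sm, and 7.0 for st, with higher ratios treated as a) are widely used conventions that we assume here for illustration only; they are not quoted from the protocol cited above, and the function name `classify_chromosome` is hypothetical.

```python
def classify_chromosome(long_arm: float, short_arm: float) -> str:
    """Classify a chromosome as 'm', 'sm', 'st', or 'a' from its arm lengths.

    Arm lengths may be in any consistent unit (e.g., pixels or micrometers).
    The thresholds are assumed conventional cut-offs, not values from this study.
    """
    # Make sure q is the longer arm and p the shorter one, whatever the input order.
    q, p = max(long_arm, short_arm), min(long_arm, short_arm)
    if p < 0 or q <= 0:
        raise ValueError("arm lengths must be non-negative and not both zero")
    if p == 0:
        return "a"  # no measurable short arm
    ratio = q / p
    if ratio <= 1.7:
        return "m"
    if ratio <= 3.0:
        return "sm"
    if ratio <= 7.0:
        return "st"
    return "a"


# Illustrative measurements (made-up values, not data from this study)
for q, p in [(2.0, 1.9), (2.5, 1.0), (5.0, 1.0), (8.0, 1.0)]:
    print(q, p, classify_chromosome(q, p))
```

In practice, the arm lengths would be measured on several metaphases per individual, and borderline ratios would be resolved by averaging across spreads.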

#### **3. Results**

#### *3.1. Conventional Cytogenetic Characteristics*

The examined species differed markedly both in the 2*n* and karyotype composition (Figure 2). The karyotypes of *N. beckfordi* (2*n* = 44, FN = 44, where FN stands for the number of chromosome arms, i.e., fundamental number) and *N. eques* (2*n* = 36, FN = 36) were formed exclusively by acrocentric chromosomes. *N. marginatus* displayed, however, 2*n* = 42 and FN = 44, with only one (the largest) metacentric pair in an otherwise fully acrocentric set of chromosomes. In striking contrast, the karyotype of *N. unifasciatus* exhibited 2*n* = 22 and FN = 44, where all chromosomes were bi-armed only (i.e., metacentric and submetacentric ones) (Figure 2).
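The relationship between 2*n* and FN reported above can be checked with a few lines of code. This is a minimal sketch that assumes bi-armed (metacentric and submetacentric) chromosomes contribute two arms and acrocentric chromosomes one arm; the category labels `bi` and `uni` and the function name are ours, introduced only for illustration.

```python
# Arms counted per chromosome category (assumption: bi-armed = 2, uni-armed = 1)
ARMS = {"bi": 2, "uni": 1}


def two_n_and_fn(karyotype: dict) -> tuple:
    """Return (2n, FN) for a karyotype given as chromosome counts per category."""
    two_n = sum(karyotype.values())
    fn = sum(ARMS[category] * count for category, count in karyotype.items())
    return two_n, fn


# Karyotype compositions as described in the text
karyotypes = {
    "N. beckfordi": {"uni": 44},
    "N. eques": {"uni": 36},
    "N. marginatus": {"bi": 2, "uni": 40},
    "N. unifasciatus": {"bi": 22},
}

for species, karyotype in karyotypes.items():
    two_n, fn = two_n_and_fn(karyotype)
    print(f"{species}: 2n = {two_n}, FN = {fn}")
```

The sketch reproduces the values given in the text: FN = 44 for *N. beckfordi*, *N. marginatus*, and *N. unifasciatus*, and FN = 36 for *N. eques*.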

**Figure 2.** Karyotypes of *Nannostomus* species arranged after conventional cytogenetic protocols. Giemsa staining (left panel), C-banding (right panel). Abbreviations: NBE = *Nannostomus beckfordi*, NEQ = *N. eques*, NMA = *N. marginatus*, NUN = *N. unifasciatus*. Note a remarkable difference in the number, size, and morphology of chromosomes in *N. unifasciatus* in comparison to other studied species. Bar = 5 μm.

C-banding revealed that the constitutive heterochromatin is mainly confined to centromeric regions in all species. Terminal bands could occasionally be found in several (*N. beckfordi*) to few (*N. marginatus*, *N. eques*) chromosomes. Conspicuous heterochromatic blocks were found flanking the centromeres of all metacentric chromosomes in *N. unifasciatus*, and the same holds for the single metacentric pair in *N. marginatus* (Figure 2).

#### *3.2. Patterns of 5S and 18S rDNA Distribution as Revealed by FISH*

All karyotypes resulting from the rDNA FISH experiments are shown in Figure 3. The 5S rDNA probe revealed only one pair of signals in *N. beckfordi* and *N. unifasciatus* while the karyotypes of the other species displayed two pairs with this repeat. The second pair of 5S rDNA signals was placed on the short (*p*) arms of the acrocentric chromosome pair No. 10 (in *N. eques*) and in the pericentromeric region of the acrocentric chromosome pair No. 19 (*N. marginatus*), respectively. Moreover, the karyotype of *N. unifasciatus* differed from those of the other species in that it had two syntenic 5S rDNA sites in the large metacentric pair No. 2, one located in the proximal region and the second placed interstitially.

**Figure 3.** Karyotypes of *Nannostomus* species arranged after dual-color FISH with 5S and 18S rDNA probes. The FISH scheme includes 5S rDNA (red signals) and 18S rDNA (green signals) probes, and chromosomes were counterstained with DAPI. Abbreviations: NBE = *Nannostomus beckfordi*, NEQ = *N. eques*, NMA = *N. marginatus*, NUN = *N. unifasciatus*. Note the exceptional hybridization patterns in *N. unifasciatus* (specifically, the doubled 5S rDNA sites and the position of both rDNA classes near the centromeres of large metacentric chromosomes). Bar = 5 μm.

The 18S rDNA probe marked a single chromosomal pair in all species; however, the location of the signals differed slightly among species. While they were situated on the *p*-arms of the acrocentric pair No. 3 in *N. eques* and No. 4 in *N. marginatus*, respectively, *N. beckfordi* bore 18S rDNA sites on the terminal part of the long (*q*) arms of the acrocentric pair No. 18. Finally, *N. unifasciatus* displayed these cistrons in the proximal region of the metacentric pair No. 7.

#### *3.3. Patterns of Interspecific Genome Divergence as Revealed by CGH*

Cross-species CGH analysis revealed in each separate experiment rather equal binding of both co-hybridized genomic probes to all *N. beckfordi* chromosomes, thus yielding composite yellow signals (i.e., a combination of green and red). This hybridization pattern indicates the shared repetitive DNA content in the respective regions. Both probes hybridized preferentially to many centromeric and telomeric regions. In some cases, the intensity of the signals was biased towards either *N. beckfordi* genomic probe or to the probe of the compared species, probably reflecting the differential amount of specific repetitive DNA classes in the compared genomes. In addition, the chromosomes of *N. beckfordi* also showed many repetitive DNA accumulations which were stained exclusively by the *N. beckfordi* probe (Figure 4).
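The interpretation of the CGH signals described above can be summarized as a simple decision rule. The sketch below assumes that signal intensities have been normalized to the range 0 to 1, that green corresponds to the *N. beckfordi* probe (biotin, detected with Avidin-FITC) and red to the probe of the compared species (digoxigenin, detected with rhodamine); the thresholds `bias` and `floor` are hypothetical values chosen for illustration and were not used in the original analysis, which relied on visual inspection.

```python
def classify_cgh_signal(green: float, red: float,
                        bias: float = 2.0, floor: float = 0.05) -> str:
    """Label a chromosomal region from its normalized green and red intensities.

    green: signal of the N. beckfordi genomic probe (0 to 1)
    red:   signal of the compared species' genomic probe (0 to 1)
    bias:  hypothetical fold difference above which one probe is called dominant
    floor: hypothetical intensity below which a channel is treated as background
    """
    if green < floor and red < floor:
        return "no signal"
    if green >= bias * red:
        return "biased towards N. beckfordi probe (green)"
    if red >= bias * green:
        return "biased towards compared species probe (red)"
    return "shared repetitive content (yellow)"


# Made-up intensity pairs for illustration only
for green, red in [(0.80, 0.70), (0.90, 0.10), (0.10, 0.60), (0.01, 0.02)]:
    print(green, red, "->", classify_cgh_signal(green, red))
```

Roughly equal binding of both probes falls into the "shared" class, corresponding to the composite yellow signals, whereas regions stained exclusively by the *N. beckfordi* probe fall into the green-biased class.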

**Figure 4.** Mitotic chromosome spreads of *Nannostomus beckfordi* after interspecific CGH. Male-derived genomic DNA probe from (**A**) *N. eques*, (**B**) *N. marginatus*, and (**C**) *N. unifasciatus* mapped against male chromosomes of *N. beckfordi*. First column: DAPI images (blue); Second column: hybridization pattern produced by the genomic probe from one of the compared species; Third column: hybridization patterns produced by the genomic probe of *N. beckfordi*. Fourth column: merged images of both genomic probes and DAPI counterstaining. The common genomic regions are highlighted in yellow (i.e., a combination of the green and red hybridization probe). Bar = 5 μm.

#### **4. Discussion**

The *Nannostomus* species studied here displayed considerable variability in the 2*n* values but a stable FN equal to 44 in all but one species (*N. eques*; FN = 36). These patterns strongly indicate a series of Robertsonian rearrangements, for which the classification of chromosome arm numbers, i.e., the FN value, was originally developed [33]. Nonetheless, because of a lack of clear landmarks to identify the individual chromosome pairs, the comparison across species is arbitrary and based on chromosomal size and morphology only.

It is necessary to determine whether the evolutionary trajectory of karyotype change in *Nannostomus* is directed mainly towards centric fusions or fissions [34]. For this, the modal 2*n* of characiform fishes and the phylogenetic relationship of *Nannostomus* with the nearest lebiasinid lineages may provide a first useful indication (the principle reviewed in Dobigny et al. [35]). Thus, taking into account that (1) the modal 2*n* for characiforms is very likely 2*n* = 54 [36], (2) *Lebiasina*, the most basal genus of Lebiasinidae [14], is characterized by 2*n* = 36 [21], and (3) the same 2*n* is also present among Ctenoluciidae species [37], a probable sister family of Lebiasinidae [38], we can infer that the reduction of 2*n* among the *Nannostomus* species was most likely achieved by a series of chromosome fusions. Specifically, according to Benzaquem et al. [15], *N. unifasciatus*, with 2*n* = 22 and a karyotype formed exclusively by bi-armed chromosomes, is phylogenetically closely related to *N. beckfordi*, which possesses 2*n* = 44 and a karyotype formed by acrocentric chromosomes only. Therefore, considering their relationship, the chromosomal divergence between these species is clearly evidenced by their identical FN and different karyotype compositions. Altogether, this suggests that centric fusions were the most probable mechanism behind the emergence of the 22 m-sm chromosomes present in *N. unifasciatus*.
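The arithmetic behind this inference can be made explicit. In the simplified model sketched below, each centric fusion fixed in the homozygous state joins two acrocentric pairs into one bi-armed pair, which lowers 2*n* by two and leaves FN unchanged. Taking an all-acrocentric, *N. beckfordi*-like karyotype (2*n* = 44, FN = 44) as the starting point is an assumption made purely for illustration, as is the function name.

```python
from typing import Optional


def fixed_centric_fusions(start_2n: int, start_fn: int,
                          end_2n: int, end_fn: int) -> Optional[int]:
    """Number of fixed centric fusions linking two karyotypes.

    Returns None when the difference cannot be explained by centric fusions
    alone under this simple model (FN not conserved, 2n increased, or an odd
    difference in 2n). Availability of enough acrocentric pairs is not checked.
    """
    drop = start_2n - end_2n
    if start_fn != end_fn or drop < 0 or drop % 2 != 0:
        return None
    return drop // 2


# Hypothetical starting point: an all-acrocentric karyotype like that of N. beckfordi
start = (44, 44)
for species, (two_n, fn) in {
    "N. marginatus": (42, 44),
    "N. unifasciatus": (22, 44),
    "N. eques": (36, 36),
}.items():
    print(species, fixed_centric_fusions(*start, two_n, fn))
```

Under these assumptions, one fusion would account for the single metacentric pair of *N. marginatus* and eleven fusions for the fully bi-armed karyotype of *N. unifasciatus*. For *N. eques*, the function returns `None` because its FN differs from 44, meaning only that this simple model does not apply to that species.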

From the cytogenetic standpoint only, certain repetitive DNA markers, including 5S and 18S rDNA, have previously been found to be involved in the formation of centric fusions (e.g., [39–41]). In the case of rDNAs, this may be linked to the susceptibility of these tandemly repeated clusters to double-stranded DNA breaks, perhaps resulting from (1) frequent rRNA transcription and thus break-prone R-loop emergence, (2) intermingling of NOR (Nucleolar Organizer Region)-bearing chromosomes in the interphase nucleus, or (3) possible association of rDNA-bearing sites during meiotic prophase I [42–48]. With a few exceptions, the terminal position of the 18S rDNA loci on chromosomes appears to be a common feature for all Lebiasinidae genera analyzed up to now (i.e., *Nannostomus*, *Pyrrhulina*, *Lebiasina*, and *Copeina*) ([19–22], this study). Together with the pattern in Ctenoluciidae [37], this can be considered a symplesiomorphy for both families. Although 5S rDNA displays a more dynamic evolution, with both terminal and interstitial signals among lebiasinids ([19–22], this study), it is noteworthy that *N. unifasciatus* underwent structural chromosome rearrangements involving both 18S and 5S rDNA loci, which have led to a derived pattern of rDNA distribution in this species. This is a rather expected scenario for *N. unifasciatus*, since this species exhibits the lowest 2*n* among Lebiasinidae fishes (2*n* = 22) and hence it may be speculated that the proximal 18S and 5S rDNA sites found in *N. unifasciatus* might represent hallmarks of fusion, suggesting the probable direction of chromosome change in this genus. However, despite this observation, it is evident that, especially in *N. unifasciatus*, there might not be a preferential involvement of rDNA-bearing chromosomes in the formation of centric fusions, as many other uni-armed elements have been engaged in this process, leading to an entirely bi-armed karyotype. Therefore, conversely, no major role of rDNA sites in the formation of fusions can so far be hypothesized in *Nannostomus*. Finer-scale analysis expanded in both taxonomic breadth and the number of cytogenetic markers is needed in order to better characterize the karyotype dynamics and to track whether there were also centric fissions or other types of rearrangements occurring in parallel in *Nannostomus* karyotype differentiation.

As another layer of evidence supporting the significant contribution of fusions in the karyotype dynamics of *Nannostomus*, the large blocks of constitutive heterochromatin flanking the centromeres of rather large-sized metacentric chromosomes, as found in the karyotypes of *N. marginatus* and *N. unifasciatus*, may be potentially considered as relics of two previously independent centromeres linked together by the process of fusion. In fact, such a situation has been repeatedly observed in many teleost species (sometimes, again, accompanied by the presence of rDNA sites in the fusion points) [49–54] and it was reported also in other animal taxa, e.g., amphibians [55] or mammals [56,57]. Nonetheless, other studies show that the large pericentromeric heterochromatic blocks can also be found evenly distributed throughout the karyotype regardless of the fusion events (see, e.g., Houck et al. [57] and Sousa et al. [58]).

Although centric fusions are not a dominant type of chromosome rearrangement in teleosts, it seems that such a mechanism might indeed predominate in some lineages [59]. Within Teleostei, patterns of karyotype differentiation similar to those unraveled in *Nannostomus* have also been reported for the African annual killifish genera *Nothobranchius* [60] and *Chromaphyosemion* [61,62], Gobiidae [63], Notothenioidei [64], ophichthid eels (Ophichthidae) [54], Umbridae [65,66], and, in a broader context, also in the paleopolyploid Salmonidae family, where this process is apparently linked to re-diploidization [67].

Gradual fixation of chromosome fusions may be linked to various selective pressures or to genetic drift [34,68,69]. It is also conceivable that the degree to which centric fusions are tolerated by the species' genome might be determined by specific properties linked to chromatin functional arrangement within the interphase nucleus, such as elevated plasticity in the organization of chromosome territories, compartments, and topologically-associating domains [70–76]. It has been shown, for instance, that properly separated chromosome territories prevent the formation of inter-chromosomal fusions [77] and that changes in the architecture of the interphase genome may lead to severe consequences in gene expression [76]. Nonetheless, a recent study shows a high tolerance to disruption of the genome topology by rearrangements in the fruit fly *Drosophila melanogaster* [78]. Therefore, we may theorize that some organisms may better tolerate such alterations while others may be very sensitive to them, with selection acting strongly against the formation of inter-chromosomal rearrangements. Examples of both scenarios can be found among Teleostei, where some clades maintain a constant 2*n* equal to 48 or 50 chromosomes while other lineages, including the genus *Nannostomus*, undergo frequent Robertsonian rearrangements [79]. It will therefore be an important aim for future research to determine the main drivers behind such contrasting karyotype dynamics.

*Genes* **2020**, *11*, 91

The distribution of repetitive DNAs may provide important clues about the pace of genome dynamics and it may also answer several taxonomic issues [9,10,80,81]. Chromosomal mapping of rDNA clusters has repeatedly helped to unveil diverse evolutionary issues (e.g., [82,83]). Particularly in fishes, it provided valuable clues about the incidence of cryptic, morphologically indistinguishable sibling species [5,6,8,10,84], polyploidization and interspecific hybridization events [85,86], a geographical gradient of genomic and morphological change [87], patterns of sex chromosome differentiation [80,88–90], and the response of genome dynamics to environmental cues [91,92]. Among the *Nannostomus* species investigated here, chromosomal mapping revealed somewhat uniform patterns of distribution for both rDNA classes, with one to few sites of accumulation, as found in most fishes [93,94], as well as in some other lebiasinids [21,22] investigated to date. While some of these sites may appear to be orthologous among the species under study, the frequently high dynamics of these repetitive DNA classes do not allow us to draw firm conclusions without additional data (for an exemplary study, see Milhomem et al. [95]). Nonetheless, in addition to the fact that some rDNA sites were clearly involved in Robertsonian fusions (as mentioned above), it may be inferred that, like some other related lebiasinids [19], *Nannostomus* species do not show a substantial level of intrachromosomal dynamics that could be detected by the markers we selected. This inference is further supported by the low to moderate amount of constitutive heterochromatin revealed by C-banding. Further supporting evidence for this assumption came from the CGH experiments. Although CGH and related methods represent rather "rough" molecular tools, they may show patterns of genomic divergence between species, as they rely on the presence of genome-specific repetitive DNA classes.
As repetitive DNA usually evolves rapidly in diverging genomes, such an approach may yield specific patterns of hybridization depending on the compared species, which (within a certain evolutionary timeframe) correlate with the degree of their divergence [31,96–98]. In the present study, rather minor interspecific differences in the composition of repetitive DNA among the compared *Nannostomus* species were shown. In summary, we propose that the karyotype differentiation in *Nannostomus*, at least in the species under study, was driven mainly by major structural rearrangements and the repetitive DNA content has not yet diverged significantly among the investigated genomes.

Chromosome rearrangements may not always be directly linked to speciation [68], but they may often provide an effective mechanism for post-zygotic reproductive isolation (in the case of fusions, e.g., [57,99,100]). By altering gene expression or by joining previously unlinked genetic material together, for instance, they might facilitate the emergence of evolutionarily advanced (e.g., locally adapted) sub-populations of a given species, thus contributing to diversification [69,101]. In addition, they might also be linked to the emergence of novel sex chromosome systems, such as that recently found in the lebiasinid genus *Pyrrhulina* [20]. Lastly, although additional detailed cytogenetic studies are still needed on a wider taxonomic scale, the present data reinforce the assumption that chromosomal fusions were important drivers of karyotype evolution in the Neotropical family Lebiasinidae and, especially, in the pencilfishes of the genus *Nannostomus*.

**Author Contributions:** Conceptualization, A.S. and P.R.; Formal analysis, A.S., E.A.d.O., P.F.V., C.F.Y., M.d.B.C., M.M.F.M., R.L.R.d.M. and E.F.; Funding acquisition, A.S., L.A.C.B. and M.d.B.C.; Investigation, A.S., E.A.d.O., L.A.C.B., N.L.d.F., P.F.V., C.F.Y., T.H., M.M.F.M. and E.F.; Methodology, A.S., E.A.d.O., N.L.d.F., P.F.V., C.F.Y., M.d.B.C., T.H., M.M.F.M. and R.L.R.d.M.; Project administration, M.d.B.C.; Supervision, A.S., L.A.C.B. and M.d.B.C.; Validation, A.S., P.R., E.A.d.O., N.L.d.F., C.F.Y., T.H., M.M.F.M., R.L.R.d.M. and E.F.; Visualization, L.A.C.B., P.F.V., C.F.Y., M.d.B.C. and T.H.; Writing—original draft, A.S., P.R. and N.L.d.F.; Writing—review and editing, E.A.d.O., L.A.C.B., P.F.V., C.F.Y., M.d.B.C., T.H., M.M.F.M., R.L.R.d.M., P.R. and E.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** M.B.C. was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (Proc. nos 401962/2016-4 and 302449/2018-3) and CAPES/Alexander von Humboldt (Proc. No. 88881.136128/2017-01). L.A.C.B. was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) (Proc. nos 401575/2016-0 and 306896/2014-1), and the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (Proc. No. 2018/24235-0). M.M.F.M. was supported by the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) (Proc. No. 2017/09321-5; 2018/114115). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), Finance Code 001. A.S. was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq (152105/2016-6), PPLZ: L200451751 and with the institutional support RVO: 67985904. P.R. was supported by the project EXCELLENCE CZ.02.1.01/0.0/0.0/15\_003/0000460 OP RDE and by RVO: 67985904.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
