**3. Discussion**

Application of the RE pipeline to whole-genome shotgun Illumina reads from seven diploid *Chenopodium* species from divergent lineages revealed that the investigated CficCl-61-40 satDNA family is the most abundant component of the *Chenopodium* genome and, given that related sequences were found in both *Chenopodium* and *Beta* species, also the oldest. Regarding these two genera, it is essential to note that the genome of *Beta* should be recognized as more static, at least because the genus contains many fewer species (approximately 7–8 species in total [33]) than *Chenopodium* (approximately 150 species [24]). Alignment of the satellite monomers allowed identification of an ancestral DNA fragment of 37 bp that showed 100% identity between *B. corolliflora* on one side and *C. bryoniifolium* and *C. vulvaria* on the other (supplementary data 2). The latter two species split off early and possess a modified sequence that is still recognizable by BLAST as a ~40 bp variant of the ancestral monomer. The identified DNA fragment served as a benchmark for our subsequent analyses, in which we intended to characterize intra-unit evolutionary transformations in the diverse *Chenopodium* lineages.

Remarkably, the evolutionary history of the *C. album* aggregate revealed by cpDNA spacers and two low-copy genes [27] correlates fairly well with significant paleoclimatic events. Thus, the early differentiation coincides with the beginning of the Miocene Climatic Optimum in the Burdigalian Age (approximately 20 Mya) (Figure 1). Clade H (*C. vulvaria*) separated at the transition between the Serravallian and Tortonian Ages, ~11 Mya. However, the main lineages were formed in the Pliocene, when, owing to a cooler, drier, seasonal climate, grasslands spread on all continents, and savannahs and deserts appeared in Asia and Africa. Subsequent speciation within the lineages and the appearance of the majority of polyploids occurred in the Quaternary Period, when glacial and interglacial epochs succeeded each other. During this time, since no two places on Earth had an identical climate history and since the species of the aggregate were widely spread, the CficCl-61-40 satDNA arrays evolved divergently. Excluding clade H, which split off early and is now very different, k-mer-based distance estimation of the basic monomer shows the most significant differences in the genomes of species from clades A and D. It is most likely that both lineages separated early from the ancestral group and evolved independently. This is consistent with the present species distribution ranges and with molecular phylogenetic data [26,27]. However, the pace of evolution of these clades was probably different and is most likely connected with the climatic history of the species distribution areas. In clades B and E, the species are much more similar in the structure of the CficCl-61-40 satDNA family (Figure 2).
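To illustrate what a k-mer-based distance between basic monomers involves, the following minimal sketch compares the sets of overlapping k-mers of two sequences. It is an assumption-laden illustration only: the text does not state which distance measure or value of k was used, so the Jaccard distance and k = 4 are chosen here for simplicity, and the sequences and clade labels are hypothetical toy data, not real CficCl-61-40 monomers.

```python
from itertools import combinations


def kmer_set(seq, k=4):
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=4):
    """Return 1 - |A & B| / |A | B| for the k-mer sets A and B of two sequences."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k: treat them as indistinguishable.
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def distance_table(seqs, k=4):
    """Pairwise distances for a dict {label: sequence}, keyed by sorted label pairs."""
    return {
        (p, q): jaccard_distance(seqs[p], seqs[q], k)
        for p, q in combinations(sorted(seqs), 2)
    }


# Hypothetical toy monomers (assumption: NOT real CficCl-61-40 sequences).
monomers = {
    "clade_X": "ACGTACGTTTGACCA",
    "clade_Y": "ACGTACGTTTGACCA",
    "clade_Z": "ACGTTCGTATGACGA",
}

for pair, dist in distance_table(monomers).items():
    print(pair, round(dist, 3))
```

Identical monomers give a distance of 0, and the value grows toward 1 as fewer k-mers are shared, so larger values correspond to the more strongly diverged lineages.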

The concept of "molecular drive" [19] postulates that mutations can gradually spread throughout a satDNA family by several ubiquitous mechanisms of DNA turnover (homogenization) and become fixed in a population. SatDNA families can show both a rapid rate of interspecific sequence change and high levels of conservation between species separated for long evolutionary times [22,34,35]. These trends also hold for the CficCl-61-40 satDNA family: monomers are homogenized at the species level in the genomes of the different *Chenopodium* lineages, although each lineage has its own mode and tempo. The genome of *C. vulvaria* is an exception, as concerted evolution does not seem to operate there. This example of non-concerted evolution is discussed below.
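The usual signature of concerted evolution is that monomers are more similar within a species than between species. The sketch below shows one simple way to express that comparison, assuming pre-aligned monomers of equal length and using the proportion of differing sites (p-distance). The species labels and sequences are hypothetical toy data, and this is not presented as the procedure used in the study.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(seqs):
    """Mean p-distance over all pairs of monomers from one species (needs >= 2)."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(seqs_1, seqs_2):
    """Mean p-distance over all pairs of monomers drawn from two species."""
    pairs = list(product(seqs_1, seqs_2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy monomers (assumption: illustrative only).
species_1 = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
species_2 = ["TCGAACGT", "TCGAACGT", "TCGAACGA"]

within = mean_within(species_1)
between = mean_between(species_1, species_2)
print(f"within: {within:.3f}, between: {between:.3f}")
```

Under concerted evolution the within-species value is expected to be clearly lower than the between-species value; a within-species value approaching the between-species one would point to weak homogenization.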

In addition to mutations in basic satellite monomers, a distinct trend toward increased complexity and length of the monomer (HOR unit formation) was recorded in the species of clades A, D, E and H of the *C. album* aggregate. HORs arise by concurrent amplification and homogenization of different monomers in the original satDNA: a complex monomer is formed first, after which it merges into a more complex HOR unit [17]. The origin of such structures has been described for the alpha satellite of primates [36], for the satellite families in bovids [37,38] and for the plant species *Vicia grandiflora* [39]. A detailed analysis of the CficCl-61-40 satDNA family tandem arrays in the genomes of *C. acuminatum*, *C. bryoniifolium*, *C. iljinii* and *C. vulvaria* revealed, along with the basic ~40 bp monomer, related but longer monomers of up to 332 bp, suggesting the generation of new species-specific HOR units. Cloning of PCR-amplified DNA fragments in most cases confirmed the accuracy of the monomer/array compilation produced by the RE pipeline, and the physical counterparts were mostly in agreement with the consensus sequences. However, the exact satDNA array structure of the species could be determined only by complete genome sequencing, assembly and annotation [40].
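A common way to recognize HOR organization in a tandem array is that adjacent monomers differ from each other, whereas monomers separated by exactly one HOR unit are nearly identical. The following minimal sketch illustrates that idea by computing mean identity between monomers at increasing lags. It assumes the array has already been split into aligned monomers of equal length; the monomer variants are hypothetical toy data, and the approach is not the one implemented in the RE pipeline.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length monomers."""
    if len(a) != len(b):
        raise ValueError("monomers must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_identity_by_lag(monomers, max_lag):
    """Mean identity between monomer i and monomer i + lag, for each lag."""
    scores = {}
    for lag in range(1, max_lag + 1):
        pairs = [(monomers[i], monomers[i + lag])
                 for i in range(len(monomers) - lag)]
        if pairs:
            scores[lag] = sum(identity(a, b) for a, b in pairs) / len(pairs)
    return scores


def hor_period(monomers, max_lag):
    """Smallest lag with the highest mean identity (1 means no higher-order unit)."""
    scores = mean_identity_by_lag(monomers, max_lag)
    return max(scores, key=scores.get)


# Hypothetical toy variants of a basic monomer (assumption: illustrative only).
variant_a = "ACGTACGT"
variant_b = "ACGAACGT"
variant_c = "TCGTACGA"

# A toy array built from four copies of a three-monomer unit.
array = [variant_a, variant_b, variant_c] * 4

print(mean_identity_by_lag(array, max_lag=5))
print("estimated HOR period (in monomers):", hor_period(array, max_lag=5))
```

For a fully homogenized array of a single monomer, every lag scores equally and the function returns 1, which corresponds to the absence of a higher-order unit.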

FISH experiments further support the genesis of species-specific HOR units and their separate locations on the chromosomes. CficCl-61-40 arrays were found in all species. In contrast, related CacuCl-1-117 arrays were found exclusively in *C. acuminatum*, where they form multiple, sometimes separate chromosome clusters, thus creating a species-specific chromosomal pattern (Figure 5). Formation of HOR units based on two or more monomers has been reported in primates and bovids (for a review, see [17]). We observed a similar process, but one based on a single tribe-specific monomer: unequal changes in the initial sequence in diverging satDNA sets led to monomer alterations, and the modified monomers subsequently merged into a complex HOR unit. A similar process (i.e., HOR formation based on one initial repeated unit in *Vicia* sp.) was reported by Macas et al. [39]. Presumably, HOR formation on the basis of a single monomer takes more time than that involving two or several monomers (in our research, it appears predominantly in ancient species), although it apparently contributes to satDNA divergence.

We might next ask whether the formation of HORs is common in plant satDNA evolution. As another example of supposed HOR formation in plants, we can cite the complex structure of the centromeric tandem array of *Hieracium* species [41]. Analysis of both RE clusters and the sequenced physical counterparts revealed a complex structure with 21 repetitive elements identified by TRF (ranging from 21 bp to 348 bp) and with two abandoned motifs of 21 and 23 bp. Thus, stages of HOR formation based on two short monomers can also be observed in centromeric regions. It is essential to note that although the chromosome segregation machinery is highly conserved across all eukaryotes, centromeric DNA evolves rapidly, and the discovered tandem repeats are absent in related *Pilosella* species. Incompatibilities between rapidly evolving centromeric components may be responsible for both the organization of centromeric regions and the reproductive isolation of emerging species [42].

The above examples, and the fact that the presented species belong to different large clades of flowering plants, suggest that HOR formation is not restricted to the *Chenopodium*, *Hieracium* and *Vicia* genomes but may be widespread at least among angiosperms and could underlie satDNA divergence in related plant species, as it does in animal genomes. It should also be noted that HOR formation is presumably a species-specific event: in clade B (*C. ficifolium* and *C. suecicum*), neither species showed any sign of HORs, whereas in clade E the CficCl-61-40 satDNA family arrays of *C. pamiricum* are uniform, while HORs were detected in the *C. iljinii* genome. However, it is still not clear what triggers HOR formation in a particular genome [17].

In generalizing the life history of the CficCl-61-40 satDNA family, stretching from the ancestral basic repeat unit to species-specific sequences, it is worth noting that the family consists of an extensive group of related, divergent repeats. It is a dominant and old component of *Chenopodium* species genomes and is characterized by a highly complex evolution. Independently amplified in each genome, it ultimately acquires lineage-specific profiles due to differential stochastic amplifications, contractions or both. Additionally, in several lineages, a clear trend toward increased complexity and satellite monomer length was observed. Long tandem arrays are characterized by HOR units whose organization and nucleotide sequence are specific to a particular species. Analysis of the sequence organization of these diverged subsets provides a framework for considering mechanisms of sequence diversity generation and for understanding the evolutionary processes of satDNA family homogenization and polymorphism [37]. Homogenization of satellite repeats driven by molecular mechanisms of nonreciprocal sequence transfer occurs simultaneously, which makes satDNA evolve mostly in a concerted manner [3]. Nevertheless, as mentioned above, the small genome of *C. vulvaria* (2C value 0.945 pg) is an exception to this rule. The observed variability indicates a low level of CficCl-61-40 satDNA family homogenization, with multidirectional trends in the *C. vulvaria* genome (non-concerted evolution). Although the data are unusual, our unpublished results on the NGS-based qualitative analysis of TEs in genomes of the same *Chenopodium* diploid species (where we observed that *C. vulvaria* possesses a unique pool of different and diverse retrotransposons [43]) make it possible to hypothesize a link between TE dynamics and abnormalities in the homogenization of satDNA families, given that satDNA could be a target for TE insertions [44] and evolve further into species-specific tandem repeats [45]. The suppression of concerted evolution resembles that described for termites by Luchetti et al. [46]. This was proposed to be caused by the limited number of reproducers, especially considering that *C. vulvaria* is an ancient species, restricted to nutrient-rich bare soil largely under anthropogenic impact and not tolerant of competition [47]. Specific habitats may presumably cause abnormal repeatome composition, which, in turn, may support the models assuming that genotypes from marginal populations are evolutionarily significant [48–51]. Regardless of the causes, the discovered suppression of homogenization may itself result in alteration of satDNA libraries, ultimately leading to spontaneous transformation of the entire repeatome and thus producing a novel set of satDNA families for the next round of the conversion cycle. Genomes undergoing non-concerted evolution can therefore be proposed as a significant source of genomic diversity.
