**1. Introduction**

Genome evolution can be defined as the multifactorial process by which the components of the nuclear genome vary over time. The process is heterogeneous, and different genomic fractions evolve at different rates. The most rapid changes have been recorded in repeatomes, which form the bulk of most eukaryotic genomes and consist of repeated and repeat-derived sequences [1,2]. Because they are subject to concerted evolution, the repeatomes of diverging species mostly change non-independently, in a concerted manner that results in greater sequence similarity among repeat units within species than between species [3]. Repetitive DNA complexes play an important role in the evolutionary transformation of genomes, and determining their origin, composition and dynamics is crucial for understanding genomic diversity [4].
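The signature of concerted evolution described above, in which repeat units are more similar within a species than between species, can be illustrated with a minimal Python sketch. It assumes the monomers are already aligned to equal length and uses simple percent identity as the similarity measure; the helper names and toy sequences are hypothetical and do not represent data or methods from this study.

```python
from itertools import combinations, product


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned monomers."""
    if len(a) != len(b):
        raise ValueError("monomers must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_within(monomers: list[str]) -> float:
    """Mean pairwise identity among monomers sampled from one species."""
    pairs = list(combinations(monomers, 2))
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


def mean_between(group1: list[str], group2: list[str]) -> float:
    """Mean identity over all cross-species monomer pairs."""
    pairs = list(product(group1, group2))
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy monomers (not real Chenopodium satDNA)
species_x = ["ACGTACGTAA", "ACGTACGTAT", "ACGTACGAAA"]
species_y = ["TCGTTCGTCC", "TCGTTCGTCA", "TCGATCGTCC"]

if __name__ == "__main__":
    print("within X: ", round(mean_within(species_x), 3))
    print("within Y: ", round(mean_within(species_y), 3))
    print("between:  ", round(mean_between(species_x, species_y), 3))
```

Under concerted evolution, both within-species means are expected to exceed the between-species mean, as they do for these toy sequences. Real analyses would use proper multiple alignments and many more monomers.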

The repeatome consists of several large classes, among which transposable elements (TEs) and satellite DNA (satDNA) predominate [5,6]. The latter consists of long, late-replicating, non-coding arrays of tandemly arranged monomers [5,7]. These sequences are often species- or genus-specific and are considered the most variable fraction of the eukaryotic genome, thus reflecting trajectories of short-term evolutionary change [8–11]. Recent studies suggest that satDNA, which is predominantly concentrated in the heterochromatic regions of chromosomes, is involved in various functions ranging from chromosome organization and pairing to cell metabolism and the modulation of gene function [12–16]. Despite the particular importance of these sequences for understanding genome functioning and restructuring during micro- and macroevolutionary processes, and despite growing awareness of their structure and functional significance, knowledge of the origin and dynamics of satDNA remains fragmentary, especially in non-model species.

It is generally accepted that intraspecific monomer change in various satDNA families is continuous [17]. Related species share a common satDNA library that was present in their common ancestor. Differential amplification of satellites from this library and the acquisition of mutations in diverse lineages result in interspecific differences in this fraction [18]. The spread of a new variant by non-Mendelian molecular mechanisms is followed by its fixation within a population through sexual reproduction [19–21]. Thus, intraspecific homogenization of a satDNA family and fixation of species-specific polymorphisms occur simultaneously [22], and the main trend in satDNA evolution can be viewed as a transformation of common ancestral tandem repeats into species-specific ones. This process appears to be a significant part of speciation at the molecular level [4]. Next-generation sequencing (NGS) has recently made it possible to unravel the details of this ubiquitous phenomenon through comparative analysis of the entire repeatome of a species. Importantly, this approach is applicable not only to model organisms but also to a wide range of wild species, which allows a generalized model to be constructed.

In the present study, we used the RepeatExplorer (RE) pipeline [23] to analyze NGS data and infer satDNA evolutionary dynamics in the genomes of *Chenopodium* s. str. (also referred to as the *Chenopodium album* aggregate). Species of the *C. album* aggregate are distributed worldwide, with the highest species diversity in temperate areas [24]. Most species of this diploid–polyploid complex are exceptionally plastic phenotypically [25], and some are widely distributed and able to grow under a wide range of conditions [26]. We focused on diploid species (2*n* = 2*x* = 18) of the aggregate that represent separate lineages. Specifically: (i) "clade A", comprising species native to America and East Asia (the latter area being represented by *C. bryoniifolium* Bunge); (ii) "clade B", comprising the Eurasian temperate species *C. ficifolium* Sm. and the boreal species *C. suecicum* Murr.; (iii) "clade D", comprising a single East and Central Asian species, *C. acuminatum* Willd.; (iv) "clade E", represented by the Central Asian *C. pamiricum* Iljin and *C. iljinii* Golosk.; and (v) "clade H", comprising the presumably European and southwest Asian species *C. vulvaria* L. Clades C, F and G consist of polyploid species. Given these basic diploid lineages, the origin of most Eurasian polyploid species can be explained by hybridization among them, which created the subgenome combinations of individual polyploid taxa (see [27] for details) (Figure 1). This group was selected for two reasons: (i) the analyzed species of the genus *Chenopodium* provide an example of a diploid/polyploid complex [26,27] that is very typical of angiosperms and, to a certain extent, can be regarded as a standard model for the divergent evolution of higher plants; and (ii) a basic repeat unit with a pan-chromosomal distribution, related to the satellite monomer of *Beta corolliflora*, was previously found in the genome of a *Chenopodium* species [28,29].
This combination of favorable factors makes the group a promising system for describing satDNA family evolution in a typical group of flowering plants. Given the worldwide distribution of the *C. album* aggregate and its tens of millions of years of evolution [27], we hypothesized that different types of satDNA family transformation occurred in the diverged lineages.

**Figure 1.** Phylogenetic tree calculated using Bayesian inference within the *C. album* aggregate estimated based on the concatenated dataset of three chloroplast DNA spacers (adapted from [27]). Major evolutionary lineages (**A**–**H**) are marked by grey rectangles. The numbers above branches correspond to the ages of the particular clades (in millions of years) as inferred by the analysis in BEAST2. Positions of explored diploid species are shown in red. Polyploid species are shown in blue. The schematic stratigraphic time scale (Miocene–Holocene) is shown at the bottom of the figure.
