*2.2. Sequence Analysis in the CficCl-61-40 satDNA Family*

Among the many clusters produced by the RE pipeline, a BLAST search identified a single cluster belonging to the CficCl-61-40 satDNA family in the genomes of *C. acuminatum*, *C. bryoniifolium*, *C. ficifolium*, *C. iljinii*, *C. pamiricum*, and *C. suecicum*, and seven clusters in the genome of *C. vulvaria* (supplementary data 1, Figure 2). The highest percentages of the CficCl-61-40 satDNA family were observed in the *C. acuminatum* and *C. bryoniifolium* genomes (Table 2). Subsequent Tandem Repeats Finder (TRF) analysis allowed the consensus monomer(s) to be determined (supplementary data 1). The TRF algorithm looks for tandem repeats that are often hidden in larger homologous regions or that may fall well below the level of significance required for other programs to report a match. Its detection criteria are based on a stochastic model of tandem repeats specified by percent identity and by the frequency of insertions and deletions, rather than on a minimal alignment score; the program also aligns repeat copies against a consensus sequence, revealing patterns of common mutations [30]. Nucleotide sequence divergence among monomers within satDNA arrays is usually low, generally not exceeding a few percent, so for the purpose of sequence analysis it is acceptable to work with the satDNA consensus sequence [17]. For *C. ficifolium*, *C. pamiricum*, and *C. suecicum*, a single monomer of ~40 bp was detected. However, for *C. acuminatum*, *C. bryoniifolium*, *C. iljinii*, and *C. vulvaria*, several derivatives of the CficCl-61-40 satDNA family monomers were found within a single cluster. Two levels of CficCl-61-40 satDNA family variability were thus observed in the genomes of the diploid species of the *C. album* aggregate: (i) the inter-cluster level, namely single or multiple RE clusters, and (ii) the intra-cluster level, namely a single monomer or a set of related monomers of different lengths detected by TRF.
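The idea of working with a consensus monomer and of measuring divergence against it can be illustrated with a minimal sketch. It is not the TRF algorithm: it assumes equal-length, already aligned monomers (no insertions or deletions), and the function names and the 10-bp toy sequences are hypothetical, not data from this study.

```python
from collections import Counter


def consensus_monomer(monomers):
    """Column-wise majority consensus of equal-length, pre-aligned monomers.

    Assumption: no indels, so monomers can be compared position by position.
    """
    length = len(monomers[0])
    if any(len(m) != length for m in monomers):
        raise ValueError("monomers must have equal length (indels not handled)")
    # For each alignment column, keep the most frequent nucleotide.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*monomers)
    )


def percent_divergence(monomer, consensus):
    """Percentage of positions at which a monomer differs from the consensus."""
    mismatches = sum(a != b for a, b in zip(monomer, consensus))
    return 100.0 * mismatches / len(consensus)


# Hypothetical toy monomers (10 bp each), for illustration only.
monomers = ["ACGTTGCAAT", "ACGTTGCAAT", "ACGATGCAAT", "ACGTTGCTAT"]
consensus = consensus_monomer(monomers)
divergences = [percent_divergence(m, consensus) for m in monomers]
```

In this toy case the two variant monomers each differ from the consensus at one position out of ten. Real satDNA monomers of different lengths, such as the derivatives reported here, would first require a proper alignment that accounts for insertions and deletions.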

*Int. J. Mol. Sci.* **2019**, *20*, x 6 of 18

**Figure 2.** RepeatExplorer (RE) analysis of next-generation sequencing (NGS) data in *Chenopodium* diploids. (**A**) Cluster 61 of *C. ficifolium* shows a graph layout typical of tandem repeats; nodes represent sequence reads, and edges between nodes correspond to similarity hits. (**B**) Self-to-self comparison of contig 25 of cluster 61 displayed as a dot plot (output of the YASS genomic similarity search tool); parallel lines indicate tandem repeats, and the distance between the diagonals equals the motif length (~40 bp). (**C**) Agarose gel electrophoresis of PCR products obtained with primers designed from the consensus monomer sequence of *C. ficifolium* (Cluster 61), showing the ladder pattern typical of a tandem array.
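The relationship in panel (B) between diagonal spacing and motif length can be sketched in a few lines. This is an illustrative k-mer approach, not the YASS method: it records the distance between successive occurrences of each k-mer and takes the most frequent distance as the repeat period. The function name, the value of `k`, and the 40-bp test sequence are hypothetical and are not taken from the CficCl-61-40 data.

```python
from collections import Counter


def estimate_period(sequence, k=8):
    """Estimate a tandem repeat period from distances between repeated k-mers.

    In a self-to-self dot plot, each repeated k-mer contributes a point on an
    off-diagonal line; the most common offset corresponds to the spacing
    between diagonals, i.e., the monomer length.
    """
    last_seen = {}          # k-mer -> position of its most recent occurrence
    distances = Counter()   # offset between successive occurrences -> count
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    if not distances:
        return None  # no k-mer occurs twice, so no repeat signal
    return distances.most_common(1)[0][0]


# Hypothetical 40-bp monomer repeated five times (illustration only).
toy_monomer = "ACGTTGCAATCGGATCCTAGGTACCATGCAAGTTCGATAC"
toy_array = toy_monomer * 5
period = estimate_period(toy_array)
```

For this artificial array the estimated period equals the monomer length of 40 bp. On real reads, sequence divergence between monomers would reduce the number of exact k-mer matches, so a shorter `k` or a similarity-based tool would be more appropriate.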
