#### *2.4. Simple Sequence Repeats (SSRs) and Repeat Structure Analyses*

A simple sequence repeat (SSR), also known as microsatellite DNA, is a tandem repeat whose repeat unit is one to six nucleotides long [22]. Because of their high level of polymorphism, SSRs are widely used as molecular markers in species identification, population genetics and phylogenetic investigations [33,34]. A total of 238, 226 and 217 SSRs were identified in the chloroplast genomes of *M. cochinchinensis*, *M. tricolor* and *M. bibracteolatus*, respectively (Table 3). Mononucleotide repeats were the most abundant SSR type, occurring 169, 166 and 162 times in *M. cochinchinensis*, *M. tricolor* and *M. bibracteolatus*, respectively, and A/T was the most frequent mononucleotide repeat. In terms of repeat counts, mononucleotide and dinucleotide SSRs showed a clear base preference, consisting mainly of A/T units.

Long repeat sequences were defined as repeats of >30 bp; they were mainly distributed in intergenic spacers and introns. *M. cochinchinensis* had the highest number of long repeats, comprising six forward, seven palindromic, four reverse and one complement repeat (Figure 5). Only two repeat types were present in *M. tricolor*, namely six forward and nine palindromic repeats, whereas *M. bibracteolatus* had seven forward, six palindromic and two reverse repeats.
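To make the SSR definition above concrete, the following is a minimal sketch of how perfect SSRs with unit lengths of one to six nucleotides could be located in a sequence. It is not the pipeline used in this study: the function names and the minimum repeat counts in `MIN_REPEATS` are hypothetical choices for illustration, and real SSR tools apply their own thresholds and may also report compound repeats.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per unit length (assumed, not from the study).
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR, sorted by start."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A unit followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and is_primitive(m.group(1)):
                hits.append((m.start(), m.group(1), len(m.group(0)) // unit_len))
                pos = m.end()  # continue after this repeat
            else:
                pos += 1  # no usable repeat starts here; try the next base
    return sorted(hits)


def count_by_unit_length(hits):
    """Tally SSRs by unit length (1 = mononucleotide, 2 = dinucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in hits)


# Toy sequence for illustration only (not real chloroplast data).
toy = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GGC"
toy_hits = find_ssrs(toy)
toy_counts = count_by_unit_length(toy_hits)
```

Scanning position by position, rather than taking every regular-expression match in turn, keeps a run such as `AAAA...` from being consumed as a non-primitive "dinucleotide" and hiding a genuine repeat that begins inside it.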


**Table 3.** Types and amounts of simple sequence repeats (SSRs) in the chloroplast genomes of three *Macrosolen* species.


**Figure 5.** Repeat sequences in the chloroplast genomes of three *Macrosolen* species. F, P, R and C indicate forward, palindromic, reverse and complement repeats, respectively. Repeats of different lengths are indicated in different colors.

#### *2.5. Comparative Genomic Analyses*

The complete chloroplast genomes of the three species were compared using the mVISTA program, with that of *M. cochinchinensis* as the reference. As shown in Figure 6, *ycf1* and *ccsA* were the most variable genes. The remaining genes were highly conserved, most showing sequence similarity of >90%. Variation was lower in the coding regions than in the noncoding regions, and the most divergent regions were located in intergenic spacers such as *trnF-trnM*. The rRNA genes of the three species were highly conserved, with almost no variation observed. The K values (sequence divergence between species) were calculated, and sliding-window plots of the K values were constructed using DnaSP [35] (Figure 7). Figure 7 shows that the sequence divergence between *M. tricolor* and *M. cochinchinensis* was much higher than that of the other two species pairs. *M. bibracteolatus* and *M. tricolor* showed little divergence (K < 0.05). The LSC and SSC regions were more divergent than the IRs. Two mutational hotspots with high K values were found, located in the LSC and SSC regions. Combining the gene locations with the mVISTA result identified the two hotspots as *trnF-trnM* and *ycf1*.
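The sliding-window idea behind the K-value plots can be sketched as follows. This is an illustrative approximation, not a reproduction of the DnaSP calculation: it assumes two pre-aligned sequences of equal length, skips columns containing gaps or ambiguous bases, and applies a Jukes-Cantor correction to the proportion of differing sites. The function names and the `window` and `step` defaults are hypothetical and are not taken from the study.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites (NaN if p >= 0.75)."""
    if p >= 0.75:
        return float("nan")  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def sliding_divergence(aln_a, aln_b, window=600, step=200):
    """Return (window_midpoint, K) pairs along a pairwise alignment.

    window and step are assumed values for illustration only.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to equal length")
    aln_a, aln_b = aln_a.upper(), aln_b.upper()
    results = []
    for start in range(0, len(aln_a) - window + 1, step):
        # Keep only columns where both sequences have an unambiguous base.
        pairs = [
            (a, b)
            for a, b in zip(aln_a[start:start + window], aln_b[start:start + window])
            if a in "ACGT" and b in "ACGT"
        ]
        if not pairs:
            continue  # window consists entirely of gaps or ambiguous bases
        p = sum(a != b for a, b in pairs) / len(pairs)
        results.append((start + window // 2, jukes_cantor(p)))
    return results


# Toy alignment for illustration only: one substitution at the first position.
seq_a = "ACGT" * 5
seq_b = "C" + seq_a[1:]
toy_k = sliding_divergence(seq_a, seq_b, window=10, step=5)
```

Plotting the returned K values against the window midpoints gives a profile of the kind described for Figure 7, in which peaks mark candidate mutational hotspots.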

**Figure 6.** Sequence identity plot comparing the three chloroplast genomes with *M. cochinchinensis* as a reference by using mVISTA. Grey arrows and thick black lines above the alignment indicate genes with their orientation and the position of their IRs, respectively. A cut-off of 70% identity was used for the plots, and the Y-scale represents the percent identity ranging from 50% to 100%.

**Figure 7.** Sliding window analyses of the three whole chloroplast genomes. X-axis: position of a window. Y-axis: sequence divergence between species of each window. K(a): K values between *M. bibracteolatus* and *M. tricolor*; K(b): K values between *M. bibracteolatus* and *M. cochinchinensis*; K(c): K values between *M. tricolor* and *M. cochinchinensis*.
