## *2.4. Repeat Analysis*

## 2.4.1. Long Repeats

Repeat sequences in the chloroplast genomes of the four Capparaceae species were identified with the REPuter program using default settings. Forward, reverse, palindromic and complement repeats were detected in all four cp genomes (Figure 3). *C. farinosa*, *C. glandulosa*, *M. crassifolia* and *M. oblongifolia* contained 25, 26, 18 and 24 palindromic repeats, 12, 12, 14 and 13 forward repeats, 9, 8, 16 and 11 reverse repeats and 3, 3, 1 and 1 complement repeats, respectively (Figure 3 and Tables S9–S12), giving a total of 49 repeats in each of the four chloroplast genomes. Most repeats were 20–29 bp long. In *C. farinosa*, 20–29 bp repeats were the most common (69.38%), followed by 10–19 bp (22.44%) and 30–39 bp (4.08%), whereas 40–49 bp and 60–69 bp repeats were the least common, at 2.04% each. In *C. glandulosa*, 20–29 bp repeats were the most common (87.75%), followed by 30–39 bp (6.12%), whereas 10–19 bp, 40–49 bp and 60–69 bp repeats were the least common, at 2.04% each. In *M. crassifolia*, 20–29 bp repeats were the most common (48.97%), followed by 10–19 bp (38.77%), 50–59 bp (6.12%) and 40–49 bp (4.08%), whereas 30–39 bp repeats were the least common, at 2.04%. In *M. oblongifolia*, 20–29 bp repeats were the most common (65.30%), followed by 10–19 bp (26.53%) and 50–59 bp (4.08%), whereas 30–39 bp and 40–49 bp repeats were the least common, at 2.04% each. By location, the codon region harbored 42.85% of the repeats in *C. farinosa*, *M. crassifolia* and *M. oblongifolia* and 34.69% in *C. glandulosa*; tRNA genes contained 7 repeats (14.28%) in *C. farinosa*, 8 repeats (16.32%) in *C. glandulosa*, 9 repeats (18.36%) in *M. crassifolia* and 10 repeats (20.40%) in *M. oblongifolia*; and further repeats were located in the protein-coding genes: 7 repeats (14.28%) in *C. farinosa* and *C. glandulosa*, 6 repeats (12.24%) in *M. crassifolia* and 12 repeats (24.48%) in *M. oblongifolia*. The length of repeated sequences in the four Capparaceae chloroplast genomes ranged from 10 to 59 bp, analogous to the lengths in other angiosperm plants [62–64].
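The four repeat categories reported above can be illustrated with a minimal Python sketch. It is not the REPuter algorithm; it only shows how two equal-length sequences relate under each category, and how the per-genome totals and percentages follow from the counts given in the text. The names `classify_repeat`, `percentage` and `REPEAT_COUNTS` are hypothetical and introduced here for illustration.

```python
from typing import Optional

# Watson-Crick complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def classify_repeat(first: str, second: str) -> Optional[str]:
    """Return how `second` relates to `first`, or None if it is no exact repeat.

    Checks run in a fixed order, so a pair that satisfies several definitions
    (e.g. a self-complementary sequence) is reported under the first match.
    """
    first, second = first.upper(), second.upper()
    if second == first:
        return "forward"                      # same sequence, same strand
    if second == first[::-1]:
        return "reverse"                      # reversed, same strand
    if second == first.translate(COMPLEMENT):
        return "complement"                   # complemented, not reversed
    if second == first.translate(COMPLEMENT)[::-1]:
        return "palindromic"                  # reverse complement
    return None

def percentage(count: int, total: int) -> float:
    """Share of `count` in `total`, in percent."""
    return 100.0 * count / total

# Repeat counts per species as reported in the text (P, F, R, C).
REPEAT_COUNTS = {
    "C. farinosa":     {"P": 25, "F": 12, "R": 9,  "C": 3},
    "C. glandulosa":   {"P": 26, "F": 12, "R": 8,  "C": 3},
    "M. crassifolia":  {"P": 18, "F": 14, "R": 16, "C": 1},
    "M. oblongifolia": {"P": 24, "F": 13, "R": 11, "C": 1},
}

if __name__ == "__main__":
    for species, counts in REPEAT_COUNTS.items():
        print(species, sum(counts.values()))
    print(classify_repeat("AACGT", "ACGTT"))
```

Summing each row reproduces the 49 repeats per genome, and `percentage(7, 49)` corresponds to the 14.28% reported for 7 repeats (the text truncates rather than rounds the second decimal).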

**Figure 3.** Number of different repeats in the chloroplast genomes of four species of Capparaceae. P = palindromic, F = forward, R = reverse and C = complement.

## 2.4.2. Simple Sequence Repeats (SSRs)

SSRs, or microsatellites, are short tandem repeats of 1–6 bp motifs that are used as a tool to assess molecular diversity [65]. As molecular markers, SSRs are valuable for studying genetic variation within and among species and contribute to species recognition [66–68]. In this study, 249 microsatellites were found in the plastid genome of *C. farinosa*, 251 in *C. glandulosa*, 227 in *M. crassifolia* and 233 in *M. oblongifolia* (Table 3). The majority of SSRs in the cp genomes of *C. farinosa*, *C. glandulosa*, *M. crassifolia* and *M. oblongifolia* are mononucleotide repeats (88.75%, 89.24%, 90.74% and 90.12%, respectively), most of which are poly T and poly A (Figure 4). Polythymine (poly T) constituted 50.60%, 52.19%, 51.98% and 52.78% of the SSRs, respectively, whereas polyadenine (poly A) constituted 37.75%, 36.65%, 37.88% and 36.48%, respectively. A single polycytosine (poly C) repeat was present in *C. farinosa* (0.40%) and *M. oblongifolia* (0.42%), whereas two (0.88%) were present in *M. crassifolia*; a single polyguanine (poly G) repeat was present in *C. glandulosa* (0.39%) and *M. oblongifolia* (0.42%). Among the dinucleotides, AT/AT, AC/GT and AG/CT were found in all genomes. When complementary motifs are counted together, only one trinucleotide (AAT/ATT), six tetranucleotides (AAAC/GTTT, AAAG/CTTT, AAAT/ATTT, AATT/AATT, AACT/AGTT and AGAT/ATCT) and five pentanucleotides (AAAAT/ATTTT, AAATT/AATTT, AACAT/ATGTT, AAACT/AGTTT and AATAG/ATTCT) were found in the genomes, and no hexanucleotide repeat was present (Figure 4). A high abundance of poly A and poly T mononucleotide repeats has been observed in the cp genomes of most flowering plants [62].
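As an illustration of how SSRs of 1–6 bp motifs can be located and how a motif is grouped with its complementary partner (for example AAT/ATT), the following is a minimal Python sketch. It is not the tool used in this study; the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for the example, and the function names are hypothetical.

```python
import re

# Hypothetical minimum number of tandem copies per motif length
# (assumed thresholds for illustration, not taken from this study).
MIN_REPEATS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))

def smallest_rotation(motif):
    """Lexicographically smallest cyclic rotation of a motif."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))

def canonical_motif(motif):
    """Label a motif together with its reverse complement, e.g. 'AAT/ATT'."""
    revcomp = motif.translate(COMPLEMENT)[::-1]
    return "/".join(sorted((smallest_rotation(motif),
                            smallest_rotation(revcomp))))

def find_ssrs(sequence):
    """Return (start, end, motif_label) for each perfect SSR found."""
    sequence = sequence.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of `size` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' counted as a dinucleotide
                hits.append((match.start(), match.end(),
                             canonical_motif(motif)))
    return sorted(hits)

if __name__ == "__main__":
    demo = "GC" + "T" * 10 + "GCGA" + "AT" * 5 + "CG"
    for hit in find_ssrs(demo):
        print(hit)
```

Grouping a motif with its reverse complement is what makes a poly T run and a poly A run fall under the same A/T class, which is why the motif labels in the text are written as pairs.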


**Table 3.** Simple sequence repeats in the *C. farinosa*, *C. glandulosa*, *M. crassifolia* and *M. oblongifolia* chloroplast genomes.

**Figure 4.** Frequency of different SSR motifs in different repeat types in *C. farinosa*, *C. glandulosa*, *M. crassifolia* and *M. oblongifolia* chloroplast genomes.

A comparison of SSRs among the chloroplast genomes of the four Capparaceae species (Figure 5) showed that mononucleotide repeats were the most frequent type in all genomes. *C. glandulosa* had the largest number of mononucleotide repeats (224) but, unlike the other three species, no pentanucleotide repeats. Hexanucleotide repeats were absent from all four species.

## *2.5. Comparative Analysis of the Capparaceae Species Cp Genome*

To analyze DNA sequence divergence among the chloroplast genomes of five species of Capparaceae, the sequences were aligned and compared using the mVISTA program. The four chloroplast genomes sequenced in this study were aligned together with the chloroplast genome of *Capparis versicolor* (MH142726), available in GenBank, and the annotation of *C. farinosa* was used as the reference for the structural characteristics of the cp genomes. The alignment reveals highly conserved genomes with few variations. As in most angiosperm chloroplast genomes, non-coding regions were less conserved than coding regions (Figure 6). Among the five cp genomes, *trnH*(*GUG*)-*psbA*, *rps16*-*trnQ*, *psbI*-*trnS*, *trnS*-*trnR*, *petN*-*psbM*, *psbM*-*trnD*, *trnE*-*trnT*, *trnS*-*trnG*, *trnT*-*trnL*, *trnF*-*ndhJ*, *rbcL*-*accD*, *psbE*-*petL*, *rbs16*-*rbs3* and *ndhF*-*rpl32* were the most divergent non-coding regions. Some variation was also detected in the following genes: *atpF*, *rpoC2*, *rps19* and *ycf1*.
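The percentage identity plotted in such comparisons can be illustrated with a minimal Python sketch that scores fixed windows of a pairwise alignment. This is a simplified illustration, not mVISTA's implementation; the function name `windowed_identity`, the window and step sizes, and the rule that gap columns count as mismatches are all assumptions made for the example.

```python
def windowed_identity(ref_aln, other_aln, window=100, step=100):
    """Percent identity per window for two aligned sequences.

    Both inputs must be rows of the same alignment (equal length, '-' = gap).
    A column counts as identical only if both rows carry the same base;
    columns containing a gap count as mismatches (an assumption of this sketch).
    Returns a list of (window_start, percent_identity) tuples.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(ref_aln), step):
        columns = list(zip(ref_aln[start:start + window].upper(),
                           other_aln[start:start + window].upper()))
        matches = sum(a == b and a != "-" for a, b in columns)
        results.append((start, 100.0 * matches / len(columns)))
    return results

if __name__ == "__main__":
    # Toy alignment: one substitution in the first window, one gap in the second.
    print(windowed_identity("ACGTACGTAC", "ACGTTCGTA-", window=5, step=5))
```

Windows that fall in divergent intergenic spacers would score lower than windows in conserved coding regions, which is the pattern described for Figure 6.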

Although angiosperms retain the structure and size of the chloroplast genome [68], evolutionary events such as expansion and contraction occur in the genome and alter its size and the boundaries of the LSC, SSC, IRa and IRb regions [69,70]. We compared the IR-LSC and IR-SSC boundaries of the five cp genomes of Capparaceae (*Cadaba farinosa*, *Cadaba glandulosa*, *Maerua crassifolia*, *Maerua oblongifolia* and *Capparis versicolor*); the boundaries were similar among the plastomes of the *Cadaba* and *Maerua* species, with slight variation in *C. versicolor* (Figure 7). The chloroplast genome of *C. versicolor* (155,051 bp) was the smallest, whereas the genome of *C. glandulosa* (156,560 bp) was the largest. The smallest IR region is in *C. versicolor* (26,141 bp). The lengths of the LSC regions varied among the five Capparaceae species (85,565 bp, 85,681 bp, 84,624 bp, 84,153 bp and 84,315 bp, respectively). The *rps19* gene lies at the junction of the LSC and IRb regions in four species and within the LSC region in *C. versicolor*. The *ycf1* gene is located in the IRb region, except in *C. versicolor*, and it crosses the SSC/IRa junction and extends into the SSC region by different lengths (4360 bp in *C. farinosa* and *C. glandulosa*, 4393 bp in *M. crassifolia*, 4414 bp in *M. oblongifolia* and 4566 bp in *C. versicolor*). The *ndhF* gene spans the IRb/SSC junction: it occupies 38 bp of the IRb region in *C. farinosa* and *C. glandulosa*, 32 bp in *M. crassifolia* and 35 bp in *M. oblongifolia*, and extends into the SSC region by 2209 bp in *C. farinosa* and *C. glandulosa* and by 2206 bp in *M. crassifolia* and *M. oblongifolia*, whereas in *C. versicolor* it lies 174 bp away from the border.
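The quadripartite structure implies a simple length relation, written here as a worked check rather than as a result of the study. The symbols $L$ are introduced only for illustration, and the two IR copies are assumed to be of equal length:

$$
L_{\text{genome}} = L_{\text{LSC}} + L_{\text{SSC}} + 2\,L_{\text{IR}}
$$

With the values reported for *C. versicolor* ($L_{\text{genome}} = 155{,}051$ bp, $L_{\text{IR}} = 26{,}141$ bp), the two single-copy regions together account for $155{,}051 - 2 \times 26{,}141 = 102{,}769$ bp. Because each IR copy enters the sum twice, an expansion of the IR by $n$ bp at the expense of a single-copy region increases the total genome length by $n$ bp.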

**Figure 6.** Alignment of the chloroplast genomes of *C. farinosa*, *C. glandulosa*, *M. crassifolia*, *M. oblongifolia* and *C. versicolor*, performed with *C. farinosa* as the reference. Gray arrows at the top indicate the direction of transcription; blue bars represent protein-coding regions, pink bars represent conserved non-coding sequences (CNS) and light green bars represent tRNAs and rRNAs. The x-axis shows the coordinates in the cp genome, and the y-axis shows the percentage identity, ranging from 50% to 100%.

**Figure 7.** Comparison of the IR, SSC and LSC junction positions among five chloroplast genomes of Capparaceae.
