**3. Discussion**

Due to the plasticity in corolla shape and color, *Utricularia amethystina* is one of the most polymorphic species within the genus *Utricularia*. This polymorphism has resulted in a historically complicated taxonomic group whose systematics remains only partially resolved. Over the last decades, several efforts have attempted to separate the different *U. amethystina* morphotypes into distinct species, yet without much success [7,12,13].

In this study, we analyzed the cpDNAs of three morphologically distinct *Utricularia amethystina* specimens from different populations (the purple, white, and yellow morphotypes), aiming to detect intra- and interspecific variations and phylogenetic signals and to provide new cpDNA regions for evolutionary studies. In addition, we evaluated transcription and RNA editing sites for the *U. amethystina* populations.

To reduce bias from environmental conditions, we collected the specimens from nearby populations (~2.8 km between purple and white, 0.2 km between white and yellow, and 2.82 km between purple and yellow). The cpDNAs of the *U. amethystina* specimens have the typical quadripartite structure present in most land plants, with an organization and GC content similar to those of other *Utricularia* [24,27].

Among the three *Utricularia amethystina* morphotypes, we found an inversion between the *pet*N and *psb*M genes in *U. amethystina* yellow, representing the first known gene inversion in the LSC region identified in Lentibulariaceae chloroplast genomes. The same inversion has been detected in the chloroplast genomes of Cannabaceae species [30], and short microstructural inversions of 10 bp have also been found in the *pet*N-*psb*M region in *Solanum* species [31]. Comparative cpDNA studies have also identified structural mutations in monilophyte chloroplast genomes, including as many as six inversions and some gene losses (e.g., [32–34]).

In general, chloroplast deletions and gene losses are observed among Lentibulariaceae. For example, *Utricularia reniformis* underwent a major contraction of the SSC region due to the loss of NAD(P)H-dehydrogenase (*ndh*) complex genes [27]. In contrast, all other sequenced *Utricularia* cpDNAs have complete sets of *ndh* genes. These chlororespiratory genes (*ndh*A, B, C, D, E, F, G, H, I, J, and K) encode subunits of the NADH dehydrogenase complex that play a role in plant signaling in the photosynthesis reaction [35] and in the reduction and oxidation of plastoquinones [36]. As *U. reniformis* is a terrestrial species, it has been proposed that possibly all terrestrial species of *Utricularia* lack members of the *ndh* gene complex [27,28]. However, *U. amethystina* is terrestrial, and all three morphotypes retain all plastid *ndh* complex genes. Therefore, our results suggest that the *ndh* genes in terrestrial *Utricularia* were independently lost and regained, thus refuting the hypothesis (at least for *Utricularia*) that terrestrial species have experienced the loss of *ndh* genes.

Chloroplast repeats are important regions for replication and DNA stability [37]. Microsatellites, or SSRs, are tandem repeats of units 1–6 base pairs long that can be used as genetic markers [38]. They are commonly found in plants and vary genetically in the number of tandem repeat units. Because this polymorphism is detectable as banding patterns with PCR-based methods and by genotyping, SSRs are widely used in population genetics and evolutionary studies [39]. *Utricularia amethystina* has a high number of mononucleotide repeats among its cpSSRs, which is similar to other angiosperms, such as *Arabidopsis thaliana* [40], and other Lentibulariaceae [24,25,27]. Previous results indicated that most SSRs were found in coding regions for *U. gibba*, *U. macrorhiza*, *Genlisea margaretae*, and *Pinguicula ehlersiae*. However, for *U. reniformis*, more cpSSRs were found in non-coding regions. In *U. amethystina*, long repeats occur in similar quantities between populations, and, as seen in other *Utricularia*, most of them are in coding regions [24,27]. This is uncommon for other angiosperm chloroplast genomes (e.g., see [41]) and could indicate high rates of recombination and rearrangement, as discussed in Silva et al. (2016) [27]. Although long repeats could cause gene rearrangements, we could not find repeats in the flanking regions of the genes *psb*M and *pet*N in *U. amethystina* yellow. This indicates that other evolutionary forces were involved in the observed inversion of these genes in this species.
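As an illustration of how perfect SSRs of 1–6 bp units can be located in a sequence, the following is a minimal sketch, not the procedure used in this study. The function names and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for the example; the thresholds applied in the actual analysis are not stated in this section.

```python
import re

# Hypothetical minimum repeat counts per unit length (1-6 bp).
# These thresholds are illustrative assumptions, not the study's settings.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, n_min in sorted(min_repeats.items()):
        # One captured unit followed by at least n_min - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n_min - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "AA" x 5, which is already reported as "A" x 10.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence: a 12-bp poly-A run and six copies of "AT".
example = "GGC" + "A" * 12 + "CGC" + "AT" * 6 + "GCC"
ssrs = find_ssrs(example)
```

On the toy sequence, the sketch reports one mononucleotide and one dinucleotide repeat with 0-based start positions. Intersecting such positions with gene annotations would then separate coding from non-coding SSRs.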

For some *Utricularia*, DNA barcoding has been considered difficult to perform. For instance, DNA-barcoding markers such as ITS, *rbc*L, and *mat*K could not discriminate all *Utricularia* accessions at the species and population level due to their low level of polymorphism (e.g., *Utricularia* sect. *Utricularia* in Astuti et al., 2019 [42]). Furthermore, the *rps*16, *trn*L-F, and *trn*D-T markers cannot discriminate *U. amethystina* populations [13]. Therefore, it is important to explore regions with high variability at inter- and intraspecific levels that represent potentially useful markers for future studies. In the mVISTA results for the interspecific divergence analysis, the LSC and SSC regions are noticeably more variable than the IR regions, corroborating the results of identity analyses with *Genlisea* species [25] and other angiosperms [41]. The results showed highly variable regions between the different species, mostly represented by intergenic spacers, such as *trn*H-*psb*A, *trn*K-*rps*16, and *rps*16-*trn*Q, which could be used for interspecies identification.

It has previously been proposed that populations from closely related environments should be less divergent if they belong to the same species. However, we observed high intraspecific chloroplast sequence variability, although the geographical sampling covered a restricted area. Among the regions with high nucleotide diversity and intraspecific variation is the intergenic spacer *trn*H-*psb*A, which is already used as a DNA barcode in many studies [43]. This study also revealed regions that can be used for population and phylogenetic analyses due to their high variability, such as the genes *trn*H and *psb*A and intergenic regions such as *trn*H-*psb*A, *ycf*3-*trn*S, and *rps*16-*trn*Q (see more in the Results section and Supplementary Material S6). Nevertheless, the spots of diversity near the genes *pet*N and *psb*M should be avoided, because primers may anneal poorly where the region is inverted, as seen in *U. amethystina* yellow.
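To make the notion of nucleotide diversity concrete, the sketch below computes the mean pairwise proportion of differing sites (π) for a set of aligned sequences, both overall and in sliding windows. It is a minimal illustration under stated assumptions: the function names, the toy alignment, and the window and step sizes are hypothetical, and the software and settings used in this study are not given in this section.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites among aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(aligned) < 2:
        raise ValueError("at least two sequences are required")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    per_pair = []
    for a, b in combinations(aligned, 2):
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        if compared:
            per_pair.append(diffs / compared)
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


def sliding_pi(aligned, window, step):
    """Return (window_start, pi) for each full window along the alignment."""
    length = len(aligned[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in aligned]))
        for start in range(0, length - window + 1, step)
    ]


# Toy alignment of three sequences (hypothetical data, one variable site).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC"]
overall = nucleotide_diversity(toy)
windows = sliding_pi(toy, window=5, step=5)
```

Windows with high π relative to the rest of the alignment are the kind of region that would be flagged as a candidate marker.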

The paired-end libraries were enriched for polyadenylated transcripts. Because polyadenylation destabilizes organelle transcripts, chloroplast transcripts are probably underrepresented [44]. Nevertheless, we observed that almost all chloroplast protein-coding genes are expressed in all sampled flower tissues of *U. amethystina*, except for *ycf*15, both duplicated *rpl*23 genes in *U. amethystina* purple and yellow, and the *atp*F gene in *U. amethystina* purple (Supplementary Table S7).

The expression profile is similar between samples of the same morphotype, and the clustering of expression profiles even corresponds to the phylogenetic hypothesis proposed in this research. The *rbc*L gene, one of the most highly expressed genes, encodes the large subunit of ribulose-1,5-bisphosphate carboxylase, one of the most abundant enzymes in nature [45]. This protein is involved in CO2 fixation and photorespiration [46]. Moreover, high levels of gene expression were found for Photosystem I (PSI) and II (PSII) genes, such as *psa*A and *psa*B, and *psb*A, *psb*B, *psb*C, and *psb*D, whose proteins are involved in photosynthesis [47]. Studies of barley leaves showed that dark-grown plants were deficient in PSI and PSII proteins [48]. Moreover, Klein et al. (1988) showed that translation elongation of *psa*A, *psa*B, *psb*A, and *rbc*L is regulated by light [49,50]. Therefore, considering that the corollas were collected during the day, our results are congruent with the hypothesis of light-induced translation of these proteins.

Interestingly, the *pet*N and *psb*M genes are expressed in all *Utricularia amethystina* biological samples, indicating that the inversion observed in *U. amethystina* yellow did not affect the expression of these genes.

RNA editing sites are common features of plant chloroplasts. These modifications usually convert C to U in mRNA molecules and thus can change the encoded amino acid, so that different proteins can originate from the same gene [51]. RNA-Seq-based results showed a total of eight editing sites across all *U. amethystina* morphotypes (Table 2).

The PREPACT3 prediction showed that most nonsynonymous substitutions were Alanine to Valine and Serine to Leucine. Both lead to protein variations (Table 2, Tables S8–S10), whereas amino acid changes from Alanine to Valine, Histidine to Tyrosine, Leucine to Phenylalanine, Proline to Phenylalanine, Proline to Leucine, Proline to Serine, Arginine to Tryptophan, Threonine to Isoleucine, and Threonine to Methionine result in no change in the physicochemical properties of the protein. In addition, the Arginine to Cysteine, Serine to Leucine, and Serine to Phenylalanine substitutions can modify protein formation because they replace hydrophilic residues (Serine and Arginine) with hydrophobic ones (Leucine, Phenylalanine, and Cysteine) [52,53]. Moreover, PREPACT3 predicted that the genes *rps*2 and *rpl*32 can be edited from a Glutamine codon into a stop codon; although these could be polycistronic genes, as in other plants [54], they are still transcribed according to the RNA-Seq data.
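The amino acid consequences of C-to-U editing listed above follow directly from the codon table. The sketch below is a minimal illustration and not a reimplementation of PREPACT3: the function name `edit_c_to_u` and the example codons are assumptions chosen for the example, and the standard codon assignments are used.

```python
# Standard codon assignments, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def edit_c_to_u(codon, position):
    """Apply one C-to-U edit at a 0-based codon position.

    Returns (amino_acid_before, amino_acid_after); '*' marks a stop codon.
    U is written as T so that DNA and RNA input are treated alike.
    """
    codon = codon.upper().replace("U", "T")
    if len(codon) != 3 or codon[position] != "C":
        raise ValueError("expected a 3-base codon with C at the edited position")
    edited = codon[:position] + "T" + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[edited]


# Hypothetical example codons, one per type of outcome discussed in the text.
ser_to_leu = edit_c_to_u("TCA", 1)   # second-position edit
ala_to_val = edit_c_to_u("GCT", 1)   # second-position edit
gln_to_stop = edit_c_to_u("CAA", 0)  # first-position edit creates a stop codon
silent = edit_c_to_u("CTC", 2)       # third-position edit, same amino acid
```

A change such as Proline to Phenylalanine requires edits at two positions of the same codon (for example CCT to TTT), which corresponds to two successive calls to the function.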

The evolutionary history presented here, based on whole chloroplast genomes and reconstructed by ML and BI approaches, supported the same relationships within the Lentibulariaceae as analyses based on one or a few loci (Figure 8) [2,55]. These analyses and many other studies indicated that *Utricularia amethystina* can be paraphyletic [13]. However, in this study, despite the differences among the specimens, they still form a monophyletic group. This indicates that the *U. amethystina* morphotypes share a common ancestry, but sampling other species from sect. *Foliosa* (*U. tricolor* and *U. tridentata*) and species from the closely related sect. *Psyllosperma* would be necessary to shed light on this issue.

Sampling three morphotypes, with one individual per morphotype, proved insufficient to allow firm conclusions on the separation of *U. amethystina* into species. However, the scenario presented here based on chloroplast genomes suggests that the *U. amethystina* morphotypes may be different species, as previous studies based on a morphometric approach [12] and on phylogeny with few loci [13], but with more populations, have suggested.

Moreover, the comparative and functional analyses provided by this study bring new insights into *Utricularia* chloroplast genome architecture, in particular the evolutionary history of the *ndh* complex genes and other important photosynthesis-related genes. Taken together, these results show that we are only beginning to understand the evolution of the chloroplast photosynthesis machinery in the Lentibulariaceae.
