Next Article in Journal
Liquid Chromatography‒Tandem Mass Spectrometry Analysis of Primary Metabolites and Phenolic Acids Across Five Citrus Species
Previous Article in Journal
Oncogene-Induced Senescence Transcriptomes Signify Premalignant Colorectal Adenomas
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Complete Chloroplast Genome Sequences of Three Canna Species: Genome Characterization, Comparative Analyses, and Phylogenetic Relationships Within Zingiberales

1
Jiangsu Key Laboratory for the Research and Utilization of Plant Resources, Institute of Botany, Jiangsu Province and Chinese Academy of Sciences (Nanjing Botanical Garden Mem. Sun Yat-Sen), Nanjing 210014, China
2
Jiangsu Engineering Research Center of Aquatic Plant Resources and Water Environment Remediation, Nanjing 210014, China
3
College of Horticulture, Nanjing Agricultural University, Nanjing 210095, China
4
College of Biology and the Environment, Nanjing Forestry University, Nanjing 210037, China
*
Author to whom correspondence should be addressed.
Curr. Issues Mol. Biol. 2025, 47(4), 222; https://doi.org/10.3390/cimb47040222
Submission received: 12 February 2025 / Revised: 13 March 2025 / Accepted: 13 March 2025 / Published: 25 March 2025
(This article belongs to the Section Molecular Plant Sciences)

Abstract

:
Canna, the sole member of the Cannaceae family, is widely cultivated as an ornamental plant for its decorative flowers and foliage and is also a potential tuber crop due to its high starch content. This study sequenced, assembled, and analyzed the complete chloroplast (cp) genomes of three common Canna species with distinct leaf colors (green, purple, and variegated). The four cp genomes ranged from 164,427 to 164,509 bp in length, had a GC content of 36.23–36.25%, and exhibited identical gene content and codon preferences. Each genome contained 130 genes, including 110 unique genes (78 protein-coding genes, four of unknown function, four rRNAs, and 28 tRNAs), 18 duplicated genes located in the IR regions (six protein-coding genes, two of unknown function, four rRNAs, and eight tRNAs), and two trnM-CAU genes in the LSC region. SSR and long-repeat showed differences in long repeats numbers and distributions among the four cp genomes, highlighting potential molecular markers for Canna species identification and breeding. Comparative analysis showed high conservation across Canna cp genomes. Phylogenetic analysis confirmed a close relationship between Cannaceae and Marantaceae and supported a [Musaeceae (Cannaceae + Marantaceae)] clade as a sister group to Costaceae. The cp genome data generated in this study provide valuable insights for developing molecular markers, resolving taxonomic classifications, and advancing phylogenetic and population genetic studies in Canna species.

1. Introduction

Canna, or canna lily, native to the American tropics, is a well-known ornamental plant valued for its brilliantly colored flowers and large tropical foliage. It is widely used in landscapes, gardens, parks, and patios [1,2]. Canna is the only genus in the family Cannaceae, which belongs to the order Zingiberales. The genus Canna comprises about 51 species, 10 of which are native to America. C. glauca, C. indica, C. iridiflora, C. warscewiczii, and C. flaccid are recognized as the basal species responsible for the origin of garden cannas [1].
Canna is also used in constructed wetlands for water and wastewater treatment due to its high biomass production, rapid growth, and fibrous root structure. The aerenchyma in Canna delivers oxygen to the rhizosphere, facilitating bacterial nitrification processes [3,4,5]. In addition to removing environmental nutrients, Canna-planted constructed wetlands efficiently eliminate toxic contaminants such as fluoride, heavy metals, pesticides, pharmaceuticals, and industrial chemicals [3,4,5,6,7,8]. Canna root starch has long been used in the food industry, e.g., bakery [9,10]. It has been reported that 49.07% of Canna root starch is classified as high-amylose starch [10]. Additionally, Canna contains various phytochemicals and exhibits antibacterial, antiviral, anti-inflammatory, analgesic, antioxidant, and other medicinal properties [11,12].
As an ornamental plant valued for its large foliage, Canna exhibits leaves in various colors, including green, ruby, purplish, and multicolored forms [2]. Chlorophyll in chloroplasts (cp) is the primary factor influencing leaf color; therefore, chloroplast development and differentiation are crucial for foliage ornamental plants [13,14]. In Hydrangea macrophylla var. maculata, irregular chloroplast development results in silver-white leaf coloration [15]. In Liquidambar formosana, a reduction in chloroplast number and size contributes to leaf color changes under cold stress [16]. Cp, which possess their own genome independent of the nuclear genome, are semi-autonomous organelles primarily responsible for photosynthesis in plants [17,18]. Additionally, chloroplast function influences plant growth and stress tolerance, both of which are critical for phytoremediation [4,19,20]. In higher plants, Cp also participate in various biological processes, including metabolite synthesis and the assimilation of sulfur and nitrogen. These functions not only affect plant growth but also play a key role in plant stress tolerance [17,21].
Due to its independent genome, not all cp proteins are encoded in the nuclear genome [18]. The cp genome encodes many key proteins involved in photosynthesis and other metabolic processes, highlighting its importance in studying plant light systems, leaf color, and environmental interactions [17,22]. Additionally, cp genomes are commonly used in evolutionary and comparative genomic studies due to their maternal inheritance and highly conserved structure [23,24]. Understanding plant origins and evolution can also aid plant breeding [25]. The difficulty in germinating Canna seeds has limited hybridization efforts. Cp genomes may serve as promising targets for genome editing in plant breeding [1,26]. Advancements in high-throughput sequencing and algorithm innovations have made cp genome assembly and whole-cp-genome-based phylogenomics more accessible and cost-effective [23]. However, despite Canna being a well-known ornamental plant genus with applications in landscaping, medicine, and phytoremediation, only three complete cp genomes have been sequenced and assembled: C. indica (MK561603), C. edulis (MK561602, MN832865) [27]. In this study, we selected three Canna species with different leaf colors—C. edulis L. (green leaves), C. warscewiezii A. Dietr. (purplish leaves), and C. generalis ‘Striata’ (green leaves with yellow stripes)—to investigate potential relationships between cp genome structures and leaf coloration (Figure 1). Four different cp genomes from these species were sequenced using high-throughput sequencing, and the complete cp genome sequences were assembled. Their genomic structures were characterized, and phylogenetic analyses were conducted. Given the multicolored leaves of C. generalis ‘Striata’ cp DNA was separately isolated, sequenced, and analyzed from both the green and golden leaf regions. Additionally, the assembled cp genomes were compared with related species in the Zingiberales order to further explore the phylogenetic position of Cannaceae within Zingiberales. This study primarily characterizes the cp genome structure of Canna species and explores their phylogenetic position within Zingiberales. The findings provide valuable insights for further research on Canna phylogenetic classification, molecular marker development, chloroplast gene discovery, and functional genomics.

2. Materials and Methods

2.1. Plant Material, DNA Extraction and Sequencing

Canna species with different leaf colors (green, purple, and multicolored) were selected for this study. Fresh leaves were collected from C. edulis, C. warscewiezii, and C. generalis ‘Striata’, which were deposited at the Institute of Botany, Jiangsu Province, and the Chinese Academy of Sciences. For C. generalis ‘Striata’, both the green and golden parts were sampled. Total leaf DNA was extracted using the EZgeneTM SuperFast Plant Leaves DNA Kit (Biomiga, San Diego, CA, USA). DNA quality was assessed via spectrophotometry and agarose gel electrophoresis. An average of 350 bp paired-end libraries were prepared using the Illumina TruSeq DNA Sample Prep Kit (Illumina Inc., San Diego, CA, USA) and sequenced on an Illumina NovaSeq 6000 platform following the manufacturer’s protocol.

2.2. Genome Assembly and Annotation

Raw reads were filtered using fastp [28], retaining those with <5% unidentified nucleotides and >50% of bases with a quality score > 20 as high-quality reads. The high-quality reads were then aligned to a reference cp genome database constructed by Genepioneer Biotechnologies (Nanjing, China) to extract cp-like reads using Bowtie2 [29]. The extracted reads were assembled into contigs and scaffolds using the de novo assembler SPAdes v3.15.5 [30] and SSPACE 2.1.1 [31], followed by gap filling with Gapfiller v1.11 [32]. Coding sequences (CDS), rRNA, and tRNA were predicted using Prodigal v2.6.3 [33], HMMER 3v.3.2 [34], and ARAGORN v1.2.41 [35], respectively. The genome map was visualized using OrganellarGenomeDRAW (OGDRAW) [36] [https://chlorobox.mpimp-golm.mpg.de/OGDraw.html, accessed on 17 August 2024] and the R package Chloroplot v0.2.4 [37].

2.3. The Relative Synonymous Codon Usage Analysis (RSCU), Simple Sequence Repeats (SSR) Prediction

Unique CDS were filtered using Perl scripts developed by Genepioneer Biotechnologies (Nanjing, China). The RSCU value was calculated as the ratio of the observed frequency to the expected frequency of a particular codon. Synonymous codon preference was categorized into four models: high preference (RSCU > 1.3), moderate preference (1.2 ≤ RSCU ≤ 1.3), low preference (1.0 < RSCU < 1.2), and no preference (RSCU ≤ 1.0) [38]. Simple sequence repeats (SSRs) were identified using MISA v2.1 [39] with the following search parameters: mono-nucleotide units appearing at least 8 times, di- nucleotide units 5 times, tri-nucleotide units 3 times, and tetra-, penta-, and hexa-nucleotide units 3 times each.

2.4. Sequence Divergence Analyses of the Four Canna Cp Genomes

Using the cp genome of C. edulis (MN832865) as a reference, the four Canna complete cp genomes were compared using mVISTA in Shuffle-LAGAN mode (https://genome.lbl.gov/vista/mvista/submit.shtml, accessed on 17 August 2024) [40,41]. IRscope v0.1.R was used to analyze variations in the LSC/IRb/SSC/IRa region borders [42].

2.5. Phylogenetic Analysis

A phylogenetic tree was constructed based on the four cp genome datasets and 30 additional Zingiberales cp genomes downloaded from GenBank, with Zea mays L. (NC001666) as the outgroup. Genome sequence alignment was performed using MAFFT v7.453 [43,44]. Additionally, phylogenetic trees were generated using MAFFT and MEGA11, applying the Maximum Likelihood method and Tamura-Nei model with 1000 bootstrap replicates [45,46].

3. Results

3.1. Genome Assembly and Structure of the Four Canna Species Cp Genomes

Four different cp genomes of Canna species (C. with green leaves, C. warscewiezii A. Dietr. with purplish leaves, and C. generalis ‘Striata’ with green leaves and golden stripes) were sequenced. Then, the complete circular chloroplast genomes of them were assembled. The assembled cp genomes were similar in size: C. edulis (164,479 bp), C. warscewiezii (164,509 bp), C. generalis ‘Striata’ (green, 164,427 bp), and C. generalis ‘Striata’ (yellow, 164,479 bp). Each cp genome exhibits a typical quadripartite structure, consisting of a large single-copy (LSC) region, a small single-copy (SSC) region, and a pair of inverted repeat (IR) regions (Table 1). The overall GC content ranges from 36.23% to 36.25%, with the IR regions having the highest GC content (42.36–42.48%) and the SSC region the lowest (30.14–30.23%).

3.2. Gene Annotation of the Four Canna Species Cp Genomes

The predicted genes of the four cp genomes were assembled and annotated (Figure S1). Notably, all four genomes share an identical structure. Each genome contains 130 predicted genes, including 110 unique genes (78 protein-coding genes, including four of unknown function, four rRNAs, and 28 tRNAs), 18 duplicated genes (six protein-coding genes, including two of unknown function, four rRNAs, and eight tRNAs) located in the IR regions, and two trnM-CAU genes in the LSC region (Figure 2, Table 1). Most genes lack introns, except for nine protein-coding genes and six tRNA genes, which contain a single intron, and three protein-coding genes, which contain two introns. A total of 18 genes with introns are shared with many other plants [47]. Gene function analysis classified the genes into four categories: 44 associated with photosynthesis, 56 involved in self-replication, six with other functions, and four of unknown function (Table 2). trnK-UUU has the longest intron (2765 bp), which contains matK. This finding is consistent with those in the cp genomes of Ilex, Zingiber, Cymbidium, and Forsythia [48].

3.3. Codon Preference Analysis

The four cp genomes in this study exhibit the same codon preference pattern, with three termination codons and 63 codons encoding amino acid sequences in genes. Only tryptophan encoding shows no codon preference (Figure 3, Table S1). Among all codons, 31 preferred codons were identified—30 encoding 18 amino acids and one being a stop codon. An RSCU value of 1.0 < RSCU < 1.2 indicates a weak preference, 1.2 ≤ RSCU ≤ 1.3 represents a moderate preference, and RSCU > 1.3 signifies a strong preference [38]. A total of 66 codon types (25,987 codons in total) encoded 20 different amino acids, with 31 codons having RSCU > 1. Three codons showed weak preference (9.68%), five exhibited moderate preference (16.13%), and 23 displayed strong preference (74.19%). Similarly to many other cp genomes, most preferred codons ended with A or T, except for TTG [48].

3.4. Repeat Structure and SSR Analysis

Simple sequence repeats (SSRs) loci in the four cp genomes were identified, revealing similar SSR distribution patterns (Figure 4). A total of 181 SSR loci were identified in C. edulis and C. generalis ‘Striata’, whereas C. warscewiezii had 180, one fewer than the other three cp genomes. Among the SSRs, 43, 44, 44, and 43 complex SSRs with multiple repeats were found in C. edulis, C. warscewiezii, C. generalis ‘Striata’ (green part), and C. generalis ‘Striata’ (yellow part, respectively. The number of complex SSRs was significantly higher than in other plant species such as Ilex and Eremochloa species [48,49]. Among perfect SSRs, 84, 83, 84, and 84 mononucleotide repeats were identified in C. edulis, C. warscewiezii, C. generalis ‘Striata’ (green part), and C. generalis ‘Striata’ (yellow part, respectively. The number of dinucleotide SSRs was 7, 6, 6, and 7. The trinucleotide and tetranucleotide SSRs were consistent across cp genomes, with 42 trinucleotide and five tetranucleotide repeats each. Most mononucleotide repeats consisted of polyadenine (poly A) and polythymine (poly T), with only a few polyguanine (poly G) and polycytosine (poly C) repeats (seven in each cp genome). All dinucleotides in the four cp genomes were AT/TA repeat motifs, while GA/TC repeat motifs occurred only in complex SSR loci. Pentanucleotide repeats were identified exclusively in complex SSR loci, with one AAAAT and one AATTT repeat motif in all four cp genomes and one TATTA repeat motif in C. generalis ‘Striata’ (yellow part). SSR distribution varied across genomic regions. Each cp genome contained 73 SSRs in introns and 23 in coding regions. In intergenic regions, 85 SSRs were detected in C. edulis and C. generalis ‘Striata, whereas C. warscewiezii had 84. More than half of the SSRs were located in LSC regions (109, 108, 110, and 110 in each cp genomes, respectively).
The long-dispersed repeats exhibit greater differences among the four cp genomes (Figure 5) than SSRs. Long repeats can be classified into four types based on direction and complementarity: forward, reverse, complement, and palindromic. In the cp genomes of C. edulis and C. warscewiezii, 129 long repeats were identified. The C. generalis ‘Striata’ yellow part cp genome contains 125 long repeats, while the green part cp genome has only 119. More than 40 repeats in each cp genome are forward repeats (ranging from 45 to 49), most of which are located in the LSC regions (Table S2). The C. generalis ‘Striata’ cp genomes contain fewer reverse repeats (31 in each) compared to C. edulis and C. warscewiezii (37 and 35, respectively). Similarly to cp genomes in some other plants, most long repeats are 30–40 bp in length [22,48,50].

3.5. Comparative Chloroplast Genome Analysis

The entire cp genomes of C. edulis, C. warscewiezii, and C. generalis ‘Striata’ were compared with the published C. edulis cp genome (MN832865) as a reference using mVISTA (Figure 6). White peaks in the figure indicate sequence variation among Canna species. The results reveal a high degree of sequence conservation between the four cp genomes and the published C. edulis cp genome. Sequence variations were mainly found in non-coding regions rather than coding regions, with only one variable site detected in the exons of ycf2. Although the four materials exhibit different leaf colors, only minor variations were observed between their cp genomes, suggesting that cp genomes have a limited effect on leaf color. However, these variations could be useful for developing DNA markers for Canna species identification or phylogenetic research.

3.6. Expansion and Contraction of IRs

The comparison of the IR boundaries and their adjacent genes in the four cp genomes is presented in Figure 7. Overall, the junctions are highly conserved across all four cp genomes, particularly for JSB (SSC/IRb) and JSA (SSC/IRa). The pseudogenes located at JSA and JSB are both ycf1, and the distances between the gene boundaries and the junctions are identical among all four cp genomes. The distances between ycf1 boundaries and JSB are 1469 bp and 12 bp, while for JSA, they are 3942 bp and 1469 bp. The pseudogene closest to JLB is rps19, and the distances between them vary among the four cp genomes. The shortest distance is 27 bp in the C. edulis cp genome, 57 bp in the C. warscewiezii cp genome, and over 120 bp in both C. generalis ‘Striata’ cp genomes. The trnH pseudogenes are located within the IR regions in all four cp genomes. The distance between trnH and the LSC boundary in the C. edulis cp genome is 260 bp, which is 29 bp longer than in the C. warscewiezii cp genome. In C. generalis ‘Striata’, the distances are shorter: 150 bp for the green part and 166 bp for the yellow part.

3.7. Phylogenetic Relationship Analysis of Cannaceae

The phylogenetic relationships of Canna species and Cannaceae were determined. The four cp genomes generated in this study, along with the published cp genome sequences of 31 Zingiberales species, were aligned using MAFFT. Zea mays (Gramineae) was set as the outgroup. A phylogenetic tree was constructed using the Maximum Likelihood method and the Tamura-Nei model with 1000 bootstrap replicates in MEGA11 (Figure 8). Species within each family clustered into distinct clades. Cannaceae contains only one genus, Canna, in which C. edulis (MK561602, MN832865, and the genome generated in this study), C. warscewiezii, and C. generalis ‘Striata’ formed a closely related group. In contrast, C. indica (MK561603) was placed outside this group, suggesting that cultivars with different leaf colors may have been bred from C. edulis. According to the APG IV system, Cannaceae is classified under the order Zingiberales [51] and is closely related to Marantaceae, consistent with previous research findings [52]. However, the relationships between Cannaceae and other ginger families (Costaceae and Zingiberaceae) were not well supported. It is generally believed that Costaceae and Zingiberaceae form a group close to Cannaceae and Marantaceae. However, in the phylogenetic tree generated in this study based on cp genomes, Cannaceae and Marantaceae were more closely related to Musaceae than to Costaceae or Zingiberaceae. This result aligns with Zhu et al.’s findings using the same cp genome-based approach [27,53]. Additionally, Costaceae was found to be more closely related to Cannaceae than Zingiberaceae.

4. Discussion

4.1. The Structures and Gene Identification in Canna Species Cp Genomes

In land plants, cp genomes are highly conserved in both size and structure. The cp genomes of most land plants range from 120 to 160 kb in length; however, genome size varies from 15,553 bp in Asarum minus to 521,168 bp in Floydiella terrestris [47,54]. The cp genome structures of vascular plants are more dynamic than those of nonvascular plants [47]. In this study, the cp genomes of the four Canna species are approximately 164.5 kb in length, similar to other reported Canna species cp genomes [27] and Zingiberales cp genomes, which range from 161 kb in Zingiber montanum to 173 kb in Musa basjoo. All four Canna cp genomes exhibit a circular structure with a typical quadripartite organization, featuring LSC and SSC regions separated by a pair of IRs, similar to most angiosperms. The expansion and contraction of IRs are key mechanisms driving cp genome size variation. IRs in seed plants are typically larger (20–30 kb) than those in other land plants (10–15 kb) [47]. The IR sizes in the four Canna species range from 27.18 kb (C. generalis ‘Striata’) to 27.28 kb (C. edulis), which are larger than those in many Gramineae species (20–23 kb). However, the LSC and SSC regions in Canna species are also larger than those in Gramineae species [49,55,56]. Additionally, the four cp genomes share similar GC content across the complete genome and different regions.
In land plants, each cp genome typically contains about 80 protein-coding genes, 4 rRNAs, and 30 tRNAs [47,57,58]. The gene structure of the four cp genomes is identical in both gene number and function, with 130 predicted genes categorized into the same functional groups. In these cp genomes, IR expansion has relocated ycf2 and trnH from the LSC into the IR, similar to the IR expansion observed in Acorus species cp genomes [47,59]. Interestingly, in the four Canna species cp genomes, trnM-CAU is not only duplicated in the IR regions but also present in the LSC region with two copies. This duplication pattern, also found in a previously reported Canna cp genome [27], is uncommon in other plant cp genomes such as those of Ilex [48], Gramineae [49], Trifolium [22], or Lespedeza [60]. As ATG is the start codon for methionine (M), the multiple copies of trnM-CAU in Canna species may reflect dynamic translation activity in their chloroplasts.

4.2. Repeat Sequences in Canna Species Cp Genomes

Different types of repeat sequences are major components of plant genomes, accounting for up to 90% of genome size. These repeat sequences are either distributed throughout the genome or confined to specific regions [61,62]. They play multiple roles, including gene expression regulation, genome stability, recombination, and chromatin modulation [63]. In this study, more than 100 dispersed repeats were identified in each of the cp genomes, with most located in the LSC regions, similar to other plant cp genomes such as Ilex dabieshanensis [48] and Eremochloa ophiuroides [64]. A high content of dispersed repeats in the LSC may be a common feature of plant cp genomes.
SSRs are short sequences arranged in tandem repeats of 1 to 6 nucleotide motifs. They are highly abundant and randomly dispersed throughout nuclear and plastid genomes [65]. Due to their high polymorphism and abundance, SSRs are commonly used for marker-assisted selection (MAS) and map-based cloning [66,67]. Previous research has shown that SSRs provide sufficient polymorphism among cp genomes [68]. However, in the four Canna cp genomes, SSRs exhibited similar patterns, with nearly identical numbers (180–181) and repeat type. In contrast, long repeats showed greater variation in type, number, and distribution in types, numbers, and distributions. Canna breeders could utilize long repeats as molecular markers through techniques such as PCR or Sanger sequencing due to their convenience and low cost [66,67]. Most SSR motifs in the four cp genomes are rich in adenine (A) and thymine (T), consistent with previously reported cp SSRs [22,48,64].

4.3. Phylogenetic Analysis via Cp Genomes

Over the past three decades, plant cp genomes have been widely used in evolutionary and phylogenetic research due to their sequence conservation, small and simple genome, and limited horizontal gene transfer [69]. To date, thousands of complete cp genome sequences from over 2000 species have been uploaded to GenBank [47], providing valuable resources for studying phylogenetic relationships in plants [69]. Newly sequenced cp genomes will continue to enhance our understanding of relationships among various orders, families, and genera. In this study, the four cp genomes provide new sequences for research on the evolution of the Cannaceae family and the Zingiberales order. The phylogenetic analysis suggests a clade of (Musaceae (Cannaceae + Marantaceae)) as a sister group to Costaceae, differing from the clade ((Cannaceae + Marantaceae) + Costaceae + Zingiberaceae)) identified using multiplexed exon capture technology, which strongly supports the monophyly of the ginger families [52]. However, similar results have also been reported in previous studies based solely on cp genomes [27]. Conflicts between phylogenetic relationships inferred from cp genomes and those based on nuclear genes or genomes are common. These conflicts may provide insights into significant evolutionary events such as ancient hybridization, polyploidy, and incomplete lineage sorting [69]. Only a few complete cp genomes of the Cannaceae family have been sequenced and assembled, far fewer than those of Musaceae or Zingiberaceae (available in GenBank). Additionally, compared to cp genomes, mitochondrial genomes (mt genomes) vary greatly in both size and structure. Integrating mt genome information could offer new perspectives on taxonomy [70]. However, mt genome data for Zingiberaceae remain limited. To clarify relationships within the Zingiberales order, more cp, mt, and nuclear genomes from the Cannaceae family should be sequenced and uploaded, given the ease of obtaining complete sequences using next-generation sequencing technology [23,47].

5. Conclusions

The complete cp genomes of four Canna species with different leaf colors were sequenced, assembled, and analyzed in this study. All four cp genomes exhibited the same gene prediction pattern and codon preference. Each cp genome contained 130 predicted genes, including 110 unique genes, 18 duplicated genes in the IR regions, and 2 trnM-CAU genes in the LSC region. Codon usage showed a bias toward A/T endings, similar to other plants. Comparative cp genome analyses showed high conservation among the four cp genomes. Dispersed long repeats exhibited greater polymorphism than SSRs, suggesting their potential utility in Canna breeding. These findings further highlight the conserved nature of plant cp genomes. Phylogenetic analysis confirmed the close relationship between Cannaceae and Marantaceae. However, Musaceae was found to be more closely related to Cannaceae and Marantaceae than to Costaceae or Zingiberaceae, providing new insights into their evolutionary history. The cp genome data generated in this study are valuable for germplasm identification, taxonomic clarification, phylogenetic resolution, and genetic improvement of Cannaceae. Additionally, integrating these data with other omics approaches, such as transcriptomics and metabolomics, could provide a more comprehensive understanding of Canna biology and its potential for genetic enhancement.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cimb47040222/s1.

Author Contributions

Conceptualization, Y.C.; methodology, L.S. and J.L.; software, F.L. and L.S.; validation, W.W.; Materials, J.L. and Y.C.; data curation, F.L. and J.L.; writing—original draft preparation, L.S.; writing—review and editing, L.S. and Y.C.; visualization, L.S.; supervision, Y.C.; funding acquisition, Y.C. and D.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Six Talent Peaks Project in Jiangsu Province (TD-JNHB-008), the Projects of Independently Development (BM2018021-7) of Jiangsu Provincial Department of Science and Technology of China, and Open Research Foundation of Jiangsu Key Laboratory for the Research and Utilization of Plant Resources (JSPKLB202301).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available in GenBank (www.ncbi.nlm.nih.gov/, accessed on 4 October 2024.), accession number OR502631, OR502632, OR502633, and OR502634.

Acknowledgments

We are grateful to all lab members for their suggestions, support, and encouragement.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
cpChloroplast
MASMolecular assistant selection
MLMaximum likelihood
LSCLong single-copy
SSCShort single-copy
SSRSimple sequence repeat
IRinverted repeats

References

  1. Srivastava, R.; Punetha, P. Canna. In Floriculture and Ornamental Plants; Datta, S.K., Gupta, Y.C., Eds.; Springer: Singapore, 2021; pp. 1–22. ISBN 9789811515545. [Google Scholar]
  2. Zhao, T.; Pan, X.; Ou, Z.; Li, Q.; Zhang, W. Comprehensive evaluation of waterlogging tolerance of eleven Canna cultivars at flowering stage. Sci. Hortic. 2022, 296, 110890. [Google Scholar] [CrossRef]
  3. Cule, N.; Vilotic, D.; Nesic, M.; Veselinovic, M.; Drazic, D.; Mitrovic, S. Phytoremediation potential of Canna indica L. in water contaminated with lead. Fresenius Environ. Bull. 2016, 25, 7. [Google Scholar]
  4. Pavlineri, N.; Skoulikidis, N.T.; Tsihrintzis, V.A. Constructed floating wetlands: A review of research, design, operation and management aspects, and data meta-analysis. Chem. Eng. J. 2017, 308, 1120–1132. [Google Scholar] [CrossRef]
  5. Karungamye, P.N. Potential of Canna indica in constructed wetlands for wastewater treatment: A review. Conservation 2022, 2, 499–513. [Google Scholar] [CrossRef]
  6. Zhu, H.; Yu, X.; Xu, Y.; Yan, B.; Bañuelos, G.; Shutes, B.; Wen, Z. Removal of chlorpyrifos and its hydrolytic metabolite in microcosm-scale constructed wetlands under soda saline-alkaline condition: Mass balance and intensification strategies. Sci. Total Environ. 2021, 777, 145956. [Google Scholar] [CrossRef]
  7. Li, X.; Zhu, W.; Meng, G.; Zhang, C.; Guo, R. Efficiency and kinetics of conventional pollutants and tetracyclines removal in integrated vertical-flow constructed wetlands enhanced by aeration. J. Environ. Manag. 2020, 273, 111120. [Google Scholar] [CrossRef]
  8. Li, D.; Li, B.; Gao, H.; Du, X.; Qin, J.; Li, H.; He, H.; Chen, G. Removal of perchlorate by a lab-scale constructed wetland using achira (Canna indica L.). Wetl. Ecol. Manag. 2022, 30, 35–45. [Google Scholar] [CrossRef]
  9. Tanaka, N. The utilization of edible Canna plants in southeastern Asia and southern China. Econ. Bot. 2004, 58, 112–114. [Google Scholar] [CrossRef]
  10. Castillo-Paz, A.M.; Correa-Piña, B.L.; Pineda-Gómez, P.; Barrón-García, O.Y.; Londoño-Restrepo, S.M.; Rodriguez-Garcia, M.E. Structural, morphological, compositional, thermal, pasting, and functional properties of isolated Achira (Canna indica L.) starch. Int. J. Biol. Macromol. 2024, 282, 136710. [Google Scholar] [CrossRef]
  11. Al-Snafi, A.E. Bioactive components and pharmacological effects of Canna indica—An overview. Int. J. Pharmacol. 2015, 5, 71–75. [Google Scholar]
  12. Tymofieieva, S.; Kyslychenko, O.; Zhuravel, I. The study of phenolic compounds in Canna lily flowers using HPLC. Scr. Sci. Pharm. 2018, 5, 25–27. [Google Scholar] [CrossRef]
  13. Wu, Z.; Zhang, X.; He, B.; Diao, L.; Sheng, S.; Wang, J.; Guo, X.; Su, N.; Wang, L.; Jiang, L.; et al. A chlorophyll-deficient rice mutant with impaired chlorophyllide esterification in chlorophyll biosynthesis. Plant Physiol. 2007, 145, 29–40. [Google Scholar] [CrossRef]
  14. Zhao, M.-H.; Li, X.; Zhang, X.-X.; Zhang, H.; Zhao, X.-Y. Mutation mechanism of leaf color in plants: A review. Forests 2020, 11, 851. [Google Scholar] [CrossRef]
  15. Qi, X.; Chen, S.; Wang, H.; Feng, J.; Chen, H.; Qin, Z.; Deng, Y. Comparative physiology and transcriptome analysis reveals that chloroplast development influences silver-white leaf color formation in Hydrangea macrophylla var. maculata. BMC Plant Bio. 2022, 22, 345. [Google Scholar] [CrossRef]
  16. Yin, G.; Wang, Y.; Xiao, Y.; Yang, J.; Wang, R.; Jiang, Y.; Huang, R.; Liu, X.; Jiang, Y. Relationships between leaf color changes, pigment levels, enzyme activity, photosynthetic fluorescence characteristics and chloroplast ultrastructure of Liquidambar formosana Hance. J. For. Res. 2022, 33, 1559–1572. [Google Scholar] [CrossRef]
  17. Daniell, H.; Lin, C.-S.; Yu, M.; Chang, W.-J. Chloroplast genomes: Diversity, evolution, and applications in genetic engineering. Genome Biol. 2016, 17, 134. [Google Scholar] [CrossRef]
  18. Dobrogojski, J.; Adamiec, M.; Luciński, R. The chloroplast genome: A review. Acta Physiol. Plant. 2020, 42, 98. [Google Scholar] [CrossRef]
  19. Fletcher, J.; Willby, N.; Oliver, D.M.; Quilliam, R.S. Phytoremediation Using Aquatic Plants. In Phytoremediation; Shmaefsky, B.R., Ed.; Concepts and Strategies in Plant Sciences; Springer International Publishing: Cham, Switzerland, 2020; pp. 205–260. ISBN 978-3-030-00098-1. [Google Scholar]
  20. Hu, H.; Li, X.; Wu, S.; Yang, C. Sustainable livestock wastewater treatment via phytoremediation: Current status and future perspectives. Bioresour. Technol. 2020, 315, 123809. [Google Scholar] [CrossRef]
  21. Bobik, K.; Burch-Smith, T.M. Chloroplast signaling within, between and beyond cells. Front. Plant Sci. 2015, 6, 781. [Google Scholar] [CrossRef]
  22. Xiong, Y.; Xiong, Y.; He, J.; Yu, Q.; Zhao, J.; Lei, X.; Dong, Z.; Yang, J.; Peng, Y.; Zhang, X.; et al. The complete chloroplast genome of two important annual clover species, Trifolium alexandrinum and T. resupinatum: Genome structure, comparative analyses and phylogenetic relationships with relatives in Leguminosae. Plants 2020, 9, 478. [Google Scholar] [CrossRef]
  23. Dong, W.; Xu, C.; Cheng, T.; Lin, K.; Zhou, S. Sequencing angiosperm plastid genomes made easy: A complete set of universal primers and a case study on the phylogeny of Saxifragales. Genome Biol. Evol. 2013, 5, 989–997. [Google Scholar] [CrossRef]
  24. Sun, C.; Chen, F.; Teng, N.; Xu, Y.; Dai, Z. Comparative analysis of the complete chloroplast genome of seven Nymphaea Species. Aquat. Bot. 2021, 170, 103353. [Google Scholar] [CrossRef]
  25. Daniell, H.; Jin, S.; Zhu, X.G.; Gitzendanner, M.A.; Soltis, D.E.; Soltis, P.S. Green giant—A tiny chloroplast genome with mighty power to produce high-value proteins: History and phylogeny. Plant Biotechnol. J. 2021, 19, 430–447. [Google Scholar] [CrossRef]
  26. Arimura, S.I.; Nakazato, I. Genome editing of plant mitochondrial and chloroplast genomes. Plant Cell Physiol. 2024, 65, 477–483. [Google Scholar] [CrossRef]
  27. Zhu, Q.; Cai, L.; Li, H.; Zhang, Y.; Su, W.; Zhou, Q. The complete chloroplast genome sequence of the Canna edulis Ker Gawl. (Cannaceae). Mitochondrial DNA Part B 2020, 5, 2427–2428. [Google Scholar] [CrossRef]
  28. Chen, S.; Zhou, Y.; Chen, Y.; Gu, J. Fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018, 34, i884–i890. [Google Scholar] [CrossRef]
  29. Langmead, B.; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359. [Google Scholar] [CrossRef]
  30. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; Pham, S.; Prjibelski, A.D.; et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. J. Comput. Mol. Cell Biol. 2012, 19, 455–477. [Google Scholar] [CrossRef]
  31. Boetzer, M.; Henkel, C.V.; Jansen, H.J.; Butler, D.; Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 2011, 27, 578–579. [Google Scholar] [CrossRef]
  32. Nadalin, F.; Vezzi, F.; Policriti, A. GapFiller: A de novo assembly approach to fill the gap within paired reads. BMC Bioinform. 2012, 13, S8. [Google Scholar] [CrossRef]
  33. Hyatt, D.; Chen, G.-L.; LoCascio, P.F.; Land, M.L.; Larimer, F.W.; Hauser, L.J. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 2010, 11, 119. [Google Scholar] [CrossRef]
  34. Eddy, S.R. Accelerated profile HMM searches. PLoS Comput. Biol. 2011, 7, e1002195. [Google Scholar] [CrossRef]
  35. Laslett, D.; Canback, B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 2004, 32, 11. [Google Scholar] [CrossRef]
  36. Greiner, S.; Lehwark, P.; Bock, R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: Expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 2019, 47, W59–W64. [Google Scholar] [CrossRef]
  37. Zheng, S.; Poczai, P.; Hyvönen, J.; Tang, J.; Amiryousefi, A. Chloroplot: An online program for the versatile plotting of organelle genomes. Front. Genet. 2020, 11, 576124. [Google Scholar] [CrossRef]
  38. Zuo, L.-H.; Shang, A.-Q.; Zhang, S.; Yu, X.-Y.; Ren, Y.-C.; Yang, M.-S.; Wang, J.-M. The first complete chloroplast genome sequences of Ulmus species by de novo sequencing: Genome comparative and taxonomic position analysis. PLoS ONE 2017, 12, e0171264. [Google Scholar] [CrossRef]
  39. Beier, S.; Thiel, T.; Münch, T.; Scholz, U.; Mascher, M. MISA-Web: A web server for microsatellite prediction. Bioinformatics 2017, 33, 2583–2585. [Google Scholar] [CrossRef]
  40. Frazer, K.A.; Pachter, L.; Poliakov, A.; Rubin, E.M.; Dubchak, I. VISTA: Computational tools for comparative genomics. Nucleic Acids Res. 2004, 32, W273–W279. [Google Scholar] [CrossRef]
  41. Mayor, C.; Brudno, M.; Schwartz, J.R.; Poliakov, A.; Rubin, E.M.; Frazer, K.A.; Pachter, L.S.; Dubchak, I. VISTA: Visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 2000, 16, 1046–1047. [Google Scholar] [CrossRef]
  42. Amiryousefi, A.; Hyvönen, J.; Poczai, P. IRscope: An online program to visualize the junction sites of chloroplast genomes. Bioinformatics 2018, 34, 3030–3031. [Google Scholar] [CrossRef]
  43. Katoh, K.; Misawa, K.; Kuma, K.; Miyata, T. MAFFT: A novel method for rapid multiple sequence alignment based on fast fourier transform. Nucleic Acids Res. 2002, 30, 3059–3066. [Google Scholar]
  44. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780. [Google Scholar] [CrossRef]
  45. Tamura, K.; Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 1993, 10, 512–526. [Google Scholar] [CrossRef]
  46. Tamura, K.; Stecher, G.; Kumar, S. MEGA11: Molecular evolutionary genetics analysis version 11. Mol. Biol. Evol. 2021, 38, 3022–3027. [Google Scholar] [CrossRef]
  47. Mower, J.P.; Vickrey, T.L. Chapter Nine—Structural Diversity Among Plastid Genomes of Land Plants. In Advances in Botanical Research; Chaw, S.-M., Jansen, R.K., Eds.; Plastid Genome Evolution; Academic Press: Cambridge, MA, USA, 2018; Volume 85, pp. 263–292. [Google Scholar]
  48. Zhou, T.; Ning, K.; Mo, Z.; Zhang, F.; Zhou, Y.; Chong, X.; Zhang, D.; El-Kassaby, Y.A.; Bian, J.; Chen, H. Complete chloroplast genome of Ilex dabieshanensis: Genome structure, comparative analyses with three traditional Ilex tea species, and its phylogenetic relationships within the family Aquifoliaceae. PLoS ONE 2022, 17, e0268679. [Google Scholar] [CrossRef]
  49. Gao, Y.; Shen, G.; Yuan, G.; Tian, Z. Comparative analysis of whole chloroplast genomes of three common species of Echinochloa (Gramineae) in paddy fields. Int. J. Mol. Sci. 2022, 23, 13864. [Google Scholar] [CrossRef]
  50. Gui, L.; Jiang, S.; Xie, D.; Yu, L.; Huang, Y.; Zhang, Z.; Liu, Y. Analysis of complete chloroplast genomes of Curcuma and the contribution to phylogeny and adaptive evolution. Gene 2020, 732, 144355. [Google Scholar] [CrossRef]
  51. Group, T.A.P. An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc. 2016, 181, 399–436. [Google Scholar] [CrossRef]
  52. Sass, C.; Iles, W.J.D.; Barrett, C.F.; Smith, S.Y.; Specht, C.D. Revisiting the Zingiberales: Using multiplexed exon capture to resolve ancient and recent phylogenetic splits in a charismatic plant lineage. PeerJ 2016, 4, e1584. [Google Scholar] [CrossRef]
  53. Angiosperm Phylogeny Website. Available online: http://www.mobot.org/MOBOT/research/APweb/ (accessed on 22 August 2023).
  54. Brouard, J.-S.; Otis, C.; Lemieux, C.; Turmel, M. The exceptionally large chloroplast genome of the green alga Floydiella terrestris illuminates the evolutionary history of the Chlorophyceae. Genome Biol. Evol. 2010, 2, 240–256. [Google Scholar] [CrossRef]
  55. Shahid Masood, M.; Nishikawa, T.; Fukuoka, S.; Njenga, P.K.; Tsudzuki, T.; Kadowaki, K. The complete nucleotide sequence of wild rice (Oryza nivara) chloroplast genome: First genome wide comparative sequence analysis of wild and cultivated rice. Gene 2004, 340, 133–139. [Google Scholar] [CrossRef] [PubMed]
  56. Maier, R.M.; Neckermann, K.; Igloi, G.L.; Kössel, H. Complete sequence of the maize chloroplast genome: Gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J. Mol. Biol. 1995, 251, 614–628. [Google Scholar] [CrossRef] [PubMed]
  57. Jansen, R.K.; Ruhlman, T.A. Plastid Genomes of Seed Plants. In Genomics of Chloroplasts and Mitochondria; Bock, R., Knoop, V., Eds.; Advances in Photosynthesis and Respiration; Springer: Dordrecht, The Netherlands, 2012; pp. 103–126. ISBN 978-94-007-2920-9. [Google Scholar]
  58. Wicke, S.; Schneeweiss, G.M.; dePamphilis, C.W.; Müller, K.F.; Quandt, D. The evolution of the plastid chromosome in land plants: Gene content, gene order, gene function. Plant Mol. Biol. 2011, 76, 273–297. [Google Scholar] [CrossRef] [PubMed]
  59. Goremykin, V.V.; Holland, B.; Hirsch-Ernst, K.I.; Hellwig, F.H. Analysis of Acorus calamus chloroplast genome and its phylogenetic implications. Mol. Biol. Evol. 2005, 22, 1813–1822. [Google Scholar] [CrossRef]
  60. Somaratne, Y.; Guan, D.-L.; Wang, W.-Q.; Zhao, L.; Xu, S.-Q. The complete chloroplast genomes of two Lespedeza species: Insights into codon usage bias, RNA editing sites, and phylogenetic relationships in Desmodieae (Fabaceae: Papilionoideae). Plants 2020, 9, 51. [Google Scholar] [CrossRef]
  61. Mehrotra, S.; Goyal, V. Repetitive sequences in plant nuclear DNA: Types, distribution, evolution and function. Genom. Proteom. Bioinform. 2014, 12, 164–171. [Google Scholar] [CrossRef]
  62. Chen, S.; Chen, Y.; Sun, F.; Waterman, M.S.; Zhang, X. A new statistic for efficient detection of repetitive sequences. Bioinformatics 2019, 35, 4596–4606. [Google Scholar] [CrossRef]
  63. Liu, Q.; Li, X.; Zhou, X.; Li, M.; Zhang, F.; Schwarzacher, T.; Heslop-Harrison, J.S. The repetitive DNA landscape in Avena (Poaceae): Chromosome and genome evolution defined by major repeat classes in whole-genome sequence reads. BMC Plant Biol. 2019, 19, 226. [Google Scholar] [CrossRef]
  64. Wang, H.; Zhang, Y.; Zhang, L.; Wang, J.; Guo, H.; Zong, J.; Chen, J.; Li, D.; Li, L.; Liu, J.; et al. Molecular Characterization and Phylogenetic Analysis of Centipedegrass [Eremochloa ophiuroides (Munro) Hack.] Based on the Complete Chloroplast Genome Sequence. Curr. Issues Mol. Biol. 2024, 46, 1635–1650. [Google Scholar] [CrossRef]
  65. Amiteye, S. Basic concepts and methodologies of DNA marker systems in plant molecular breeding. Heliyon 2021, 7, e08093. [Google Scholar] [CrossRef]
  66. Sun, L.; Yang, W.; Li, Y.; Shan, Q.; Ye, X.; Wang, D.; Yu, K.; Lu, W.; Xin, P.; Pei, Z.; et al. A wheat dominant dwarfing line with Rht12, which reduces stem cell length and affects gibberellic acid synthesis, is a 5AL terminal deletion line. Plant J. 2019, 97, 887–900. [Google Scholar] [CrossRef] [PubMed]
  67. Cobb, J.N.; Biswas, P.S.; Platten, J.D. Back to the future: Revisiting MAS as a tool for modern plant breeding. Theor. Appl. Genet. 2019, 132, 647–667. [Google Scholar] [CrossRef] [PubMed]
  68. Nadeem, M.A.; Nawaz, M.A.; Shahid, M.Q.; Doğan, Y.; Comertpay, G.; Yıldız, M.; Hatipoğlu, R.; Ahmad, F.; Alsaleh, A.; Labhane, N.; et al. DNA molecular markers in plant breeding: Current status and recent advancements in genomic selection and genome editing. Biotechnol. Biotechnol. Equip. 2018, 32, 261–285. [Google Scholar] [CrossRef]
  69. Gitzendanner, M.A.; Soltis, P.S.; Yi, T.-S.; Li, D.-Z.; Soltis, D.E. Plastome Phylogenetics: 30 Years of Inferences Into Plant Evolution. In Advances in Botanical Research; Elsevier: Amsterdam, The Netherlands, 2018; Volume 85, pp. 293–313. ISBN 978-0-12-813457-3. [Google Scholar]
  70. Wu, Z.Q.; Liao, X.Z.; Zhang, X.N.; Tembrock, L.R.; Broz, A. Genomic architectural variation of plant mitochondria—A review of multichromosomal structuring. J. Syst. Evol. 2022, 60, 160–168. [Google Scholar] [CrossRef]
Figure 1. Leaf color phenotypes of three Canna species in this study (a) C. edulis L.; (b) C. warscewiezii A. Dietr.; (c) C. generalis ‘Striata’.
Figure 1. Leaf color phenotypes of three Canna species in this study (a) C. edulis L.; (b) C. warscewiezii A. Dietr.; (c) C. generalis ‘Striata’.
Cimb 47 00222 g001
Figure 2. Gene map of the four complete chloroplast genomes. Genes shown above are transcribed clockwise, and genes below the circle are transcribed counterclockwise. Genes belonging to the same functional groups are color coded.
Figure 2. Gene map of the four complete chloroplast genomes. Genes shown above are transcribed clockwise, and genes below the circle are transcribed counterclockwise. Genes belonging to the same functional groups are color coded.
Cimb 47 00222 g002
Figure 3. Codon content of 20 amino acids in all cp protein-coding genes of Canna species. The bars out of the circle indicate RSCU, while the pie inside indicates the codons. From inside to outside, the first to third circles represent three bases of each codons with different bases in different colors. The fourth circle represents different amino acids, while different colors were used to distinguish different amino acids. *: Termination codon.
Figure 3. Codon content of 20 amino acids in all cp protein-coding genes of Canna species. The bars out of the circle indicate RSCU, while the pie inside indicates the codons. From inside to outside, the first to third circles represent three bases of each codons with different bases in different colors. The fourth circle represents different amino acids, while different colors were used to distinguish different amino acids. *: Termination codon.
Cimb 47 00222 g003
Figure 4. Distribution of SSR Types in Canna species cp genomes. SSR numbers in (a) intergenic regions, introns, and coding regions; (b) different genomic regions; (c) different repeat class types.
Figure 4. Distribution of SSR Types in Canna species cp genomes. SSR numbers in (a) intergenic regions, introns, and coding regions; (b) different genomic regions; (c) different repeat class types.
Cimb 47 00222 g004
Figure 5. Distribution of long repeat sequences in the four Canna species cp genomes. (a) Number of the long repeats in different types; (b) Length distribution of forward repeats; (c) Length distribution of palindromic repeats; (d) Length distribution of reverse repeats.
Figure 5. Distribution of long repeat sequences in the four Canna species cp genomes. (a) Number of the long repeats in different types; (b) Length distribution of forward repeats; (c) Length distribution of palindromic repeats; (d) Length distribution of reverse repeats.
Cimb 47 00222 g005
Figure 6. Comparisons between four Canna species cp genomes and published cp genome of C. edulis (MN832865). Gray arrows above represent gene orientation. Dark blue bars represent exons. Light blue bars represent UTRs. Pinks bars represent conserved non-coding sequences. The vertical axis indicates the percentage of identity. White peaks represent sequence variation among the cp genomes.
Figure 6. Comparisons between four Canna species cp genomes and published cp genome of C. edulis (MN832865). Gray arrows above represent gene orientation. Dark blue bars represent exons. Light blue bars represent UTRs. Pinks bars represent conserved non-coding sequences. The vertical axis indicates the percentage of identity. White peaks represent sequence variation among the cp genomes.
Cimb 47 00222 g006
Figure 7. Comparison of the IR region boundaries among the four Canna species cp genomes. LSC, IR, and SSC regions were indicated by light blue, yellow, and green. Boxes above and below the bars represent denoted genes. The base length between genes and boundaries are labeled (bp). Different colors represent different regions.
Figure 7. Comparison of the IR region boundaries among the four Canna species cp genomes. LSC, IR, and SSC regions were indicated by light blue, yellow, and green. Boxes above and below the bars represent denoted genes. The base length between genes and boundaries are labeled (bp). Different colors represent different regions.
Cimb 47 00222 g007
Figure 8. Phylogenetic tree of the Zingiberales species based on the complete cp genome data constructed using the Maximum likelihood (ML) method and Tamura-Nei model. 31 species were used to reconstruct a phylogenetic tree. Zea mays was used as the outgroup.
Figure 8. Phylogenetic tree of the Zingiberales species based on the complete cp genome data constructed using the Maximum likelihood (ML) method and Tamura-Nei model. 31 species were used to reconstruct a phylogenetic tree. Zea mays was used as the outgroup.
Cimb 47 00222 g008
Table 1. Summary of the four Canna species cp genomes.
Table 1. Summary of the four Canna species cp genomes.
Genome FeaturesC. edulisC. warscewieziiC. generalis ‘Striata’ GreenC. generalis ‘Striata’ Yellow
Genome size(bp)/ GC content (%)164,479/36.24164,509/36.23164,427/36.25164,479/36.24
LSC size (bp)/GC content (%)91,249/33.8291,335/33.8091,459/33.7791,437/33.77
SSC size (bp)/GC content (%)18,674/30.1518,676/30.1518,616/30.2318,674/30.14
IRa size (bp)/GC content (%)27,278/42.3627,249/42.3927,176/42.4927,184/42.48
IRb size (bp)/GC content (%)27,278/42.3627,249/42.3927,176/42.4927,184/42.48
Total gene number130130130130
mRNAs84848484
tRNAs38383838
rRNAs8888
pseudogenes0000
Genes duplicated in IRs18181818
Table 2. Summary of gene annotation for the four cp genomes.
Table 2. Summary of gene annotation for the four cp genomes.
CategoryGene GroupGene Name
PhotosynthesisSubunits of photosystem IpsaA, psaB, psaC, psaI, psaJ
Subunits of photosystem IIpsbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Subunits of NADH dehydrogenasendhA*, ndhB*(2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Subunits of cytochrome b/f complexpetA, petB*, petD*, petG, petL, petN
Subunits of ATP synthaseatpA, atpB, atpE, atpF*, atpH, atpI
Large subunit of rubiscorbcL
Subunits photochlorophyllide reductase-
Self-replicationProteins of large ribosomal subunitrpl14, rpl16*, rpl20, rpl22, rpl23(2), rpl32, rpl33, rpl36
Proteins of small ribosomal subunitrps11, rps12**(2), rps14, rps15, rps16*, rps18, rps19, rps2, rps3, rps4, rps7(2), rps8
Subunits of RNA polymeraserpoA, rpoB, rpoC1*, rpoC2*
Ribosomal RNAsrrn16(2), rrn23(2), rrn4.5(2), rrn5(2)
Transfer RNAstrnA-UGC*(2), trnC-ACA*, trnC-GCA, trnD-GUC, trnE-UUC, trnE-UUC*(2), trnF-GAA, trnG-GCC, trnH-GUG(2), trnK-UUU*, trnL-CAA(2), trnL-UAA*, trnL-UAG, trnM-CAU(4), trnN-GUU(2), trnP-UGG, trnQ-UUG, trnR-ACG(2), trnR-UCU, trnS-CGA*, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC(2), trnW-CCA, trnY-GUA
Other genesMaturasematK
ProteaseclpP**
Envelope membrane proteincemA
Acetyl-CoA carboxylaseaccD
c-type cytochrome synthesis geneccsA
Translation initiation factorinfA
other-
Genes of unknown functionConserved hypothetical chloroplast ORFycf1(2), ycf2(2), ycf3**, ycf4
Gene*: Gene with one introns; Gene**: Gene with two introns; Gene(2): Number of copies of multi-copy genes.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sun, L.; Liu, J.; Liu, F.; Wang, W.; Chang, Y.; Yao, D. Complete Chloroplast Genome Sequences of Three Canna Species: Genome Characterization, Comparative Analyses, and Phylogenetic Relationships Within Zingiberales. Curr. Issues Mol. Biol. 2025, 47, 222. https://doi.org/10.3390/cimb47040222

AMA Style

Sun L, Liu J, Liu F, Wang W, Chang Y, Yao D. Complete Chloroplast Genome Sequences of Three Canna Species: Genome Characterization, Comparative Analyses, and Phylogenetic Relationships Within Zingiberales. Current Issues in Molecular Biology. 2025; 47(4):222. https://doi.org/10.3390/cimb47040222

Chicago/Turabian Style

Sun, Linhe, Jixiang Liu, Fangyu Liu, Wei Wang, Yajun Chang, and Dongrui Yao. 2025. "Complete Chloroplast Genome Sequences of Three Canna Species: Genome Characterization, Comparative Analyses, and Phylogenetic Relationships Within Zingiberales" Current Issues in Molecular Biology 47, no. 4: 222. https://doi.org/10.3390/cimb47040222

APA Style

Sun, L., Liu, J., Liu, F., Wang, W., Chang, Y., & Yao, D. (2025). Complete Chloroplast Genome Sequences of Three Canna Species: Genome Characterization, Comparative Analyses, and Phylogenetic Relationships Within Zingiberales. Current Issues in Molecular Biology, 47(4), 222. https://doi.org/10.3390/cimb47040222

Article Metrics

Back to TopTop