Article

Seven Complete Chloroplast Genomes from Symplocos: Genome Organization and Comparative Analysis

1
Department of Forest Bioresources, National Institute of Forest Science, Suwon 16631, Korea
2
Warm Temperate and Subtropical Forest Research Center, National Institute of Forest Science, 22, Donnaeko-Ro, Seogwipo-Si 63582, Korea
*
Author to whom correspondence should be addressed.
Forests 2021, 12(5), 608; https://doi.org/10.3390/f12050608
Submission received: 4 March 2021 / Revised: 29 April 2021 / Accepted: 6 May 2021 / Published: 12 May 2021
(This article belongs to the Section Genetics and Molecular Biology)

Abstract

In the present study, chloroplast genome sequences of four species of Symplocos (S. chinensis for. pilosa, S. prunifolia, S. coreana, and S. tanakana) from South Korea were obtained by Ion Torrent sequencing and compared with three previously reported Symplocos chloroplast genomes from other species. The length of the Symplocos chloroplast genome ranged from 156,961 to 157,365 bp. Overall, 132 genes, including 87 protein-coding genes, 37 tRNA genes, and eight rRNA genes, were identified in all Symplocos chloroplast genomes. The gene order and content were highly similar across the seven species. The coding regions were more conserved than the non-coding regions, and the large single-copy and small single-copy regions were less conserved than the inverted repeat regions. We identified five new hotspot regions (rbcL, ycf4, psaJ, rpl22, and ycf1) that can be used as barcodes or species-specific Symplocos molecular markers. These four novel chloroplast genomes provide basic information on the plastid genome of Symplocos and enable better taxonomic characterization of this genus.

1. Introduction

Chloroplasts (CPs) are characteristic plant organelles that play an important role in photosynthesis. The CP genome is markedly similar across most land plant lineages in terms of gene order, gene content, structure, and intron content [1]. The CP genome can harbor as many as 101–118 genes including 66–82 protein-coding genes, 29–32 tRNA genes, and four rRNA genes [2]. CPs contain independently replicated genomes, most of which exhibit a four-segment molecular structure with a large single-copy (LSC, 80–90 kb in length) and a small single-copy (SSC, 16–27 kb in length) region separated by a pair of inverted repeats (IRa and IRb, 20–28 kb in length) [1,3]. However, this typical structure is altered in some plant lineages. For instance, in Cupressaceae [4] and Taxaceae [5], one IR has been lost. In Pinaceae, the IR length is reduced to below 1 kb [6,7]. In contrast, in Ericaceae, IR region expansion resulted in a significant decrease in the SSC region size [8,9]. Additionally, events such as rearrangement, gene loss, gene duplication, pseudogene generation, and intron gain/loss have occurred in the CP genomes of various plant lineages [10,11]. CP genomes are frequently used in taxonomic and evolutionary studies [12] because they are uniparentally inherited (mostly maternally transmitted, but paternally transmitted in conifers), have well-preserved gene arrangement and content, and are small [13].
The genus Symplocos Jacquin consists of woody flowering plants found mainly in humid tropical forests, with approximately 300 species distributed in the New World and the Western Pacific Rim [14]. Symplocos was originally recognized as the sole genus of Symplocaceae Jacquin [7,15], but the Angiosperm Phylogeny Group [16,17] now recognizes two Symplocaceae genera (Cordyloblaste Moritzi and Symplocos). Although several molecular studies of Symplocos have been conducted and have supported its monophyly, only some protein-coding gene sequences (rpl16, matK) and partial non-coding sequences (nr-ITS, trnL–trnF, trnC–trnD, and trnH–psbA) were used in the analyses, and no genomic comparative analyses of Symplocos species have been conducted to date [14,18,19].
Four Symplocos species are endemic to South Korea [20,21,22]. Of these, S. prunifolia Siebold & Zucc. and S. coreana (H. Lév.) Ohwi grow only on Jeju Island in South Korea [23]. S. prunifolia is classified as an endangered, rare plant [24]. Thus, comparing the CP genomes of these species is essential to discriminate them at the molecular level and to support their ongoing conservation.
To date, the CP genome sequences of only three Symplocos species (S. paniculata [Thunb.] Miq., S. ovatilobata Noot., and S. costaricana Hemsl.) have been deposited in the National Center for Biotechnology Information database (NCBI), and no complete CP genome sequence of the Korean Symplocos species has been reported. In the present study, we aimed to sequence the CP genomes of four species of South Korean Symplocos. These CP genomes will provide the basis for studying the evolutionary history of Symplocos species and enable accurate taxonomic identification of vulnerable species.

2. Materials and Methods

2.1. Sample Collection, DNA Extraction, and CP Genome Sequencing

Fresh leaf samples were obtained from four Symplocos species growing on Jeju Island in South Korea, and total genomic DNA was extracted using a Plant SV Mini Kit (GeneAll Biotechnology, Seoul, Korea), according to the manufacturer’s instructions. Intact leaf specimens were deposited into the herbarium at the Warm Temperate and Subtropical Forest Research Center (WTFRC; Table 1). The extracted DNA was quantified using a spectrophotometer (ND-1000, Nano-Drop Technologies, Wilmington, DE, USA). Genomic DNA libraries were produced, amplified, and sequenced using an Ion Xpress™ Plus Fragment Library Kit (Thermo Fisher Scientific, Waltham, MA, USA), Ion PI™ Hi-Q™ Sequencing 200 Kit (Thermo Fisher Scientific), and Ion PI™ Chip v3 Kit (Thermo Fisher Scientific).

2.2. CP Genome Assembly and Annotation

CP DNA data were filtered using SPAdes [25]. Four CP genomes were assembled using Geneious 10.2.6 [26] and annotated using DOGMA [27], followed by manual editing of non-annotated portions such as exons and introns. The tRNA sequences were confirmed using tRNAscan-SE 1.21 [28]. All annotations were checked against the reference genomes (MG719832, MF770705, and MF179496). Genome maps were drawn using OrganellarGenomeDRAW (OGDRAW) [29].

2.3. Genome Comparison

The CP genomes were aligned using MAFFT [30]. The complete CP genomes of the seven Symplocos species were compared using m-VISTA [31]. Additionally, the CP genome junctions were visualized and compared using IRscope [32].

2.4. Simple-Sequence Repeat (SSR) and Long Repeat Sequence Analysis

SSRs within the seven CP genomes were detected using the MISA (MIcroSAtellite) Perl script [33]. The minimum number of repeats was set to 10 for mononucleotide repeats, 5 for dinucleotide repeats, 4 for trinucleotide repeats, and 3 for tetra-, penta-, and hexanucleotide repeats. REPuter was used to identify forward, reverse, complementary, and palindromic sequences with a minimum repeat size of 30 bp and the sequence identity set to 90% [34].
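The SSR thresholds above can be illustrated with a small scan. This is a minimal sketch, not the MISA script itself: it reports only perfect repeats, prefers the shortest motif at each position, and the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical.

```python
# Minimum repeat counts per motif length, taken from the thresholds stated above.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs, scanning left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for unit_len, min_rep in MIN_REPEATS.items():
            motif = seq[i:i + unit_len]
            if len(motif) < unit_len or not set(motif) <= set("ACGT"):
                continue
            if not is_primitive(motif):
                continue
            # Count how many times the motif repeats in tandem from position i.
            count = 1
            while seq[i + count * unit_len:i + (count + 1) * unit_len] == motif:
                count += 1
            if count >= min_rep:
                found = (i, motif, count)
                break  # the shortest qualifying motif wins
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]  # jump past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    demo = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GC" + "AAT" * 4 + "G"
    for start, motif, count in find_ssrs(demo):
        print(start, motif, count)
```

Positions are 0-based. A run of nine A residues is not reported because it falls below the mononucleotide threshold of 10.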

2.5. Divergent Hotspot Identification

The seven Symplocos CP genomes were aligned using MAFFT and Geneious 10.2.6. Nucleotide diversity was analyzed using DnaSP version 6.12.03 [35], with the window length set to 800 bp and the step size set to 200 bp.
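The sliding-window calculation can be sketched as follows. This is an illustrative approximation rather than the DnaSP implementation: it takes Pi as the mean number of pairwise differences per site and, as an assumption, skips any alignment column that contains a gap or ambiguous base. The function names are hypothetical, and the input is assumed to be a list of equal-length, uppercase aligned sequences.

```python
from itertools import combinations


def window_pi(aln, start, end):
    """Mean pairwise differences per site over alignment columns [start, end)."""
    # Assumption: columns with any gap or ambiguous base are excluded entirely.
    cols = [c for c in range(start, end) if all(s[c] in "ACGT" for s in aln)]
    pairs = list(combinations(aln, 2))
    if not cols or not pairs:
        return 0.0
    diffs = sum(a[c] != b[c] for a, b in pairs for c in cols)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(aln, window=800, step=200):
    """Return (start, end, pi) for each window along the alignment."""
    length = len(aln[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append((start, end, window_pi(aln, start, end)))
    return results


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
    for start, end, pi in sliding_pi(toy, window=4, step=2):
        print(start, end, round(pi, 4))
```

The defaults mirror the 800 bp window and 200 bp step stated above; the toy alignment uses a smaller window only so that the example stays readable.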

2.6. Phylogenetic Analysis

The complete CP genomic sequences of seven Symplocaceae and four other Ericales species were downloaded from the NCBI database (Changiostyrax dolichocarpus (C.J.Qi) Tao Chen (MG722902), Pterostyrax hispidus Siebold & Zucc. (MG719840), Halesia carolina L. (MG719830), Sinojackia xylocarpa Hu (MG719827)), and used for maximum likelihood (ML) phylogenetic analysis. Eighty genes from 11 species were aligned using MAFFT in Geneious 10.2.6. The program jModelTest 2 was employed to determine the optimal substitution model [36]. ML analysis was performed using RAxML and a GTR+I+G model [37].

3. Results

3.1. General Features of the CP Genomes

Using the Ion Torrent system, we obtained the sequences of whole CP genomes of four South Korean Symplocos species [S. chinensis for. pilosa (Nakai) Ohwi, S. coreana, S. prunifolia, and S. tanakana Nakai]. Sequencing of these genomes generated 7.24 GB (S. chinensis for. pilosa), 11 GB (S. coreana), 9.75 GB (S. prunifolia), and 12.8 GB (S. tanakana) of raw data, with an average read length of 175 bp. The genome lengths ranged from 156,961 bp for S. chinensis for. pilosa (MW307951) to 157,365 bp for S. coreana (MW307952). The SSC region length ranged from 17,795 bp for S. prunifolia (MW307953) to 17,879 bp for S. coreana. The LSC region length ranged from 87,006 bp for S. chinensis for. pilosa to 87,434 bp for S. coreana, while the IR region length ranged from 26,026 bp for S. coreana to 26,095 bp for S. prunifolia (Table 1 and Figure 1). The four assembled CP genomes had the same number of genes and introns, and the same gene order. The Symplocos CP genome contains 132 genes including 87 protein-coding genes, 37 tRNA genes, and eight rRNA genes. Eighteen genes located in the IR region include seven protein-coding genes (rpl2, rpl23, rps7, rps12, ndhB, ycf15, and ycf2), four ribosomal RNA genes (rrn4.5, rrn5, rrn16, and rrn23), and seven tRNA genes (trnA-UGC, trnI-GAU, trnI-CAU, trnL-CAA, trnN-GUU, trnR-ACG, and trnV-GAC). Nine protein-coding genes (atpF, ndhA, ndhB, rpl2, rpl16, rps12, rps16, rpoC1, and petB) contain one intron, and two protein-coding genes (clpP and ycf3) contain two introns (Table 2). The rps12 gene was confirmed to be trans-spliced, consisting of three exons: exon 1 in the LSC region, and exons 2 and 3 in the IR regions. The GC content of the four Symplocos CP genomes was the same (37.5%).
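Because the plastome is quadripartite, the total length should equal LSC + SSC + 2 × IR. The short check below applies this to the S. coreana region lengths reported above; the function name `plastome_length` is hypothetical and serves only as a consistency check on the stated figures.

```python
def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths for S. coreana as reported in the text (bp).
S_COREANA = {"lsc": 87_434, "ssc": 17_879, "ir": 26_026}

if __name__ == "__main__":
    total = plastome_length(**S_COREANA)
    print(total)  # 157365, matching the reported genome length
```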

3.2. Comparison of CP Genomes of Seven Symplocos Species

We used m-VISTA to compare the gene sequences and content of the seven Symplocos CP genomes. The genomes were almost identical (Figure 2), with the coding regions more conserved than the non-coding regions, and the IR regions more conserved than the LSC and SSC regions. We also compared the boundary structure of the seven Symplocos CP genomes (Figure 3). The genomes were consistent at the JLB (LSC/IRb) and JLA (IRa/LSC) junctions: trnH-GUG was located in the LSC region, 11 bp away from the JLA junction, and 15 bp of the rps19 gene was included in the IR region at the JLB junction in all seven species. In contrast, the length of the ψycf1 pseudogene in the IRa region ranged from 982 bp to 1053 bp. At the JSB (IRb/SSC) junction, the CP genome of S. coreana includes 1 bp of the ψycf1 pseudogene, whereas integration of 2 bp of the ndhF gene into the IRb was observed in the other species.

3.3. SSR and Long Repeat Analysis

SSRs (microsatellites) are tandem repeats of 1–6 nucleotide motifs. The distribution of SSRs was analyzed in seven Symplocos CP genomes using MISA. We identified 44–59 repeats across the genomes. Mononucleotide repeats were the most abundant SSR in all species. Dinucleotide repeats were the least abundant, with three dinucleotide repeats in S. coreana only. S. coreana and S. costaricana harbored two trinucleotide repeats, and the rest of the species harbored only one. The number of tetranucleotide repeats was the lowest in S. coreana. Three pentanucleotide repeats were identified in S. costaricana and S. ovatilobata, and two in S. coreana. Hexanucleotide repeats were not identified in any of the species (Figure 4). Most SSRs consisted of the A/T motif rather than the G/C motif (Table 3 and Supplementary Data File S1).
The long repeat analysis identified more forward and palindromic repeats than reverse and complementary repeats in the seven Symplocos species. There were 29 long repeats across the seven species. Only one reverse repeat was found in S. coreana and S. ovatilobata (Figure 5a). The length of most repeats ranged from 20 to 29 bp, whereas the largest repeat was 46-bp long (S. coreana, S. prunifolia, and S. tanakana, Figure 5b). The location and number of iterations of the long repeats are shown in Table 4 and Supplementary Data File S2. SSR analysis of these species has helped to identify potential molecular markers for species-level identification of Symplocos.
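The four repeat classes differ only in how the second copy relates to the first. The sketch below classifies an exact pair of equal-length repeat units; it is a simplified illustration, not REPuter, which also tolerates mismatches (90% identity in this study). The helper name `repeat_type` is hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_type(first, second):
    """Classify how `second` relates to `first`, assuming an exact match."""
    if second == first:
        return "forward"
    if second == first[::-1]:
        return "reverse"
    if second == first.translate(COMPLEMENT):
        return "complement"
    if second == first.translate(COMPLEMENT)[::-1]:
        return "palindromic"  # reverse complement
    return None


if __name__ == "__main__":
    for other in ("AAGC", "CGAA", "TTCG", "GCTT"):
        print(other, repeat_type("AAGC", other))
```

Checks run in a fixed order, so a unit that satisfies more than one relation (for example, a homopolymer) is reported under the first matching class.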

3.4. Divergent Hotspots in the Symplocos CP Genome

Mutations that affect only a single nucleotide are called single nucleotide polymorphisms (SNPs). Overall, 1580 SNPs were identified in the CP genomes of seven Symplocos species (Figure 6 and Supplementary Data File S3). Of these, 634 (40.1%) were located in the coding regions, and 946 (59.9%) in the intergenic spacer (IGS) regions and introns. The level of sequence divergence was determined by calculating the nucleotide diversity (Pi) for all CP genomes. The Pi value in the coding sequences (CDSs) ranged from 0.00049 (clpP) to 0.00974 (rbcL), with an average value of 0.0038. The Pi value in the IGSs ranged from 0.00066 (trnN-GUU~ndhF) to 0.0197 (rpl36~infA), with an average value of 0.0077. SNPs were identified in three tRNA genes and in the 23S rRNA gene. Figure 7 shows the minimum, maximum, and average Pi values for five classes of genomic regions: CDSs, tRNAs, rRNAs, IGSs, and introns. The average divergence of the IGSs was almost twice that of the next highest class (CDSs). rRNAs showed the lowest sequence divergence, with an average of 0.00027 (23S rRNA).
These divergent hotspot regions could be used as markers for phylogenetic characterization of the Symplocos species, with more divergence observed in the non-coding regions than in the coding regions. We compared the whole CP genomes and found differences among the seven species in the following intergenic regions: trnH-GUG~psbA, psbI~trnS-GCU, rpoC1~rpoB, rpl36~infA, psbL~psbF, and ccsA~ndhD. In addition, five highly variable genes (rbcL, ycf4, psaJ, rpl22, and ycf1) were identified based on a markedly higher Pi value of > 0.008 (Supplementary Data File S3). Identification of species-level differences is essential to the ongoing conservation of vulnerable members of the Symplocos genus.
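The SNP proportions and the IGS-to-CDS comparison reported in this section follow from simple arithmetic on the stated counts and average Pi values. The snippet below reproduces them; the helper name `share` is hypothetical.

```python
def share(part, total):
    """Percentage of `total` represented by `part`, rounded to one decimal."""
    return round(100 * part / total, 1)


TOTAL_SNPS = 1580
CODING_SNPS = 634
NONCODING_SNPS = 946  # IGS regions plus introns

# Average Pi values reported in the text.
MEAN_PI_CDS = 0.0038
MEAN_PI_IGS = 0.0077

if __name__ == "__main__":
    assert CODING_SNPS + NONCODING_SNPS == TOTAL_SNPS
    print(share(CODING_SNPS, TOTAL_SNPS))      # 40.1
    print(share(NONCODING_SNPS, TOTAL_SNPS))   # 59.9
    print(round(MEAN_PI_IGS / MEAN_PI_CDS, 2))  # about 2, i.e. roughly twofold
```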

3.5. Phylogenetic Analysis

CP sequences are increasingly used to construct plant phylogenies. Phylogenetic analysis was performed with the ML method using 80 genes from the 11 analyzed genomes, including the four newly analyzed Symplocos CP genomes and three previously reported Symplocos CP genomes (Figure 8). The resulting phylogeny strongly supports the monophyly of the Symplocaceae clade (BS = 100). S. coreana is most closely related to S. ovatilobata, and together they form the first-branching lineage of Symplocaceae, with high bootstrap support (BS = 100). S. chinensis for. pilosa is most closely related to S. tanakana (BS = 100), and S. prunifolia is most closely related to S. paniculata (BS = 100).

4. Discussion

Some Symplocos species have long been used for medicinal purposes and dyes, especially S. racemosa Roxb., which is an important traditional Indian drug used to treat liver and uterine disorders and leucorrhea [38], and S. tanakana, which has been used as a mordant in South Korea [39]. These species could serve as new bio-industrial materials in the future, and this genome comparison is intended to provide basic genetic information for that purpose. In the present study, we report the CP genome structure of four Symplocos species from South Korea, and present a novel comparative study of Symplocos species genomes. The CP genomes of Symplocos species reported here are well conserved, with the same number of genes, gene order, and genome structure, including the typical quadripartite structure. These findings are consistent with those for CP genomes of other Symplocaceae species [40]. The genome lengths ranged from 156,961 bp (S. chinensis for. pilosa) to 157,365 bp (S. coreana) (Figure 1).
SSRs are often used as molecular (genetic) markers in conservation biology, population genetics, polymorphism investigations, and evolutionary biology because of their co-dominant properties and high reproducibility of analysis [41,42,43]. We identified 369 SSRs in seven Symplocos CP genomes, most of which were located in the intergenic regions. The Symplocos CP genome has a high A/T content; accordingly, most of the detected mononucleotide repeats were composed of A/T. These SSRs can be used as molecular markers for genetic diversity analysis and genetic evolution studies [44].
The IR region of the CP genome provides stability under various stress conditions [45]. However, the IR region has contracted and expanded in different species during CP evolution [46]. Recently, an expansion of the IR region in Clematis was confirmed [47]. In Symplocos, the IR is very stable, with the only differences found in the length of ψycf1 [10,48,49,50]. However, the CP genomes of some Ericales families differ significantly from others, confirming rearrangements and changes in the IR region length during evolution. In particular, Rhododendron, Vaccinium, and Arbutus showed extreme shortening of the SSC region due to IR expansion [51,52,53].
Molecular markers with high sequence variation are useful for species identification and phylogenetic research in land plants [49,54]. To date, there have been no studies on the identification of species-level molecular markers for Symplocos. Many phylogenetic studies of seed plants have used the CP genome for species-level identification [52,55,56,57]. Wang et al. [14] studied the phylogeny of Symplocos based on the sampling of about 111 species using the nuclear ribosomal internal transcribed spacer (nr-ITS) and three chloroplast markers (rpl16, matK, and trnL–trnF regions). Of the four traditionally recognized subgenera, the subgenus Hopea, distributed in East Asia, is monophyletic and sister to a group comprising all other Symplocos species, but the phylogenetic relationship of some taxa was not clear. Furthermore, Fritsch et al. sampled 74 species, and their results were consistent with the findings of Wang et al. [18]. Soejima and Nagamasu sampled 30 species distributed in Japan and conducted studies based on nr-ITS, trnL–trnF, and trnH–psbA regions, suggesting that the section Palura, a deciduous group in the subgenus Hopea, has an independent status [19]. In the present study, the CP regions used in these previous studies had relatively low nucleotide diversity (rpl16: 0.00303, matK: 0.0074, trnL–trnF: 0.00234). Moreover, 17 genes including psbI, psbL, ndhB, and ndhE were identical in the seven Symplocos species analyzed (Pi = 0, Supplementary Data File S3). These genes are not suitable for Symplocos molecular studies. In our study, high sequence divergence was detected in the following regions: trnH-GUG~psbA, psbI~trnS-GCU, rpoC1~rpoB, rpl36~infA, psbL~psbF, ccsA~ndhD, rbcL, ycf4, psaJ, rpl22, and ycf1 (Figure 6 and Supplementary Data File S3). The regions with high nucleotide diversity identified in the current study can be used in molecular studies (e.g., to confirm the molecular phylogeny of Symplocos).
In addition, the phylogenetic analysis and identification of divergence hotspots in the current study provide fundamental data for understanding the relationships among Symplocaceae species. Further studies involving extensive sampling may be needed to better understand the detailed phylogenetic relationships among Symplocaceae and their evolutionary history.

5. Conclusions

In the present study, we sequenced the complete CP genomes of four Symplocos species: S. chinensis for. pilosa, S. coreana, S. prunifolia, and S. tanakana. We demonstrated that Symplocos species are separated into four phylogenetic groups: (1) the S. coreana–S. ovatilobata group; (2) the S. costaricana group; (3) the S. chinensis for. pilosa–S. tanakana group; and (4) the S. prunifolia–S. paniculata group. Additionally, important genetic information including that on SNPs, SSRs, long repeats, divergent hotspot regions, and phylogeny was obtained. Technological advances in plant science have made the CP genome an important tool for plant research. The complete CP genomic data of Symplocos will provide useful information for studying genetic diversity and species identification, and the current study could be used for phylogenetic studies of Symplocos and whole-CP genome comparisons.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/f12050608/s1, Supplementary Data File S1: SSR information for seven Symplocos species, Supplementary Data File S2: Long repeats information for seven Symplocos species, Supplementary Data File S3: Single nucleotide polymorphisms (SNPs) identified in the chloroplast genomes of seven Symplocos species.

Author Contributions

Conceptualization, S.-C.K.; Methodology, S.-C.K.; Investigation, B.-K.C.; Resources, J.-W.L. and B.-K.C.; Validation, J.-W.L. and B.-K.C.; Software, S.-C.K.; Writing—original draft preparation, S.-C.K.; Writing—review and editing, S.-C.K.; Project administration, J.-W.L.; Funding acquisition, J.-W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institute of Forest Science, Republic of Korea, grant number FG0802-2011-01.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data that support the findings of this study are openly available in GenBank, NCBI (https://www.ncbi.nlm.nih.gov; accessed on 1 December 2020), under the accession numbers MW307951 (S. chinensis for. pilosa), MW307952 (S. coreana), MW307953 (S. prunifolia), and MW307954 (S. tanakana).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jansen, R.K.; Raubeson, L.A.; Boore, J.L.; de Pamphilis, C.W.; Chumley, T.W.; Haberle, R.C.; Wyman, S.K.; Alverson, A.J.; Peery, R.; Herman, S.J.; et al. Methods for obtaining and analyzing whole chloroplast genome sequences. Methods Enzymol. 2005, 395, 348–384.
  2. Jansen, R.K.; Ruhlman, T.A. Plastid genomes of seed plants. In Genomics of Chloroplasts and Mitochondria; Bock, R., Knoop, V., Eds.; Springer: Dordrecht, The Netherlands, 2012; pp. 103–126.
  3. Palmer, J.D. Comparative organization of chloroplast genomes. Annu. Rev. Genet. 1985, 19, 325–354.
  4. Kim, S.C.; Lee, J.W. The complete chloroplast genome of Chamaecyparis obtusa (Cupressaceae). Mitochondrial DNA B Resour. 2020, 5, 3278–3279.
  5. Shin, S.; Kim, S.C.; Hong, K.N.; Kang, H.; Lee, J.W. The complete chloroplast genome of Torreya nucifera (Taxaceae) and phylogenetic analysis. Mitochondrial DNA B Resour. 2019, 4, 2537–2538.
  6. CBOL Plant Working Group. A DNA barcode for land plants. Proc. Natl. Acad. Sci. USA 2009, 106, 12794–12797.
  7. Cronquist, A.; Takhtadzhian, A.L. An Integrated System of Classification of Flowering Plants; Columbia University Press: New York, NY, USA, 1981.
  8. Kang, H.I.; Lee, H.O.; Lee, I.H.; Kim, I.S.; Lee, S.W.; Yang, T.J.; Shim, D. Complete chloroplast genome of Pinus densiflora Siebold & Zucc. and comparative analysis with five pine trees. Forests 2019, 10, 600.
  9. Kim, S.C.; Lee, J.W.; Lee, M.W.; Baek, S.H.; Hong, K.N. The complete chloroplast genome sequences of Larix kaempferi and Larix olgensis var. koreana (Pinaceae). Mitochondrial DNA B Resour. 2018, 3, 36–37.
  10. Kim, S.C.; Kim, J.S.; Kim, J.H. Insight into infrageneric circumscription through complete chloroplast genome sequences of two Trillium species. AoB Plants 2016, 8, plw015.
  11. Rabah, S.O.; Shrestha, B.; Hajrah, N.H.; Sabir, M.J.; Alharby, H.F.; Sabir, M.J.; Alhebshi, A.M.; Sabir, J.S.M.; Gilbert, L.E.; Ruhlman, T.A.; et al. Passiflora plastome sequencing reveals widespread genomic rearrangements. J. Syst. Evol. 2019, 57, 1–14.
  12. Parks, M.; Cronn, R.; Liston, A. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biol. 2009, 7, 84.
  13. Bock, R. Structure, function, and inheritance of plastid genomes. In Topics in Current Genetics; Springer: Berlin/Heidelberg, Germany, 2007; pp. 29–63.
  14. Wang, Y.; Fritsch, P.W.; Shi, S.; Almeda, F.; Cruz, B.C.; Kelly, L.M. Phylogeny and infrageneric classification of Symplocos (Symplocaceae) inferred from DNA sequence data. Am. J. Bot. 2004, 91, 1901–1914.
  15. Takhtajan, A.L. Diversity and Classification of Flowering Plants; Columbia University Press: New York, NY, USA, 1997.
  16. Angiosperm Phylogeny Group. An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG III. Bot. J. Linn. Soc. 2009, 161, 105–121.
  17. Angiosperm Phylogeny Group. An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc. 2016, 181, 1–20.
  18. Fritsch, P.W.; Cruz, B.C.; Almeda, F.; Wang, Y.; Shi, S. Phylogeny of Symplocos based on DNA sequences of the chloroplast trnC–trnD intergenic region. Syst. Bot. 2006, 31, 181–192.
  19. Soejima, A.; Nagamasu, H. Phylogenetic analysis of Asian Symplocos (Symplocaceae) based on nuclear and chloroplast DNA sequences. J. Plant Res. 2004, 117, 199–207.
  20. Ghimire, B.; Park, B.K.; Oh, S.; Lee, J.; Son, D.C. Wood anatomy of Korean Symplocos Jacq. (Symplocaceae). Korean J. Pl. Taxon. 2020, 50, 333–342.
  21. Park, S.H.; Lee, J.K.; Kim, J.H. A morphological study of Symplocaceae in Korea. Korean J. Pl. Taxon. 2007, 37, 255–273.
  22. Park, S.H.; Lee, J.K.; Kim, J.H. A systematic relationship of the Korean Symplocaceae based on RAPD analysis. Korean J. Pl. Taxon. 2007, 37, 225–237.
  23. Kim, C.S.; Son, S.G.; Tho, J.H.; Kim, J.E.; Hwang, S.I.; Cheong, J.H. Distribution characteristics of woody plants resources in Jeju, Korea. Korean J. Plant Resour. 2007, 20, 424–436.
  24. Korea National Arboretum. Rare Plants Data Book in Korea; Korea Forest Service: Seoul, Korea, 2008.
  25. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; Pham, S.; Prjibelski, A.D.; et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012, 19, 455–477.
  26. Kearse, M.; Moir, R.; Wilson, A.; Stones-Havas, S.; Cheung, M.; Sturrock, S.; Buxton, S.; Cooper, A.; Markowitz, S.; Duran, C.; et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 2012, 28, 1647–1649.
  27. Wyman, S.K.; Jansen, R.K.; Boore, J.L. Automatic annotation of organellar genomes with DOGMA. Bioinformatics 2004, 20, 3252–3255.
  28. Lowe, T.M.; Chan, P.P. tRNAscan-SE On-line: Integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 2016, 44, W54–W57.
  29. Lohse, M.; Drechsel, O.; Bock, R. OrganellarGenomeDRAW (OGDRAW): A tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet. 2007, 52, 267–274.
  30. Katoh, K.; Kuma, K.I.; Toh, H.; Miyata, T. MAFFT Version 5: Improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005, 33, 511–518.
  31. Mayor, C.; Brudno, M.; Schwartz, J.R.; Poliakov, A.; Rubin, E.M.; Frazer, K.A.; Pachter, L.S.; Dubchak, I. VISTA: Visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 2000, 16, 1046–1047.
  32. Amiryousefi, A.; Hyvönen, J.; Poczai, P. IRscope: An online program to visualize the junction sites of chloroplast genomes. Bioinformatics 2018, 34, 3030–3031.
  33. Thiel, T.; Michalek, W.; Varshney, R.K.; Graner, A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 2003, 106, 411–422.
  34. Kurtz, S.; Choudhuri, J.V.; Ohlebusch, E.; Schleiermacher, C.; Stoye, J.; Giegerich, R. REPuter: The manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001, 29, 4633–4642.
  35. Rozas, J.; Ferrer-Mata, A.; Sánchez-DelBarrio, J.C.; Guirao-Rico, S.; Librado, P.; Ramos-Onsins, S.E.; Sánchez-Gracia, A. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 2017, 34, 3299–3302.
  36. Santorum, J.M.; Darriba, D.; Taboada, G.L.; Posada, D. Jmodeltest.org: Selection of nucleotide substitution models on the cloud. Bioinformatics 2014, 30, 1310–1311.
  37. Stamatakis, A.; Hoover, P.; Rougemont, J. A rapid bootstrap algorithm for the RAxML web servers. Syst. Biol. 2008, 57, 758–771.
  38. Acharya, N.; Acharya, S.; Shah, U.; Shah, R.; Hingorani, L. A comprehensive analysis on Symplocos racemosa Roxb.: Traditional uses, botany, phytochemistry and pharmacological activities. J. Ethnopharmacol. 2016, 181, 236–251.
  39. Im, H.T.; Hong, H.H.; Son, H.D.; Park, M.S.; Nam, B.M.; Kwon, B.K.; Lee, C.H.; Chung, G.Y. The usage of regional folk plants in Gyeongsangnam-do. Korean J. Plant Resour. 2011, 24, 419–429.
  40. Zhu, Z.X.; Wang, J.H.; Cai, Y.C.; Zhao, K.K.; Zhou, R.C.; Wang, H.F. Characterization of the complete chloroplast genome sequence of Symplocos ovatilobata (Symplocaceae). Conserv. Genet. Resour. 2018, 10, 503–506.
  41. Grassi, F.; Labra, M.; Scienza, A.; Imazio, S. Chloroplast SSR markers to assess DNA diversity in wild and cultivated grapevines. Vitis 2002, 41, 157–158.
  42. Xue, J.; Wang, S.; Zhou, S.L. Polymorphic chloroplast microsatellite loci in Nelumbo (Nelumbonaceae). Am. J. Bot. 2012, 99, e240–e244.
  43. Mariette, S.; Le Corre, V.; Austerlitz, F.; Kremer, A. Sampling within the genome for measuring within-population diversity: Trade-offs between markers. Mol. Ecol. 2002, 11, 1145–1156.
  44. Torokeldiev, N.; Ziehe, M.; Gailing, O.; Finkeldey, R. Genetic diversity and structure of natural Juglans regia L. populations in the southern Kyrgyz Republic revealed by nuclear SSR and EST-SSR markers. Tree Genet. Genomes 2019, 15, 1–12.
  45. Goulding, S.E.; Wolfe, K.H.; Olmstead, R.G.; Morden, C.W. Ebb and flow of the chloroplast inverted repeat. Mol. Gen. Genet. 1996, 252, 195–206.
  46. Wang, R.J.; Cheng, C.L.; Chang, C.C.; Wu, C.L.; Su, T.M.; Chaw, S.M. Dynamics and evolution of the inverted repeat-large single copy junctions in the chloroplast genomes of monocots. BMC Evol. Biol. 2008, 8, 36.
  47. Choi, K.S.; Ha, Y.H.; Gil, H.Y.; Choi, K.; Kim, D.K.; Oh, S.H. Two Korean endemic Clematis chloroplast genomes: Inversion, reposition, expansion of the inverted repeat region, phylogenetic analysis, and nucleotide substitution rates. Plants 2021, 10, 397.
  48. Kim, S.C.; Lee, J.W.; Baek, S.H.; Lee, M.W.; Hong, K.N. The complete chloroplast genome of Fraxinus chiisanensis (Oleaceae). Mitochondrial DNA B Resour. 2017, 2, 823–824.
  49. Li, X.; Tan, W.; Sun, J.; Du, J.; Zheng, C.; Tian, X.; Zheng, M.; Xiang, B.; Wang, Y. Comparison of four complete chloroplast genomes of medicinal and ornamental Meconopsis species: Genome organization and species discrimination. Sci. Rep. 2019, 9, 10567.
  50. Li, Y.; Xu, W.; Zou, W.; Jiang, D.; Liu, X. Complete chloroplast genome sequences of two endangered Phoebe (Lauraceae) species. Bot. Stud. 2017, 58, 37.
  51. Kim, S.C.; Baek, S.H.; Lee, J.W.; Hyun, H.J. Complete chloroplast genome of Vaccinium oldhamii and phylogenetic analysis. Mitochondrial DNA B Resour. 2019, 4, 902–903.
  52. Liu, J.; Chen, T.; Zhang, Y.; Li, Y.; Gong, J.; Yi, Y. The complete chloroplast genome of Rhododendron delavayi (Ericaceae). Mitochondrial DNA B Resour. 2020, 5, 37–38.
  53. Martínez-Alberola, F.; Del Campo, E.M.; Lázaro-Gimeno, D.; Mezquita-Claramonte, S.; Molins, A.; Mateu-Andrés, I.; Pedrola-Monfort, J.; Casano, L.M.; Barreno, E. Balanced gene losses, duplications and intensive rearrangements led to an unusual regularly sized genome in Arbutus unedo chloroplasts. PLoS ONE 2013, 8, e79685.
  54. Särkinen, T.; George, M. Predicting plastid marker variation: Can complete plastid genomes from closely related species help? PLoS ONE 2013, 8, e82266.
  55. Xu, J.; Shen, X.; Liao, B.; Xu, J.; Hou, D. Comparing and phylogenetic analysis chloroplast genome of three Achyranthes species. Sci. Rep. 2020, 10, 10818. [Google Scholar] [CrossRef]
  56. Luo, J.; Hou, B.W.; Niu, Z.T.; Liu, W.; Xue, Q.Y.; Ding, X.Y. Comparative chloroplast genomes of photosynthetic orchids: Insights into evolution of the Orchidaceae and development of molecular markers for phylogenetic applications. PLoS ONE 2014, 9, e99016. [Google Scholar] [CrossRef] [PubMed]
  57. Moore, M.J.; Soltis, P.S.; Bell, C.D.; Burleigh, J.G.; Soltis, D.E. Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots. Proc. Natl. Acad. Sci. USA 2010, 107, 4623–4628. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Chloroplast genome maps for S. chinensis for. pilosa, S. coreana, S. prunifolia, and S. tanakana. Genes inside the circle are transcribed clockwise, and genes outside the circle are transcribed counterclockwise. Different colors indicate different gene functions, as denoted in the key inset. Dark and light gray colors in the inner circle correspond to the GC and AT content, respectively.
Figure 2. Comparison of seven Symplocos chloroplast genomes using m-VISTA. Gray arrows and thick black lines above the alignments indicate gene orientation. Purple bars represent exons, blue bars represent RNA, pink bars represent conserved non-coding sequences (CNS), gray bars represent mRNA, and white peaks represent differences in gene sequences. The y-axis represents the percentage identity (range shown: 50–100%).
Figure 3. Comparison of the borders of LSC, SSC, and IR regions in seven Symplocos chloroplast genomes. The JSB junction produces a pseudogene of ycf1 of varying lengths because of IR replication.
Figure 4. Comparison of the frequencies of different types of SSRs in the chloroplast genomes of seven Symplocos species.
Figure 5. Comparison of long repeats in the chloroplast genomes of seven Symplocos species. (a) The number of different types of long repeats. (b) The number of repeats of each length.
Figure 6. Sliding window analysis of the whole chloroplast genome nucleotide diversity (Pi) among seven Symplocos species.
Figure 7. Minimum, average, and maximum p-distance (Pi) values of different regions of seven Symplocos chloroplast genomes.
Figure 8. Maximum likelihood (ML) phylogenetic tree based on 80 protein-coding genes from 11 Ericales species. Numbers mentioned above the lines represent ML posterior probabilities.
Table 1. Summary of the assembly data for Symplocos chloroplast genomes.

| Category | S. chinensis for. pilosa | S. coreana | S. prunifolia | S. tanakana |
|---|---|---|---|---|
| Specimen number | WTFRC10032701 | WTFRC10031678 | WTFRC10032813 | WTFRC10031670 |
| Accession number | MW307951 | MW307952 | MW307953 | MW307954 |
| Total bases (GB) | 7.24 | 11 | 9.57 | 12.8 |
| Total reads | 42,998,603 | 64,043,107 | 56,918,749 | 77,435,124 |
| Read length (bp) | 177 | 180 | 175 | 171 |
| Genome size [GC (%)] | 156,961 [37.5] | 157,365 [37.5] | 157,204 [37.5] | 156,971 [37.5] |
| LSC [GC (%)] | 87,006 [35.5] | 87,434 [35.5] | 87,219 [35.4] | 87,017 [35.5] |
| SSC [GC (%)] | 17,817 [31.0] | 17,879 [31.0] | 17,795 [31.0] | 17,814 [31.0] |
| IR [GC (%)] | 26,069 [43.1] | 26,026 [43.1] | 26,095 [43.1] | 26,070 [43.1] |
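The region lengths in Table 1 can be cross-checked against the reported genome sizes, because a quadripartite chloroplast genome consists of one LSC, one SSC, and two IR copies. The following is a minimal sketch of that arithmetic; the dictionary layout and the function names are illustrative assumptions and are not part of the article's methods.

```python
# Region lengths in bp, transcribed from Table 1.
# The dictionary layout and function names are illustrative, not from the article.
TABLE_1 = {
    "S. chinensis for. pilosa": {"genome": 156_961, "LSC": 87_006, "SSC": 17_817, "IR": 26_069},
    "S. coreana": {"genome": 157_365, "LSC": 87_434, "SSC": 17_879, "IR": 26_026},
    "S. prunifolia": {"genome": 157_204, "LSC": 87_219, "SSC": 17_795, "IR": 26_095},
    "S. tanakana": {"genome": 156_971, "LSC": 87_017, "SSC": 17_814, "IR": 26_070},
}


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a genome with one LSC, one SSC, and two IR copies."""
    return lsc + ssc + 2 * ir


def check_table(table: dict) -> dict:
    """Map each species to True when its region lengths sum to the reported size."""
    return {
        species: quadripartite_length(row["LSC"], row["SSC"], row["IR"]) == row["genome"]
        for species, row in table.items()
    }


if __name__ == "__main__":
    for species, ok in check_table(TABLE_1).items():
        print(f"{species}: {'consistent' if ok else 'MISMATCH'}")
```

For example, for S. chinensis for. pilosa the sum is 87,006 + 17,817 + 2 × 26,069 = 156,961 bp, which matches the reported genome size.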
Table 2. Chloroplast genome-encoded gene types and functional classification for S. chinensis for. pilosa, S. coreana, S. prunifolia, and S. tanakana.

| Gene Category | Gene Group | Gene Names |
|---|---|---|
| Self-replication | Large subunit ribosomal protein | rpl2(×2) *, rpl14, rpl16 *, rpl20, rpl22, rpl23(×2), rpl32, rpl33, rpl36 |
| | DNA dependent RNA polymerase | rpoA, rpoB, rpoC1 *, rpoC2 |
| | Small subunit ribosomal protein | rps2, rps3, rps4, rps7(×2), rps8, rps11, rps12(×2) **, rps14, rps15, rps16 *, rps18, rps19 |
| | rRNA | rrn4.5S(×2), rrn5S(×2), rrn16S(×2), rrn23S(×2) |
| | tRNA | trnA-UGC(×2) *, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC *, trnG-UCC, trnH-GUG, trnI-GAU(×2) *, trnI-CAU(×2), trnK-UUU *, trnL-CAA(×2), trnL-UAA *, trnL-UAG, trnM-CAU, trnN-GUU(×2), trnP-UGG, trnQ-UUG, trnR-ACG(×2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC(×2), trnV-UAC *, trnW-CCA, trnY-GUA |
| Photosynthesis | ATP synthase subunit | atpA, atpB, atpE, atpF *, atpH, atpI |
| | NADH-dehydrogenase subunit | ndhA *, ndhB(×2) *, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| | Cytochrome b/f complex subunit | petA, petB *, petD, petG, petL, petN |
| | Photosystem I subunit | psaA, psaB, psaC, psaI, psaJ, ycf4 |
| | Photosystem II subunit | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
| | Rubisco subunit | rbcL |
| Other genes | Acetyl-CoA-carboxylase subunit | accD |
| | C-type cytochrome synthesis gene | ccsA |
| | Envelope membrane protein | cemA |
| | ATP-dependent protease subunit P | clpP ** |
| | Translational initiation factor | infA |
| | Maturase | matK |
| Unknown function | Conserved open reading frame | ycf1, ycf2(×2), ycf3 **, ycf15(×2) |

Note: (×2), two gene copies in the IRs; *, gene containing a single intron; **, gene containing two introns.
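Table 3. Summary of SSRs in the chloroplast genomes of the seven Symplocos species identified by using MISA.

| SSR Type | Repeat Unit | S. chinensis for. pilosa | S. coreana | S. prunifolia | S. tanakana | S. costaricana | S. ovatilobata | S. paniculata | Total |
|---|---|---|---|---|---|---|---|---|---|
| Mono- | A/T | 27 | 29 | 38 | 31 | 37 | 28 | 36 | 230 |
| | C/G | 1 | 1 | 0 | 0 | 1 | 0 | 1 | |
| Di- | AT/AT | 6 | 3 | 6 | 6 | 5 | 4 | 6 | 36 |
| Tri- | AAT/ATT | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 9 |
| | AGC/CTG | 0 | 1 | 0 | 0 | 0 | 0 | 0 | |
| Tetra- | AAAG/CTTT | 3 | 2 | 3 | 3 | 2 | 2 | 3 | 79 |
| | AAAT/ATTT | 3 | 2 | 3 | 3 | 2 | 2 | 3 | |
| | AACC/GGTT | 0 | 0 | 0 | 0 | 1 | 1 | 0 | |
| | AAGG/CCTT | 1 | 0 | 1 | 1 | 2 | 0 | 1 | |
| | AATC/ATTG | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| | AATT/AATT | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| | ACAG/CTGT | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| | AGAT/ATCT | 1 | 0 | 1 | 1 | 1 | 1 | 1 | |
| | ATCC/ATGG | 2 | 0 | 2 | 2 | 0 | 0 | 2 | |
| Penta- | AACTT/AAGTT | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 8 |
| | AAGGT/ACCTT | 0 | 0 | 0 | 0 | 0 | 1 | 0 | |
| | AATAT/ATATT | 0 | 0 | 0 | 0 | 1 | 0 | 0 | |
| | ACTAT/AGTAT | 0 | 1 | 0 | 0 | 1 | 1 | 0 | |
| Total | | 48 | 44 | 58 | 51 | 59 | 45 | 57 | 362 |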
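The marginal totals in Table 3 follow from summing the individual SSR counts, by SSR type across all species (right-hand column) and by species across all repeat units (bottom row). As a hedged illustration, the sketch below recomputes both sets of totals from the per-unit counts; the data structure and function names are assumptions made for this example and do not reflect how MISA reports its output.

```python
# Per-species SSR counts transcribed from Table 3, in the column order given by SPECIES.
# The nested-dictionary layout is illustrative only.
SPECIES = [
    "S. chinensis for. pilosa", "S. coreana", "S. prunifolia", "S. tanakana",
    "S. costaricana", "S. ovatilobata", "S. paniculata",
]

SSR_COUNTS = {
    "Mono-": {
        "A/T": [27, 29, 38, 31, 37, 28, 36],
        "C/G": [1, 1, 0, 0, 1, 0, 1],
    },
    "Di-": {"AT/AT": [6, 3, 6, 6, 5, 4, 6]},
    "Tri-": {
        "AAT/ATT": [1, 1, 1, 1, 2, 1, 1],
        "AGC/CTG": [0, 1, 0, 0, 0, 0, 0],
    },
    "Tetra-": {
        "AAAG/CTTT": [3, 2, 3, 3, 2, 2, 3],
        "AAAT/ATTT": [3, 2, 3, 3, 2, 2, 3],
        "AACC/GGTT": [0, 0, 0, 0, 1, 1, 0],
        "AAGG/CCTT": [1, 0, 1, 1, 2, 0, 1],
        "AATC/ATTG": [1, 1, 1, 1, 1, 1, 1],
        "AATT/AATT": [1, 1, 1, 1, 1, 1, 1],
        "ACAG/CTGT": [1, 1, 1, 1, 1, 1, 1],
        "AGAT/ATCT": [1, 0, 1, 1, 1, 1, 1],
        "ATCC/ATGG": [2, 0, 2, 2, 0, 0, 2],
    },
    "Penta-": {
        "AACTT/AAGTT": [0, 1, 0, 0, 1, 1, 0],
        "AAGGT/ACCTT": [0, 0, 0, 0, 0, 1, 0],
        "AATAT/ATATT": [0, 0, 0, 0, 1, 0, 0],
        "ACTAT/AGTAT": [0, 1, 0, 0, 1, 1, 0],
    },
}


def totals_by_type(counts: dict) -> dict:
    """Sum every repeat unit of each SSR type over all species."""
    return {ssr_type: sum(sum(row) for row in units.values())
            for ssr_type, units in counts.items()}


def totals_by_species(counts: dict, n_species: int) -> list:
    """Sum every repeat unit of every SSR type for each species column."""
    totals = [0] * n_species
    for units in counts.values():
        for row in units.values():
            for i, value in enumerate(row):
                totals[i] += value
    return totals


if __name__ == "__main__":
    print(totals_by_type(SSR_COUNTS))
    print(dict(zip(SPECIES, totals_by_species(SSR_COUNTS, len(SPECIES)))))
```

Summed this way, the per-type totals and the per-species totals both add up to the overall count of 362 SSRs reported in the table.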
Table 4. Summary of long-repeat data for the chloroplast genomes of seven Symplocos species generated by REPuter analysis.

| Repeat Type | S. chinensis for. pilosa | S. coreana | S. prunifolia | S. tanakana | S. costaricana | S. ovatilobata | S. paniculata |
|---|---|---|---|---|---|---|---|
| Forward | 12 | 11 | 13 | 12 | 11 | 10 | 12 |
| Reverse | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Palindromic | 17 | 17 | 16 | 17 | 18 | 18 | 17 |
| Total | 29 | 29 | 29 | 29 | 29 | 29 | 29 |
| Repeat length (bp) | | | | | | | |
| 20–29 | 25 | 22 | 22 | 21 | 24 | 23 | 25 |
| 30–39 | 4 | 4 | 4 | 4 | 5 | 6 | 4 |
| 40–49 | 0 | 3 | 3 | 3 | 0 | 0 | 0 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Kim, S.-C.; Lee, J.-W.; Choi, B.-K. Seven Complete Chloroplast Genomes from Symplocos: Genome Organization and Comparative Analysis. Forests 2021, 12, 608. https://doi.org/10.3390/f12050608