**1. Introduction**

*Aristolochia* sensu lato, comprising about 500 species, is the largest genus of Aristolochiaceae, with a broad distribution ranging from tropical and subtropical to temperate regions [1,2]. Several species of *Aristolochia*, such as *Aristolochia moupinensis*, *Aristolochia tagala*, and *Aristolochia mollissima*, have been reported as traditional Chinese medicines [3,4]. Aristolochiaceae is a unique plant family containing aristolochic acids (AAs), whose derivatives are widely implicated in liver cancers [5,6]. Moreover, current studies have demonstrated that AAs are nephrotoxic, carcinogenic, and mutagenic [7,8]. The sale and use of AA-containing herbal preparations have been restricted in many countries [9].

The monophyly of Aristolochiaceae was well supported in most analyses, and the family was divided into two subfamilies, Asaroideae and Aristolochioideae [10,11]. These studies recognized two genera, *Saruma* and *Asarum*, in Asaroideae [10–13]. The genus *Aristolochia* of subfamily Aristolochioideae was classified into two major lineages in previous studies based on morphological characters and molecular phylogenetic methods [10,14–16]. In past years, the nuclear *ITS2* region, the *phyA* gene, and several plastid genome regions (such as *matK*, *rbcL*, *trnK*, and *trnL*-*trnF*), or their combinations, have been frequently used in molecular systematics of Aristolochiaceae [11,15,17,18]. Inter-simple sequence repeat (ISSR) markers were also used to identify diverse genetic stocks and to understand the evolutionary relationships of *Aristolochia* [19,20]. However, the selected loci failed to provide sufficient phylogenetic information to elucidate the evolutionary relationships among *Aristolochia* species. A universal barcode, using either whole chloroplast (cp) genomes or hyper-variable regions, is urgently needed; it may significantly improve the low resolution of plant relationships and contribute to the conservation, domestication, and utilization of *Aristolochia* plants.

The chloroplast is the key organelle for photosynthesis and carbon fixation in green plants [21]. Its genome can provide valuable information for taxonomic classification and phylogenetic reconstruction among species of land plants [22–25]. Typical cp genomes in angiosperms have a generally conserved quadripartite circular structure, with two copies of an inverted repeat (IR) region separated by a large single copy (LSC) region and a small single copy (SSC) region; they encode 120–135 genes and range in size from 120 to 170 kb [26,27]. In recent years, the cp genomes of *Aristolochia debilis*, *Aristolochia contorta*, *Saruma henryi*, and nine species of *Asarum* within Aristolochiaceae have been reported [28–31]. The sequenced cp genomes of Aristolochiaceae, except for those of *Asarum* species, were conserved in length, gene content, and GC content, and no rearrangement events were detected in them.

With the rapid development of next-generation sequencing (NGS), it is now more convenient and cheaper to obtain cp genome sequences, which makes comparative analyses of sequence evolution among different individuals feasible. In this study, we report seven complete cp genomes of *Aristolochia* and conduct comparative genomic analyses focused on gene size, gene content, patterns of nucleotide substitution, and variable sites. Another 12 published cp genome sequences of Magnoliids, downloaded from the National Center for Biotechnology Information (NCBI) organelle genome database (https://www.ncbi.nlm.nih.gov) [32], were also used to detect sites under selection, repeat sequences, and simple sequence repeats (SSRs), and to reconstruct phylogenies. We performed these comparative genomic analyses to obtain a comprehensive understanding of plastome structure within *Aristolochia* and to provide genetic resources for future research on the genus.

#### **2. Results**

#### *2.1. The Chloroplast Genome Structures of Species*

All the *Aristolochia* species we sequenced had a typical quadripartite structure, forming a circular molecule of 159,308 bp to 160,520 bp in length. The complete cp genomes comprise an LSC region (88,652–89,859 bp) and an SSC region (19,322–19,799 bp), separated by a pair of IRs ranging from 25,242 bp to 25,700 bp in length (Figure 1, Table 1). The GC content of the plastomes of the seven *Aristolochia* species varies slightly, from 38.5% to 38.9% (Table 1). The GC content within the coding sequence (CDS) was 38.9% in the two species of subgenus *Aristolochia* (*A. tagala* and *A. tubiflora*) and 39.2% in the five species of subgenus *Siphisia* (*A. kunmingensis*, *A. moupinensis*, *A. macrophylla*, *A. kaempferi*, and *A. mollissima*). The GC content of the first codon position was higher than that of the second and third positions (Figure 2, Table S1). A total of 113 unique genes were identified in the seven cp genomes, including 79 protein-coding genes, 30 tRNAs, and four rRNAs, of which 19 or 18 genes are duplicated in the IR regions (Tables 1 and 2).
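The position-specific GC content mentioned above can be illustrated with a minimal sketch. It assumes an in-frame CDS given as a plain string; the function name `gc_by_codon_position` and the toy sequence are hypothetical and are not taken from the study.

```python
def gc_by_codon_position(cds):
    """Return the GC percentage at codon positions 1, 2, and 3 of an in-frame CDS."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    result = {}
    for pos in range(3):
        # Every third base, starting at this codon position
        bases = seq[pos:usable:3]
        gc = sum(base in "GC" for base in bases)
        result[f"GC{pos + 1}"] = 100.0 * gc / len(bases) if bases else 0.0
    return result


# Hypothetical five-codon sequence, used only to show the call
toy_cds = "ATGGCTCGTAAATAA"
print(gc_by_codon_position(toy_cds))
```

Applied to the concatenated CDS of a plastome, the three values would correspond to the three bars per species of the kind summarized in Figure 2.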


Int. J. Mol. Sci. 2019, 20, x FOR PEER REVIEW 5 of 23


**Figure 1.** Gene maps of the complete cp genome of seven species of *Aristolochia*. Gene map of cp genome of (**A**) *Aristolochia manshuriensis*; (**B**) *Aristolochia kaempferi*, *Aristolochia macrophylla*, *Aristolochia mollissima* and *Aristolochia kunmingensis*; (**C**) *Aristolochia tagala* and *Aristolochia tubiflora*. Genes on the inside of the circle are transcribed clockwise, while those outside are transcribed counter clockwise. The darker gray in the inner circle corresponds to GC content, whereas the lighter gray corresponds to AT content.


**Table 1.** Summary of complete chloroplast (cp) genomes of *Aristolochia* species.


**Table 2.** Gene contents in the cp genomes of *Aristolochia* species.

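\* Gene contains one intron; \*\* gene contains two introns; (x2) indicates the number of the repeat unit is 2.

| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| trnL-UAA | LSC | 35 | 490 | 50 | | |
| trnV-UAC | LSC | 37 | 595 | 36 | | |
| ycf3 | LSC | 126 | 830 | 226 | 763 | 149 |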

**Figure 2.** The GC (%) composition in different positions of coding sequence (CDS) region of species within *Aristolochia*.

Introns play an important role in the regulation of the expression of some genes [33]. Eighteen genes in the seven plastomes contain introns, including *atpF*, *rpoC1*, *ycf3*, *rps12*, *rpl2*, *rpl16*, *clpP*, *petB*, *petD*, *rps16*, *ndhA*, *ndhB*, and six tRNA genes; three of these genes (*clpP*, *ycf3*, and *rps12*) contain two introns. The longest intron, located in the *trnK*-UUU gene, is 2552–2687 bp long in the seven plastomes, and has been used in studies at the inter- and intraspecific levels in *Aristolochia* [2,16]. In addition, the *rpl2* intron is 700 bp long in species of subgenus *Siphisia* and 659 bp long in species of subgenus *Aristolochia* (Table 3).



**Table 3.** Genes with introns in the seven cp genomes of *Aristolochia* as well as the lengths of the exons and introns.



#### *2.2. IR Contraction and Expansion*

The IR regions are expanded in the five species of subgenus *Siphisia* compared with the other two species (*A. tagala* and *A. tubiflora*) of subgenus *Aristolochia*, as indicated by the different numbers of genes duplicated in the IR regions: eight and seven tRNA genes were duplicated, respectively (Figure 1, Table 2). The IR region ranges from 25,664 bp to 25,700 bp in subgenus *Siphisia*, and is 25,242 bp and 25,431 bp in the two plastomes of subgenus *Aristolochia* (Table 1).

Fluctuation of the IR-SC borders, together with the adjacent genes, was examined among the seven *Aristolochia* species and six plastomes retrieved from GenBank (*Aristolochia contorta*: NC\_036152.1, *Aristolochia debilis*: NC\_036153.1, *Asarum canadense*: MG544845-MG544851, *Saruma henryi*: MG520100, *Piper auritum*: NC\_034697.1, and *Drimys granadensis*: NC\_008456.1) (Figure 3). The LSC-IRb border was located within the *rps19*-*trnH* intergenic spacer in *A. kaempferi*, *A. macrophylla*, and *A. mollissima* (Type I), within the *rps19* gene in *A. kunmingensis* and *A. moupinensis* (Type II), and in the *rps19*-*rpl2* spacer in *A. tagala* and *A. tubiflora* (Type III). There were two types of SSC-IRa border among the 13 species examined. In three plastomes (*A. moupinensis*, *A. tubiflora*, and *A. tagala*), the *ycf1* gene was located entirely in the SSC region, 25–43 bp from the SSC-IRa border. In the other 10 species, the SSC-IRa border was situated within the coding region of *ycf1*, which extended into the IRa region. In these 10 species, the *ycf1* pseudogene in the IRb region had the same length as the portion of *ycf1* that extended into the IRa, ranging from 153 bp to 2271 bp. The *ndhF* gene was located entirely in the SSC region in 10 species of Aristolochiaceae, at varying distances (11–80 bp) from the IRb-SSC border. The LSC-IRa border in the species of subgenus *Aristolochia* was situated within the *trnH* gene, which extended 10 bp into the IRa region (Type III), whereas the border was located in the *trnH*-*psbA* spacer in species of subgenus *Siphisia* (Types I and II) (Figure 3).


**Figure 3.** Comparison of the borders of the large single copy (LSC), small single copy (SSC), and inverted repeat (IR) regions among 13 cp genomes. Numbers above the gene features indicate the distance between the ends of the genes and the border sites. The features are not drawn to scale.

#### *2.3. Codon Usage*

The protein-coding genes in the cp genomes of the seven *Aristolochia* species comprised 26,194–26,398 codons. Codon usage of the protein-coding genes in the cp genomes is summarized in Figure 4 and Table S2. The most common amino acid in the protein-coding genes is leucine, which appears 2775 times in *A. kaempferi* and *A. mollissima*. Relative synonymous codon usage (RSCU) analysis showed that all amino acids except methionine and tryptophan have more than one synonymous codon. Nearly all of the protein-coding genes of *Aristolochia* species had the standard ATG start codon (RSCU = 1). About half of the codons have RSCU > 1, and most of those (29/31, 93.5%) end with A or T. About half of the codons have RSCU < 1, and most of those (28/31, 90.3%) end with C or G.
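As an illustration of how RSCU values of this kind are usually defined, the sketch below divides the observed count of each codon by the mean count of all synonymous codons for the same amino acid. It is a minimal example under stated assumptions, not the software used in the study: the function name `rscu` and the toy sequence are hypothetical, and codons are assigned with the standard genetic code.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} computed over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    # Group synonymous codons by the amino acid (or stop signal) they encode
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            # observed count divided by the mean count of the synonymous family
            values[c] = counts[c] * len(codons) / total
    return values


# Hypothetical toy CDS: ATG TTA TTA TTG CTT TGG
print(rscu(["ATGTTATTATTGCTTTGG"])["TTA"])
```

Under this definition, methionine (ATG) and tryptophan (TGG) always have RSCU = 1 because each is encoded by a single codon, which is consistent with the statement above.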

**Figure 4.** Codon content of the 20 amino acids and stop codons in all protein-coding genes of the seven cp genomes. The histogram for each amino acid shows codon usage within *Aristolochia* (from left to right: *A. tagala*, *A. tubiflora*, *A. moupinensis*, *A. kunmingensis*, *A. kaempferi*, *A. macrophylla*, and *A. mollissima*).

#### *2.4. Positive Selection Analysis*

We compared the ratio of non-synonymous (dN) to synonymous (dS) substitutions for 79 protein-coding genes among seven species within Piperales: *A. kunmingensis*, *A. kaempferi*, *A. tagala*, *A. debilis*, *As. canadense*, *S. henryi*, and *P. auritum*. The statistical neutrality test showed that five genes in the seven cp genomes are under significant positive selection; these genes encode ribosomal small and large subunit proteins (*rps12*, *rps18*, and *rpl20*) or proteins of unknown function (*ycf1* and *ycf2*) (Table 4). Likelihood ratio tests (M1a vs. M2a, M7 vs. M8) supported the presence of positively selected codon sites (*p* < 0.05) (Table S3). According to the M2a and M8 models, *rpl20* harbored three or four sites under positive selection. The *ycf1* gene harbored one or three sites under positive selection based on the two models, respectively. In addition, we identified one positively selected site in the *rps12* gene.
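The likelihood ratio tests mentioned above compare a null site model with an alternative that allows a class of sites with dN/dS > 1. The following minimal sketch assumes the conventional setup in which the statistic 2ΔlnL is compared against a chi-square distribution with two degrees of freedom for both M1a vs. M2a and M7 vs. M8; the degrees of freedom, the function name `lrt_pvalue_df2`, and the log-likelihood values are assumptions for illustration and are not taken from Table S3.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Return (statistic, p-value) for a likelihood ratio test with 2 degrees of freedom.

    For a chi-square distribution with df = 2, the survival function has the
    closed form exp(-x / 2), so no external statistics library is needed.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods for one gene under M1a (null) and M2a (alternative)
statistic, p_value = lrt_pvalue_df2(lnl_null=-1234.56, lnl_alt=-1229.10)
print(f"2*dlnL = {statistic:.2f}, p = {p_value:.4f}")
```

A gene would be reported as carrying positively selected sites only if the test is significant and the alternative model assigns individual codons a high posterior probability of belonging to the dN/dS > 1 class.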


**Table 4.** Positively selected sites detected in the cp genomes of Piperales.

\* *p* < 0.05; \*\* *p* < 0.01.

#### *2.5. Repeat Structure and Simple Sequence Repeats Analyses*

Repeats in ten cp genomes were analyzed using REPuter, including seven species of *Aristolochia*, *S. henryi*, *P. auritum*, and *D. granadensis* (Figure 5, Table S4). The results showed that *A. macrophylla* had the greatest number of repetitive elements in its cp genome, comprising 25 forward, 26 palindromic, 21 reverse, and eight complement repeats. Most repeats were 30–39 bp in size, and repeats longer than 49 bp occurred only in the cp genomes of *S. henryi* and *P. auritum*. The longest repeat, with a length of 1591 bp, was detected in *S. henryi*. SSRs were also identified in the cp genomes of the ten species (Figure 6 and Table S5). Mononucleotide repeats were the most abundant type of SSR, accounting for 88% and 85% of the SSRs found in *A. tubiflora* and *A. tagala*, respectively. A/T repeats were the most common mononucleotide repeats, while AT/TA repeats made up the majority of dinucleotide repeats (96.3%–100%). The trinucleotide repeats in the five species of subgenus *Siphisia* consisted only of AAT/ATT repeats (100%), while those of *A. tubiflora* and *A. tagala* of subgenus *Aristolochia* also included AAC/GTT and AAG/CTT repeats.
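To make the SSR categories above concrete, the following sketch scans a sequence for perfect mono-, di-, and trinucleotide repeats with regular expressions. The minimum repeat numbers (10, 5, and 4) are assumed thresholds chosen for illustration, since the thresholds are not stated in this section, and the function name `find_ssrs` and the toy sequence are hypothetical.

```python
import re

# Assumed minimum number of repeat units per motif length (illustrative only)
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def find_ssrs(sequence):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of motif_len bases followed by at least (min_rep - 1) copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if motif_len > 1 and len(set(motif)) == 1:
                # A homopolymer run is reported once, as a mononucleotide SSR
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Hypothetical toy sequence containing one SSR of each class
toy = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CCG" + "AAT" * 4 + "G"
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

To reproduce groupings such as A/T or AAT/ATT, a motif and its reverse complement would be counted in the same class after detection.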


**Figure 5.** Repeat sequences in ten cp genomes. REPuter was used to identify repeat sequences with length ≥ 30 bp and sequence identity ≥ 90% in the cp genomes. F, P, R, and C indicate the repeat types F (forward), P (palindrome), R (reverse), and C (complement), respectively. Repeats with different lengths are indicated in different colors.

**Figure 6.** Frequency of simple sequence repeats (SSRs) in the ten cp genomes.

#### *2.6. Comparative Genomic Divergence and Hotspots Regions*

The SC and IR regions of the cp genomes of seven species (*A. moupinensis*, *A. kunmingensis*, *A. tagala*, *A. contorta*, *S. henryi*, *As. canadense*, and *P. auritum*) were compared using the mVISTA program to detect hyper-variable regions (Figure 7). The alignment revealed high sequence conservation between the cp genomes of *A. moupinensis* and *A. kunmingensis* of subgenus *Siphisia*. The comparison among the seven cp genomes showed that the IR regions were more conserved than the SC regions. The most divergent regions were located in the intergenic spacers, and the most divergent coding regions were *ndhF* and *ycf1*.

**Figure 7.** Sequence identity plot comparing seven cp genomes, with *A. moupinensis* as the reference, generated using mVISTA. Grey arrows and thick black lines above the alignment indicate genes with their orientation and the position of the IRs, respectively. A cut-off of 70% identity was used for the plots, and the *Y*-scale represents percent identity from 50% to 100%.

Comparative analysis among our seven sequenced *Aristolochia* species was conducted for the entire cp genomes and for the LSC, SSC, IR, and CDS regions (Table 5). Nucleotide diversity (Pi) was also calculated to evaluate sequence divergence among these cp genomes, and the values varied from 0 to 0.07746 (Figure 8). The SSC region exhibited the highest level of divergence of all regions (Pi = 0.03114). Pi values in the LSC region varied from 0.00175 to 0.07746, with a mean of 0.02182. The IR region exhibited the lowest Pi values, varying from 0 to 0.01056 with a mean of 0.00411, indicating that the IR region was the most conserved. Furthermore, we identified 16 hotspot regions (Pi > 0.04; mean = 0.05413) with a total length of 20,296 bp, including *rps16*-*trnQ*-*psbK*, *psbI*-*trnS*-*trnG*, *atpH*-*atpI*, *psbM*-*trnD*, *rps4*-*trnT*-*trnL*, *trnF*-*ndhJ*, *ndhC*-*trnV*, *accD*-*psaI*, *petA*-*psbJ*, *rps18*-*rpl20*, *trnN*-*ndhF*, *rpl32*-*trnL*-*ccsA*, and four regions of the *ycf1* coding gene (Table 6). Ten of these (*rps16*-*trnQ*-*psbK*, *psbI*-*trnS*-*trnG*, *atpH*-*atpI*, *psbM*-*trnD*, *rps4*-*trnT*-*trnL*, *trnF*-*ndhJ*, *ndhC*-*trnV*, *accD*-*psaI*, *petA*-*psbJ*, and *rps18*-*rpl20*) are located in the LSC region and six (*trnN*-*ndhF*, *rpl32*-*trnL*-*ccsA*, and the four *ycf1* regions) in the SSC region; they could be used as potential markers for phylogeny reconstruction and species identification of this subgenus in further studies.
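The sliding-window calculation of nucleotide diversity can be sketched as follows. This is a simplified illustration, not the program used in the study: Pi is computed here as the mean number of pairwise differences per site within each window, gaps are treated like any other character, and the function names `window_pi` and `sliding_pi` are hypothetical. The default window length (600 bp) and step size (200 bp) are those given in the Figure 8 caption.

```python
from itertools import combinations


def window_pi(aligned, start, length):
    """Mean pairwise differences per site among aligned sequences in one window."""
    pairs = list(combinations(aligned, 2))
    differences = 0
    for seq_a, seq_b in pairs:
        window_a = seq_a[start:start + length]
        window_b = seq_b[start:start + length]
        differences += sum(x != y for x, y in zip(window_a, window_b))
    return differences / (len(pairs) * length)


def sliding_pi(aligned, window=600, step=200):
    """Return [(window midpoint, Pi)] along an alignment of equal-length sequences."""
    if len(aligned) < 2:
        raise ValueError("at least two aligned sequences are required")
    n_sites = len(aligned[0])
    if any(len(seq) != n_sites for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")
    results = []
    for start in range(0, n_sites - window + 1, step):
        results.append((start + window // 2, window_pi(aligned, start, window)))
    return results


# Hypothetical toy alignment; a tiny window is used only so the example is readable
toy_alignment = ["AAAA", "AAAT", "AATT"]
print(sliding_pi(toy_alignment, window=2, step=2))
```

Windows whose Pi exceeds a chosen threshold, such as the Pi > 0.04 cut-off used above, would then be merged into candidate hotspot regions.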

**Table 5.** Variable sites analyses in the seven *Aristolochia* cp genomes.



**Figure 8.** Sliding window analysis of the entire cp genome of seven *Aristolochia* species (window length: 600 bp; step size: 200 bp). *X*-axis: position of the midpoint of a window; *Y*-axis: nucleotide diversity of each window.


**Table 6.** Sixteen regions of highly variable sequences (Pi > 0.04) of *Aristolochia*.
