*2.7. Phylogenetic Analyses*

The phylogenetic relationships of Aristolochiaceae were reconstructed from six datasets of 18 samples (the entire cp genome sequence excluding one copy of the IR; the LSC, SSC, IR, and CDS regions; and a combination of 16 hotspots), using three methods: ML, MP, and BI (Figure 9). The topologies were consistent for most clades across the cp genome, LSC, SSC, CDS, and hotspot datasets, with high bootstrap values for most branches (Figure 9A). Across the six datasets, the genera *Asarum* and *Saruma*, represented by seven species, formed a clade with posterior probability (PP) = 1 based on BI, bootstrap support (BS, %) = 100 based on ML, and BS = 100 based on MP. However, the tree constructed from the IR region failed to resolve the phylogenetic positions of *Asarum epigynum* and *As. canadense* (Figure 9B), possibly because the IR region contains too few informative sites. The nine *Aristolochia* species formed another strongly supported monophyletic group (PP = 1; [ML] BS = 100; [MP] BS = 100) and were divided into two strongly supported subclades, corresponding to the taxonomic division into subgenus *Siphisia* (PP = 1; [ML] BS = 100; [MP] BS = 100) and subgenus *Aristolochia* (PP = 1; [ML] BS = 100; [MP] BS = 100). Within subgenus *Siphisia*, *A. macrophylla* from North America was sister to the remaining four species from Asia (PP = 1; [ML] BS = 100; [MP] BS = 100).

Int. J. Mol. Sci. 2019, 20, x FOR PEER REVIEW 15 of 23

**Figure 9.** Phylogenetic relationships of the 18 species inferred from maximum parsimony (MP), maximum likelihood (ML), and Bayesian inference (BI) analyses. (**A**) Topology constructed from the cp genome, LSC, SSC, CDS, and hotspot datasets; (**B**) tree constructed from the IR region. Bayesian posterior probability values < 0.95 or bootstrap values < 90 are marked on the branches. Support values at node (a): 1/86/93 (LSC region), 0.97/78/84 (SSC), 0.82/-/- (CDS), and 1/81/80 (hotspots); at node (b): 1/90/79 (SSC) and 1/73/71 (hotspots). Numbers above nodes are support values, with Bayesian posterior probabilities on the left, ML bootstrap values in the middle, and MP bootstrap values on the right. "-" indicates a value < 70.
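The PP/ML/MP label convention described in the Figure 9 caption can be read programmatically. The following is a minimal sketch, assuming each node label is a string such as `1/86/93`; the names `Support`, `parse_support`, and `is_weak` are hypothetical and are not part of the original analysis. The cutoffs are the ones stated in the caption (PP < 0.95, bootstrap < 90), and "-" is stored as a missing value because the caption only says it is below 70.

```python
from typing import NamedTuple, Optional


class Support(NamedTuple):
    pp: Optional[float]  # Bayesian posterior probability (left)
    ml: Optional[float]  # ML bootstrap, % (middle)
    mp: Optional[float]  # MP bootstrap, % (right)


def parse_support(label: str) -> Support:
    """Parse a 'PP/ML/MP' label; '-' (value < 70) becomes None."""
    parts = [p.strip() for p in label.split("/")]
    if len(parts) != 3:
        raise ValueError(f"expected three '/'-separated values, got {label!r}")
    return Support(*(None if p == "-" else float(p) for p in parts))


def is_weak(s: Support, pp_cut: float = 0.95, bs_cut: float = 90.0) -> bool:
    """True if any value is missing or below the caption's cutoffs."""
    if s.pp is None or s.pp < pp_cut:
        return True
    return any(v is None or v < bs_cut for v in (s.ml, s.mp))
```

For example, `parse_support("0.82/-/-")` keeps the posterior probability and records both bootstrap values as missing, so `is_weak` reports that node as weakly supported.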

## **3. Discussion**

## *3.1. IR Contraction and Expansion*

Taking into account two previously reported species of subgenus *Aristolochia* (*A. debilis* and *A. contorta*), genomic structure and size were highly conserved, but the IR-SC boundary regions varied among the nine cp genomes of *Aristolochia* (Figure 3). In general, contraction and expansion at the borders of the IR regions are common evolutionary events and may cause IR size variation among plastomes [29,34–36]. The IR regions of the five *Siphisia* species, ranging from 25,664 to 25,700 bp, were longer than those of the four species of subgenus *Aristolochia*, which ranged from 25,175 to 25,459 bp (Table 1) [28]. We identified three types of IR-SC junctions among the nine *Aristolochia* species, according to the organization of genes (Figure 3). Types I and II occurred in the five species of subgenus *Siphisia*, whereas Type III occurred only in the four species of subgenus *Aristolochia*. Type I was found in *A. mollissima*, *A. macrophylla*, and *A. kaempferi*, and was characterized by the *trnH* gene lying in the IR region and the LSC-IRb border lying in the *rps19*-*trnH* spacer. Type II was found only in *A. moupinensis* and *A. kunmingensis*, in which the LSC-IRb border lies within the *rps19* gene. In Types I and II, the *trnH* gene is intact and located upstream of *rpl2* in the IRb region. Type III was characterized by the LSC-IRb and SSC-IRa borders lying in the *rps19*-*rpl2* spacer and the *trnH* gene, respectively. The *trnH* gene spanned the junction between the IR-LSC regions in the four species of subgenus *Aristolochia*.
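The three junction types described above amount to a lookup on where the LSC-IRb border falls. The sketch below is illustrative only: the border labels, `JunctionType`, and `classify_junction` are hypothetical names introduced here, and the mapping from type to subgenus simply restates the pattern reported for the nine *Aristolochia* plastomes.

```python
from enum import Enum


class JunctionType(Enum):
    TYPE_I = "I"
    TYPE_II = "II"
    TYPE_III = "III"


# Hypothetical labels for the position of the LSC-IRb border
# (illustrative strings, not output of any annotation tool).
BORDER_TO_TYPE = {
    "rps19-trnH spacer": JunctionType.TYPE_I,    # trnH lies inside the IR
    "within rps19": JunctionType.TYPE_II,        # border cuts the rps19 gene
    "rps19-rpl2 spacer": JunctionType.TYPE_III,  # trnH spans an IR border
}

# Subgenus in which each type was observed in this section.
SUBGENUS_BY_TYPE = {
    JunctionType.TYPE_I: "Siphisia",
    JunctionType.TYPE_II: "Siphisia",
    JunctionType.TYPE_III: "Aristolochia",
}


def classify_junction(lsc_irb_border: str) -> JunctionType:
    """Return the junction type for a given LSC-IRb border location."""
    try:
        return BORDER_TO_TYPE[lsc_irb_border]
    except KeyError:
        raise ValueError(
            f"unrecognized border location: {lsc_irb_border!r}"
        ) from None
```

A border annotated as `"within rps19"` would be classified as Type II, which in this dataset was seen only in subgenus *Siphisia*.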

The shift of the IR-LSC borders, caused by contraction and expansion involving the *trnH* gene, is one of the major differences between the plastomes of the subgenera *Siphisia* and *Aristolochia*. Whole-gene duplication of *trnH* has been detected in most monocots (e.g., *Acorus*, *Phalaenopsis*, and *Dioscorea*), in *D. granadensis* (Winteraceae) of the magnoliids, and in basal eudicots (*Ranunculus japonica* and *Ranunculus macranthus*) [34,37–41]. Wang et al. (2008) conducted RT-PCR assays and deduced that the duplicated *trnH* genes in most non-monocots and monocots were regulated by promoters with different expression levels and had distinct fates [37]. Within Aristolochiaceae, the *trnH* gene was located in the LSC region of *S. henryi*, 128 bp from the LSC-IR border, and was also present as a single copy in the six cp genomes of *Asarum*, although its position in those genomes is uncertain [29]. Furthermore, that study proposed that the low-complexity *trnH* region, and ultimately the inversion of a portion of the LSC, were due to an AAT repeat. With the inversion of a large portion of the LSC region, genes were rearranged at the SC-IR borders of the sequenced *Asarum* species, and the IR boundaries of the *Asarum* cp genomes were highly variable and experienced positional shifts. For example, the entire SSC of *As. canadense* and *As. sieboldii* has been incorporated into the IR, and the LSC-IR boundary was found within the *rpl2* or *rpl14* gene [29]. In *S. henryi*, an *rps19* pseudogene of 183 bp was present in the IRa region. The organization of the *trnH*-*rps19* gene cluster flanking the IR-SC junction has been used to distinguish monocots from other angiosperms [37,39]. Contraction and expansion of the IR regions can likewise be used to distinguish species within Aristolochiaceae.

## *3.2. Inferring the Phylogeny and Species Identification of Aristolochia*

Chloroplast genomes provide abundant resources for evolutionary, taxonomic, and phylogenetic studies [42–44]. Whole cp genomes and protein-coding genes have been used successfully to resolve phylogenetic relationships at multiple taxonomic levels during the past decade [45,46]. Repeats can lead to changes in genomic structure and can be used to investigate the population genetics of allied taxa [47–50]. Analysis of ten cp genomes revealed numerous repeats, ranging from 38 to 80 (Figure 5 and Table S4); 66 and 138 repeats were detected in *A. debilis* and *A. contorta*, respectively [28]. Given the variability of these repeats between lineages, they can be informative regions for developing genomic markers for phylogenetic analysis. SSRs, also known as microsatellites, are tandemly repeated DNA sequences that consist of repeat units of one to six nucleotides and are ubiquitous throughout genomes [51]. Between 95 and 142 SSRs were identified in the seven cp genomes examined (Figure 6 and Table S5). According to the analysis of highly variable regions, the hotspot regions within the seven cp genomes also provide sufficient informative sites to reveal the phylogenetic structure among species of Aristolochiaceae, especially *ycf1* and *rpl20*, which show high nucleotide diversity and are under positive selection (Table 4). The *ycf1* gene can serve as a barcode for land plants and has been recognized as among the most variable regions of the plastid genome [50,52]. The *rpl20* gene plays an important part in protein synthesis and is involved in translation [53]. This study also provides a reference for phylogenomic studies of closely related lineages within *Aristolochia* and other genera.
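The SSR definition given above (tandem repeats of a one- to six-nucleotide unit) can be illustrated with a short regular-expression search. This is a minimal sketch, not the procedure used in this study: the function names are hypothetical, and the minimum repeat counts in `MIN_REPEATS` are assumed values chosen for illustration, since this section does not state the thresholds that were applied.

```python
import re

# Assumed minimum number of repeats per unit length (illustrative
# thresholds only; not taken from this study).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Return (start, motif, repeat_count) for each SSR found in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A unit of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' or 'ATAT', already covered by 'A' or 'AT'.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)
```

Applied to a toy sequence such as `"GG" + "A" * 12 + "CC" + "AT" * 6 + "GCG"`, the function reports one mononucleotide and one dinucleotide SSR. Counts from real plastomes depend strongly on the thresholds chosen, so they are only comparable between genomes analysed with the same settings.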

Furthermore, based on the analysis of SSR and SNP sites, effective markers can be designed to clarify the phylogenetic relationships of *Aristolochia* and to elucidate the evolutionary history of *Aristolochia* species complexes at the population level. Understanding genetic variation within and between populations plays an important role in improving genetic diversity and is essential for understanding future adaptive changes, reproductive patterns, and conservation [20,54,55]. The cpDNA and the B-class gene PISTILLATA (*PI*) have been used to investigate the taxonomy of species complexes such as the *Aristolochia kaempferi* group; these studies revealed that DNA barcoding and taxonomy are difficult to assess in this group because of multiple hybridization and introgression events [56,57]. More genes under selection and more neutral markers should be used to clarify these multiple diversification events. It would be better to apply full genome information or hypervariable regions to elucidate the species diversity of *Aristolochia*.

## **4. Materials and Methods**
