**Table 5.** Genes from the chloroplast genomes of *Pyrus*.


Compared to *P. hopeiensis* HB-1, *psaJ*, *rpl20*, *rps18*, and *ycf1* in *P. hopeiensis* HB-2 were subject to negative selection, and no positively selected genes were found (Figure 2 and Tables S1–S4). In *P. betulifolia*, *atpE*, *ndhF*, *ndhI*, *rps18*, and *ycf2* were subject to positive selective pressure, whereas *ndhD*, *ndhH*, *ndhK*, *rpl20*, *rpl22*, *rpoC2*, *rps11*, and *ycf1* were subject to negative selection. In *P. communis* L. cv. Early Red Comice, *psbC*, *psbK*, *rpoA*, *rps14*, *rps18*, and *ycf2* were subject to positive selective pressure, whereas *accD*, *atpA*, *atpE*, *cemA*, *matK*, *ndhA*, *ndhD*, *ndhF*, *ndhH*, *petA*, *psaA*, *psaB*, *rbcL*, *rpl22*, *rpoB*, *rpoC2*, *rps11*, *rps2*, *rps3*, and *ycf4* were subject to negative selection. In *P. ussuriensis* Maxin. cv. Jingbaili, *atpE*, *atpI*, *cemA*, *ndhF*, *rps18*, and *ycf2* were subject to positive selective pressure, and *ndhD*, *ndhH*, *psaA*, *psbC*, *rpl20*, *rpl22*, *rpoC2*, *rps11*, and *ycf4* were subject to negative selection. Notably, *atpE* was subject to positive selective pressure in *P. betulifolia* and *P. ussuriensis* Maxin. cv. Jingbaili, whereas it was subject to negative selection in *P. communis* L. cv. Early Red Comice. This suggests that the chloroplast genomes of *Pyrus* have been affected by different environmental pressures during evolution, which may account for the different gene numbers among the five *Pyrus* species.

*Int. J. Mol. Sci.* **2018**, *19*, x FOR PEER REVIEW 8 of 19

**Figure 2.** Ka/Ks values of five *Pyrus* species. (**a**)–(**d**) represent the Ka/Ks values of *Pyrus betulifolia*, *Pyrus communis* L. cv. Early Red Comice, *Pyrus ussuriensis* Maxin. cv. Jingbaili, and *Pyrus hopeiensis* HB-2, respectively, with respect to *Pyrus hopeiensis* HB-1.
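The selection labels reported above follow from the Ka/Ks ratio of each gene. The following is a minimal sketch of that classification, assuming the conventional cut-off of Ka/Ks = 1 (the text does not state the exact threshold or software used). The function name, gene labels, and numeric values are hypothetical and serve only as an illustration.

```python
def classify_selection(ka, ks):
    """Label a gene from its nonsynonymous (Ka) and synonymous (Ks) rates.

    Assumption: Ka/Ks > 1 is read as positive selection and Ka/Ks < 1 as
    negative (purifying) selection; the paper's own cut-off is not given.
    """
    if ks == 0:
        return "undefined"  # the ratio cannot be computed without synonymous changes
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "negative"
    return "neutral"


# Hypothetical (Ka, Ks) pairs, not values from the study.
example_rates = {
    "geneA": (0.012, 0.004),
    "geneB": (0.001, 0.010),
    "geneC": (0.0, 0.0),
}
labels = {gene: classify_selection(ka, ks) for gene, (ka, ks) in example_rates.items()}
```

Genes with no synonymous substitutions are kept apart as "undefined" rather than forced into either class, since the ratio is not defined for them.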

#### *2.5. Indel Identification and Relationship of the Five Pyrus cp Genomes*

The nucleotide bases in coding and non-coding regions evolve at different rates. DNA variations located in coding regions can lead to large phenotypic and functional changes; moreover, these regions often have a slower mutation rate, making them suitable for phylogenetic studies of higher-order taxa (families, orders, and higher). Mutations in non-coding regions have little effect on phenotype and are under fewer functional constraints, and as these regions take no part in the transcription/translation process, they have a relatively high nucleotide substitution rate and hence evolve rapidly, making them suitable for phylogenetic studies of lower-order taxa (species, genus) [14].

The chloroplast genome data of the five *Pyrus* species were compared with those of *P. hopeiensis* HB-1 by multiple sequence alignment using MAFFT. All variable sites were extracted from the alignment results using a script, and indels ≥5 bp were selected. The locations of these sites in the chloroplast genome were determined, and ggplot in R was used to create plots that were then optimized using AI.

The results indicated 15 mutation sites in *P. hopeiensis* (Figure 3), comprising 11 insertions and four deletions. All of these mutation sites were located in the LSC region; three were located in gene regions and 12 in intergenic regions. Among these, the longest was located in the *ndh*-*trnM-CAT* region, and as many as six mutations were located in the intergenic region *rpl18*–*rps20*. A total of 96 mutation sites were detected in the other four *Pyrus* species, 81 of which were located in the LSC region of the chloroplast genome and 11 in the SSC region, whereas only two mutation sites were found in the IRa and IRb regions in *P. communis* L. cv. Early Red Comice. Mutation sites were more frequent in the single-copy (SC) regions, whereas the IR regions were more conserved. Indels were mainly located in the intergenic regions, and three indel loci were detected in intron regions (*rpl22*, *trnN-ATT*, *ndhA*). Because protein-coding regions are read as triplet codons, they tolerate indels poorly. Accordingly, only five indel loci were detected in the protein-coding region (*trnL-TAT*, *trnN-ATT*, *rps18*, *rps19*, and *ycf1*), and no indel loci were detected in the rRNA region.

A comparison of the occurrence of these indel loci among the four *Pyrus* species revealed 15 indel loci in the chloroplast genome of *P. hopeiensis*, 32 in *P. ussuriensis* Maxin. cv. Jingbaili, 57 in *P. communis* L. cv. Early Red Comice, and 31 in *P. betulifolia*. The insertion or deletion frequency in the chloroplast genome of *P. hopeiensis* HB-2 was lower than that in *P. hopeiensis* HB-1. The *psbA*–*trnQ-TTG* and *rpl18*–*rps20* intergenic regions were the most variable regions with seven loci, followed by *trnT-TGT*–*trnF-GAA* (six) and the *trnI-TAT* gene-coding region (six). The largest indels were located in *psbA*–*trnQ-TTG* in the chloroplast genome of *P. communis* L. cv. Early Red Comice.

**Figure 3.** Indels (≥5 bp) identified based on multiple sequence alignment of five *Pyrus* cp genomes. Insertions are shown above and deletions below the horizontal axis. Indel distribution was positioned using *Pyrus hopeiensis* HB-1 as a reference.
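The indel screening step (gap runs of at least 5 bp, positioned on the HB-1 reference) can be illustrated with a short sketch. This is not the script used in the study, which is not described; it assumes two rows taken from an aligned FASTA (such as MAFFT output) with `-` as the gap character. The function name `find_indels` and the toy sequences are hypothetical.

```python
import re


def find_indels(ref_aln, qry_aln, min_len=5):
    """Return (kind, ref_pos, length) for gap runs of at least min_len.

    ref_pos is the number of reference bases preceding the indel (0-based).
    A gap run in the reference row is an insertion in the query; a gap run
    in the query row is a deletion relative to the reference.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both rows are gaps (these arise when two rows are
    # taken out of a larger multiple alignment).
    pairs = [(r, q) for r, q in zip(ref_aln, qry_aln) if not (r == "-" and q == "-")]
    ref = "".join(r for r, _ in pairs)
    qry = "".join(q for _, q in pairs)

    indels = []
    for kind, row in (("insertion", ref), ("deletion", qry)):
        for match in re.finditer(r"-+", row):
            length = match.end() - match.start()
            if length >= min_len:
                ref_pos = len(ref[:match.start()].replace("-", ""))
                indels.append((kind, ref_pos, length))
    return sorted(indels, key=lambda item: item[1])


# Toy alignment for illustration only.
toy_ref = "ACGT------ACGTACGTAC"
toy_qry = "ACGTTTTTTTACG-----AC"
toy_indels = find_indels(toy_ref, toy_qry)
```

Reporting positions in ungapped reference coordinates keeps indels from different pairwise comparisons on a common axis, which is what a plot such as Figure 3 requires.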

#### *2.6. Codon Preference Analysis*

Codons have an important role in the transmission of genetic information. Codon use is not equal in many species, and the phenomenon of a specific codon being used more frequently than its synonymous codons is known as codon preference [15]. Codon preference is formed during the long-term evolution of organisms, with different species having different codon preferences. Codon use is affected by natural selection, mutation, tRNA abundance, base composition, codon hydrophilicity, gene length, and expression levels [16]. Analysis of the codon use preferences of a species improves our understanding of the transmission of genetic information and the development of evolutionary and phylogenetic models.

The annotated files of plant genomes, including *P. pashia*, *P. pyrifolia*, *Malus prunifolia*, *Prunus mume*, and *Chaenomeles japonica*, were obtained from the NCBI database, including the sequence files encoding CDS and proteins. According to the full-length CDS criterion, sequences with lengths <300 nt were removed. The codon-use frequency of each genome was extracted from its annotated files and the corresponding frequency ratio was calculated. The final statistical results were clustered and mapped using the pheatmap package in R. The results showed obvious codon use preferences for both genotypes of *P. hopeiensis*, among which ATT, AAA, GAA, AAT, and TTT were used most frequently (Figure 4). Statistical analysis of all the codons of *P. hopeiensis*, the three other *Pyrus* species, and the other Rosaceae showed a high A/T preference at the third codon position. This is common in the chloroplast genomes of higher plants [17–21].

#### *2.7. Comparison of the Genome Structure in Rosaceae cp Genomes*

… sequence, and composition of their genes are conserved. However, because different plant groups have different evolutionary histories and genetic backgrounds, the chloroplast genome size, genome structure, and gene numbers vary. Insertion/deletion is the most frequent type of microstructural variation in the chloroplast genome, and it occurs frequently in some highly variable segments, such as *trnH*-*psbA* and *trnS-G*. In Rosaceae, an insertion/deletion of 277 bp in the intergenic region of the *trnS-G* gene was reported in peach [22], and an insertion/deletion of 198 bp in the intergenic region of *trnL-F* was identified in *P. mume* [23]. Collinearity analysis was used to compare the chloroplast genomes of the two genotypes of *P. hopeiensis*, the three other sequenced *Pyrus* species, and other related Rosaceae (*P. pashia*, *P. pyrifolia*, *P. spinosa*, *M. prunifolia*, *P. mume*, and *C. japonica*). The results showed optimal collinearity between *P. hopeiensis* HB-1 and *P. hopeiensis* HB-2, with only a few sites containing insertions and deletions (Figure 5). Compared with the other Rosaceae, the genome structure and gene sequences were highly conserved, with more linear relationships indicating high chloroplast genome homology among the different plants.

**Figure 4.** Codon distribution of all merged protein-coding genes. Red indicates a higher frequency and blue indicates a lower frequency.
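The codon tally behind the codon preference analysis and Figure 4 can be sketched as follows. This is an illustrative outline, not the authors' pipeline: it assumes that the "frequency ratio" is each codon's share of all counted codons (the text does not define it), and it additionally skips CDS whose length is not a multiple of three. The function names and the toy sequences are hypothetical; the 300 nt cut-off comes from the text and is lowered in the example only so that the toy input passes the filter.

```python
from collections import Counter


def codon_counts(cds_list, min_len=300):
    """Count codons over all CDS of at least min_len nucleotides."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        # Assumption: incomplete reading frames are skipped as well.
        if len(cds) < min_len or len(cds) % 3 != 0:
            continue
        for start in range(0, len(cds), 3):
            codon = cds[start:start + 3]
            if set(codon) <= set("ACGT"):  # ignore codons with ambiguous bases
                counts[codon] += 1
    return counts


def codon_frequencies(counts):
    """Share of each codon among all counted codons."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def at3_fraction(counts):
    """Fraction of codons ending in A or T (third-position A/T preference)."""
    total = sum(counts.values())
    at3 = sum(n for codon, n in counts.items() if codon[2] in "AT")
    return at3 / total if total else 0.0


# Toy input for illustration only; the second sequence is below the cut-off.
toy_cds = ["ATGAAATTTGAATAA", "ATGTAA"]
toy_counts = codon_counts(toy_cds, min_len=9)
toy_freqs = codon_frequencies(toy_counts)
toy_at3 = at3_fraction(toy_counts)
```

A per-genome frequency table built this way (codons as rows, genomes as columns) is the kind of matrix that a clustering heatmap tool such as pheatmap takes as input.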
