*4.3. Gene Annotation*

Genes were annotated with CpGAVAS [45], and the final annotation was obtained after manual correction. First, the results of BLASTx, BLASTn, protein2genome, and est2genome [46] were integrated to predict the protein-coding and rRNA genes. Then, tRNA genes were identified using tRNAscan [47] and ARAGORN [48]. Finally, the inverted repeat (IR) regions were identified using Vmatch [49]. Based on the annotation, a chloroplast genome map was drawn using OrganellarGenomeDRAW [50] (http://ogdraw.mpimp-golm.mpg.de/index.shtml).

The protein and coding sequences (CDS) of each sample were extracted from its annotation file, and the protein sequences were aligned pairwise using MUSCLE. The aligned protein sequences were converted to the corresponding DNA (codon) alignments using PAL2NAL. KaKs\_Calculator2.0 [51] (https://sourceforge.net/projects/kakscalculator2) was then used to calculate Ka/Ks, which was used to assess the selection pressure on the different *Pyrus* species during evolution.

Chloroplast genome sequences of 36 Rosaceae species were selected from NCBI, and 57 shared protein-coding genes were used to explore the evolution of the *Pyrus* chloroplast genome, with *Arabidopsis thaliana* as the outgroup. The taxonomic status of each species was confirmed. The annotation files of all genomes were downloaded from NCBI, and the protein sequences of the genes shared among all of the chloroplast genomes were extracted. Each gene was placed in a separate file in which each genome contributed exactly one protein sequence. MUSCLE was used to build a multiple sequence alignment for each file. The per-gene alignments were then joined end to end, genome by genome, to produce a single concatenated alignment, final.fa. MEGA7.0 was then used to construct a neighbor-joining tree from this alignment, and the CGView Server was used to analyze the genetic variation among the chloroplast genomes of the five *Pyrus* species.
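The concatenation step that yields final.fa can be illustrated with a minimal Python sketch. It is not the script used in this study. It assumes each per-gene alignment is FASTA text whose record names are genome identifiers, and that every genome appears exactly once per gene. The function names (`parse_fasta`, `concatenate_alignments`, `to_fasta`) and the toy genome labels and sequences are hypothetical and serve only as illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into {record_id: sequence}, preserving input order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first token of the header is the genome ID
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def concatenate_alignments(gene_alignments):
    """Join per-gene alignments end to end, giving one row per genome."""
    genomes = list(gene_alignments[0])
    combined = {genome: [] for genome in genomes}
    for index, alignment in enumerate(gene_alignments):
        if set(alignment) != set(genomes):
            raise ValueError(f"gene {index}: genome set differs from the first gene")
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"gene {index}: aligned rows differ in length")
        for genome in genomes:
            combined[genome].append(alignment[genome])
    return {genome: "".join(parts) for genome, parts in combined.items()}


def to_fasta(records, width=60):
    """Format {record_id: sequence} as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy per-gene protein alignments (hypothetical labels and sequences).
gene_a = ">pyrus_a\nMK-T\n>pyrus_b\nMKAT\n>outgroup\nMRAT\n"
gene_b = ">pyrus_a\nLLV\n>pyrus_b\nL-V\n>outgroup\nLIV\n"

supermatrix = concatenate_alignments([parse_fasta(gene_a), parse_fasta(gene_b)])
print(to_fasta(supermatrix), end="")
```

The two checks inside `concatenate_alignments` reflect the constraints stated above: a gene missing from any genome, or rows of unequal length within one gene alignment, would shift the columns of the concatenated matrix and are therefore rejected rather than silently padded.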
