**3. Results**

### *3.1. Genome-Wide Identification, Classification, and Conserved Motif Analysis of PPR Genes in Watermelon*

A genome-wide search revealed 464 putative PPR genes in the watermelon genome (97103 v2). After analysis of the domain and P motif patterns, a total of 422 PPR genes were identified in this study. These watermelon PPRs were designated as *Citrullus lanatus* PPR (*ClaPPR*) genes, numbered from *ClaPPR1* to *ClaPPR422* in the order of their chromosomal position and accession number in the CuGenDB database (http://cucurbitgenomics.org). The chromosomal location, accession number, length of ORF and protein, and number of introns are listed in Table S2. The number of *ClaPPR* genes on each chromosome ranged from 20 to 56 (Figure 1A). The exon-intron organization of the 422 *ClaPPR* genes showed that 71.8% of them were intronless (303/422), while the remainder had 1 intron (14.7%), 2–5 introns (9.2%), or ≥6 introns (4.3%), as shown in Figure 1B. Analysis of the repeated motif structures in *ClaPPRs* indicated that they could be classified into P and PLS subfamilies, containing approximately equal numbers of genes, representing 46.4% (197 of 422) and 53.6% (225 of 422) of PPR proteins, respectively (Figure 1C). In the PLS subfamily, the DYW and E2 subgroups each accounted for almost half of the genes (100 of 225 and 97 of 225, respectively), followed by PLS (15), E+ (10), and E1 (3), which had the fewest *ClaPPR* genes (Figure 1C). Based on the tandem array of PPR motifs, the number of PPR motifs per protein in watermelon was estimated to range from 3 to 27. The basic motif organization of several typical *ClaPPR* proteins representing each subfamily and subgroup is shown in Figure 1D. Strong peaks were noted in the distribution at around 7–12 and 13–17 PPR motifs in P- and PLS-class proteins of watermelon, respectively (Table S2).
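The subgroup counts reported for Figure 1C can be cross-checked with a short tally. This is only an illustrative sketch: the counts are those stated in the text, while the variable names and the helper `share` are hypothetical and not part of the original analysis.

```python
# Counts as reported in the text (Figure 1C); all names are illustrative only.
p_subfamily = 197
pls_subgroups = {"DYW": 100, "E2": 97, "PLS": 15, "E+": 10, "E1": 3}

pls_total = sum(pls_subgroups.values())  # size of the PLS subfamily
total = p_subfamily + pls_total          # all ClaPPR genes


def share(count, whole):
    """Return count as a percentage of whole, rounded to one decimal place."""
    return round(100 * count / whole, 1)


# Share of each subgroup within the PLS subfamily
pls_shares = {name: share(n, pls_total) for name, n in pls_subgroups.items()}

print(pls_total, total)
print(pls_shares)
```

The subgroup counts sum to the 225 PLS members, and together with the 197 P members they give the 422 genes identified.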

Analysis of conserved motifs in PPR proteins has been suggested to help identify the common molecular functions of PPR genes in different subgroups [6]. Therefore, we investigated the conserved motifs in *ClaPPR* proteins using MEME Suite (Figure S1), and the results indicated that there were 25 motifs in the 422 *ClaPPR* proteins (Figure S2). Almost all of the *ClaPPR* proteins contained 16 of the 25 motifs, the exceptions being nine motifs (Motifs 3, 5, 7, 13, 17, 20, 21, 24, and 25), indicating that these *ClaPPR* proteins might have a conserved domain. In addition, the majority of these motifs were found mostly in the P, DYW, and E2 subgroups (197, 100, and 97 *ClaPPR* genes, respectively) because these were more numerous than the others; E+, E1, and PLS had the fewest members (Figure 1C). We also found that different subgroups possessed specific motifs; for example, motifs 21 and 25 were present only in the P subfamily. In the PLS subfamily, the DYW subgroup mainly contained motifs 5, 13, and 20. Similarly, motif 24 was mainly present in the E2 subgroup. Some motifs were conserved in two subgroups, such as motifs 3, 7, and 17, which were mainly present in the DYW and E2 subgroups (Figure S2).

Whole-genome, tandem, or segmental duplication events have been described as a major factor responsible for the expansion of gene families, including PPR, and their subsequent evolution in plants [5,41]. Therefore, we investigated gene duplication events to determine the expansion mechanism of the *ClaPPR* members in watermelon. A total of 11 segmentally duplicated *ClaPPR* gene pairs (5 from the P subfamily and 6 from the PLS subfamily) were identified in the watermelon genome (Table S3). All the gene pairs were inter-chromosomal, with the two members located on different chromosomes (Figure 2). Further analysis showed that the PLS subfamily included three special duplicated gene pairs spanning different subgroups, in which DYW members (*ClaPPR205*, *ClaPPR264*, and *ClaPPR221*) were paired with E-subgroup members (*ClaPPR366* (E2), *ClaPPR307* (E+), and *ClaPPR286* (E2)) (Table S3). In addition, the ratio of non-synonymous (Ka) to synonymous (Ks) substitutions (Ka/Ks or ω) for these 11 duplicated gene pairs was found to be ω < 1, suggesting that these duplicated *ClaPPR* gene pairs were under purifying selection.
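The interpretation of ω follows the usual convention: ω < 1 indicates purifying selection, ω ≈ 1 neutral evolution, and ω > 1 positive selection. A minimal sketch of that classification is given below; the function name `selection_mode`, the tolerance, and the example rates are hypothetical and are not values from Table S3.

```python
def selection_mode(ka, ks, tol=1e-9):
    """Classify the selection regime from the Ka/Ks ratio (omega)."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    omega = ka / ks
    if abs(omega - 1.0) <= tol:
        mode = "neutral"
    elif omega < 1.0:
        mode = "purifying"
    else:
        mode = "positive"
    return omega, mode


# Hypothetical substitution rates for one gene pair, NOT taken from Table S3
omega, mode = selection_mode(ka=0.12, ks=0.48)
print(f"omega = {omega:.2f} -> {mode} selection")
```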

**Figure 1.** Number, distribution, and structures of *Pentatricopeptide-repeat* (*PPR*) genes in watermelon. (**A**) Number of *ClaPPR* genes in each chromosome. (**B**) Number of introns in *ClaPPR* genes. (**C**) Number of *ClaPPR* proteins belonging to the P subfamily and PPR-like long and short (PLS) subfamily with subgroups. (**D**) Typical motif structures of *ClaPPR* proteins from different subfamilies and subgroups.

### *3.2. Chromosomal Distribution and Duplication of PPR Members in Watermelon*

To investigate the chromosomal distribution of *ClaPPR* genes, the detailed positions of the *ClaPPR1*–*422* genes on the watermelon (97103 v2) chromosomes were obtained from the CuGenDB database. The results showed that the 422 identified *ClaPPR* genes were distributed widely but unevenly across all 12 chromosomes; for example, chromosomes 5 and 4 carried the most and the fewest *ClaPPR* genes, at 13.3% and 4.7%, respectively (Figures 1A and 2). PPR genes usually appear in clusters or individually on chromosomes [3,5]. In the present study, most of the *ClaPPR* genes clustered together either proximally or distally, with very few *ClaPPR* genes positioned in the pericentromeric regions of the chromosomes, indicating gene duplications during evolution.
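The reported percentages follow directly from the per-chromosome counts. The sketch below assumes that the maximum and minimum counts given in Section 3.1 (56 and 20) correspond to chromosomes 5 and 4, respectively, which is an inference from the text; the labels `Chr5` and `Chr4` are illustrative, and the other chromosomes are omitted because their counts are not stated here.

```python
total_genes = 422

# Only the two extremes are stated in the text; the mapping of 56 and 20
# to chromosomes 5 and 4 is an assumption inferred from Sections 3.1 and 3.2.
counts = {"Chr5": 56, "Chr4": 20}

# Percentage of all ClaPPR genes carried by each chromosome
shares = {chrom: round(100 * n / total_genes, 1) for chrom, n in counts.items()}
print(shares)
```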

### *3.3. Phylogenetic Analysis of PPR Members in Watermelon*

To determine the evolutionary relationships among the *ClaPPR* family members, we constructed a phylogenetic tree based on the deduced full-length amino acid sequences of the 422 ClaPPR proteins along with 48 PPR proteins from Arabidopsis. As expected, the tree was divided into two distinct clusters: one containing the P subfamily and the other containing the PLS subfamily (Figure 3). Interestingly, the PLS subfamily member *ClaPPR53* clustered with the P subfamily members; similarly, several P subfamily members, including *ClaPPR338*, *ClaPPR368*, and *ClaPPR394*, clustered with the PLS subfamily members regardless of the structure of their repeated motifs. This finding is consistent with phylogenetic analyses of PPR proteins in rice and poplar, where some P or PLS subfamily members were found to cluster with members of the opposite subfamily [3,7]. These deviations in clustering could be explained by structural similarities in the C-terminal motifs shared between P and PLS members, which might have arisen via duplication of PPR motif coding regions during the evolution of these plant species, including watermelon.

**Figure 2.** Putative chromosomal localization and gene duplication of *ClaPPRs* in watermelon. Collinear blocks in the whole watermelon genome are indicated by grey lines, while segmentally duplicated *ClaPPR* pairs are connected with red lines.
