*2.1. Identification and Characterization of Dof Genes in Physic Nut*

To identify all candidate Dof members in the physic nut genome, we performed a whole-genome scan for genes encoding proteins that contain the Dof DNA-binding domain, using both BLASTP and HMM profile searches. Initially, the Dof protein sequences from *Arabidopsis thaliana* and the HMM profile of the Dof domain were used as the BLASTP and HMMER queries, respectively, to screen the physic nut genome. Subsequently, all candidate sequences were examined for the presence of the Dof domain using the SMART software and the NCBI Conserved Domain database. In total, we identified 24 candidate *Dof* genes, represented by 33 transcripts, in physic nut (Table S1). Based on their gene loci, we designated the Dof proteins JcDof-1 to JcDof-24.

In addition, we systematically evaluated the basic properties of the JcDof proteins, including domain position, protein length, molecular weight (Mw), isoelectric point (pI), instability coefficient, and orthologous genes (Table 1). The average length of these Dof protein sequences was 339 amino acid residues, with most lengths falling in the range of 160–518 residues. Correspondingly, the molecular weights were mainly distributed from 18.2 kDa (JcDof-1) to 55.7 kDa (JcDof-6). The predicted isoelectric points varied from 4.65 (JcDof-21) to 9.42 (JcDof-3), and the instability coefficients varied from 39.4 (JcDof-17) to 61.74 (JcDof-7.3-5). The locations of the conserved domain in the JcDof proteins were analyzed with SMART. The domain positions of JcDof proteins encoded by the same gene (i.e., proteins generated by alternative splicing of the same gene model) were similar, whereas those of proteins encoded by different genes differed considerably.
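As a rough illustration of how one of these properties can be derived from a sequence, the sketch below estimates an average molecular weight by summing standard average residue masses and adding one water molecule. The source does not state which tool produced the values in Table 1, so this is an assumption about the general approach rather than the authors' method; the function name `molecular_weight` is hypothetical, and the input shown is a toy peptide, not a JcDof sequence.

```python
# Average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini


def molecular_weight(sequence):
    """Return the approximate average molecular weight (Da) of a protein.

    Raises KeyError if the sequence contains a non-standard residue code.
    """
    sequence = sequence.upper()
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


# Toy peptide for illustration only (not a JcDof sequence).
toy_peptide = "GG"
toy_mw = molecular_weight(toy_peptide)
```

Dividing the result by 1000 gives the value in kDa, the unit used in the text.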


**Table 1.** The information of the *JcDof* gene family.

| Gene ID | Gene name | Protein accession | Domain position | Protein length (aa) | Mw (Da) | pI | Instability coefficient | *Arabidopsis* ortholog |
|---|---|---|---|---|---|---|---|---|
| 105631489 | JcDof-19 | XP\_012069011.1 | 18–76 | 249 | 26,497 | 8.26 | 47.16 | AT3G21270.1 |
| 105630455 | JcDof-20 | XP\_012067660.1 | 129–187 | 465 | 51,091 | 6.8 | 47.74 | AT5G39660.1 |
| 105629142 | JcDof-21 | XP\_012066060.1 | 28–86 | 287 | 32,762 | 4.65 | 51.26 | AT1G21340.1 |
| 105628246 | JcDof-22 | XP\_012065018.1 | 71–129 | 315 | 33,921 | 9.23 | 51.88 | AT2G28810.1 |
| 105628152 | JcDof-23 | XP\_012064896.1 | 36–94 | 290 | 32,430 | 6.65 | 41.41 | AT2G28510.1 |
| 105647749 | JcDof-24 | XP\_012089351.1 | 70–128 | 338 | 35,691 | 9.19 | 50.23 | AT3G55370.3 |

\* These genes are regulated by alternative splicing mechanisms. Mw: Molecular weight; pI: Isoelectric point.

*2.2. DNA-Binding Domain Conservation Analysis of JcDof Protein*

Dof proteins usually have a DNA-binding domain of approximately 40–60 amino acid residues at the N-terminus. This domain contains a highly conserved CX2CX21CX2C single zinc-finger structure, which is essential for the zinc-finger configuration and loop stability. In this study, we analyzed the conservation of the DNA-binding domain of the JcDof proteins. Multiple sequence alignment of the Dof DNA-binding domains of the JcDof proteins revealed that all of them were highly conserved. In particular, we found 20 highly conserved amino acids (100% identical in all 33 JcDof proteins), CPRC-S--TKFCY-NNY---QPR-FCK-C, in the 29-amino-acid region corresponding to the CX2CX21CX2C single zinc-finger structure (Figure 1).
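The CX2CX21CX2C signature described above can be written as a regular expression: four cysteines separated by 2, 21, and 2 arbitrary residues, which spans 29 positions in total. The following is a minimal sketch of scanning a protein sequence for this motif; the function name `find_zinc_finger` and the test sequence are hypothetical, and the sequence is synthetic rather than a real JcDof protein.

```python
import re

# CX2CX21CX2C: four cysteines separated by 2, 21, and 2 arbitrary residues.
# Total span: 1 + 2 + 1 + 21 + 1 + 2 + 1 = 29 residues.
DOF_ZINC_FINGER = re.compile(r"C.{2}C.{21}C.{2}C")


def find_zinc_finger(sequence):
    """Return (start, end, motif) for the first match, or None.

    Positions are 1-based and inclusive, as is usual for protein coordinates.
    """
    match = DOF_ZINC_FINGER.search(sequence.upper())
    if match is None:
        return None
    return match.start() + 1, match.end(), match.group()


# Synthetic sequence for illustration only (not a real JcDof protein).
synthetic = "MA" + "C" + "PR" + "C" + "A" * 21 + "C" + "KG" + "C" + "LL"
hit = find_zinc_finger(synthetic)
```

A pattern match of this kind only confirms the cysteine spacing; it does not replace the domain-level checks with SMART and the NCBI Conserved Domain database described in Section 2.1.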

*2.3. Phylogenetic Analysis and Classification of JcDof Proteins*

To explore the phylogenetic relationships of the JcDof proteins, we carried out a phylogenetic analysis of Dof proteins from physic nut and two other plant species: *Ricinus communis*, also from the Euphorbiaceae family, and *A. thaliana*, as an outgroup (detailed information on all of the Dof proteins is listed in Supplementary Table S2). A phylogenetic tree was reconstructed from 24 physic nut, 21 *R. communis*, and 36 *A. thaliana* Dof proteins (Figure 2). For each gene, we chose the longest protein produced by alternative splicing. The resulting phylogenetic tree was clustered into three major groups (A, B, and C), which were supported by bootstrap values over 80% and were considered evidence for distinct phylogenetic lineages. Two external nodes at the end of the same clade of the phylogenetic tree were likely to represent the closest homologous gene pairs.

Of the three major groups, Group C was the largest clade, containing 19 physic nut Dof proteins, 17 *R. communis* Dof proteins, and 25 *A. thaliana* Dof proteins; it was further divided into two sub-groups, C1 and C2, supported by a bootstrap value over 40%. Group A was the second major clade, with five physic nut Dof proteins, four *R. communis* Dof proteins, and seven *A. thaliana* Dof proteins. Group B was the smallest clade, with only four proteins. Notably, the Group B Dof proteins were found only in *Arabidopsis*, which could be explained by species- or lineage-specific gene gain or loss events. We further checked the GO (Gene Ontology) annotations of these four *Arabidopsis Dof* genes and found that, compared with the *Arabidopsis Dof* genes in other groups, two of the four (*At4g21030*, *At4g21050*) have some specific annotations, such as "cotyledon development", "mucilage metabolic process involved in seed coat development", "regulation of secondary shoot formation", and "fruit development", which implies possible functional divergence of the Dof genes in Group B (see Supplementary Table S3 for detailed information). The phylogenetic tree showed that Dofs in Groups A and C were duplicated several times before the divergence of these three species and were highly conserved among *J. curcas*, *R. communis*, and *A. thaliana*. In addition, the physic nut Dof proteins were more closely related, evolutionarily, to the *R. communis* Dof proteins than to the *Arabidopsis* Dof proteins.
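The group memberships reported above can be cross-checked against the per-species totals used to build the tree (24 physic nut, 21 *R. communis*, and 36 *A. thaliana* proteins). The short sketch below does this arithmetic; the counts are taken from the text, while the dictionary layout and the function name `species_totals` are illustrative assumptions.

```python
# Number of Dof proteins per group and species, as reported in the text.
group_counts = {
    "A": {"J. curcas": 5, "R. communis": 4, "A. thaliana": 7},
    "B": {"J. curcas": 0, "R. communis": 0, "A. thaliana": 4},
    "C": {"J. curcas": 19, "R. communis": 17, "A. thaliana": 25},
}

# Number of proteins per species included in the phylogenetic tree.
reported_totals = {"J. curcas": 24, "R. communis": 21, "A. thaliana": 36}


def species_totals(counts):
    """Sum the per-group counts for each species."""
    totals = {}
    for per_species in counts.values():
        for species, n in per_species.items():
            totals[species] = totals.get(species, 0) + n
    return totals


totals = species_totals(group_counts)
consistent = totals == reported_totals
tree_size = sum(totals.values())
```

The group counts sum to the reported species totals, giving 81 proteins in the tree overall.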

*Int. J. Mol. Sci.* **2018**, *19*, x FOR PEER REVIEW 5 of 15

**Figure 2.** Phylogenetic relationships among *J. curcas*, *A. thaliana*, and *R. communis* Dof proteins. The neighbor-joining tree was created using the MEGA6.0 program (bootstrap value set at 1000). Thirty-six (36) AtDof proteins are marked with black pentacles, 24 JcDof proteins with yellow pentacles, and 21 RcDof proteins with red pentacles. The resulting phylogenetic tree was clustered into three major groups (A, B, and C), which were supported by a bootstrap value over 80%. The Dof proteins in Group C were further divided into two sub-groups, C1 and C2, supported by a bootstrap value over 40%. The detailed information of all the Dof proteins is listed in Supplementary Table S2.
