*Article* **Genome-Wide Analysis of the DYW Subgroup PPR Gene Family and Identification of** *GmPPR4* **Responses to Drought Stress**

**Hong-Gang Su 1,2,**†**, Bo Li 1,**†**, Xin-Yuan Song 3,**†**, Jian Ma 4, Jun Chen 1, Yong-Bin Zhou 1, Ming Chen 1, Dong-Hong Min 2, Zhao-Shi Xu 1,\* and You-Zhi Ma 1,\***


Received: 25 September 2019; Accepted: 6 November 2019; Published: 12 November 2019

**Abstract:** Pentatricopeptide-repeat (PPR) proteins were identified as a type of nucleus coding protein that is composed of multiple tandem repeats. It has been reported that *PPR*genes play an important role in RNA editing, plant growth and development, and abiotic stresses in plants. However, the functions of PPR proteins remain largely unknown in soybean. In this study, 179 DYW subgroup *PPR* genes were identified in soybean genome (*Glycine max* Wm82.a2.v1). Chromosomal location analysis indicated that DYW subgroup *PPR* genes were mapped to all 20 chromosomes. Phylogenetic relationship analysis revealed that DYW subgroup *PPR* genes were categorized into three distinct Clusters (I to III). Gene structure analysis showed that most *PPR* genes were featured by a lack of intron. Gene duplication analysis demonstrated 30 *PPR* genes (15 pairs; ~35.7%) were segmentally duplicated among Cluster I *PPR* genes. Furthermore, we validated the mRNA expression of three genes that were highly up-regulated in soybean drought- and salt-induced transcriptome database and found that the expression levels of *GmPPR4* were induced under salt and drought stresses. Under drought stress condition, *GmPPR4-*overexpressing (*GmPPR4*-OE) plants showed delayed leaf rolling; higher content of proline (Pro); and lower contents of H2O2, O2 − and malondialdehyde (MDA) compared with the empty vector (EV)-control plants. *GmPPR4*-OE plants exhibited increased transcripts of several drought-inducible genes compared with EV-control plants. Our results provided a comprehensive analysis of the DYW subgroup *PPR* genes and an insight for improving the drought tolerance in soybean.

**Keywords:** Pentatricopeptide-repeat (PPR) proteins; genome-wide analysis; drought responses; hairy root assay; soybean

#### **1. Introduction**

The pentatricopeptide repeat (PPR) proteins containing tandem 30–40 amino acid sequence motifs, constitute a large gene family in plants [1]. Typically, PPR proteins are classed into two major subfamilies (P and PLS). The P subfamily proteins contain only the canonical P motif (35 amino acid residues); the PLS subfamily proteins are confined to contain a series of PLS triplets (L, long variants of P; S, small variants of P) [2]. L motif and S motif contain 31 and 35 or 36 amino acid residues, respectively, which are considered to be variations of the P motif [2]. Most PPR proteins are highly conserved at the C-terminus and usually have three conserved domains at the C-terminus, E, E+ and DYW domains, respectively [3,4]. Based on these C-terminal domains, the PLS subfamily is divided into four subgroups, PLS subgroup, E subgroup, E+ subgroup, and DYW subgroup [2]. The PPR proteins also contain a distinctive feature that is essentially free of intron. Although members of the PPR family are widely distributed in plants and are large in number, they perform special functions [2].

So far, a considerable amount of PPR proteins have been experimentally identified from various plant species. [5]. For example, 441 and 626 PPR members have been identified in the *A. thaliana* and *Populus trichoccapa* genome, respectively [5–7]. A large number of studies showed that PPR proteins played roles in the growth and development of plants and the formation of organelles [8]. It can participate in the fertility restoration of cytoplasmic male sterility, post-transcriptional processing of RNA and adversity defense [2,7,9–12]. In maize, EMP12 is targeted to the mitochondria and is essential for the splicing of three *nad2* introns and seed development, and the mutant in *EMP12* caused defects complex I [13]. It was reported that the nuclear *OsNPPR1* encoded a PPR protein, which was involved in the Regulation of mitochondrial development and endosperm development. The mutant named *fgr1* exhibited a lower content of starch and delayed seedling growth [14]. Rice *PPS1* encodes a DYW motif-containing PPR protein, which is important for C-to-U RNA editing of *nad3* transcripts. Knock-down and Knock-out of *PPS1* revealed a significant decrease in the editing efficiency of C-to-U at five editing sites of *nad3* transcripts, which resulted in some pleiotropic phenotypes, such as dwarfing, developmental retardation and stunted development in vegetative stages; partial pollen was sterile at the reproductive stage [15]. A chloroplast-localized P-class PPR protein, PpPPR\_21, was identified as an essential protein for the accumulation of a stable *psbl-ycf12* mRNA. The knockout *ppr21* mutants displayed decreasing protonemata growth and lower photosynthetic activity [16].

Of particular note, several studies have recently reported roles for *PPR* genes during abiotic stress [17]. For example, *PPR96* from *A. thaliana* was involved in the response to salt, ABA and oxidative stress, and altered translational levels of abscisic stress responsive genes, implying that *PPR96* may take part in the regulation of plant tolerance to abiotics [18]. PGN, an *A. thaliana* mitochondria-localized PPR protein, was reported to play a role in regulating mitochondrial reactive oxygen species balance in abiotic and biotic stress responses. Inactivation of *PGN* displayed hypersensitivity to ABA, glucose, and salinity. Loss of *PGN* resulted in enhanced accumulation of reactive oxygen species (ROS) in seedlings, and the *pgn* mutant notably elevated levels of *ABI4* and *ALTERNATIVE OXIDASE1a* [19]. In rice, *cisc(t)* encoded a PPR protein, which was necessary for chloroplast development in early stages under cold stress [20]. The *A. thaliana ABA overly sensitive 5 (abo5)* mutant showed growth retardation and accumulated increased levers of H2O2 in root tips [21]. *AtPPR40* was reported to be involved in salt, ABA and oxidative stress [22].

Previous studies have emphasized the role of *PPR* genes in the growth and development of plants, and several *PPR* genes performed potential roles against abiotic stresses. To date, little information about *PPR* genes relating to abiotic stress response mechanisms in soybean has been reported. Here, a comprehensive genome-wide analysis of the DYW subgroup PPR gene family has been accomplished in soybean. We explored the characteristics of 179 DYW subgroup *PPR* genes, including intron-exon organization, chromosomal location and phylogenetic relationships. The DYW subgroup *PPR* genes were divided into three discrete groups (Cluster I to III). An analysis of a complete set of the Cluster I PPR genes/proteins was performed, including classification, chromosomal location, orthologous relationships, duplication analysis and tissue-specific expression patterns. *GmPPR4* was further investigated owing to significant up-regulation by drought and salt stresses. Our results showed that overexpression of *GmPPR4* improved tolerance to drought stress in soybean. Our study provided an insight into the foundation of the *GmPPR4* gene in abiotic stress responses.

#### **2. Result**

#### *2.1. Identification of DYW Subgroup PPR Genes in the Soybean Genome*

In our study, 205 proteins were identified in the soybean genome (*Glycine max* Wm82.a2.v1). The candidate members were examined for the required DWY domain, which was unique to DYW subgroup PPR proteins. SMART and Pfam databases were used to verify the presence of the conserved domain in all 205 DYW subgroup PPR proteins, and the results showed that a total of 197 members were identified. Except for the presence of conserved PPR domains, there are great differences in the size and physicochemical properties of the DYW subgroup *PPR* genes (Table S1). The statistical results showed that the amino acid sequence length of the 179 DYW PPR proteins varied from the lowest of 118 to the highest of 1431, the isoelectric point (p*I*) varied from 5.28 to 9.36, and the molecular weight (MW) ranged from 13.7 kDa to 160.3 kDa. The detailed information of the 179 DYW subgroup *PPR* gene sequences is provided in Table S1. For convenience, the 179 DYW subgroup *PPR* genes were named from *GmPPR1* to *GmPPR179* according to the order of their chromosomal locations [23,24].

A considerable amount of research showed that the vast majority of *PPR* genes were found to be lacking or containing several introns in various plant species [5–7]. The intron numbers present within the ORF of each of DYW subgroup *PPR* genes in soybean were decided for analyzing their exon-intron organization. As showed in Figure 1, 59% (105/179) of DYW subgroup *PPR* genes were predicted to lack introns, 17% (30/179) with one intron, 22% (40/179) with two to five introns, and 2% (4/179) with six or more introns.

**Figure 1.** Relative proportions of intron-containing *PPR* genes in soybean. (**A**) Number of introns in the 179 DYW subgroup *PPR genes*. (**B**) Number of introns in the 84 Cluster I *PPR genes*.

#### *2.2. Chromosomal Distribution, Phylogenetic Analysis and Multiple Sequence Alignment*

Our study showed 179 DYW subgroup *PPR* genes were distributed widely and unevenly in all the 20 soybean chromosomes (Figure 2). Chromosome 8 of soybean contained the highest number of DYW subgroup *PPR* genes, while fewer numbers of DYW subgroup *PPR* genes were located on chromosome 14 and 19. The detailed position of each DYW subgroup of *PPR* genes on the soybean chromosomes was obtained from Phytozome (Table S1).

#### *Int. J. Mol. Sci.* **2019**, *20*, 5667

**Figure 2.** Chromosomal distribution of the 179 DYW subgroup *PPR* genes in soybean. (**A**) The physical location of each member. (**B**) *PPR* genes distribution on 20 soybean chromosomes.

To analyze the evolutionary relationship among DWY subgroup *PPR* genes, the full-length amino acid sequences of the 179 DYW subgroup PPR proteins were aligned and then the phylogenetic tree was conducted using the Maximum likelihood method by MEGA7.0 software (Figure 3). The phylogenetic analysis categorized the DYW subgroup *PPR* genes into three distinct clusters (I to III) comprising of 84, 65 and 30 genes, respectively. In order to isolate drought-inducible DYW subgroup *PPR* genes in soybean, we analyzed the expression patterns of the DYW subgroup *PPR* genes in soybean transcriptome database. We found that three DWY subgroup *PPR* genes were highly up-regulated by drought and salt stresses, including *GmPPR4* (Glyma.01G011700), *GmPPR18* (Glyma.02G174500), and *GmPPR111* (Glyma.11G008800). Interestingly, they all clustered into the Cluster I, implying that Cluster I *PPR* genes may be involved in the abiotic stress of soybean, therefore, we further analyzed the genetic structure and motif of Cluster I *PPR* genes.

**Figure 3.** Phylogenetic tree of DYW subgroup *PPR* genes. The complete amino acid sequences of the 179 DYW subgroup PPR proteins were aligned by ClustalW and Maximum-likelihood with MEGA7. Three discrete groups (Cluster I to III) were highlighted in different colors.

To identify conservation of PPR proteins in dicot plant species, six reported DYW proteins of *A. thaliana* and six randomly selected proteins of soybean were used to conduct the multiple sequence alignment, and the C-terminal region is displayed in Figure 4. Results showed that PPR proteins were relatively conserved across *A. thaliana* and soybean, however, only the significant short arrays (no more than four amino acid residues) were absolutely conservative, and 11 members harbored the conservative DYW sequence in C-terminal region that have previously been described as characteristic features of DYW subgroup PPR proteins. In addition, a phylogenic tree was conducted to reveal the relations of 12 selected proteins (Figure S1).


**Figure 4.** Multiple sequence alignment of 12 DYW subgroup PPR proteins from *A. thaliana* and soybean by DNAMAN. Black or blue shadings represent the 100% or >75% similarity of amino acids. Red rectangle marks highly core short signatures.

#### *2.3. Gene Structure and Motif Composition of Soybean Cluster I PPR Genes*

The exon-intron organization of the 84 Cluster I *PPR* genes was conducted to provide valuable information for the structure diversity evolution of the PPR family in soybean. As showed in Figure 5 and Table S1, they almost contained very few introns, which was similar to the previous results (Figure 1). However, the result showed that several genes contained more than 10 introns. Fourteen and 17 introns were found in Glyma.13G220400 and Glyma.15G092000, respectively. Many paralogs exhibited the same numbers and sizes of introns such as gene clusters of Glyma.07G057000, Glyma.16G026000, Glyma.02G217200, and Glyma.14G184600; However, only a few of them showed intron gain/loss phenomenon such as Glyma.03G205100 and Glyma.19G202500. On the whole, the intron distribution and the intron phase are in accordance with the alignment cluster of the Cluster I *PPR* genes.

Also, the Multiple EM for Motif Elicitation (MEME) program was used to search the conserved motifs which were shared with the Cluster I proteins (Figure 6). A total of 10 distinct conserved motifs named 1 to 10, were found. The motifs were defined based on sequence conservation without knowing its structure or function. Motif 1, the most typical DYW domain, comprised of 46 amino acids; the sequence (KNLRVCGDCHSAIKLISKIYNREIIVRDRNRFHHFKBGSCSCGDYW) contains a highly conserved C-terminal region (DYW) (Figure 7). However, this conservative motif is not included in the five proteins, including GmPPR28 (Glyma.03G238000), GmPPR72 (Glyma.08G117400), GmPPR91 (Glyma.09G154200), PPR103 (Glyma.10G093800), and GmPPR165 (Glyma.18G225600), which may be due to the fact that some PPR proteins are truncated at the C-terminus, lacking part, most of or the entire DYW domain. Proteins with higher homology often have the same conserved domain composition, which corresponds to the results of the phylogenetic tree.

**Figure 5.** Phylogenetic relationship and structural analysis of the 84 Cluster I *PPR* genes. The phylogenetic tree was constructed via MEGA7.0 software; the different classes of *PPR* genes make up separate clades. The schematic diagram was carried out to represent the gene structure. Introns and exons were indicated by black lines and purple boxes, respectively. The lengths of introns and exons of each gene were displayed proportionally.

**Figure 6.** Putative motifs of each Cluster I PPR gene by MEME program and TBtools software. Ten putative motifs were indicated in colored boxes. The length of protein can be estimated using the scale at the bottom.

**Figure 7.** Sequence logo of global multiple alignment of 179 DYW subgroup PPR proteins by MEME program.

#### *2.4. Duplication and Divergence Rate of Soybean Cluster I PPR Genes*

Whole genome tandem and segmental duplications play a vital role in multiple copies of genes in a gene family and their subsequent evolution. Among *SiPPR* genes, 190 (95pairs; ~39.01%) were segmentally duplicated [25]. Our research reached a similar conclusion that this proportion is far higher than of genes generally. Among Cluster I *PPR* genes, 30 (15 pairs; ~35.7%) were segmentally duplicated (connected by lines in Figure 8). Among 15 paralog pairs, 13 segmental duplication gene pairs were involved in two chromosomes; only two segmental duplications were intra-chromosomal. In particular, three genes have appeared twice, including *GmPPR3* (Glyma.01G011300), *GmPPR4* (Glyma.01G011700) and *GmPPR94* (Glyma.09G209700). The results indicated that gene segmental duplications might mainly contribute to the expansion of the PPR gene family in soybean.

**Figure 8.** Distribution of segmentally duplicated Cluster I *PPR* genes on soybean chromosomes. Green lines indicate duplicated *PPR* gene pairs, and bold Photozome Locus indicates the gene appears twice.

The ratio of non-synonymous (Ka) versus synonymous (Ks) substitution rates (Ka/Ks) was estimated for 15 segmental duplicated gene pairs to evaluate the role of Darwinian positive selection in duplication and divergence of the Cluster I *PPR* genes (Table S2). The result showed that the Ka/Ks for segmental duplications gene-pairs ranged from 0,14 to 1.03 with an average of 0.61. On the whole, it suggested that the duplication Cluster I *PPR* genes were under purifying selection pressures since their Ka/Ks rations were estimated as <1 except one gene pair with a rate (Ka/Ks) of 1.03.

#### *2.5. Tissue-Specific Expression Pattern of DYW Subgroup PPR Genes*

Different members may exhibit significant diversity in expression abundance among different tissues to adapt to different physiological process. To gain insight into the gene expression patterns within the organism in soybean growth and development, we investigated the relative transcript abundance of the 84 DYW subgroup *PPR* genes in six soybean tissues using publicly available RNA-seq data from Phytozome database, including flower, leaves, root, root hairs, seed, and stem, revealing that most DYW subgroup *PPR* genes had a similar pattern of expression in the same tissue. Out of 84 DYW subgroup *PPR* genes, most of them expressed in almost any tissues, while a total of four genes had no expression abundances to be discovered in any tissues, including Glyma.08G318700, Glyma.18G126700, Glyma.18G128100, and Glyma.18G225600. We found that most *PPR* genes showed preferential accumulation in leaves compared to other tissues, which were likely to be responsible for their subcellular localization, and most reported PPR proteins were located in mitochondria or chloroplast. A relatively high expression level was revealed in seeds, suggesting that they have potential function in seed development, which has been reported to be involved in endosperm development [14]. As shown in Figure 9, Glyma.11G008800, Glyma.13G220400 and Glyma.15G192000 showed extremely high expression in roots and root hairs, indicating that the DYW subgroup *PPR* genes may play specific roles in responding to drought stress.

#### *2.6. Cis-Elements Analysis*

By examining the expression levels of DYW subgroup *PPR* genes according to soybean droughtand salt-induced transcriptome database, we found that three genes were highly up-regulated to drought and salt stresses. *Cis*-elements presented in the upstream region play major roles in regulating the gene expression at the transcriptional level. To further understand the mechanism of the three genes, a set of 12 important *cis*-elements were identified in their 1.5 Kb 5 flanking region upstream from the start codons. Various *cis*-elements including ABA-responsive element (ABRE) and MYB-binding site (MBS) were shown in Figure 10, which suggested that the three candidate genes might be involved in responses abiotic stresses.

#### *2.7. Several Candidates Are Involved in Abiotic Stresses*

To comprehensively understand the physiological functions of DYW subgroup *PPR* genes, we initially examined the expression patterns of three genes in response to salt and drought stresses by qRT-PCR (Figure 11). Under drought treatment, these three genes exhibited similar expression patterns, reaching a peak at 2 h (~ 4.5-fold, 3-fold and 2.5-fold, respectively), indicating that these genes have a similar biological function in the same environment. Similarly, the expression levels of the three genes were enhanced by salt; the expression peaks of *GmPPR4*/*18*/*111* occurred at 4, 8 and 4 h, respectively, which are equivalent to 3-fold, 2.8-fold and 3.2-fold increases, respectively.

**Figure 9.** Heat map of expression profiles (in log10-based FPKM) of all DYW subgroup *PPR* genes from six soybean tissues (flower, leaves, root, root hairs, seed, and stem). The expression abundance of each transcript is represented by the color bar: Red, higher expression; and green, lower expression.

*Int. J. Mol. Sci.* **2019**, *20*, 5667

**Figure 10.** Putative *cis*-element in a 1.5 kb 5 flanking region upstream from the start codon. Different *cis*-elements were indicated by colored symbols and placed in relative positions on the promoter. The ABA-responsive element (ABRE), light-responsive element (ACE), anaerobic induction element (ARE), auxin responsive element (AuxRR-core), light-responsive element (Box4), MeJA-responsive (CGTCA-motif), light-responsive element (GT1-motif), low temperature responsive element (LTR), MYB-binding site (MBS), light-responsive element (Sp1), salicylic acid responsive element (TCA), and defense and stress responsive element (TC-rich repeat) were analyzed.

**Figure 11.** Expression patterns of three selected *PPR* genes under salt and drought treatments by qRT-PCR. *GmPPR4* (Glyma.01G011700), *GmPPR18* (Glyma.02G174500), *GmPPR111* (Glyma.11G008800), respectively. The *actin* gene was used as an internal control. The data are shown as the means ± SD obtained from three biological replicates. ANOVA test demonstrated that there were significant differences (\* *p* < 0.05, \*\* *p* < 0.01).

#### *2.8. GmPPR4 Improved Drought Tolerance in Transgenic Soybean Hairy Roots*

Among the three genes, *GmPPR4* (Glyma.01G011300) clearly responded to drought and salt stresses. For this reason, *GmPPR4* was selected for further investigation. To examine the function of *GmPPR4* in vivo, transgenic soybean plants which overexpressed *GmPPR4* (*GmPPR4*-OE) were generated into soybean hairy roots [26]. qRT-PCR analysis showed that *GmPPR4* accumulated in the *GmPPR4*-OE plants. (Figure S2), and about 80% of the roots of transgenic soybean plants were positive. To examine whether *GmPPR4* plays a role in the drought stress tolerance, we compared drought tolerance of *GmPPR4*-OE and WT plants at the vegetable stage; the hairy roots of seedlings which grew in soil were withheld from water for 2 weeks. No significant differences were observed for transgenic plants under normal growth conditions, compared to empty vector (EV)-transformed control hairy roots; drought treatment caused obvious differences in the growth of the EV-control and *GmPPR4*-OE plants; compared with the *GmPPR4*-OE, the EV-control plants showed wilted leaf under drought for 3 days, and seriously dehydrated leaves were observed after 7 days, whereas the *GmPPR4*-OE seedlings showed delayed and less leaf rolling during the drought stress process. (Figure 12A).

**Figure 12.** Drought stress analysis of EV-control transgenic plants and *GmPPR4*-OE transgenic plants. (**A**) Phenotypes of *GmPPR4*-OE and EV-control transgenic soybean plants subjected to drought stress without water for 3 days and 7 days. (**B**) Trypan blue staining of soybean plant leaves placed on filter paper for the induction of rapid drought for 3 h, the dead cells can be strained, but living cells cannot. (**C**) The proline content, (**D**) malondialdehyde (MDA) content, (**E**) H2O2 content, and (**F**) O2 − content of *GmPPR4*-OE and EV-control transgenic soybean plants under normal and drought conditions. The data are shown as the means ± SD obtained from three biological replicates. ANOVA test demonstrated that there were significant differences (\* *p* < 0.05, \*\* *p* < 0.01).

To explore the potential physiological mechanism responsible for the improved drought tolerance of the *GmPPR4*-OE seedlings, we compared some stress-related physiological changes in the EV-control and *GmPPR4*-OE plants under both normal growth and drought conditions. We found that proline accumulation in the transgenic plants was much more evident than in EV-control plants (Figure 12C), while the MDA content was decreased due to drought stress (Figure 12D). Furthermore, we also measured the level of H2O2 and O2 - in roots; we found that the drought-treated EV-control roots accumulated much more H2O2 and O2 - than the *GmPPR4*-OE roots (Figure 12E,F).

In addition, we used Trypan blue solution to detect cell activity in EV-control and *GmPPR4*-OE leaves. As shown in Figure 12B, no plant leaves differed under drought stress conditions, however, the color depth of the *GmPPR4*-OE leaves was lower than EV-control leaves under drought treatment, which indicated that the cell membrane integrity and stability in the leaves of the EV-control plants was better than that in the leaves of the *GmPPR4*-OE plants.

NaCl treatment was carried out. However, no obvious differences were observed in EV-control and *GmPPR4*-OE plants under NaCl treatment.

#### *2.9. GmPPR4-OE Plants Exhibited Increased Transcripts of Some Drought-Inducible Genes*

A previous study indicated that several genes play an important role in drought stress [27,28], including *DREB2* [29], *DREB3* [30], *MYB84* [31], *bZIP1* [32], *bZIP44* [33], *NAC11* [34], *WRKY13* [35], and *WRKY21* [35,36]. We compared the transcripts of several drought-inducible maker genes between *GmPPR4*-OE and EV-control plants under normal and drought conditions, and we found that drought

induced more transcripts of all genes. However, expression differences between *GmPPR4*-OE and EV-control plants under drought treatment occurred in six genes, including *DREB2*, *DREB3*, *bZIP1*, *NAC11*, *WRKY13*, and *WRKY21*. There was no clear difference in *bZIP44* and *MYB84* expression with drought treatment between *GmPPR4*-OE and EV-control plants (Figure 13).

**Figure 13.** GmPPR4-OE plants exhibited increased transcripts of some drought-inducible genes. The two-week-old soybean seedlings were placed on filter paper for 3 h. qRT-PCR analysis for drought-inducible genes (**A**–**H**). The actin gene was used as an internal control. The data were shown as the means ± SD obtained from three biological replicates. ANOVA test demonstrated that there were significant differences (\* *p* < 0.05, \*\* *p* < 0.01).

#### **3. Discussion**

Soybean is one of the most widely cultivated crops, with a total production of more than 260 million tons in 2010 (FAO data) [37]. Greenhouse and field studies showed that drought stress seriously affects plant development and led to significant reduction in crop yield [38]. Thus, when identifying ideal candidate genes, increasing drought stress resistance is essential for improving soybean yield.

*PPR* genes comprise a large family, which is ubiquitous to all terrestrial plants. In particular, previous reports indicated that a total of 4000 *PPR* family genes were identified in the spikemoss (S. *moellendor*ffi*i*) genome [3]. It is estimated that there were more than 1000 *PPR* genes in soybean. The duplication event was likely to be responsible for the large PPR family size, which has been experimentally verified in many species. Genome-wide identification of *PPR* family genes has been widely carried out in many species that have been sequenced [5–7]. In the present study, we identified 179 DYW subgroup *PPR* genes in the soybean genome (*Glycine max* Wm82.a2.v1). Consistent with previous studies, gene structure analysis revealed that most members of 179 DWY subgroup *PPR* genes were intron-less. Approximately 80% and 65% of *PPR* genes were free of intron in *A. thaliana* and rice, respectively [5,6]. The intron-less nature of the great majority of *PPR* genes may be the result of selection pressures during evolution. However, there are several *PPR* genes that tend to evolve diverse exon-intron organizations with more than 10 introns; we propose that they have a higher probability of evolving new specialized functions and adjusting to their living environment. The intron-less nature may demonstrate a duplication event of *PPR* genes [39]; we presumed that the nature also allows alternative splicing and alteration of splicing pattern in plants. In particular, many coding sequences of PPR proteins with extremely high similarity on the genome were divided only by frameshifts, for example, Glyma.13G332400 and Glyma.15G041800. In our study, *GmPPR3* (Glyma.01G011300), *GmPPR4* (Glyma.01G011700) and *GmPPR94* (Glyma.09G209700) shared more than 95% identity in their coding sequence. In particular, many *PPR* genes have extremely close proximity on the genome, which contributed to the extensive gene-level synteny shared between them [24].

Previous studies have shown that PPR proteins were involved in RNA editing, plant growth and development and abiotic stress. Several PPR proteins have been implicated with trans-splicing of *nad* intron in *A. thaliana*, these include: OTP43 [40], SLO4 [41], PPR19 [42], BIR6 [43], MTL1 [44], ABO5 [21], OTP439 [45], and SG3 [46]. Lots of PPR proteins have a similar function in the development of seed or

endosperm, such as EMP5 [47], PPR2263 [48], PPR8522 [49], OGR1 [50], AHG11 [17], and OTP43 [40]. In addition, in rice or *A. thaliana*, six PPR proteins: cisc(t) [20], ABO5 [21], ECB2 [51], PPR40 [22], SVR7 [52], and YS1 [53], were reported to be involved in abiotic stress responses, including ABA, drought, light and cold. In this study, a higher level of H2O2 and O2 - were found in *GmPPPR4*-OE plants compared with EV-control plants under drought stress. In a previous study, the *abo5* mutant accumulated higher H2O2 content in roots than wild type [21], and the *ahg11* mutants exhibited higher transcript levels of oxidative stress-responsive genes [17]. The above results suggested that *PPR* genes may play roles in drought stress by regulating the level of H2O2.

#### **4. Methods**

#### *4.1. Identification of DYW Subgroup PPR Genes in Soybean*

DYW subgroup PPR proteins sequences of soybean were downloaded from Phytozome [54]. Predicted proteins from the soybean genome (*Glycine max* Wm82.a2.v1) were scanned using HMMER v3 [55] using the Hidden Markov Model (HMM) corresponding to the Pfam of PPR family (PF14432) [56]. The proteins obtained using the raw DYW HMM, a high-quality protein set (E-value < 1 <sup>×</sup> 10−20), were aligned and used to construct a soybean-specific DYW HMM profile using hmmbuild from the HMMER v3 suite. This new soybean-specific HMM was used to search for all members in all soybean proteins; in addition, all obtained proteins with an E-value lower than 0.01 were selected. The highly matched sequences were reorganized and merged to remove the redundancy. Then, all protein sequences of putative DYW subgroup *PPR* genes were submitted to the SMART database [57] and Pfam database to confirm the existence of the DYW conserved domain. The sequences lacking a DYW domain were refused in this study. Furthermore, molecular weights and isoelectric points identified that the DYW subgroup PPR proteins were obtained by using tools from EXPASY website [58].

#### *4.2. Chromosomal Location and Phylogenetic Analysis*

Positional information of DYW subgroup *PPR* genes on chromosomes of soybean was obtained from the Phytozome database. All DYW subgroup *PPR* genes were mapped to 20 chromosomes in soybean.

Multiple sequence alignment of proteins from soybean was performed via ClustalW with a default parameter. A phylogenetic tree was constructed by using the maximum likelihood with MEGA7 software, with the following parameters: Poisson model, pairwise deletion and 1000 bootstrap replications.

#### *4.3. Gene Structure Analysis and the Sequence of PPR Motif Analysis*

The gene structures of *PPR* were illustrated using the online program Gene Structure Display Server [59] by comparing predicted coding sequences with their corresponding genomic DNA sequences.

For sequence analysis of PPR motifs, the MEME online program [60] was used to identify conserved motifs. The scatter diagram was used to display the distribution of PPR motifs at the opposite positions in the PPR proteins by using the TBtools software v0.665 (Guangzhou, China) [61].

#### *4.4. Gene Duplication*

All Cluster I PPR proteins were searched for using the BLASTp search (E-value >1e−10) with more than 75% sequence similarity being considered to be a pair of tandem repeat genes. Then, the resulting file and GFF3 files of soybean genome (*Glycine max* Wm82.a2.v1) were used to analyze the gene duplication by software MCScanX and visualization using CIRCOS [62].

#### *4.5. Tissue-Specific Expression Patterns of DWY subgroup PPR genes*

Transcription databases were obtained from Phytozome database to investigate the tissue expression patterns of the soybean DYW subgroup PPR family; TBtools software was used to conduct a visual hierarchical clustering of the 84 DYW subgroup *PPR* genes under normal conditions. The transcript data are available in Table S3.

#### *4.6. Promoter Sequence Analysis for Potential cis-Elements*

For *cis*-elements analysis, 1.5 kb 5 upstream region sequences were extracted from the Phytozome database. Then, the potential *cis*-elements of promotors for each gene were analyzed using PlantCARE database [63].

#### *4.7. Plant Materials and Treatments*

Soybean cultivar Zhonghuang 39 was used to analyze the gene expression pattern. The leaves of 15-day soybean seedlings were collected for RNA extraction and used to further qRT-PCR analysis. For drought stress, the soybean seedlings were placed on filter paper, for NaCl stress, the soybean seedlings were subjected to a 200 mM NaCl solution; samples were collected at 0, 1, 2, 4, 8, 12, 24, and 48 h after treatments. All treated leaf samples were frozen immediately in liquid and then stored at −80 ◦C for subsequent analysis.

#### *4.8. RNA Extraction and qRT-PCR*

Total RNA was extracted from plant samples using the Trizol method as the manufacturer's protocol (TIANGEN, Beijing, China), which was reverse transcribed into cDNA using a PrimeScriptTM RT Reagent Kit (TaKaRa, Shiga, Jappa). All primers used in the study are shown in Table S4 [64].

#### *4.9. Agrobacterium Rhizogenes-Mediated Transformation of Soybean Hairy Roots*

Soybean Williams 82, a typical variety, was used to generate *GmPPR4*-OE soybean hairy roots. The coding sequences of *GmPPR4* was ligated into plant transformation vector pCAMBIA3301 with the CaMV 35S promoter. The constructs were transferred into A. rhizogenes strain K599, and *Agrobacteriumrhizo* strain K599 harboring EV-control and *GmPPR4*-OE were injected at the cotyledonary node, as previously described [65]. The injected plants were transferred to the greenhouse with high humidity until hairy roots were generated at the infection site and had grown to about 5 cm long. The original main roots were cut off from 0.5 cm below the infection site. Seedlings were transplanted into nutritious soil and cultured normally in the greenhouse for a week (25 ◦C 16 h light/8 h dark photoperiod).

#### *4.10. Drought and Salt Stress Assays*

For drought treatment, one-week-old seedlings were subjected to dehydration for 15 days. After drought treatment, we re-watered soybean plants for 3 days. At the same time, we carried out salt treatment with 200 mM NaCl solution for 3 days [66].

#### *4.11. Measurement of Proline Content, MDA Content, H2O2 Content and O2 - Content*

The content of proline, MDA, H2O2 and O2 - were measured with the corresponding assay kit (Cominbio, Suzhou, China) based on the manufacturer's protocols; all measurements were from four biological replicates.

#### *4.12. Trypan Blue Staining*

The leaves separated from the EV-control and *GmPPR4*-OE plants were placed on filter paper for the induction of rapid drought for 3 h, which were used to trypan stain, as previously described [23].

#### **5. Conclusions**

This study is the first time identifying the presence of 179 DYW subgroup *PPR* genes in the soybean genome (*Glycine max* Wm82.a2.v1) sequences, which were designated to *GmPPR1* through *GmPPR179* on the basis of their chromosomal location. We conducted a comprehensive and systematic analysis of the DYW subgroup PPR family. Based on the gene expression patterns under drought and salt stresses, we found that three *PPR* genes were highly up-regulated under salt and drought treatment, and *GmPPR4* was selected for validating its role in drought stress tolerance. Compared with the EV-control plants, *GmPPR4*-OE exhibited drought tolerant phenotypes. Our results showed that *GmPPR4* could improve tolerance to drought in soybean.

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1422-0067/20/22/ 5667/s1. Table S1. DYW-subgroup *PPR* genes in soybean. Detailed genomic information including Phytozome locus, class present, number of introns within ORF, protein, genomic locus (chromosomal location), +/− stand, isoelectric point, and molecular weight (kDa) of the PPR proteins for each *PPR* gene. Table S2. The Ka/Ks ratios for segmentally duplicated the Cluster I PPR proteins. Table S3. Transcript data of 84 DYW subgroup *PPR* genes for heat map. Table S4. The sequences of primers used in the study. Figure S1. The phylogenic tree of 12 selected proteins. Figure S2. qRT-PCR analysis of *GmPPR4* expression in *GmPPR4*-OE and EV-control transgenic hairy roots.

**Author Contributions:** Z.-S.X. coordinated the project, conceived and designed experiments, and edited the manuscript; H.-G.S. performed experiments and wrote the first draft; B.L., X.-Y.S., and J.M. conducted the bioinformatic work and performed experiments; J.C., Y.-B.Z. and M.C. provided analytical tools and managed reagents; D.-H.M., Z.-S.X., and Y.-Z.M. contributed with valuable discussions. All authors have read and approved the final manuscript.

**Funding:** This research was financially supported the National Transgenic Key Project of the Chinese Ministry of Agriculture (2018ZX0800909B and 2016ZX08002-002), which were used to design the study and write the manuscript, the National Natural Science Foundation of China (31871624) was used to publish the results.

**Acknowledgments:** We are grateful to Lijuan Qiu and Shi Sun of the Institute of Crop Science, Chinese Academy of Agricultural Sciences for kindly providing soybean seeds.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Characterizing the Role of** *TaWRKY13* **in Salt Tolerance**

**Shuo Zhou 1,**†**, Wei-Jun Zheng 2,**†**, Bao-Hua Liu 3, Jia-Cheng Zheng 4, Fu-Shuang Dong 1, Zhi-Fang Liu 5, Zhi-Yu Wen 1, Fan Yang 1, Hai-Bo Wang 1, Zhao-Shi Xu 6, He Zhao 1,\* and Yong-Wei Liu 1,\***


Received: 13 October 2019; Accepted: 11 November 2019; Published: 14 November 2019

**Abstract:** The WRKY transcription factor superfamily is known to participate in plant growth and stress response. However, the role of this family in wheat (*Triticum aestivum* L.) is largely unknown. Here, a salt-induced gene *TaWRKY13* was identified in an RNA-Seq data set from salt-treated wheat. The results of RT-qPCR analysis showed that *TaWRKY13* was significantly induced in NaCl-treated wheat and reached an expression level of about 22-fold of the untreated wheat. Then, a further functional identification was performed in both *Arabidopsis thaliana* and *Oryza sativa* L. Subcellular localization analysis indicated that TaWRKY13 is a nuclear-localized protein. Moreover, various stress-related regulatory elements were predicted in the promoter. Expression pattern analysis revealed that *TaWRKY13* can also be induced by polyethylene glycol (PEG), exogenous abscisic acid (ABA), and cold stress. After NaCl treatment, overexpressed *Arabidopsis* lines of *TaWRKY13* have a longer root and a larger root surface area than the control (Columbia-0). Furthermore, *TaWRKY13* overexpression rice lines exhibited salt tolerance compared with the control, as evidenced by increased proline (Pro) and decreased malondialdehyde (MDA) contents under salt treatment. The roots of overexpression lines were also more developed. These results demonstrate that *TaWRKY13* plays a positive role in salt stress.

**Keywords:** stress responsive mechanisms; TaWRKY transcription factors; salt tolerance

#### **1. Introduction**

Unlike animals, plants cannot move when exposed to stress. However, complex signaling network have been established to cope with stress [1]. Under stress, a series of responses are induced to prevent or minimize damage. These are accompanied by many physiological, biochemical and developmental changes [2]. Current research on plant stress response has reached the level of cells and molecules, and combined with genetics, we can explore the stress responsive mechanisms in order to improve plant growth under conditions of stress [3–7].

Many genes are induced by stress; the products of these genes both participate in stress response and regulate the expression of related genes involved in signal transduction pathways in order to avoid or reduce tissue damage [8–10]. Signaling via the hormone, liposome, SnRK2 (sucrose non-fermenting 1-related protein kinase 2) [11], MAPK (mitogen activated protein kinase) [12], ROS signal [13] and stomatal [14] pathways are the main networks by which plants respond to salt and drought stress. Plant adaptation to drought and other stresses depends on both the expression of stress-resistant related genes and the regulation of various signal pathways induced by stress [15]. Products of stress-related genes can be divided into two classes: the first class includes ion channel proteins [16], water channel proteins [17], osmotic regulators (sucrose, proline and betaine), synthases [18] and other products that directly function in stress response, while the products of the second type include proteins involved in stress-related signal transmission and regulators of gene expression, such as protein kinases (PKs) and transcription factors (TFs) [19,20].

Transcription factors play a crucial role in regulating the expression of stress-related genes in plants. When abiotic stress occurs, changes in the activity of transcription factors cause changes in the activity of target genes. Transcription factors involved in plant stress response are widely researched, such as the AP2/EREBP TF family [21], MYC/MYB TF family [22], HSE binding TFs [23], NAC TF family [24], and WRKY TF family [25]. Among them, WRKY TFs are extensively found in higher plants including *Arabidopsis thaliana*, *Oryza sativa*, *Setaria italica*, *Glycine max*, and *Triticum aestivum*, which indicates that WRKY TFs play a significant role in plant stress tolerance [26–30].

Although a large number of studies have shown that WRKY TFs in plants are mainly involved in disease resistance and defense response, some members of the WRKY TFs are involved in abiotic stress response. *TaWRKY1* mediates stomatal movement through an ABA-dependent pathway to improve plant tolerance to drought stress [31]. In addition, *TaWRKY10* acts as a positive regulator under drought, salt, cold, and hydrogen peroxide stress conditions and improves the stress tolerance in transgenic tobacco [32]. In *Arabidopsis*, WRKY proteins are involved in regulating ABA response factors, such as MYB2, DREB1a, DREB2a and Rab18 [33]. The overexpression of *ZmWRKY33* in *Arabidopsis* improved the salt-stress tolerance of transgenic plants [34]. These studies suggested that WRKY TFs play a significant role in plant stress response.

High salt stress is a major obstacle to plant growth and development. High salt conditions lead to increases in reactive oxygen species (ROS), metabolic toxicity, membrane disorganization, the inhibition of photosynthesis, and attenuated nutrient acquisition at different plant growth stages [35]. Recent reports claim that salinity affects about 20% of all irrigated arable land and is an increasing problem in worldwide agriculture (FAO Cereal Supply and Demand Brief. http://www.fao.org/ worldfoodsituation/csdb/en/).

Since wheat is rich in thiamine, fat, calcium, niacin, starch, protein, iron, riboflavin, minerals, and vitamin A and can provide abundant energy and protein for humans, wheat is regarded as one of the most important crops in the world [36]. However, wheat production is constrained by environmental conditions, such as drought, salinity, waterlogging, and extremes in temperature. Next in importance to drought stress, salinity affects crop yields worldwide. The improvement of stress tolerance in wheat by biotechnology and transgenic technology could contribute to increased production worldwide. However, the huge wheat genome has slowed progress [37,38]. Although many studies have investigated the roles of WRKY transcription factors in response to various stress conditions, the mechanisms underlying their function need further study. Here, RNA-Seq, real-time fluorescence quantification PCR (RT-qPCR), and several databases were used in a study of *TaWRKY13*. The results demonstrated that overexpression of *TaWRKY13* can improve salt tolerance in *Arabidopsis* and rice.

#### **2. Results**

#### *2.1. Identification and Genome Structure Analysis of WRKYs in Triticum aestivum*

According to the Plant Transcription Factor Database website (http://planttfdb.cbi.pku.edu.cn/ index.php), wheat has 171 TaWRKYs, which are distributed across all chromosomes (1AL, 1BL, 1DL, 2AL, 2AS, 2BS, 2DL, 2DS, 3AL, 3B, 3DL, 4AL, 4AS, 4DS, 5AL, 5BS, 5BL, 5DL, 6AL, 6AS, 6BS, 6DS, 7AL, 7DL). Here, PF03106 was used as a key word to blast WRKYs in wheat on the Phytozome website (https://phytozome.jgi.doe.gov/pz/portal.html). Nucleic acid and amino acid sequences of 100 TaWRKYs that harbor at least one WRKY domain are shown in Supplementary Table S2. Based on the rule that the CDS of TaWRKYs were more than 300 base pairs [30], some TaWRKYs were removed, and then combined with the NCBI database (https://www.ncbi.nlm.nih.gov/pubmed), meaning that 57 TaWRKYs were identified with the annotation gene's name, ID, transcript name and location (Table 1). The location of 57 TaWRKYs on chromosomes was analyzed by using the online website http://mg2c.iask.in/mg2c\_v2.0/. From the map, we can see that the locations of TaWRKYs were different on each chromosome; for example, *TaWRKY6*, *38*, *50*, *27, 48*, and *57* were located at the end of chromosome 3B (forward or reverse), while *TaWRKY70* and *TaWRKY 71* were located near the centromere of chromosome 1D. Moreover, the distribution of TaWRKYs on 4A was the combination of both distributions described above (Figure 1). To further explore gene structure differences, a gene structure figure of 56 TaWRKYs is displayed in Figure 2. The TaWRKYs are all different in structure. Most TaWRKYs contain 1 to 5 different exons, which may contain different functional structures, such as zinc finger, leucine, kinase structure, exerting different biological functions. TaWRKY7, 22, 23, 24, 33, 56, and 90 do not harbor introns, only containing exons and/or an upstream structure.




**Table 1.** *Cont.*

Annotations were according to Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html), PlantTFDB (http:// planttfdb.cbi.pku.edu.cn/index.php and NCBI (https://www.ncbi.nlm.nih.gov/pubmed).

**Figure 1.** Chromosome location of TaWRKYs listed in Table 1.


**Figure 2.** Gene structure analysis of TaWRKYs. Segments in yellow represent CDS, blue indicates upstream/downstream, and black lines represent introns.

#### *2.2. Identification and Biological Analysis of TaWRKY13*

To find wheat stress-responsive genes under salt stress, the roots of three-leaf wheat seedlings were immersed in 150 mM NaCl solution for 1 h. Control\_Leaf represents the leaf tissue without NaCl treatment, NaCl\_Leaf represents the leaf tissue treated as per the above description; each treatment involved two independent replicates which were then sampled for RNA-seq (Supplementary Table S1). Twelve TaWRKYs (TaWRKY4, 9, 12, 13, 15, 22, 29, 33, 34, 44, 53, and 70) were selected based on the rule log2 (NaCl\_Leaf /Control\_Leaf) > 2. As shown in Figure 3, *TaWRKY13* gave the highest relative expression in response to salt stress, peaking at more than 20-fold at 1 h. *TaWRKY13* (ID: 31962353, Traes\_2AS\_6269D889E.1) was selected for further investigation. *TaWRKY13* contained a 975 bp open reading frame (ORF) encoding 324 amino acids; the molecular weight of the protein was 81.02 kDa with pI 4.99 (https://web.expasy.org/protparam/). The predicted amino acid sequence showed that *TaWRKY13* only harbored one WRKY domain with a highly conserved WRKYGQK motif and a CX4-5CX22-23HXH zinc-finger motif.

**Figure 3.** Real-time fluorescence quantification PCR of 12 TaWRKYs under salt treatment. The expression level of TaActin was used as a loading control. The data represent the means ± SD of three biological replications. The ANOVA demonstrated significant differences (*\* p* < 0.05, *\*\* p* < 0.01).

#### *2.3. Phylogenetic Analysis of AtWRKYs, OsWRKYs and TaWRKYs*

Phylogenetic analysis is a useful method that can provide some clues to the possible functions of predicted or analyzed target genes. It would be useful to know the homologs of *Triticum aestivum* WRKYs (TaWRKYs), especially *TaWRKY13*, with WRKYs of *Arabidopsis thaliana* (AtWRKYs) and WRKYs of *Oryza sativa* (OsWRKYs) with reference to previous results. A phylogenic tree was constructed by the neighbor-joining method [39] to investigate the evolutionary relationships between AtWRKYs, OsWRKYs and TaWRKYs. There are 398 WRKYs for phylogenetic analysis (90 AtWRKYs, 128 OsWRKYs and 171 TaWRKYs) (Figure 4). According to Figure 4, AtWRKYs, OsWRKYs and TaWRKYs were scattered across different branches of the phylogenic tree, and all WRKYs were divided into three broad categories; among them, there were more WRKYs in groups I and II than in group III. *TaWRKY13* (ID: Traes\_2AS\_6269D889E.1) and *AtWRKY13* (ID: AT4G39410) were in group II, and *OsWRKY13* (ID: LOC-Os01g546600) belonged to group I. The results of phylogenetic analysis preliminarily indicated that *TaWRKY13* has a closer homology with *AtWRKY13* than *OsWRKY13*.

**Figure 4.** Phylogenetic analysis of AtWRKYs, OsWRKYs and TaWRKYs. The phylogenetic tree was produced using the aligned file with 1000 bootstrap replications in MEGA 6.0. *TaWRKY13, AtWRKY13* and *OsWRKY13* are highlighted in red, blue and yellow, respectively. The numbers at nodes are bootstrap values, and the length of branches represent evolutionary distance. Number of bootstrap replications: 1000.

#### *2.4. TaWRKY13 was Localized in the Nucleus*

To investigate the biological activity of *TaWRKY13*, the coding sequence fused to the N-terminus of the green fluorescent protein (GFP) was inserted into wheat mesophyll protoplasts by the PEG-mediated method. As the control, 35S::GFP was transformed [40]. The fluorescence of the control GFP was distributed throughout the cells, whereas the fluorescence of 35S::TaWRKY13-GFP was specifically localized in the nucleus (Figure 5). Thus, TaWRKY13 is a nuclear-located protein.

**Figure 5.** Subcellular localization of TaWRKY13. 35S::GFP and 35S::TaWRKY13-GFP constructs were transformed into wheat mesophyll protoplasts under the control of the Cauliflower Mosaic Virus 35S (CaMV35S) promoter. Wherein, green color represents fluorescence emitted by green fluorescent protein under confocal laser scanning microscope and the red color represents the fluorescence emitted by chloroplasts under confocal laser scanning microscope. Results were observed by a confocal laser scanning microscope (LSM700; CarlZeiss, Oberkochen Germany) after incubation in darkness at 22 ◦C for 18–20 h. Scale bars, 10 μm.

#### *2.5. Tissue-Specific Expression of TaWRKY13*

Studies of genes with a specific expression in different tissues are necessary to understand the regulatory mechanisms of plant growth and development and the relationship between cell type and function. Here, the promoter sequence of *TaWRKY13* was fused to the pCAMBIA1305 vector, which contains a β-glucuronidase (GUS) reporter gene in the N-terminus (Figure 6). The GUS reporter gene can preliminarily determine the tissue specificity of the gene by observing the tissue location with a blue color after staining [41]. qRT-PCR was used to further verify the relative expression level at the molecular level. *TaWRKY13* was expressed in the roots, stems and leaves of T3 generation transgenic *Arabidopsis* plants under normal and salt-stress conditions, with the relative expression in roots being higher than in leaves and stems. After NaCl treatment, the expression levels in roots, stems and leaves were significantly increased, indicating that *TaWRKY13* might be responsive to salt stress.

**Figure 6.** Tissue-specific expression analysis of *TaWRKY13*. (**A**) Identification of homozygous lines by agarose gel electrophoresis. (**B**) Three transgenic lines selected by RT-qPCR. (**C**) β-glucuronidase (GUS) staining of transgenic *Arabidopsis* under normal conditions. (**D**) GUS staining of transgenic *Arabidopsis* after NaCl treatment. (**E**) qRT-PCR for tissue-specific expression analysis of *TaWRKY13* under normal conditions. (**F**) qRT-PCR for tissue-specific expression analysis of *TaWRKY13* after NaCl treatment. All data are means ± SDs of three independent biological replicates. The ANOVA demonstrated significant differences (\* *p* < 0.05, *\*\* p* < 0.01).

#### *2.6. TaWRKY13 Is Involved in Various Stress Responses*

WRKY proteins are reported to be involved in various biotic and abiotic stresses [25]. Expression pattern analyses were conducted to determine whether *TaWRKY13* was responsive to abiotic stresses. The results indicated that *TaWRKY13* participated in salt PEG, ABA and cold-stress responses (Figure 7). For PEG treatment, the relative expression level of *TaWRKY13* was rapidly induced at 1 h after the imposition of PEG stress (Figure 7A). After NaCl treatment for 1 h, *TaWRKY13* was highly induced at a maximum level of about 22-fold (Figure 7B). Exogenous ABA and cold stress also significantly affected the expression of *TaWRKY13* (Figure 7C,D). The rapid increase in relative expression levels of *TaWRKY13* following different stress treatments indicated an important role at the initial stages of stress response.

**Figure 7.** Expression patterns of *TaWRKY13* under (**A**) PEG, (**B**) NaCl, (**C**) exogenous abscisic acid (ABA), and (**D**) cold treatments. The ordinates are relative expression levels (fold) of *TaWRKY13* compared to the non-stressed control. The horizontal ordinate is the treatment time, at 0, 1, 6 and 24 h. The expression level of TaActin as a loading control. All experiments were repeated three times. Error bars represent standard deviations (SDs). All data are means ± SDs of three independent biological replicates. The ANOVA demonstrated significant differences (*\*\* p* < 0.01).

#### *2.7. Stress-Related Regulatory Elements in the Promoter of TaWRKY13*

The 1.856 kb promoter region upstream of the *TaWRKY13* ATG start codon was isolated to gain an insight into the regulatory mechanism. We searched for putative cis-acting elements in the promoter regions using the database PLACE (http://www.dna.affrc.go.jp/PLACE/). The results are shown in Table 2. Numerous stress-related regulatory elements were present, including a W-BOX, MYB element and TATA-BOX, which take part in the response to both drought and high-salt stress, as well as low-temperature responsive (LTR), ABA-responsive element (ABRE) and GT1, which mainly participate in salt-stress response. Moreover, there were various light, gibberellin, SA (salicylic acid) and high-temperature responsive elements, indicating that *TaWRKY13* is involved in abiotic stress response and plant hormone-related signal transduction.



"Number" corresponds to the number of each type of cis-element in the promoter.

#### *2.8. Root System Analysis Indicates That Overexpression Lines Respond to Salt Stress in Arabidopsis*

To explore the mechanism of *TaWRKY13* under salt stress, a pCAMBIA1302-*TaWRKY13 (35S::TaWRKY13)* vector was constructed and transformed into *Arabidopsis* for root length assay [40]. The results of the identification of homozygotes by agarose gel electrophoresis (AGE) and the selection of three transgenic lines (*35S::TaWRKY13#1*, *#2*, *#3*) by RT-qPCR are available in Supplementary Figure S1. Seedlings of control (Columbia-0) and three T3 generation overexpression lines were first grown on MS (Murashige & Skoog) medium for one week and then transplanted to MS medium supplemented with various NaCl concentrations (0, 100, 120 mM) for salt treatment. As shown in Figure 8, the overexpression lines have an advantage in terms of the main root length and total surface area compared to Col-0 under NaCl treatment.

**Figure 8.** Root length phenotypes of *Arabidopsis* overexpression lines after NaCl treatment. (**A**) Image of the root length phenotype of transgenic lines grown in 0, 100 and 120 mM NaCl. (**B**) Analysis of the main root lengths of transgenic lines under NaCl treatment. (**C**) Analysis of total surface areas of transgenic lines under NaCl treatment. The main root length and total surface area of *Arabidopsis* roots were measured by the WinRHIZO system. All data are means ± SDs of three independent biological replicates. The ANOVA demonstrated significant differences (\* *p* < 0.05).

#### *2.9. TaWRKY13 Overexpression Response to Salt Stress in Oryza sativa*

Two-week-old T3 rice lines seedlings of the control (Nipponbare) and three overexpression lines (*35S::TaWRKY13#1*, *#2*, *#3*) were grown hydroponically in untreated control solution or in the same solution supplemented with 150 mM NaCl to explore the physiological tolerance of *TaWRKY13* overexpression rice lines to salt stress [42]. The verification of homozygotes and the selection of three transgenic lines were conducted by AGE and RT-qPCR, respectively (Figure 9A,B). As shown in Figure 9C, before NaCl treatment, both Nipponbare and the three transgenic lines showed similar growth patterns, with no or little difference in plant height, root length, and proline (Pro) and malondialdehyde (MDA) contents. After 7 days of NaCl treatment, both Nipponbare and the overexpression lines showed leaf shedding (Figure 9D). Compared with the transgenic lines, Nipponbare plants showed evidence of wilting, water loss and yellowing, whereas the transgenics lines showed less severe symptoms. Meanwhile, the overexpression of *TaWRKY13* increased the proline content and decreased MDA content under NaCl treatment (Figure 9E,F). The root length of Nipponbare was significantly lower than for transgenic plants; the surface areas of transgenic plants were higher than for Nipponbare

(Figure 9G,H). These results indicated that the overexpression of *TaWRKY13* enhanced salt tolerance in rice.

**Figure 9.** Phenotype identification of *TaWRKY13* transgenic rice under NaCl treatment. (**A**) Confirmation of homozygotes by agarose gel electrphoresis. (**B**) Selection of three transgenic lines by RT-qPCR. (**C**) Rice seedlings and root system diagram of Nipponbare and *35S::TaWRKY13* before treatment. (**D**) Rice seedlings and root system diagram of Nipponbare and *35S::TaWRKY13* after 150 mM NaCl treatment for 7 days. (**E**) Proline contents in Nipponbare and *35S::TaWRKY13* seedlings under normal conditions and NaCl treatment. (**F**) Malondialdehyde (MDA) contents in in Nipponbare and *35S::TaWRKY13* rice seedlings under normal growth conditions and NaCl treatment. (**G**) Root length measurements of Nipponbare and *35S::TaWRKY13* transformants with and without NaCl treatment. (**H**) Total surface areas of Nipponbare and *35S::TaWRKY13* with and without NaCl treatment. Main root lengths and total surface areas were measured by the WinRHIZO system (Hang xin, Guangzhou, China). All data are means ± SDs of three independent biological replicates. The ANOVA demonstrated significant differences (\*\* *p* < 0.01).

#### **3. Discussion**

Regarded as one group among many important transcription factors in plants, WRKY TFs are represented by 90 members in *Arabidopsis* and more than 100 in rice [43]. The functions of WRKY TFs have been studied in detail in various plant species since their first discovery.

Since the application of transcriptome sequencing technology, researchers have sequenced the genome of wheat [44,45]. However, owing to the large and complex genome of heterohexaploid wheat, the task has posed many challenges [46]. Recently, transgenic *Arabidopsis* plants of *TaWRKY2* and *TaWRKY19* have shown improved stress tolerance, and the overexpression of *TaWRKY2* and *TaWRKY19*

has exhibited salt, osmotic/dehydration and freezing stress tolerance [47]. More than 160 TaWRKYs were characterized according to their sequence alignment, motif type and phylogenetic relationship analysis by Sezer et al. [48]. Although the WRKY genes associated with stress can be identified by transcriptome sequencing and family analysis, functional identification and mechanism analysis in wheat is limited. Salt stress is one of the most serious stresses that cannot be reversed after damage [49].

Here, on the basis of the previous research, combining RNA-Seq, real-time quantitative PCR (RT-qPCR), and the latest wheat database, *TaWRKY13* was isolated from the wheat genome for further study. RNA-Seq was conducted first (Supplementary Table S1); meanwhile, using the wheat database, 57 TaWRKY genes were annotated (Table 1). The results showed that TaWRKYs were differently distributed (number and location) on wheat chromosomes (Figure 1). Studies of the genome structure and the phylogenetic analysis of TaWRKY genes were initially difficult, because the wheat genome was too complex for statistical analysis; there were 171 TaWRKY genes according to the database (https://phytozome.jgi.doe.gov/pz/portal.html). Based on the rule that the CDS of TaWRKYs were more than 300 base pairs, we removed redundant TaWRKY genes and, combined with the NCBI database (https://www.ncbi.nlm.nih.gov/pubmed), 56 TaWRKY genes were selected for the analysis of the gene structure (Figure 2). Major TaWRKY genes harbored different CDS and binding motifs responsible for special function; for example, *TaWRKY1* contained an N-terminal CUT domain and a C-terminal NL domain [30]. To further explore TaWRKY genes that respond to salt stress, 12 TaWRKY genes were chosen for verification by qRT-PCR (Figure 3). All 12 genes were up-regulated under salt stress, and *TaWRKY13* was chosen for further study due to its higher expression level under salt treatment. Phylogenetic analysis demonstrated that TaWRKY genes have different evolutionary relationships and homologies to WRKYs in *Arabidopsis* and rice (Figure 4); compared to *OsWRKY13, AtWRKY13* was closer to *TaWRKY13,* possibly indicating similar biological functions [50]. For *OsWRKY13,* the non-conservation of evolution may provide a basis for the subsequent functional identification of *TaWRKY13* in rice, in that the influence of rice itself in *OsWRKY13* was eliminated. Subcellular localization showed that TaWRKY13 is a nuclear protein (Figure 5) which may mainly be involved in nuclear signal transduction [51,52]. Although many cotton (*Gossypium hirsutum*) *WRKY* genes were expressed at low levels during development, a few *GhWRKYs* expressed highly in specific tissues such as roots, stems, leaves and embryos [53]. Our results showed that *TaWRKY13* was expressed in roots, stems and leaves in transgenic lines, the relative expression level of roots was higher than stems and leaves in transgenic lines, and under salt-stress conditions, the relative expression level was double that of the normal condition (Figure 6).

An increasing number of studies have shown that WRKY TFs play important roles in abiotic stress response; for instance, the overexpression of *GmWRKY21* improved cold tolerance in *Arabidopsis*, because of the regulation of DREB2A and STZ/Zat10. *GmWRKY54* conferred salt and drought tolerance; *GmWRKY13*, which was insensitive to ABA (abscisic acid) but markedly sensitive to salt and mannitol, may function in both lateral root development and the abiotic stress response [54]. Expression pattern analyses revealed that *TaWRKY13* was induced significantly by PEG, salt, low-temperature and ABA (Figure 7). Compared with PEG, low-temperature, and ABA stress, *TaWRKY13* achieved the highest relative expression level under salt treatment, which was in accordance with the following root length assay in *Arabidopsis* and the rice resistance assay. Products of WRKY TFs bind to specific cis-regulatory sequences such as the W-BOX in the promoter to induce the expression of downstream target genes [55]. Many regulatory cis-elements that are responsive to drought (W-BOX, MYB and TATA-BOX), high salt (LTR, ABRE and GT1), SA (salicylic acid, WRKY) and cold were recognized in the *TaWRKY13* promoter, showing that *TaWRKY13* is capable of responding to stress (Table 2). WRKY13 participated in various physiological processes; for example, a weaker stem phenotype, reduced sclerenchyma development, and altered lignin synthesis were observed in an *AtWRKY13* mutant, showing that it functioned in stem development [56]. When *AtWRKY13* was disturbed under short-day conditions, *AtWRKY13* promoted flowering [57]. Furthermore, WRKY13 was also involved in the cross talk between abiotic and biotic stress signaling pathways, and *OsWRKY13* displayed selective binding to different cis-elements to

regulate various stress [58]. In this study, a root length assay of overexpression lines was conducted in *Arabidopsis* for an analysis of the stress tolerance of *TaWRKY13*; overexpression lines had longer root lengths and a higher total root area than Col-0 (Figure 8A–C). Additionally, the overexpression of *TaWRKY13* enhanced salt tolerance in transgenic rice (Figure 9). Under NaCl treatment, the transgenic lines of *TaWRKY13* grew vigorously, whereas Nipponbare seedlings were more wilted and yellow (Figure 9D); the transgenic lines also had higher proline (Pro) and reduced malondialdehyde (MDA) contents (Figures 8F and 9E ) under NaCl treatment. In addition, the roots of transgenic lines were longer and more developed than Nipponbare (Figure 9G,H). These results all showed that *TaWRKY13* was responsive to salt stress, in agreement with data from other species [54,56,58]. In accordance with the present study, the results suggested that *TaWRKY13* has a potential role in improving salt tolerance in wheat. These results are only preliminary in exploring the putative role of *TaWRKY13* in salt tolerance; more researches about the role and regulation mechanism of *TaWRKY13* are still needed in wheat. For instance, based on the above findings, *TaWRKY13* was transformed into wheat for functional verification and mechanism analysis to further improve the role of *TaWRKY13* in wheat stress tolerance pathways.

#### **4. Materials and Methods**

#### *4.1. De Novo Transcriptome Sequencing of Salt-Treated Wheat*

Wheat (*Triticum aestivum* L. cultivar Jinhe 9123, from the Institute of Genetics and Physiology, Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang, China) was cultivated in a 10 cm × 10 cm pot (vermiculite:soil, 1:3) supplemented with Hoagland's liquid medium at 22 ◦C under a 16 h light/8 h darkness photoperiod for 10 days. When the wheat seedlings were at the three-leaf stage, the pots were immersed in 150 mM NaCl solution and water (control) for 1 h, respectively [30], prior to the sampling of 0.1 g fresh leaf tissue. Samples were submerged immediately in liquid nitrogen and stored at −80 ◦C for RNA-Seq. The experiment was performed in three independent replications. In Supplementary Table S1, Control\_Leaf means a sample without NaCl treatment, and NaCl\_Leaf means a sample with salt treatment; each treatment involved three independent replicates, which were then sampled for RNA-Seq. Data are shown in Supplementary Table S1.

#### *4.2. Identification and Annotation of TaWRKY Response to Salt Stress*

According to the RNA-Seq data, the rule was adopted that the expression level was up-regulated and log2(NaCl\_treat/Control\_treat) > 2 to select TaWRKYs which responded to salt stress. Several databases—NCBI (https://www.ncbi.nlm.nih.gov/pubmed) PlantTFDB (http://planttfdb.cbi.pku.edu. cn/) and Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html)---were used to annotate the gene name, ID, transcript name and localization.

#### *4.3. Structure Analysis and Phylogenetic Analysis of TaWRKYs*

According to the information listed in Table 1, the chromosome location of TaWRKYs was analyzed by using the online website http://mg2c.iask.in/mg2c\_v2.0/. For gene structure analysis, the data of TaWRKYs that were identified in Section 4.2 were uploaded to GSDS (http://gsds.cbi.pku.edu.cn/*)* to obtain the map of the TaWRKYs' structure. For phylogenetic analysis, a tree of WRKYs from wheat, rice and *Arabidopsis* was constructed using the neighbor-joining method in MEGA 6.0 with 1000 bootstrap replications [39]. Data for gene structure and the phylogenetic tree analysis were downloaded from PlantTFDB (http://planttfdb.cbi.pku.edu.cn/) and are shown in Supplementary Tables S2 and S3.

#### *4.4. RNA Extraction of Stress Treatments and RT-qPCR Analyses*

Wheat seeds were sown as previously described; vermiculite and soil were removed by water after being grown for 10 days, and the fresh leaf tissue of three-leaf-stage wheat seedlings were used for the RNA extraction of different stress treatments. For the identification of TaWRKY responses to salt stress, the seedlings roots were immersed in 150 mM NaCl solution, and 0.1 g of fresh leaf tissue was sampled at different times (0, 0.5, 1, 2, 4, 8, 12 and 24 h). For the expression pattern analyses, the roots of wheat seedlings were immersed in 10% PEG6000, 150 mM NaCl and 100 <sup>μ</sup>mol·L−<sup>1</sup> ABA solutions. Wheat seedlings for cold treatment were placed in a 10 h light/14 h darkness, 4/2 ◦C chamber and sampled at different periods (0, 1, 6 and 24 h) [30,59,60]. For specific tissue expression assays, T3 generation transgenic *Arabidopsis* (*35S::pTaWRKY13*) plants were surface-sterilized with 10% Chloros and washed three times with sterile water. Sterilized seeds were sown on MS (Murashige & Skoog) medium, vernalized in darkness for 3–4 days at 4 ◦C, then grown in a chamber at 22 ◦C and 75% humidity under a 16 h light/8 h darkness photoperiod for one week. The seedings were transplanted to soil (vermiculite:soil, 1:3), 0.1 g fresh roots, stems and leaves tissue of 10-day-old transgenic *Arabidopsis* seedlings with or without 150 mM NaCl treatment were sampled for RNA the extraction of different tissues [56].

All samples after collection were submerged immediately in liquid nitrogen and stored at −80 ◦C for RNA extraction using an RNA prep plant kit (TIANGEN, Beijing, China); cDNA was synthesized using a Prime Script First-Strand cDNA Synthesis Kit (TransGen, Beijing, China) following the manufacturer's instructions. RT-qPCR was performed with Super Real PreMix Plus (TransGen, Beijing, China) on an ABI Prism 7500 system (Applied Biosystems, Foster city, CA, USA). Specific primers for TaActin, AtActin and TaWRKY4, 9, 12, 13, 15, 22, 19, 33, 34, 44, 53 and 70 for RT-qPCR are listed in Supplementary Table S4. Three biological replicates were used for RT-qPCR analysis, and the 2−ΔΔCt method was used for quantification.

#### *4.5. Gene Isolation and Subcellular Localization*

The ORF (open reading frame) of *TaWRKY13* was amplified by PCR with specific primers from wheat cDNA (cultivar Jinhe 9123). The PCR product was fused into pZeroBack vector (TIANGEN, Beijing, China) and sequenced for further study. The correct sequencing plasmids were treated as templates, the segment with restriction sites was amplified by specific primers, and the PCR product was inserted into the N-terminus of the green fluorescent protein (GFP) containing the CaMV35S promoter for subcellular localization; the 35S::GFP vector was used as the control. Both 35S::GFP and 35S::TaWRKY13-GFP were transferred into wheat mesophyll protoplasts by the PEG-mediated method [29]. A confocal laser scanning microscope (LSM700; CarlZeiss, Oberkochen, Germany) was used to observe the fluorescence after incubation in darkness at 22 ◦C for 18–20 h. All primers are listed in Supplementary Table S4.

#### *4.6. Tissue-Specific Expression of TaWRKY13 and GUS Staining*

Tissue-specific expression analysis of *TaWRKY13* was conducted by two methods. In the first one, the CDS of *TaWRKY13* was amplified as described in Section 4.5, then cloned into the pCAMBIA1302 vector; then, the infected inflorescence of *Arabidopsis* was determined by the Agrobacterium-mediated method [61], grown as described in Section 4.4, until T3 generation transgenic *Arabidopsis* seeds were obtained. The identification of homozygotes and selection of three transgenic lines were conducted by agarose gel electrophoresis and RT-qPCR, respectively [59]. The transgenic *Arabidopsis* seedlings with or without NaCl (150 mM) treatment were used for RT-qPCR as described in Section 4.4. In the second method, promoter fragments of *TaWRKY13* (*pTaWRKY13*) were obtained from Ensemble Plants (plants.ensembl.org/index.html); the *pTaWRKY13* was amplified by PCR with specific primers from wheat cDNA (Jinhe 9123), and the PCR product was fused into pLB vector (TIANGEN, Beijing, China) and sequenced. The fragment of *TaWRKY13* promoter was cloned to the pCAMBIA1305 vector harboring a β-glucuronidase (GUS) tag, obtaining the T3 generation transgenic *Arabidopsis* seeds as per the previous description. T3 generation transgenic *Arabidopsis* (*35S::pTaWRKY13*) seeds were surface-sterilized, sown on MS medium, vernalized, and grown in a chamber at 22 ◦C and 75% humidity under a 16 h light/8 h darkness photoperiod for one week as described in Section 4.4. Ten-day-old

transgenic *Arabidopsis* seedlings were submerged to 150 mM NaCl solution for 1 h. After salt treatment, the liquid was drained with filter paper, and the plant material was subjected to GUS staining solution supplemented with 5-bromo-4-chloro-3-indolylb-d-glucuronic acid (X-gluc) for 3 h; 70% (*vol*/*vol*) ethanol was used to remove the chlorophyll following the manufacturer's protocol (Real-Times, Beijing, China) [56]. GUS staining was observed by a Leica microscope (Wetzlar, Germany). Primers are listed in Supplementary Table S4.

#### *4.7. Cis-Acting Elements in the TaWRKY13 Promoter*

A 1.856 kb promoter fragment upstream of the ATG start codon of *TaWRKY13* was obtained from the Ensemble Plants website (http://plants.ensembl.org/index.html). Cis-acting elements that respond to various stresses in the promoter region were analyzed by PLACE (http://www.dna.affrc.go. jp/PLACE/) [29].

#### *4.8. Root Growth Assays of TaWRKY13 under Salt Stress in Arabidopsis*

T3 generation transgenic *Arabidopsis* lines were obtained as previously described (Section 4.6). Seeds of Col-0 and transgenic lines (*35S::TaWRKY13#1, #2, #3*) were surface-sterilized, sown on MS medium, vernalized, grown in a chamber at 22 ◦C and 75% humidity under a 16 h light/8 h darkness photoperiod for one week as described above (Section 4.4). Three ten-day-old *Arabidopsis* seedlings (Col-0 and transgenic lines) were transferred to MS medium containing different concentrations of NaCl (0, 100, 120 mM) for one week [40].

#### *4.9. Generation of Transgenic Rice and Stress Identification of TaWRKY13 to Salt Tolerance*

Plant expression vector *pCAMBIA1305-TaWRKY13* was constructed and transformed to competent cells of EHA105 as previously described [30]. Genetic transformation was conducted by Dr Chuan-Yin Wu and colleagues at the Institute of Crop Science, Chinese Academy of Agricultural Sciences using the agrobacterium-mediated method, and Nipponbare was used as the control [62]. The selection of three transgenic lines was made by agarose gel electrophoresis and RT-qPCR, respectively, as previously described (Figure S1). T3 generation transgenics (*35S::TaWRKY13#1*, *#2*, *#3*) and Nipponbare were used for further study. Rice seeds were treated with 0.7% hydrogen peroxide for one day for surface sterilization, breaking dormancy and promoting germination, then replaced with 0.7% hydrogen peroxide with water and germinated at 37 ◦C for 3 days (changing the water once a day). When seeds showed white buds, bare seeds were transplanted to 96-well plates (24 seeds of Nipponbare and *35S::TaWRKY13#1,#2,#3*, respectively) and placed in a growth chamber at 28 ◦C and a 16 h light/8 h darkness photoperiod and 70% relative humidity for the hydroponic culture. Seedings were cultured in water for one week, then cultured in water supplemented with Hoagland's hydroponic culture solution. The culture solution was replaced every 5 days, and the pH was set at 5.5 [63]. Three-leaf seedlings were treated. For salt treatment, the 96-well plates growing three-leaf stage seedlings were transferred to YS hydroponic culture solution and a YS hydroponic culture solution supplemented with 150 mM NaCl for several days until phenotypes appeared [62]. For each salt treatment, there were three independent replicates. Primers are listed in Supplementary Table S4.

#### *4.10. Measurements of Proline (Pro) and Malondialdehyde (MDA) Contents*

To better understand the function of *TaWRKY13* under salt treatment, proline and MDA contents were measured with Pro and MDA assay kits (Comin, Beijing, China) based on the manufacturer's protocols. Main root lengths and total surface areas of *Arabidopsis* and rice roots were measured by the WinRHIZO system (Hang xin, Guangzhou, China). Measurements were made on all three biological replicates; means ± SD and statistically significant differences were based on the ANOVA (\* *p* < 0.05, \*\* *p* < 0.01).

#### **5. Conclusions**

We identified the salt-induced WRKY gene *TaWRKY13* (ID: 31962353) from a wheat RNA-Seq database (https://phytozome.jgi.doe.gov/pz/portal.html) and real-time quantitative PCR (RT-qPCR). TaWRKY13 is a nuclear protein that was expressed in the roots, stems and leaves of transgenic *Arabidopsis. TaWRKY13* was responsive to PEG, salt, cold, and exogenous abscisic acid (ABA) treatment. The overexpression of *TaWRKY13* was responsive to salt stress in both *Arabidopsis* and rice, as evidenced by the promotion of root length and the total root surface area. These results provide a basis for further understanding the functions of *TaWRKY13* in wheat when subjected to salt stress.

**Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1422-0067/20/22/5712/s1. Supplementary Table S1: RNA-Seq data of salt treated wheat. Supplementary Table S2: CDS and amino acid sequences of 100 TaWRKYs used for genome structure and phylogenetic analysis. Supplementary Table S3: Amino acid sequences of 90 AtWRKYs, 128 OsWRKYs and 171 TaWRKY used for phylogenetic analysis. Supplementary Table S4: Primers used in this study. Supplementary Figure S1: Overexpression rice lines. A: Identification of homozygotes by agarose gel electrophoresis B: Selection of three transgenic lines by RT-qPCR.

**Author Contributions:** Y.-W.L. coordinated the project, wrote and reviewed the manuscript; H.Z. conceived and designed the experiments and edited the manuscript; S.Z. performed the experiments; W.-J.Z. performed validation and formal analysis; B.-H.L., J.-C.Z. and F.-S.D. conducted the bioinformatics and performed related experiments; Z.-S.X. provided analytical tools and analyzed the data; Supervision, Z.-Y.W.; Project administration, Z.-F.L. and F.Y.; Funding acquisition H.-B.W. All authors have read and approved the final manuscript.

**Funding:** This research was financially supported by the National Natural Science Foundation of China (31600216 and 31871611), the HAAFS Agriculture Science and Technology Innovation Project (2019-4-8-1), and the Natural Science Foundation of Hebei Province (C2017301066).

**Acknowledgments:** We thank Li-Na Ning for critically reading the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Genome-Wide Identification and Characterization of** *FBA* **Gene Family in Polyploid Crop** *Brassica napus*

**Wei Zhao 1,**†**, Hongfang Liu 1,**†**, Liang Zhang 1, Zhiyong Hu 1, Jun Liu 1, Wei Hua 1, Shouming Xu 2,\* and Jing Liu 1,\***


Received: 1 November 2019; Accepted: 14 November 2019; Published: 15 November 2019

**Abstract:** Fructose-1,6-bisphosphate aldolase (FBA) is a versatile metabolic enzyme involved in multiple important processes of glycolysis, gluconeogenesis, and Calvin cycle. Despite its significance in plant biology, the identity of this gene family in oil crops is lacking. Here, we performed genome-wide identification and characterization of *FBAs* in an allotetraploid species, oilseed rape *Brassica napus*. Twenty-two *BnaFBA* genes were identified and divided into two groups based on integrative analyses of functional domains, phylogenetic relationships, and gene structures. Twelve and ten *B. napus* FBAs (BnaFBAs) were predicted to be localized in the chloroplast and cytoplasm, respectively. Notably, synteny analysis revealed that *Brassica*-specific triplication contributed to the expansion of the *BnaFBA* gene family during the evolution of *B. napus*. Various *cis*-acting regulatory elements pertinent to abiotic and biotic stresses, as well as phytohormone responses, were detected. Intriguingly, each of the *BnaFBA* genes exhibited distinct sequence polymorphisms. Among them, six contained signatures of selection, likely having experienced breeding selection during adaptation and domestication. Importantly, *BnaFBAs* showed diverse expression patterns at different developmental stages and were preferentially highly expressed in photosynthetic tissues. Our data thus provided the foundation for further elucidating the functional roles of individual *BnaFBA* and also potential targets for engineering to improve photosynthetic productivity in *B. napus*.

**Keywords:** *Brassica napus*; aldolase; Calvin cycle; phytohormones; environmental stresses

#### **1. Introduction**

Fructose-1,6-bisphosphate aldolase (FBA, EC4.1.2.13 or aldolase) is an essential metabolism enzyme in the glycolytic pathway [1]. FBA catalyzes the reversible aldol cleavage of fructose-1,6-bisphosphate (FBP) into dihydroxyacetone phosphate (DHAP) and glyceraldehyde-3-phosphate (G3P), two important intermediates for oil biosynthesis [2]. DHAP could be further converted to diacylglycerol (DAG) by multiple enzymatic reactions, and DAG is a key substrate of diacylglycerol acyltransferase (DGAT) for the synthesis of triacylglycerols (TAGs). Meanwhile, G3P could be converted to malonyl CoA (coenzyme A) that is then used to produce fatty acids. Thus, FBA is not only one of the key regulatory enzymes in the glycolysis pathway but also may control the flux of carbohydrates and, therefore, play an important role in the oil yield of oilseeds [3,4].

FBAs could be broadly classified into two classes, namely class-I and class-II, based on their catalytic mechanisms and prevalence among species in evolution [5,6]. Class-I FBAs are usually tetrameric enzymes, forming a Schiff base with the substrate as intermediate, and utilize a lysine residue to generate a nucleophilic enamine from DHAP and are not inhibited by EDTA or affected by potassium ions. Class-II FBAs are found as homodimers and require divalent cations as cofactors to stabilize the DHAP enolate intermediate involved in the aldol condensation reaction and are inhibited by EDTA [7]. FBAs of class-I are found in some bacteria, animals, and plants, while class-II FBAs occur in most bacteria, yeast, and fungi [8–10]. However, FBAs of class-II are also found in wheat and green algae, such as *Euglena gracilis*, *Chlamydomonas mundane*, and *Chlamydomonas rheinhardii* [11–13].

In higher plants, one set of FBA isoenzymes is localized in the cytosolic (cFBA), and another one in the chloroplast/plastid (cpFBA) [14,15]. Both cFBA and cpFBA are encoded by separate nuclear genes that probably evolved from duplication of a common ancestral gene [16]. To date, different members of the *FBA* family genes have been reported in a variety of monocots and eudicots species. For example, in *Arabidopsis thaliana* (*A. thaliana*), eight *FBA* family genes (*AtFBA1–8*) were identified, including three chloroplast/plastid members (*AtFBA1–3*) and five cytosolic members (*AtFBA4–8*) [17]. In tomato, eight *FBA* genes, including five cFBAs and three cpFBAs, were characterized based on the phylogenetic tree, gene structures, and conserved motifs [18,19]. In addition, two homologous genes of cpFBAs were isolated from green leaves of *Nicotiana paniculata* [20]. One *SpFBA* gene was cloned from *Sesuvium portulacastrum* roots [21]. In wheat, 21 genes encoding *Thermus aquaticus* FBA (TaFBA) isoenzymes were identified and categorized into three subgroups of class-I *cpFBAs*, class-I *cFBAs*, and class-II *cpFBAs* [11]. In rice, two *OsFBAs* were reported to localize in the chloroplasts, while the other five *OsFBAs* in the cytoplasm [22]. One chloroplastic *FBA* and one cytosolic *FBA* were detected in the leaves of maize seedlings [23]. Four developmentally up-regulated *FBAs* (*CoFBA1–4*) genes were identified in tea oil tree (*Camellia oleifera*) seeds [24]. However, there is no report on the identification of *FBA* family genes so far in oilseed rape *Brassica napus*.

Recent studies suggest that the *FBA* genes are playing important roles in diverse significant physiological and biochemical processes in plants. The cpFBA is an essential enzyme in the Calvin cycle, in which its activity generates metabolites for starch biosynthesis [25]. For instance, the loss of the *AtFBA6* function resulted in a lower germination rate after abscisic acid (ABA) treatment in *A. thaliana* [17]. The expression levels of *CoFBA1* and *CoFBA3* genes were highly correlated with the amount of tea oil in the seeds of tea oil tree [24]. Reduction of plastid FBA activity inhibits photosynthesis and alters carbon partitioning in potato, whereas increased FBA activity in plastids could promote CO2 fixation and enhance the growth and photosynthesis in tobacco [26,27]. Inhibition of chloroplastic FBA affects the development of fruit size in tomato [28]. Gibberellin (GA) treatment could increase the levels of cytoplasmic *FBA* in all regions of the roots, resulting in the stimulation of the root growth mediating the energy production in rice [29]. Thus, *FBA* family genes appear to be also associated with the response to phytohormones. Besides, *FBAs* also participate in response to various environmental stimuli in plants, such as salinity, drought, anoxygenic stress, abnormal temperature, light acclimation, and *Rhizoctonia solani Kuhn* infection [21,30–32].

*Brassica napus* L. (*B. napus*), a relatively recent allotetraploid formed from hybridization between *Brassica rapa* (*B. rapa*) and *Brassica oleracea* (*B. oleracea*), is the second-largest source of vegetable oil crop and cultivated around the world [33,34]. *B. napus* has adapted to diverse climate zones and latitudes by forming three main ecotype groups, namely winter, semi-winter, and spring types [35,36]. Although *FBA* family genes have been studied in several plant species, little is known regarding this gene family in oil crops. Here, we systematically identified *FBA* family genes in *B. napus* and profiled their gene structures, chromosome locations, conserved motifs, *cis*-acting elements in the promoter regions, phylogenetic classifications, and sequence polymorphisms. We further analyzed their expression patterns in different developmental tissues and in response to stresses. Our results provided the foundation for further elucidating the functional roles of *FBA* family genes and potential targets for engineering to improve photosynthetic capacity in *B. napus*.

#### **2. Results**

#### *2.1. Identification, Properties, and Genomic Distribution of BnaFBA Genes*

To identify each member of the *FBA* gene family in *B. napus*, the glycolytic domain (PF00274) and fructose-bisphosphate aldolase class-II domain (PF01116) from the Pfam database (http://pfam.xfam. org/) were employed as queries to search against *B. napus* var. Darmor-*bzh* protein dataset. The peptides of putative *B. napus* FBAs (BnaFBAs) with the best hit of *A. thaliana* FBAs (AtFBAs) and TaFBAs were further used to predict functional domains by using the Pfam and SMART databases to confirm the presence of the fructose-bisphosphate aldolase domains. As indicated in Table 1, we identified 22 *BnaFBA* genes in *B. napus*. Meanwhile, we found 14 *BraFBAs* and fourteen *BolFBAs* in *B. rapa* var. Z1 and *B. oleracea* var. HDEM, respectively (Table S1).

The transcript length of *BnaFBA* genes varied from 951 bp to 4149 bp. All identified *BnaFBA* genes encoded proteins ranging from 317 (*BnaFBA2e*) to 1382 (*BnaFBA9b*) amino acids (aa), molecular weight (MW) from 34.18 (*BnaFBA2e*) to 148 kDa (*BnaFBA9b*), isoelectric point (pI) from 5.7 (*BnaFBA2a*) to 8.47 (*BnaFBA3a*), and grand average of hydropathy (GRAVY) from −0.263 (*BnaFBA3a*) to 0.076 (*BnaFBA9b*) (Table 1). Twelve BnaFBAs (BnaFBA1a/b/c, BnaFBA2a/b/c/d/e/f, BnaFBA3a/b, and BnaFBA9a) were predicted to be localized in chloroplast, with the other 10 BnaFBAs (BnaFBA5a/b, BnaFBA6, BnaFBA8a/b/c/d/e/f, and BnaFBA9a) being in the cytoplasm (Table 1). Similarly, in *B. rapa* and *B. oleracea*, seven *B. Oleracea* FBAs (BolFBAs) and seven *B. rapa* FBAs (BraFBAs) were predicted to be chloroplast-localized proteins, six BolFBAs and six BraFBAs were cytoplasm-localized ones, whereas BolFBA9 and BraFBA6a were predicted to be localized in plasma membrane and nucleus, respectively (Table S1).

Based on the functional domains contained, the 22 *BnaFBAs* could be divided into two classes—class-I and class-II. All the BnaFBAs of class-I harbored one glycolytic domain (PF00274) across the proteins, and BnaFBA9a and BnaFBA9b of class-II had one fructose-bisphosphate aldolase class-II domain (PF01116). Except for the glycolytic domain (PF00274), some members of class-I FBA, such as BnaFBA5b, as the longest FBA of class-I with 870 aa, also harbored PRK (phosphoribulokinase kinase) (PF00485) and UPRTase (uracil phosphoribosyltransferase) (PF14681) domains in the N terminal. Likewise, in *B. oleracea*, BolFBA5 contained three functional domains of PF00274, PF00485, and PF14681, and BolFBA6b had two tandem copies of glycolytic domains (PF00274). Interestingly, BraFBA6a contained both glycolytic domain (PF00274) and RNA polymerase II-binding domain (PF04818), suggesting that it might be involved in RNA processing in the nucleus (Table S1).

To attain a general view of the distribution of *BnaFBA* genes on the genome of *B. napus*, the 22 *BnaFBAs* were mapped on the corresponding chromosomes according to their physical positions. Of the 22 *BnaFBA* genes, 19 were evenly distributed on the 15 *B. napus* chromosomes, while the three other *BnaFBAs* were assigned to the random chromosomes (two on the An chromosomes and one on the Cn chromosome). The number of *BnaFBAs* per chromosome ranged from one to two. Among them, eleven *BnaFBA* genes were distributed over the seven A subgenome chromosomes of *B. napus*, including A01, A02, A04, A06, A07, A08, and A09, as well as Ann\_random. Equally, the other 11 *BnaFBAs* were distributed over eight C subgenome chromosomes of *B. napus*, including C01, C02, C03, C04, C05, C06, C07, and C08, as well as Cnn\_random. Most of the *BnaFBAs* were not localized in the terminal regions of the chromosomes, where the gene density was relatively high in *B. napus* (Figure 1).



**Table** 

*Int. J. Mol. Sci.* **2019**, *20*, 5749



aldolase class-I (PF00274); kinase family (PF00485); AP2, apetala 2 (PF00847).

F\_bP\_aldolase,

fructose-bisphosphate

 aldolase class-II (PF01116); Uracil

phosphoribosyltransferase,

 UPRTase (PF14681); PRK,

phosphoribulokinase

 / uridine

**Figure 1.** Genomic distributions of *BnaFBA* genes on *B. napus* chromosomes. The *BnaFBAs* were plotted based on the location of genes, length of chromosomes, and positions of centromeres. Heatmap of each chromosome indicated the gene density by the frequency per 1 Mb.

#### *2.2. Phylogenetic and Structure Analysis of BnaFBAs*

To explore the molecular evolution of the *FBA* gene family in *B. napus*, a total of 79 *FBA* genes from *B. napus*, *B. rapa*, *B. oleracea*, *A. thaliana,* and wheat, were used to construct an unrooted phylogenetic tree. According to the phylogenetic relationships of these *FBA* genes, they could be divided into two independent classes, consistent with the classification by functional domains they contained (Figure 2A). Furthermore, the *FBA* genes of class-I could be further classified into four subclasses, namely class-Ia, class-Ib, class-Ic, and class-Id. Therefore, in *B. napus*, there were seven *BnaFBA* genes in the class-Ia group, two in class-Ib, nine in class-Ic, two in class-Id, and two in class-II. Consistent with its polyploid origin, except for the FBA4 gene, the genome of *B. napus* maintained homologs of each *FBA* gene derived from the diploid parents, *B. rapa* and *B. oleracea* (Figure 2A).

Based on the gene information of the genome available in the GENOSCOPE database, we performed gene structure analysis by comparing the coding sequence (CDS) of *BnaFBAs, BraFBAs,* and *BolFBAs*. In *B. napus*, A subgenome homologs and C subgenome homologs of *FBA* genes that were in the adjacent branches of the phylogenetic tree exhibited the same gene structure (Figure 2B). Within each class of *FBA* genes, the features of exons, such as order, length, and number, were largely conserved except for *FBA6* genes (Figure 2B). Besides, the organization of the introns of *BnaFBA* genes was highly variable. The length of introns varied extensively in different members of the *BnaFBA* gene family, ranging from 30 to 4152 bp, with the number ranging from 2 to 41. Compared to the class-I *BnaFBA* genes, the class-II genes were much longer and had much more exons and introns (Figure 2B).

**Figure 2.** Phylogenetic relationships and gene structure of *BnaFBA* genes. (**A**) Phylogenetic relationships of *BnaFBAs*, *BraFBAs*, *BolFBAs*, *AtFBAs,* and *TaFBAs*. The unrooted tree was generated using MEGA7.1 software by the neighbor-joining method. The numbers next to the branch show the 1000 bootstrap replicates expressed in percentage. The phylogenetic classes of *FBA* genes were marked by corresponding colors that are shown in the color legend at the bottom right. (**B**) The schematic diagrams of the exon-intron organization of *FBA* genes in *B. napus*, *B. rapa,* and *B. oleracea*. The phylogenetic tree of the *FBA* genes is placed at the left, and the color squares represent phylogenetic classes. The green boxes and lines indicate CDS (coding sequence) and introns, respectively. The length of the scale is at the bottom.

To further explore the higher-order structure of the BnaFBA proteins, the three-dimensional (3D) structural models for BnaFBA1a, BnaFBA8a, and BnaFBA9a were generated using SWISS-MODEL. Based on the experimental structure of class-I rabbit muscle aldolase, the SWISS-MODEL analysis results revealed that BnaFBA1a and BnaFBA8a of the class-I group could form tetramers structures, and interfaces A and B were observed in the BnaFBA1a and BnaFBA8a

tetramers. Different from BnaFBA1a and BnaFBA8a, BnaFBA9a belonged to class-II group and form dimers based on its relatively high similarity to class-II aldolases of *Thermus aquaticus* (Figure 3A–C). Furthermore, the catalytic residues of D74-K147-K186-R188-E226-E228-K269-S301-R331 and D30-K103-K142-R144-E183-E185-K225-S266-R298 were observed in BnaFBA1a and BnaFBA8a, respectively. Additionally, similar to the class-II FBAs of *Thermus aquaticus*, BnaFBA9a contained active sites of H1181-E1232-H1277-H1309 that also serve as the divalent metal cation binding sites (Figure 3E,F). Multiple sequence alignment results indicated that class-I BnaFBAs, BraFBAs, BolFBAs, and AtFBAs had high conserved catalytic residues (Figure S1). The active sites among class-II BnaFBAs, BraFBAs, and BolFBAs were the same as FBA isozymes in the *Thermus aquaticus* (Figure S2).

**Figure 3.** Predicted three-dimensional model of representative *Brassica napus* Fructose-1,6-bisphosphate aldolase (BnaFBA) proteins. (**A**) Tetrameric BnaFBA1a. (**B**) Tetrameric BnaFBA8a. (**C**) Dimeric BnaFBA9a. (**D**) Active site residues of BnaFBA1a. (**E**) Active site residues of BnaFBA8a. (**F**) Active site residues of BnaFBA9a. The sites of the Asp167/124 substitutions are indicated on interface B, and the filled circle represents the divalent metal cation of BnaFBA9a.

#### *2.3. Synteny and Gene Duplication of BnaFBA Genes*

*A. thaliana* is the most prominent model system for plant molecular biology and genetics research, whose structural genes have been identified and functionally characterized. Thus, we traced the orthologous gene pairs between *A. thaliana* and *Brassica* species to investigate the evolutionary history by syntenic gene analysis. A total of 29 orthologous gene pairs were identified between *A. thaliana* and *B. napus*, 20 between *A. thaliana* and *B. rapa*, and 19 between *A. thaliana* and *B. oleracea*. In addition, we also obtained 29, 15, and 15 paralogous gene pairs within *B. napus*, *B. rapa,* and *B. oleracea*, respectively (Figure 4A). The previous study revealed that crucifer (*Brassicaceae*) lineage experienced two whole-genome duplications (named α and β) and one triplication event (γ) shared by most dicots [37]. Moreover, *Brassica* species experienced an extra whole-genome triplication (WGT) event compared with *A. thaliana* [38]. As WGT of the *Brassica* ancestor, *FBA* genes in the *A. thaliana* genome might have triplicated orthologous copies in *B. rapa* and *B. oleracea*. Consequently, some *FBA* genes (i.e., *FBA2* and *FBA8*) existed triple the number of those in *A. thaliana*, while the other genes (i.e., *FBA1*, *FBA3*, *FBA5,* and *FBA6*) had double or equal the number (Figure 4B). The *FBA* genes of *B. napus* were inherited from its diploid ancestors; thus, most of the *BnaFBA* genes were double the number of those in *B. rapa* and *B. oleracea* (i.e., *FBA2*, *FBA3*, *FBA5*, *FBA8,* and *FBA9*). However, both *FBA1* and *FBA6* genes lost one copy in the C subgenome of *B. napus*, while FBA4 lost all the copies in *B. napus* compared to its two ancestors (Figure 4B). Gene duplication analysis with syntenic and phylogenomic approaches

using tool DupGen\_finder in *B. napus* showed that all *BnaFBA* genes had corresponding duplicate genes. In *B. napus*, a total of 42 gene pairs were derived from whole-genome duplication (WGD), with one gene pair of *BnaFBA2e*-*BnaFBA2f* being derived from transposed duplication (Table S2).

**Figure 4.** Collinear correlations and copy number variation of the *FBA* family genes in *B. napus*, *B. rapa*, *B. oleracea,* and *A. thaliana*. (**A**) Collinear correlations of FBA genes in the *B. napus*, *B. rapa*, *B. oleracea,* and *A. thaliana* genomes. The *B. napus*, *B. rapa*, *B. oleracea,* and *A. thaliana* chromosomes were colored by corresponding colors that are shown in the color legend at the bottom right. The blue lines represent the collinear correlations of FBA genes within *B. napus,* and the orange lines are for the collinear correlations of *FBA* genes between *B. napus* and the other species, with the grey lines representing the collinear correlations of *FBA* genes among *B. rapa*, *B. oleracea,* and *A. thaliana*. The figure was created using CIRCOS software. (**B**) Copy number variation of the *FBA* family genes in *B. napus*, *B. rapa*, *B. oleracea,* and *A. thaliana*. The phylogenetic tree of *FBA* genes is shown on the left, with the species tree shown at the top. The *Brassica*-specific triplication was indicated on the branches of the trees according to the Plant Genome Duplication Database. The numbers are the copy numbers of each FBA gene in *A. thaliana*, *B. napus*, *B. rapa,* and *B. oleracea*.

#### *2.4. Cis-Acting Elements in the Putative Promoter Regions of BnaFBA Genes*

As important molecular switches, *cis*-acting elements in the promoter region could provide useful information to understand the function and regulation of the genes during plant development and responses to various stresses. The 1.5 kb genomic DNA sequences identified from upstream of the *BnaFBA* genes were extracted and deployed in *cis*-acting regulatory elements analysis with PlantCARE. Various *cis*-acting regulatory elements existed within the promoter regions of *BnaFBA* genes (Table 2 and Table S3). For example, *BnaFBA* genes contained multiple phytohormone responsive elements, such as ABRE (abscisic acid-responsive element), AuxRE (auxin-responsive element), ERE (ethylene-responsive element), GARE (gibberellin-responsive element), MeJARE (MeJA-responsive element), and SARE (salicylic acid-responsive element). This suggested that the expression of *BnaFBAs* might be induced by different phytohormones. Besides, the *cis*-acting regulatory elements involved in stress-responsive elements, such as ARE (anoxic-responsive element), DRE (damage-responsive element), DIRE (drought-responsive element), DSRE (drought- and stress-responsive element), HSRE (heat stress-responsive element), LTRE (low-temperature-responsive element), and WRE (wound-responsive element), were also found within the promoters of *BnaFBA* genes, suggesting that expression levels of *BnaFBAs* might be also regulated by various environmental factors like drought, heat, and low-temperature. Globally, three phytohormone-related elements (i.e., ABRE, ERE, and MeJARE) and two stress-responsive elements (ARE and HSRE) were detected with high frequency in the promoter regions of *BnaFBA* genes. Notably, each *BnaFBA* had multi-copy LREs (light-responsive elements) ranging from 8 to 26, implying that *BnaFBAs* might play roles in light responses.



Heat element (SARE), and

stress-responsive

 element (HSRE), Wound-responsive

 element (WRE).

Light-responsive

 element (LRE),

Low-temperature-responsive

 element (LTRE),

MeJA-responsive

 element (MeJARE), Salicylic

acid-responsive

#### *2.5. Natural Variations of BnaFBA Family Genes in B. Napus*

Critical sequence polymorphism across the gene and its flanking regions may reflect the evolutionary process of species adapting to different environments. The public resequencing datasets of 991 *B. napus* germplasm accessions covering three main ecotype groups, namely winter, semi-winter, and spring types, were collected for variation analysis of the *BnaFBA* family genes [39]. The polymorphism sites of CDS in *BnaFBA* family genes ranged from two (*BnaFBA2c*) to 130 (*BnaFBA6*) (Table 3). The π and θ<sup>w</sup> of nucleotide diversity parameters extended from 0.00057 (*BnaFBA3b*) to 0.04158 (*BnaFBA6*) and from 0.00054 (*BnaFBA3b*) to 0.02255 (*BnaFBA6*), respectively. Some members of *BnaFBA* gene family were conserved, such as *BnaFBA3b* (θ<sup>w</sup> = 0.00054), *BnaFBA2c* (θ<sup>w</sup> = 0.00092), and *BnaFBA2d* (θ<sup>w</sup> = 0.00098), while others had high polymorphism, such as *BnaFBA9b* (θ<sup>w</sup> = 0.00511), *BnaFBA5a* (θ<sup>w</sup> = 0.00535), *BnaFBA8b* (θ<sup>w</sup> = 0.0056), *BnaFBA8e* (θ<sup>w</sup> = 0.00598), *BnaFBA3a* (θw = 0.00665), *BnaFBA8a* (θ<sup>w</sup> = 0.00706), *BnaFBA1c* (θ<sup>w</sup> = 0.00762), *BnaFBA8d* (θ<sup>w</sup> = 0.00807), and *BnaFBA6* (θ<sup>w</sup> = 0.0225). Besides, the *BnaFBA6* variation ratio reached a peak of 12.60 among the 22 *BnaFBA* genes, whereas *BnaFBA2c* had the lowest variation ratio of 0.17, with only two polymorphic sites (Table 3). Generally, due to the longer length of genes, *BnaFBA9a* and *BnaFBA9b* of class-II had much more variations than *FBA* genes of class-I; however, the variation ratio and nucleotide diversity of the coding regions of *BnaFBA9a* and *BnaFBA9b* showed no difference with the *BnaFBA* genes of class-I except for *BnaFBA6* (Table 3). In addition to CDS regions, a total of 1029 and 814 variations were also identified in the upstream/downstream 1.5 kb regions and intronic regions of *BnaFBA* genes, respectively (Table S4). Notably, *BnaFBA8a*, *BnaFBA8b,* and *BnaFBA6* each harbored one stop-gain mutation that led to premature stop codons, which indicated that these genes exhibited loss-of-function (Figure 5, Table S5). Only a few of the Indels had been detected in *BnaFBAs*. For example, *BnaFBA1c* had three non-frameshift Indels, and *BnaFBA3b* contained only one non-frameshift Indel, with *BnaFBA6* harboring two frameshift deletion/insertion variations (Figure 5). To study the population selection pressure, we conducted neutral testing using Tajima's D. Tajima's D values of all the *BnaFBAs* were positive, with significant Tajima's D values (*p* < 0.01) being observed in *BnaFBA1a*, *BnaFBA1e*, *BnaFBA2d,* and *BnaFBA2e* (Table 3). Particularly, the Tajima's D values of *BnaFBA1b* and *BnaFBA8a* reached extremely significant levels (*p* < 0.001 and *p* < 0.0001). Based on the variation of *BnaFBA* genes, we performed principal component analysis (PCA), in which no significant difference in the polymorphism of *BnaFBA* genes was seen between different ecotype groups of *B. napus* (Figure S3).


**Table 3.** Summary of Polymorphic Sites of the 22 *BnaFBA* Genes.

<sup>1</sup> polymorphic includes the SNPs (single nucleotide polymorphisms) and Indels (insertions and deletions). <sup>2</sup> <sup>π</sup>, average nucleotide differences per site between the two sequences. <sup>3</sup> <sup>θ</sup>w, Watterson estimator. <sup>4</sup> Tajima's

D, test for neutral selection (\*: significant at *p* < 0.01; \*\*: significant at *p* < 0.001; \*\*\*: significant at *p* < 0.0001). CDS, coding sequence.

*Int. J. Mol. Sci.* **2019**, *20*, 5749

**Figure 5.** The variation distribution of *BnaFBA* genes. The heterozygous SNPs and homozygous SNPs were marked as red and blue lines, respectively. The red peak on the top of each *FBA* gene represents nonsynonymous variation. The green and purple triangles signify frameshift deletion/insertion and stop-gain variations, respectively. The length of the scale is placed at the bottom.

#### *2.6. Expression Patterns of BnaFBA Genes*

The members of the *FBA* gene family are playing important roles in diverse significant physiological and biochemical processes in plants [21,25]. Besides, FBAs also participate in response to various environmental stimuli [30–32]. To identify the function of *BnaFBA* genes under various conditions, the expression levels of *BnaFBAs* were evaluated during growth and development, as well as in response to biotic stresses and phytohormones in *B. napus*.

To explore the expression patterns of the *BnaFBAs* during growth in *B. napus*, we analyzed their expression levels in twelve various tissues (leaf, root, stem, cauline leaf, pistil, stamen, petal, flower bud, axillary bud, silique wall, embryo, and seed) at different developmental stages. As a result, at the trefoil stage and the flowering stage, *BnaFBA* genes showed opposite expression patterns between leaves and roots. For example, *BnaFBA1a*/*b*/*c*, *BnaFBA2a*/*b*/*c*/*d*/*f,* and *BnaFBA5a*/*b* displayed relatively high expression level in leaves than that in roots. On the contrary, *BnaFBA3a*/*b* and *BnaFBA8a*/*b*/*c*/*d*/*e*/*f* exhibited relatively high expression levels in roots than that in leaves. At the flowering stage, the expression levels of *BnaFBA1a*/*b*/*c* and *BnaFBA2a*/*b*/*c*/*d*/*f* were higher in pistil than that in stamen and petal tissues. At the pod stage, *BnaFBA1a*/*b*/*c* and *BnaFBA2a*/*b*/*c*/*d*/*f* were highly expressed in the silique wall, with their expression levels being increased during the development of pod. *BnaFBA3a*, *BnaFBA3b*, *BnaFBA8c,* and *BnaFBA8d* were highly and constantly expressed in all tissues, particularly showing the highest expression levels in the embryo at 25 days after pollination (DAP). In addition, the expression of *BnaFBA2a*/*b*/*c*/*d*/*e*/*f* increased during the development of seeds and reached high levels at 4 weeks after pollination (WAP), then decreased in the following mature stages of seed development (Figure 6A). *BnaFBA9a* and *BnaFBA9b* of class-II showed constitutive expression at relatively low levels in all the tissues examined. *BnaFBA6* of class-I exhibited unexpressed in most tissues. Taken together, these results suggested that *BnaFBA* genes displayed diverse spatiotemporal expression patterns during the growth and development of different tissues in *B. napus*.

**Figure 6.** Expression analysis of the 22 *BnaFBA* genes in *B. napus*. (**A**) Expression patterns of the *BnaFBA* genes in various tissues at different developmental stages. The color bar represents log2 expression levels (FPKM, fragments per kilobase of exon per million fragments mapped) of each gene. (**B**) Expression patterns of the *BnaFBA* genes under drought stress, *Sclerotinia sclerotiorum* infection, and strigolactones treatments in *B. napus*. The color scale of heatmap indicates expression values, with blue-white and red representing low and high levels of transcript abundance, respectively. The cluster tree of *BnaFBA* genes, based on the expression level, is shown on the left.

To identify the physiological functions of *BnaFBAs* in response to various environmental stresses and phytohormones, as an illustration, we investigated the expression of *BnaFBA* genes in leaf and root tissues under drought stress, *Sclerotinia sclerotiorum* infection, and strigolactones (SLs) treatments. After 24 h of *Sclerotinia sclerotiorum* infection, *BnaFBA1a*/*b*/*c*, *BnaFBA2a*/*b*/*c*/*d*/*e*/*f*, *BnaFBA5a*/*b,* and *BnaFBA9b* were down-regulated, while *BnaFBA3a*/*b* and *BnaFBA8a*/*b*/*c*/*d*/*e*/*f* were up-regulated compared to the control in the leaves of the two *B. napus* cultivars—wester and ZY821 (Figure 6B). Under drought stress, the expression levels of *BnaFBA1a*/*c*, *BnaFBA2a*/*b*/*c*/*d*/*e*/*f,* and *BnaFBA5a*/*b* decreased in the leaves. Similarly, the expression levels of *BnaFBA1b*, *BnaFBA2b*/*d*/*e*/*f* also decreased in roots compared to the control (Figure 6B). We further examined the expression of *BnaFBAs* under exogenous SLs treatments in *B. napus*. Notably, the members of the *BnaFBA2* group showed diverse expression patterns after SLs treatments in *B. napus*. At 12 h after SLs treatments, the expression levels of *BnaFBA2a*/*b*/*c*/*f* were significantly down-regulated, while the expression of *BnaFBA2d* was slightly decreased relative to the control. At 1 day after SLs treatment, *BnaFBA2a*/*b*/*c*/*d* expression levels were significantly up-regulated compared to the control, whereas *BnaFBA2e*/*f* showed no difference. At 4 days after SLs treatment, all members of the *BnaFBA2* group exhibited up-regulated expression compared to the control. Then, at 7 days after SLs treatments, the expression levels of *BnaFBA2a*/*b*/*c*/*d* turned to be down-regulated, while *BnaFBA2e*/*f* expression levels were still up-regulated compared to the control (Figure 6B).

#### *2.7. qRT-PCR Analysis of Selected BnaFBA Genes under Abiotic Stresses*

To further validate the functional roles of *BnaFBAs* under abiotic stresses, four *BnaFBA* genes from different clusters were selected for the examination of their expression levels under three stress conditions using quantitative real-time PCR (qRT-PCR) in *B. napus*. These genes included *BnaFBA2a*, *BnaFBA5b,* and *BnaFBA8a* of class-I, and *BnaFBA9a* of class-II. The qRT-PCR analysis was carried out using rapeseed plants exposed to salt, heat, and drought stresses. At 3 days after NaCl treatment, the expression level of *BnaFBA8a* was significantly up-regulated by approximately 11-fold compared with the control, while *BnaFBA2a*, *BnaFBA5b,* and *BnaFBA9a* were significantly down-regulated, particularly, the expression levels of *BnaFBA5b* and *BnaFBA9a* were largely reduced (Figure 7). In addition, at three days after heat and drought stress treatments, the expression level of *BnaFBA5b* was significantly reduced nearly by half, whereas the expression of *BnaFBA8a* was slightly increased compared to the control (Figure 7).

**Figure 7.** The real-time PCR analysis of the selected four representative *BnaFBA* genes responded to salt, heat, and drought stresses. The dotted lines represent the equivalent levels of expression. Statistically significant differences (Student's *t*-test) are indicated as followed: \*\* *p* < 0.01.

#### *2.8. Co-Regulatory Networks of BnaFBA Genes*

Based on the publically available RNA-seq datasets collected from different tissues, biotic and abiotic stresses, and phytohormone treatments in *B. napus*, we calculated the Pearson correlation coefficients (PCCs) of the expression levels of *BnaFBA* genes and constructed co-regulatory networks. Positive correlations were observed between members of class-Ia and class-Ib, such as *BnaFBA3a*/*b*, *BnaFBA8a*/*b*/*c*/*d*/*e*/*f,* and *BnaFBA6*. Likewise, the *BnaFBA* genes of class-Ic and class-Id also showed positive correlations between each other. Particularly, members of class-Id, including *BnaFBA1a*/*b*/*c* and *BnaFBA2a*/*b*/*c*/*d*, exhibited strong positive correlations. *BnaFBA2b* and *BnaFBA2f* showed significant negative correlation with *BnaFBA3a*/*b* and *BnaFBA8a*/*b*/*e*/*f*. However, they displayed a significant positive correlation with other *BnaFBA* genes of class-Ic and class-Id (Figure 8A). All the significant PCCs (*p*-value ≤ 0.05 and |PCC| > 0.5) of *BnaFBAs* were extracted and used to construct co-regulatory networks delineated by Cytoscape (version 3.1, Seattle, WA, USA) (Figure 8B). The co-regulatory networks of *BnaFBAs* were constituted with 22 nodes and 83 edges. There were four *BnaFBA* gene pairs showing negative correlations (*p*-value ≤ 0.05 and PCC < −0.5), including *BnaFBA2f* and *BnaFBA8a*, *BnaFBA2b* and *BnaFBA8a*, *BnaFBA2f* and *BnaFBA8b*, and *BnaFBA2d* and *BnaFBA8a*. Besides, 79 *BnaFBA* gene pairs showed positive correlations, of which 38 pairs exhibited strong positive correlations (*p*-value ≤ 0.05 and PCC > 0.8). Notably, three gene clusters of *BnaFBAs* were observed, including cluster 1 consisting of *BnaFBA3a*/*b*, *BnaFBA6,* and *BnaFBA8a*/*b*/*c*/*d*/*e*/*f*, cluster 2 consisting of *BnaFBA1a*/*b*/*c*, *BnaFBA2a*/*b*/*c*/*d*/*e*/*f,* and *BnaFBA5a*/*b,* and cluster 3 consisting of *BnaFBA9a*/*b*. Members within each cluster showed strong positive correlations between the expression levels, whereas only a few significant negative correlations existed between cluster 1 and cluster 2. Moreover, cluster 3 was independent among the three clusters (Figure 8B).

**Figure 8.** Correlations and co-regulatory networks of *BnaFBA* genes. (**A**) Correlation analysis of *BnaFBA* genes was performed based on the Pearson correlation coefficients (PCCs) of *BnaFBA* gene pairs. Correlations are indicated by the size and color of circles. The left bar represents the correlation values of PCCs. The class information for *BnaFBAs* is indicated by the squares with different colors at the right. Black star signifies the correlation with *p*-value ≤ 0.05. (**B**) The co-regulatory network of *BnaFBA* genes was generated on the basis of the significant PCCs of gene pairs (*p*-value ≤ 0.05). The distinct correlation levels of gene pairs are marked by edge lines with different colors shown at the bottom. The co-regulatory network was illustrated by Cytoscape (version 3.1, Seattle, WA, USA).

#### **3. Discussion**

The Calvin cycle is the initial pathway of photosynthetic carbon fixation, which plays a requisite role in the growth and development of plants [40]. Extensive efforts for seeking a breakthrough in the regulation of this cycle have been made to substantially enhance photosynthetic CO2 capacity and plant productivity. The carboxylation capacity of ribulose bisphosphate carboxylase oxygenase (Rubisco) and the regenerative capacity of ribulose diphosphate (RuBP) are uncovered to be essential for maintaining high photosynthetic CO2 fixation capacity. Previous studies revealed that three non-regulated enzymes, including fructose−1,6-bisphosphate aldolase (FBA or aldolase), sedoheptulose 1,7-bisphosphatase (SBPase), and transketolase (TK), had significantly higher control coefficient on photosynthesis than the other Calvin cycle enzymes, which indicated that they could limit photosynthetic rate and exert significant control over photosynthetic carbon flux other than Rubisco [41]. Particularly, FBA could catalyze the reversible conversion of DHAP and FBP and also catalyze DHAP and erythrose 4-phosphate (E4P) to sedoheptulose 1,7-bisphosphate (SBP). Thus, FBA is not only one of the key regulatory enzymes in the glycolysis pathway but also may lie in a vital strategic position to determine the carbon partitioning in the Calvin cycle, which made FBA probably an important candidate target of engineering to boost photosynthetic carbon CO2 fixation capacity [42].

*B. napus*is the second-largest source of vegetable oil crops and is cultivated around the world [33,34]. *B. napus* seeds contain oil, carbohydrates, and proteins as major storage reserves. In most seeds, glycolysis in plastids supplies carbon for fatty acid (FA) synthesis [43]. FBA, as a key enzyme in the glycolytic metabolism, provides precursors for amino acid and fatty acid synthesis [26]. To our best knowledge, there is not any report in the literature describing *FBA* family genes and their function in *B. napus* so far. In this study, we performed a comprehensive identification and characterization of the *FBA* gene family in *B. napus*. Compared to the number of *FBA* genes identified in other plant species, such as eight in *A. thaliana* [17], seven in rice [22], eight in tomato [19], and 21 in wheat [11], *B. napus* had more *FBA* genes, including 22 *BnaFBA* genes distributed in fifteen *B. napus* chromosomes and two random chromosomes. The *BnaFBAs* can be classified into two classes based on the functional domains contained, class-I and class-II. Enzyme kinetics analysis of aldolase 1 (class-I FBAs) and aldolase 2 (class-II FBAs) in *Escherichia coli* revealed that aldolase 1 and 2 hydrolyze fructose 1,6-bisphosphate by the aldol cleavage reaction [12]. Class-I BnaFBA proteins could form tetramer structures with high conserved catalytic residues of D-K-K-R-E-E-K-S-R that were homologous to those of rabbit FBA isozymes [44]. Class-II BnaFBA could form dimers with the active sites of H-E-H-H that corresponded to those of FBA isozymes in the *Thermus aquaticus* [45,46]. Eleven *BnaFBAs* of class-I and *BnaFBA9a* of class-II were predicted to be localized in chloroplast, while nine class-I *BnaFBAs* and *BnaFBA9b* of class-II were in the cytoplasm. This is consistent with the subcellular localization of their homologs in *A. thaliana* [17] and wheat [11]. Although AT1G18270 in *Arabidopsis* is the homolog of BnaFBA9a and BnaFBA9b, and it is annotated as ketose-bisphosphate aldolase class-II family protein; however, it contains fructose-bisphosphate aldolase class-II (pfam01116) domain and zinc-binding site and is further assigned as fructose-bisphosphate aldolase (NCBI Reference Sequence: NP\_001117303.1). More accurately, AT1G18270 is fructose-bisphosphate aldolase class-II family protein. This inaccuracy might result from the homologous protein annotation in bacteria, such as *Variovorax paradoxus B4*. Now, the ketose-bisphosphate aldolase, class-II protein, was further assigned as fructose-1,6-bisphosphate aldolase, class-II in *Variovorax paradoxus B4* (GenBank: AGU50177.1). Therefore, *BnaFBA9a* and *BnaFBA9b* were classified as class-II *FBA* genes in *Brassica napus*.

In contrast to the model plant *A. thaliana*, except for the paleo-polyploidization of alpha (α)-beta (β)-gamma (γ) WGD events, the *Brassica* species, such as *B. rapa* and *B. oleracea*, experienced an extra whole-genome triplication (WGT) event at approximately 15.9 Mya [47–49]. *B. napus* is a relatively new species of *Brassica* genus with a short history of post-Neolithic speciation (~7500 years) and domestication (~700 years) and a recent allotetraploid formed from hybridization between *B. rapa* and *B. oleracea* [35]. Therefore, *B. napus* has experienced five genome duplication events (3 × 2 × 2 × 3 × 2) at times during the evolution process. Gene duplication that resulted from whole-genome duplication (WGD), tandem duplication (TD), proximal duplication, transposed duplication, or dispersed duplication is one of the primary driving forces to the evolution of morphological and physiological diversity in plants [50]. Our results showed that *Brassica* species had an extended *FBA* gene family, containing extra copies of *FBA1*, *FBA2*, *FBA6*, and *FBA8*. Most gene duplications of the *FBA* gene family resulted from WGD (Table S2). Considering the collinear correlations and subgenome, transposed duplication and triplication events in the *FBA* family had contributed to this expansion

prior to the speciation of *Brassica* species. WGD is often followed by intensive gene loss to adapt to continuously changing environments. Compared to *A. thaliana* and its two ancestors, *B. napus* species lost some copies of *FBA* genes (i.e., *FBA1*, *3*, 4, *5,* and *6*). The remaining duplicated or triplicated FBA genes identified might be conducive to the adaptation of *B. napus* to various adverse environments during speciation and domestication.

Genome resequencing provides an effective way to identify a large number of variations, which lay a foundation for further identification and functional validation of candidate genes contributed to important traits in crop plants, such as rice, tomato, and soybean [51–53]. Critical sequence polymorphisms across the gene and its flanking regions may reflect the evolutionary trends and breeding selection effects on the genes. Based on the resequencing of a worldwide collection of 991 *B. napus* germplasm accessions that released recently, we explored the pattern of genetic polymorphisms of the *BnaFBA* family genes. The results showed that the 22 *BnaFBAs* diversified in sequence polymorphism, with the polymorphism sites ranging from two to 130 (Table 3). The *BnaFBA* genes showed different levels of polymorphism. For example, *BnaFBA6*, *BnaFBA1c,* and *BnaFBA8d* with the highest levels of genetic variation polymorphism might have experienced weak selection pressure during the evolution process, whereas *BnaFBA2c*, *BnaFBA2d,* and *BnaFBA3b* were highly conserved after strong selection. Particularly, *BnaFBA6* harbored the highest levels of genetic variation, including two missense polymorphism sites with a frequency of 12.12% and 5.72%, respectively (Table S5). Moreover, a one-stop gained site with a frequency of 1.02% was also found in the CDS regions of the *BnaFBA6* gene. These results suggested that the *BnaFBA6* gene might be under the pseudogenization process. Furthermore, *BnaFBA* genes might have experienced balancing selection or population shrinkage. For example, six *FBA* family genes, including *BnaFBA8b*, *BnaFBA1b*, *BnaFBA1c*, *BnaFBA2e*, *BnaFBA1a,* and *BnaFBA2d*, harbored some selected signals and probably underwent breeding selection in the process of natural selection and domestication in *B. napus*.

Fructose-1,6-bisphosphate aldolase (FBA) is a non-regulated enzyme in the Calvin cycle, whose activity is not regulated by effectors or posttranslational modification but by expressional regulation or protein degradation [42]. Recent studies using transgenic plants with reduced enzyme levels have revealed that FBAs play important roles in regulating carbon flux through the Calvin cycle. Elevated plastid aldolase activity accelerated RuBP regeneration and resulted in increased photosynthetic capacity, growth rate, and biomass yield in tobacco and cyanobacterium [27,54]. Increased activity of FBA in *Anabaena sp*. strain PCC 7120 stimulated photosynthetic yield [55]. Generally, the expression levels of *BnaFBA* family genes were higher in the overground tissues than that in the underground. In *B. napus*, some members of the FBA family, such as *BnaFBA5a*, *BnaFBA5b*, *BnaFBA2b*, *BnaFBA2c,* and *BnaFBA2d*, showed higher expression levels in mature leaves and the top of stems than that in young leaves and the base of stems. Notably, *BnaFBA* genes were preferentially highly expressed in the photosynthetic tissues and stages, particularly leaves and silique wall in *B. napus*. For example, during the development of silique, *BnaFBA5a* and *BnaFBA5b* exhibited the highest expression levels at 10 days after DAP, while *BnaFBA1a*, *BnaFBA2a,* and *BnaFBA2b* had highest expression levels at 15 days after DAP in the silique wall. Besides, *BnaFBA1a*, *BnaFBA2a*, *BnaFBA2b*, *BnaFBA2c*, *BnaFBA2d*, *BnaFBA2e,* and *BnaFBA2f* showed the highest expression levels at four weeks after DAP in the seed tissues when seed fill begins with rapid embryo growth, oil biosynthesis, and protein accumulation [56]. In addition, *BnaFBA3a*, *BnaFBA3b*, *BnaFBA8c,* and *BnaFBA8d* also showed the highest expression levels at 25 days in the embryo tissues. Thus, the members of *BnaFBAs* that have a strong correlation with photosynthetic events are potential targets for engineering to improve photosynthetic capacity in the future, which would be further investigated in our next study.

Previous studies revealed *FBA* family genes also involved in plant defense and response to various biotic and abiotic stresses, such as cold and heat stress [17], salt stress [21], drought stress [21], water-deficit stress [30], stress with *Rhizoctonia solani* Kuhn [31], and high light acclimation stress [32]. Here, we found that the members of the *BnaFBA* family showed diverse expression patterns in response to *Sclerotinia sclerotiorum* infection and drought stress in *B. napus*. For example, *BnaFBA1a*/*b*/*c*, *BnaFBA2a*/*b*/*c*/*d*/*e*/*f*, *BnaFBA5a*/*b,* and *BnaFBA9b* were down-regulated, while *BnaFBA3a*/*b* and *BnaFBA8a*/*b*/*c*/*d*/*e*/*f* were up-regulated compared to the control in the leaves at 24 h after *Sclerotinia sclerotiorum* infection. Under drought stress, the expression levels of 10 *BnaFBA* genes (*BnaFBA1a*/*c*, *BnaFBA2a*/*b*/*c*/*d*/*e*/*f,* and *BnaFBA5a*/*b*) decreased in the leaves and five *BnaFBA* genes (*BnaFBA1b*, *BnaFBA2b*/*d*/*e*/*f*) lowered in roots. Besides, we applied salt, heat, and drought stresses on *B. napus* seedlings to explore the expression changes of *BnaFBAs*. The expression of the *BnaFBAs* selected was induced by the three stresses. The expression levels of *BnaFBA2a*, *BnaFBA5b,* and *BnaFBA9a* were down-regulated, while *BnaFBA8a* was up-regulated compared to the control after treatments. In addition, *FBA* family genes were reported to play roles in response to phytohormones, such as ABA [17] and GA [29]. Various *cis*-acting regulatory elements related to phytohormone responsive elements were observed in the promoter regions of *BnaFBA* genes in *B. napus*, including ABRE, AuxRE, ERE, GARE, MeJARE, and SARE, indicating that the expression levels of *BnaFBA* family genes might be affected by ABA, IAA (indolylacetic acid), ethylene, GA, MeJA, and SA. Furthermore, our results showed that *BnaFBAs,* such as *BnaFBA2a*/*b*/*c*/*d*/*e*/*f*, could be also induced by strigolactones (SLs), a new class of plant hormones playing functional roles in the development of root.

In wheat, some FBA genes showed close correlations in expression patterns and could be classified into different clusters, such as *TaFBA1*/*2*/*3*, *TaFBA14*/*15*/*17*, and *TaFBA19*/*20*/*21* [11]. Similarly, based on the co-regulatory networks of *BnaFBAs*, the 22 *BnaFBA* family genes could be divided into three gene clusters. Strong positive correlations between the expression levels were observed among the members within each gene cluster of *BnaFBAs*. Meanwhile, only a few significant negative correlations appeared between minority members of cluster 1 and cluster 2. Cluster 3, consisting of *BnaFBA9a*/*b* of class-II, was independent among the three clusters. The results suggested that the *BnaFBA* genes might have functional redundancy within gene clusters and functional diversification among gene clusters in *B. napus*.

In summary, we performed a genome-wide identification of *FBA* family genes in *B. napus*, as well as *B. rapa* and *B. oleracea*, and further investigated their gene structures and phylogenetic relationships. The *cis*-acting regulatory elements in the promoter regions, natural variations, and expression patterns of *BnaFBAs* in different tissues or under treatments were analyzed. Our findings provided useful information regarding *FBA* family genes in *B. napus*. Remarkably, we found that some members of the *FBA* family that underwent breeding selection and were highly expressed in leaves and silique wall probably had positive roles in the promotion of photosynthetic capacity in *B. napus*. These *BnaFBA* genes might be utilized in the development and selection of high-yield *B. napus* cultivars.

#### **4. Materials and Methods**

#### *4.1. Identification and Property Analysis of BnaFBA Genes*

The genome sequence, annotation, and protein datasets of *B. napus* var. Darmor-*bzh*, *B. rapa* var. Z1, and *B. oleracea* var. HDEM were obtained from the GENOSCOPE database (http://www. genoscope.cns.fr/brassicanapus/ and http://www.genoscope.cns.fr/externe/plants/). The glycolytic domain (PF00274) and fructose-bisphosphate aldolase class-II domain (PF01116) from the Pfam database (http://pfam.xfam.org/) were applied as queries to search against local *B. napus* protein sequence dataset using HMMER (version 3.2.1, HHMI, Chevy Chase MD, USA) with *E*-value setting at 1e-5 [57]. Then, for further confirmation of BnaFBA proteins, the sequences of predicted BnaFBA proteins were searched against all the annotated proteins of *A. thaliana* and wheat (*Triticum aestivum* L.) using BLASTP (version 2.2.26, Bethesda, MD, USA) with *E*-value < 1 <sup>×</sup> 10-5. The putative BnaFBAs with best hits of the *A. thaliana* and wheat FBA proteins remained and were further deployed to determine the fructose-bisphosphate aldolase domains by using the Pfam and SMART databases (http://smart.embl-heidelberg.de/). The molecular weight (MW), isoelectric point (pI), and grand average of hydropathy (GRAVY) of each BnaFBA were calculated using the ProtParam tool (http://web.expasy.org/protparam/).

#### *4.2. Cis-Acting Regulatory Elements and Subcellular Localization Analysis*

The 1.5 kb promoter sequence upstream from the transcription start site of each *BnaFBA* gene was extracted from the *B. napus* genome sequence and used to predict *cis*-acting regulatory elements by PlantCARE [58]. The subcellular localization of each BnaFBA was predicted by Plant-mPLoc (http://www.csbio.sjtu.edu.cn/bioinf/plant/) [59].

#### *4.3. Structure and Chromosomal Localization Analysis*

The gene structures of the *BnaFBA* genes were inferred by aligning their coding sequences to the *B. napus* genomic sequence. Then, a schematic map of the exon-intron structure of each *BnaFBA* gene was drawn by Gene Structure Display Server 2.0 (http://gsds.cbi.pku.edu.cn/) [60]. Three-dimensional (3D) protein models of the BnaFBAs were generated using SWISS-MODEL [61,62]. The physical chromosomal locations of *BnaFBAs* were obtained from the *B. napus* genome annotation information. The graphical representation of the *BnaFBA* genes on chromosomes was plotted using R software with RIdeogram package (https://github.com/TickingClock1992/RIdeogram) [63].

#### *4.4. Multiple Alignments and Phylogenetic Analysis*

Multiple alignments of FBA protein sequences from *B. napus*, *B. rapa*, *B. oleracea*, *A. thaliana,* and wheat were performed using MUSCLE (version 3.8, Hinxton, Cambridge, UK) with default parameters [64]. The phylogenetic tree was generated using MEGA7.1 with the neighbor-joining method, and the robustness of each node in the tree was determined using 1000 bootstrap replicates [65].

#### *4.5. Synteny and Duplicate Gene Analysis*

The genomic collinearity between all pairwise combinations of *B. napus*, *B. rapa*, *B. oleracea*, and *A. thaliana* genomes was analyzed using MCScanX (version Nov. 11, 2013, Athens, GA, USA) with the default parameters [66]. Then, the syntenic relationships of *BnaFBAs*, *BraFBAs*, *BolFBAs,* and *AtFBAs* were determined according to the genomic collinearity between pairwise genomes. The syntenic map was illustrated by CIRCOS (version 0.69-9, BCGSC, Vancouver, BC, Canada) software [67,68]. The duplicated gene pairs and the modes of gene duplication were identified among *FBA* genes using DupGen\_finder (https://github.com/qiao-xin/DupGen\_finder) in *B. napus*, *B. rapa*, *B. oleracea*, and *A. thaliana* [69].

#### *4.6. Variations Analysis and Principal Component Analysis*

The publically available genome resequencing datasets of 991 *B. napus* germplasm accessions were downloaded from the National Center of Biotechnology Information (NCBI) under SRP155312 [39]. The sequencing reads for each accession were mapped to *B. napus* var. Darmor-*bzh* reference genome using the MEM algorithm of Burrows–Wheeler Aligner (version 0.7.17, Hinxton, Cambridge, UK) [70]. The mapping results were sorted using SAMTOOLS (version 1.1, Hinxton, Cambridge, UK) [71]. The duplicated reads were marked with PICARD (version 2.0.1, Broad Institute, Cambridge, MA, USA) [72]. Variations, including SNPs and InDels, were called using the HaplotypeCaller module in GATK (version 4.1.3.0, https://github.com/broadinstitute/gatk/) for each accession. SNP and InDels annotation was performed using ANNOVAR (version 2018Apr16, Philadelphia, PA, USA) software based on the annotation of *B. napus* var. Darmor-*bzh* genome [73]. Principal component analysis (PCA) was performed using the R software with the ggbiplot package (https://github.com/vqv/ggbiplot).

#### *4.7. Expression Analysis of BnaFBA Genes*

The publically available RNA-seq dataset of 12 different tissues (sepal, pistil, stamen, ovule, pericarp, blossomy pistil, wilting pistil, root, flower, leaf, silique wall, and stem) collected from different stages of the growth (BioProject ID: PRJNA394926) [36], RNA-seq dataset of leaf and root tissues under drought stress (BioProject ID: PRJNA256233) [74], RNA-seq dataset of seeds across four phases of the

development (BioProject ID: PRJNA311067) [56], RNA-seq dataset of stems and leaves after *Sclerotinia sclerotiorum* infection (BioProject ID: PRJNA321917) [75], and time-series RNA-seq dataset of roots under a synthetic analog of strigolactones (rac-GR24) treatments (BioProject ID: PRJNA484313) [76] were downloaded from the NCBI SRA database, and further used as main sources to perform gene expression profiling of *BnaFBA* genes in *B. napus*. The transcriptome reads were mapped to *B. napus* var. Darmor-*bzh* reference genome using HISAT2 (version 2.1.0, Baltimore, MD, USA) with the default settings [77]. The read counts per gene were generated by featureCounts [78]. Fragments per kilobase of exon per million fragments mapped (FPKM) was used for the quantification of gene expression. The clustered heatmaps were visualized with expression levels (log2) of *BnaFBA* genes by R software using the pheatmap function package (https://cran.r-project.org/web/packages/pheatmap/).

#### *4.8. Plant Materials and Treatments*

Rapeseed seeds were germinated on a filter paper saturated with distilled water in darkness at 22 ◦C for two days. Then, the seedling plants were transferred to a 4 L hydroponic system containing continuously aerated 1/2 Murashige and Skoog (MS) liquid solution (pH 5.8, without agar and sugar) and grown in an incubator under a photosynthetic flux of 160 μmol photons m−<sup>2</sup> s−<sup>1</sup> and a humidity of about 50% (16 h light at 25 ◦C/8 h darkness at 22 ◦C). The 1/2 MS liquid solution was changed once every two days. After three weeks, the seedlings were transferred to a new 1/2 MS liquid solution (pH 5.8, without agar and sugar) for different stress treatments. For salt, heat, and drought stress treatments, seedlings were exposed to 1/2 MS solution (pH 5.8, without agar and sugar) containing 250-mM NaCl, 40 ◦C conditions, and 20% (*w*/*v*) polyethylene glycol (PEG), respectively. Seedlings exposed to 1/2 MS solution at 22 ◦C were used as controls. Leaf samples were collected 3 days after each treatment. All collected samples were immediately frozen in liquid nitrogen and stored at −70 ◦C for further analysis.

#### *4.9. RNA Isolation and Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR) Analysis*

Total RNAs were extracted from each sample using an RNA extraction kit (Takara, Dalian, China) following the manufacturer's procedure. Two micrograms of total RNA was used to synthesize the first-strand cDNA using the Prime Script RT reagent Kit (Takara, Dalian, China) according to the manufacturer's protocol. Real-time quantitative PCR was performed using 2 mL of cDNA in a 20 mL reaction volume using an SYBR Green PCR kit (GeneCopoeia Inc., Rockville, MD, USA) with ViiATM 7 Dx platform (ABI, Los Angeles, CA, USA). The qRT-PCR reaction condition was as follows: 95 ◦C for 5 min, 40 cycles at 95 ◦C for 30 s, 55 ◦C for 30 s, and 72 ◦C for 30 s. Gene-specific primers were designed and listed in Supplementary Materials (Table S6). The relative expression levels of these genes were analyzed by the 2−*C*<sup>t</sup> method. The *BnaTMA7* gene (*BnaC05g11560D*), which exhibited stable expression in different/same tissues under various experimental conditions, was used as an internal control [79]. All qRT-PCR reactions were assayed in triplicates.

#### *4.10. Pearson Correlation Analysis*

On the basis of the RNA-seq results, the Pearson correlation coefficients (PCCs) and *p*-value of the expression levels of *BnaFBA* gene pairs were calculated using R software with cor and cor.test function packages, respectively. The correlation heatmap was generated by R software with the corrplot function package. The gene co-regulatory networks were constructed by Cytoscape (version 3.1, Seattle, WA, USA) based on the PCCs of *BnaFBA* gene pairs with a *p*-value ≤ 0.05 [80].

#### **Supplementary Materials:** Supplementary materials can be found at http://www.mdpi.com/1422-0067/20/22/ 5749/s1.

**Author Contributions:** J.L. (Jing Liu) and W.Z. designed the research and wrote the article. W.Z. and H.L. collected data and performed most of the data analysis. Z.H. and L.Z. performed the experiments. W.H. provided the research facility. J.L. (Jun Liu) participated in the revision of the manuscript. S.X. coordinated the study. All authors read and approved the final manuscript.

**Funding:** This research was supported by the National Key Basic Research Program of China (2015CB150200), the National Natural Science Foundation of China (Grant No. U1304303 and 31500237), the Ministry of Science and Technology of China (2016YFD0100500), and the Agricultural Science and Technology Innovation Project of Chinese Academy of Agricultural Sciences (CAAS-ASTIP-OCRI).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
