**3. Discussion**

The trihelix DNA-binding domain is usually associated with that of the Myb/SANT-LIKE proteins. There are three strongly conserved, regularly spaced tryptophan (W) residues in each repeating Myb α-helix. The residues of the Myb α-helix regions are also strongly conserved between the GT factors and the Myb/SANT-LIKE proteins, so the trihelix domains of the GT factors and the Myb/SANT-LIKE proteins are related [20]. However, the individual helices of the trihelix domain are longer than those of the helix-turn-helix structure formed by the Myb-type DNA-binding domain. Therefore, the trihelix family genes have functions and target gene sequences differing from those of the MYB transcription factor family [20]. In the present study, 41 rice trihelix genes were identified using the Myb-type (Myb/SANT-LIKE) DNA-binding domain, whereas a previous study identified only 31 [21]. Because a repeated search method was used in the present study, the trihelix genes in the rice genome could be identified more comprehensively. Based on the wide ranges of *OsMSL* protein molecular weight, isoelectric point, and subcellular localization, we speculate that the *OsMSL*s have not been evolutionarily conserved. In addition, the chimeric trihelix gene At4g17060 was misannotated in the *Arabidopsis* genome [22]. Its ortholog, LOC\_Os10g41460 (*OsMSL36*), was identified in this study.

In general, gene families expand by tandem and segmental duplications [23]. Evolutionary conservation increases with the number of duplicated genes in a gene family [16]. Chromosomal distribution and gene duplication analyses indicated that there are six pairs of duplicated genes among the 41 rice trihelix genes, comprising two pairs of tandem duplicates and four pairs of segmental duplicates. In *Arabidopsis*, by contrast, fifteen pairs of duplicated trihelix genes were previously detected among 34 trihelix genes [24]. These results suggest that the *OsMSL*s are less conserved and that most of them may not originate from the same ancestor. They also suggest that the rice trihelix family has undergone a high degree of evolutionary divergence. These properties may account for the substantial differences among the rice trihelix proteins.
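The comparison of duplication frequency between the two species can be made explicit with a short calculation. The sketch below uses only the counts stated above (6 pairs among 41 rice genes, 15 pairs among 34 *Arabidopsis* genes); the function name `duplication_summary` is hypothetical, and because the text does not say whether any gene belongs to more than one pair, the second value is only an upper bound on the fraction of genes involved in duplications.

```python
def duplication_summary(n_pairs, n_genes):
    """Summarize duplication frequency for a gene family.

    Returns (pairs per gene, upper bound on the fraction of genes
    that belong to at least one duplicated pair). The upper bound
    assumes no gene is shared between pairs, which is an assumption
    and not something stated in the text.
    """
    pairs_per_gene = n_pairs / n_genes
    max_fraction_duplicated = min(2 * n_pairs, n_genes) / n_genes
    return pairs_per_gene, max_fraction_duplicated


# Counts taken from the paragraph above.
rice = duplication_summary(n_pairs=6, n_genes=41)
arabidopsis = duplication_summary(n_pairs=15, n_genes=34)

print(f"rice:        {rice[0]:.3f} pairs/gene, at most {rice[1]:.1%} of genes duplicated")
print(f"Arabidopsis: {arabidopsis[0]:.3f} pairs/gene, at most {arabidopsis[1]:.1%} of genes duplicated")
```

Under these assumptions, rice has roughly one third as many duplicated pairs per gene as *Arabidopsis*, which is the contrast the paragraph draws on.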

We conducted a phylogenetic analysis to elucidate the evolutionary relationships among the trihelix gene families of rice and other species. In a previous study, no GTδ was designated for rice or *Arabidopsis* [18] because the *Arabidopsis* Myb/SANT-LIKE DNA-binding domain or protein sequences were aligned with the rice genome to identify potential rice trihelix genes. However, there is a relatively long evolutionary distance between rice and *Arabidopsis*. Therefore, this method may overlook trihelix genes that are specific to rice, and the influence of the *Arabidopsis* trihelix genes may have led to misclassification. In the present study, a class of rice trihelix genes, together with some tomato trihelix genes previously assigned to this subfamily, was found in the GTδ clade. Subsequent investigation revealed that this subfamily has high structural similarity. For this reason, the evolutionary relationships of its members are more conserved than those in other subgroups. In previous studies, the GT clade was divided into the GT1 and GT2 subfamilies [13,14,24]. According to our evolutionary analysis, there are two GT subfamily clusters (Figure 3A). The difference between GT1 and GT2 is smaller than those among SIP1, SH4, GTγ, and GTδ. The genes in GT1 each have one trihelix DNA-binding domain with three conserved tryptophans. Those in GT2 each have an additional trihelix DNA-binding domain with two tryptophans and one phenylalanine [25]. Consequently, we classified both the GT1 and GT2 clades in the GT subfamily.

Although the evolution of the trihelix family was not conservative, our gene structure and conserved functional domain analyses indicated that the genes within the same subfamily (especially SIP1 and GTδ) were still relatively conserved. In fact, most duplicated genes occurred in GTδ. The MEME analysis revealed that the functional domain distribution of each *OsMSL* was related to its classification (Figure 4). Therefore, these conserved functional domains may play central roles in group-specific functions. In contrast, the gene structures and conserved functional domains differed greatly among the various groups. For this reason, the groups may have different downstream regulatory genes and participate in different signaling pathways.

The distribution and type of *cis*-elements in a gene promoter may determine *OsMSL* functions. In this study, among the large number of *cis*-elements found in the promoters of the 41 *OsMSL*s, we identified five that are shared by all genes. The results suggested that *OsMSL*s are mainly involved in abiotic stress and light-induced responses. To date, little functional analysis has been conducted on the trihelix transcription factors in plants. Previous studies showed that they participate in responses to pathogens and abiotic stress, light induction, and nitrogen metabolism [18]. Light induction is a major feature of the trihelix genes. Light induces massive reprogramming of the plant transcriptome and upregulates or downregulates gene expression and the corresponding signaling pathways [26]. Light signaling coordinates the induction or repression of specific downstream genes such as bHLH [27], bZIP [28], R2R3-MYB [29], FAR1 [30], and FHY3 [31]. Studies on light regulatory mechanisms in plants have focused on the long-term effects of light exposure, whereas little attention has been paid to the transient light-responsive processes of transcription factors in plant stress reactions. In the present study, the *OsMSL* expression profiles showed that their responses to light are transient and change with treatment duration. Therefore, *OsMSL*s may be regulated by light in response to abiotic stress in a manner similar to phototropism, chloroplast movement, and stomatal opening, which are rapid light-responsive processes that are not under extensive transcriptional regulation. This mechanism differs substantially from the gene expression changes that occur in response to the long-term effects of light on photoperiod.

Gene expression specificity in plant tissues and developmental stages may indicate possible gene functions. Previous studies showed that certain trihelix genes in tomato and chrysanthemum exhibited stable expression levels in all tissues [8,14]. However, most *OsMSL*s do not maintain stable expression levels across tissues, and their expression may vary substantially. For example, the expression levels of *OsMSL16*, *OsMSL27*, *OsMSL28*, *OsMSL35*, and *OsMSL39* were extremely high in the leaves and sheaths but comparatively low in the roots and stems. Subcellular localization analysis indicated that the products of these genes localize to the chloroplasts, which, among the tissues examined, are present only in the leaves and sheaths. The expression patterns of *OsMSL16*, *OsMSL27*, *OsMSL28*, *OsMSL35*, and *OsMSL39* in GTδ were similar. Therefore, different *OsMSL*s within the same group may have parallel functions in the rice trihelix family. *OsMSL02*, *OsMSL08*, and *OsMSL14* in Group II were expressed at low levels in all four tissues. Either they are inducible or they are upregulated only under special conditions [32].

High *OsMSL* expression was observed mainly in 7–40-day-old seedlings (from germination to early tillering). This suggests that *OsMSL*s contribute primarily to the early stages of rice growth, as do many other genes. For example, *ZFP182* overexpression enhanced salt, drought, and cold tolerance in transgenic rice seedlings [33]. Loss of the ABA transporter *OsPM1* in 35-day-old rice seedlings conferred greater drought sensitivity than that seen in the WT [34]. The expression patterns of *OsMSL34a* and *OsMSL34b* showed that they are, in fact, alternative splicing forms of the same gene with very different expression levels. *OsMSL34b* may be the primary splicing form of the gene, which would explain why *OsMSL34a* was downregulated to some extent.

The previous study on rice trihelix genes highlighted their responses to various plant hormones [21], whereas the present study focused on abiotic stress and stress signaling molecules. In addition to plant growth and development, MSLs participate in stress responses [18]. Although no *cis*-elements related to the ABA signaling pathway were found in the *OsMSL* promoter regions, ABA nonetheless induced the rice trihelix genes. ABA accumulates when plants are subjected to a water deficit. It regulates the expression of drought stress-related genes and modulates the molecular, cellular, and physiological mechanisms for adaptation to environmental stress [35]. In chrysanthemum, ABA downregulated some trihelix genes, whereas others were upregulated after prolonged ABA exposure [14]. In contrast, *GmGT-2A* and *GmGT-2B* in soybean were upregulated by ABA [11]. In the present study, twelve *OsMSL*s from all subfamilies were induced by ABA. These results suggest that the signaling mechanisms of the trihelix family genes vary with species. Whether ABA negatively regulates any trihelix genes requires further experimental verification.

Reactive oxygen species (ROS) are produced in response to most environmental stresses. Excessive ROS accumulation may irreversibly damage cells [36,37]. In previous studies, trihelix family genes showed at least one response during plant osmotic stress defense [18]. Trihelix family genes may therefore also participate in ROS scavenging and enhance plant tolerance to various stresses. However, little is known about the mechanism of the peroxide response mediated by the trihelix family. We performed a quantitative PCR analysis on 12 *OsMSL*s in plants subjected to hydrogen peroxide. All 12 *OsMSL*s responded to hydrogen peroxide stress. Therefore, they may help improve osmotic stress tolerance by increasing the ROS scavenging capacity in rice.

The trihelix transcription factors bind to GT elements in light-regulated genes [25]. In darkness, photoregulatory genes are repressed and their associated trihelix family genes are also affected. Nevertheless, certain trihelix family genes are downregulated in response to light exposure, apparently because they must be repressed in order to downregulate targets whose expression is light-dependent. Certain constitutively expressed trihelix genes occur in *Arabidopsis*. Their expression is ubiquitous and independent of the light regime, and they are concentrated mainly in GT1 and GT2 [38]. In the present study, however, the expression of the 12 rice trihelix genes changed direction at least once after light or dark treatment. Therefore, they may be inducible rather than constitutive. It remains to be determined whether *OsMSL*s are regulated by one wavelength or by a wide light spectrum. Further investigation of the light signaling pathway and its components is necessary. The results of our study provide a starting point for research on the rice trihelix transcription factors. In future research, the relationships among the *OsMSL*s, ABA-mediated dehydration stress tolerance, and ROS scavenging ability under light regulation should be explored.
