*2.1. Identification of Trihelix Genes in Rice*

A search with the HMM (Hidden Markov Model) profile of the Myb/SANT-LIKE domain identified 117 candidate genes, which were then used to build a rice-specific Myb/SANT-LIKE domain profile. An HMM profile search of the whole rice genome with this rice-specific profile found 79 new candidate genes. Only genes with an E-value < 0.01 were classified as members of the trihelix family. Putative genes were checked against the Pfam and InterPro databases to confirm the presence of a complete Myb/SANT-LIKE domain. In total, 41 trihelix genes were identified.
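The E-value filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in this study: it assumes the search hits are available in a HMMER-style `--tblout` table (whitespace-separated, comment lines starting with `#`, full-sequence E-value in the fifth column), and the function name `filter_hits` and the gene identifiers in the example are hypothetical.

```python
def filter_hits(tblout_text, max_evalue=0.01):
    """Return {target_name: best_evalue} for hits below the E-value cutoff.

    Assumes a HMMER-style tabular layout: the first column is the target
    name and the fifth column is the full-sequence E-value.
    """
    hits = {}
    for line in tblout_text.splitlines():
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            # Keep the smallest E-value if a target appears more than once.
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Made-up example table; identifiers and scores are illustrative only.
example = """\
# target  accession  query  accession  E-value  score  bias
geneA - MSL_domain - 1.2e-25 88.1 0.3
geneB - MSL_domain - 0.5 10.2 0.0
geneC - MSL_domain - 3.0e-04 20.5 0.1
"""

kept = filter_hits(example)
print(sorted(kept))
```

Candidates that pass this cutoff would still need the domain check against Pfam and InterPro before being counted as family members.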

All trihelix genes were mapped onto the rice chromosomes and named *OsMSL01*-*OsMSL41* according to their order on the chromosomes, where "MSL" stands for "Myb/SANT-LIKE". *OsMSL25* and *OsMSL34* have two alternatively spliced transcripts. The characteristics of the OsMSLs, including the gene MSU\_Locus ID, chromosomal location, lengths of the CDS (coding sequence) and amino acid sequence, number of exons, protein size, and isoelectric point, are summarized in Table 1. OsMSL19 was the smallest protein with 266 amino acids, whereas OsMSL12 was the largest with 882 amino acids. The protein MW (Molecular Weight) ranged from 28.62 kDa to 97.37 kDa. The predicted isoelectric points varied from 4.45 (OsMSL09) to 11.38 (OsMSL17). Twenty-nine of the trihelix transcription factors were localized in the nucleus, ten in the chloroplast, and two in the peroxisome (OsMSL04 and OsMSL29).
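The naming scheme by chromosomal order can be illustrated with a short Python sketch. It assumes each gene is described by a locus identifier, a chromosome number, and a start coordinate, and that genes are numbered by ascending chromosome and then ascending start position; the exact ordering rule, the function name `name_by_position`, and the loci in the example are assumptions made for illustration.

```python
def name_by_position(loci, prefix="OsMSL"):
    """Assign sequential names to loci sorted by (chromosome, start).

    `loci` is a list of (locus_id, chromosome, start) tuples.
    Returns a dict mapping locus_id to a zero-padded name such as OsMSL01.
    """
    ordered = sorted(loci, key=lambda rec: (rec[1], rec[2]))
    # Pad to at least two digits, matching the OsMSL01-OsMSL41 style.
    width = max(2, len(str(len(ordered))))
    return {
        locus: f"{prefix}{i:0{width}d}"
        for i, (locus, _chrom, _start) in enumerate(ordered, start=1)
    }


# Hypothetical loci: (identifier, chromosome, start coordinate).
loci = [("locusC", 3, 500), ("locusA", 1, 900), ("locusB", 1, 100)]
names = name_by_position(loci)
print(names)
```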



**Table 1.** Detailed information of all trihelix family genes identified in the rice genome.
