**2. Results**

#### *2.1. Identification and Chromosomal Distribution of BrBGLUs*

After a hidden Markov model (HMM) search, 64 *BrBGLU* genes were identified and designated *BrBGLU1* to *BrBGLU64* according to their positions on the chromosomes (Figure 1). The locus ID, genome location, coding sequence (CDS) length, and protein length of the *BrBGLUs* are listed in Table 1. The genomic DNA sequences of the *BrBGLU*s ranged from 390 bp to 9617 bp, and the CDS lengths ranged from 267 bp to 2058 bp, with an average length of 1293 bp. The *BrBGLU* genes were unevenly distributed among all 10 chromosomes of *B. rapa*. Chromosome 5 contained the largest number of *BrBGLU* genes, with 15 members (23.4%), whereas chromosome 2 contained only one member. We also detected tandem arrays of *BrBGLU* genes among the 10 *B. rapa* chromosomes. A tandem array was defined as 'multiple *BrBGLU* genes located in neighboring or the same intergenic region' [30]. Ten *BrBGLU* gene clusters were found on chromosomes A01, A03, A05, A07, and A09. Chromosome 5 contained the maximum number of clusters, comprising 11 *BrBGLU*s.
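The per-chromosome shares and the tandem-array definition above can be illustrated with a minimal sketch. It assumes each gene has an integer rank within the ordered list of all annotated genes on its chromosome, and it treats two family members as belonging to one array when at most `max_gap` other genes lie between them. The `max_gap` threshold, the function names, and the toy ranks are illustrative assumptions, not values or methods taken from this study.

```python
from collections import Counter, defaultdict


def chromosome_shares(gene_to_chrom):
    """Return {chromosome: (gene count, percent of all genes)}."""
    counts = Counter(gene_to_chrom.values())
    total = sum(counts.values())
    return {c: (n, round(100.0 * n / total, 1)) for c, n in counts.items()}


def tandem_arrays(gene_ranks, max_gap=1):
    """Group family members into tandem arrays.

    gene_ranks: {gene: (chromosome, rank)}, where rank is the gene's
    position in the ordered list of ALL annotated genes on that chromosome.
    max_gap: assumed maximum number of non-family genes allowed between
    two neighbouring members of the same array (hypothetical threshold).
    """
    by_chrom = defaultdict(list)
    for gene, (chrom, rank) in gene_ranks.items():
        by_chrom[chrom].append((rank, gene))

    arrays = []
    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom])
        current = [members[0][1]]
        for (prev_rank, _), (rank, gene) in zip(members, members[1:]):
            if rank - prev_rank - 1 <= max_gap:
                current.append(gene)  # close enough: extend the array
            else:
                if len(current) > 1:
                    arrays.append((chrom, current))
                current = [gene]  # start a new candidate array
        if len(current) > 1:
            arrays.append((chrom, current))
    return arrays


# Hypothetical ranks for illustration only (not taken from Table 1).
toy = {
    "geneA": ("A01", 10),
    "geneB": ("A01", 11),
    "geneC": ("A01", 40),
    "geneD": ("A05", 7),
    "geneE": ("A05", 9),
    "geneF": ("A05", 10),
}
shares = chromosome_shares({g: c for g, (c, _) in toy.items()})
arrays = tandem_arrays(toy)
```

In the toy data, `geneA`/`geneB` and `geneD`/`geneE`/`geneF` form arrays, while `geneC` is a singleton. The same percentage arithmetic reproduces the share reported above: 15 of 64 genes is 23.4%.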





**Figure 1.** Chromosomal distribution of the 64 *BrBGLU* genes identified in this study. The chromosome number is indicated above each chromosome. Ten clusters of *BrBGLUs* are indicated in red boxes. Black ovals on each chromosome represent the centromeric regions.

#### *2.2. Phylogenetic and Gene Structure Analysis of BrBGLUs*

To understand the evolutionary relationships of the *BrBGLU* genes, a phylogenetic analysis of the *BrBGLU* and *AtBGLU* genes was conducted. To obtain the *AtBGLU*s, an HMM search was performed using all of the putative protein sequences of the *Arabidopsis* genome (ARAPORT11, https://www.araport.org) as queries. A total of 48 *AtBGLU* genes were obtained, which agrees with the results of a previous study [6]. The 64 BrBGLU and 48 AtBGLU protein sequences were aligned using ClustalX2 [31]. An unrooted phylogenetic tree was constructed for the 64 BrBGLUs and 48 AtBGLUs using the neighbor-joining (NJ) method in MEGA6 with a Poisson model. All BGLU proteins were classified into 10 distinct subgroups, namely, BGLU-a to BGLU-j (Figure 2). The results of the phylogenetic analysis were broadly similar to the findings of a previous study of *Arabidopsis* [6], with a few exceptions. All *B. rapa* and *Arabidopsis* proteins were grouped into 10 subgroups, whereas *Arabidopsis* subgroups 8 and 9 were combined into a single subgroup, GH1-c, in our analysis. In addition, AtBGLU48 (SENSITIVE TO FREEZING 2, SFR2), which belonged to a lineage distinct from the 10 subgroups in previous studies [6,32], was incorporated into the GH1-j subgroup, together with BrBGLU8 and BrBGLU42 (Figure 2).
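As an illustration of the distance measure underlying an NJ tree built with a Poisson model, the sketch below computes the Poisson-corrected distance d = -ln(1 - p), where p is the proportion of differing residues between two aligned protein sequences. It is not MEGA6's implementation: the pairwise removal of gapped columns, the function names, and the toy alignment are assumptions made for illustration, and the tree-building step itself is omitted.

```python
import math


def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing residues over columns without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for a, b in zip(seq1, seq2):
        if a == gap or b == gap:
            continue  # assumed pairwise deletion of gapped columns
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable columns")
    return differing / compared


def poisson_distance(seq1, seq2):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq1, seq2)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


def distance_matrix(aligned):
    """Return {(name_x, name_y): distance} for every ordered pair."""
    names = sorted(aligned)
    return {
        (x, y): poisson_distance(aligned[x], aligned[y])
        for x in names
        for y in names
    }


# Hypothetical aligned fragments for illustration only.
toy_alignment = {
    "seq1": "MKLVA-GT",
    "seq2": "MKLIA-GT",
    "seq3": "MRLIASGS",
}
matrix = distance_matrix(toy_alignment)
```

A matrix of this kind is the input that a neighbor-joining algorithm would cluster; the correction matters most for divergent pairs, where the raw proportion p underestimates the number of substitutions per site.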

The phylogenetic analysis showed that the grouping of genes was related to their chromosomal locus or function. Based on the functions of the *AtBGLUs*, flavonol accumulation (*AtBGLU1*-*6*) and anthocyanin glucosyltransferase (*AtBGLU7*-*11*)-related genes were represented by subgroup GH1-a [10–12]; flavonoid utilization-related *AtBGLUs* (*AtBGLU12*-*17*) were represented by the GH1-e subgroup [10,13]; myrosinase-encoding *AtBGLUs* (*AtBGLU34*-*39*) belonged to the GH1-d subgroup [14–16]; and scopolin hydrolysis-related *AtBGLUs* (*AtBGLU21*-*23*) were grouped into GH1-i [21,22]. Most of the genes within the same cluster on a chromosome, namely *BrBGLU5*/*6*, *BrBGLU11*/*12*, *BrBGLU31*/*32*/*33*, *BrBGLU40*/*41*, *BrBGLU58*/*59*, and *BrBGLU61*/*62*, were grouped into the same subfamily, which is similar to the findings in *Arabidopsis*. This clustering indicates that the *BGLU* genes may have evolved from an ancestral gene via gene duplication. However, *BrBGLU51* was grouped with six *AtBGLUs* (*AtBGLU34*/*35*/*36*/*37*/*38*/*39*) in the GH1-d subgroup, indicating the possible loss of some *BGLU* genes in *B. rapa*.

**Figure 2.** Phylogenetic reconstruction of GH1 genes of *Arabidopsis* and *Brassica rapa*. Multiple sequence alignment of GH1 proteins was performed using ClustalX2 with default parameters. The unrooted phylogenetic tree was constructed in MEGA6 with the neighbor-joining (NJ) method using the following parameters: bootstrap values (1,000 replicates) and Poisson model. The tree is divided into 11 phylogenetic subgroups, designated as GH1-a to GH1-k. Members of *Arabidopsis* and *B. rapa* are denoted by blue squares and red circles, respectively.

Gene structure commonly diversifies during the evolution of large gene families. To expand our knowledge of the BrBGLUs in relation to evolution and functional diversification, the gene structures of the *BrBGLU*s were analyzed on the basis of exon–intron organization, using GSDS 2.0 [33]. The *BrBGLU*s exhibited 12 distinct exon–intron organization patterns, and the most common organization was 11 exons separated by 10 introns, found in 19 members (Table 1 and Figure 3). Most genes contained more than two introns, except for *BrBGLU46* and *BrBGLU55*, indicating the possible occurrence of alternative splicing during gene expression. The *AtBGLU*s exhibited 10 distinct exon–intron organization patterns, and the pattern with 13 exons was the most common [6]. This result is consistent with findings in *Arabidopsis* and rice, where the intron size and number of the *BGLU* genes are highly variable [5,6].
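The tally of exon–intron organization patterns described above can be sketched as follows. The example assumes each gene model is available as a list of exon intervals with 1-based, inclusive coordinates; it is not how GSDS 2.0 works internally, and the function names and toy gene models are hypothetical.

```python
from collections import Counter


def exon_intron_counts(exon_coords):
    """Return (n_exons, n_introns) for one gene model.

    exon_coords: list of (start, end) exon intervals on the genome.
    One intron is counted between each pair of consecutive exons.
    """
    exons = sorted(exon_coords)
    return len(exons), max(len(exons) - 1, 0)


def intron_lengths(exon_coords):
    """Lengths of the gaps between consecutive exons (inclusive coords)."""
    exons = sorted(exon_coords)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


def pattern_tally(gene_models):
    """Count how many genes share each (n_exons, n_introns) pattern."""
    return Counter(exon_intron_counts(e) for e in gene_models.values())


# Hypothetical gene models for illustration only.
toy_models = {
    "geneA": [(1, 100), (201, 300), (451, 500)],
    "geneB": [(1, 90), (150, 260), (400, 480)],
    "geneC": [(1, 267)],
}
tally = pattern_tally(toy_models)
most_common_pattern, n_genes = tally.most_common(1)[0]
```

Applied to a full annotation, the number of keys in `tally` would correspond to the number of distinct organization patterns, and `most_common_pattern` to the dominant one.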

**Figure 3.** Exon–intron organization of *BrBGLUs* in different subgroups. Exons and introns are represented by blue boxes and black lines, respectively. The phylogenetic tree of each subfamily was constructed using MEGA6, as described in Figure 2.
