**3. Results**

### *3.1. wbm Located on Chromosome 7A*

We aimed to gain insight into the genomic organization of *wbm* in wheat. To this end, we identified the genomic location of *wbm* using the corresponding EST and protein sequences [11]. We did not find any genomic sequence with 100% identity to the *wbm* sequence in the wheat Chinese Spring (CS) reference genome. The most significant hits were located on chromosomes 7A and 7D (81% identity and 100% coverage, E-value 3 × 10<sup>−59</sup> for both hits). The hit on chromosome 7A overlaps a gene (TraesCS7A02G531903, *wbm-like* hereafter), whereas the hit on chromosome 7D (*wbm\_hit\_7D*) partially overlaps the 5′ end of an annotated non-coding RNA (STRG\_Seed.132206.1). We also performed a BLAST search with the Tag-A sequence (CATGTTGTTCCGTGTAGTACC), which was used for *wbm* identification [11]. This sequence showed near-perfect similarity (only one mismatch) with the downstream region of *wbm-like*. To obtain additional evidence for the location of *wbm* on chromosome 7A, we used data from the recently released wheat 10+ genomes project (http://www.10wheatgenomes.com/). Using *wbm-like*, *wbm*, and their promoter sequences, as well as the *wbm\_hit\_7D* sequence, as queries, we performed a similarity search against the de novo genome assemblies of 10 wheat cultivars. One cultivar, Mace, showed 100% identity with *wbm* and its promoter sequence but not with *wbm-like*, indicating that this cultivar carries *wbm* and lacks *wbm-like*. The similarity search for *wbm\_hit\_7D* in Mace returned a hit with 100% identity located on a different chromosome from the *wbm* hit, further supporting that *wbm\_hit\_7D* is less closely related to *wbm* than *wbm-like* is. Thus, the results show that *wbm* and *wbm-like* are orthologous and located on the long arm of chromosome 7A.
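The near-perfect match of Tag-A described above can be illustrated with a minimal sketch. It is not the BLAST search used in the study: it scans a target sequence on both strands and reports windows that differ from the tag by at most a given number of substitutions. The function names and the toy target region are assumptions made for illustration; only the Tag-A sequence is taken from the text.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_with_mismatches(tag: str, target: str, max_mismatches: int = 1):
    """Return (start, strand, mismatches) for every window of `target` that
    differs from `tag` (or its reverse complement) at no more than
    `max_mismatches` positions. Substitutions only; gaps are not modelled."""
    tag, target = tag.upper(), target.upper()
    hits = []
    for strand, query in (("+", tag), ("-", reverse_complement(tag))):
        for start in range(len(target) - len(query) + 1):
            window = target[start:start + len(query)]
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits


TAG_A = "CATGTTGTTCCGTGTAGTACC"  # Tag-A sequence quoted in the text

# Hypothetical target: Tag-A carrying one substitution, padded with
# arbitrary flanking bases. This is NOT the real wbm-like downstream region.
mutated_tag = TAG_A[:18] + "T" + TAG_A[19:]
toy_region = "GGGG" + mutated_tag + "AAAA"

hits = find_with_mismatches(TAG_A, toy_region, max_mismatches=1)
```

On real data, a dedicated aligner such as BLAST remains the appropriate tool, since it also handles gaps and reports alignment statistics.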

### *3.2. Comparative Analysis of wbm and wbm-Like*

To trace the origin of *wbm* and *wbm-like* (cultivar CS), we performed multiple alignment of their nucleotide and protein sequences with sequences from *Triticum urartu* (A genome donor), *Aegilops speltoides* (B genome donor), *A. tauschii* (D genome donor), *T. monococcum*, and *Secale cereale*, followed by phylogenetic tree construction. *wbm\_hit\_7D*, located on chromosome 7D, and sequences from two additional cultivars (Mace, which has *wbm*, and Julius, which has *wbm-like*) were also included in the multiple sequence alignment. Comparison of the nucleotide sequences resulted in a distinct cluster formed by the reference and Mace *wbm* genes (Cluster 1) and three clusters formed by *S. cereale* (Cluster 2), *A. speltoides* (Cluster 3), and *wbm-like* sequences from other species (Cluster 4) (Figure 1A,C). We next assessed whether comparison of the translated protein sequences would give a similar phylogenetic picture. Prediction of protein sequences from *wbm\_hit\_7D* and *A. tauschii* revealed a premature termination codon at the same position, indicating that these sequences may not encode proteins. Multiple alignment of all other predicted proteins (Figure 1B) and phylogenetic tree construction resulted in three clusters, formed by *wbm* (Cluster 1), *wbm-like* (Cluster 2), and *wbm*-similar sequences from *S. cereale*, *T. urartu*, *A. speltoides*, and *T. monococcum* (Cluster 3). Thus, both phylogenetic trees show the distinct phylogenetic position of the *wbm* gene. Interestingly, despite the high divergence between sequences from different clusters, the N-terminus, which corresponds to the signal peptide of the *wbm* protein [11], is well conserved.
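The premature termination codon check mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the prediction pipeline used in the study: it translates a coding sequence in frame 1 with the standard genetic code and reports the first stop codon that occurs before the final codon. The function names and both toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(cds: str) -> str:
    """Translate a DNA sequence in frame 1; unknown codons become 'X'.
    Trailing bases that do not form a full codon are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def first_premature_stop(cds: str):
    """Return the 1-based codon index of the first in-frame stop codon that
    precedes the final codon, or None if there is no such stop."""
    protein = translate(cds)
    position = protein[:-1].find("*")  # the terminal codon is excluded
    return position + 1 if position != -1 else None


# Hypothetical toy sequences, not real wbm data.
intact_cds = "ATGGCTTGGTAA"     # M A W stop
truncated_cds = "ATGGCTTGATAA"  # M A stop stop (TGG changed to TGA)

intact_result = first_premature_stop(intact_cds)
truncated_result = first_premature_stop(truncated_cds)
```

Two sequences sharing a premature stop at the same codon index, as reported for *wbm\_hit\_7D* and *A. tauschii*, would return the same value from `first_premature_stop` when aligned in the same frame.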

**Figure 1.** Conservation analysis of *wbm* and *wbm*-*like*. Multiple alignment of *wbm* and *wbm*-*like* (**A**) nucleotide and (**B**) amino acid sequences with similar sequences from *Triticum urartu*, *Aegilops speltoides*, *Triticum monococcum*, *Secale cereale*, and *Aegilops tauschii*. Phylogenetic trees based on multiple alignment of *wbm*(-*like*) (**C**) nucleotide and (**D**) amino acid sequences. Bootstrap values are indicated.

Taken together, the comparative analysis indicated that *wbm* has a distinct origin from the wheat genome donor species, supporting its introgressive origin in wheat. In addition, *wbm*, *wbm-like*, and *wbm*-similar sequences from the analyzed species show signatures of diversifying selection, with high conservation in the N-terminus of the protein.

### *3.3. Screening of Triticale Collection for the Presence of wbm*

The established location of *wbm* and *wbm-like* on chromosome 7A (Figure 2A) made it possible to test for the presence of these genes in triticale, which possesses the A, B, and R genomes but not the D genome. The previously designed PCR marker for *wbm* (NWP, [11]) is dominant, resulting in amplification only in *wbm*-carrying genotypes. Therefore, we designed new *wbm-like*-specific primers for the upstream region (pro primers) and the CDS (*wbm-like* primers) of the gene (Figure 2B). In addition, *wbm*-specific primers for the CDS region of the gene were designed (*wbm* primers, Figure 2B). In this study, we used a triticale germplasm collection consisting of 107 lines, most of which were obtained in Russia (Supplementary Table S1). Using these primer sets, we conducted PCR screening of the collection. Screening with the *wbm*-specific primers (NWP and *wbm*) revealed amplification of a PCR product of the expected size (961 bp for NWP and 988 bp for pro, Figure 2C) in only three triticale lines: L8665, P13-5-13, and P13-5-2. Screening with the *wbm-like*-specific primers showed that the expected PCR products were obtained for all lines except these three (L8665, P13-5-13, and P13-5-2; Figure 2D), which do not possess *wbm-like*. Thus, we identified three triticale lines carrying *wbm* that can be further used in plant breeding programs to improve the bread-making quality of triticale.
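Because each marker is dominant, a missing band alone cannot distinguish absence of the allele from a failed reaction; combining the *wbm*-specific and *wbm-like*-specific results resolves this. The sketch below illustrates that logic under stated assumptions: the function name and the call labels are hypothetical, and the band patterns simply encode the outcome described in the text for the six lines shown in Figure 2.

```python
def call_allele(wbm_band: bool, wbm_like_band: bool) -> str:
    """Combine two dominant presence/absence markers into one allele call."""
    if wbm_band and not wbm_like_band:
        return "wbm"
    if wbm_like_band and not wbm_band:
        return "wbm-like"
    if wbm_band and wbm_like_band:
        return "ambiguous (both bands)"
    return "no call (no band with either marker)"


# (wbm band, wbm-like band) per line, encoding the outcome reported in the
# text for the six lines shown in Figure 2.
bands = {
    "L8665": (True, False),
    "131/7": (False, True),
    "P13-5-13": (True, False),
    "C 235": (False, True),
    "P13-5-2": (True, False),
    "Yarilo": (False, True),
}

calls = {line: call_allele(*pattern) for line, pattern in bands.items()}
wbm_lines = sorted(line for line, call in calls.items() if call == "wbm")
```

A line returning "no call" would need the PCR repeated, since neither dominant marker gave a product.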

### *3.4. Genomic Region of wbm-Like Lacks Conservation with wbm and Is Surrounded by Transposon Insertions*

To assess the conservation of the genomic regions near *wbm* and *wbm-like*, the promoter sequences (~1 kb upstream region) of the genes were sequenced. Comparison of the obtained sequences demonstrated only partial similarity in a 67 bp region upstream of *wbm* and *wbm-like*. Analysis of the *wbm-like* genomic locus in the CS genome assembly showed that the gene is surrounded by two transposable element (TE) insertions, neither of which was observed in the promoter of *wbm*. The TE located downstream (ca. 350 bp) belongs to the Mutator family of DNA transposons, whereas the TE located ca. 270 bp upstream of *wbm-like* (in the promoter) is an L1-like retrotransposon (L1). To verify that the presence of the L1 element is unique to lines carrying *wbm-like*, we designed a primer pair (*wbm*\_L1) flanking the L1 insertion. Screening of the triticale collection with this primer pair showed that a PCR product of the expected size was obtained only in lines possessing *wbm-like*; no amplification was observed with DNA of lines possessing *wbm* (Figure 2D). Comparison of the *wbm-like* promoter with the *T. urartu* genome identified a highly similar sequence that also contains a LINE1 insertion.

**Figure 2.** Genomic organization of *wbm*-*like* and *wbm* in the genome of Chinese Spring. (**A**) Schematic representation of chromosome 7A with the position of the *wbm*(-*like*) gene marked. (**B**) Schematic organization of *wbm* and *wbm*-*like* and their promoter regions. The blue box depicts regions with high similarity (81% to 84%) and the red box depicts the long interspersed nuclear element (LINE1) retroelement insertion. Horizontal lines show the positions of primers. Gel electrophoresis of PCR products obtained with DNA of triticale lines (6 lines are shown) and primers specifically designed for (**C**) the *wbm* gene (NWP (Furtado et al. [11]), *wbm* (specific for the *wbm* open reading frame (ORF))) and (**D**) the *wbm*-*like* gene (pro (promoter region of *wbm*-*like*), L1 (flanking the LINE1 insertion in the promoter region of *wbm*-*like*), and *wbm*-*like* (specific for the *wbm*-*like* ORF)). Lanes 1–6 of the gels correspond to the 3 *wbm*-positive lines (lane 1: L8665, lane 3: P13-5-13, and lane 5: P13-5-2) and 3 randomly selected *wbm*-*like*-positive lines (lane 2: 131/7, lane 4: C 235, and lane 6: Yarilo).

Insertion of transposable elements near genes often results in silencing of gene expression. The expression of *wbm-like* in the CS cultivar was almost undetectable (wheat-expression.com).

Thus, the results show that the upstream (promoter) regions of the *wbm* and *wbm-like* genes differ substantially, supporting the introgressive origin of the *wbm* gene. Further studies are needed to establish the borders and the size of the introgression.
