**Molecular Research for Cereal Grain Quality**

Editors

**Jinsong Bao Jianhong Xu**

Basel • Beijing • Wuhan • Barcelona • Belgrade • Novi Sad • Cluj • Manchester

*Editors* Jinsong Bao Hainan Institute Zhejiang University Sanya China

Jianhong Xu Hainan Institute Zhejiang University Sanya China

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *International Journal of Molecular Sciences* (ISSN 1422-0067) (available at: www.mdpi.com/journal/ijms/special_issues/Cereal_Seed_Quality).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

Lastname, A.A.; Lastname, B.B. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-9055-4 (Hbk) ISBN 978-3-0365-9054-7 (PDF) doi.org/10.3390/books978-3-0365-9054-7**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license. The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND) license.

## **Contents**


Reprinted from: *Int. J. Mol. Sci.* **2022**, *23*, 11349, doi:10.3390/ijms231911349


## **About the Editors**

### **Jinsong Bao**

Dr. Jinsong Bao is currently a professor in the College of Agriculture and Biotechnology at Zhejiang University, China. He received his B.S. (1993) and M.S. (1996) degrees in horticulture from Zhejiang Agricultural University, and his Ph.D. degree in biophysics from Zhejiang University in 1999. His research interests focus on molecular genetics of rice quality, more specifically in the areas of starch chemistry and quality, nutritional quality, and molecular breeding. He has edited two books, including *Rice: Chemistry and Technology* (4th edition), and published more than 150 peer-reviewed research articles and 17 book chapters in these areas. He has received two professional awards for his achievements in the genetic study and molecular improvement of rice quality from the Zhejiang Provincial Government. He has also been awarded the "Young Scientist Research Award" from AACCI and the Zhejiang Young Science and Technology Award from the Zhejiang Association of Science and Technology. He has traveled and worked extensively in Hong Kong and USA. He teaches undergraduate and graduate courses in mutational genetics and developmental biology in plants. He is a member of the Editorial Boards of the *International Journal of Molecular Sciences*, the *Journal of Cereal Science*, *Cereal Chemistry*, *Starch*, *Genes & Genomics*, *Rice Science*, *Frontiers in Nutrition* and *Applied Science*.

### **Jianhong Xu**

Dr. Jian-Hong Xu is a professor at Zhejiang University. In 2001, he won a Japanese government scholarship to study in Japan and received his PhD degree from the University of Tokyo in 2004. Then, he conducted his postdoctoral research at Dr. Messing's Lab at Rutgers University. He joined Zhejiang University in 2011. His research mainly focuses on plant molecular biology and genomics, including the molecular mechanism of seed quality formation, epigenetics (DNA methylation and demethylation) and small RNAs, molecular evolution of genes and genomes, identification and transposition mechanism of transposable elements and pollen sterility, and transition mechanism of rice photoperiod-and-thermo-sensitive genic male sterility via methods of comparative genomics, bioinformatics, evolutionary biology, molecular biology, and transgenes. His work has been published in journals with a high impact factor, such as *PNAS*, *Genome Research*, *Molecular Biology and Evolution*, *PLoS Genetics*, *Molecular Plant* and *Plant Journal*.

## *Editorial* **Molecular Research for Cereal Grain Quality**

**Jinsong Bao \* and Jian-Hong Xu**

Hainan Institute, Zhejiang University, Yazhou Bay Science and Technology City, Sanya 572025, China; jhxu@zju.edu.cn

**\*** Correspondence: jsbao@zju.edu.cn

Cereals such as wheat (*Triticum aestivum* L.), rice (*Oryza sativa* L.), and maize (*Zea mays* L.) provide key sources of dietary energy for human beings. Deficiency in their production will bring serious food security problems. Due to technical advances, for example, the utilization of heterosis in the development of hybrid cereals, many countries have achieved self-sufficiency in cereal production. However, due to increases in the human population and urbanization and a decline in arable land, the higher production demand remains challenging for many nations. On the other hand, with the increase in living standards thanks to economic development, our desire for a better life requires more production of high-quality cereal foods [1,2].

Cereal grain quality is governed by all the features and characteristics of the grain and its products to meet the demands of end users, which includes milling efficiency, processing quality, grain shape and appearance, ease of cooking, palatability, and nutrition [3]; it mainly reflects the physical and chemical properties of the grain. Physically, the grain's shape and size affect the quality of appearance and also the yield, while chalkiness (especially for rice) affects its appearance and processing quality. Chemically, the major constituents of cereals, i.e., starch, protein, and lipids, affect cooking and eating quality, while protein, lipids, and other micronutrients affect nutritional quality [3]. All the physical and chemical properties are considered to be complex traits that are affected by both genetic and environmental factors [3]. However, molecular mechanisms underlying grain quality formation are poorly understood, which may constrain our ability to produce high-quality cereal grain. This Special Issue aims to provide a forum on the most recent advances in the application of molecular tools to understand the mechanism for improving any cereals' grain quality. A total of 13 papers were collected for this Special Issue [1,2,4–14], mainly covering rice and wheat crops.

Rice is one of the most important staple food crops in the world and feeds more than half of the world's population. The grain quality of rice generally includes the milling, appearance, cooking and eating, and nutritional qualities [3]. Liu et al. [4] cloned a novel quantitative trait locus (QTL), *GLW7.1* (*Grain Length*, *Width and Weight 7.1*), which encodes the CCT motif family protein GHD7. It was hypothesized that GHD7 participates in GA biosynthesis to increase grain size and is regulated by the GID1-GA-DELLA module as the feedback of the pathway. The near-isogenic line constructed with the dominant allele showed reduced chalkiness, improved cooking and eating quality, and increased grain length [4].

Rice cooking and eating quality is especially important as it directly affects consumer taste preferences and market values. The eating quality can be indirectly predicted with a series of starch physicochemical property evaluations [15]. Starch, including amylose and amylopectin, is synthesized with the actions of a series of enzymes, such as ADPglucose pyrophosphorylase (AGPase), granule-bound starch synthase (GBSS), soluble starch synthases (SSs), starch branching enzymes (BEs), starch debranching enzymes (DBEs), and phosphorylases [5]. Amylose is synthesized by GBSS I, encoded by the *Waxy* gene (*Wx*). Apparent amylose content (AAC) is an important indicator to evaluate the eating quality of rice grains [15]. Starch gelatinization temperature (GT) is another important predictor to determine the cooking quality of rice. GT is mainly controlled by the *SSIIa/ALK* (*alkaline degenerate*) gene [1,15]. Fragrance, which is appealing to consumers, is controlled by the recessive gene *fgr*/*OsBADH2* (*betaine aldehyde dehydrogenase 2*) [6,7,15]. Pan et al. [1] analyzed the allelic diversification of the *Wx* and *ALK* loci in indica restorer lines developed over the last 50 years. The proportion of the *Wx<sup>a</sup>* allele decreased, while that of the *Wx<sup>b</sup>* allele increased, leading to a decrease in AAC and an increase in cooking and eating quality in current rice cultivars. *ALK* had no significant effect on taste value, and there was no strict requirement for the GT of high-quality rice, resulting in no selection pressure for *ALK* [1]. CRISPR/Cas9 technology was used to edit the 5′ UTR of the *Wx* gene of three lines with undesirably high AAC to create a batch of soft rice (low-AAC rice) breeding materials. The edited rice lines had an AAC of 16–18% and gel consistency of 77–80 mm, suggesting that the eating quality was successfully improved [2]. Lu et al. [6] obtained a high-quality chromosome-level genome assembly (~378.78 Mb) for a new fragrant japonica cultivar "Changxianggeng 1813", with 31,671 predicted protein-coding genes. They found that it was the *badh2-E2* type of deletion (a 7 bp deletion in the second exon) that causes fragrance in this rice. The identification of many single-nucleotide polymorphisms, insertion-deletion polymorphisms (InDels), and large structure variants (SVs, >1000 bp) will be useful for genomics-assisted breeding in fragrant japonica rice. Tian et al. [7] edited *Wx* and *OsBADH2* simultaneously using the CRISPR/Cas9 system to produce both homozygous two-line male sterile mutant lines and homozygous restorer mutant lines that were free of Cas9. The obtained mutants had a much lower AAC while having a significantly higher 2-acetyl-1-pyrroline aroma content. Based on this, a fragrant glutinous hybrid rice was developed without much effect on most agronomic traits.

**Citation:** Bao, J.; Xu, J.-H. Molecular Research for Cereal Grain Quality. *Int. J. Mol. Sci.* **2023**, *24*, 13687. https://doi.org/10.3390/ijms241813687

Received: 29 August 2023 Accepted: 4 September 2023 Published: 5 September 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Heading date may affect starch structure and grain yield due to different temperatures during seed development. Crofts et al. [8] constructed near-isogenic rice lines with *ss2a Hd1* (*heading date1*), *ss2a Hd1 hd1*, and *ss2a hd1* genotypes. They found that the *ss2a Hd1* line showed the highest plant biomass but with varied grain yield across different years. The *ss2a Hd1 hd1* line showed a higher total grain weight than *ss2a hd1*. The *ss2a hd1* line produced the lowest number of premature seeds and showed higher GT and lower AAC than *ss2a Hd1*, suggesting that *Hd1* is a candidate gene for developing high-yielding rice cultivars with the desired starch structure. Seed development and grain quality may also be affected by epigenetic regulation; DNA methylation is one of the main epigenetic modifications. Irshad et al. [9] used CRISPR/Cas9 technology to generate a null mutant of a rice DNA demethylation gene, *Repressor Of Silencing 1a* (*OsROS1a*), carrying an in-frame deletion that caused complete loss of function of the Per-CXXC domain. The *osros1a* mutant showed longer and narrower grains, and seeds were deformed and contained an underdeveloped and less-starch-producing endosperm with slightly irregularly shaped embryos. The grains of the mutant were slightly opaque with rounded starch granules. RNA-Seq results indicated that the key genes for starch synthesis (*OsSSIIa* and *OsSSIIIa*) and cellulose synthesis (*CESA2*, *CESA3*, *CESA6*, and *CESA8*) and genes encoding polysaccharides and glutelin were downregulated in the mutant endosperm. Furthermore, 378 differential alternative splicing (AS) genes were identified in the mutant, suggesting that *OsROS1a* has an impact on AS events. In a further analysis of the generated mutants, they produced a frameshift mutation that truncated the Pem-CXXC and RRMF domains of *OsROS1a*; this mutant had shrunken spikelets and smaller anthers, and its pollen grains were not stained by iodine, showing a significant reduction in total soluble sugar and starch contents as compared to the wild type, which caused complete male sterility [10].

The enzymes to synthesize starch usually form a multienzyme complex, but whether these complexes change during seed development is not fully understood. Ying et al. [5] revealed that most of the enzymes except for SSIVb were eluted from GPC, first in smaller-molecular-weight fractions at the early developing stage, and then transferred to higher-molecular-weight fractions at the later stage in both WT and a *BEIIb* mutant (*be2b*). However, the inverse elution pattern of SSIVb may be attributed to its vital role in the initiation step of starch synthesis. The number of protein complexes was markedly decreased in *be2b* at all development stages as compared to those in BEIIb. Although SSIVb could partially compensate for the role of BEIIb in protein complex formation, it was difficult to form a larger protein complex containing over five proteins in *be2b*. These findings unraveled a dynamic change in the protein complex during seed development, which enables a deeper understanding of the complex mechanism of starch biosynthesis and quality improvement in rice.

Storage proteins represent the second-largest storage substance in cereal grains and play an important role in determining the nutritional as well as cooking and eating quality of cereals. In cereal endosperm cells, seed storage proteins are synthesized on the endoplasmic reticulum (ER), where the proteins are translocated to the lumen. The localization of mRNAs plays an essential role in governing gene expression and protein targeting and thus determines cell fate, development, and polar growth. Zhang et al. [11] reviewed the current knowledge of the mechanisms and functions of mRNA localization to the ER in cereal endosperm cells. mRNA targeting to ER subdomains is driven by specific RNA zipcodes and requires a set of trans-acting RNA-binding proteins (RBPs) that recognize and bind these zipcodes and recruit other factors to mediate active transport. A more detailed network of cotransported mRNAs and the mechanism of assembly and remodeling of multi-RBP complexes to recognize and bind target mRNAs deserve further investigation.

The storage of rice is also an important part of its production and transactions, and only with good storage performance can its commercial value be maintained in commodity transactions. Zhu et al. [12] found that under high-temperature storage conditions (35 °C), the indica–japonica hybrid rice Yongyou 1540 was not significantly worse in terms of fatty acid value, whiteness value, and changes in electron microscope profile. Metabolomics analysis identified 19 key differential metabolites, in which the lipid metabolites related to palmitoleic acid were found to affect the aging of rice. In addition, two substances, guanosine 3′,5′-cyclophosphate and pipecolic acid, were beneficial to enhancing the resistance of rice to harsh storage conditions, thereby delaying the deterioration of, and maintaining, its quality.

Wheat grain quality mainly includes processing and nutritional quality. The processing quality mainly depends on the content and characteristics of storage proteins, which are important indicators for market value and consumer acceptance. Cao et al. [13] identified two novel high-molecular-weight glutenin subunits (HMW-GS) 1Ax2.1\* at *Glu-A1* and 1By19\* at *Glu-B1* from German spelt wheat with 2478 bp and 2163 bp encoding 824 and 720 amino acid residues, respectively. The specific single-nucleotide-polymorphism-based markers for 1Ax2.1\* and 1By19\* genes were developed and validated by using a wide range of wheat accessions, which provide new gene resources and molecular markers for improving wheat's breadmaking quality.

Spring cold stress (SCS) poses a serious threat to wheat reproductive tissues and grain production and is a major constraint in achieving high grain yield and quality in winter wheat. Su et al. [14] reviewed the physiological and molecular mechanisms involved in wheat floret and spikelet SCS tolerance and summarized QTLs that regulate SCS to identify candidate genes for breeding. To sustain grain setting and quality under SCS, it is necessary to breed novel wheat cultivars using novel SCS-tolerant QTLs or genes with regard to floret and spikelet development in new breeding strategies and uncover the fundamental resistance mechanism.

All these papers provide new scientific insights into molecular research for cereal grain quality. However, most of them focus on the grain quality of rice and wheat crops, so information on other important cereals, including maize, oat, barley, millets, sorghum, rye, etc., is still limited. We hope to collect papers with a broad coverage of the grain quality features of more cereal crops in future Special Issues.

**Funding:** This work was financially supported by the Hainan Provincial Natural Science Foundation of China (323MS066).

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Allelic Diversification of the** *Wx* **and** *ALK* **Loci in Indica Restorer Lines and Their Utilisation in Hybrid Rice Breeding in China over the Last 50 Years**

**Li-Xu Pan <sup>1,†</sup>, Zhi-Zhong Sun <sup>2,†</sup>, Chang-Quan Zhang <sup>1,3</sup>, Bu Li <sup>1</sup>, Qing-Qing Yang <sup>1</sup>, Fei Chen <sup>1</sup>, Xiao-Lei Fan <sup>1,3</sup>, Dong-Sheng Zhao <sup>1,3</sup>, Qi-Ming Lv <sup>2</sup>, Ding-Yang Yuan <sup>2,\*</sup> and Qiao-Quan Liu <sup>1,3,\*</sup>**


**Abstract:** Hybrid rice technology has been used for more than 50 years, and eating and cooking quality (ECQ) has been a major focus throughout this period. *Waxy* (*Wx*) and alkaline denaturation (*ALK*) genes have received attention owing to their pivotal roles in determining rice characteristics. However, despite significant effort, the ECQ of restorer lines (RLs) has changed very little. By contrast, obvious changes have been seen in inbred rice varieties (IRVs). The ECQ of IRVs is influenced by *Wx*: the proportion of *Wx<sup>a</sup>* has decreased and the proportion of *Wx<sup>b</sup>* has increased, leading to a decrease in amylose content (AC) and an increase in ECQ. Meanwhile, *ALK* is not selected in the same way. We investigated *Wx* alleles and AC values of sterile lines of female parents with the main mating combinations in widely used areas. The results show that almost all sterile lines were *Wx<sup>a</sup>*-type with a high AC, which may explain the low ECQ of hybrid rice. Analysis of hybrid rice varieties and RLs in the last 5 years revealed serious homogenisation among hybrid rice varieties.

**Keywords:** *Oryza sativa* L.; hybrid rice; ECQ; *Waxy*; *ALK*; *indica* restorer; rice breeding

## **1. Introduction**

Increasing rice (*Oryza sativa* L.) yield has been the main breeding objective for several decades, and the yield potential of irrigated rice has already experienced two quantum leaps [1]. The first advance was semidwarf breeding [2,3], and the second was utilisation of hybrid rice technology [4]. In 1971, the team of Professor Yuan Longping discovered *O. rufipogon*, a wild male sterile plant, in Hainan Province, China, which revealed the production potential of hybrid rice, paving the way for the development of F<sub>1</sub> hybrid rice introduced in 1976 [4]. Through the introduction of hybrid rice, the development of heterotic markers led to the second breakthrough and consequent leap in production. The achievements of the 1960s and 1970s increased rice production by ~50% in 10 years in many countries, including China, Indonesia and Vietnam. Since then, hybridisation has been the core focus of rice breeding in China [5,6].

Although *xian/indica*, *XI* (*O. sativa* L. *subsp. xian/indica*)-*geng/japonica*, *GJ* (*O. sativa* L. *subsp. geng/japonica*) and *GJ-GJ* hybrids have been reported, the most common rice hybrids are *XI-XI* hybrids, or *XI* hybrids for short. There are two types of *XI* hybrid rice: three-line hybrids and two-line hybrids [7–11]. In the early stages, there were many materials collected from various countries, and then more and more materials began to be cultivated in China. Among them, South China (SC) took the lead in breeding a large number of varieties. Since the 1990s, a large number of materials have been bred in the upper reaches of the Yangtze River (UYR) and the middle and lower reaches of the Yangtze River (MLYR). Among the materials in the UYR, 3-RLs predominate, while 2-RLs are mainly from the MLYR. Hybrid rice ECQ is directly controlled by both restorer and sterile lines.

**Citation:** Pan, L.-X.; Sun, Z.-Z.; Zhang, C.-Q.; Li, B.; Yang, Q.-Q.; Chen, F.; Fan, X.-L.; Zhao, D.-S.; Lv, Q.-M.; Yuan, D.-Y.; et al. Allelic Diversification of the *Wx* and *ALK* Loci in Indica Restorer Lines and Their Utilisation in Hybrid Rice Breeding in China over the Last 50 Years. *Int. J. Mol. Sci.* **2022**, *23*, 5941. https://doi.org/10.3390/ijms23115941

Academic Editors: Jinsong Bao and Jianhong Xu

Received: 30 March 2022 Accepted: 23 May 2022 Published: 25 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

As people's living standards increase with economic development, rice ECQ receives attention. In hybrid rice breeding, a high and stable yield is the main goal, and rice ECQ indices are essential to this end [12,13]. Primary factors affecting rice ECQ are AC, gelatinisation temperature (GT), gel consistency (GC) and viscosity [14–16]. For a long time, it was generally believed that hybrid rice in China had a high yield but poor quality [17]. Higher AC and a chalky grain rate are the main reasons why the quality of hybrid rice is lower than that of inbred rice [18]. In order to improve the efficacy of breeding for rice quality, the Ministry of Agriculture and Rural Affairs of China promulgated the agricultural industry standards NY20-1986 (1986), 'quality edible rice', and NY/T-593-2002 (2002), 'cooking rice variety quality'.

These factors are mainly regulated by starch-synthesis-related genes. Starch is synthesised by the orchestrated functional interactions of four classes of enzymes: ADP-glucose pyrophosphorylase, starch branching enzyme (BE), starch synthase (SS) and starch debranching enzyme (DBE) [19–23]. The granule-bound starch synthase I enzyme (GBSSI) is required for amylose synthesis in rice, and the gene encoding GBSSI was named *Waxy* (*Wx*) [24]. Many alleles of *Wx* (such as *Wx<sup>a</sup>*, *Wx<sup>b</sup>*, *Wx<sup>in</sup>*, *Wx<sup>op</sup>*, *Wx<sup>mp</sup>*, *Wx<sup>lv</sup>*, *wx* and *Wx<sup>mw/la</sup>*) lead to regional changes in rice AC and affect consumer preferences [24–30]. GT is mainly controlled by the alkaline denaturation (*ALK*) gene encoding modified starch synthase IIa (SSIIa) [31]. Several studies reported that at least three single-nucleotide polymorphisms (SNPs) of *ALK* are associated with the diversity of GT in rice. Ex8-733 bp (A/G) and Ex8-864/865 bp (G/T and C/T) of *ALK* in exon 8 (Ex8) are closely related to changes in GT [14,31–33]. These functional SNPs produce three haplotypes/alleles, including *ALK<sup>a</sup>* (A-GC) and *ALK<sup>b</sup>* (G-TT), which control low GT, and *ALK<sup>c</sup>* (G-GC), which controls high GT [32,34]. Recently, a new *ALK* allele, *ALK<sup>d</sup>* Ex1-294 (G/T), was reported [15].
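The mapping from the three exon 8 SNPs to the *ALK* alleles described above can be written as a small lookup table. The following Python sketch is illustrative only: the function name `classify_alk` and the string labels are hypothetical, the table covers just the three haplotypes listed in the text, and any other SNP combination (including the Ex1-294 variant of *ALK<sup>d</sup>*) is returned as unclassified rather than guessed.

```python
# Haplotypes as given in the text, keyed by
# (base at Ex8-733, dinucleotide at Ex8-864/865).
ALK_HAPLOTYPES = {
    ("A", "GC"): ("ALKa", "low"),
    ("G", "TT"): ("ALKb", "low"),
    ("G", "GC"): ("ALKc", "high"),
}


def classify_alk(ex8_733, ex8_864_865):
    """Return (allele, GT class) for an exon 8 SNP combination.

    Returns None when the combination is not one of the three
    haplotypes listed in the text.
    """
    key = (ex8_733.upper(), ex8_864_865.upper())
    return ALK_HAPLOTYPES.get(key)


if __name__ == "__main__":
    # The last combination is not listed in the text and is left unclassified.
    for snps in [("A", "GC"), ("G", "TT"), ("G", "GC"), ("A", "TT")]:
        print(snps, "->", classify_alk(*snps))
```

Returning `None` for unlisted combinations keeps the sketch from assigning a GT class that the text does not support.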

For rice, seed development begins with double fertilisation, which leads to the development of the embryo and the endosperm. During double fertilisation, two sperm cells enter the embryo sac through the pollen tube. One sperm cell fuses with the egg cell to form a zygote, and the other sperm cell fuses with the central cell to form a triploid primary endosperm cell. The zygote develops into an embryo, which transfers genetic material from the parents to the next generation, and the primary endosperm cell develops into the endosperm, which nourishes the developing embryo/seedling [35–37]. Both sterile lines and restorer lines will affect the ECQ of hybrid rice. Although ECQ evolutionary trends in hybrid rice breeding can be realised directly from the perspective of restorer lines, the process of improving the ECQ of hybrid rice can be achieved indirectly.

Various materials are employed for the cultivation of parents of hybrid rice, especially RLs, which have been mainly used to systematically explore the ECQ of hybrid rice. At present, there are few studies on the ECQ of hybrid rice. Previous studies have mainly focused on rice quality in the F<sub>2</sub> generation of hybrid rice [13,38,39]. However, the results are limited to specific combinations and cannot be widely used to guide the cultivation of hybrid rice. Therefore, based on the analysis of restorer line materials, we investigated improving RLs to indirectly predict the evolutionary process driving the ECQ of hybrid rice.

#### **2. Results**

#### *2.1. Characteristics of ECQ among Indica RLs*

RL materials were subjected to various analyses, including AAC, differential scanning calorimetry (DSC), Rapid Visco-Analyzer (RVA) and ECQ (Figures 1c–h and S1). AAC values fell into two main ranges: 10–18% and 22–30%. Similarly, *T<sub>p</sub>* values fell into two main ranges: 65–75 °C and 75–85 °C. The results reveal a negative correlation between AAC and taste value (r = −0.515, *p* < 0.01; Table S2). *T<sub>p</sub>* was not significantly correlated with AAC or taste value, but *T<sub>p</sub>* was significantly correlated with BDV and SBV, positively in the case of BDV (r = 0.517, *p* < 0.01; Table S2) and negatively in the case of SBV (r = −0.511, *p* < 0.01; Table S2).
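To make the correlation coefficients quoted above concrete, the sketch below computes a Pearson correlation coefficient from paired measurements. This assumes that the reported r values are Pearson coefficients, which the text does not state explicitly. The function name `pearson_r` is hypothetical, and the numbers in the usage example are made-up toy values, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy values for illustration only (NOT measurements from the paper):
    # higher amylose content paired with lower taste value.
    toy_aac = [12.0, 15.0, 24.0, 27.0]
    toy_taste = [80.0, 78.0, 60.0, 55.0]
    print(round(pearson_r(toy_aac, toy_taste), 3))
```

A negative result indicates that the two variables tend to move in opposite directions, as reported for AAC and taste value.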

**Figure 1.** Phenotypic distribution statistics over time. (**a**,**b**) Distributions of materials in terms of breeding time, region and type. (**c**,**e**,**g**) 2-RLs and 3-RLs. (**d**,**f**,**h**) IRVs. (**c**,**d**) Apparent amylose content. (**e**,**f**) Peak temperature of gelatinisation. (**g**,**h**) Taste value.

#### *2.2. ECQ Differentiation with Years and Places*

In the past 50 years, none of the indices of RL materials have changed significantly. By contrast, for IRVs, AAC dropped from ~25% in 1970 to ~15% in 2000 (stage 1), and AAC has remained stable at ~15% since (stage 2). Taste value increased over time before the 21st century, and has remained stable since. *T<sub>p</sub>* was high before 2000, and has decreased since. RVA and other indices followed similar trends. There was no significant change in RLs over time, but there was an obvious regular pattern for IRVs (Figures 1c–h and S1). Regarding rice ECQ, this gradually improved over time for IRVs, but remained stable for RLs, although it remained higher for RLs than for IRVs.

According to the classification of the area used for collecting materials, all materials were divided into four groups: Foreign, SC, UYR and MLYR. Since entries for the Foreign group were fewer and less distributed than in other areas after introduction, this group did not receive our focus. In general, *T<sub>p</sub>* and ECQ values for UYR were higher than for other regions. Differences in hardness, stickiness and PKV were identical in the overall trend compared to ECQ (Figure S3). Regarding hardness and SBV, 2-RLs were significantly higher than 3-RLs, and PKV and BDV displayed the opposite phenomena. Except for these differences in indicators, there were no significant differences between 2-RLs and 3-RLs. All ECQ indices were significantly different between IRVs and RLs. In general, IRVs had higher AAC values and lower GT and ECQ values than RLs (Figure S2).

#### *2.3. Allelic Diversification of the Wx Locus in Indica RLs*

The data used in this experiment were obtained from the resequencing data of 1143 xian/indica rice lines reported by Lv et al. [11]. *Wx* and *ALK* segments were extracted from the data (physical location: Shuhui 498, R498). R498 genome assembly and annotation can be found at http://www.mbkbase.org/R498 (accessed 8 August 2021). Analysis of the *Wx* haplotype was performed using the chromosome 6 physical interval (1643113–1648065). A total of 33 SNPs were found to be in linkage disequilibrium with index SNPs (Figure 2b). Five haplotypes were identified from the above SNPs, and AAC values for materials corresponding to these haplotypes showed differences, but there were no significant differences between haplotypes 4 and 5 (Figure 2d). According to the analysis of known functional loci, there were no differences between haplotypes 4 and 5: the variants involved were site (508) in the intron and site (215) in the exon, where a C-to-T substitution changed CCC (proline) to CCT (proline); hence, there was no change in amino acids due to codon degeneracy.
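The codon degeneracy argument can be checked mechanically by translating both codons and comparing the resulting amino acids. The Python sketch below is a minimal illustration: `is_synonymous` is a hypothetical helper, and the codon table is deliberately partial, containing only the four proline codons from the standard genetic code plus two non-proline codons used for contrast.

```python
# Partial standard genetic code: the four proline codons, plus two
# single-base neighbours of CCC that encode other amino acids.
CODON_TABLE = {
    "CCT": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "CTC": "Leu",
    "TCC": "Ser",
}


def is_synonymous(ref_codon, alt_codon):
    """Return True if both codons translate to the same amino acid.

    Raises KeyError for codons missing from the partial table above.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return ref_aa == alt_aa


if __name__ == "__main__":
    # The exon substitution described in the text: CCC -> CCT.
    print("CCC -> CCT synonymous:", is_synonymous("CCC", "CCT"))
    # A contrasting second-position change, for comparison only.
    print("CCC -> CTC synonymous:", is_synonymous("CCC", "CTC"))
```

Because the third codon position of proline is fully degenerate, any third-base change in CCC leaves the encoded amino acid unchanged, which is the situation described for the exon variant.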

Four *Wx* alleles were detected in the RLs. The predominant *Wx* allele was *Wx<sup>b</sup>* (228), followed by *Wx<sup>a</sup>* (44), and *Wx<sup>lv</sup>* and *Wx<sup>in</sup>* were also detected in smaller numbers (Figure 2a). The Hap1 material was *Wx<sup>lv</sup>*-type, and Hap1 and *Wx<sup>lv</sup>* were consistent in functional sites. The AAC of Hap1 was >25%, consistent with *Wx<sup>lv</sup>*-type rice. Similarly, Hap2 material was *Wx<sup>a</sup>*, whereas Hap3 material was *Wx<sup>in</sup>*. It is worth noting that Hap4 and Hap5 were *Wx<sup>b</sup>*, and these were consistent at functional sites. The AAC of Hap4 and Hap5 was ~15%, consistent with *Wx<sup>b</sup>*-type rice (Figure 2d,e). Additionally, the *Wx* genes of IRV materials were also analysed (Figure S4).

Based on comprehensive analysis of breeding years, materials were divided into seven time periods: <1980, 1980–1989, 1990–1999, 2000–2004, 2005–2009, 2010–2014 and ≥2015 (Figure 3a,d,g). The AAC of *Wx<sup>a</sup>* was ~25%, and for *Wx<sup>b</sup>*, it was ~15%; the higher the proportion of *Wx<sup>a</sup>*, the higher the AAC. According to the combined genotype and phenotype data, the reason for the decrease in AAC for IRVs before the 2000s was that the allelic proportion of the *Wx<sup>b</sup>* type increased, and the proportion remained stable (~15%) after the 2000s (Figure S7a). The allelic proportion of the *Wx<sup>b</sup>* type of 2-RLs also increased over time, but this was masked by 3-RLs. It should be noted that the early stage of 2-RLs was screened by test crossing with IRVs. The proportion of the *Wx<sup>b</sup>* allele did not differ between areas (Figure S8c). Evidently, the *Wx<sup>b</sup>*-type allele of IRVs was scarcer than that of RLs. Analysis of different types of materials showed that the *Wx<sup>a</sup>*/*Wx<sup>b</sup>* ratio of IRVs and 2-RLs decreased, while the ratio of 3-RLs did not change significantly (Figures 3a,d,g and S7a).
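The period-wise allele frequencies described above amount to binning each line by breeding year and tallying alleles within each bin. The following Python sketch shows one way this could be done. The period boundaries come from the text, but the function names `period_of` and `allele_proportions`, the plain-text allele labels and the example records are all hypothetical and do not reproduce the study's data or pipeline.

```python
from collections import Counter, defaultdict

# The seven breeding periods listed in the text; None means open-ended.
PERIODS = [
    ("<1980", None, 1979),
    ("1980-1989", 1980, 1989),
    ("1990-1999", 1990, 1999),
    ("2000-2004", 2000, 2004),
    ("2005-2009", 2005, 2009),
    ("2010-2014", 2010, 2014),
    (">=2015", 2015, None),
]


def period_of(year):
    """Map a breeding year to one of the seven period labels."""
    for label, start, end in PERIODS:
        if (start is None or year >= start) and (end is None or year <= end):
            return label
    raise ValueError(f"year {year} not covered by any period")


def allele_proportions(records):
    """Compute allele proportions per period.

    records: iterable of (breeding_year, allele_label) pairs.
    Returns {period: {allele: proportion}} for periods that have data.
    """
    counts = defaultdict(Counter)
    for year, allele in records:
        counts[period_of(year)][allele] += 1
    return {
        period: {a: n / sum(c.values()) for a, n in c.items()}
        for period, c in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    demo = [(1975, "Wxa"), (1978, "Wxb"), (2003, "Wxb"), (2016, "Wxb")]
    print(allele_proportions(demo))
```

Keeping the boundaries in a single table makes it straightforward to change the binning without touching the counting logic.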

**Figure 2.** Analysis of different *Wx* alleles and haplotypes. (**a**) *Wx* allele types and numbers screened by KASP marker. (**b**) Linkage disequilibrium (LD) analysis. The *r*<sup>2</sup> value is reflected on the matrix diagram. (**c**) Results of haplotype analysis. (**d**) Apparent amylose content of different haplotypes. (**e**) Apparent amylose content of different allele types. Different letters indicate significant differences (*p* < 0.05).

**Figure 3.** Frequency distribution of different rice materials based on different alleles. (**a**,**d**,**g**) Frequency distribution of different alleles of *Wx*. (**b**,**e**,**h**) Frequency distribution of different alleles of *ALK*. (**a**–**c**) Total restorer lines. (**d**–**f**) 3-RLs. (**g**–**i**) 2-RLs.

Four *Wx* alleles were detected in the RLs. The predominant *Wx* allele was *Wx<sup>b</sup>* (228), followed by *Wx<sup>a</sup>* (44), and *Wx<sup>lv</sup>* and *Wx<sup>in</sup>* were also detected in smaller numbers (Figure 2a). The Hap1 material was *Wx<sup>lv</sup>*-type, and Hap1 and *Wx<sup>lv</sup>* were consistent in functional sites. The AAC of Hap1 was >25%, consistent with *Wx<sup>lv</sup>*-type rice. Similarly, Hap2 material was *Wx<sup>a</sup>*, whereas Hap3 material was *Wx<sup>in</sup>*. It is worth noting that Hap4 and Hap5 were *Wx<sup>b</sup>*, and these were consistent at functional sites. The AAC of Hap4 and Hap5 was ~15%, consistent with *Wx<sup>b</sup>*-type rice (Figure 2d,e). Additionally, the *Wx* genes of IRV materials were also analysed (Figure S4).

#### *2.4. Allelic Diversification of the ALK Locus in Indica RLs*

Analysis of *ALK* haplotypes was performed using the chromosome 6 physical interval 6811279–6816183. A total of 30 SNPs were found to be in linkage disequilibrium with index SNPs (Figure 4b). Five haplotypes were identified from these SNPs, and the *T*<sub>p</sub> values of materials corresponding to these haplotypes differed; notably, the *T*<sub>p</sub> of haplotype 1 differed significantly from that of the other four haplotypes (Figure 4c,d). Apart from the SNPs at 6811957, 6812194, 6815238, 6815273, 6815342 and 6815357, all SNP differences were located in noncoding regions, and except for 6815238 (A–G), which encodes a serine to glycine change, none altered the amino acid sequence. According to analysis of known functional loci, haplotype 1 differed from the other four haplotypes only at the functional locus Ex8-864/865, whereas haplotypes 2–5 did not differ from one another (Figure 4c).

Additionally, the *ALK* genes of IRV materials were analysed (Figure S5). Two *ALK* alleles were detected in RLs (Figure 4a): *ALK<sup>b</sup>* (134) and *ALK<sup>c</sup>* (141). Haplotypes 2–5 differed in SNPs, but their GT was similar, with no significant differences between them. Relative differences between haplotypes were further explored by cluster analysis of *ALK* genes with rice varieties as controls. A phylogenetic tree and genetic distance analyses showed that all varieties could be divided into three categories: Hap1 and Hap4, Hap2 and Hap3, and Hap5. The Hap1 allele clustered with the Hap4 allele, indicating a close genetic relationship between these two alleles, and they were distributed in *XI* and *GJ*. Hap2, Hap3 and Hap5 were distributed in *XI*, and Hap2 and Hap3 were closely genetically related (Figure S6 and Table S3).

**Figure 4.** Analysis of different alleles and haplotypes of *ALK*. (**a**) Allele types and number of *ALK*s screened by KASP markers. (**b**) Linkage disequilibrium (LD) analysis. The *r*<sup>2</sup> value is shown on the matrix diagram. (**c**) Haplotype analysis. (**d**) Peak temperature of gelatinisation of different haplotypes. (**e**) Peak temperature of gelatinisation of different allele types. Different letters indicate significant differences (*p* < 0.05).

Previous studies showed that *ALK* had three major alleles: *ALK<sup>a</sup>*, *ALK<sup>b</sup>* and *ALK<sup>c</sup>*. There were only two types (*ALK<sup>b</sup>* and *ALK<sup>c</sup>*) in the materials studied herein. Hap1 materials were *ALK<sup>b</sup>*, and Hap1 and *ALK<sup>b</sup>* were consistent at functional sites. The *T*<sub>p</sub> of Hap1 was ~70 °C, consistent with the *T*<sub>p</sub> of the *ALK<sup>b</sup>* type. Hap2–5 materials were *ALK<sup>c</sup>*, and they were consistent in functional sites. The *T*<sub>p</sub> of Hap2–5 was ~80 °C, consistent with the *T*<sub>p</sub> of the *ALK<sup>c</sup>* type (Figure 4d,e).

The *ALK* alleles were divided into *ALK<sup>b</sup>* and *ALK<sup>c</sup>* types, and *ALK<sup>c</sup>* had a higher GT than *ALK<sup>b</sup>*. The proportion of the *ALK<sup>b</sup>* type in 2-RLs and 3-RLs did not change regularly over time; rather, fluctuations were irregular. In IRVs, the allelic proportion of the *ALK<sup>b</sup>* type displayed two stages: it was lower (~50%) in the first stage (before 2000) and higher (~95%) in the second stage (after 2000) (Figures 3b,e,h and S7b). Regarding area distribution, the allelic proportion of the *ALK<sup>b</sup>* type in the UYR was lower than in SC and the MLYR, which led to *T*<sub>p</sub> in the UYR being significantly higher than that in SC and the MLYR (Figure S8f). The proportion of the *ALK<sup>b</sup>* type was higher in IRVs than in RLs (Figures 3b,e,h and S8d,e). For the *ALK* gene, the *ALK<sup>b</sup>*/*ALK<sup>c</sup>* ratio of RLs did not change significantly, but for IRVs, the ratio displayed an increasing trend (Figure S7b).

#### *2.5. Combining Wx and ALK Alleles and Their Utilisation in Indica RLs*

*Wx* and *ALK* are both on the short arm of chromosome 6, with physical positions 164313–164806 and 6811279–6816183, respectively, in Nipponbare. Because the *Wx* alleles were mainly *Wx<sup>a</sup>* and *Wx<sup>b</sup>*, combinations of the two genes were divided into four types: *Wx<sup>a</sup>*/*ALK<sup>b</sup>*, *Wx<sup>a</sup>*/*ALK<sup>c</sup>*, *Wx<sup>b</sup>*/*ALK<sup>b</sup>* and *Wx<sup>b</sup>*/*ALK<sup>c</sup>*. For different types of materials, we performed analysis of *ALK* and *Wx* gene combinations (Figure S9a), and there were no significant differences between *ALK<sup>b</sup>* and *ALK<sup>c</sup>* in IRVs, 2-RLs or 3-RLs under the *Wx<sup>a</sup>* background. There were no significant differences between *ALK<sup>b</sup>* and *ALK<sup>c</sup>* in RLs under the *Wx<sup>b</sup>* background, but the gene frequency of *ALK<sup>b</sup>* was much higher than that of *ALK<sup>c</sup>* in IRVs. This shows that there was selective pressure on *ALK* under the *Wx<sup>b</sup>* background for IRVs. Previous results show that the *Wx<sup>b</sup>* type has been widely used in rice breeding over the years (Figure S9b). Thus, it was speculated that different alleles of the *ALK* gene were not selected in the early breeding process, but began to be selected in the later stages. To explore this further, the distribution of breeding time under different *Wx* backgrounds was analysed. The results show that *Wx<sup>a</sup>* materials were generally bred in the early stages, while *Wx<sup>b</sup>* materials were generally bred more recently, consistent with the prediction.
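The four combined genotype classes described above can be enumerated and tallied with a few lines of Python. This is only an illustrative sketch: the allele labels are written as plain strings, the helper names are hypothetical, and the input records are toy values rather than data from this study.

```python
from collections import Counter
from itertools import product

# Allele labels as plain strings (Wxa = Wx^a, ALKb = ALK^b, and so on).
WX_ALLELES = ("Wxa", "Wxb")
ALK_ALLELES = ("ALKb", "ALKc")


def combined_genotypes():
    """Return the four Wx/ALK combination labels in a fixed order."""
    return [f"{wx}/{alk}" for wx, alk in product(WX_ALLELES, ALK_ALLELES)]


def combination_frequencies(lines):
    """Proportion of each Wx/ALK combination among (wx, alk) records.

    `lines` is an iterable of (wx_allele, alk_allele) tuples, one per line.
    """
    counts = Counter(f"{wx}/{alk}" for wx, alk in lines)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotype records supplied")
    return {combo: counts.get(combo, 0) / total for combo in combined_genotypes()}


if __name__ == "__main__":
    # Toy records for illustration only; not values from the study.
    toy_lines = [("Wxb", "ALKb"), ("Wxb", "ALKb"), ("Wxb", "ALKc"), ("Wxa", "ALKc")]
    print(combination_frequencies(toy_lines))
```

Comparing such frequency tables between IRVs, 2-RLs and 3-RLs within each *Wx* background is one simple way to look for the kind of allele imbalance discussed above.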

#### *2.6. Contributions of Elite RLs in the Development of Hybrid Rice*

In this experiment, 275 RLs were selected, including 2-RLs and 3-RLs. Hybrid rice varieties bred using these RLs were collected and analysed. The results show that during this period, 80 RLs were used in hybrid rice breeding, and 1057 varieties were bred (https://www.ricedata.cn/variety/) (accessed 1 January 2022). Moreover, 24 RLs served as male parents in the breeding of >10 hybrid rice varieties (Figure 5a and Table S5). According to the statistics on the number of hybrid rice varieties bred using RLs, almost all elite RLs carried the *Wx<sup>b</sup>* genotype, while the *ALK<sup>b</sup>* and *ALK<sup>c</sup>* genotypes were almost equally represented (Figure 5a). This shows that, from the perspective of RLs used in hybrid rice breeding, elite RLs mainly carried *Wx<sup>b</sup>* with a low AC, while *Wx<sup>a</sup>* with a high AC has rarely been used. Neither of the two main alleles (*ALK<sup>b</sup>* and *ALK<sup>c</sup>*) of the *ALK* gene regulating GT appears to have been preferentially selected in RLs used in hybrid rice breeding.

In general, regarding RLs, the selection of functional alleles and the ECQ of the *Wx* gene of elite RLs used in hybrid rice breeding met the requirements of high ECQ. Notably, 132 hybrid rice varieties were bred with Huazhan as the RL, accounting for 12.49% of the total (Figure 5b). Huazhan was bred by the China National Rice Research Institute and Guangdong Academy of Agricultural Sciences. The breeding departments of these varieties were distributed all over China, and suitable planting areas were also widely distributed. Another extremely important RL in the history of hybrid rice breeding is Minghui63, used as a male parent to breed 35 hybrid rice varieties, considerably fewer varieties than Huazhan (Figure 5a). The possible reason for this is that after 2014, the state successively launched the green channel for variety approval and the consortium test channel, and methods for testing rice varieties diversified, resulting in a sharp increase in the number of rice varieties that were included in regional tests and ultimately approved. The above results confirm that in the past 5 years of hybrid rice breeding, a few materials have been heavily relied upon; in addition to Huazhan, Wushansimiao (28) and Chenghui727 (24) have been widely applied (Figure 5c). Further details are included in Table S5.

**Figure 5.** Hybrid rice varieties bred using RLs. (**a**) RLs of male plants used to breed >10 hybrid rice varieties. (**b**) All hybrid varieties cultivated using the restorer line in this experiment. (**c**) Hybrid varieties cultivated in the past 5 years using the restorer line in this experiment.

#### **3. Discussion**

Herein, we assessed 275 rice samples from RLs of foreign, SC, UYR and MLYR origin. Earlier materials were mainly introduced from foreign sources, but a large number of materials were developed in SC, followed by more in the Yangtze River region. The 3-RLs were evenly distributed in different areas, and 2-RLs were mainly distributed in MLYR.

The phenotypes of selected materials were investigated, including AAC, DSC, RVA and taste value, and the results are consistent with the ranges reported in previous studies [14,15,29,38,39]. Correlation analysis of different indices revealed a significant negative correlation between AAC and taste value (r = −0.515, *p* < 0.01). Specifically, varieties with a higher AAC had higher hardness and a lower taste value [14]. There was no significant correlation between GT and ECQ, consistent with previous studies [15]. Significant correlations were identified between certain RVA profiles and texture characteristics. Therefore, RVA profiles are commonly used to evaluate milled rice ECQ [40,41]. Our experimental results reveal a significant positive correlation between eating value and BDV, and a significant negative correlation with SBV. These results are consistent with previous reports.

In the past 50 years, indices of RL materials have not changed regularly over time. Rather, RLs introduced from foreign sources were mainly *Wx<sup>b</sup>*-type, and this appears to be reflected in ECQ. RLs introduced in the later stages were mainly used in recent breeding, resulting in low AAC. By contrast, obvious regular changes occurred in IRVs. Analysis of AAC of IRVs from the 1970s to the 2000s showed a decreasing trend, with a final decrease to ~15%. After the 2000s, AAC stabilised at 15% without any further decline. Reducing AAC could improve the ECQ of rice. There was no clear standard for ECQ in China before the first standard NY20-1986 (1986), 'quality edible rice', was issued in the 1980s. NY122-86, including ECQ indicators, limited the permissible range of AAC, GC and GT in high-quality rice. There is a significant negative correlation between ECQ and AAC, and reducing AAC to improve rice ECQ has therefore become a major goal of plant breeders. By contrast, GT has no significant correlation with ECQ, and the GT of different grades of rice is identical within the same type of rice, reflecting the fact that GT has not been considered in the breeding process. The Ministry of Agriculture and Rural Affairs of China promulgated new agricultural industry standards in the form of NY/T 593-2013, 'cooking rice variety quality'. This includes GT corresponding to alkali spreading value (ASV) > 6 or GT < 70 °C, which will encourage rice breeders to cultivate varieties with a lower GT.

*Wx* and *ALK* are the main genes regulating rice ECQ. *Wx* encodes granule-bound starch synthase I (GBSSI), and many allelic variations of *Wx* have been identified, including *Wx<sup>a</sup>*, *Wx<sup>b</sup>*, *Wx<sup>in</sup>*, *Wx<sup>op</sup>*, *Wx<sup>mp</sup>*, *Wx<sup>lv</sup>*, *Wx<sup>mw/la</sup>* and *wx*, leading to regional changes in AC that affect consumer preferences [24–29,42,43]. The *Wx<sup>b</sup>* allele type was found to be predominant in the experimental materials, followed by *Wx<sup>a</sup>*, while *Wx<sup>lv</sup>* and *Wx<sup>in</sup>* were also present. Over the years, the *Wx<sup>a</sup>*/*Wx<sup>b</sup>* ratio of RL materials did not change regularly; rather, the *Wx<sup>a</sup>*/*Wx<sup>b</sup>* ratio of 2-RLs and IRVs decreased from the 1970s to the 2000s, then stabilised after the 2000s, consistent with changes in AAC. In accordance with the breeding regions, the *Wx<sup>a</sup>*/*Wx<sup>b</sup>* ratio of UYR was lower than those of SC and MLYR, consistent with AAC, indicating that *Wx* is the main gene regulating AAC.

*ALK*/*SSIIa* encodes soluble starch synthase IIa (SSIIa) in rice, which plays a specific role in the synthesis of long B1 chains by elongating the short A and B1 chains of amylopectin in the endosperm [15,32,34,44]. SSIIa is the key enzyme controlling GT in rice [45]. Several studies have reported that at least four SNPs of *ALK* are associated with the four alleles in rice (*ALK<sup>a</sup>*, *ALK<sup>b</sup>*, *ALK<sup>c</sup>* and *ALK<sup>d</sup>*) [15,32,33,46,47]. Only two allele types, *ALK<sup>b</sup>* and *ALK<sup>c</sup>*, were detected in the experimental material tested herein. Previous results show that GT was higher for *ALK<sup>c</sup>* and lower for *ALK<sup>b</sup>*, consistent with our current results. *ALK<sup>a</sup>* was generally present in *GJ*, and all rice accessions selected were *XI*, consistent with previous reports [34,48,49]. Over the years, the *ALK<sup>c</sup>*/*ALK<sup>b</sup>* ratio of RLs fluctuated irregularly, and based on area, the ratio in UYR was significantly higher than in the other areas, consistent with the GT data. This indicates that *ALK* is the main gene regulating GT in rice. The *ALK<sup>c</sup>*/*ALK<sup>b</sup>* ratio in IRVs was lower than in 2-RLs and 3-RLs, consistent with *T*<sub>p</sub> values reported in previous work.

The genetic material affecting rice ECQ in hybrid rice comes from RLs and sterile lines, and the two main genes (*Wx* and *ALK*) regulating rice ECQ segregate in the F<sub>2</sub> generation, so the rice eaten by consumers is a mixture of genotypes [50]. In theory, the endosperm of hybrid combinations includes four genotypes, *Wx<sup>b</sup>*/*Wx<sup>b</sup>*/*Wx<sup>b</sup>*, *Wx<sup>b</sup>*/*Wx<sup>b</sup>*/*Wx<sup>a</sup>*, *Wx<sup>b</sup>*/*Wx<sup>a</sup>*/*Wx<sup>a</sup>* and *Wx<sup>a</sup>*/*Wx<sup>a</sup>*/*Wx<sup>a</sup>*, and the segregation ratio of these four genotypes is 1:1:1:1 [51]. The AC phenotype of heterozygous endosperm containing *Wx<sup>a</sup>* tends to resemble that of *Wx<sup>a</sup>*/*Wx<sup>a</sup>*/*Wx<sup>a</sup>*. If the sterile line is the *Wx<sup>a</sup>* type, even if the RL is *Wx<sup>b</sup>*, 3/4 of the final edible hybrid rice will be high in AC, resulting in hybrid rice with a lower ECQ. *Wx* alleles and AC of sterile lines have been assessed for the main mating combinations (II-32A, Jin 23A, Zhenshan 97A, Zhong 9A, Bo A, Long Tefu A, Y58A, Xieqing Zao A, Peiai 64S and Gang 46A), the lines used over wide areas (Zhenshan 97A, Wei 20A, II-32A, Jin 23A, Gang 46A, Xieqing Zao A, Bo A, Long Tefu A, Peiai 64S and Zhong 9A) and the corresponding maintainer line materials (https://www.ricedata.cn/variety/) (accessed 1 January 2022). The statistics for the female sterile lines matched with the RLs in our experiment are consistent with the above website (Figure S9). Almost all sterile lines were *Wx<sup>a</sup>* with a high AC, which may be the main reason for the low ECQ of hybrid rice [17,24]. In addition, *ALK* also segregates in the F<sub>2</sub> generation, leading to differences in the degree of gelatinisation of hybrid rice during cooking, which further reduces ECQ [35].
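The 1:1:1:1 ratio and the 3/4 figure quoted above follow from simple enumeration, which the sketch below works through. It assumes standard triploid endosperm genetics (two identical copies from the maternal gamete plus one copy from the paternal gamete) and a heterozygous *Wx<sup>a</sup>*/*Wx<sup>b</sup>* F<sub>1</sub> plant producing both gamete types at equal frequency; the function names are hypothetical and chosen for illustration.

```python
from fractions import Fraction
from itertools import product


def f2_endosperm_genotypes(allele_1="Wxa", allele_2="Wxb"):
    """Enumerate endosperm genotypes of grains borne on a heterozygous F1 plant.

    Assumption: the endosperm carries two identical maternal copies and one
    paternal copy, and each gamete type occurs with probability 1/2.
    Returns a dict mapping (maternal, maternal, paternal) -> probability.
    """
    half = Fraction(1, 2)
    freqs = {}
    for maternal, paternal in product((allele_1, allele_2), repeat=2):
        genotype = (maternal, maternal, paternal)
        freqs[genotype] = freqs.get(genotype, 0) + half * half
    return freqs


def fraction_carrying(freqs, allele):
    """Total probability of endosperm genotypes with at least one copy of `allele`."""
    return sum(prob for genotype, prob in freqs.items() if allele in genotype)


if __name__ == "__main__":
    freqs = f2_endosperm_genotypes()
    for genotype, prob in freqs.items():
        print("/".join(genotype), prob)
    print("carrying Wxa:", fraction_carrying(freqs, "Wxa"))
```

Under these assumptions each of the four endosperm genotypes has probability 1/4, and three of the four carry at least one *Wx<sup>a</sup>* copy, which matches the 3/4 proportion of high-AC grains described in the text.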

There are two types of *XI* hybrid rice, three-line hybrids and two-line hybrids, and the corresponding RLs are 3-RLs and 2-RLs, respectively [11]. Hybrid rice varieties bred from RLs were analysed, and the results show that the breeding process relied heavily on several or even a single backbone RL, and many varieties were bred over a short time period; hence, the assimilation of these varieties was rapid. Hybrid rice varieties and their male parents (RLs) that had been employed in the last 5 years were analysed, and of all RLs, only 80 were used as male parents. It is worth mentioning that many RLs have been eliminated by breeders. Hybrids bred from only a few parental RLs account for the majority of hybrids developed over the past 5 years. In particular, Huazhan has been used for 94 varieties, accounting for 29% of the total. In the process of hybrid rice breeding, the high-frequency utilisation of a few backbone RLs will decrease the diversity of rice quality traits. Wushansimiao and Chenghui727 have also been heavily relied upon, although not to the same extent as Huazhan. ECQ analysis of RL materials used with high frequency may reveal information on taste value, and almost all varieties tested herein were *Wx<sup>b</sup>* haplotypes.

We need to consider improving the yield and broad-spectrum resistance of hybrid rice in relation to RLs [50]. In addition, we need to pay attention to rice ECQ, which can be enhanced by introducing major genes such as *Wx* alleles [51]. Rice with a low AC is less expansive, fluffy and soft, and hence popular among consumers [52]. Because rice is adaptable to almost all climatic situations, many varieties are grown worldwide, with differences in ECQ, and a type of rice favoured by one cultural group may not be favoured by others [17]. The rice (*GJ*) variety available in northern and eastern parts of Asia is sticky and soft when cooked. Introducing alleles that lower AAC relative to *Wx<sup>b</sup>*, such as *Wx<sup>op</sup>*, *Wx<sup>mp</sup>* and *Wx<sup>mw/la</sup>*, can improve ECQ [23,29,30]. Similarly, we should cultivate rice varieties targeted to the different requirements for high-quality rice of populations in different regions.

Previous studies and our current results show that *ALK* had no significant effect on taste value, and there was no strict requirement for the GT of high-quality rice, resulting in no selection pressure for *ALK*. However, previous studies showed that the taste value of *ALK<sup>c</sup>* rice after cooling was significantly lower than that of *ALK<sup>b</sup>* rice [51]. A higher gelatinisation temperature increases the cooking time of rice and thus the need for fuel, which has a great impact on rice-related food processing. Ultimately, diversifying varieties should provide breeders with more raw materials (parents) for selecting and developing desirable traits.

In summary, we compared RLs and conventional materials employed in the rice breeding process over the past 50 years. The overall ECQ of RLs has remained high and has not fluctuated significantly. The main scope for improving ECQ in hybrid rice may therefore lie in the sterile lines used as female parents. RLs can use other alleles of *Wx*, such as *Wx<sup>op</sup>*, *Wx<sup>mp</sup>* and *Wx<sup>mw</sup>*, to further reduce AC and improve ECQ, which may further improve hybrid rice ECQ. RLs can also increase the utilisation of the *ALK<sup>b</sup>* allele to improve the ECQ of hybrid rice varieties.

#### **4. Materials and Methods**

#### *4.1. Plant Materials and Growth Conditions*

A total of 387 indica rice accessions provided by the Hunan Hybrid Rice Research Center, Changsha, China, were used in this study (Figure 1a and Table S1). They included 112 varieties of IRVs, 59 2-RLs and 216 3-RLs, representing the parents of the majority of *XI* hybrid rice accessions that have been widely planted in China over the last 50 years, along with some varieties from foreign sources. All cultivation was performed in the same field in Changsha, China. The planting season was from May to November.

#### *4.2. Polished Rice and Flour Preparation*

Seeds were de-husked using a rice huller SY88-TH (SSANG YONG Motor, Seoul, Korea) and milled with a grain polisher (Kett, Tokyo, Japan). Intact rice was selected to measure the taste value, and another portion of polished rice samples was ground into flour in a mill (FOSS 1093 Cyclotec Sample Mill; Foss A/S, Hillerød, Denmark) and passed through a 100-mesh sieve. Flour was incubated in a 37 °C oven for 48 h, then in the natural environment for 2 days, and stored at 4 °C. Taste value data included taste value, hardness and stickiness.

#### *4.3. Determination of Grain Physicochemical Properties*

AAC of flour was measured by iodine colorimetry [53] with modifications. GT of flour was determined by DSC (DSC200F3, NETZSCH, Bavaria, Germany) [16]. DSC data included enthalpy of gelatinisation (∆*H*), onset temperature of gelatinisation (*T*<sub>o</sub>), peak temperature of gelatinisation (*T*<sub>p</sub>) and terminating temperature of gelatinisation (*T*<sub>e</sub>). Pasting properties were determined by RVA (Newport Scientific, Warriewood, Australia) [54]. RVA data included hot-paste viscosity (HPV), peak paste viscosity (PKV), breakdown value (BDV), cool-paste viscosity (CPV) and setback value (SBV). All tests were performed in triplicate.
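The text does not state how BDV and SBV were derived from the measured viscosities. A convention commonly used for rice RVA profiles is BDV = PKV − HPV and SBV = CPV − PKV; the sketch below assumes that convention, so it should be read as an illustration rather than as the exact calculation used in this study. The class name and the numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RvaProfile:
    """Primary RVA viscosities of one sample (same unit for all three)."""

    pkv: float  # peak paste viscosity
    hpv: float  # hot-paste viscosity
    cpv: float  # cool-paste viscosity

    @property
    def bdv(self) -> float:
        """Breakdown value, ASSUMED here to be PKV - HPV."""
        return self.pkv - self.hpv

    @property
    def sbv(self) -> float:
        """Setback value, ASSUMED here to be CPV - PKV."""
        return self.cpv - self.pkv


if __name__ == "__main__":
    # Hypothetical viscosities for illustration only.
    sample = RvaProfile(pkv=3000.0, hpv=1800.0, cpv=2800.0)
    print(f"BDV = {sample.bdv}, SBV = {sample.sbv}")
```

Under this convention a large BDV together with a negative SBV describes a paste that thins readily on heating and does not thicken beyond its peak on cooling, which is the direction of the correlations with taste value reported in the Discussion.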

#### *4.4. Measurement of Rice Taste Value*

A STA1B rice taste meter (Sasaki, Tokyo, Japan) was used to determine the taste values of all materials. ECQ mainly reflects the taste value, hardness and stickiness of rice. All tests were performed in triplicate.

#### *4.5. Sequencing and Genotype Annotation*

DNA sequencing and SNP calling for all rice accessions were performed as described previously [11]. Tag SNPs were then identified from each clump using de Bakker's algorithm implemented in Haploview [55]. Linkage disequilibrium (LD) blocks containing these tag SNPs were then assessed using Gabriel's algorithm [56] and visualised using Haploview [42]. SNPs within a clump were selected based on *p* < 0.01 and an LD value of *r*<sup>2</sup> > 0.5 with index SNPs.
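For readers unfamiliar with the *r*<sup>2</sup> statistic used in the threshold above, the following sketch computes it for one pair of biallelic SNPs. It assumes phased haplotypes coded 0/1, which is a simplification: Haploview estimates haplotype frequencies from genotype data, and this example does not reproduce that procedure. The function names and the toy haplotypes are hypothetical.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs.

    `haplotypes` is a sequence of (a, b) pairs, where a and b are 0 or 1 and
    give the allele at SNP A and SNP B on the same (phased) haplotype.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b                             # LD coefficient D
    return d * d / denominator


def passes_ld_threshold(haplotypes, threshold=0.5):
    """True if the SNP pair exceeds the r^2 threshold (0.5 in the text above)."""
    return ld_r_squared(haplotypes) > threshold


if __name__ == "__main__":
    linked = [(0, 0), (1, 1)] * 5                    # alleles always travel together
    unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]      # alleles fully independent
    print(ld_r_squared(linked), ld_r_squared(unlinked))
```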

#### *4.6. Detection of Wx and ALK Alleles*

Genomic DNA was extracted from fresh leaves of all parental lines using a modified CTAB method. *Wx* and *ALK* allelic variations were analysed by KASP genotyping [38]. The details of allele-specific primers are shown in Supplementary Table S4.

#### *4.7. Statistical Analysis*

DNA sequences of *ALK* genes of different materials were obtained, and homologous alignment was performed using the NCBI website (http://www.ncbi.nlm.nih.gov/) (accessed 1 January 2022). Nucleotide sequence similarity and multiple sequence alignment analyses were carried out using ClustalX and GeneDoc. A phylogenetic tree was constructed by the neighbour-joining (NJ) method with the Kimura 2-parameter (K2-P) model using MEGA version 5.0. Experiments were carried out in triplicate, and results are reported as mean values ± standard deviations (SD). One-way analysis of variance (ANOVA) and Tukey's multiple comparison test were used to determine significant differences among mean values using SPSS 16.0 statistical software (IBM, Armonk, NY, USA).
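The Kimura 2-parameter distance that underlies the neighbour-joining tree can be written as d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The sketch below is a minimal stand-alone implementation of that formula for illustration; it is not the MEGA implementation, and its handling of gaps and ambiguous bases (such sites are simply skipped) is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (an assumption made for this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term_1 = 1 - 2 * p - q
    term_2 = 1 - 2 * q
    if term_1 <= 0 or term_2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_1) - 0.25 * math.log(term_2)


if __name__ == "__main__":
    # Toy sequences: one transition in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

A matrix of such pairwise distances is the input that a neighbour-joining procedure clusters into a tree.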

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23115941/s1.

**Author Contributions:** L.-X.P. performed experiments, analysed data and wrote the manuscript; Z.-Z.S. and Q.-M.L. performed rice cultivation experiments; D.-Y.Y. and Q.-Q.L. designed experiments and edited the manuscript; B.L. and F.C. performed experiments; Q.-Q.Y. performed the KASP analysis; D.-S.Z., X.-L.F. and C.-Q.Z. reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Science Foundation of China (31825019, and U19A2032), Hainan Yazhou Bay Seed Lab (B21HJ8105), the programs from Jiangsu Province Government (JBGS [2021]001, BZ2021017, and PAPD) and Postgraduate Research & Practice Innovation Program of Jiangsu Province (XKYCX18\_078).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Datasets supporting the conclusions of this article are included within the article (and its additional files).

**Conflicts of Interest:** The authors declare no conflict of interest.



## *Article* **Development of Soft Rice Lines by Regulating Amylose Content via Editing the 5′UTR of the** *Wx* **Gene**

**Jinlian Yang, Xinying Guo, Xuan Wang, Yaoyu Fang, Fang Liu, Baoxiang Qin and Rongbai Li \***

State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Agriculture, Guangxi University, Nanning 530004, China

**\*** Correspondence: rongbaili@gxu.edu.cn

**Abstract:** Soft rice with a low amylose content (AC) is increasingly favored by consumers in China for its better eating and cooking quality as quality of life continues to improve. The *Wx* gene regulates the AC of rice grains, thus affecting the degree of softness of the rice. Mei Meng B (MMB), Tian Kang B (TKB), and DR462 are three indica rice maintainer lines with good morphological characters, but also with undesirably high AC. Therefore, CRISPR/Cas9 technology was used to edit the *Wx* gene of these lines to create a batch of soft rice breeding materials. New gene-edited lines MMB-10-2, TKB-21-12, and DR462-9-9, derived from the above parental lines, respectively, were selected in the T<sub>2</sub> generation, with an AC of 17.2%, 16.8%, and 17.8%, and gel consistency (GC) of 78.6 mm, 77.4 mm, and 79.6 mm, respectively. The rapid viscosity analysis (RVA) spectrum showed that the three edited lines had a better eating quality than the corresponding wild types and showed new characteristics, different from the high-quality soft rice popular in the market. There was no significant difference in the main agronomic traits between the three edited lines and the corresponding wild types. Moreover, the chalkiness of DR462-9-9 was reduced, resulting in an improved appearance of its polished rice. The present study created soft rice germplasms for breeding hybrid rice of improved quality, without changing the excellent traits of the corresponding wild type varieties.

**Keywords:** rice; CRISPR/Cas9; *Wx* gene; eating and cooking quality (ECQ); amylose content

#### **1. Introduction**

Along with the improvement of our national living standards, consumers have higher requirements for the eating and cooking quality (ECQ) of rice. Rice with low AC has the advantages of palatability, good puffing properties, softness, elasticity, a low degree of retrogradation and hardening after cooling, and does not rot easily [1]. As the main component of the rice endosperm, AC is an important index for evaluating the ECQ of rice, which determines the taste and cooking properties of rice [2,3]. Based on their AC, rice varieties can be divided into five categories: high (25~33%), medium (20~25%), low (12~20%), very low (5~12%), and waxy (0~5%) [4]. Countries and regions have different cultural preferences for rice categories; therefore, it is necessary to develop multiple types of rice.
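The five AC categories listed above translate directly into a small lookup, sketched below. Because the published ranges share their end points (for example, 20% appears in both "low" and "medium"), the sketch assumes that a boundary value belongs to the higher category; that tie-breaking rule and the function name are assumptions made for illustration.

```python
def classify_amylose_content(ac_percent):
    """Map an amylose content (%) to one of the five categories in the text.

    Assumption: a value exactly on a shared boundary (5, 12, 20 or 25%)
    is assigned to the higher category.
    """
    if not 0 <= ac_percent <= 33:
        raise ValueError("amylose content outside the 0-33% range covered by the categories")
    if ac_percent >= 25:
        return "high"
    if ac_percent >= 20:
        return "medium"
    if ac_percent >= 12:
        return "low"
    if ac_percent >= 5:
        return "very low"
    return "waxy"


if __name__ == "__main__":
    # AC values reported in the abstract for the three edited lines.
    for line, ac in (("MMB-10-2", 17.2), ("TKB-21-12", 16.8), ("DR462-9-9", 17.8)):
        print(line, classify_amylose_content(ac))
```

With this rule, the three gene-edited lines described in the abstract (AC of 17.2%, 16.8% and 17.8%) all fall in the "low" category.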

Since the 1980s, Japan has taken the lead in cultivating rice varieties with low amylose content [5]. In recent years, attention was gradually given to the development of cultivated rice varieties with low AC in rice breeding research in China, and a number of studies on the genetics, agronomy, processing quality and other aspects have been carried out in this regard [6–8]. A series of soft rice japonica varieties such as Yunjing 20, Yunjing 29, Yunjing 37, and Yunjing 41 bred in Yunnan; Huruan 1212 and songxiangjing1018 bred in Shanghai; Nanjing 46, Nanjing 9108, Fengjing 1606, and Wuxiangjing 113 bred in Jiangsu; and Jia 58 bred in Zhejiang have been popularized [9–11]. However, at present, the germplasm resources of rice with low AC that can be used as restorers and maintainers are still limited.

**Citation:** Yang, J.; Guo, X.; Wang, X.; Fang, Y.; Liu, F.; Qin, B.; Li, R. Development of Soft Rice Lines by Regulating Amylose Content via Editing the 5′UTR of the *Wx* Gene. *Int. J. Mol. Sci.* **2022**, *23*, 10517. https://doi.org/10.3390/ijms231810517

Academic Editors: Jinsong Bao and Jianhong Xu

Received: 15 August 2022 Accepted: 6 September 2022 Published: 10 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The waxy gene (*Wx*) encodes granule-bound starch synthase I (GBSSI) and is the key gene regulating amylose synthesis. The *Wx* gene directly affects AC in the rice endosperm, thus affecting the softness of rice. The CRISPR/Cas9 gene editing system has the advantages of high efficiency and easy operation and has been widely used in crop genetic improvement and breeding [12]. In previous studies, the CRISPR/Cas9 system was used to edit *Wx* alleles in rice, and a series of germplasms were created for breeding [8,13–15]. Most studies focused on editing coding sequences to eliminate *Wx* expression in order to generate glutinous rice. Teng (2021) used the CRISPR/Cas9 system to edit the *Wx* gene in the Photothermosensitive Genic-Male-Sterile (PTGMS) line Y58S, which caused ultra-low AC mutations and produced a PTGMS glutinous rice strain with excellent waxiness [16]. Fu (2022) developed a rapid and highly efficient CRISPR/Cas9-based strategy for generating *Wx* mutants in the background of five different rice varieties, which significantly reduced the AC and starch viscosity but did not affect the major agronomic traits [17]. Liu (2022) performed a targeted deletion of the first intron of the *Wx*<sup>b</sup> allele via CRISPR/Cas9; the grain AC of the mutant lines increased significantly, from 13.0% to 24.0% [15].

In addition to creating glutinous rice, Huang (2020) generated novel *Wx* alleles by editing the promoter region of *Wx*<sup>b</sup>, thereby fine-tuning *Wx* expression [3]. In this study, Meimeng B (MMB), Tiankang B (TKB), and DR462, which are high-quality indica rice parent materials commonly used in breeding but with a hard rice quality, were used as test materials. They have a high yield performance but also high AC and low GC, resulting in poorer rice quality. Appropriately changing the AC and GC of these lines is of great significance for improving rice quality and increasing their value in breeding. However, directly editing the coding region of the *Wx* gene with CRISPR/Cas9 often results in too low an AC in the edited lines, which directly yields glutinous rice lines. This does not meet the AC requirement for soft rice. Therefore, in this study, CRISPR/Cas9 was used to edit the intron splice site (5′UISS) in the 5′UTR region of the *Wx* gene, so that the expression of the *Wx* gene was appropriately reduced while the original protein coding sequence was left unchanged. In this way, gene-edited rice lines with lower AC and higher GC can be obtained, providing high-quality resources for the improvement of new rice varieties.

#### **2. Results**

#### *2.1. Construction of CRISPR/Cas9-Wx Vector*

According to the sequence of the parent *Wx* gene (*LOC\_Os06g04200*), the CRISPR/Cas9 vector was constructed by targeting the intron splicing site (5′UISS) in the 5′UTR region of *Wx* (Figure 1A). As confirmed by sequencing and alignment, the sequences of target sites in MMB, TKB, and DR462 were consistent with the designed sequences (Figure 1B), indicating that MMB, TKB, and DR462 were suitable for editing.

A U6a-sgRNA expression cassette of the *Wx* gene was obtained through overlapping PCR, and the amplification product of the target construction fragment was 629 bp in length (Figure 2A). The constructed Cas9/sgRNA expression vector was transformed and cultured, and single colonies were selected. The primers PB-L/PB-R were used for PCR amplification and sequencing. The gel electrophoresis results showed that the *Wx* gene Cas9/sgRNA expression vector was present in 16 out of 20 colonies (Figure 2B), indicating that the CRISPR/Cas9-*Wx* expression vector was successfully constructed.

*Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 3 of 15

**Figure 1.** Location of sgRNA in the *Wx* gene and designed primers for amplification. (**A**): *Wx* gene partial sequence showing target positions and primer sequences; purple nucleotides: Upstream and downstream primers; Red Box: sgRNA sequences; black underline nucleotides: protospacer adjacent motif (PAM) region; (**B**): Sequencing peak map of sgRNA.

**Figure 2.** Construction of vector and verification of sgRNA. (**A**): Detection results of sgRNA expression cassette; (**B**): Verification of the size of *Wx* gene Cas9/sgRNA expression vector segment; M: DL 5000 DNA marker; W-u6a: sgRNA-u6a fragment; ddH2O is a negative control based on sterile water; 1–20 are the numbers of the positive clone bacterial solution picked at random; Lanes 8, 9, 11, and 17 are empty plasmids; 5000 bp, 750 bp, 500 bp, 629 bp: fragment size.

#### *2.2. Genetic Transformation of MMB, TKB, and DR462*


The vector was introduced into MMB, TKB, and DR462 rice through *Agrobacterium*-mediated transformation. Calli were induced from mature embryos of rice seeds after co-culture (Figure 3A). After two rounds of hygromycin resistance screening (Figure 3B–D), the well-growing calli were selected for plantlet differentiation and rooting (Figure 3E,F). As a result, a total of 30, 33, and 34 plants from MMB, TKB, and DR462, respectively, were obtained, which were regarded as the T<sup>0</sup> generation.

#### *2.3. Mutation Analysis of T<sup>0</sup> Transformants*

T<sup>0</sup> plants were confirmed with the hygromycin resistance marker HPT-F/HPT-R. The amplification results of the target fragments of the T<sup>0</sup> plants are shown in Figure 4. There were 24, 26, and 26 positive transformed plants from MMB, TKB, and DR462, respectively, with mutation rates of 80.0%, 78.8%, and 76.5%, respectively (Figure 4).
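The reported rates follow directly from the number of positive plants divided by the number of T<sup>0</sup> plants obtained per line (30, 33, and 34). A short sketch of that arithmetic is given below; the variable and function names are illustrative only.

```python
# Counts taken from the text: regenerated T0 plants and positive plants per line.
t0_plants = {"MMB": 30, "TKB": 33, "DR462": 34}
positive_plants = {"MMB": 24, "TKB": 26, "DR462": 26}


def percent(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100 * part / total, 1)


rates = {line: percent(positive_plants[line], t0_plants[line]) for line in t0_plants}

for line, rate in rates.items():
    print(f"{line}: {positive_plants[line]}/{t0_plants[line]} = {rate}%")
```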


**Figure 3.** Process steps of genetic transformation. (**A**): Callus induction; (**B**): Callus screening; (**C**): The first screening of calli; (**D**): The second screening of calli; (**E**): Plantlet differentiation; (**F**): Rooting.

**Figure 4.** Amplification results of T<sup>0</sup> generation transformed plants of MMB (**A**), TKB (**B**), and DR462 (**C**). M: DL 5000 DNA Marker; ddH2O: a negative control based on sterile water; 5000 bp, 750 bp, 500 bp, 658 bp: fragment size.

The genotype analysis of target mutations in the T<sup>0</sup> generation of MMB, TKB, and DR462 identified four genotypes: homozygous mutant, bi-allelic mutant, heterozygous mutant, and wild type. Among them, the homozygous mutants accounted for the highest proportion. There were 8, 9, and 9 homozygous mutants in the T<sup>0</sup> generation of MMB, TKB, and DR462, corresponding to homozygous mutation rates of 33.3%, 34.6%, and 34.6%, respectively (Table 1), indicating that this target has a high editing efficiency and is an ideal target for editing the *Wx* site. The analysis of target mutation types showed that most of the mutations were deletions only, with deletions ranging from 2 to 26 bases. Specifically, mutant MMB-10 from MMB showed a deletion of 11 bases, mutant TKB-21 from TKB showed a deletion of 26 bases, including an intron splice site (GT), while mutant DR462-9 from DR462 showed a deletion of two bases (Figure 5).
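The four genotype classes can be told apart by comparing the two alleles of a plant with the wild-type target sequence. The sketch below illustrates that logic only; the function name and the toy sequences are hypothetical and are not the actual *Wx* target sequences or the authors' genotyping pipeline.

```python
def classify_genotype(allele_1: str, allele_2: str, wild_type: str) -> str:
    """Classify a diploid plant from its two allele sequences at the target site."""
    edited_1 = allele_1 != wild_type
    edited_2 = allele_2 != wild_type
    if not edited_1 and not edited_2:
        return "wild type"
    if edited_1 != edited_2:
        # Exactly one allele carries a mutation.
        return "heterozygous"
    # Both alleles are mutated: identical edits are homozygous, different edits bi-allelic.
    return "homozygous" if allele_1 == allele_2 else "bi-allelic"


# Toy sequences for illustration only.
WT = "ACGTTGCA"
print(classify_genotype("ACGTGCA", "ACGTGCA", WT))  # same 1-bp deletion on both alleles
print(classify_genotype("ACGTGCA", "ACTTGCA", WT))  # two different deletions
print(classify_genotype("ACGTGCA", WT, WT))         # one edited allele
```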



**Table 1.** Analysis of mutation types in the T<sup>0</sup> generation.

**Figure 5.** Target mutation types of three homozygous mutants in T<sup>0</sup> generation. M-10: Mutant MMB-10 from MMB; T-21: TKB-21 from TKB; D-9: DR462-9 from DR462; Green box: PAM; \*: missing base.

#### *2.4. Screening of T-DNA-Free Plants in T<sup>1</sup> Generations*

The homozygous mutant plants MMB-10, TKB-21, and DR462-9 of the T<sup>0</sup> generation were selfed to develop the T<sup>1</sup> lines for T-DNA-free detection. The results showed that seven out of 20 T<sup>1</sup> plants from the line MMB-10 did not carry exogenous T-DNA fragments, all of which were homozygous by sequencing. Five out of 20 T<sup>1</sup> plants from TKB-21 did not carry exogenous T-DNA fragments, three of which were homozygous by sequencing. Four out of 20 T<sup>1</sup> plants from DR462-9 did not carry exogenous T-DNA fragments, all of which were homozygous by sequencing. Subsequently, these T-DNA-free homozygous mutants from MMB-10, TKB-21, and DR462-9 were grown and managed under greenhouse conditions (22 °C~25 °C) for harvesting the T<sup>2</sup> seeds of each line.

#### *2.5. Quantitative Analysis of Wx Gene Expression in Mutant Lines in the T<sup>2</sup> Generation*

The expression of the *Wx* gene in selected homozygous mutant lines from the T<sup>2</sup> generation was detected by qRT-PCR, including five lines from MMB, three lines from TKB, and four lines from DR462. The results showed that the expression of the *Wx* gene in the mutant lines was significantly decreased compared with the parents MMB, TKB, and DR462, indicating that the transcription efficiency of the *Wx* gene was inhibited (Figure 6).
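Relative expression from qRT-PCR is commonly calculated with the 2<sup>−ΔΔCt</sup> method. This passage does not state which calculation was used, so the sketch below is an assumption for illustration: the function name, the use of a single reference gene, and all Ct values are hypothetical.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target gene by the 2^-ddCt method.

    'sample' is the edited line, 'control' is the wild type; the reference
    gene is an internal control used for normalization.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies two cycles later in the edited line.
fold_change = relative_expression(24.0, 18.0, 22.0, 18.0)
print(f"Wx expression relative to wild type: {fold_change:.2f}")
```

A fold change below 1 indicates reduced expression relative to the wild type, which is the direction reported for the mutant lines in Figure 6.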

**Figure 6.** Relative expression level of *Wx* gene in mutant lines in T<sup>2</sup> generation and their respective wild type. (**A**): MMB; (**B**): TKB; (**C**): DR462; In the mutant lines, M, T, and R are short for MMB, TKB, and DR462.

#### *2.6. Determination of AC in Mutant Lines in the T<sup>2</sup> Generation*

In the wild types MMB, TKB, and DR462, the AC was 27.8%, 26.0%, and 25.4%, respectively. By comparison, the AC in the homozygous mutant lines decreased significantly (Figure 7). Among them, MMB-10-2, TKB-21-12, and DR462-9-9 had an AC of 17.2%, 16.8%, and 17.8%, respectively, reaching the first-grade high-quality standard concerning AC (13~18%) [18]. The results indicated that gene editing could effectively reduce the AC, so as to obtain lines with an ideal AC.
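The size of the AC reduction in each edited line, and whether the result falls within the 13~18% range cited above, can be worked out from the reported values. The following sketch does only that arithmetic; the names are illustrative, and the relative reductions it prints are derived here rather than reported by the authors.

```python
# AC values (%) reported in the text: (wild type, edited T2 line).
ac_values = {
    "MMB-10-2": (27.8, 17.2),
    "TKB-21-12": (26.0, 16.8),
    "DR462-9-9": (25.4, 17.8),
}
GRADE1_RANGE = (13.0, 18.0)  # first-grade AC range cited in the text


def summarize(wild_type_ac: float, edited_ac: float) -> dict:
    """Absolute and relative AC reduction, plus a check against the cited range."""
    drop = wild_type_ac - edited_ac
    return {
        "absolute_drop": round(drop, 1),
        "relative_drop_percent": round(100 * drop / wild_type_ac, 1),
        "within_grade1": GRADE1_RANGE[0] <= edited_ac <= GRADE1_RANGE[1],
    }


summary = {line: summarize(*values) for line, values in ac_values.items()}
for line, result in summary.items():
    print(line, result)
```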

**Figure 7.** Amylose content in T<sup>2</sup> homozygous mutant lines and their respective wild types (**A**): MMB; (**B**): TKB; (**C**): DR462; In the mutant lines, M, T, and R are short for MMB, TKB, and DR462.

#### *2.7. Determination of Gel Consistency (GC) in Mutant Lines in the T<sup>2</sup> Generation*

Homozygous mutant lines MMB-10-2, TKB-21-12, and DR462-9-9 showed a GC of 78.6 mm, 77.4 mm, and 79.6 mm, while their parents MMB, TKB, and DR462 showed a GC of 60.2 mm, 61.2 mm, and 61.4 mm, respectively, in the analysis. The results indicated that the GC in the mutant lines was significantly increased compared with their respective wild type, and far exceeded the first-grade high-quality standard concerning GC (65 mm) [19] (Table 2).

**Table 2.** GC in partial T<sup>2</sup> mutant lines and wild type.


Note: \*\* *p* < 0.01 derived from one-way ANOVA with LSD test.

### *2.8. Analysis of RVA Profile in Mutant Lines in the T<sup>2</sup> Generation*

The RVA analysis results of the edited lines showed that, in comparison with the parent MMB, the mutant line MMB-10-2 showed an increased breakdown viscosity (BDV) (994.0 cP from 355.0 cP), decreased setback viscosity (SBV) (1366.3 cP from 2353.6 cP), decreased consistency viscosity (CSV) (2360.3 cP from 2708.6 cP), and decreased viscosity index (Table 3; Figure 8A). In comparison with TKB, TKB-21-12 showed increased BDV (1533.6 cP from 641.3 cP), decreased SBV (729.6 cP from 1978.0 cP), decreased CSV (2263.3 cP from 2619.3 cP), and decreased viscosity index (Table 3; Figure 8B). In comparison with DR462, DR462-9-9 showed increased BDV (1293.3 cP from 508.7 cP), increased CSV (2738.0 cP from 1966.3 cP), and increased viscosity index, while maintaining an equivalent level in SBV (1444.6 cP from 1457.6 cP) (Table 3; Figure 8C).

**Table 3.** RVA profile characteristics of rice starch mutant lines in the T<sup>2</sup> generation and their respective wild types.


Note: PKV: peak viscosity; BDV: breakdown viscosity; SBV: setback viscosity; CSV: consistency viscosity; HPV: hot paste viscosity; CPV: cool paste viscosity.
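The derived RVA indices are conventionally computed from the three primary viscosities as BDV = PKV − HPV, SBV = CPV − PKV, and CSV = CPV − HPV. These definitions are not spelled out in this section, so they are stated here as an assumption. Under them, SBV = CSV − BDV, which gives a quick consistency check on reported values. The sketch below applies that check to the MMB and TKB values quoted in the text; the function names and tolerance are illustrative.

```python
def derive_indices(pkv: float, hpv: float, cpv: float) -> dict:
    """Derived RVA indices from peak, hot paste, and cool paste viscosity (assumed definitions)."""
    return {"BDV": pkv - hpv, "SBV": cpv - pkv, "CSV": cpv - hpv}


# Values (cP) quoted in the text for two wild types and their edited lines.
reported = {
    "MMB": {"BDV": 355.0, "SBV": 2353.6, "CSV": 2708.6},
    "MMB-10-2": {"BDV": 994.0, "SBV": 1366.3, "CSV": 2360.3},
    "TKB": {"BDV": 641.3, "SBV": 1978.0, "CSV": 2619.3},
    "TKB-21-12": {"BDV": 1533.6, "SBV": 729.6, "CSV": 2263.3},
}


def is_consistent(values: dict, tolerance: float = 0.15) -> bool:
    """Check SBV == CSV - BDV, allowing for rounding to one decimal place."""
    return abs((values["CSV"] - values["BDV"]) - values["SBV"]) <= tolerance


checks = {line: is_consistent(values) for line, values in reported.items()}
for line, ok in checks.items():
    print(line, "consistent" if ok else "inconsistent")
```

The same identity can be used to tell which index a reported number belongs to when a label is ambiguous.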

**Figure 8.** Comparison of RVA profiles between mutant lines in the T<sup>2</sup> generation and their respective wild types. (**A**): MMB; (**B**): TKB; (**C**): DR462.


Previous studies have demonstrated that BDV, SBV, and CSV are closely related to the hardness and viscosity of rice. Rice with higher BDV, lower SBV, and lower CSV appeared to be softer and more elastic [18]. The results indicated that the ECQ of the homozygous mutant lines was obviously improved as compared to the wild types.

### *2.9. Analysis of the Main Agronomic Traits in Mutant Lines in the T<sup>2</sup> Generation*

To evaluate the selected mutant lines, we surveyed six main agronomic traits of MMB-10-2, TKB-21-12, and DR462-9-9 and their respective wild types. The results revealed that there was no significant difference between the mutant lines and their respective wild types, regarding traits including plant height, 1000-grain weight, panicle length, grain number per panicle, grain set rate, and effective panicle number (Table 4; Figure 9A,C,E). The results indicated that the mutations in the *Wx* genes do not change other agronomic traits. In addition, the grain appearance and chalkiness were observed and compared. The grain of MMB-10-2 and TKB-21-12 was not significantly different from the wild type in chalkiness, while they had a slight decrease in transparency and an increase in whiteness. The grain of DR462-9-9 had a decrease in chalkiness and chalky grain rate, and an increase in whiteness (Figure 9B,D,F).

**Table 4.** Agronomic traits of the mutant lines in the T<sup>2</sup> generation.



**Figure 9.** Comparison of plant type and polished rice appearance between mutant lines in the T<sup>2</sup> generation and their respective wild types. (**A**,**B**): Comparisons between MMB and MMB-10-2; (**C**,**D**): Comparisons between TKB and TKB-21-12; (**E**,**F**): Comparisons between DR462 and DR462-9-9.

#### **3. Discussion**

#### *3.1. 5′UTR Region Regulates the Expression of Wx Gene*

Previous studies have shown that mutations such as base deletions and insertions in the non-coding region of the *Wx* gene affect its expression level and, in turn, the AC in the endosperm [20,21]. The present study reached a consistent conclusion. When the 5′UISS of the *Wx* gene was edited, the AC in the three elite indica rice lines was downregulated significantly (Figure 6). Different edits of the *Wx* gene have different effects on AC. For example, MMB-10 had a deletion of 11 bp, producing the progeny line MMB-10-2 with an AC of 17.2%; TKB-21 had a deletion of 26 bp, including a splice site (GT), creating the progeny line TKB-21-12 with an AC of 16.8%. In addition, mutations near the 5′UISS also have a significant effect on the expression of *Wx*. For example, DR462-9 had a deletion of 2 bp near the 5′UISS of the *Wx* gene, which led to a change in mRNA conformation and affected the correct splicing of introns, resulting in a decrease of AC from 25.4% in the parent DR462 to 17.8% in the progeny line DR462-9-9. These results indicate that the 5′UISS plays an important role in regulating the *Wx* gene.

The 5′UTR region of the *Wx* gene has been examined in several earlier studies. Cai (1997, 2000) found a regulation-related intron in the 5′UTR region of the *Wx* gene in Indica Rice 232 and further found that 16 base mutations in the first intron of the *Wx* gene can affect the function and efficiency of its splice site, resulting in a decrease in the expression level of mature mRNA, thus reducing AC [22,23]. Cheng (2001) found that the natural G→T mutation at the splice site at the 5′ end of the first intron of the *Wx* gene was the main reason for the decline of the *Wx* gene expression level and AC in rice varieties with medium and low AC [24]. In a study of highland barley starch synthesis, Li (2015) found a large number of polymorphic sites in the 5′UTR region of the *Wx* gene, including large insertions and deletions. A deletion of about 400 bp in the 5′UTR region reduces gene expression, resulting in a reduction of amylose content and affecting the physicochemical and functional characteristics of starch [25].

#### *3.2. Application of CRISPR/Cas9 System in Wx Gene Editing*

Most studies using the CRISPR/Cas9 system to edit the *Wx* gene of rice directly turn the varieties into waxy ones [26]. Fan (2019) reduced the AC of variety Xiushui 134 from 19.78% to a glutinous rice level (AC < 2%) through gene editing [27]. Wang (2019) knocked out the *Wx* gene of rice line 209B and successfully obtained the waxy maintainer *Wx* 209B with a lower AC. Then, the waxy male sterile line *Wx* 209A was developed using *Wx* 209B as male parent and 209A as female parent [28]. The above results show that editing the coding region of the *Wx* gene in rice will "kill" the gene, resulting in mutant lines with a very low AC and glutinous rice with a high viscosity, which does not meet the requirement of consumers for soft rice quality. Therefore, accurately reducing AC and improving GC are the essence of creating germplasm resources for soft rice breeding. Zeng (2020) edited the 5′UISS locus in the *Wx*<sup>b</sup> 5′UTR region of japonica rice material and obtained new germplasms with soft rice traits and a very low AC of 9.8~11.5% [29]. Similarly, this study chose the 5′UISS as the target in three indica rice lines with high AC, and obtained homozygous mutant lines with an AC between 16.8% and 17.8%. The AC of the mutant lines obtained in this study is higher than that of Zeng (2020) and more suitable for consumers. The reasons for this may be that (1) fewer bases were deleted, so that mRNA splicing efficiency was reduced less; and (2) nucleotide polymorphisms exist between *Wx*<sup>a</sup> in indica rice and *Wx*<sup>b</sup> in japonica rice, resulting in differences in the stability of mRNA splicing efficiency [30]. Even so, the three mutant lines in the T<sup>2</sup> generation obtained in this study, MMB-10-2, TKB-21-12, and DR462-9-9, had an AC of 17.2%, 16.8%, and 17.8%, reaching the first-grade high-quality standard concerning AC (13~18%).

#### *3.3. Effects on Agronomic Traits of Plants When Editing the 5′UTR of Wx Gene*

Gene editing may change the promoter sequence of the target gene, thereby affecting the agronomic traits of the plants [31,32]. Wang (2021) chose a target in the 5′UTR of the *Wx* gene of japonica rice variety Jiahua 1, and the mutant lines differed significantly from the wild type in their main agronomic traits, including a higher plant height, increased 1000-grain weight, shortened panicle length, and reduced seed setting rate [33]. In this study, the selected mutant lines in the T<sub>2</sub> generation showed no significant differences in their main agronomic traits compared with the wild type and retained the excellent traits of the wild type. The whiteness of the endosperm in mutant lines increased, which may have been due to the changes in grain appearance caused by the decrease of AC and the increase of GC (Figure 9).

#### *3.4. Advantages of CRISPR/Cas9 Technology in Improving ECQ of Rice*

In traditional breeding, continuous backcrossing is usually used to introduce the *Wx* gene into rice varieties to improve the ECQ of rice. However, this is time-consuming, and it is difficult to break the linkage with undesirable traits. Subsequently, RNAi, antisense RNA, and base/gene editing technologies were gradually developed to improve the quality of rice starch [34]. Zhao (2007) used RNAi technology to interfere with the *GBSSII*, *Sbe3*, *SSI*, *SSII*, *Wx*, *PUL*, and *ISA* genes in rice, and created intermediate breeding materials with different AC, GC, and GT. Analysis of the ECQ of the transgenic plants showed that the AC of all mutants decreased to various degrees, in most cases significantly [35]. Terada (2000) used antisense RNA technology to introduce the antisense *Wx* gene into japonica and indica rice, and obtained transgenic lines with significantly reduced AC [36]. Mao (2013) used the CRISPR/Cas9 system to conduct gene editing research on Arabidopsis and rice genes [37]. Today, precise genome modifications with the CRISPR/Cas9 tool have revolutionized genome editing research [38]. It is a more efficient, accurate, easier, and cost-effective technique for achieving gene knock-out in a cell [39] and has made preliminary progress in the improvement of rice quality, especially in research on *Wx* gene editing.

In this study, the CRISPR/Cas9 system was used to edit the 5′UISS of the *Wx* gene in indica rice lines MMB, TKB, and DR462. Sequencing of the target site in the three materials was consistent with the designed target (Figure 1), and editing was efficient and accurate in all of them. This appears unattainable with traditional breeding and other gene editing technologies. In addition, the mutant lines in the T<sub>2</sub> generation obtained in this study had decreased AC, increased GC, increased BDV, decreased SBV, and decreased CSV, without significant changes to their main agronomic traits. These results show that the desirable traits can be stably inherited by the selfed offspring, indicating that CRISPR/Cas9 technology provides an effective strategy for improving the ECQ of rice.

#### **4. Materials and Methods**

#### *4.1. Plant Materials and Growth Conditions*

Three elite indica lines with high levels of AC were selected: the maintainers Meimeng B (MMB) and Tiankang B (TKB), and the restorer DR462. The seeds were supplied by the State Key Laboratory of Conservation and Utilization of Subtropical Agri-bioresources, Guangxi University. All rice lines were grown in a mesh room with isolation conditions in Nanning, China, during the normal rice-growing seasons, and treated according to the conventional planting management method.

#### *4.2. Construction of CRISPR/Cas9 Vectors and Screening of Homozygous Mutants*

The CRISPR/Cas9 expression vector for targeted editing was constructed as described in a previous study [16]. Vectors and promoters pYLCRISPR/Cas9Pubi-H and pYLsgRNA-LzU6a were provided by the State Key Laboratory of Conservation and Utilization of Subtropical Agri-bioresources, South China Agricultural University. The bacterial strain *E. coli* DH5α for plasmid propagation and preservation and *Agrobacterium* EHA105 for genetic transformation were provided by the State Key Laboratory of Conservation and Utilization of Subtropical Agri-bioresources, Guangxi University.

Based on the *Wx* (*LOC\_Os06g04200*) genomic sequence provided by the China Rice Data Center [40], the target sites in the *Wx* gene were designed via the CRISPR-GE [41] online toolkit and cloned into sgRNA, as previously described [16]. The constructs were respectively introduced into MMB, TKB, and DR462 by *Agrobacterium*-mediated transformation. The genomic DNA of transformed plants was extracted from young leaves using the CTAB method [42]. The target sites of the T<sub>0</sub>, T<sub>1</sub>, and T<sub>2</sub> generations were sequenced using the Sanger method [43]. The PCR products were detected using agarose gel electrophoresis. The homozygous *Wx* mutants without the T-DNA insertion were screened and selected for further experimental analysis. The primers used are listed in Table 5.
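As an illustration of what target-site design involves, the sketch below scans a DNA sequence for 20-nt protospacers followed by an NGG PAM on both strands. It is a minimal sketch only: it is not the CRISPR-GE algorithm, it does no off-target scoring, and the function names and the demo sequence are hypothetical.

```python
# Minimal sketch of NGG-PAM target scanning (assumption: SpCas9-style
# 20-nt protospacer + NGG PAM). Not the CRISPR-GE implementation.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_ngg_targets(seq, spacer_len=20):
    """List (strand, start, protospacer, pam) for every NGG PAM site.

    For the "-" strand, `start` is the offset within the
    reverse-complemented sequence, not within the input.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # i is the first base of the PAM; the protospacer sits just upstream.
        for i in range(spacer_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                hits.append((strand, i - spacer_len, s[i - spacer_len:i], pam))
    return hits


# Hypothetical demo sequence: an arbitrary 20-mer, a TGG PAM, and filler.
demo = "ACGTACGTACGTACGTACGT" + "TGG" + "AAAA"
for hit in find_ngg_targets(demo):
    print(hit)
```

In practice, candidate sites found this way would still need to be ranked for specificity against the whole genome, which is the part a dedicated toolkit handles.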

**Table 5.** Primer sequences used in this study.

| Primer Name | Primer Sequence 5′-3′ |
| --- | --- |
| *Wx*-text-F | TCCGCCACGGGTTCCAG |
| *Wx*-text-R | CTCCTACCTCAGCCACAACG |
| U-F | CTCCGTTTTACCTGTGGAATCG |
| gR-R | CGGAGGAAAATTCCATCCAC |
| *Wx*-U6a-1F | TGTGTGCTTACAGCCATGGCGTTTTAGAGCTAGAAAT |
| *Wx*-U6a-1R | GCCATGGCTGTAAGCACACACGGCAGCCAAGCCAGCA |
| Pps-GGL | TTCAGAGGTCTCTCTCGACTAGTATGGAATCGGCAGCAAAGG |
| Pgs-GGR | AGCGTGGGTCTCGACCGACGCGTATCCATCCACTCCAAGCTC |
| PB-R | GCGCGCGGTCTCTACCGACGCGTATCC |
| PB-L | GCGCGCgGTCTCGCTCGACTAGTATGG |
| HPT-F | ATTTGTGTACGCCCGACAGT |
| HPT-R | GTGCTTGACATTGGGGAGTT |
| CAS9-F | CTGACGCTAACCTCGACAAG |
| CAS9-R | CCGATCTAGTAACATAGATGACACC |

Note: ACTAGT and ACGCGT are *Spe*I and *Mlu*I sites, respectively.

#### *4.3. Quantitative Real-Time PCR (qRT-PCR) Expression Analysis*

Total RNA was extracted from rice caryopses (with glumes removed) 10 days after flowering (DAF) using an RNAplant Plus Reagent kit (Tiangen, Beijing, China). First-strand cDNA was synthesized using a PrimeScript RT reagent kit (Takara, Tokyo, Japan), and qRT-PCR was performed on a CFX Connect Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA) using ChamQ SYBR qPCR Master Mix (Vazyme, Nanjing, China). The *Actin* gene was used for normalization, and each experiment included three biological replicates.

#### *4.4. Measuring Rice Grain Physicochemical Properties*

AC, GC, and RVA were measured as described in the national agricultural industry standard (NY/T83-2017) [44]. In brief, AC was determined using a dual-wavelength spectrophotometric method, by drawing the reference wavelength analysis curve for amylose and amylopectin (Figure 10A) and the standard curve for amylose (Figure 10B).

GC was determined by measuring the length of the rice gel after gelatinization and cooling. The RVA spectrum was measured using a TecMaster RVA instrument (Perten, Sweden). The primary RVA parameters included PKV, BDV, SBV, CSV, HPV, and CPV. All tests were performed in triplicate.

**Figure 10.** Drawing the standard curve of amylose. (**A**): Preparation analysis of amylose and amylopectin (**B**): Standard curve of amylose.

#### *4.5. Agronomic Trait Investigation*

To detect the agronomic traits of homozygous mutants in the T<sub>2</sub> generation, together with the corresponding wild type, 20 plants from each line were subjected to agronomic trait measurement at the maturation stage. The traits included plant height, thousand-grain weight, panicle length, grains per panicle, seed setting rate, and chalkiness degree. Field management followed normal agronomic practices.

#### *4.6. Statistical Analysis*

At least three replicates were performed for each experiment. Statistical and graphical analyses were performed using Excel 2016, GraphPad Prism 9, and Photoshop 7.0. All data were expressed as means ± standard deviations (means ± SD). One-way analysis of variance (ANOVA) was used to determine the level of significance (\* and \*\* indicate significant differences at *p* < 0.05 and *p* < 0.01, respectively). Different lower-case letters indicate statistically significant differences at *p* < 0.05.
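The calculations named above can be sketched in a few lines. The following is a minimal, standard-library illustration of mean ± SD and the one-way ANOVA *F* statistic. The study itself used Excel and GraphPad Prism, so this is not the authors' workflow, and the group values are hypothetical placeholders rather than measured data.

```python
# Minimal sketch: mean ± SD and the one-way ANOVA F statistic.
# The numbers below are hypothetical and are NOT data from this study.
from statistics import mean, stdev


def mean_sd(values):
    """Return (mean, sample standard deviation) of one group."""
    return mean(values), stdev(values)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
for g in groups:
    m, sd = mean_sd(g)
    print(f"{m:.2f} ± {sd:.2f}")
print(one_way_anova_f(groups))
```

Converting *F* to a *p* value requires the *F* distribution with the two degrees of freedom returned here, which a statistics package provides.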

#### **5. Conclusions**

In this study, the elite indica rice varieties MMB, TKB, and DR462 were chosen as experimental materials, and the 5′UISS in the 5′UTR of the *Wx* gene was edited by CRISPR/Cas9 gene editing technology. Screening was carried out to obtain homozygous gene-edited lines, and the relevant phenotypes of these lines were evaluated.


**Author Contributions:** Conceptualization, J.Y.; methodology, F.L., B.Q. and R.L.; validation, X.G. and X.W.; investigation, X.G. and Y.F.; resources, R.L.; writing—original draft preparation, J.Y.; reviews and revision, R.L.; funding acquisition, R.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Guangxi Zhuang Autonomous Region Science and Technology Department, grant numbers AB16380066 and AA17204070.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


*Article*

## *GLW7.1***, a Strong Functional Allele of** *Ghd7***, Enhances Grain Size in Rice**

**Rongjia Liu, Qinfei Feng, Pingbo Li, Guangming Lou, Guowei Chen, Haichao Jiang, Guanjun Gao, Qinglu Zhang, Jinghua Xiao, Xianghua Li, Lizhong Xiong and Yuqing He \***

> National Key Laboratory of Crop Genetic Improvement and National Centre of Plant Gene Research, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China

**\*** Correspondence: yqhe@mail.hzau.edu.cn

**Abstract:** Grain size is a key determinant of both grain weight and grain quality. Here, we report the map-based cloning of a novel quantitative trait locus (QTL), *GLW7.1* (*Grain Length*, *Width and Weight 7.1*), which encodes the CCT motif family protein, GHD7. The QTL is located in a 53 kb deletion fragment in the cultivar Jin23B, compared with the cultivar CR071. Scanning electron microscopy analysis and expression analysis revealed that *GLW7.1* promotes the transcription of several cell division and expansion genes, further resulting in a larger cell size and increased cell number, and finally enhancing the grain size as well as grain weight. *GLW7.1* could also increase endogenous GA content by up-regulating the expression of GA biosynthesis genes. Yeast two-hybrid assays and split firefly luciferase complementation assays revealed the interactions of GHD7 with seven grain-size-related proteins and the rice DELLA protein SLR1. Haplotype analysis and transcription activation assay revealed the effect of six amino acid substitutions on GHD7 activation activity. Additionally, the NIL with *GLW7.1* showed reduced chalkiness and improved cooking and eating quality. These findings provide a new insight into the role of *Ghd7* and confirm the great potential of the *GLW7.1* allele in simultaneously improving grain yield and quality.

**Keywords:** *GLW7.1*; GHD7; grain size; quality; rice

#### **1. Introduction**

Rice (*Oryza sativa* L.) is the most important staple food crop in the world and feeds more than half of the world's population [1]. Therefore, to meet the food needs of a rapidly growing global population, increasing rice grain yield has been a major breeding goal. Rice yield is mainly determined by three major components: grain weight, number of grains per panicle and number of effective tillers per plant [2]. Among them, grain weight is largely determined by grain size, which includes grain length, width, and thickness [3]. In recent decades, many quantitative trait loci (QTLs)/genes regulating grain size have been isolated and shown to participate in multiple signaling pathways, including the G-protein signaling pathway, the ubiquitin–proteasome pathway, mitogen-activated protein kinase (MAPK) signaling pathway, phytohormone signaling and homeostasis, and transcriptional regulators [4].

G proteins are guanine nucleotide-binding trimeric proteins consisting of Gα, Gβ and Gγ subunits and regulate many biological processes. *GRAIN SIZE 3* (*GS3*), encoding an atypical Gγ protein, is the first identified major QTL negatively regulating grain size [5]. DENSE AND ERECT PANICLE 1 (DEP1), another atypical Gγ protein, positively regulates grain size by competitively binding to Gβ (RGB1) with GS3 [6]. Other G-proteins, including conventional Gγ proteins (RGG1 and RGG2), atypical Gγ protein (GGC2), Gα protein (RGA1) and Gβ protein (RGB1) could also control grain size [6–8]. In addition, OsMADS1, a MADS-domain transcription factor, negatively regulates grain length through directly interacting with GS3 and DEP1 [9].

**Citation:** Liu, R.; Feng, Q.; Li, P.; Lou, G.; Chen, G.; Jiang, H.; Gao, G.; Zhang, Q.; Xiao, J.; Li, X.; et al. *GLW7.1*, a Strong Functional Allele of *Ghd7*, Enhances Grain Size in Rice. *Int. J. Mol. Sci.* **2022**, *23*, 8715. https://doi.org/10.3390/ijms23158715

Academic Editors: Jinsong Bao and Jianhong Xu

Received: 9 July 2022 Accepted: 2 August 2022 Published: 5 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Ubiquitination and deubiquitination are two opposite protein modifications which are involved in the regulation of rice grain size. GRAIN WIDTH 2 (GW2), a RING-type E3 ubiquitin ligase, negatively regulates grain width by ubiquitinating WG1 and targeting it for degradation via the 26S proteasome pathway [10]. CHANG LI GENG 1 (CLG1), another RING-type E3 ubiquitin ligase, positively regulates grain length by ubiquitinating GS3 and targeting it for degradation via the endosome pathway [11]. *WIDE AND THICK GRAIN 1* (*WTG1*), which encodes an otubain-like protease with deubiquitination activity, controls grain size and shape mainly by affecting cell expansion in the spikelet hull [12]. *LARGE GRAIN 1* (*LG1*), which encodes a constitutively expressed ubiquitin-specific protease15 (OsUBP15) with deubiquitination activity, positively regulates grain width and size [13].

The OsMKKK10-OsMKK4-OsMPK6 cascade has been revealed to positively regulate grain size by promoting cell proliferation in spikelet hulls [14–16]. GRAIN SIZE AND NUMBER 1 (GSN1)/OsMKP1, a MAPK phosphatase, negatively regulates grain size by directly interacting with OsMPK6 and inactivating it through dephosphorylation [17]. In addition, the upstream gene *OsER1*, which encodes a receptor-like protein kinase, and the downstream transcription factor *OsWRKY53*, could both positively regulate grain size through the MAPK signaling cascade [18,19].

Phytohormones play various roles in plant growth and development, stress responses, and metabolism. Several genes controlling grain size have been reported to be involved in the brassinosteroid (BR) signaling pathway, such as *GRAIN WIDTH 5* (*GW5*) [20], *GRAIN LENGTH 2* (*GL2*) [21,22] and *GRAIN LENGTH 3.1* (*GL3.1*) [23]. Another set of genes regulate grain size through the auxin signaling pathway, such as *THOUSAND GRAIN WEIGHT 6* (*TGW6*) [24], *BIG GRAIN 1* (*BG1*) [25] and *THOUSAND GRAIN WEIGHT 3* (*TGW3*) [26]. Moreover, some gibberellic acid (GA)-signaling-pathway-related genes also regulate grain size, such as *GIBBERELLIN-DEFICIENT DWARF 1* (*GDD1*) [27], *SMALL AND ROUND SEED 3* (*SRS3*) [28] and *SMALL GRAIN AND DWARF 2* (*SGD2*) [29].

Many transcription factors participate in the regulation of grain size, including the SQUAMOSA promoter binding protein-like (SPL) family (*GRAIN LENGTH AND WEIGHT 7* (*GLW7*)/*OsSPL13*, *GRAIN WIDTH 8* (*GW8*)/*OsSPL16*, *OsSPL18*) [30–32], the basic helix–loop–helix (bHLH) family (*Awn-1* (*An-1*), *OsbHLH079*, *OsbHLH107*) [33–35], APETALA2 type (AP2) transcription factors (*SMALL ORGAN SIZE1* (*SMOS1*), *SUPERNUMERARY BRACT* (*SNB*), *FRIZZY PANICLE* (*FZP*)) [36–38], and other transcription factors (*GRAIN SHAPE 9* (*GS9*), *SHORT GRAIN6* (*SG6*), *GRAIN LENGTH 4* (*GL4*)) [39–41].

Although many QTLs/genes regulating grain size have been identified, the understanding of grain size regulation is still fragmented. In this study, we report the mapping, cloning and initial characterization of a novel grain size QTL, *GLW7.1* (*Grain Length, Width and Weight 7.1*) in rice, which encodes the CCT (CONSTANS, CONSTANS-LIKE, and TIMING OF CHLOROPHYLL A/B BINDING1) motif family protein, GHD7. *Grain number, plant height, and heading date 7* (*Ghd7*) was first reported as a major regulator of heading date that improves yield by increasing grain number [42]. Subsequent studies revealed that it participates in a variety of other developmental processes, such as stress responses, seed germination and nitrogen utilization [43–45]. Here, we performed scanning electron microscopic analysis, yeast two-hybrid assays, split firefly luciferase complementation (SFLC) assays and expression analysis to uncover the mechanism by which *GLW7.1* regulates grain size. We also conducted haplotype analysis of *Ghd7* and a transcription activation assay to uncover the reason for the different effects of three allelic GHD7 proteins. Our results provide insights into the role of *Ghd7* in regulating grain size and the effect of different amino acid substitutions on the transcriptional activation activity of GHD7 proteins, and we provide a promising *Ghd7* allele for breeding rice with high yield and superior quality.

#### **2. Results**

#### *2.1. Identification of GLW7.1*

To identify novel QTLs associated with grain size (Figure 1A), we selected two *indica* varieties, Jin23B (hereafter J23B) and CR071, that showed significant differences in grain size (Figure 1B,C) and constructed a set of 238 BC<sub>3</sub>F<sub>1</sub> lines in the J23B background (Figure S1). Three QTLs were revealed by a subsequent QTL analysis, among which the QTL located between SSR markers RM501 and RM542 on chromosome 7 was the most significant contributor to grain length (Figure 1A, Table S1). To further evaluate the genetic effect of this QTL, we developed a near-isogenic line (NIL) in the genetic background of J23B (Figure S1). Genetic analysis of BC<sub>4</sub>F<sub>2</sub> progenies derived from the NIL in 2015 showed that the dominant allele from CR071 could increase grain length, grain width and grain weight (Figure S2A–C). A similar genetic effect was further confirmed by BC<sub>5</sub>F<sub>3</sub> progenies in 2017 (Figure S2D,F). Thus, we designated this QTL as *Grain Length*, *Width and Weight 7.1* (*GLW7.1*).

**Figure 1.** Field trial of *GLW7.1* NIL lines. (**A**) Primary mapping of QTLs for grain length using J23B/CR071 BC<sub>3</sub>F<sub>1</sub> population (*n* = 238). The star symbol indicates *GLW7.1* locus. (**B**,**C**) Grain morphology. Scale bar: 5 mm. (**D**) The gross morphology of NIL plants. Scale bar: 10 cm. (**E**) Grain length. (**F**) Grain width. (**G**) Length to width ratio. (**H**) 1000-grain weight. (**I**) Plant height. (**J**) Number of tillers per plant. (**K**) Number of filled grains per panicle. (**L**) Grain yield per plant. All phenotypic data in (**E**–**L**) were measured from paddy-grown NIL plants grown under normal cultivation conditions. Data are represented as mean ± s.e.m. (*n* = 15). The Student's *t*-test was used to produce *p* values.

#### *2.2. Characterization of GLW7.1*

Two NIL plants carrying the homozygous J23B allele *glw7.1* and CR071 allele *GLW7.1* were developed and named NIL-J and NIL-C, respectively. Compared to NIL-J, NIL-C displayed a higher grain length (increased by 8%) (Figure 1B,E) and grain width (increased by 5%) (Figure 1C,F), leading to an increase in length-to-width ratio by 2% (Figure 1G) and 1000-grain weight by 22% (Figure 1H). The plant height of NIL-C was about 33 cm higher than that of NIL-J (Figure 1D,I), but no difference was observed in tiller numbers per plant (Figure 1J). Meanwhile, NIL-C displayed more filled grains per panicle (increased by 80%) than NIL-J (Figure 1K). Thus, the increase in grain weight and grain number contributed to the increase in grain yield per plant by 114% in NIL-C, in comparison with NIL-J (Figure 1L).
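A rough consistency check of these percentages can be sketched as follows. It assumes, purely for illustration, that grain yield per plant behaves as the product of tiller number, filled grains per panicle and single-grain weight, with tiller number unchanged. The multiplicative model and the function name are assumptions, not part of the reported analysis.

```python
# Minimal sketch: combine component gains multiplicatively.
# Assumption: yield per plant ~ tillers x filled grains x grain weight.


def combined_gain(*percent_gains):
    """Return the overall percentage gain from independent component gains."""
    factor = 1.0
    for gain in percent_gains:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


# Reported gains in NIL-C: +22% 1000-grain weight, +80% filled grains.
expected = combined_gain(22, 80)
print(f"expected yield gain under the product model: {expected:.1f}%")
```

Under this simplified model the two component gains together imply a yield increase of roughly 120%, the same order as the 114% measured. Exact agreement is not expected, because each trait mean was estimated from sampled plants.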

The dominant *GLW7.1* locus, with its yield-increasing effect, is advantageous in hybrid rice breeding. Because a simultaneous increase in rice grain length and width is usually accompanied by a decrease in rice quality [46,47], we further evaluated the prospects of *GLW7.1* in rice breeding by examining rice quality traits among NILs, including percentage of grains with chalkiness, amylose content, gel consistency, and taste value. Surprisingly, a significant reduction in grain chalkiness and a large improvement in taste score were observed in NIL-C plants (Figure 2A,B,E), accompanied by a significant increase in amylose content and gel consistency (Figure 2C,D). These results demonstrate that the *GLW7.1* allele from CR071 is a pleiotropic gene conferring high yield and superior quality.

**Figure 2.** The effects of the *GLW7.1* allele on the physicochemical characteristics of milled rice. (**A**) Comparisons of chalkiness and endosperm transparency of milled rice between the *GLW7.1* NILs (*n* = 100). Scale bar: 1 cm. (**B**) Percentage of grains with chalkiness. (**C**) Amylose content. (**D**) Gel consistency. (**E**) Taste score. All phenotypic data in (**B**–**E**) were measured from paddy-grown NIL plants grown under normal cultivation conditions. Data are represented as mean ± s.e.m. (*n* = 10). The Student's *t*-test was used to produce *p* values.

The glume, including lemma and palea, determines the upper limit of grain size [39,46,48,49], and its size is determined by cell number and cell size. To uncover the cytological reason underlying the difference in grain size between NIL-J and NIL-C, we performed scanning electron microscopic analysis of the outer surfaces of lemmas (Figure 3A,B). Compared with NIL-J, cell length, cell width and the number of longitudinal cells were significantly higher in NIL-C (Figure 3C–E), but the number of transverse cells showed no difference (Figure 3F). To further investigate how *GLW7.1* regulates cell number and cell size, we examined the expression levels of 43 genes involved in cell cycle and cell expansion using the young panicles (8–10 cm in length) of the two NILs. As expected, expression levels of 10 cell-cycle-related genes (*CYCD1;1*, *E2F*, *MCM4*, *CDC20*, *CYCA2;3*, *CYCB1;1*, *CYClaZm*, *MAPK*, *CDKB* and *KN*) and 3 cell-expansion-related genes (*EXPA3*, *EXPA5* and *EXPB3*) were significantly up-regulated (fold-change > 1.5 and *p* < 0.01) in NIL-C (Figure 3G, Table S2). These results suggest that *GLW7.1* positively regulates grain size by promoting cell division and cell expansion to increase cell number and cell size of the glume during spikelet development.
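The fold-change criterion can be illustrated with a short sketch. The text does not state which quantification formula was used, so the widely used 2^(-ΔΔCt) calculation is assumed here, with a reference gene for normalization and NIL-J as the calibrator set to 1. The Ct values and function names are hypothetical.

```python
# Minimal sketch of relative expression by the 2^(-ddCt) calculation.
# Assumption: this formula was not stated in the text; Ct values are made up.


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene in a sample relative to a calibrator."""
    delta_sample = ct_target - ct_reference        # normalize the sample
    delta_cal = ct_target_cal - ct_reference_cal   # normalize the calibrator
    return 2.0 ** -(delta_sample - delta_cal)


def is_upregulated(fold_change, p_value, min_fold=1.5, alpha=0.01):
    """Apply the thresholds quoted above (fold-change > 1.5, p < 0.01)."""
    return fold_change > min_fold and p_value < alpha


# Hypothetical Ct values: target gene and reference gene in NIL-C vs. NIL-J.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold, is_upregulated(fold, p_value=0.001))
```

The calibrator evaluates to exactly 1 by construction, which matches the convention of setting the NIL-J expression level to 1.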


**Figure 3.** The effect of *GLW7.1* on cell number and cell size. (**A**–**F**) Scanning electron microscope analysis. Scale bar: 100 µm. (**A**) Outer epidermal cells of NIL-J. (**B**) Outer epidermal cells of NIL-C. (**C**) Average cell length. (**D**) Average cell width. (**E**) Total number of longitudinal cells. (**F**) Total number of transverse cells. Data are represented as mean ± s.e.m. (*n* = 15). The Student's *t*-test was used to produce *p* values. (**G**) Relative expression level of 10 cell cycle-related genes and 4 cell expansion genes in young panicles (8–10 cm in length) of NIL-J and NIL-C. *OsActin* (*LOC\_Os03g50885*) was used as the control and the values of expression level in NIL-J were set to 1. Data are represented as mean ± s.e.m. (*n* = 9). The Student's *t*-test was used to produce *p* values (\*\* indicates *p* < 0.01).

#### *2.3. Fine Mapping of GLW7.1* in young panicles (8–10 cm in length) of NIL-J and NIL-C. *OsActin* (*LOC\_Os03g50885*) was used as the

*Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 6 of 24

**Figure 3.** The effect of *GLW7.1* on cell number and cell size. (**A**–**F**) Scanning electron microscope analysis. Scale bar: 100 μm. (**A**) Outer epidermal cells of NIL-J. (**B**) Outer epidermal cells of NIL-C. (**C**) Average cell length. (**D**) Average cell width. (**E**) Total number of longitudinal cells. (**F**) Total number of transverse cells. Data are represented as mean ± s.e.m. (*n* = 15). The Student's *t*-test was used to produce *p* values. (**G**) Relative expression level of 10 cell cycle-related genes and 4 cell expansion genes

To fine-map *GLW7.1*, we developed a random population consisting of 30,000 individuals from NIL-H lines (NIL plants with heterozygous allele *GLW7.1/glw7.1*) and screened recombinants in the target region using two newly developed markers (G7.1 and LG15). A total of 600 recombinants were identified, and further genotyping was conducted using 16 newly developed simple sequence repeat (SSR) and kompetitive allele-specific PCR (KASP) markers (Figure 4A, Table S1). The grain size of the 600 recombinants and 70 non-recombinants derived from the random population was investigated, and *GLW7.1* was mapped to the interval between LG18 and K5 by a subsequent QTL analysis (Figure S3A). Subsequently, we performed a progeny test by investigating the grain size of homozygous progenies derived from each recombinant, and three non-recombinant lines (NIL-J, NIL-C and NIL-H) were designated as controls. In the end, the *GLW7.1* locus was narrowed to the region between markers K17 and K19 (Figure 4B and Figure S3B).

**Figure 4.** Map-based cloning of *GLW7.1*. (**A**) Fine mapping of *GLW7.1* using 30,000 BC<sub>5</sub>F<sub>2</sub> segregants. Numbers below the line indicate the number of recombinants between *GLW7.1* and the marker shown. (**B**) Genotypes and phenotypes of the recombinants. Grain length (mean ± s.e.m.) of three near-isogenic lines (NIL), and recombinant BC<sub>5</sub>F<sub>3</sub> lines (L102, L133, L51, L54, L191, L193, L192, L194). White bars represent chromosomal segments for J23B homozygote (progeny test named as A), black for CR071 homozygote (progeny test named as B), and grey for heterozygotes (progeny test named as H). Homozygous progenies from each line were harvested to compare phenotypic differences. The Student's *t*-test was used to produce *p* values.

By comparing the genomic sequences of Nipponbare (http://rice.uga.edu/, accessed on 18 March 2019), Zhenshan97 and Minghui63 (https://rice.hzau.edu.cn/rice\_rs3/, accessed on 18 March 2019), we found a large fragment (~38 kb/55 kb) insertion between Zhenshan97 and Nipponbare/Minghui63 in the candidate region between markers K17 and K19 (Table S3). In order to fine-map the candidate gene, the whole genomes of J23B and CR071 were separately sequenced on Illumina and Nanopore (ONT) platforms to capture the target candidate segment sequences. Compared with CR071, J23B contained a 53 kb deletion in the candidate segment between markers K17 and K19 (68 kb in J23B and 121 kb in CR071) (Table S3).


Three predicted open reading frames (ORFs) (*ORF1*, *ORF2* and *ORF4*) were located in the 68 kb target region of J23B, and four predicted ORFs (*ORF1*, *ORF2*, *ORF3* and *ORF4*) were located in the corresponding 121 kb region of CR071, excluding those ORFs encoding transposon and retrotransposon proteins (Figure 5A). *ORF1*, *LOC\_Os07g15670*, encodes a putative peroxiredoxin. *ORF2*, *LOC\_Os07g15680*, encodes a putative phospholipase D. *ORF3*, *LOC\_Os07g15770*, encodes a CCT motif family protein, GHD7, which was reported to regulate heading date and yield potential in rice [42] (Figure 5B). *ORF4*, *LOC\_Os07g15820*, encodes an expressed protein with unknown function.

**Figure 5.** Candidate gene of *GLW7.1*. (**A**) A deletion of 53 kb was detected in J23B compared with CR071, which included the candidate gene of *GLW7.1* (the arrow symbol in magenta), while no variation was detected in the protein-coding regions of the other three ORFs (arrow symbols in cyan) and their corresponding promoter regions. (**B**) *ORF3* encodes the CCT motif family protein, GHD7. CCT domain is indicated in the cyan box. (**C**) The protein sequences of GHD7 for four *Ghd7* alleles. CR071 carried *Ghd7-3* allele of *Ghd7* compared with Minghui 63 (*Ghd7-1*), Nipponbare (*Ghd7-2*) and J23B (*Ghd7-0*). CCT domain (aa. 190–233) is indicated with cyan background. Polymorphic amino acids are indicated by different colors using the amino acids sequence of *Ghd7-1* allele as reference. Asterisks above amino acid sequences indicate positions of amino acids (10, 30, 50, 70, 90, 110, 130, 150, 170, 190, 210, 230 and 250) and asterisks within amino acid sequences indicate stop codons.

#### *2.4. Positional Cloning of GLW7.1*

To ascertain the candidate gene underlying *GLW7.1*, we compared the genomic sequences of the three ORFs (*ORF1*, *ORF2* and *ORF4*) from J23B and CR071, including promoter regions and protein-coding regions, and found no variation. Therefore, *ORF3* or *Ghd7*, which is located in the 53 kb deleted region of J23B, was likely to be the candidate of *GLW7.1*. We subsequently compared the genomic sequences of *Ghd7* from Minghui63, CR071 and Nipponbare. Different from the reported *Ghd7-1* allele of Minghui63, the allele of CR071 was termed as *Ghd7-3* because of three amino acid substitutions, and the allele of Nipponbare was termed as *Ghd7-2* because of four amino acid substitutions. The allele of J23B and Zhenshan97 was termed as *Ghd7-0* because of the loss of the complete gene region [42] (Figure 5C).

To determine whether the *Ghd7-3* allele underlies the QTL *GLW7.1*, we conducted a knockout experiment by editing the allele using the CRISPR-Cas9 system in the background of NIL-C. The sequence (c. 512 TGGCCAATGTTGGGGAGAGC) in the second exon was designed as the sgRNA target site to produce mutations neighboring the CCT domain coding region (Figure 6A). Three mutated alleles were obtained, named A1, A4 and A8. The allele A4 showed a minor amino acid change (AN → D) and still retained the CCT domain. By contrast, both the 1 bp insertion in allele A1 and the 20 bp deletion in allele A8 resulted in frameshift mutations, which caused the loss of the CCT domain (Figure 6B). As expected, the alleles A1 and A8 produced smaller grains than the allele A4 in the NIL-C background (Figure 6C–F). We also conducted a complementation experiment by expressing the cDNA of the *Ghd7-3* allele driven by its native promoter in the NIL-J background. The grains produced from the complemented lines Com1, Com2 and Com3 were larger than those from negative transgenic plants NIL-J-Neg (Figure 6C–F). Additionally, the other two alleles of *Ghd7* (*Ghd7-1* and *Ghd7-2*) could also increase grain size in the Zhenshan97 background (Figure S4). Together, these data indicate that *Ghd7-3* is the functional gene underlying *GLW7.1*.
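The distinction between the frameshift alleles (A1, A8) and the in-frame allele (A4) follows from whether the net length change is a multiple of three bases. The short Python sketch below illustrates that reasoning only; the function name `is_frameshift` is hypothetical, and the net change of −3 bp assigned to A4 is an assumption inferred from the AN → D change (two residues replaced by one), not a value stated in the text.

```python
def is_frameshift(indel_bp: int) -> bool:
    """Return True if a net insertion (+) or deletion (-) of indel_bp bases
    shifts the reading frame, i.e. is not a multiple of 3."""
    return indel_bp % 3 != 0


# Net length changes per edited allele.
# A1 (+1 bp) and A8 (-20 bp) are taken from the text.
# A4 (-3 bp) is an ASSUMPTION consistent with the loss of one codon (AN -> D).
edits = {"A1": +1, "A8": -20, "A4": -3}

calls = {allele: is_frameshift(bp) for allele, bp in edits.items()}

for allele, shifted in calls.items():
    label = "frameshift" if shifted else "in-frame"
    print(f"{allele}: {edits[allele]:+d} bp -> {label}")
```

Under these assumptions, A1 and A8 are called as frameshifts and A4 as in-frame, which matches the retention of the CCT domain in A4 only.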

#### *2.5. Haplotype Analysis of Ghd7*

In order to investigate natural variations in *Ghd7*, we analyzed the sequencing data of 533 core germplasms in the *Ghd7* region [50]. Based on the nonsynonymous polymorphisms in the coding region that lead to amino acid substitutions or premature protein truncation (Table S4), nine haplotypes of *Ghd7* were identified (Figure 7A), in agreement with the types reported [51]. Of these, four major haplotypes contained 498 accessions, while the remaining five rare haplotypes contained only 27 accessions. The four major haplotypes of *Ghd7* have been reported to be strong-function, weak-function or loss-of-function alleles [42]. Three major haplotypes mainly existed in *indica*: Hap1 (*Ghd7-1*), represented by Minghui63 and 9311, was a type with strong function; Hap2 (*Ghd7-3*), represented by Teqing and CR071, was another type with strong function; and Hap9 (*Ghd7-0*), represented by Zhenshan97 and J23B, was the type with loss of function. In addition, Hap4 (*Ghd7-2*), represented by Nipponbare and Zhonghua11, was the major haplotype with weak function in *japonica*.

Given that the middle region of CCT domain proteins was previously reported to have transactivation activity [52], we performed a transcription activation assay in rice protoplasts prepared from leaf sheath of Nipponbare seedlings to verify the GHD7-mediated activation among the three haplotypes. The three allelic GHD7 proteins (Hap1, Hap2 and Hap4, derived from Minghui63, CR071 and Nanyangzhan, respectively) were fused to the GAL4 DNA-binding domain (GAL4DBD) to generate effectors, and the firefly luciferase gene (*LUC*) was used as the reporter (Figure 7B) [53]. Compared with the GAL4 negative control, the three allelic GHD7 proteins showed dramatically different activation activity (Figure 7B and Figure S5B). The protein GHD7-Hap4 exhibited the strongest activation activity, compared with the weaker activity exhibited by GHD7-Hap1 and the weakest activity by GHD7-Hap2. The difference in activation activity of the three allelic proteins may be due to amino acid substitutions.

To test this hypothesis, we added another four allelic GHD7 proteins (Hap3, artificial HapN1 combining exon1 region of Hap1 with exon2 region of Hap2, artificial HapN2 combining exon1 region of Hap1 with exon2 region of Hap4, and artificial HapN3 combining exon1 region of Hap2 with exon2 region of Hap4) to generate effectors (Figure 7B). A comparison of GHD7-Hap1 and GHD7-Hap3 revealed that the A233P substitution seems to have no effect on activation activity (Figure 7B and Figure S5C,D), consistent with its position in the CCT DNA-binding domain. By comparing GHD7-Hap2 and GHD7-HapN1, GHD7-HapN2 and GHD7-HapN3, we found that the A111G substitution also seems to have no effect on activation activity (Figure 7B and Figure S5C,D). A comparison of GHD7-Hap3, GHD7-Hap4 and GHD7-HapN2 revealed that both the G122E/S136G and V174D substitutions could strengthen the activation activity (Figure 7B and Figure S5C,D). By contrast, a comparison of GHD7-Hap3 and GHD7-HapN1 indicated that the D173N substitution could weaken the activation activity (Figure 7B and Figure S5C,D). These results suggest that the different genetic effects of the three major haplotypes may be due to differences in transcription activation activity among GHD7 proteins, which was promoted by the G122E/S136G and V174D substitutions and weakened by the D173N substitution.
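Substitution labels such as D173N or V174D can be derived mechanically from two aligned protein sequences. The following Python sketch shows one way to do this; the function `list_substitutions` and the five-residue windows are hypothetical illustrations (only the substituted residues at positions 173 and 174 follow the notation used in the text; the flanking residues are invented and are not the real GHD7 sequence).

```python
def list_substitutions(ref: str, alt: str, offset: int = 1) -> list:
    """Compare two pre-aligned, equal-length protein sequences and return
    substitutions in the form <ref residue><position><alt residue>."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        f"{r}{pos}{a}"
        for pos, (r, a) in enumerate(zip(ref, alt), start=offset)
        if r != a
    ]


# HYPOTHETICAL windows covering positions 171-175; flanking residues are made up.
ref_window = "LKDVA"  # reference allele: D at 173, V at 174
alt_window = "LKNDA"  # variant allele:  N at 173, D at 174

substitutions = list_substitutions(ref_window, alt_window, offset=171)
print(substitutions)
```

With real data, the inputs would be full-length sequences taken from a multiple alignment, with gaps handled before the comparison.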

**Figure 6.** Transgenic validation of *GLW7.1*. (**A**,**B**) Gene editing targeting the upstream of CCT domain was conducted using CRISPR-Cas9 system. Compared with the amino acid change mutation (AN → D) caused by mutated allele A4, the mutated allele A1 with 1 bp insertion and A8 with 20 bp deletion resulted in frameshift mutations, which caused the loss of the CCT domain in the GHD7 protein. (**A**) The sgRNA target site is shown in magenta, the PAM sequence is shown in cyan, and asterisks above nucleotide sequences indicate positions of nucleotides (510, 530 and 550). (**B**) CCT domain is indicated in cyan background, polymorphic amino acids are indicated in magenta, and asterisks above amino acid sequences indicate positions of amino acids (170, 190, 210 and 230). (**C**,**E**) Grains morphology of transgenic lines. Scale bar: 0.5 cm. (**D**) Grain length of transgenic lines. (**F**) Grain width of transgenic lines. All phenotypic data were measured from paddy-grown transgenic lines grown under normal cultivation conditions. Data are represented as mean ± s.e.m. (*n* = 27 for NIL-J-Neg, 15 for NIL-J-Com1, 22 for NIL-J-Com2, 20 for NIL-J-Com3, 10 for NIL-C-A4, 14 for NIL-C-A1 and 4 for NIL-C-A8) and Duncan's multiple range tests were used to conduct statistical analysis (a, b and c indicate *p* < 0.05; A and B indicate *p* < 0.01).


**Figure 7.** Haplotype analysis of *Ghd7*. (**A**) The nonsynonymous polymorphisms in the *Ghd7* CDS region that cause changes in the amino acid sequence of 525 cultivars. CCT domain is indicated in the cyan box. Polymorphic nucleotides that cause amino acid substitutions are indicated in yellow using the amino acids sequence of Hap1 as reference, and the nucleotides that cause frame-shift mutation or absence of the gene region are indicated in magenta. S, U, W and N represent strong functional, unknown functional, weak functional and nonfunctional alleles, respectively. (**B**) The transactivation activity of different allelic GHD7 proteins. Six allelic GHD7 were fused to the GAL4 DNA-binding domain (GAL4DBD). The relative activity of firefly luciferase (LUC) under control of the 5×GAL4-binding element was measured. Renilla luciferase (REN) activity was used as internal control. Data are represented as mean ± s.e.m. (*n* = 10) and Duncan's multiple range tests were used to conduct statistical analysis (a, b, c, d and e indicate *p* < 0.05).
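The dual-luciferase readout described in the Figure 7B caption (firefly LUC normalized to Renilla REN as internal control) can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis: the function name `relative_activity`, the replicate readings, and the choice to scale the effector ratio to the GAL4 negative control are all assumptions.

```python
from statistics import mean


def relative_activity(luc, ren, control_luc, control_ren):
    """Mean LUC/REN ratio of an effector, scaled to the mean LUC/REN ratio
    of the negative control (scaling to the control is an ASSUMPTION)."""
    effector_ratios = [l / r for l, r in zip(luc, ren)]
    control_ratios = [l / r for l, r in zip(control_luc, control_ren)]
    return mean(effector_ratios) / mean(control_ratios)


# HYPOTHETICAL luminescence readings (arbitrary units), three replicates each.
gal4_luc, gal4_ren = [100.0, 110.0, 90.0], [1000.0, 1000.0, 1000.0]
hap_luc, hap_ren = [500.0, 550.0, 450.0], [1000.0, 1000.0, 1000.0]

activity = relative_activity(hap_luc, hap_ren, gal4_luc, gal4_ren)
print(f"relative activation: {activity:.2f}-fold over control")
```

Dividing by the Renilla signal corrects for well-to-well differences in transfection efficiency before effectors are compared.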

#### *2.6. GLW7.1 Determines Grain Size via Grain-Size Genes*

To uncover the molecular pathway by which *GLW7.1* regulates grain size, we conducted a yeast two-hybrid (Y2H) screen using the C-terminal of GHD7 (aa. 208–257) as bait and a normalized prey library derived from young panicles of Zhenshan97. A total of 325 candidate positive clones were detected on synthetic growth medium without leucine, tryptophan, histidine and adenine. Of them, seven grain-size proteins (OsFBK12, FZP, OsNAC024, OsNAC025, OsNF-YC12, RICE STARCH REGULATOR 1 (RSR1) and SNB) [37,38,54–57] and the rice DELLA protein SLENDER RICE 1 (SLR1) [58] were selected for examination by reconstructing the prey vectors with the full-length protein sequences. The interactions of GHD7 with these proteins were then confirmed by X-α-gal filter lift assays (Figure 8A). Furthermore, the interactions were also demonstrated using split firefly luciferase complementation (SFLC) assays in tobacco leaf epidermal cells (Figure 8B). These results imply that the GHD7 protein may regulate the grain size and weight through interactions with the above grain-size proteins and DELLA protein.

To explore the downstream genes of *GLW7.1* in regulating grain size, we detected the expression levels of 63 grain-size genes in NIL-J and NIL-C by qRT-PCR analysis using the young panicles (8–10 cm in length). The *GLW7.1* locus significantly up-regulated the expression of nine positive grain-size genes (fold-change > 1.5 and *p* < 0.01), including genes encoding OsBZR1 (BES1/BZR1 homolog protein) [59], OsMAPK6 [15], GLW7 [31], OsbHLH107 [35], IDEAL PLANT ARCHITECTURE 1 (IPA1) [60], SRS3 (a kinesin motor domain protein) [61], SMALL AND ROUND SEED 5 (SRS5) (alpha-tubulin protein) [62] and OsWRKY53 [19] (Figure 9A). Furthermore, we observed an extremely significant down-regulation of *OsMADS1* (fold-change = 56.7 and *p* = 6.1 × 10<sup>−7</sup>), which negatively regulates grain length [9] (Figure 9A). In addition, a transcription activation assay in rice protoplasts prepared from leaf sheath of Zhenshan97 seedlings showed that the LUC activity driven by the *OsMAPK6* promoter was significantly induced by GHD7 (Figure 9B), which indicates that GHD7 could directly activate the expression of *OsMAPK6*. Overall, these results suggest that *GLW7.1* positively regulates grain size through a series of grain-size genes.
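The selection criteria used above (fold-change > 1.5 and *p* < 0.01, with expression in NIL-J set to 1) amount to a simple two-threshold filter. The Python sketch below is illustrative only: the function `classify`, the gene labels `geneA` and `geneB`, and their values are hypothetical, and representing the 56.7-fold down-regulation of *OsMADS1* as a relative expression of 1/56.7 is an assumption about how the fold-change was computed.

```python
def classify(rel_expr: float, p_value: float,
             fc_cutoff: float = 1.5, p_cutoff: float = 0.01) -> str:
    """Classify a gene from its expression in NIL-C relative to NIL-J (= 1)."""
    if p_value >= p_cutoff:
        return "unchanged"
    if rel_expr > fc_cutoff:
        return "up"
    if rel_expr < 1.0 / fc_cutoff:
        return "down"
    return "unchanged"


# (relative expression in NIL-C, p value)
measurements = {
    "geneA": (2.3, 0.004),          # HYPOTHETICAL values
    "geneB": (1.2, 0.0005),         # HYPOTHETICAL values
    "OsMADS1": (1 / 56.7, 6.1e-7),  # fold-change and p from the text; 1/56.7 is an ASSUMPTION
}

results = {gene: classify(expr, p) for gene, (expr, p) in measurements.items()}
print(results)
```

A gene passes only if both thresholds are met, so a small but highly significant change (as for the hypothetical `geneB`) is not reported as differentially expressed.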

**Figure 8.** Eight candidate proteins that interact with GHD7. (**A**) Yeast two-hybrid assays. Serial dilutions of 10<sup>3</sup>–10 transformed yeast cells were spotted on the control medium DDO (SD/-Trp/-Leu) and selective medium QDO (SD/-Trp/-Leu/-His/-Ade). The protein self-dimerization of OsMADS1 was used as positive controls. Co-transformed empty vectors pGADT7 (AD) and pGBKT7 (BD) were used as negative controls. (**B**) Split firefly luciferase complementation (SFLC) assays. nLUC-tagged GHD7 was co-transformed into tobacco leaves along with the cLUC-targeted candidate proteins. The protein self-dimerization of OsMADS1 was used as positive controls. Co-transformed empty vectors nLUC and cLUC were used as negative controls.


**Figure 9.** *GLW7.1* regulates the expression of grain-size genes. (**A**) Relative expression level of 12 grain-size-related genes in young panicles (8–10 cm) of NIL-J and NIL-C. *OsActin* (*LOC\_Os03g50885*) was used as the control and the values of expression level in NIL-J were set to 1. Data were represented as mean ± s.e.m. (*n* = 9). The Student's *t*-test was used to produce *p* values (\*\* indicates *p* < 0.01). (**B**) GHD7 induces the LUC activity driven by *OsMAPK6* promoter. Hap2 allelic GHD7 were overexpressed as effectors. The relative activity of firefly luciferase (LUC) under control of the promoter region of *OsMAPK6* was measured. Renilla luciferase (rLUC) activity was used as internal control. Data are represented as mean ± s.e.m. (*n* ≥ 5). The Student's *t*-test was used to produce *p* values (\*\* indicates *p* < 0.01).

#### *2.7. GLW7.1 Positively Regulates GA Biosynthesis*

Mutants defective in GA biosynthesis, such as *gdd1* and *sgd2*, usually show reduced height and small seeds [27,29]. Using quantitative RT-PCR, we detected the up-regulated expression of four GA biosynthetic genes (*KS1*, *KO2*, *KAO* and *GA3ox2*), a GA catabolic gene (*GA2ox3*) and the GA signaling pathway gene *SLR1* in the young panicles of NIL-C, compared to NIL-J (Figure 10A). To examine whether *Ghd7* was involved in GA biosynthesis, we analyzed the response of NIL-J and NIL-C to exogenous GA<sub>3</sub> and paclobutrazol (PAC, a GA biosynthesis inhibitor) treatment. The second leaf sheath of NIL-J was significantly shorter than that of NIL-C and could be restored to the NIL-C level by exogenous GA<sub>3</sub> treatment (Figure 10B,C). In addition, the growth of both NIL lines was simultaneously inhibited by exogenous PAC treatment, and their second leaf sheaths were almost the same in length (1.51 cm in NIL-J and 1.41 cm in NIL-C) (Figure 10B,C). Moreover, the second leaf sheath of NIL-J was almost as long as that of NIL-C under treatment with different GA<sub>3</sub> concentrations (Figure S6). Collectively, these findings suggest that *Ghd7* may be involved in GA biosynthesis rather than GA response.

To further confirm the role of *Ghd7* in GA biosynthesis, we measured endogenous GA<sub>1</sub> levels in 2-week-old seedlings of NIL-J and NIL-C. The GA<sub>1</sub> level in NIL-J was approximately 84.6% (0.69 ng/g) of that in NIL-C (0.81 ng/g) (Figure 10D). The effect of exogenous GA<sub>3</sub> treatment on *Ghd7* expression was also investigated by quantitative RT-PCR. The diurnal expression pattern of *Ghd7* was consistent with the previous report [42], and the expression level of *Ghd7* was significantly inhibited at 6 h after the exogenous GA<sub>3</sub> treatment (Figure 10E). Interestingly, the expression levels of two GA biosynthetic genes, *GA20ox2* and *GA3ox2*, were also reported to be inhibited after the exogenous GA<sub>3</sub> treatment [27]. Together, these results suggest that *Ghd7* positively regulates endogenous GA biosynthesis.


**Figure 10.** GLW7.1 participates in the biosynthesis of GA. (**A**) Relative expression level of 6 GA-related genes in young panicles (8–10 cm) of NIL-J and NIL-C. *OsActin* (*LOC\_Os03g50885*) was used as the control and the values of expression level in NIL-J were set to 1. Data were represented as mean ± s.e.m. (*n* = 9). Student's *t*-test was used to produce *p* values (\*\* indicates *p* < 0.01). (**B**) Seedling phenotype of NIL-J was rescued by 0.01 µM GA<sub>3</sub>. The germinated seeds were grown in the nutrient solution that contained 0.01 µM GA<sub>3</sub> or 10 µM paclobutrazol (PAC) and incubated at 28 °C under 13 h light/11 h dark conditions. CK, nutrient solution without any exogenous hormones. After 10 days, the seedlings were photographed. Scale bar: 5 cm. Arrow symbols indicate the second leaf sheaths. (**C**) The length of the second leaf sheaths were measured after treatment. Data are represented as mean ± s.e.m. (*n* ≥ 15). Student's *t*-test was used to produce *p* values (\*\* indicates *p* < 0.01). (**D**) Levels of endogenous GA<sub>1</sub> in NIL-J and NIL-C 2-week-old seedlings grown in nutrient solution without treatment. Data are represented as mean ± s.e.m. (*n* = 4). Student's *t*-test was used to produce *p* values (\*\* indicates *p* < 0.01). (**E**) Expression pattern of *Ghd7* under GA<sub>3</sub> treatment. The Zhonghua11 germinated seeds were grown in the nutrient solution. Two weeks later, half of the seedlings were moved to the nutrient solution that contained 50 µM GA<sub>3</sub>, while the other half were moved to the nutrient solution without treatment as control. Relative expression level of *Ghd7* were detected. *OsActin* was used as the control and the values of expression level at 0 h were set to 1. Data were represented as mean ± s.e.m. (*n* ≥ 3). Student's *t*-test was used to produce *p* values (\*\* indicates *p* < 0.01).

#### **3. Discussion**

#### *3.1. GLW7.1 Simultaneously Improves Grain Yield and Quality*

In this study, we performed the map-based cloning of *GLW7.1*, a novel QTL regulating grain size, and confirmed that *Ghd7*, encoding a CCT motif family protein, underlies the QTL. *Ghd7* was previously reported as a major regulator of heading date and improved yield by increasing grain number [42]. Our results demonstrate that *GLW7.1*, or *Ghd7-3*, not only increases grain number, but also increases grain weight, manifested as increases in both grain length and width (Figure 1). On the other hand, *GLW7.1* has effects on reducing chalkiness and improving cooking and eating quality (Figure 2). Thus, *GLW7.1* or *Ghd7-3* is a positive regulator of both rice yield and quality. In contrast, many genes regulating grain size have negative effects on rice quality. For example, *GS2* and *GW2* increase not only grain size and weight, but also chalkiness simultaneously [46,47]. In addition, only 73 out of 533 accessions carry the *GLW7.1* allele (Figure 7A). Therefore, *GLW7.1* is a promising allele for simultaneously improving grain yield and quality during rice breeding.

#### *3.2. Natural Variations Alter the Transcriptional Activity of GHD7*

In this study, we performed haplotype analysis of *Ghd7* using a germplasm population consisting of 533 accessions and identified nine haplotypes based on nonsynonymous SNPs in the coding region (Figure 7A). These included the three reported functional alleles with different effects on heading date: Hap1 (*Ghd7-1*), Hap2 (*Ghd7-3*) and Hap4 (*Ghd7-2*) [42]. To uncover the reason for the different effects of the three allelic GHD7 proteins, we focused on the six amino acid substitutions among them. The subsequent transcription activation activity assay showed that the G122E/S136G and V174D substitutions strengthen the activation activity, the D173N substitution weakens it, and the A111G and A233P substitutions have no effect on it (Figure 7B and Figure S5). It has been demonstrated that acidic amino acids in the activation domain are essential for the transcriptional activation of transcription factors. For example, when all the acidic amino acid residues in the activation domain of transcription factor OCT4 were replaced by alanine, its transcriptional activation activity decreased dramatically [63]. In our study, Hap4 allelic GHD7 contains two amino acid substitutions that add an acidic amino acid, G122E and V174D, whereas Hap2 allelic GHD7 contains one substitution that removes an acidic amino acid, D173N. Consistent with this, these gains and losses of acidic residues in the activation domain are accompanied by corresponding increases and decreases in GHD7 transcriptional activation.
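The relationship between these substitutions and acidic residues can be tabulated with a short script. This is only an illustrative sketch, not part of the original analysis: the helper name `acidic_change` is hypothetical, and the script simply checks whether the residue before or after each substitution is aspartate (D) or glutamate (E).

```python
# Illustrative only: classify each substitution named in the text by whether it
# adds or removes an acidic residue (Asp = D, Glu = E).
ACIDIC = {"D", "E"}


def acidic_change(substitution: str) -> int:
    """Return +1 if an acidic residue is gained, -1 if lost, 0 otherwise.

    `substitution` uses the usual notation, e.g. "G122E" means Gly122 -> Glu.
    """
    ref, alt = substitution[0], substitution[-1]
    return int(alt in ACIDIC) - int(ref in ACIDIC)


# The six substitutions discussed in the text.
substitutions = ["A111G", "G122E", "S136G", "D173N", "V174D", "A233P"]
changes = {s: acidic_change(s) for s in substitutions}

for name, delta in changes.items():
    print(f"{name}: net change in acidic residues = {delta:+d}")
```

Under this simple count, G122E and V174D each add one acidic residue, D173N removes one, and the remaining substitutions are neutral, which matches the direction of the activity changes reported above.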

Furthermore, we noted the reported transcriptional repression activity of GHD7 on *ARE1* [44], which seems to contradict our results. That study investigated the transcriptional repression by GHD7 of a specific downstream gene, *ARE1*, but did not investigate the transcriptional activation or repression activity of GHD7 itself. One study did investigate the transcriptional repression activity of GHD7 [45]. Weng et al. found that the transcriptional activation activity of a GAL4-VP16-GHD7 fusion protein was significantly weaker than that of GAL4-VP16, and they concluded that GHD7 has intrinsic transcriptional repression activity. We consider this conclusion not fully rigorous, because fusing GHD7 to the C-terminus of VP16 may itself weaken the transcriptional activation activity that depends on the VP16 C-terminus; the apparent repression may therefore be caused by the C-terminal fusion rather than by GHD7. In our study, we investigated the transcriptional activation activity of GHD7 itself (Figure 7B and Figure S5) and observed its transcriptional activation of *OsMAPK6* (Figure 9B). However, it was reported that the ABI4 protein has different transcriptional activity for different downstream target genes. ABI4 can bind to the promoters of *GA2ox7* and *NCED6* to activate the expression of these two genes [64], and can also bind to the promoters of *CYP707A1* and *CYP707A2* to inhibit their expression [65]. We hypothesize that GHD7 may act in a similar way on its target genes.

In conclusion, natural variations in the GHD7 protein affect its transcriptional activity, which is likely to influence transcription of the downstream genes, and finally result in different effects on heading date and other traits. In addition, there are another three amino acid substitutions with unknown effects on its activation activity (Figure 7A), which may endow the remaining four haplotypes with different functions. Therefore, it would be of great importance to reveal the effect of each amino acid substitution and further construct different combinations of these natural variations, which may provide new ideas for future applications of *Ghd7* in rice breeding.

#### *3.3. The Pathway of GLW7.1 Controlling Grain Size*

In this study, we showed for the first time that *Ghd7*, which has long been recognized as a major regulator of heading date in rice [42,51], is also a major gene regulating grain size. Recently, the interactions of GHD7 with the CCAAT-box-binding transcription factors OsNF-YB11 and OsNF-YC2 have been elucidated [66]. Similar to the interactions between GHD7 and OsNF-YB11/OsNF-YC2, a physical interaction between GHD7 and OsNF-YC12, another CCAAT-box-binding transcription factor, was detected in yeast cells and tobacco leaf epidermal cells (Figure 8). For *OsNF-YC12*, the mutants showed a decrease in grain length and width, and the overexpression lines showed an increase, implying positive regulation of grain size [56]. We also found interactions between GHD7 and three AP2 domain-containing proteins, FZP, RSR1 and SNB (Figure 8) [37,38,57]. Of the three, FZP positively and SNB negatively regulate grain length and width by simultaneously affecting cell proliferation and expansion in spikelet hulls. Additionally, interactions of GHD7 with OsFBK12, an F-box protein containing a Kelch repeat motif that positively regulates seed size by increasing cell size but decreasing cell number [54], and with two NAC-type TFs, OsNAC024 and OsNAC025, which bind to the promoters of three grain size/weight regulating genes (*GW2*, *GW5* and *DWARF 11* (*D11*)) to modulate grain size [55], were also observed in the yeast two-hybrid assays and SFLC assays (Figure 8).

Expression analysis revealed that *GLW7.1* up-regulated the transcription of eight genes having positive effects on grain size, and down-regulated that of *OsMADS1*, a negative regulator of grain size, in NIL-C panicles (Figure 9A). Interestingly, the expression of *OsMADS1* was significantly reduced in the mutants of *SNB* and *FZP*, which encode proteins interacting with GHD7, implying that *OsMADS1* may be downstream of these two genes [67,68]. In addition, cytological observations showed that *GLW7.1* enhances grain size by promoting cell proliferation and expansion (Figure 3A–F). Among the up-regulated positive regulators of grain size, *OsBZR1*, which is located downstream of the BR signaling pathway, can affect grain length and grain width simultaneously [59]. The OsSPL13 protein encoded by *GLW7* can bind to the promoter of *SRS5* and activate its expression, and the two genes promote glume cell expansion and jointly regulate grain length [31,62]. *OsMAPK6* and *OsWRKY53*, both of which belong to the MAPK signaling cascade, promote cell proliferation and thus increase grain length and width in rice [15,19]. *OsMADS1*, a significantly down-regulated negative regulator of grain length, also regulates rice grain length by affecting cell proliferation [9]. Notably, the transcription activation activity assay showed that GHD7 could directly activate the expression of *OsMAPK6* (Figure 9B), which indicates that GHD7 may directly bind to the *OsMAPK6* promoter to regulate its expression and thus control rice grain size.

We also investigated the expression of several genes that regulate cell cycle and cell expansion and found significantly higher expression levels of 10 cell-cycle-related genes and 3 cell-expansion-related genes in NIL-C panicles (Figure 3G). However, we noted the up-regulation of three negative grain-size genes and the down-regulation of one expansion gene, which may indicate regulatory mechanisms that remain unclear. Therefore, based on the results above, we propose that GHD7 interacts with several grain-size-related proteins, up-regulates positive regulators of grain size, such as *OsMAPK6*, and down-regulates the negative regulator *OsMADS1*, thereby promoting the transcription of downstream cell division and expansion genes, and finally enhancing grain size as well as grain weight (Figure S7).

#### *3.4. GHD7 Participates in GA Pathway*

In the GA signaling pathway, GA binds to the GID1 receptor, leading to the formation of a GID1-GA-DELLA complex, which further stimulates the interaction of DELLA with the SCF<sup>GID2</sup> complex. Once recruited to the SCF<sup>GID2</sup> complex, DELLA is polyubiquitylated and subsequently degraded through the 26S proteasome pathway [69]. Mutants with defects in the GA signaling pathway usually show reduced height and small seeds. For example, *GDD1* encodes a kinesin-like protein that directly regulates the expression of the *KO2* gene [27], and *SGD2* encodes an HD-ZIP II transcription factor that positively regulates the expression of GA biosynthesis genes [29]; mutants of either gene have reduced endogenous GA levels, leading to a decrease in cell size and thus a severely dwarfed, small-grain phenotype. Moreover, overexpressing RGG2, which mediates internal GA biosynthesis and participates in the GA signaling pathway, also causes dwarfism and small grains [7]. In addition, the grain-size gene that was most strongly up-regulated in NIL-C, SRS3 (Figure 9A), the mutant of which showed reduced height and short seeds, was also reported to participate in regulating the expression of genes in the GA biosynthesis pathway [28]. In this study, we also observed that *Ghd7* could positively regulate endogenous GA biosynthesis (Figure 10).

In *Arabidopsis*, DELLA proteins physically interact with the CCT domain of CONSTANS (CO) and integrate gibberellic acid and photoperiod signaling to regulate flowering under long days [70]. Here, we identified an interaction of GHD7 with the rice DELLA protein, SLR1 (Figure 8), which has been reported to interact with transcription factors, such as GRF4-GIF1, NACs and OsPIL14, and inhibit their transcriptional activation of downstream genes [71–73]. Considering the pleiotropic effect of *Ghd7* on plant height and grain size (Figure 1, Figure 6 and Figure S8), we hypothesize that GHD7 participates in GA biosynthesis to increase grain size (Figure S7) and is regulated by the GID1-GA-DELLA module as feedback of the pathway.

#### **4. Materials and Methods**

#### *4.1. Plant Materials and Growth Conditions*

Two *indica* cultivars, Jin23B (J23B) and CR071, were used to construct the QTL mapping population, with details described in Figure S1. The NIL populations used for the genetic analysis of *GLW7.1* in grain size in two years were isolated from BC4F<sub>1</sub> and BC5F<sub>2</sub> plants, respectively, carrying the heterozygous allele *GLW7.1/glw7.1*. The progeny test was conducted in the BC5F<sub>4</sub> generation. The NILs (NIL-J and NIL-C) used in the subsequent experiments were isolated from a BC5F<sub>n</sub> (*n* ≥ 6) plant carrying the heterozygous allele *GLW7.1/glw7.1*. The NIL-J-Com lines were obtained by complementation test. The mutants (NIL-C-A lines) were obtained by CRISPR/Cas9-based genome editing. The T<sub>1</sub> generation materials were used for the analysis. The *Ghd7-2* allele lines (NIL-NYZ and NIL-ZS) were kindly provided by Guangming Lou. The *Ghd7-1* allele lines (ZS-Com and ZS-Neg) were kindly provided by Lei Wang. The rice plants were grown at the experimental field of Huazhong Agricultural University in Wuhan during the summer with a density of 16 cm × 26 cm under normal field management.

#### *4.2. Trait Measurement*

Fully filled grains from each plant were used for measuring grain length, grain width, grain number, grain yield and 1000-grain weight by the yield traits scorer (YTS) platform [74] after 2014. (The phenotype of grain length for primary QTL mapping was measured by an electronic digital display caliper in 2012.) The plant height was measured from the main culm. The number of tillers per plant was counted as all fertile panicles in one plant. The percentage of grains with chalkiness, amylose content and gel consistency were measured according to the NY/T 593-2013 standard published by the Ministry of Agriculture, China (http://www.zbgb.org/27/StandardDetail1476335.htm, accessed on 3 October 2019). The taste score of milled rice was evaluated using a taste analyzer kit (Satake, RLTA10B-KC, Hiroshima, Japan) [75].

#### *4.3. Linkage Analysis and QTL Mapping*

A primary mapping population of 238 BC3F<sub>1</sub> individuals was generated from a cross between J23B and CR071 (Figure S1). The plants were then genotyped with 157 polymorphic SSR markers covering the whole genome. Grain length was measured, and the linkage analysis was carried out using the composite interval mapping module of the software WinQTLCart 2.5 [76]. The R/qtl package [77] was employed to plot the linkage map using the output of WinQTLCart.

For fine mapping of *GLW7.1*, we developed a BC5F<sub>3</sub> population consisting of 30,000 individuals from NIL plants carrying the heterozygous allele *GLW7.1*/*glw7.1*. *GLW7.1* was mapped to the interval between LG18 and K5 by a subsequent linkage analysis, and then narrowed to the region between markers K17 and K19 by progeny test. Relevant primer sequences are listed in Table S1.

#### *4.4. Scanning Electron Microscopy*

Lemmas of spikelets at the heading stage were collected and fixed in FAA solution (50% ethanol, 5% glacial acetic acid and 3.7% formaldehyde) for more than 16 h. The fixed samples were dehydrated in a graded ethanol series, critical point dried, and then coated with gold. The samples were observed with a scanning electron microscope (JEOL, JSM-6390LV, Tokyo, Japan) at an accelerating voltage of 10 kV and a spot size of 30 nm. The lemma cells were imaged at a magnification of 100× to measure cell length and cell width, and at 50×, with three pictures combined to cover the entire lemma, to measure the cell number. The cell size of the lemmas was measured from the pictures using ImageJ software (NIH), and the cell number was counted manually.

#### *4.5. RNA Extraction and Expression Analysis*

Total RNA was extracted from young panicles (8–10 cm in length) and 2-week-old seedlings using TRIzol reagent (Invitrogen, 15596026, Shanghai, China). DNase I (Invitrogen, 18068015, Shanghai, China) pre-treated RNA was reverse-transcribed using the M-MLV Reverse Transcriptase kit (Promega, M170A, Madison, WI, USA) following the manufacturer's instructions. The qRT–PCR was then conducted in a total volume of 10 µL, which consisted of 5 µL of cDNA (10 ng/µL), 0.25 µL of each primer (10 µM), and 4.5 µL of 2× SYBR Green PCR Master Mix (Roche, 4913914001, Mannheim, Germany), using ABI Real-Time PCR systems (Q6 and ViiA7) according to the manufacturer's instructions. The *OsActin* gene (*LOC\_Os03g50885*) was used as the internal control. The relative gene expression levels were calculated by the 2<sup>−ΔΔCt</sup> method. Each measurement was performed with three biological samples and three replicates for each sample. Relevant primer sequences are listed in Table S2.
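The relative quantification used for the expression data can be sketched as follows. This is a minimal illustration of the standard 2<sup>−ΔΔCt</sup> arithmetic, assuming roughly 100% amplification efficiency for both primer pairs; the function name and all Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Fold change of a target gene by the 2^(-ddCt) method.

    The reference gene plays the role of the internal control (OsActin in the
    text); the calibrator is the sample whose expression is set to 1.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample              # dCt of the test sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator  # dCt of the calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_calibrator=26.0, ct_ref_calibrator=18.0,
)
print(fold_change)
```

With these made-up values the target reaches threshold two cycles earlier in the test sample than in the calibrator after normalization, so the computed fold change is 4.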

#### *4.6. De Novo Assembly of Two Genomes and Sequences Comparison*

In order to fine map the candidate genes, the whole genomes of J23B and CR071 were separately sequenced on Illumina and Nanopore (ONT) platforms to capture the target candidate segment sequences. For CR071, 50.3 Gb of ONT data (~135× genome coverage) and 3.8 Gb of Illumina data (~10× genome coverage) were used. For J23B, 10.2 Gb of ONT data (~28× genome coverage) and 5.2 Gb of Illumina data (~14× genome coverage) were used. The Nanopore reads were assembled using Canu [78] for CR071 and using wtdbg2 [79] for J23B. The contigs generated with Canu and wtdbg2 were polished with three rounds of Racon [80] based on Nanopore reads, followed by one round of Pilon [81] based on Illumina short reads. The assembled genomes were used in the subsequent analysis. We then captured the target candidate segment sequences in five genomes using the primer sequences of the two SNP markers and found a 53 kb deletion in J23B compared with CR071 (Table S3). The sequence comparison of J23B and CR071 was conducted by aligning the ONT reads and Illumina short reads of J23B and CR071 to the three annotated genomes (Zhenshan97, Minghui63 and Nipponbare) using minimap2 [82] and SAMtools [83] to detect the ORF underlying *GLW7.1*.
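As a quick consistency check on the sequencing depths quoted above, coverage is simply the data yield divided by the genome size, so each yield/coverage pair implies a genome size. The sketch below uses only the figures given in the text; the function name is hypothetical, and 1 Gb is taken as 1000 Mb.

```python
# (data yield in Gb, reported approximate coverage) for each data set in the text.
datasets = {
    ("CR071", "ONT"): (50.3, 135),
    ("CR071", "Illumina"): (3.8, 10),
    ("J23B", "ONT"): (10.2, 28),
    ("J23B", "Illumina"): (5.2, 14),
}


def implied_genome_size_mb(yield_gb: float, coverage: float) -> float:
    """Genome size (Mb) implied by coverage = yield / genome size."""
    return yield_gb * 1000 / coverage


sizes = {key: implied_genome_size_mb(*value) for key, value in datasets.items()}

for (cultivar, platform), size in sizes.items():
    print(f"{cultivar} {platform}: ~{size:.0f} Mb implied genome size")
```

All four pairs imply a genome of roughly 365 to 380 Mb, so the reported yields and coverages are mutually consistent.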

#### *4.7. Haplotype Analysis*

The variations in *Ghd7* in 533 accessions were queried from RiceVarMap v2.0 (http://ricevarmap.ncpgr.cn/, accessed on 5 July 2021) with variation IDs (vg0709154754, vg0709154664, vg0709154489, vg0709154469, vg0709154456, vg0709154415, vg0709152671, vg0709152659, vg0709152655, vg0709152479) in the coding region. We then identified nine haplotypes based on the diversity. Relevant data are listed in Table S4.
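Grouping accessions into haplotypes amounts to collecting accessions that share the same combination of alleles across the selected variant sites. The sketch below illustrates this under stated assumptions: the accession names and allele calls are invented, only three of the variation IDs from the text are used for brevity, and the function `group_haplotypes` is hypothetical rather than part of RiceVarMap.

```python
from collections import defaultdict


def group_haplotypes(genotypes: dict, variant_ids: list) -> dict:
    """Group accessions by their combination of alleles at `variant_ids`."""
    groups = defaultdict(list)
    for accession, calls in genotypes.items():
        haplotype = tuple(calls[v] for v in variant_ids)  # ordered allele combination
        groups[haplotype].append(accession)
    return dict(groups)


# Three of the variation IDs listed in the text; the alleles below are made up.
variant_ids = ["vg0709154754", "vg0709154664", "vg0709152479"]
genotypes = {
    "acc_001": {"vg0709154754": "G", "vg0709154664": "C", "vg0709152479": "T"},
    "acc_002": {"vg0709154754": "G", "vg0709154664": "C", "vg0709152479": "T"},
    "acc_003": {"vg0709154754": "A", "vg0709154664": "C", "vg0709152479": "T"},
    "acc_004": {"vg0709154754": "A", "vg0709154664": "T", "vg0709152479": "G"},
}

haplotypes = group_haplotypes(genotypes, variant_ids)
for alleles, members in haplotypes.items():
    print("/".join(alleles), "->", members)
```

In a real analysis, accessions with missing or heterozygous calls would need to be handled explicitly before grouping; this toy example ignores that.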

#### *4.8. Vector Construction and Transformation*

For preparing the complementation construct (Com), a 5 kb fragment, which consisted of 2.2 kb promoter and 2.8 kb genomic DNA of *Ghd7*, was amplified from CR071 and then cloned into the plant binary vector pCAMBIA1301. For preparing the CRISPR/Cas9 knockout construct, the sequence (c.512 TGGCCAATGTTGGGGAGAGC) in the second exon was designed as the sgRNA target site. The reverse complement sequence of the target site was inserted into the intermediate vector pER8-Cas9-U6 and then cloned into vector pCXUN-Cas9 [84]. The complementation construct was introduced into NIL-J and the knockout construct was introduced into NIL-C, respectively, by *Agrobacterium tumefaciens* (*EHA105*)-mediated transformation. The transgenic lines were further confirmed by PCR detection and direct sequencing. Relevant primer sequences were listed in Table S5.
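The reverse complement of the sgRNA target site mentioned above can be derived with a few lines of code. This is a small illustrative sketch: only the 20-nt target sequence comes from the text, and the helper name is hypothetical.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


target_site = "TGGCCAATGTTGGGGAGAGC"  # sgRNA target site given in the text
insert_seq = reverse_complement(target_site)
print(insert_seq)
```

Applying the function twice returns the original target site, which is a convenient sanity check when designing oligonucleotides.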

#### *4.9. Transcription Activation Assay*

The coding sequences of exon1 and exon2 from different allelic *Ghd7* were amplified and combined to generate seven allelic *Ghd7*. These different allelic *Ghd7* were then fused with the GAL4 DNA binding domain to generate effectors [53]. The firefly *LUC* gene was used as the reporter to analyze the transcriptional activity. Two allelic *OsMADS1* from Zhenshan97 and Nangyangzhan were fused with the GAL4 DNA binding domain to generate the positive controls GALZ and GALN. To explore the downstream target genes, the Hap2 allelic *Ghd7* was fused into the 'None' effector vector, while the promoters of the candidates (*OsBZR1*, *OsMADS1*, *OsMAPK6*, *OsSPL13* and *OsWRKY53*) were cloned into the '190LUC' reporter vector. The Renilla *LUC* gene was used as an internal transformation control. The rice protoplasts prepared from the leaf sheath of Nipponbare and Zhenshan97 seedlings were transfected with different combinations of vectors by PEG-mediated transformation [85]. The firefly luciferase activity was detected after at least 12 h using the Dual-Luciferase reporter kit (Promega, E1960, Madison, WI, USA), according to the manufacturer's protocol. Relevant primer sequences are listed in Table S5.

#### *4.10. Yeast Two-Hybrid Assays*

The prey library was derived from young panicles (5–15 cm in length) of Zhenshan97. The coding sequence of the C-terminal of GHD7 (aa. 208–257) was amplified and then cloned into the bait vector pGBKT7 (Clontech, 630443, Mountain View, CA, USA) for yeast two-hybrid screening. Full-length cDNAs of *OsFBK12*, *FZP*, *OsNAC024*, *OsNAC025*, *OsNF-YC12*, *RSR1*, *SNB* and *SLR1* were amplified and then cloned into the prey vector pGADT7 (Clontech, 630442, Mountain View, CA, USA), respectively, for subsequent yeast two-hybrid assays. All procedures were conducted according to the manufacturer's protocol. Relevant primer sequences are listed in Table S5.

#### *4.11. SFLC Assays*

Full-length cDNAs of *Ghd7*, *OsFBK12*, *FZP*, *OsNAC024*, *OsNAC025*, *OsNF-YC12*, *RSR1*, *SNB* and *SLR1* were amplified and then cloned into the nLUC vector (pCAMBIA1300-35S-HA-Nluc-RBS) or cLUC vector (pCAMBIA1300-35S-Cluc-RBS) [9], respectively, for subsequent split firefly luciferase complementation (SFLC) assays. Vectors for testing the protein–protein interactions (such as GHD7-nLUC and FBK12-cLUC), together with the p19 silencing vector, were co-transfected into tobacco (*N. benthamiana*) leaves via *Agrobacterium tumefaciens* (*EHA105*) infiltration. After at least 48 h, injected leaves were sprayed with 5 mM luciferin (Promega, E1605, Madison, WI, USA). The LUC signal was captured using a cooling CCD imaging apparatus (Tanon, Tanon-5200, Shanghai, China). Each assay was repeated at least three times. Relevant primer sequences are listed in Table S5.

#### *4.12. Exogenous GA<sub>3</sub> and PBZ Treatment of Seedlings*

The germinated seeds of NIL-J and NIL-C were grown in a nutrient solution that contained various concentrations of GA<sub>3</sub> (Sangon Biotech, A600738, Wuhan, China) or 10 µM paclobutrazol (Sangon Biotech, A630332, Wuhan, China) and incubated at 28 °C under 13 h light/11 h dark conditions. After 10 days, the length of the second leaf sheaths was measured.

#### *4.13. Measurement of GA<sub>1</sub>*

The shoots of 2-week-old seedlings were sampled, frozen in liquid nitrogen, and ground to fine powder. Tissues weighing 0.1 g were extracted with 1 mL 0.01 M PBS solution at 4 °C for 12 h. After centrifugation (12,000 rpm, 4 °C, 15 min), the supernatant was collected for GA<sub>1</sub> measurement. Endogenous GA<sub>1</sub> levels were detected by enzyme-linked immunosorbent assay (ELISA) following the manufacturer's instructions (Jingmei Biotechnology, JM-110038P2, Yancheng, China).

#### *4.14. Statistical Analysis*

ANOVA or Student's *t*-test was conducted using SPSS 22 (SPSS Inc., Chicago, IL, USA).
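For readers who want to see the arithmetic behind the "mean ± s.e.m." values and the *t*-tests reported in the figures, the following sketch computes both from raw values. It is not the SPSS procedure used in the study: it assumes an unpaired, equal-variance (pooled) Student's *t*-test, the data are invented, and converting the statistic to a *p* value (which needs the *t* distribution) is left out.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values: list) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def student_t(a: list, b: list) -> tuple:
    """Unpaired Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical measurements for two lines, for illustration only.
line_a = [1.0, 2.0, 3.0]
line_b = [4.0, 5.0, 6.0]

t_stat, df = student_t(line_a, line_b)
print(f"line A: {mean(line_a):.2f} ± {sem(line_a):.2f}")
print(f"line B: {mean(line_b):.2f} ± {sem(line_b):.2f}")
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting statistic would then be compared against the *t* distribution with the given degrees of freedom to obtain the *p* value.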

**Supplementary Materials:** The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23158715/s1.

**Author Contributions:** R.L. conducted most of the experiments, including fine mapping, gene cloning, genetic transformation, expression analysis, scanning electron microscopic analysis, Y2H analysis, transcription activation assay, SFLC assay and GA treatment. Q.F. participated in the Y2H analysis, transcription activation assay and SFLC assay. P.L. participated in revising the manuscript. G.L. conducted parts of the phenotyping. G.C. and H.J. participated in the development of the NILs. G.G., Q.Z., J.X., X.L. and L.X. participated in field management and logistics. Y.H. designed and supervised the study. Y.H. and R.L. analyzed the data and wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (U21A20211, 91935303), the Ministry of Science and Technology (2021YFF1000200, 2020YFD0900302), Hubei Science and Technology (2021ABA011) and China Agriculture Research System (CARS-01-03).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The whole-genome sequencing data in this paper can be found in the NCBI database under the following accession numbers: the whole-genome resequencing of J23B and CR071 (PRJNA791417), Illumina reads of J23B (SRR17299467), Nanopore reads of J23B (SRR17299468), Illumina reads of CR071 (SRR17299469), Nanopore reads of CR071 (SRR17299470). The gene sequence in this paper can be found in the Rice Genome Annotation Project Database (http://rice.uga.edu/): *Ghd7* (*LOC\_Os07g15770*).

**Acknowledgments:** We thank Lei Wang for providing genetic materials containing *Ghd7-1*.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Dynamic Change in Starch Biosynthetic Enzymes Complexes during Grain-Filling Stages in BEIIb Active and Deficient Rice**

**Yining Ying <sup>1</sup> , Feifei Xu <sup>1</sup> , Zhongwei Zhang <sup>1</sup> , Piengtawan Tappiban <sup>1</sup> and Jinsong Bao 1,2,\***


**Abstract:** Starch is the predominant reserve in rice (*Oryza sativa* L.) endosperm, which is synthesized by the coordinated efforts of a series of starch biosynthetic-related enzymes in the form of a multiple enzyme complex. Whether the enzyme complex changes during seed development is not fully understood. Here, we investigated the dynamic change in multi-protein complexes in an *indica* rice variety IR36 (wild type, WT) and its BEIIb-deficient mutant (*be2b*) at different developmental stages. Gel permeation chromatography (GPC) and Western blotting analysis of soluble protein fractions revealed most of the enzymes except for SSIVb were eluted in smaller molecular weight fractions at the early developing stage and were transferred to higher molecular weight fractions at the later stage in both WT and *be2b*. Accordingly, protein interactions were enhanced during seed development as demonstrated by co-immunoprecipitation analysis, suggesting that the enzymes were recruited to form larger protein complexes during starch biosynthesis. The converse elution pattern from GPC of SSIVb may be attributed to its vital role in the initiation step of starch synthesis. The number of protein complexes was markedly decreased in *be2b* at all development stages. Although SSIVb could partially compensate for the role of BEIIb in protein complex formation, it was hard to form a larger protein complex containing over five proteins in *be2b*. In addition, other proteins such as PPDKA and PPDKB were possibly present in the multi-enzyme complexes by proteomic analyses of high molecular weight fractions separated from GPC. Two putative protein kinases were found to be potentially associated with starch biosynthetic enzymes. Collectively, our findings unraveled a dynamic change in the protein complex during seed development, and potential roles of BEIIb in starch biosynthesis via various protein complex formations, which enables a deeper understanding of the complex mechanism of starch biosynthesis in rice.

**Keywords:** rice; starch synthase; starch branching enzyme; protein–protein interaction; protein complex; grain filling

**Citation:** Ying, Y.; Xu, F.; Zhang, Z.; Tappiban, P.; Bao, J. Dynamic Change in Starch Biosynthetic Enzymes Complexes during Grain-Filling Stages in BEIIb Active and Deficient Rice. *Int. J. Mol. Sci.* **2022**, *23*, 10714. https://doi.org/10.3390/ijms231810714

Academic Editor: Prem L. Bhalla

Received: 21 July 2022

Accepted: 7 September 2022

Published: 14 September 2022

Corrected: 11 May 2023

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## **1. Introduction**

Rice (*Oryza sativa* L.) is one of the most important food crops worldwide. Starch, composed of amylose and amylopectin, is the most abundant component in the rice grain that provides nutrients to the developing embryo and seedling. The development of the rice endosperm starts after double fertilization, then the fertilized polar nuclei undergo mitotic cell proliferation and cellularization until 6 to 7 days after flowering (DAF), and it is at 5–20 DAF that genes encoding the enzymes of starch biosynthesis are vigorously expressed and accumulation of starch and other storage compounds occurs [1–3].

The starch biosynthesis pathway is orchestrated by multiple enzymes in rice endosperm, which mainly include ADP-glucose pyrophosphorylase (AGPase), granule-bound starch synthase (GBSS), soluble starch synthase (SSs), starch branching enzyme (BEs), starch debranching enzyme (DBEs), and phosphorylases (Phos). Seven AGPase subunits [4–7], eleven isoforms of SSs [8,9], three isoforms of BEs [10–12], four isoforms of DBEs [13–15], and two isoforms of Phos [16,17] were identified, and their functions in the process of starch synthesis have been elucidated by mutant and transgenic analyses. In addition, starch biosynthesis isozymes act as complexes, rather than as monomers, to accomplish their activities in the starch biosynthesis pathway by associating with other enzymes. For example, starch plastidial phosphorylase (Pho1, also known as PHS1 and SP) forms a complex with disproportionating enzyme (Dpe1) to use a broader range of sugars to synthesize malto-oligosaccharides (MOSs) and enhance the synthesis of long MOSs, which plays an important role in the initiation of starch synthesis [18]. Moreover, Pho1 may facilitate interactions between SSIIa and SSIIIa in amyloplasts, although no direct association between SSIIa and SSIIIa was detected [19].

The direct evidence of physical interactions among starch biosynthetic enzymes was initially demonstrated in the developing endosperm of wheat [20]. Subsequently, further evidence for the identification of more phosphorylation-dependent complexes was accumulated in wheat [21], maize [22–27], barley [28], and rice [29–32]. Specifically, an approximately 230 kDa trimeric complex formed between SSI, SSIIa, and BEIIb is one of the best studied and well-characterized protein complexes, which is commonly found in cereals and contributes to the synthesis of short and intermediate amylopectin chains within the clusters [33]. A large protein complex (approximately 670 kDa) involving SSIII interacting with SSIIa, BEIIa, BEIIb, and several other proteins including pyruvate orthophosphate dikinase (PPDK) and AGPase has also been demonstrated in maize endosperm [23]. In addition, gel filtration analyses of soluble proteins from developing rice endosperm in a *japonica* background showed that rice starch biosynthetic enzymes, including SSI, SSIIa, SSIIIa, SSIVb, BEI, BEIIb, and PUL, form a larger protein complex (>700 kDa) than those found in other cereals [29].

BEs are essential for amylopectin biosynthesis because they are the only enzymes that introduce α-1,6 glycosidic bonds into α-polyglucans [34]. The three isoforms in rice (BEI, BEIIa, and BEIIb) play distinct roles: BEI transfers a variety of both short and intermediate chains (DP ≤ 40), while BEIIa and BEIIb preferentially form amylopectin short chains of DP 6–15 and DP 6–7, respectively [10,34]. Although BEI accounts for the largest relative activity (approximately 80%) [35], BEIIb (the major form of BEII in maize and rice) makes the greatest contribution to amylopectin synthesis [10,12]. Inactivation of BEIIb results in the *amylose extender* (*ae*) mutant, whose starch contains longer amylopectin chains and fewer branches; this increases the amylose content, changes the crystalline pattern of starch from A-type to B- or C-type, and produces an opaque seed phenotype in rice [10,12,36–38], whereas no such changes have so far been reported in the *be1* and *be2a* mutants [11,39]. In addition, mutations in the *BEIIb* gene altered the interactions of starch biosynthetic enzymes and changed the formation of multi-enzyme complexes in maize [24,25], barley [28], and rice [30]. These observations imply that BEIIb plays a specific role in the formation of multi-enzyme complexes in amylopectin synthesis. However, there is little information concerning the dynamic change in starch biosynthetic isozyme complexes at different developmental stages of rice endosperm. In addition, most studies on protein complex formation have focused on *japonica* cultivars [30–32], which have almost inactive SSIIa [40]. Therefore, whether and how novel protein complexes are present in *indica* rice with an active SSIIa isozyme remains unknown.

In a previous study, we investigated the changes in starch fine structure and functional properties in the *indica* rice variety IR36 (wild type, WT) and its BEIIb-deficient mutant with a reduced BEIIb level (*be2b*) during three typical developmental stages from 5 to 15 DAF [41]. In the present study, we analyzed dynamic changes in total protein expression profiles and multi-enzyme complexes in the same materials by gel-filtration chromatography, co-immunoprecipitation, and Western blot analysis. In addition, proteomic analysis was conducted to confirm the putative large complexes (>400 kDa), to reveal the presence of other proteins in the complexes, and to identify differentially expressed proteins between WT and *be2b* at different developmental stages. Finally, the essential role of protein phosphorylation and protein kinases in multi-protein complex formation is discussed. These results not only provide valuable insights into the roles of BEIIb in the formation of multi-enzyme complexes, but also provide a novel understanding of the complex network of starch biosynthetic enzymes involved in starch accumulation in developing endosperm.

#### **2. Results**

#### *2.1. Accumulation of Starch Biosynthetic Related Enzymes (SSREs)*

To investigate the effects of BEIIb deficiency on the accumulation of other starch biosynthetic-related enzymes (SSREs) in rice endosperm at different developmental stages, total proteins were isolated from the WT and the *be2b* mutant [41], and 11 antibodies from the GBSS, SS, BE, DBE, and Pho classes were used to detect the expression of SSREs (Figure 1A). As shown in Figure 1B, quantification of Western blot bands revealed that SSREs accumulated differentially in WT and *be2b* at different grain-filling stages. As expected, the accumulation of BEIIb decreased significantly in *be2b*, and a significant reduction in SSI and SSIIa was also observed. However, the levels of SSIVb, BEI, ISA1, PUL, and Pho1 were up-regulated in *be2b*. The total GBSSI level was lower in *be2b* than in WT at 5 and 10 DAF, but higher at 15 DAF, while SSIIIa and BEIIa were higher at 5 DAF, but subsequently decreased.


**Figure 1.** The expression of starch biosynthetic related enzymes (SSREs) during different grain-filling stages (5 DAF, 10 DAF, 15 DAF) in wild type (WT) and *be2b* mutant: (**A**) Total proteins were separated by SDS-PAGE. Each lane was loaded with 50 µg of proteins. Gels were analyzed by Western blotting using the indicated isozyme-specific antibodies. The anti-actin antibody is used as a loading control. Results shown are representative of two biological replicates. (**B**) Relative density of Western blotting results measured by Image J. Data are Mean ± SD from two biological replicates. The asterisks indicate statistical significance between WT and *be2b*, as determined by the Student's *t*-test (\*, *p* < 0.05; \*\*, *p* < 0.01; \*\*\*, *p* < 0.001).

The relative levels of SSREs varied over endosperm development in both WT and *be2b*, although the protein levels fluctuated within a narrow range (less than twofold) as estimated by immunoblot (Figure 1B). In WT, the steady-state levels of BEI, BEIIb, ISA1, and PUL increased over development, while those of SSIIa and SSIIIa decreased, and the accumulated levels of GBSSI, SSI, SSIVb, BEIIa, and Pho1 peaked at the mid developmental stage. In the *be2b* mutant, the dynamic changes in some enzymes were strongly affected by the dramatic reduction in the BEIIb level. GBSSI and Pho1 displayed an upward trend from 5 to 15 DAF, and the relative level of SSIVb fell from 5 to 10 DAF, then increased. The changing patterns of the other enzymes (SSI, SSIIa, SSIIIa, BEI, BEIIa, ISA1, and PUL) were the same as those in WT.
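The asterisk notation in the Figure 1 legend and the "less than twofold" statement can be expressed as a minimal Python sketch. The function names and the example values are illustrative assumptions, not part of the original analysis, which used Image J densitometry and the Student's *t*-test.

```python
from typing import Sequence


def significance_stars(p_value: float) -> str:
    """Map a p-value to the asterisks defined in the Figure 1 legend
    (*, p < 0.05; **, p < 0.01; ***, p < 0.001)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


def max_fold_change(levels: Sequence[float]) -> float:
    """Ratio of the highest to the lowest relative level across stages."""
    if not levels or min(levels) <= 0:
        raise ValueError("levels must be non-empty and strictly positive")
    return max(levels) / min(levels)


# Hypothetical relative band densities at 5, 10, and 15 DAF (not measured data).
example_levels = [1.0, 1.5, 1.8]
within_twofold = max_fold_change(example_levels) < 2.0
```

With the hypothetical densities above, `within_twofold` is `True`, which corresponds to the narrow fluctuation range described in the text.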

#### *2.2. Molecular Weight Distribution of SSREs*

Soluble proteins extracted from developing seeds of WT and *be2b* at different grain-filling stages were separated by gel-filtration chromatography. The eluted fractions were denatured and analyzed by Western blotting using enzyme-specific antibodies to examine possible changes in the aggregation state of the main SSREs involved in amylopectin biosynthesis. In addition, the phosphorylation status of BEs was examined with site-specific phosphopeptide antibodies (BEI-Ser562, BEIIb-Ser685) identified previously [42]. Figure 2 shows that, while some proteins were detected at sizes below their expected monomeric size (derived from partial fragments of the isozymes), all starch biosynthetic proteins analyzed were eluted at higher molecular weights, suggesting the existence of protein multimerization in all samples. As expected, only faint signals of the BEIIb band (ca. 87 kDa) were observed in the fractions of *be2b* compared with WT. In comparison with BEI and BEIIb (Figure 2B), the phosphoproteins showed elution patterns with similar elution positions but narrower molecular weight ranges (Figure 2C). Although some differences in the elution patterns of isozymes were noted at 5 and 10 DAF between WT and *be2b*, striking differences were observed at 15 DAF, except for SSIIIa. The majority of SSI, BEI, and BEIIb was eluted in fractions 2 to 5 (>400 kDa) in WT at 15 DAF, while in the *be2b* mutant, all of these enzymes were eluted mainly in fractions 6 to 9 (100–300 kDa). Similarly, SSIIa, BEIIa, ISA1, PUL, and Pho1 also eluted earlier from the column in WT than in *be2b* extracts, indicating that these proteins are components of larger complexes in WT than in *be2b*. By contrast, SSIVb was eluted in a broad molecular weight range (fractions 2–9, >100 kDa) in *be2b* at 15 DAF, but in a low molecular weight form (fractions 7–9, 100–200 kDa) in WT.

Considering the dynamic changes in endosperm development, most of the enzymes were eluted in higher molecular weight fractions at later stages of endosperm development (10 and 15 DAF) than at the earlier stage (5 DAF), in both WT and *be2b*. Specifically, most of SSI, BEI, BEIIb, and PUL was eluted in low molecular weight fractions 7 to 9 (100–200 kDa) at 5 and 10 DAF in WT, whereas the amount of these enzymes eluted in fractions 2 to 6 (>300 kDa) was considerably elevated at 15 DAF. SSIIa was mainly detected in fractions 6 and 7 (200–300 kDa) at 5 and 10 DAF in WT, whereas at 15 DAF only a small amount was found in these fractions and most was eluted in fractions 4 and 5 (400–600 kDa). BEIIa was detected in fraction 7, fractions 6 and 7, and fractions 2 to 4 (>500 kDa) at 5, 10, and 15 DAF in WT, respectively. ISA1 and Pho1 were found in fractions 4 to 7 (200–600 kDa) at 5 and 10 DAF in WT, whereas they were mainly eluted in fractions 2 and 3 (>600 kDa) at 15 DAF. The elution patterns of SSIIIa in different samples were almost identical, although the overall trend was that the high molecular weight fractions increased with seed development. Similar results were also obtained from the *be2b* mutant, indicating an apparent increase in the molecular mass or aggregation state of these enzymes with endosperm development. However, an opposite trend was found for the SSIVb protein: it tended to form a larger protein complex at earlier stages in WT, while in *be2b*, the elution patterns of SSIVb were similar among different developmental stages, suggesting that SSIVb plays a different role in protein complex formation.
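A shift in elution pattern of the kind described above could be summarized as the share of the total band signal found in the high-molecular-weight fractions. The sketch below is only illustrative: the fraction range 2–5 (>400 kDa) is taken from the text, whereas the function name and the band intensities are hypothetical placeholders, not measurements from Figure 2.

```python
from typing import Iterable, Mapping


def high_mw_share(intensity_by_fraction: Mapping[int, float],
                  high_fractions: Iterable[int] = range(2, 6)) -> float:
    """Return the share of total signal eluting in the high-MW fractions.

    Keys are gel-filtration fraction numbers; values are band intensities.
    The default range(2, 6) covers fractions 2 to 5 (>400 kDa in the text).
    """
    total = sum(intensity_by_fraction.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    high = set(high_fractions)
    in_high = sum(v for f, v in intensity_by_fraction.items() if f in high)
    return in_high / total


# Hypothetical band intensities for one enzyme at 15 DAF (not real data).
wt_example = {2: 3.0, 3: 4.0, 4: 2.0, 5: 1.0, 6: 0.5, 7: 0.5}
mutant_example = {5: 1.0, 6: 3.0, 7: 3.0, 8: 2.0, 9: 1.0}

wt_share = high_mw_share(wt_example)          # 10.0 / 11.0
mutant_share = high_mw_share(mutant_example)  # 1.0 / 10.0
```

A larger share in WT than in the mutant would correspond to the earlier elution reported for WT at 15 DAF.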

**Figure 2.** Molecular weight distributions of SSREs during different grain-filling stages (5 DAF, 10 DAF, 15 DAF) in wild type (WT) and *be2b* mutant. Fractions 1–12 obtained after gel filtration chromatography of soluble proteins extracted from the developing seeds were separated by SDS-PAGE prior to Western blotting with indicated isozyme-specific antibodies. Each lane was loaded with 5 µL of 10-fold concentrated fractions. The molecular weight of protein standards is shown at the top (black bars): (**A**) Western blotting of SS isozymes. (**B**) Western blotting of BE isozymes. (**C**) Western blotting of phosphor BE isozymes. (**D**) Western blotting of DBE isozymes and Pho1.

#### *2.3. Co-Immunoprecipitation*

To investigate the differences in possible interacting partners among SSREs in WT and *be2b* at different developmental stages, soluble proteins were extracted from the developing seeds and co-immunoprecipitation experiments were performed using enzyme-specific antibodies. All of the antibodies used for precipitation (anti-SSI, anti-SSIIa, anti-SSIIIa, anti-SSIVb, anti-BEI, anti-BEIIa, and anti-BEIIb) were able to recognize and precipitate their respective target protein (Figure 3). The results are summarized in Table 1. At the early grain-filling stage (5 DAF), no strong pairwise association was detected by reciprocal co-immunoprecipitation in either WT or *be2b*, while weak signals were obtained from reciprocal co-immunoprecipitation experiments for the pairwise interactions BEIIb–SSI, BEIIb–SSIIa, BEIIa–SSIVb, BEIIb–SSIVb, and BEIIb–BEI in WT and only BEIIa–SSIVb in *be2b*. In some cases, clear Western blot signals were obtained in only one direction of the co-immunoprecipitation experiments; among such interactions, BEI–SSIIa, SSIIa–SSI, BEIIa–SSI, and BEIIa–SSIIa were observed in both WT and *be2b* (first acronym, the antibody used for immunoprecipitation; second acronym, the isozyme subsequently detected by Western blotting).

However, at later stages (10 and 15 DAF), when a higher molecular mass of enzymes was detected in the gel-filtration experiments (Figure 2), strong pairwise associations were obtained by reciprocal co-immunoprecipitation and fewer weak immunodetections of co-precipitated protein were observed. In WT, BEIIb was clearly co-immunoprecipitated by SSI, SSIIa, SSIIIa, BEI, and BEIIa at 10 DAF and by SSI, SSIIa, and BEI at 15 DAF, while, as expected, no interactions were observed between BEIIb and other enzymes in the *be2b* mutant. In addition, interactions among other starch biosynthetic enzymes were reduced in *be2b*, whereas the interaction between BEIIa and BEI was observed by reciprocal co-immunoprecipitation in *be2b* but only as a one-side signal in WT at 10 DAF. Pairwise associations detected by reciprocal co-immunoprecipitation were observed for SSIVb–SSIIIa and BEIIa–SSIVb, and one-side signals were obtained for SSI–SSIVb, BEI–SSIVb, and BEIIb–SSIVb in WT at 10 DAF. However, at 15 DAF, when SSIVb was eluted in lower molecular weight fractions (Figure 2), only weak pairwise associations for SSIVb–SSIIIa were observed in WT. Interactions of ISA1 with other starch biosynthetic enzymes only occurred between BEIIb and ISA1 at 15 DAF in WT, indicating that ISA1 was not active in multi-enzyme complex formation in rice.

**Figure 3.** Protein–protein interactions between SSREs during different grain-filling stages (5 DAF, 10 DAF, 15 DAF) in wild type (WT) and *be2b* mutant by co-immunoprecipitation (Co-IP). Soluble proteins extracted from developing seeds were incubated with the indicated isozyme-specific antibodies and protein A magnetic beads. After washing, the captured proteins were released by boiling in a 1× SDS buffer and 5 µL of each supernatant was separated by SDS-PAGE prior to Western blotting. The antibodies used for precipitation are shown in the top panels. The antibodies used for Western blotting are indicated on the right: (**A**) Western blotting of SS isozymes. (**B**) Western blotting of BE isozymes. (**C**) Western blotting of DBE isozymes and Pho1.

**Table 1.** Comparison of protein–protein interactions among starch synthetic related enzymes during different grain-filling stages (5 DAF, 10 DAF, 15 DAF) in wild type (WT) and *be2b* mutant endosperms determined by co-immunoprecipitation.

| Sample | Reciprocal, strong signal <sup>b</sup> | Reciprocal, weak signal <sup>b</sup> | One side <sup>a</sup> |
| --- | --- | --- | --- |
| 5 DAF WT | | BEIIb–SSI, BEIIb–SSIIa, **BEIIa–SSIVb**, BEIIb–SSIVb, BEIIb–BEI | BEI–SSI, **BEI–SSIIa**, BEI–Pho1; **SSIIa–SSI**, **BEIIa–SSI**, **BEIIa–SSIIa**, SSIVb–SSIIIa, SSI–SSIVb, BEI–SSIVb, BEIIa–BEI, BEIIa–BEIIb |

<sup>a</sup> The antibody used for immunoprecipitation is shown on the left and the coprecipitated enzyme detected by Western blotting is shown on the right. Interactions shared by the wild type and the *be2b* mutant are indicated in bold. <sup>b</sup> Protein bands in black or dark gray were defined as strong signals, while those in light gray (but clear) were defined as weak signals.
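The distinction between reciprocal and one-side signals used in Table 1 can be sketched in Python as follows. Each detection is written as (antibody used for immunoprecipitation, isozyme detected by Western blotting), following the notation of the table footnote. The function name and the small input set are illustrative assumptions; the example pairs are loosely modeled on interactions named in the text and do not reproduce the full table.

```python
from typing import FrozenSet, Set, Tuple

Detection = Tuple[str, str]  # (IP antibody target, isozyme detected by blot)


def classify_coip(detections: Set[Detection]):
    """Split directed Co-IP detections into reciprocal pairs and one-side signals.

    A pair is reciprocal when A precipitates B and B also precipitates A.
    Self-detections (an antibody pulling down its own target) are ignored.
    """
    reciprocal: Set[FrozenSet[str]] = set()
    one_side: Set[Detection] = set()
    for ip_target, detected in detections:
        if ip_target == detected:
            continue  # control: antibody recognizes its own target
        if (detected, ip_target) in detections:
            reciprocal.add(frozenset((ip_target, detected)))
        else:
            one_side.add((ip_target, detected))
    return reciprocal, one_side


# Illustrative input only (not the complete data set behind Table 1).
example = {
    ("BEIIb", "SSI"), ("SSI", "BEIIb"),  # detected in both directions
    ("BEI", "SSIIa"),                    # detected in one direction only
    ("SSI", "SSI"),                      # self-detection control
}
reciprocal_pairs, one_side_pairs = classify_coip(example)
```

For this input, BEIIb–SSI is classified as reciprocal and BEI–SSIIa as a one-side signal. Signal strength (strong or weak) is a separate attribute and is not modeled here.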

#### *2.4. Identification of Large Protein Complex Components*

To assess dynamic changes in the composition of high-molecular-weight complexes (>400 kDa) in WT and the *be2b* mutant, fractions 1–5 from two samples at three developmental stages separated by GPC were collected, pretreated, digested, and then analyzed by liquid chromatography–tandem mass spectrometry (LC-MS/MS). In all endosperm samples, we obtained 2524 identities representing 2488 unique proteins (Figure 4A; Table S1). Among these proteins, only 720 were detected in all groups, indicating that the components of large protein complexes involved in starch synthesis differ among developmental stages and between WT and mutant. The number of specific proteins identified in WT and *be2b* peaked at 5 DAF (178 and 280, respectively), followed by 10 DAF (86 and 63, respectively) and 15 DAF (168 and 12, respectively). We further analyzed the expression patterns of key enzymes involved in starch synthesis in high-molecular-weight complexes. Note that PUL was not detected by LC-MS/MS due to the lack of an MSU ID. As shown in Figure 5, most of the enzymes (SSI, SSIIa, BEI, BEIIa, BEIIb, and Pho1) showed higher expression levels in WT compared to *be2b*, and their abundance was significantly enhanced at 15 DAF. The combination of this proteomic information with the results from GPC and Western blot analysis of soluble protein fractions (Figure 2) suggested that more SSREs assemble into large molecular weight protein complexes in WT, especially at 15 DAF.
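The counts of common and sample-specific proteins summarized in the UpSet plot (Figure 4A) correspond to simple set operations. The sketch below assumes one set of protein identifiers per sample; the sample names and identifiers are hypothetical placeholders, and the function is not the authors' pipeline.

```python
from typing import Dict, Set, Tuple


def shared_and_specific(samples: Dict[str, Set[str]]
                        ) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return proteins found in every sample and those unique to each sample."""
    if not samples:
        raise ValueError("at least one sample is required")
    shared = set.intersection(*samples.values())
    specific: Dict[str, Set[str]] = {}
    for name, ids in samples.items():
        # Union of the identifiers seen in all other samples.
        others = set().union(*(s for n, s in samples.items() if n != name))
        specific[name] = ids - others
    return shared, specific


# Hypothetical identifiers; real input would be the IDs listed in Table S1.
toy_samples = {
    "WT_5DAF": {"P1", "P2", "P3"},
    "WT_10DAF": {"P1", "P2"},
    "be2b_5DAF": {"P1", "P4"},
}
common, unique = shared_and_specific(toy_samples)
```

Here `common` contains only `P1`, while `P3` and `P4` are specific to `WT_5DAF` and `be2b_5DAF`, respectively. Applied to six samples, the same operations would yield the shared and sample-specific categories highlighted in red in Figure 4A.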

To detect possible changes in the composition of high-molecular-weight complexes in relation to developmental stages, differentially expressed proteins (DEPs) were identified by RStudio (version 4.1.2). The DEPs were detected between WT and *be2b* in the corresponding period, and the differences between 5 and 10 DAF and between 10 and 15 DAF were also compared between WT and *be2b* (Figure 4B; Tables S2 and S3). Both developmental stage and the *be2b* mutation had a large effect on the composition of the protein complexes. From the perspective of endosperm development, a total of 157 DEPs were identified between 5 and 10 DAF or 10 and 15 DAF in WT and *be2b*. Among these DEPs, some proteins were consistently up-regulated or down-regulated in both WT and *be2b*, while a few enzymes displayed opposite regulatory patterns between WT and *be2b*. One of the most striking DEPs, isopropylmalate synthase (IPMS1, LOC\_Os11g04670), was found to be down-regulated from 5 to 10 DAF and up-regulated from 10 to 15 DAF in WT, but inversely up-regulated from 5 to 10 DAF and down-regulated from 10 to 15 DAF in *be2b*. In comparison with WT, the numbers of up-regulated DEPs in *be2b* were 57, 37, and 20 at 5, 10, and 15 DAF, respectively. In contrast, there were 45, 31, and 83 down-regulated DEPs, respectively. Among these DEPs, two proteins of known function, pyruvate orthophosphate dikinase A (PPDKA, LOC\_Os03g31750) and PPDKB (also known as FLO4, LOC\_Os05g33570), were of particular interest because both of them were dramatically down-regulated in *be2b* at 15 DAF, with fold-changes of 7.3 and 14.7, respectively.
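The DEP criteria stated in the Figure 4 legend (3-fold change with *p*-value < 0.05 and PSM > 2) can be written as a small filter. This is a sketch under stated assumptions rather than the authors' R code: the abundance ratio is assumed to be *be2b* relative to WT, the fold-change threshold is treated as inclusive, and the *p*-values and PSM counts in the example are hypothetical.

```python
def classify_dep(ratio: float, p_value: float, psm: int,
                 min_fold: float = 3.0, alpha: float = 0.05,
                 min_psm: int = 2) -> str:
    """Classify a protein as 'up', 'down', or 'unchanged'.

    ratio   -- abundance in be2b divided by abundance in WT (assumed direction)
    p_value -- p-value of the comparison
    psm     -- number of peptide-spectrum matches supporting the protein
    """
    if ratio <= 0:
        raise ValueError("ratio must be strictly positive")
    # Require statistical support and enough spectral evidence (PSM > 2).
    if p_value >= alpha or psm <= min_psm:
        return "unchanged"
    if ratio >= min_fold:
        return "up"
    if ratio <= 1.0 / min_fold:
        return "down"
    return "unchanged"


# A 7.3-fold reduction, as reported for PPDKA at 15 DAF; the p-value and
# PSM count below are hypothetical and serve only to exercise the filter.
ppdka_call = classify_dep(ratio=1 / 7.3, p_value=0.01, psm=5)
```

Under these assumptions `ppdka_call` is `"down"`. A protein with the same ratio but only two PSMs would be returned as `"unchanged"`, because the PSM criterion is strict.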

**Figure 4.** Proteins in high-molecular-weight complexes (>400 kDa) identified by LC-MS/MS during different grain-filling stages (5 DAF, 10 DAF, 15 DAF) in wild type (WT) and *be2b* mutant: (**A**) Upset plot showing the number of proteins that were detected in WT and *be2b* mutant endosperm among the different developmental stages. Red dots indicate proteins identified in all six samples or in only one single sample, blue dots indicate remaining proteins, respectively. (**B**) Number of differentially expressed proteins (DEPs) in relation to *be2b* mutation and developmental stages (3-fold change with *p*-value < 0.05 and PSM > 2).

**Figure 5.** Heat map showing the expression patterns of key enzymes involved in starch synthesis in high-molecular-weight complexes (>400 kDa) identified by LC-MS/MS.

As protein kinases play vital roles in protein complex formation by mediating phosphorylation events, we further examined the protein kinases identified in high-molecular-weight fractions (>400 kDa) by LC-MS/MS, and a total of 29 protein kinases were found (Table S4). Some of them were present in both WT and *be2b* at the same stage, while other kinases were not. For example, 5′-AMP-activated protein kinase (AMPK) β1 subunit-related protein (LOC\_Os08g29160) existed in both WT and *be2b* at 5 and 15 DAF, and cystathionine β-synthase (CBS) domain-containing membrane protein (LOC\_Os03g63940) existed in both WT and *be2b* at 5 DAF (Figure 6A). Further prediction of protein interactions with STRING (http://String-db.org) revealed a potential association between AMPK β1 subunit-related protein, CBS domain-containing membrane protein, and some starch biosynthetic enzymes in rice (Figure 6B). In addition, the number of protein kinases found in high-molecular-weight fractions in WT and *be2b* peaked at 5 DAF, followed by 10 DAF and 15 DAF (Table S4).

**Figure 6.** Two protein kinases identified by LC-MS/MS: (**A**) Structure of two protein kinases. (**B**) Potential association networks of two protein kinases and starch synthesis enzymes in rice endosperm from STRING, a database of known and predicted protein interactions. Displayed here is the evidence view, where different line colors represent the types of evidence for the association.

#### **3. Discussion**

Starch biosynthetic enzymes in cereals are known to interact with each other and to form protein–protein complexes. The main objective of the current study was to investigate how starch biosynthetic isozyme complexes are altered during grain-filling stages in BEIIb-active and BEIIb-deficient rice. Based on the present and past investigations, possible starch biosynthetic protein complexes in WT (Figure 7A–C) and *be2b* (Figure 7D–F) developing endosperm are schematically illustrated. It is possible that starch biosynthetic enzymes interact with each other through glucan-mediated association. However, this seems unlikely, since previous studies showed that pre-incubation of wheat and maize amyloplast extracts with glucan-degrading enzymes to remove glucan polymers did not prevent co-immunoprecipitation of SSs, SBEs, and Pho1 [21,25], and had no effect on the GPC analysis [23], suggesting that the formation of multi-enzyme complexes is due to specific protein–protein interactions.

**Figure 7.** Schematics of speculated starch biosynthetic protein complexes during different grain-filling stages (5 DAF, 10 DAF, 15 DAF) in wild type (WT) and *be2b* mutant: (**A**) WT–5 DAF; (**B**) WT–10 DAF; (**C**) WT–15 DAF; (**D**) *be2b*–5 DAF; (**E**) *be2b*–10 DAF; (**F**) *be2b*–15 DAF. SS isozymes are in red, BE isozymes are in blue, DBE isozymes are in green, Pho1 is in yellow, and phosphate groups are indicated in a small circle. Single, double, triple, and quadruple asterisks indicate the formation of protein complexes confirmed in this study, previously reported by Crofts, et al. [29], Chen and Bao [43], Crofts, et al. [30], and Chen, et al. [44], respectively. Monomeric molecular weights of each isozyme are indicated at the bottom. Note that the schematics do not intend to imply specific details of stoichiometry or direct contacts within the complexes since both direct and indirect physical interactions could be detected by co-immunoprecipitation.

#### *3.1. Components of Multi-Protein Complexes Vary at Different Seed Development Stages*

The formation and development of starch involve complicated physiological and biochemical processes in cereal endosperm [45]. It is clear that the structure and composition of starch granules differ during endosperm development in wheat [46,47], barley [48], maize [49], and rice [41], which can be summarized as an increase in amylose content, a decrease in the degree of order, and changes in the amylopectin chain length distribution. All of these changes in starch structure could be explained by the dynamic expression pattern of genes related to starch biosynthesis [1], giving rise to a different accumulation of enzymes, as described above (Figure 1). These enzymes have been shown to function as complexes through post-translational modifications [50]. Consequently, we attempted to reveal how these protein complexes exist in rice endosperm during seed development, which may facilitate the understanding of starch biosynthetic protein complex formation and the ensuing starch structure and properties.

At the early phase of seed development (5 DAF), when endosperm starch begins to accumulate, most of the enzymes were eluted in low-molecular-weight fractions in both WT and *be2b*, as demonstrated by GPC and Western blot analysis (Figure 2). Accordingly, no strong pairwise association was detected by reciprocal co-immunoprecipitation in rice endosperm at 5 DAF (Figure 3; Table 1), suggesting that proteins weakly associate to form small complexes (Figure 7A,D). However, the immunoblotting signal of SSIVb detected in high-molecular-weight fractions was stronger at 5 DAF compared with later stages (Figure 2A). At the mid-phase of seed development (10 DAF), although no remarkable difference was seen in the position of eluted fractions compared with 5 DAF, the abundance of proteins in higher-molecular-weight fractions was visibly elevated, except for SSIVb (Figure 2). This stage involved active protein–protein interaction in rice endosperm, as strong pairwise associations were obtained by reciprocal co-immunoprecipitation and fewer weak immunodetection signals of co-precipitated proteins were observed (Figures 3 and 7B,E; Table 1). At the late phase of seed development (15 DAF), when endosperm starch rapidly increased, there was a notable shift in the elution patterns of the majority of soluble proteins, suggesting an apparent increase in the aggregation state of these enzymes (Figures 2 and 7C,F). However, no increase was observed in pairwise associations obtained by co-immunoprecipitation (Figure 3; Table 1), which was likely due to steric hindrance arising from the large-molecular-mass protein complexes. These findings were consistent with previous results from developing wheat endosperm [21] and maize endosperm [24], in which the eluted SS and BE activity in high-molecular-weight fractions could only be detected at the later stages.
Although the exact mechanism of SSIVb in rice endosperm starch biosynthesis remains unknown, it showed an elution pattern opposite to those of the other enzymes during endosperm development, suggesting its possible role in the interaction with other enzymes and in the formation of large complexes to accomplish its activity at the very early stage of endosperm development (Figure 7A). In fact, SSIV is known to play a significant role in the initiation step of starch synthesis, involving MOSs extension and starch granule control, in barley [51], wheat [52], and *Arabidopsis* [53,54]. In rice, the mutation of SSIVb did not show a major impact on the starch structure, but the loss of both SSIVb and SSIIIa resulted in opaque seeds with spherical starch granules, suggesting that SSIVb and SSIIIa are key enzymes affecting starch granule morphology [55]. Intriguingly, in the *be2b* mutant, the elution patterns of SSIVb were similar among different developmental stages (Figure 2A). The increased SSIVb in high-molecular-weight complexes might be regulated by the deficiency of BEIIb (see below). Hence, further investigation of enzyme elution patterns and protein–protein interactions in rice mutant lines lacking SSIVb is necessary to gain a more comprehensive understanding of starch synthesis.

The dynamic changes were further emphasized by proteomic results of high-molecular-weight fractions, in which most of the enzymes, including SSI, SSIIa, BEI, BEIIa, BEIIb, and Pho1, showed increasing trends from 5 to 15 DAF (Figure 5), indicating that these proteins were inclined to form larger complexes at later developmental stages. IPMS1 was characterized as a DEP in relation to developmental stages and showed an inverse regulation pattern between WT and *be2b*. IPMS1 was recently identified to regulate seed vigor involved in starch hydrolysis, glycolytic activity, and energy levels [56]. Further work is needed to investigate the effects of BEIIb deficiency on these different regulation patterns, which will aid in the understanding of the role of IPMS1 in multi-enzyme complex formation and carbohydrate metabolism.

#### *3.2. Effects of BEIIb Deficiency on Multi-Enzyme Complex Formation in the Developing Rice Seed*

Protein–protein interactions of starch biosynthetic enzymes are thought to be an important mechanism for efficient starch synthesis [57]. Moreover, studies on mutants that lack starch biosynthetic enzymes have indicated the importance of the formation of starch biosynthetic protein complexes because they could maintain minimal starch biosynthesis by the recruitment of other starch biosynthetic isozymes and the formation of alternative protein complexes, although the loss of specific starch biosynthetic enzymes may slow starch biosynthesis [33]. Rice mutant seeds with all three major SS activities reduced (*ss1L*/*ss2aL*/*ss3a*) complemented the composition of protein complexes within the same enzyme family (SS isozyme) to maximize the storage of photosynthetic products such as starch [32]. The substitution of starch biosynthetic protein complexes in BE-deficient mutants of maize [24,25], barley [28], and *japonica* rice [30] could also be a similar phenomenon. Based upon previous experimental results, differences in the components of protein complexes between normal rice (WT) and the BEIIb-deficient mutant (*be2b*) in the *indica* background with the active SSIIa isozyme were suspected (Figure 7).

Western blot analysis of SDS-PAGE gels of total proteins from rice endosperm showed that the *be2b* mutant used in our study had a substantial reduction in BEIIb as well as SSI and SSIIa, whilst the protein levels of SSIVb, BEI, ISA1, PUL, and Pho1 were up-regulated compared with WT (Figure 1). Accordingly, the reduction in SSI activity is always accompanied by a deficiency of BEIIb in rice [12,58]. The combined GPC and Western blot analysis of soluble protein involved in amylopectin biosynthesis revealed that the elution pattern of major starch biosynthetic enzymes was altered in *be2b*. In *be2b*, SSI, SSIIa, SSIIIa, BEI, BEIIa, ISA1, PUL, and Pho1 were eluted in lower-molecular-weight fractions, and the abundance of monomeric enzymes (fractions 9–12, <100 kDa) increased compared to WT, especially at 15 DAF (Figure 2). Taking co-immunoprecipitation results into account, associations among these enzymes in *be2b* were weaker than those in WT in the corresponding period (Figure 3; Table 1), indicating that the reduction in BEIIb may reduce either the formation or the stability of the protein complex consisting of all these enzymes. Consequently, a considerable decrease in the number of protein complexes at all development stages was hypothesized in *be2b* compared with WT (Figure 7). In addition, no large protein complex containing more than five proteins existed in the *be2b* endosperm, even at 15 DAF (Figure 7F), implying a significant role of BEIIb in protein complex formation. By contrast, SSIVb in *be2b* rice was eluted in earlier fractions than in WT and was present in a broader molecular weight range. In addition, the interactions of BEI–SSIVb and BEIIa–SSIVb were stronger in *be2b*, implying that alternative protein complexes may form to compensate for BEIIb deficiency.
These observations were different from previous studies in *ae–* mutants of maize [24] and *japonica* rice [30], which reported that the SSI–SSIIa–BEIIb trimeric protein complex was substituted by the compensatory effects of BEI/BEIIa/Pho1 and BEIIa, respectively, to form the altered complexes. Maize BEI and *japonica* rice BEIIa showed similarly changed elution patterns in *be2b*, with an increased ratio in the 200–300 kDa fraction (where the trimeric protein complex elutes) and a broader eluted molecular weight range [24,30]. However, neither BEI nor BEIIa in the *be2b* mutant investigated in our study displayed this alteration in molecular weight distribution. Taken together, although BEIIb deficiency leads to a similarly altered amylopectin fine structure with fewer amylopectin short chains and more amylopectin long chains, giving rise to increased gelatinization temperature and amylose content as a result of reduced amylopectin biosynthesis in maize [25], *japonica* rice [58], and *indica* rice [41], the underlying mechanism involved in the synthesis of such *ae* starch may differ, as judged by distinct compensatory effects. Our present results suggested that SSIVb likely complemented the role of BEIIb in *be2b*, as it was observed in higher-molecular-weight protein complexes and had enhanced Western blot signals by co-immunoprecipitation (Figures 2A and 3A), which were analogous to those of BEI/BEIIa/Pho1 in the maize *ae–* mutant [24] and BEIIa in the *japonica* rice *ae–* mutant [30]. However, neither SSI nor SSIIa associated with SSIVb in *be2b* to form a trimeric protein complex substituting for SSI–SSIIa–BEIIb. The likely explanation is that other potential proteins might bridge the binding of SSIVb to SSI–SSIIa to form a larger alternative protein complex and enhance the association among other starch biosynthetic enzymes to maintain starch biosynthesis in *be2b* (Figure 7E,F).
Several enzymes and novel non-enzymatic proteins, including MFP1 [59], PII1 [60], PHS1, and PTST2 [53], were identified to interact with SSIV in *Arabidopsis*. Recently, Zhang, et al. [61] found that a carbohydrate-binding module 48 (CBM48) domain-containing protein, FLO6, had a physical association with SSIVb in rice. In addition, correlation analysis between model fitting parameters of amylose and amylopectin chain-length distributions and ratios of the protein content of enzyme pairs revealed that rice SSIVb functionally interacted with SSI, SSIIa, BEI, BEIIb, ISA1, and PUL [62]. Further analyses of the possible interacting partners of SSIVb, the major SSIV isozyme in rice endosperm, will give a better understanding of protein complex formation and starch biosynthesis.

Proteomic evidence of high-molecular-weight fractions from GPC showed that most of the enzyme levels (SSI, SSIIa, BEI, BEIIa, BEIIb, and Pho1) significantly decreased in *be2b* compared with WT (Figure 5), which supported the conclusion that those enzymes are present in a complex with BEIIb. Both PPDKA and PPDKB were significantly down-regulated in the *be2b* mutant at 15 DAF. PPDK catalyzes the formation of the CO<sub>2</sub> acceptor phosphoenolpyruvate (PEP) from pyruvate and is best known as a photosynthetic enzyme in C<sub>4</sub> plants [63]. The PPDKB-deficient mutant caused by T-DNA insertion showed a white-core endosperm, suggesting an essential function of PPDKB in modulating the carbon flow during grain filling [64]. It was reported that PPDK1 and/or PPDK2 existed in high-molecular-mass forms that require multiple starch biosynthetic enzymes, as both of them were present in the partially purified C670 fraction and identified in the eluate from the SSIIIHD affinity column [23]. In addition, PPDK was identified by nano-LC-MS/MS in rice starch granule-bound proteins, whose composition was thought to reflect the composition of the starch biosynthetic protein complex [30]. These findings, taken together with the proteomic investigations of high-molecular-weight fractions from GPC, suggest that PPDKA and PPDKB might also be assembled into large-molecular-weight protein complexes in rice for starch biosynthesis. It can be speculated that PPDKA and PPDKB may participate in the complex in combination with BEIIb because both of them were remarkably down-regulated in *be2b*, as described above. Further analyses using antibodies against PPDK are required to obtain direct evidence of such protein–protein interactions and to validate the predicted composition of the protein complexes.

#### *3.3. The Essential Role of Protein Phosphorylation and Protein Kinases in Multi-Protein Complexes Formation*

It is now well accepted that the ability of some key enzymes to form physical interactions with other proteins and their catalytic activity is modulated by protein phosphorylation [50,65]. The phosphorylation status of BEs was also confirmed in our study (Figure 2C), and their narrower eluted molecular weight ranges compared with BEI and BEIIb reflected a regulatory mechanism of protein phosphorylation in rice endosperm.

Protein phosphorylation is a reversible process regulated by a series of protein kinases [57]; however, the kinases present in starch biosynthetic enzyme complexes remain to be defined. One of the protein kinases eluted from high-molecular-weight fractions and characterized by LC-MS/MS in the present study was AMPK β<sub>1</sub> subunit-related protein, which contains a dual-specificity protein phosphatase (DSP) domain and an AMPK1\_carbohydrate-binding module (AMPK1\_CBM) domain (Figure 6A). DSPs regulate the activity of their substrates by dephosphorylating threonine/serine and/or tyrosine residues [66]. The AMPK1\_CBM domain showed high similarity to CBM20, CBM48, and CBM53 [67], whose surface revealed a carbohydrate-binding pocket that helps the kinases, as well as AMPK1\_CBM-associated enzymes, bind to starch. AMPK is a ubiquitously expressed, highly conserved heterotrimeric kinase complex with an α (catalytic) subunit and regulatory β and γ subunits in eukaryotic animal cells, which works as a cellular energy sensor in glucose and lipid metabolism [68], while the exact role of AMPK in plants and fungi has not yet been clarified. Another kinase was a CBS domain-containing membrane protein, which also possessed an AMPK1\_CBM domain (Figure 6A). We speculated that this kinase might work as a γ subunit of AMPK in rice because it was composed of three tandem repeats of the CBS domain (the γ subunit contains four CBS domains), which bind AMP, ADP, or ATP in a competitive manner according to the changes in energy to modulate the activity of AMPK [69]. Consistent with our hypothesis, the prediction of functional protein association networks showed that the two identified kinases can interact with each other in the rice endosperm (Figure 6B). Interestingly, both of them were also associated with several starch biosynthetic enzymes (Figure 6B), indicating that they might be essential components of multi-enzyme complexes in starch biosynthesis. It is difficult to verify such associations by conventional methods because the interactions between kinases and starch biosynthetic enzymes are weak and transient, and the investigation might suffer from the low water solubility of the proteins. Therefore, the employment of emerging approaches, such as TurboID-based proximity labeling technology [70], will shed new light on the key roles of regulatory kinases in multi-protein complex formation and starch synthesis.

In conclusion, the evidence presented in this article clearly shows that the components of multi-protein complexes changed among different developmental stages, as well as between BEIIb-active and BEIIb-deficient rice. With the development of endosperm, most of the enzymes tend to form larger protein complexes, except for SSIVb. The converse distribution pattern of SSIVb may be attributed to its vital role in the initiation step of starch synthesis. However, in the *be2b* mutant, BEIIb is not present in any complex, which results in a reduced molecular weight of protein complexes and a considerable decrease in the number of protein–protein interactions. The large and coordinated protein complexes formed in normal rice endosperm coincide with the fact that starch biosynthesis accelerates during seed development. At the late stage, the diffusion rates of enzymes and substrates are expected to decelerate compared with the relatively aqueous environments during the early stages of seed development, so more efficient machinery is necessary. However, the massive changes in the formation of multi-enzyme complexes in *be2b* endosperm hindered efficient starch synthesis and resulted in the production of modified amylopectin with reduced branching frequency and longer DP of branches during seed development [41], since other enzymes could only supplement but could not substitute for the crucial role of BEIIb. Again, our results emphasized the importance of the temporal and spatial coordination of multiple starch biosynthetic enzymes for efficient starch synthesis [29,33]. The loss of any component of the protein complexes at any stage of seed development will potentially endow starches with altered structure. Hence, the identification of novel proteins involved in multi-enzyme complexes will not only be required to gain further insight into the mechanisms responsible for starch synthesis, but will also provide new targets for improving the quality and yield of rice grains.

#### **4. Materials and Methods**

#### *4.1. Plant Materials and Growth Conditions*

*Oryza sativa* subsp. *indica* cv. IR36 (wild type, WT) and a BEIIb-deficient mutant (*be2b*) were used in this study [41]. Both of them were planted at the experimental farm of Zhejiang University, Hangzhou, China, during the summer months under natural conditions. Individual panicles were labeled during flowering and developing seeds were handpicked from fresh plants at 5, 10, and 15 DAF, then immediately frozen in liquid nitrogen and stored at −80 °C. Husks and seed coats were removed before further analysis.

#### *4.2. Protein Extraction*

Total protein and soluble proteins were extracted on ice as previously described [29]. After extraction, samples were centrifuged at 14,000× *g* at 4 °C for 40 min. The supernatant was collected, and the protein concentration was estimated using a NanoDrop 2000 spectrophotometer (Thermo, San Jose, CA, USA) before further analysis.

#### *4.3. Gel Permeation Chromatography*

Soluble protein was filtered through a 0.22 µm syringe filter to remove large particles and injected into a 500 µL sample loop, prior to fractionation by gel permeation chromatography (GPC) using Superdex 200 resin packed in a 10/300 column connected to an ÄKTA™ *prime plus* chromatography system (GE Healthcare, Chicago, IL, USA). The column was routinely calibrated using commercial gel filtration calibration kits from 75 to 669 kDa (GE Healthcare, USA) and equilibrated with 10 mM HEPES-KOH, pH 7.5, 100 mM NaCl, at a flow rate of 0.4 mL min<sup>−1</sup>. Fractions of 0.8 mL were collected when elution volume was 6.6 mL and concentrated 10-fold using an Amicon Ultra 30K centrifugal filter unit (Merck Millipore, Darmstadt, Germany). Concentrated samples were further supplemented with SDS–PAGE sample loading buffer (Beyotime Biotechnology, Haimen, China) following the manufacturer's instructions before SDS-PAGE and Western blotting.
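The collection scheme above can be turned into elution windows with simple arithmetic. The following is a minimal sketch, assuming that fraction 1 starts exactly at an elution volume of 6.6 mL and that fractions are numbered consecutively without gaps; the function names are hypothetical and are not part of the published method.

```python
# Values taken from the text; the numbering convention is an assumption.
FLOW_RATE_ML_PER_MIN = 0.4   # column flow rate
FRACTION_VOLUME_ML = 0.8     # volume of each collected fraction
COLLECTION_START_ML = 6.6    # elution volume at which collection begins


def fraction_window(n):
    """Return (start_mL, end_mL, start_min, end_min) for 1-based fraction n."""
    if n < 1:
        raise ValueError("fraction numbers start at 1")
    start_ml = COLLECTION_START_ML + FRACTION_VOLUME_ML * (n - 1)
    end_ml = start_ml + FRACTION_VOLUME_ML
    start_min = start_ml / FLOW_RATE_ML_PER_MIN
    end_min = end_ml / FLOW_RATE_ML_PER_MIN
    # Round to suppress floating-point noise in the reported values.
    return tuple(round(v, 2) for v in (start_ml, end_ml, start_min, end_min))


def pooled_range(first, last):
    """Return the elution-volume span (mL) covered by fractions first..last."""
    return fraction_window(first)[0], fraction_window(last)[1]


if __name__ == "__main__":
    print(fraction_window(1))
    print(pooled_range(1, 5))    # fractions pooled as high-molecular-weight
    print(pooled_range(9, 12))   # fractions described as monomeric
```

Under these assumptions each fraction corresponds to 2 min of run time, and the pooled high-molecular-weight fractions (1–5) would span 4.0 mL of elution volume.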

#### *4.4. Co-Immunoprecipitation*

Co-immunoprecipitation experiments were conducted as described in Crofts, et al. [29] with some modifications. Soluble proteins, extracted as described above, were incubated with isozyme-specific antibodies and protein A magnetic beads (New England Biolabs, Ipswich, MA, USA). After extensive washing with phosphate-buffered saline (PBS) (137 mM NaCl, 10 mM Na<sub>2</sub>HPO<sub>4</sub>, 2.7 mM KCl, and 1.8 mM KH<sub>2</sub>PO<sub>4</sub> at pH 7.4), bound proteins were released by boiling in 1× sodium dodecyl sulfate (SDS) buffer (Beyotime Biotechnology, China) and 5 µL of each supernatant was analyzed by Western blotting.

#### *4.5. Western Blotting*

Proteins were resolved by 8% SDS-PAGE (SDS-polyacrylamide gel electrophoresis) and transferred onto polyvinylidene fluoride (PVDF) membranes using a transblotter. The Western blotting procedure was carried out according to the method of Crofts, et al. [71]. Anti-rice GBSSI, SSI [72], SSIIa, SSIIIa [71], SSIVb [29], BEI, BEIIb [73], PUL [74], and Pho1 [16] antibodies were kindly gifted by Prof. Naoko Fujita (Akita Prefectural University, Akita, Japan). Anti-rice BEIIa (the polypeptide CAGAPGKVLVPG) and ISA1 (the polypeptide CEPLVDTGKPAPYD) [29] antibodies were produced by a commercial supplier (HuaAn Biotechnology Co., Ltd., Hangzhou, China). Site-specific phosphopeptide antibodies (BEI-Ser562, BEIIb-Ser685) were produced as previously described [50]. Anti-beta-actin antibody was purchased from Sigma-Aldrich (St. Louis, MO, USA). Western blot results were quantified using the ImageJ software [75].

#### *4.6. Protein Preparation, Digestion, LC-MS/MS, and Data Analysis*

High-molecular-weight fractions (fractions 1–5, >400 kDa) of 300 µL separated by GPC were mixed together and concentrated 30-fold using an Amicon Ultra 30K centrifugal filter unit (Merck Millipore, Germany). FASP digestion and LC-MS/MS analysis were performed according to the method described by Pang, et al. [50]. Raw data were analyzed with Proteome Discoverer (version 2.4) and were compared with the rice database. Searches were performed using a fragment tolerance of 0.10 Da, and a parent tolerance of 20 ppm, with carbamidomethyl of cysteine as a fixed and oxidation of methionine as variable modifications. Trypsin/P was specified as the enzyme, with maximum missed cleavages allowed of up to 2. Protein identifications were accepted if they achieved a minimum of 1 peptide per protein and a false discovery rate (FDR) of <1%.
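To make the two mass tolerances concrete, the sketch below shows how a relative tolerance (20 ppm for the parent ion) and an absolute tolerance (0.10 Da for fragments) translate into acceptance windows. It is an illustration only, not a description of how Proteome Discoverer implements its search; the function names and the example m/z values are hypothetical.

```python
PARENT_TOLERANCE_PPM = 20.0    # relative tolerance stated in the text
FRAGMENT_TOLERANCE_DA = 0.10   # absolute tolerance stated in the text


def ppm_window(theoretical_mz, ppm=PARENT_TOLERANCE_PPM):
    """Return the (low, high) m/z window allowed by a ppm tolerance."""
    delta = theoretical_mz * ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


def parent_matches(observed_mz, theoretical_mz, ppm=PARENT_TOLERANCE_PPM):
    """True if the observed parent m/z lies within the ppm tolerance."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= ppm


def fragment_matches(observed_mz, theoretical_mz, tol_da=FRAGMENT_TOLERANCE_DA):
    """True if the observed fragment m/z lies within the absolute tolerance."""
    return abs(observed_mz - theoretical_mz) <= tol_da


if __name__ == "__main__":
    # Hypothetical values: at m/z 1000, 20 ppm corresponds to +/- 0.02.
    print(ppm_window(1000.0))
    print(parent_matches(1000.015, 1000.0))   # 15 ppm error
    print(parent_matches(1000.025, 1000.0))   # 25 ppm error
    print(fragment_matches(500.08, 500.00))
```

Because the parent tolerance is relative, the permitted window widens with m/z, whereas the fragment window stays fixed at ±0.10 Da.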

#### *4.7. Statistical and Bioinformatic Analyses*

For differentially expressed proteins (DEPs) analysis, quantified protein abundances were integrated and then normalized by the DEP (Differential Enrichment analysis of Proteomics data) package (version 1.8.0) in RStudio (version 4.1.2) as the input for statistical analysis. Candidates that met the following criteria were selected: fold-change > 3, *p* < 0.05, and peptide-spectrum match (PSM) > 2. The least significant difference (LSD) multiple range test was conducted for comparison of the mean of samples at *p* < 0.05.
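The selection criteria can be expressed as a simple filter. The sketch below is written in Python for illustration and does not reproduce the R DEP package workflow used in the study. It assumes that the fold-change threshold applies symmetrically to up- and down-regulation, which the text does not state explicitly; the protein names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProteinStat:
    name: str            # hypothetical identifier
    fold_change: float   # abundance ratio between two groups (must be > 0)
    p_value: float       # p-value of the comparison
    psm: int             # number of peptide-spectrum matches


def is_candidate(p, min_fold=3.0, alpha=0.05, min_psm=2):
    """Apply the three thresholds: fold-change > 3, p < 0.05, PSM > 2."""
    if p.fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    # Assumption: a 3-fold decrease counts the same as a 3-fold increase.
    magnitude = max(p.fold_change, 1.0 / p.fold_change)
    return magnitude > min_fold and p.p_value < alpha and p.psm > min_psm


# Hypothetical example records, not data from the study.
proteins = [
    ProteinStat("protein_A", 4.2, 0.010, 5),    # passes all three criteria
    ProteinStat("protein_B", 0.25, 0.030, 3),   # 4-fold decrease, passes
    ProteinStat("protein_C", 3.5, 0.200, 6),    # fails the p-value criterion
    ProteinStat("protein_D", 5.0, 0.001, 2),    # fails PSM (needs > 2)
]

candidates = [p.name for p in proteins if is_candidate(p)]

if __name__ == "__main__":
    print(candidates)
```

All three thresholds are strict inequalities, so a protein with exactly two PSMs or exactly a 3-fold change is excluded.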

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms231810714/s1.

**Author Contributions:** J.B. and Y.Y. conceived the original research plans; J.B. and F.X. supervised the experiments; Y.Y., Z.Z. and P.T. performed the experiments; Y.Y. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work was financially supported by the National Natural Science Foundation of China (32201817) and the Zhejiang Provincial Natural Science Foundation (Grant No. LZ21C130003).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data supporting the findings of this study are available within the article and its supplementary materials.

**Acknowledgments:** We sincerely thank Naoko Fujita, Akita Prefectural University, Akita, Japan, for kindly providing some antibodies used in this study.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


## *Article* **Chromosome-Level Genome Assembly of a Fragrant** *Japonica* **Rice Cultivar 'Changxianggeng 1813' Provides Insights into Genomic Variations between Fragrant and Non-Fragrant** *Japonica* **Rice**

**Ruisen Lu <sup>1</sup> , Jia Liu <sup>1</sup> , Xuegang Wang <sup>2</sup> , Zhao Song <sup>3</sup> , Xiangdong Ji <sup>2</sup> , Naiwei Li 1,\*, Gang Ma <sup>2</sup> and Xiaoqin Sun 1,\***


**Abstract:** East Asia has an abundant resource of fragrant *japonica* rice that is gaining increasing interest among both consumers and producers. However, genomic resources, and in particular complete genome sequences, currently available for the breeding of fragrant *japonica* rice are still scarce. Here, integrating Nanopore long-read sequencing, Illumina short-read sequencing, and Hi-C methods, we presented a high-quality chromosome-level genome assembly (~378.78 Mb) for a new fragrant *japonica* cultivar 'Changxianggeng 1813', with 31,671 predicted protein-coding genes. Based on the annotated genome sequence, we demonstrated that it was the *badh2-E2* type of deletion (a 7-bp deletion in the second exon) that caused fragrance in 'Changxianggeng 1813'. Comparative genomic analyses revealed that multiple gene families involved in the abiotic stress response were expanded in the 'Changxianggeng 1813' genome, which further supported the previous finding that no generalized loss of abiotic stress tolerance is associated with the fragrance phenotype. Although the 'Changxianggeng 1813' genome showed high genomic synteny with the genome of the non-fragrant *japonica* rice cultivar Nipponbare, a total of 289,970 single nucleotide polymorphisms (SNPs), 96,093 small insertion-deletion polymorphisms (InDels), and 8690 large structure variants (SVs, >1000 bp) were identified between them. Together, these genomic resources will be valuable for elucidating the mechanisms underlying economically important traits and have wide-ranging implications for genomics-assisted breeding in fragrant *japonica* rice.

**Keywords:** *BADH2*; 'Changxianggeng 1813'; fragrant rice; genome assembly; genomic variations; *japonica* cultivar

### **1. Introduction**

Fragrant rice (*Oryza sativa* L.), well-known for its pleasant and subtle aroma, is widely preferred among rice consumers and fetches a higher price than non-fragrant rice in both domestic and international markets [1,2]. At present, Basmati rice from India and Pakistan and Jasmine rice from Thailand are the two most popular fragrant rice cultivars in the world [3,4]. It is, however, noteworthy that both of these two fragrant rice cultivars belong to the *indica* subspecies, with fluffy and dry cooked rice, while consumers from East Asia, including China, Japan, and Korea tend to prefer *japonica* rice that becomes sticky and soft when cooked [4]. Although East Asia has diverse and rich germplasm resources of fragrant *japonica* rice, none of them have been fully commercially utilized [5]. Thus, breeding and cultivation of fragrant *japonica* rice has become one of the most important jobs in modern rice breeding projects, especially in East Asia [6].

Hundreds of volatile compounds have been detected in fragrant rice, but the key compound responsible for the characteristic fragrance is 2-acetyl-1-pyrroline (2AP) [2,7].

**Citation:** Lu, R.; Liu, J.; Wang, X.; Song, Z.; Ji, X.; Li, N.; Ma, G.; Sun, X. Chromosome-Level Genome Assembly of a Fragrant *Japonica* Rice Cultivar 'Changxianggeng 1813' Provides Insights into Genomic Variations between Fragrant and Non-Fragrant *Japonica* Rice. *Int. J. Mol. Sci.* **2022**, *23*, 9705. https://doi.org/10.3390/ijms23179705

Academic Editors: Jinsong Bao and Jianhong Xu

Received: 1 August 2022 Accepted: 24 August 2022 Published: 26 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Investigations into the genetic basis of rice fragrance have demonstrated that the fragrance phenotype is largely controlled by a recessive *betaine aldehyde dehydrogenase 2* (*BADH2*) gene, which comprises 15 exons and 14 introns and is approximately 7 kilobase pairs in length [7,8]. The dominant *BADH2* gene encoding the active BADH2 catalyzes the oxidation of γ-aminobutyraldehyde (AB-ald, a 2AP precursor), while the recessive *BADH2* gene encoding the inactive BADH2 results in the accumulation of both AB-ald and its cyclic form Δ<sup>1</sup>-pyrroline, which are finally converted to 2AP by acetylation through enzymatic or non-enzymatic reactions [7,9–11]. To date, multiple types of loss-of-function mutations in the *BADH2* gene responsible for rice fragrance have been reported, e.g., an 8-bp deletion and three single nucleotide polymorphisms (SNPs) in the seventh exon (designated as *badh2-E7* or *badh2.1*), a 7-bp deletion in the second exon (*badh2-E2* or *badh2.2*), and an 803-bp deletion between the fourth and fifth exons (*badh2-E4/5*) [6,7,12–14]. Based on the above information, functional molecular markers have also been developed for various SNPs and small insertion-deletion polymorphisms (InDels) on different exons of *BADH2*, improving the efficiency of selection and breeding of fragrant rice, e.g., [6,13]. However, these molecular markers were usually developed based on old conventional fragrant rice varieties, most of which have relatively low yields and demonstrate inferior agronomic performance, such as weak disease resistance and low tolerance to climatic stresses [3]. Thus, excluding inferior agronomic traits has become a major challenge during introgression of fragrance alleles from old conventional fragrant rice cultivars into modern rice cultivars [3].

The new fragrant *japonica* rice cultivar 'Changxianggeng 1813' (2AP content: ~310 µg/kg, unpublished data), derived from a cross between '93-63/wuyungeng 20' and 'wuyungeng 31', was developed by the Changshu Institute of Agricultural Sciences (Changshu, Jiangsu, China) and licensed for release in Jiangsu Province, China in 2020 [15]. In contrast to old conventional fragrant *japonica* rice cultivars, 'Changxianggeng 1813' shows high resistance to lodging and blast, with both high yield and good quality [15]. It is therefore not only suitable for being widely planted in the South Yangtze River regions, but could also be used as a parental line to develop new fragrant *japonica* rice cultivars. Therefore, the construction of a high-quality genome of 'Changxianggeng 1813' is essential for further improvement of this cultivar or its progenies, as well as for accelerating the process of fragrant *japonica* rice breeding, by providing genomic resources that could be directly applied to fragrant *japonica* rice cultivars.

With the rapid progress in next-generation sequencing technologies, unprecedented amounts of genomic data for wild and cultivated rice are currently available, providing important resources for investigation of the genetic basis behind rice domestication and improvement [16–29]. Within cultivated rice, however, genome assemblies for most cultivars were based on short-read sequencing data, which often showed higher levels of incompleteness than those generated from long-read sequences, e.g., [16,30–32]. Moreover, the information from highly polymorphic regions, especially for large structural variations (SVs), would often be inevitably lost by direct mapping of short sequencing reads onto a single reference genome (typically, *O. sativa japonica* Nipponbare) [23,33]. Thus, high-quality, chromosome-level genome assemblies for different rice cultivars are still needed to comprehensively capture the genomic variations in rice.

In this study, we generated a high-quality, chromosome-level genome sequence of the fragrant *japonica* rice cultivar 'Changxianggeng 1813', based on Oxford Nanopore, Illumina, and Hi-C sequencing technologies. Then, we aligned the *BADH2* gene in 'Changxianggeng 1813' to previously described *BADH2* haplotypes to verify the presence/absence of the mutations associated with fragrance and determine their phylogenetic relationships. We also carried out comparative genomic analyses to provide insights into the evolution and adaptation of this cultivar. Finally, we performed a pairwise genome comparison between the fragrant *japonica* cultivar 'Changxianggeng 1813' and the non-fragrant *japonica* cultivar Nipponbare to identify genomic variations (SNPs, InDels, SVs). Of note, this is the first high-quality de novo assembly genome sequence for fragrant *japonica* rice published to date, and is expected to have a lasting direct impact on molecular breeding and improvement of fragrant *japonica* rice.

#### **2. Results and Discussion**

#### *2.1. Genome Sequencing and De Novo Assembly*

With the rapid development of genome sequencing methods, long-read sequencing technologies such as Oxford Nanopore Technology and Pacific Biosciences combined with Illumina short-read sequencing and chromosome conformation capture (Hi-C) technologies have become a common standard protocol to generate high-quality assemblies of plant genomes [34–36]. In this study, the genome of 'Changxianggeng 1813' was sequenced and de novo assembled by a hybrid strategy combining Oxford Nanopore, Illumina, and Hi-C technologies. A total of ~51.59 Gb Nanopore long reads, ~28.21 Gb Illumina short reads, and ~41.45 Gb Hi-C reads were generated, respectively, after filtering (Table S1). Using *k*-mer analysis with Illumina clean reads, the genome size of 'Changxianggeng 1813' was estimated to be approximately 394.39 Mb, with a heterozygosity rate of 0.08% (Table S2).
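As a back-of-the-envelope check on the data volumes above, mean sequencing depth can be approximated as total bases divided by estimated genome size. The depths computed in this sketch are derived here from the reported figures and are not values stated in the article; the function name is hypothetical.

```python
ESTIMATED_GENOME_MB = 394.39   # k-mer based genome size estimate (Mb)
ASSEMBLED_GENOME_MB = 378.78   # total assembly length reported (Mb)

# Clean data volumes reported in the text, in Gb.
DATA_GB = {
    "Nanopore": 51.59,
    "Illumina": 28.21,
    "Hi-C": 41.45,
}


def mean_depth(total_bases_gb, genome_size_mb):
    """Approximate mean depth (x) as total bases / genome size."""
    return total_bases_gb * 1000.0 / genome_size_mb   # convert Gb to Mb


depths = {name: mean_depth(gb, ESTIMATED_GENOME_MB) for name, gb in DATA_GB.items()}

# Fraction of the estimated genome size represented in the assembly.
assembled_fraction = ASSEMBLED_GENOME_MB / ESTIMATED_GENOME_MB

if __name__ == "__main__":
    for name, depth in depths.items():
        print(f"{name}: ~{depth:.1f}x")
    print(f"assembly / k-mer estimate: {assembled_fraction:.1%}")
```

This simple ratio ignores read-length distribution and mapping rate, so it should be read as a rough upper bound on usable coverage.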

The 'Changxianggeng 1813' genome was preliminarily assembled based on Nanopore long reads, followed by two rounds of assembly correction using both Nanopore and Illumina sequencing data, which produced an assembled genome (scaffold level) with a total length of ~378.78 Mb, a GC content of 43.55%, and a notably long scaffold N50 of 29.83 Mb (Table 1). Although the scaffolds were already very long, Hi-C data were employed to further improve assembly contiguity and obtain a high-quality reference genome of 'Changxianggeng 1813'. Approximately 62.20 million valid interaction pairs (~18.65 Gb Hi-C data), accounting for 82.55% of the uniquely mapped read pairs, were used for the Hi-C assembly. Consequently, all ~378.78 Mb (100%) of sequence in 20 scaffolds was anchored and oriented onto 12 chromosomes by agglomerative hierarchical clustering, with chromosome lengths ranging from 22.66 to 43.60 Mb (Figure 1a,b; Table S3). The 12 chromosomes could be clearly distinguished, and the near-diagonal interaction signals were considerably stronger than those at other positions within each chromosome, which indicated that Hi-C scaffolding was reliable and robust (Figure 1a).
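Because the scaffold N50 is central to the contiguity claims above, a minimal sketch of how the statistic is commonly computed may be useful. The function name and the toy lengths are illustrative assumptions and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")


# Toy example (hypothetical scaffold lengths, arbitrary units)
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_lengths)
```

In the toy example the total is 30, and the two longest sequences (10 + 8) are the first to reach half of it, so the N50 is 8.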


**Table 1.** Statistics of the genome assembly of 'Changxianggeng 1813'.

**Figure 1.** Basic characteristics of the 'Changxianggeng 1813' genome. (**a**) Genome-wide Hi-C heat map of the 'Changxianggeng 1813' genome showing chromatin interactions among the 12 chromosomes. Darker red color indicates higher contact probability. The blue boxes show the location of the chromosomes. (**b**) Circos plot of the multidimensional topography of the 12 chromosomes in the 'Changxianggeng 1813' genome. Concentric circles, from outermost to innermost, show (i) the chromosome, (ii) gene density, (iii) percentage of repeats, and (iv) GC content. The three metrics were calculated in 500 kb sliding windows. In the innermost circle, each line shows the syntenic relationship between different chromosomes, indicating the existence of large episodic duplications derived from the ancient whole-genome duplication in rice.

The accuracy and completeness of the genome assembly were first assessed by mapping the Illumina reads back to the reference genome, which revealed a mapping efficiency of 99.11% (Table 1). Furthermore, 1552 (96.16%) of 1614 conserved BUSCO (Benchmarking Universal Single-Copy Orthologs) genes, including 1514 (93.8%) complete and single-copy BUSCOs and 38 (2.4%) complete and duplicated BUSCOs (Tables 1 and S4), were completely recovered in our assembly. Taken together, these results indicated that the genome assembly of 'Changxianggeng 1813' was accurate and highly complete. In summary, the assembled genome of 'Changxianggeng 1813' was at the chromosomal level, with a longer scaffold N50 length than in most de novo assemblies of *Oryza* genomes, e.g., [16,31,37,38], which provides a high-quality, high-resolution resource for associating traits of interest with genetic variations and identifying the genes controlling important economic traits in fragrant *japonica* rice.

#### *2.2. Genome Annotation*

Repetitive sequences constitute large proportions of plant genomes and often play key roles in plant genome evolution due to their roles in both genome size variation and functional adaptation [39,40]. Using a combination of homology-based and de novo approaches, about 50.52% of the 'Changxianggeng 1813' genome was identified as transposable elements (TEs; Table S5). Of these TEs, DNA transposons were the most abundant, occupying 24.47% of the genome, followed by long terminal repeats (LTRs; 24.15%), while long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs) accounted for 1.78% and 0.12%, respectively (Table S5). Additionally, approximately 0.97% of the 'Changxianggeng 1813' genome was identified as tandem repeats (Table S5). Indeed, among various types of repetitive sequences, LTRs are one of the most important contributors to the genome size variation across the *Oryza* genus [41,42]. It was thus speculated that the smaller genome size of 'Changxianggeng 1813' (~379 Mb), nearly half of that of *O. granulata* (~777 Mb), is largely due to the difference in the proportion of LTRs between them (24.15% for 'Changxianggeng 1813' and 59.33% for *O. granulata*) [24].

A total of 32,165 protein-coding genes were predicted by integrating protein-based homology, de novo and transcriptome-based prediction approaches, with average gene and coding sequence lengths of 4244 and 1224 bp, respectively, and an average of 4.62 exons per gene (Table 2). Among these protein-coding genes, 98.46% (31,671) could be annotated by at least one of the six functional databases employed, including Uniprot, Pfam, GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes), and NR (Non-redundant) (Table S6). In addition, 3524 RNAs were identified as potential noncoding RNAs, including 1756 microRNAs (miRNAs), 715 transfer RNAs (tRNAs), 322 ribosomal RNAs (rRNAs), and 731 small nuclear RNAs (snRNAs) (Table S7).

**Table 2.** Prediction of protein-coding gene models in the 'Changxianggeng 1813' genome.


#### *2.3. Characterization and Evolutionary Analysis of BADH2 Gene*

Since 'Changxianggeng 1813' has been identified to be a fragrant rice (2AP content: ~310 µg/kg, unpublished data), we checked whether it indeed carries a recessive *BADH2* gene and investigated its allelic variation. By comparing to the non-fragrant rice cultivar Nipponbare, a 7 bp deletion (5′-CGGGCGC-3′) in the second exon was observed at the *BADH2* allele (*badh2-E2*), which generated a premature stop codon that disabled the BADH2 enzyme (Figure S1a), thereby promoting the accumulation of 2AP in 'Changxianggeng 1813'. The *badh2-E2* allele carried by 'Changxianggeng 1813' was consistent with that in a number of Chinese fragrant *japonica* rice cultivars, e.g., 'Wuxiang 9915', 'Xiangjing 111', 'Zhenxiangjing 5', suggesting that this allele is common by descent in Chinese fragrant rice cultivars [8,13]. Phylogenetic analysis of *BADH2* haplotype data (Figure S1b) further showed that the *badh2-E2* allele in 'Changxianggeng 1813' clustered together with a previously identified haplotype sequence endemic to two cultivated *japonica* rice (see [2] for full details). Taken together, these findings provided additional support for the previous studies indicating that *badh2-E2* may have arisen and become fixed in the *japonica* gene pool [7,8,13]. However, it is worth noting that the sample size analyzed to date is still inadequate to comprehensively detect the origin and evolution of *badh2-E2* in fragrant rice.
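A 7 bp deletion is not a multiple of three, so it shifts the reading frame and can bring a stop codon into frame earlier than in the intact gene. The sketch below illustrates this mechanism on a short toy sequence. The sequence `toy_cds` is hypothetical and is not the real *BADH2* exon; only the deleted motif CGGGCGC is taken from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Hypothetical toy coding sequence (NOT the real BADH2 sequence):
# ATG GCA CGG GCG CTG ACC GAT TAA
toy_cds = "ATGGCACGGGCGCTGACCGATTAA"

# Remove the 7 bp motif once, mimicking the deletion described in the text.
toy_mutant = toy_cds.replace("CGGGCGC", "", 1)

wild_type_stop = first_stop_codon(toy_cds)    # stop at the last codon
mutant_stop = first_stop_codon(toy_mutant)    # stop appears much earlier
```

In this toy case the intact sequence reads through to its final codon, whereas the deletion places a TGA in frame directly after the second codon.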

#### *2.4. Genome Synteny*

Comparisons of genome synteny within and between species have provided a framework to reveal evolutionary processes that lead to diversity of genome structure and function in many lineages [43]. Nowadays, genome synteny analysis has become an integral part of comparative genomics for almost every newly published genome. Using the MCScan toolkit, a total of 19,912 and 24,975 gene pairs were identified in the intergenomic comparisons of the fragrant cultivar 'Changxianggeng 1813' vs. the non-fragrant cultivar Nipponbare (Figure 2a), and 'Changxianggeng 1813' vs. the common wild rice *O. rufipogon* (Figure 2b), respectively. In general, extremely high degrees of collinearity were observed in these two comparisons; each chromosome of 'Changxianggeng 1813' corresponded to one chromosome of Nipponbare and *O. rufipogon*, respectively, although some interchromosomal rearrangement events were detected (Figure 2c). It was also found that there were fewer scattered points in the comparison of 'Changxianggeng 1813' vs. Nipponbare than in 'Changxianggeng 1813' vs. *O. rufipogon* (Figure 2a,b), suggesting a close relationship between 'Changxianggeng 1813' and Nipponbare.

**Figure 2.** Chromosome synteny between the fragrant *japonica* cultivar 'Changxianggeng 1813' and its close relatives, i.e., the non-fragrant *japonica* cultivar Nipponbare and the common wild rice *O. rufipogon*. (**a**,**b**) Syntenic dot plots for intergenomic comparisons of (**a**) 'Changxianggeng 1813' vs. Nipponbare, and (**b**) 'Changxianggeng 1813' vs. *O. rufipogon*. (**c**) Macrosyntenic relationship pattern between 'Changxianggeng 1813' and its two close relatives (Nipponbare and *O. rufipogon*).

#### *2.5. Gene Family Evolution and Phylogenetic Relationships*

Of the 32,165 protein-coding genes identified in the 'Changxianggeng 1813' genome, 11,413 were classified as single-copy orthologs, 9531 as multiple-copy orthologs, 2899 as unique paralogs, and 17,837 as other paralogs (Figure 3a). All the 32,165 protein-coding genes were clustered into 26,658 gene families, of which 1576 (5.91%) were unique in the 'Changxianggeng 1813' genome (Table S8). A total of 7658 single-copy orthologous genes shared among the six *Oryza* genomes were identified and used for phylogenetic analysis. Phylogenetic analysis strongly supported that the fragrant cultivar 'Changxianggeng 1813' and the non-fragrant cultivar Nipponbare, both of which belong to the *japonica* subspecies, were sister to each other, and jointly sister to the common wild rice *O. rufipogon* (Figure 3b). The divergence time of 'Changxianggeng 1813' and Nipponbare was estimated to be 0.5 (0.4–0.6) million years ago (Ma; Figure 3b), unambiguously older than the date of domestication of rice (10,000 years ago). One possible explanation for this is that the divergence between these two cultivars from two different subpopulations (temperate *japonica* and *aromatic*) is in part due to differentiation of their ancestral populations in different locations and/or at different times. Furthermore, although our estimated divergence time is slightly older than the date for *japonica* and *indica* (about 0.44 Ma), this estimate conformed generally with the previous findings that (i) genomic variation in rice is deeply partitioned and that divergent haplotypes can be readily associated with major varietal groups and subpopulations, and (ii) rice domestication proceeded from multiple predifferentiated ancestral pools much earlier than the beginning of agriculture in Asia [37,44].

**Figure 3.** (**a**) Comparison of copy numbers in gene clusters residing in the genomes of 'Changxianggeng 1813' and five other members of *Oryza*. (**b**) Phylogenetic tree inferred from single-copy orthogroups. Numbers near each node refer to divergence times (in million years ago, Ma). Bootstrap values are all 100. Numbers marked in green and red represent gene family expansions and contractions, respectively. (**c**) Visualization of results from GO enrichment analysis of significantly expanded gene families in 'Changxianggeng 1813'. The top 20 GO terms were selected for display after using the Benjamini–Hochberg multiple test correction for *p*-value adjustment (adjusted *p*-value < 0.01).

Gene family expansion and contraction are generally considered important evolutionary mechanisms that contribute to evolutionary adaptation to the environment [45,46]. To reveal gene family expansion and contraction related to environmental stress in 'Changxianggeng 1813', we undertook a computational analysis of gene family sizes among different members of *Oryza*. Our results indicated that 896 gene families in the 'Changxianggeng 1813' genome underwent expansion, while 1467 gene families underwent contraction (Figure 3b). Functional enrichment analysis of expanded gene families revealed 25 GO terms that were significantly enriched (*p*.adjust < 0.01). The expanded gene families were mainly enriched in genes associated with RNA-DNA hybrid ribonuclease activity (GO:0004523, *p*.adjust = 2.6 × 10<sup>−30</sup>), hydrogen peroxide catabolic process (GO:0042744, *p*.adjust = 1.71 × 10<sup>−37</sup>), peroxidase activity (GO:0004601, *p*.adjust = 5.05 × 10<sup>−35</sup>), and response to oxidative stress (GO:0006979, *p*.adjust = 8.24 × 10<sup>−32</sup>) (Figure 3c). It needs to be emphasized here that oxidative stress is regarded as a major damaging factor in plants exposed to a variety of abiotic stresses [47]. Thus, these expanded oxidative stress response genes may have a role in conferring enhanced stress tolerance to 'Changxianggeng 1813' during periods of rapid climate change. This result also supported the previous findings that *BADH2* does not play a role in abiotic stress tolerance in rice, and that there is no generalized loss of abiotic stress tolerance associated with the fragrance phenotype [48].
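The adjusted *p*-values reported here were obtained with the Benjamini–Hochberg correction (see the Figure 3 caption). As a generic illustration of that procedure, and not a reproduction of the enrichment software actually used, a minimal sketch follows; the function name and the toy *p*-values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example (hypothetical raw p-values)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Terms would then be retained when their adjusted value falls below the chosen threshold (0.01 in this study).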

#### *2.6. Genomic Variations between 'Changxianggeng 1813' and Nipponbare*

Since large-scale genome sequencing has been undertaken in rice, a substantial number of genetic variations, such as single nucleotide polymorphisms (SNPs) and small insertion-deletion polymorphisms (InDels), have become available across the rice genome, e.g., [26,49]. However, few recent studies have concentrated on fragrant *japonica* rice, resulting in a severe lack of knowledge about valuable fragrant *japonica* rice, especially in East Asia. Although the genome assembly of the fragrant *japonica* cultivar 'Changxianggeng 1813' very closely matched the genome of the non-fragrant *japonica* cultivar Nipponbare (Figure 2a,c), a total of 289,970 SNPs and 96,093 InDels were identified in the 'Changxianggeng 1813' genome when compared to the Nipponbare genome, with an average density of 0.76 SNPs and 0.25 InDels per kb, respectively (Figure 4; Tables S9 and S10). The number of SNPs and InDels per 1 Mb varied considerably among chromosomes. In particular, chromosome 9 had the highest density of both SNPs (208.3 Mb<sup>−1</sup>) and InDels (49.0 Mb<sup>−1</sup>), while chromosome 4 had the lowest SNP (19.6 Mb<sup>−1</sup>) and InDel (1.2 Mb<sup>−1</sup>) densities (Figure 4a,b; Tables S9 and S10). The distribution of SNPs and InDels was also uneven within a chromosome. For example, on chromosome 1, SNPs and InDels were dense from 11.9 to 12.7 Mb, but sparse in the regions of 9.8–10.5 and 17.6–20.0 Mb (Figure 4a,b). The distributions of SNPs and InDels were positively correlated, and both were more abundant in intergenic spacer (IGS) regions. More specifically, about 67.12% (13,142/19,580, chromosome 10) to 80.07% (4728/5905, chromosome 4) of SNPs and 67.62% (6233/9217, chromosome 10) to 79.20% (2856/3606, chromosome 4) of InDels were located in the IGS regions (Figure 4c,d; Tables S9 and S10). The distributions of the SNPs and InDels within genic regions were also examined, which indicated that most of them were located in introns (SNPs: 57.36% on chromosome 8 to 70.20% on chromosome 11; InDels: 61.53% on chromosome 10 to 76.89% on chromosome 12), while 5′ UTRs, 3′ UTRs, and CDS contained only a small fraction (Figure 4c,d; Tables S9 and S10). The information described here can be exploited in future studies to provide novel perspectives on genetics and breeding of fragrant *japonica* rice.
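Per-window variant densities of the kind described above can be obtained by binning variant positions along each chromosome. The following is a minimal sketch under stated assumptions: positions are 1-based, windows are non-overlapping, and the function names and toy coordinates are hypothetical rather than taken from the study.

```python
from collections import Counter


def window_counts(positions, window_size=1_000_000):
    """Count variants per non-overlapping window.

    positions: iterable of 1-based coordinates on one chromosome.
    Returns a dict mapping each 0-based window start to its variant count.
    """
    counts = Counter((pos - 1) // window_size * window_size for pos in positions)
    return dict(sorted(counts.items()))


def density_per_kb(variant_count, region_length_bp):
    """Variants per kilobase over a region of the given length."""
    return variant_count * 1000 / region_length_bp


# Toy example (hypothetical SNP coordinates)
toy_counts = window_counts([1, 500_000, 1_000_000, 1_000_001, 2_500_000])
toy_density = density_per_kb(500, 1_000_000)
```

Plotting the resulting counts along each chromosome gives a profile comparable in spirit to Figure 4a,b.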

**Figure 4.** (**a**,**b**) Distribution patterns of SNPs (**a**) and InDels (**b**) across the 'Changxianggeng 1813' genome by comparing to the Nipponbare genome. (**c**,**d**) The distribution of (**c**) SNPs and (**d**) InDels in different genomic regions, including intergenic spacer regions (IGS), 5′ untranslated regions (UTR), 3′ UTR, intron and protein coding regions (CDS).

It is also noteworthy that SNPs and small InDels do not capture all the meaningful genomic variations that underlie crop improvement, and that structural variants (SVs) also play an important role in plant evolution and agriculture [50,51]. SVs are typically defined as genomic variations that involve segments of DNA larger than 1 kb in length; hence, detecting SVs with short-read sequencing is challenging, leaving the vast majority of SVs poorly resolved in rice [26,33]. The recent development of high-throughput Oxford Nanopore long-read sequencing has enabled a broad survey of previously hidden SVs in rice genomes [16]. In this study, establishing a high-quality de novo genome assembly for 'Changxianggeng 1813' allowed us to resolve large SVs between fragrant and non-fragrant *japonica* rice. A total of 8690 large SVs were identified between the genomes of 'Changxianggeng 1813' and Nipponbare through direct genome comparison (Figure 5a, Table S11). Of these SVs, the dominant type was DUP (inserted duplication), accounting for 81.51% (7083/8690) of all identified SVs, followed by BRK (other inserted sequence) (11.09%, 964/8690) and GAP (gap between two mutually consistent alignments) (4.10%, 356/8690), while JMP (rearrangement) (1.31%, 114/8690), SEQ (rearrangement with another sequence) (1.13%, 98/8690), and INV (rearrangement with inversion) (0.86%, 75/8690) were least abundant (Figure 5a; Table S11).
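As a quick arithmetic check on the SV breakdown, the per-type counts quoted in this paragraph can be summed and converted to percentages. The short sketch below uses only the counts given in the text; the variable names are illustrative.

```python
# Per-type SV counts as quoted in the paragraph above
sv_counts = {
    "DUP": 7083,
    "BRK": 964,
    "GAP": 356,
    "JMP": 114,
    "SEQ": 98,
    "INV": 75,
}

total_svs = sum(sv_counts.values())

# Share of each SV type, as a percentage rounded to two decimals
sv_shares = {
    sv_type: round(100 * count / total_svs, 2)
    for sv_type, count in sv_counts.items()
}
```

The six counts add up to the stated total of 8690, and the percentages computed against that total agree with those reported (81.51%, 11.09%, 4.10%, 1.31%, 1.13%, and 0.86%).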

**Figure 5.** (**a**) SV types and numbers across 12 chromosomes of the 'Changxianggeng 1813' genome. (**b**) Total counts of SVs overlapping genes for each chromosome in the 'Changxianggeng 1813' genome. GAP, gap between two mutually consistent alignments; DUP, inserted duplication; BRK, other inserted sequence; JMP, rearrangement; INV, rearrangement with inversion; SEQ, rearrangement with another sequence.

The total number of SVs detected also varied across different chromosomes. To be specific, the highest number of SVs (Total: 1718; GAP: 17, DUP: 1598, BRK: 180, JMP: 5, INV: 7, SEQ: 11) was observed on chromosome 1, while chromosome 4 had the lowest number of SVs (Total: 192; GAP: 23, DUP: 130, BRK: 20, JMP: 4, INV: 6, SEQ: 9) (Figure 5a, Table S11). Because SVs overlapping genes can impact gene functions and expression, and those in noncoding regions have a disproportionate impact on the expression of nearby genes [51,52], we examined the distributions of SVs in different genomic regions. Our results indicated that a majority (~70%) of SVs were located in noncoding regions, notably higher than the proportion (~30%) in gene regions (Figure 5b, Table S11). As expected, SVs overlapping genes were also distributed unevenly among chromosomes, ranging from 60 SVs on chromosome 4 to 582 SVs on chromosome 1 (Figure 5b, Table S11). This result suggested that some regions might be conserved and share a common ancestral gene pool between the two *japonica* cultivars ('Changxianggeng 1813' and Nipponbare).

#### **3. Materials and Methods**

#### *3.1. Plant Materials and DNA Extraction*

Genomic DNA was extracted from fresh leaves of 15-day-old seedlings of the fragrant *japonica* cultivar 'Changxianggeng 1813' using the DNAsecure Plant Kit (Tiangen Biotech, Beijing, China) according to the manufacturer's protocol. The quality and integrity of the DNA products were assessed using agarose gel electrophoresis, NanoDrop spectrophotometry (NanoDrop Technologies, Wilmington, DE, USA), and Qubit fluorometry (Thermo Fisher Scientific, Waltham, MA, USA). The genomic DNA that met the quality and quantity standards was used to construct Illumina and Nanopore libraries.

#### *3.2. Genome and Transcriptome Sequencing*

For Illumina sequencing, a short-insert (350 bp) genomic library was constructed using the NEBNext Ultra DNA Library Prep Kit (New England Biolabs, Beverly, MA, USA), and sequenced on the Illumina NovaSeq 6000 platform using a paired-end sequencing strategy. To reduce the effect of sequencing errors, we discarded those reads that met any of the following criteria: (i) reads with adapters; (ii) reads having more than 50% bases with Phred quality < 5; (iii) reads with N bases more than 5%; and (iv) PCR duplicated reads. All the obtained clean reads were used for genome size estimation, genome assembly correction and evaluation.
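Criteria (ii) and (iii) above can be expressed as a simple per-read predicate. The sketch below is illustrative only: it assumes that base qualities are already decoded to integer Phred scores, it does not cover adapter detection or duplicate removal, and the function and parameter names are hypothetical rather than those of the filtering software actually used.

```python
def passes_quality_filter(seq, quals, low_q=5, max_low_frac=0.5, max_n_frac=0.05):
    """Return True if a read survives criteria (ii) and (iii).

    seq:   read sequence as a string
    quals: list of integer Phred scores, one per base
    """
    if not seq or len(seq) != len(quals):
        return False
    low_quality_bases = sum(1 for q in quals if q < low_q)
    n_bases = sum(1 for base in seq if base in "Nn")
    # Discard when MORE than 50% of bases are below Phred 5 ...
    if low_quality_bases / len(seq) > max_low_frac:
        return False
    # ... or when MORE than 5% of bases are N.
    if n_bases / len(seq) > max_n_frac:
        return False
    return True


# Toy reads (hypothetical)
good_read = passes_quality_filter("ACGT" * 5, [30] * 20)
n_rich_read = passes_quality_filter("ACGTN" * 4, [30] * 20)
low_quality_read = passes_quality_filter("ACGT" * 5, [2] * 11 + [30] * 9)
```

For paired-end data, one would typically discard a pair when either mate fails, although the text does not state which convention was applied.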

For Nanopore sequencing, approximately 10 µg of genomic DNA was size-selected (10–50 kb) with the BluePippin System (Sage Science, Beverly, MA, USA), and then the DNA was subjected to a 30 µL end-repair/dA-tailing reaction using the NEBNext Ultra End Repair/dA-Tailing module (New England Biolabs, Beverly, MA, USA). The sequencing adaptors were further ligated using the Ligation Sequencing Kit SQK-LSK109 (Oxford Nanopore Technologies, Oxford, UK) based on the manufacturer's instructions. After purifying using Ampure XP beads and the ABB wash buffer (Oxford Nanopore Technologies), the resulting library was sequenced on R9.4 flow cells using the PromethION DNA sequencer (Oxford Nanopore Technologies). Raw signal data in fast5 format was subsequently base called using Guppy v.2.3.5 (Oxford Nanopore Technologies) with default parameters, and the reads with the mean\_qscore\_template <7 were filtered.

For chromatin conformation capture (Hi-C) sequencing, fresh leaves from the same 'Changxianggeng 1813' plant that was used for Illumina and Nanopore sequencing were collected. A Hi-C library was created in a similar manner to that described by Lieberman-Aiden et al. [53]. Briefly, chromatin was first fixed in 1% final concentration of formaldehyde, and the extracted fixed chromatin was digested using the restriction enzyme DpnII. The 5′ overhangs were then filled in with biotinylated nucleotides, and free blunt ends were ligated. After ligation, cross-links were reversed, and the DNA was purified from the protein. Purified DNA was further filtered to remove unligated but biotin-labeled fragments and subjected to selection for fragments with lengths between 300 and 700 bp. The quality of the purified library was evaluated with an Agilent 2100 instrument, a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA), and quantitative PCR (qPCR). Finally, the qualified library was sequenced on an Illumina HiSeq X Ten platform with a paired-end 150 bp read layout.

For transcriptome sequencing (RNA-Seq), the best-quality RNA samples of each tissue (root, branch, leaf, and panicle) were mixed together to build a Nanopore sequencing library using the Ligation Sequencing Kit (SQK-LSK109, Oxford Nanopore Technologies) by following the manufacturer's protocol. The cDNA library was added to FLO-MIN109 flow cells and sequenced on the Nanopore PromethION platform. Raw reads were filtered with the following settings: minimum average read quality score 7 and minimum read length 500 bp. Ribosomal RNA was discarded by searching against the Silva rRNA database (https://www.arb-silva.de, accessed on 27 August 2020). The RNA-Seq data were used to improve the annotation of 'Changxianggeng 1813'. All library construction, sequencing, and data filtering were conducted in Wuhan Benagen Tech Solutions Company Limited, Wuhan, China.

#### *3.3. Genome Size Estimation and Genome Assembly*

All Illumina clean reads were used for the estimation of genome size and heterozygosity with *k*-mer analysis. The data were run through Jellyfish v.2.3.0 [54] to generate *k*-mer frequency distribution, with a *k*-mer size of 19. Genome size was estimated by the commonly used formula: genome size = *k*-mer\_number/*k*-mer\_depth, where *k*-mer\_number is the total number of *k*-mers, and *k*-mer\_depth is the main peak of *k*-mer frequency.
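The formula above can be applied directly to a *k*-mer frequency histogram. The following is a minimal sketch, assuming the histogram is available as a mapping from *k*-mer depth to the number of distinct *k*-mers observed at that depth. The `min_depth` cutoff, used here only to skip the low-depth error peak when locating the main peak, is a hypothetical parameter that is not stated in the text.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as k-mer_number / k-mer_depth.

    histogram: dict mapping k-mer depth -> number of distinct k-mers
               observed at that depth.
    min_depth: depths below this are ignored when searching for the
               main peak (assumed cutoff for error k-mers).
    Returns (estimated_size, peak_depth).
    """
    candidates = {d: n for d, n in histogram.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no histogram bins at or above min_depth")
    # k-mer_depth: the depth at which the main peak occurs
    peak_depth = max(candidates, key=candidates.get)
    # k-mer_number: total number of k-mers (depth times distinct count)
    kmer_number = sum(d * n for d, n in histogram.items())
    return kmer_number / peak_depth, peak_depth


# Toy histogram (hypothetical values)
toy_histogram = {1: 1000, 2: 100, 28: 50, 30: 200, 32: 60}
toy_size, toy_peak = estimate_genome_size(toy_histogram)
```

Whether low-depth error *k*-mers are excluded from the total is a design choice that affects the estimate; this sketch keeps them, following the literal wording of the formula.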

NextDenovo v.2.4.0 (https://github.com/Nextomics/NextDenovo, accessed on 27 December 2020) was applied to de novo assembly of 'Changxianggeng 1813' genome using nanopore long reads. Briefly, the NextCorrect module was employed to correct raw reads and extract consensus sequences, and then the NextGraph module was used to assemble the draft genome. To improve the accuracy of the draft genome, we used Racon v.1.4.11 [55] and Pilon v.1.23 [56] to polish the assembly for two rounds, respectively, based on the corrected nanopore long reads and the cleaned Illumina short reads. After these two-step polishing strategies, the scaffold-level genome assembly was generated. To further anchor the genome assembly to the chromosome level, HiCUP v.0.6.17 [57] was used to produce cleaned mapped data accompanied with QC reports. Only uniquely aligned read pairs with mapping quality >20 were retained and utilized to cluster, order, and orient the assembly scaffolds onto chromosomes by LACHESIS software [58].

#### *3.4. Quality Assessment of Genome Assembly*

To evaluate the accuracy and completeness of the genome assembly, Illumina reads were mapped back to the reference genome using BWA-MEM v.0.7.17 [59] and assessed by their depth of coverage. Furthermore, BUSCO (Benchmarking Universal Single-Copy Orthologs) v.4.1.4 [60], with the database embryophyta\_odb10, was employed to assess the completeness of the genome assembly.

#### *3.5. Genome Annotation*

The genome of 'Changxianggeng 1813' was annotated at three independent dimensions: (i) repetitive elements, (ii) protein-coding genes, and (iii) noncoding RNAs. For repetitive element annotation, transposable elements (TEs) in the 'Changxianggeng 1813' genome were identified using a hybrid strategy combining homology-based searching in known repeat database and de novo prediction. RepeatMasker v.4.0.6 [61] was used to identify TEs against both the RepBase database of known TEs [62], and a de novo repeat library constructed by RepeatModeler v.1.0.11 (http://www.repeatmasker.org/RepeatModeler/, accessed on 21 October 2017). TEs identified from both homology-based and de novo approaches were further filtered for redundant sequences and merged into a non-redundant repeat library by CD-HIT [63]. In addition, tandem repeats including microsatellites (SSRs) were identified in the reference genome of 'Changxianggeng 1813' using Tandem Repeat Finder (TRF) v.4.0.9 [64].

For protein-coding gene prediction, three different methods, namely homology-based, de novo, and transcriptome-based methods, were combined. First, Exonerate v.2.4.0 [65] was used for the homology-based prediction, based on protein sequences of *O. brachyantha*, *O. sativa japonica* Nipponbare, *Aegilops tauschii*, and *Panicum hallii* retrieved from NCBI (http://www.ncbi.nlm.nih.gov, accessed on 28 June 2021). Then, Augustus v.3.3.2 [66] and GlimmerHMM v.3.0.4 [67] were applied for the de novo prediction, with default parameters. Next, TransDecoder v.5.1.0 (https://github.com/TransDecoder/TransDecoder/wiki, accessed on 28 March 2018) was employed to identify the potential coding regions, based on the transcripts assembled using StringTie v.2.1.1 [68]. Finally, EvidenceModeler v.1.1.1 [69] was used to integrate the prediction results obtained through the above three methods to generate the final gene set of 'Changxianggeng 1813'. Functional gene annotation was performed by aligning the protein sequences against Uniprot, Pfam, GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes), and NR (Non-redundant) databases, with an E-value threshold of 1 × 10<sup>−5</sup>. Furthermore, InterProScan v.5.33 [70] was used to annotate the motifs and domains by searching against the InterPro and Pfam databases. These results were further integrated to produce the final gene set.

For noncoding RNA prediction, transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) were predicted using tRNAscan-SE v.1.23 [71] and RNAmmer v.1.2 [72], respectively. Other types of noncoding RNAs, including small nuclear RNAs (snRNAs) and microRNAs (miRNAs) were identified by using Infernal v.1.1.2 [73] based on the Rfam database [74].

#### *3.6. Characterization and Evolutionary Analysis of BADH2 Gene*

The *BADH2* gene sequence for 'Changxianggeng 1813' was extracted from its genome sequence according to annotation files and then compared to that of the non-fragrant cultivar Nipponbare using the MAFFT multiple sequence alignment program [75], to verify the presence/absence of the mutations associated with fragrance in 'Changxianggeng 1813'. The *BADH2* protein-coding sequence for 'Changxianggeng 1813' was further combined with 38 previously published haplotypes in the *BADH2* coding region for phylogenetic analysis [2]. All 39 *BADH2* coding sequences were aligned using ClustalW [76], and the resulting alignment was used for Neighbor-Joining (NJ) phylogenetic tree construction using MEGA v.11.0.11 [77], with 1000 bootstrap replicates.

#### *3.7. Genome Synteny and Collinearity Analysis*

To identify chromosome structural changes between 'Changxianggeng 1813' and its two close relatives, i.e., the non-fragrant *japonica* rice cultivar Nipponbare and the common wild rice *O. rufipogon*, genome syntenic blocks were identified using the Python version of MCscan incorporated in jcvi (https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version), accessed on 16 June 2020), with default parameters. In brief, all-against-all LAST [78] was performed, and the LAST hits with a distance cutoff of ten genes and at least five syntenic genes per block were chained. Dot plots for pairwise synteny and macrosyntenic patterns were generated using the commands 'python -m jcvi.graphics.dotplot' and 'python -m jcvi.graphics.karyotype', respectively.

#### *3.8. Gene Family and Phylogenetic Analysis*

OrthoMCL v.2.0.9 [79] was used to identify gene family clusters in the genomes of 'Changxianggeng 1813' and five other members of *Oryza*, including the non-fragrant *japonica* cultivar Nipponbare, the *indica* subspecies, and three wild species (*O. rufipogon*, *O. nivara*, and *O. barthii*). Low-quality protein sequences from these six *Oryza* genomes were first filtered, based on default parameters in OrthoMCL. Then, an all-versus-all BLASTP search was conducted for all remaining proteins with an E-value threshold of 1 × 10<sup>−5</sup>. Finally, protein sequences were clustered into paralogous and orthologous genes using the program OrthoMCL, with a default inflation parameter for the Markov cluster algorithm.

To resolve the phylogenetic position of 'Changxianggeng 1813', the single-copy orthologous genes extracted from the above six *Oryza* genomes were aligned using MUSCLE v.3.8.3 [80] and then concatenated into a super-gene alignment matrix. Phylogenetic analysis was conducted using RAxML-HPC v.8.2.8 [81] with 1000 bootstrap replicates. The best model and parameter settings were chosen according to the Akaike Information Criterion (AIC) using jModelTest v.2.1.4 [82]. Divergence times between these six *Oryza* species/subspecies/cultivars were estimated by the program MCMCTree in PAML v.4.7 [83]. The following four divergence times obtained from the Timetree database (http://www.timetree.org/, accessed on 7 February 2019) were used for calibrations (in million years ago, Ma): (i) *O. barthii* and *O. sativa* (0.95–2.42 Ma), (ii) *O. nivara* and *O. sativa* (0.603–1.089 Ma), (iii) *O. rufipogon* and *O. nivara* (0.603–1.089 Ma), and (iv) *O. rufipogon* and *O. sativa* (0.598–1.255 Ma). To gain more insights into the evolutionary dynamics of the genes, the expansion and contraction of orthologous gene families were determined in these six members of *Oryza* with CAFÉ [84] and then subjected to GO functional annotation.
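The concatenation step that produces the super-gene alignment matrix can be sketched as follows. This is a generic illustration, assuming that each per-gene alignment is held as a mapping from taxon name to aligned sequence; the function name, taxon labels, and toy sequences are hypothetical, and no claim is made about the scripts actually used.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of dicts {taxon: aligned_sequence}, one per gene.
    taxa:            ordered list of taxon names expected in every alignment.
    Returns a dict {taxon: concatenated_sequence}.
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in gene_alignments:
        # Every sequence within one gene alignment must share the same length.
        lengths = {len(alignment[taxon]) for taxon in taxa}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment differ in length")
        for taxon in taxa:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with two genes and three hypothetical taxa
toy_taxa = ["taxonA", "taxonB", "taxonC"]
toy_genes = [
    {"taxonA": "ATG-CA", "taxonB": "ATGGCA", "taxonC": "ATGGCT"},
    {"taxonA": "TTGA", "taxonB": "TTGA", "taxonC": "T-GA"},
]
toy_supermatrix = concatenate_alignments(toy_genes, toy_taxa)
```

Because only single-copy orthologs shared by all genomes are used, every taxon contributes exactly one sequence per gene, which is what makes a simple column-wise concatenation valid.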

#### *3.9. Analysis of Genomic Variations*

MUMmer v.3.23 [85] was used to align the 'Changxianggeng 1813' (fragrant *japonica* cultivar) genome against the Nipponbare (non-fragrant *japonica* cultivar) genome with the nucmer utility and the --mum option. The delta-filter utility was subsequently used to filter repeats and determine the one-to-one alignment blocks with the parameters -1 -r -q. Single nucleotide polymorphisms (SNPs) and small insertion-deletion polymorphisms (InDels) were called from the filtered data using the show-snps function with the parameters -ClrTH.
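The three MUMmer steps described above can be laid out as a small command builder. This is a sketch under stated assumptions rather than the authors' script: the file names and the output prefix are placeholders, Nipponbare is assumed to be the reference and 'Changxianggeng 1813' the query, and the show-snps flags are read as the separate switches -C, -l, -r, -T, and -H.

```python
import subprocess


def mummer_commands(reference_fasta, query_fasta, prefix="cxg_vs_nip"):
    """Build the nucmer / delta-filter / show-snps command lines.

    All paths and the prefix are placeholders chosen for illustration.
    """
    return [
        # Whole-genome alignment using anchors unique in both genomes
        ["nucmer", "--mum", "-p", prefix, reference_fasta, query_fasta],
        # Keep one-to-one alignment blocks (result is written to stdout)
        ["delta-filter", "-1", "-r", "-q", f"{prefix}.delta"],
        # Report SNPs and small InDels from the filtered alignments
        ["show-snps", "-C", "-l", "-r", "-T", "-H", f"{prefix}.filter"],
    ]


def run_to_file(command, out_path):
    """Run one command and capture its standard output in out_path."""
    with open(out_path, "w") as handle:
        subprocess.run(command, stdout=handle, check=True)


# Build (but do not execute) the commands for hypothetical input files
toy_commands = mummer_commands("nipponbare.fa", "changxianggeng1813.fa")
```

In practice the delta-filter output would be written to the `.filter` file (for example with `run_to_file`) before show-snps is invoked on it.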

Structural variants (SVs) were detected from the genome alignment between 'Changxianggeng 1813' and Nipponbare by using the show-diff function in MUMmer, and six SV types were obtained, including gap between two mutually consistent alignments (GAP), inserted duplication (DUP), other inserted sequence (BRK), rearrangement (JMP), rearrangement with inversion (INV), and rearrangement with another sequence (SEQ). The SVs with a minimum size of 1000 bp in length were retained in this study.

#### **4. Conclusions**

Here, we presented a high-quality reference genome sequence of a new fragrant rice cultivar 'Changxianggeng 1813', using a combination of Nanopore long reads, Illumina short reads, and Hi-C data. To our knowledge, this is the first de novo chromosome-level genome assembly for fragrant *japonica* rice. The 'Changxianggeng 1813' genome has a total length of ~378.78 Mb and comprises 31,671 high-quality protein-coding genes. Based on this annotated genome sequence, we demonstrated that it was the *badh2-E2* type of deletion (a 7 bp deletion in the second exon) that caused fragrance in this *japonica* rice cultivar. Through pairwise genome comparison between 'Changxianggeng 1813' and the non-fragrant *japonica* cultivar Nipponbare, a total of 289,970 SNPs, 96,093 InDels, and 8690 large SVs were identified. Undoubtedly, these genomic resources will promote the genic and genomic studies of rice and be beneficial for cultivar improvement of fragrant *japonica* rice. However, it should also be noted that our study has two notable limitations. First, we sequenced only a single individual, which was insufficient for investigating population genomic diversity, population structure, and cultivar origins of fragrant *japonica* rice. Second, our study still leaves a gap in our knowledge of genomic variations between *japonica*, *indica*, and *aus* type fragrant rice. Hence, we anticipate that further population-scale, long-read sequencing datasets, as well as improvements in genome comparison algorithms, will help overcome these limitations.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23179705/s1.

**Author Contributions:** Conceptualization, X.S.; methodology, R.L.; software, R.L. and J.L.; validation, X.W., X.J., N.L. and X.S.; resources, G.M.; data curation, R.L.; writing—original draft preparation, R.L.; writing—review and editing, Z.S., N.L. and X.S.; funding acquisition, R.L., X.W. and X.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Changshu Agricultural Production and Public Service Project, the Jiangsu Key Laboratory of Plant Resources Research and Utilization grant (JSPKLB201921), and the Jiangsu Innovative and Entrepreneurial Talent Programme (JSSCBS20211311).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The assembled genome of 'Changxianggeng 1813' and all raw sequencing data have been deposited under NCBI BioProject PRJNA856027 with accession nos. SRR20046019–SRR20046022.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Creation of Two-Line Fragrant Glutinous Hybrid Rice by Editing the** *Wx* **and** *OsBADH2* **Genes via the CRISPR/Cas9 System**

**Yahong Tian †, Yin Zhou †, Guanjun Gao, Qinglu Zhang, Yanhua Li, Guangming Lou and Yuqing He \***

> National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China

**\*** Correspondence: yqhe@mail.hzau.edu.cn

† These authors contributed equally to this work.

**Abstract:** Global food security has benefited from the development and promotion of the two-line hybrid rice system. Once the fundamental requirements of high yield and good adaptability are met, excellent eating quality determines the market competitiveness of hybrid rice varieties. Developing sterile and restorer lines with improved quality for two-line hybrid breeding by editing quality genes with clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 is an efficient and practical alternative to the lengthy and laborious process of conventional breeding to improve rice quality. We edited *Wx* and *OsBADH2* using CRISPR/Cas9 technology to produce Cas9-free homozygous male sterile mutant lines and Cas9-free homozygous restorer mutant lines. These mutants have a much lower amylose content and a significantly higher content of the aroma compound 2-acetyl-1-pyrroline. Based on these lines, a fragrant glutinous hybrid rice was developed without much effect on most agronomic traits. This study demonstrates the use of CRISPR/Cas9 in creating two-line fragrant glutinous hybrid rice by editing both the male sterile and the restorer lines.

**Keywords:** *Wx*; *OsBADH2*; CRISPR/Cas9; two-line hybrid rice; 2-acetyl-1-pyrroline aroma; quality

**Citation:** Tian, Y.; Zhou, Y.; Gao, G.; Zhang, Q.; Li, Y.; Lou, G.; He, Y. Creation of Two-Line Fragrant Glutinous Hybrid Rice by Editing the *Wx* and *OsBADH2* Genes via the CRISPR/Cas9 System. *Int. J. Mol. Sci.* **2023**, *24*, 849. https://doi.org/10.3390/ijms24010849

Academic Editor: Luigi Cattivelli

Received: 19 October 2022 Revised: 15 December 2022 Accepted: 16 December 2022 Published: 3 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## **1. Introduction**

The most significant physical and chemical factor that impacts rice quality is the amylose content in rice grains [1–3]. *Wx* is important for the regulation of rice quality (including appearance quality and eating and cooking quality (ECQ)) and encodes granule-bound starch synthase I [4–6]. Abundant natural allelic variation at this locus underlies the extensive variation in amylose content and rice quality among modern cultivated rice [6]. At least 10 distinct functional alleles of *Wx* have been identified, including *Wx<sup>a</sup>*, *Wx<sup>b</sup>*, *wx*, *Wx<sup>in</sup>*, *Wx<sup>op</sup>*, *Wx<sup>mp</sup>*, *Wx<sup>mq</sup>*, *Wx<sup>hp</sup>*, *Wx<sup>lv</sup>*, and *Wx<sup>la</sup>*/*Wx<sup>mw</sup>* [6–13]. Of these, *Wx<sup>lv</sup>* is an ancestral allele derived from wild rice. *Wx<sup>la</sup>*/*Wx<sup>mw</sup>* is a recently identified *Wx* allele derived from intragenic recombination, giving rice good eating qualities and grain transparency. The clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 system has become a popular tool for editing various components of the *Wx* gene to enhance rice quality. Zhang et al. [14] and Xu et al. [15] edited the exon of the *Wx* gene, while Huang et al. [16] and Zeng et al. [17] edited the promoter and 5′UTR intron of the *Wx* gene, respectively. Thus, several mutants with different quality traits were produced. According to Liu et al. [18], the amylose content was significantly increased in rice seeds by editing the first intron of the *Wx* gene. These findings suggest that CRISPR/Cas9-mediated gene editing of suitable target sites can produce breeding materials with ideal amylose content and quality.

The eating quality of rice is significantly influenced by several factors, including aroma, which is mainly controlled by the recessive gene *fgr*/*OsBADH2* [19]. Loss of function of the *fgr*/*OsBADH2* gene causes the BADH2 protein to lose its ability to catalyze the oxidation of 4-amino-butanal. This leads to an accumulation of 4-amino-butanal, which promotes the synthesis of 2-acetyl-1-pyrroline (2-AP) and results in the production of fragrance in rice [20]. Shan et al. edited the *fgr*/*OsBADH2* gene in Nipponbare using transcription activator-like effector nuclease (TALEN) technology, and the 2-AP content in the homozygous T<sup>1</sup> line was significantly increased [21]. Ashokkumar et al. used the CRISPR/Cas9 method to produce novel alleles of *fgr*/*OsBADH2* to introduce aroma into the top non-aromatic rice variety ASD16. The phenotype was stably inherited in the T<sup>1</sup> generation [22]. These results suggest that non-aromatic rice varieties can be made aromatic by editing *fgr*/*OsBADH2*.

Heterosis refers to the superior performance of hybrids over their parents. A typical example of using heterosis is the creation of two-line hybrid rice systems. One of the fundamental components of two-line hybrid rice breeding is the light- and temperature-sensitive male sterile line. Although the adoption and use of two-line hybrid rice have significantly improved rice yield, the quality of hybrid rice is typically subpar. Three efficient gene editing techniques can be utilized to accurately and rapidly modify crop target traits: TALEN technology [23], zinc finger nuclease technology [24], and CRISPR technology [25]. In this study, we edited *Wx* and *OsBADH2* using the CRISPR/Cas9 system to create homozygous mutants of the Zhinong 1S (ZN1S) sterile and the Zhinong 1307 (ZN1307) restorer lines, resulting in fragrant glutinous hybrid rice with better yield performance than its parents and providing new ideas and insights for quality improvement of hybrid rice.

#### **2. Results**

#### *2.1. Creation of Cas9-Free Fragrant Glutinous Mutants*

We developed *wx*-*fgr* double mutants using CRISPR/Cas9 technology in the genetic backgrounds of the sterile line ZN1S and the restorer line ZN1307 to produce fragrant and waxy hybrid rice (Figure 1A). PAGE combined with Sanger sequencing revealed that the T<sup>0</sup> transgenic plants of ZN1S and ZN1307 carried heterozygous mutations of 9 and 6 distinct types, respectively (Supplementary Tables S2 and S3). We selected Cas9-free plants by PCR from the segregating T<sup>1</sup>–T<sup>2</sup> families and identified 5 and 6 homozygous mutant T<sup>3</sup> lines for ZN1S and ZN1307, respectively (Figure 1B). Unlike the translucent endosperm of wild-type grains, the endosperm of all mutants was milky white, similar to that of glutinous rice (Figure 1C). The amylose content of grains from these 11 mutant T<sup>3</sup> lines was measured, and it dropped drastically in every mutant once the *Wx* gene was mutated (Figure 1D). The amylose content of the ZN1S mutants ranged between 2% and 4%; the lowest value, 2.2%, was found in the *wx*-*fgr*-S2 mutant line and was very close to that of wild-type glutinous rice. Amylose content ranged between 2.5% and 3.5% in the ZN1307 mutants, with *wx*-*fgr*-R2 having the lowest value at 2.6% (Figure 1D). Additionally, we used the potassium hydroxide method for a sensory evaluation of the mutants' aroma. The results revealed that brown rice of the mutants from both backgrounds released a light rice aroma (Figure 1E). Next, we assessed the quality traits of the 11 homozygous mutant T<sup>4</sup> lines of ZN1S and ZN1307. The amylose content (AC) of all mutants was significantly lower than that of the corresponding wild type, consistent with the results of the T<sup>3</sup> generation, while their 2-AP content and gel consistency (GC) were significantly higher (Figure 1F–I). Rapid visco analysis (RVA) was used to evaluate the starch quality [26].
In comparisons with WT-S, the viscosity indexes of *wx*-*fgr*-S1 and *wx*-*fgr*-S4 were higher than those of the remaining three ZN1S mutants (Figure 1J). Unlike the ZN1S mutants, all ZN1307 mutants showed similar viscosity indexes, which were clearly lower than those of WT-R (Figure 1J).

**Figure 1.** Improvement of rice grain quality by editing of *Wx<sup>la</sup>* and *OsBADH2* using CRISPR/Cas9. (**A**) Schematic diagram of the targeted sites in *Wx* and *OsBADH2*. The protospacer-adjacent motifs (PAMs) are shown in red. (**B**) Mutations in the edited T<sup>3</sup> lines. Inserted bases are marked in red, and missing bases are indicated by short dashed lines. (**C**) Morphology of milled rice from ZN1S, ZN1307 and the F<sup>1</sup> hybrid ZN1S mutants × ZN1307 mutants. WT-S represents the wild type of sterile line ZN1S and WT-R represents the wild type of restorer line ZN1307. Scale bars, 1000 µm. (**D**) The amylose contents determined by iodine colorimetry of ZN1S, ZN1307 and their T<sup>3</sup> generation homozygous mutant lines. (**E**) Sensory evaluation of the aroma of ZN1S, ZN1307 and their mutants by the potassium hydroxide method. (**F**) The grain amylose contents determined by iodine colorimetry of ZN1S, ZN1307 and their T<sup>4</sup> generation homozygous mutant lines. (**G**) Total ion chromatograms (TIC) of 2-AP and TMP in the grains of ZN1S, ZN1307 and their T<sup>4</sup> generation homozygous mutant lines. (**H**) 2-AP content in grains of ZN1S, ZN1307 and their T<sup>4</sup> generation homozygous mutant lines. 2,4,6-Trimethylpyridine (TMP) was used as the internal standard. (**I**) Gel consistency. (**J**) Rapid visco analysis profiles of grain starches of ZN1S, ZN1307 and their T<sup>4</sup> generation homozygous mutant lines. cP (centipoise), viscosity unit. Data are means ± SD (*n* = 3). Samples without the same letter show significant difference by Duncan's test (*p* < 0.05).

Furthermore, we examined the agronomic traits of these homozygous mutant T<sup>4</sup> lines of ZN1S and ZN1307 (Supplementary Table S4). Among the ZN1S mutants, only specific agronomic traits of some mutants were altered, such as plant height in *wx*-*fgr*-S5, number of effective tillers in *wx*-*fgr*-S1 and *wx*-*fgr*-S3, number of primary branches in *wx*-*fgr*-S2, and grain number per panicle in *wx*-*fgr*-S1 and *wx*-*fgr*-S5. Similarly, most agronomic traits did not change significantly in the ZN1307 mutants; only certain mutants showed changes in specific traits, such as the number of primary branches in *wx*-*fgr*-R1 and *wx*-*fgr*-R2, the setting rate in *wx*-*fgr*-R3, and the 1000-grain weight in *wx*-*fgr*-R1 and *wx*-*fgr*-R5. Notably, all ZN1307 mutants showed varying degrees of reduction in yield per plant.

#### *2.2. Creation of Fragrant Glutinous Hybrid Rice*

*wx*-*fgr*-S2, with the lowest amylose content among the ZN1S sterile mutants, and *wx*-*fgr*-S4, with agronomic traits similar to those of the wild type, were selected as recipient parents. In turn, *wx*-*fgr*-R2, with the lowest amylose content among the ZN1307 restorer mutants, and *wx*-*fgr*-R6, with the least impact on yield per plant, were selected as donor parents to obtain four types of hybrid rice, namely *wx*-*fgr*-Z22 (*wx*-*fgr*-S2/*wx*-*fgr*-R2), *wx*-*fgr*-Z42 (*wx*-*fgr*-S4/*wx*-*fgr*-R2), *wx*-*fgr*-Z26 (*wx*-*fgr*-S2/*wx*-*fgr*-R6), and *wx*-*fgr*-Z46 (*wx*-*fgr*-S4/*wx*-*fgr*-R6). We characterized the primary quality traits of the four transgenic hybrids. Similar to the ZN1S sterile mutants and ZN1307 restorer mutants, the grain endosperms produced by all hybrid combinations had low transparency and a milky white appearance, clearly different from those of WT-S and WT-R (Figure 1C). Iodine staining showed that transverse sections of milled rice grains from the hybrids were more lightly stained than those of WT-S and WT-R (Figure 2A). Correspondingly, the four transgenic hybrids had much less amylose than the two parental lines (Figure 2B). In particular, *wx*-*fgr*-Z42 had an amylose content of 1.76%, which was on par with wild-type waxy rice. The level of the aroma compound 2-AP in the four transgenic hybrids was significantly higher than that of the sterile line ZN1S but was not comparable to that of the restorer line ZN1307 (Figure 2C,D). However, rice flour from the four transgenic hybrids had a poorer RVA curve pattern than that from the two parents, WT-S and WT-R, while having a greater gel consistency than the wild type (Figure 2E,F). In addition, we also investigated the main agronomic traits of the transgenic hybrid rice (Supplementary Table S4).
In general, the hybrids showed an obvious super-parent advantage in plant height and panicle length but no obvious changes in other traits, except for specific traits of several hybrids, such as the grain weight per panicle in *wx*-*fgr*-Z46, the setting rate in *wx*-*fgr*-Z26 and *wx*-*fgr*-Z46, and the yield per plant in *wx*-*fgr*-Z26.

**Figure 2.** Creation of two-line fragrant glutinous hybrid rice. (**A**) Microscopic observation of iodine-stained endosperm. WT-S represents the wild type of sterile line ZN1S and WT-R represents the wild type of restorer line ZN1307. Scale bars, 200 µm. (**B**) The amylose contents determined by iodine colorimetry of ZN1S, ZN1307 and the F<sup>1</sup> hybrid ZN1S mutants × ZN1307 mutants. (**C**) Total ion chromatograms (TIC) of 2-AP and TMP in the grains of ZN1S, ZN1307 and the F<sup>1</sup> hybrid ZN1S mutants × ZN1307 mutants. 2,4,6-Trimethylpyridine (TMP) was used as the internal standard. (**D**) 2-AP content. (**E**) Rapid visco analysis. cP (centipoise), viscosity unit. (**F**) Gel consistency. Data are means ± SD (*n* = 3). Samples without the same letter show significant difference by Duncan's test (*p* < 0.05).

#### **3. Discussion**

Although the cooking and eating quality of hybrid rice has improved somewhat recently, it still falls short of that of high-grade, high-quality conventional rice. Incorporating the *wx* mutation into varieties with low initial apparent amylose content (AAC) led to further reductions in AAC; however, these reductions had little to no impact on the desired gelatinization traits, amylopectin structure types, or the major agronomic traits [27]. Therefore, introducing the *wx* mutation into rice varieties with low baseline AAC levels is a feasible strategy for increasing the ECQ of rice. In this study, the wild-type sterile line ZN1S and the restorer line ZN1307 had initial amylose contents of about 10% (Figure 1D,F), which were not very high. The amylose contents of all mutants were reduced to 2–4% by Cas9-mediated gene editing, along with some modifications to their gelatinization properties (Figure 1). Notably, the sterile line mutants and restorer line mutants had gel consistency that was significantly higher than that of the corresponding wild type (Figure 1I). However, in contrast to the large changes in the ZN1S and ZN1307 mutants, only *wx*-*fgr*-Z26 among the hybrids had significantly higher gel consistency, and its value fell between those of the two parents (Figure 2F). The viscosity curves of the sterile line mutants differed from that of the wild type to varying degrees, whereas all restorer line mutants and hybrid rice had softer pasting properties (Figure 1J). These results indicate that the inheritance of quality traits in hybrid rice may not be determined by simple additive effects. In addition, manipulating only *Wx*, the main quality gene, may not be enough. Combining it with other quality genes, even some with minor effects, may be needed to finally improve rice quality.

Overall, our study shows how quality genes of interest can be edited directly in elite sterile and restorer lines to produce enhanced hybrid rice with the potential for commercialization.

#### **4. Materials and Methods**

#### *4.1. Plant Materials and Growing Conditions*

ZN1S is a photoperiod-sensitive genic male sterile rice variety, whereas ZN1307 is an indica restorer rice variety with high quality and high yield; both carry a *Wx<sup>la</sup>* allele. Plants were grown in the field during the regular rice growing season at the experimental stations of Huazhong Agricultural University in Wuhan (Hubei) and Lingshui (Hainan). The seeds were sown on May 15 every year, and the seedlings (25 or 30 days old) were transplanted to the field at a spacing of 16.5 cm between plants and 26.4 cm between rows. The field was managed using customary agricultural practices. At the end of the third stage of panicle differentiation, the ZN1S male sterile lines, whose fertility transformation temperature was 24 °C, were generally transferred to a cold water pool (22.5 °C) and returned to the field after 15 days of treatment.

#### *4.2. Design of the Wx and OsBADH2 Target Sites and Construction of the CRISPR/Cas9 Double-Targeting Vector*

We designed target sites at bases 14–33 of the third exon of the *Wx* gene and bases 122–141 of the fourth exon of the *OsBADH2* gene to produce waxy and fragrant transgenic lines. The online tool CRISPR-GE was used to design the target sites [28]. The target sequence for the *Wx* gene was GGGTCATGGTGATCTCTCCTCGG and that for the *OsBADH2* gene was ATCAACCCAACTACACCGATAGG. The U6-sgRNA expression cassette encoding the 20-nucleotide (nt) *Wx* target sequence was amplified by polymerase chain reaction (PCR) and ligated into the *Kpn*I-digested pCXUN vector to create the pCXUN-U6-sgRNA intermediate vector. The U3-sgRNA expression cassette encoding the 20 nt *OsBADH2* target sequence was then amplified by PCR and ligated into the *Sac*I-digested pCXUN-U6-sgRNA intermediate vector to create the final vector pCXUN-U6-U3 sgRNA. The final CRISPR/Cas9 vector was introduced into ZN1S and ZN1307 by *Agrobacterium tumefaciens*-mediated genetic transformation [29]. The primer sequences used to construct the vector are listed in Supplementary Table S1.
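The two 23 nt sequences listed above appear to consist of a 20 nt protospacer followed by a 3 nt PAM. The short Python sketch below checks this reading under the assumption that both sequences are written 5′→3′ and end in an NGG motif; the helper names `split_target` and `gc_percent` are hypothetical and are not part of CRISPR-GE.

```python
import re

# Target sequences as listed in the text (protospacer plus PAM, 23 nt each).
TARGETS = {
    "Wx": "GGGTCATGGTGATCTCTCCTCGG",
    "OsBADH2": "ATCAACCCAACTACACCGATAGG",
}


def split_target(site: str) -> tuple:
    """Split a 23 nt site into (20 nt protospacer, 3 nt PAM); require an NGG PAM."""
    if not re.fullmatch(r"[ACGT]{20}[ACGT]GG", site):
        raise ValueError(f"not a 20 nt protospacer followed by NGG: {site}")
    return site[:20], site[20:]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Gene -> (protospacer, PAM, GC content of the protospacer in percent).
summary = {}
for gene, site in TARGETS.items():
    protospacer, pam = split_target(site)
    summary[gene] = (protospacer, pam, gc_percent(protospacer))
```

With these inputs, the *Wx* site splits into a 20 nt protospacer and the PAM CGG, and the *OsBADH2* site into a 20 nt protospacer and the PAM AGG, consistent with the 20 nt target sequences mentioned for the sgRNA cassettes.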

#### *4.3. Molecular Characterization of the Mutant Plants*

Genomic DNA was extracted from seedling leaves using the sodium dodecyl sulfate method [30]. PCR amplification was performed using primer pairs that produce amplicons containing the target sites. The amplified products were then sequenced using the Sanger sequencing method and assembled using the software SeqMan from the Lasergene package. We compared the amplicon sequences generated from the corresponding wild-type and transgenic templates to identify mutations. Polyacrylamide gel electrophoresis (PAGE) was used to determine whether a mutant individual was homozygous or heterozygous; for this purpose, InDel primer pairs producing an amplicon that includes the target sites were designed. Agarose gel electrophoresis of PCR products was used to detect the presence of Cas9 in each individual. The relevant PCR primers for these steps are listed in Supplementary Table S1.

#### *4.4. Evaluation of Rice Quality*

Amylose content and gel consistency were measured using the previously described method [31]. For the sensory evaluation of rice fragrance, ten full grains of brown rice were placed in a Petri dish. Following this, 10 mL of 1.7% potassium hydroxide solution was added, and the dish was covered and left at room temperature for 10 min. The Petri dishes were then opened one by one, and at least three subjects with a normal sense of smell were asked to assess each sample; their scores were averaged. Gas chromatography-mass spectrometry (GC-MS) was used to determine the content of the flavor compound 2-AP. The brown rice was ground into a powder, and the extraction solution was then added. The final sample was injected into an Agilent 6890 GC connected to an HP5973 MS detector (Agilent Technologies, Palo Alto, CA, USA) for detection. The MS database and retention times were used to identify the volatile compounds. Methylene chloride and the internal standard 2,4,6-trimethylpyridine (TMP) were used; TMP was injected at a concentration of 0.4585 ng/µL, and the injection volume was 1 µL. A previous study has outlined the detailed steps [32].

#### *4.5. Investigation of Agronomic Characters*

In the paddy field, plant height, grains per panicle, panicle length, effective panicles, primary branches, and secondary branches were measured. Grain number per panicle, grain weight per panicle, setting rate, 1000-grain weight and yield per plant were investigated indoors. Seeds were collected for each plant and dried at 37 °C for two weeks to determine the yield per plant. Using the SC-A grain analysis system (Wseen company, Hangzhou, China), 1000-grain weight was measured. Three similar-sized panicles from each plant were selected, dried at 37 °C for two weeks, and the average value was then calculated to determine the number of grains and grain weight per panicle.

## *4.6. Performance of the F<sup>1</sup> Hybrids Obtained from ZN1S* × *ZN1307*

ZN1S mutant lines (*wx*-*fgr*-S2 and *wx*-*fgr*-S4) and ZN1307 restorer lines (*wx*-*fgr*-R2 and *wx*-*fgr*-R6) were crossed in pairs. To avoid unintended pollination, the panicles were covered with brown paper bags. The F<sup>1</sup> seeds were harvested 30 days after the grains matured, and the F<sup>1</sup> hybrids were sown in the field plots. The experiment used a replicative complete block design with 36 plants per plot and two replicates.
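The pairwise crossing scheme, together with the hybrid names used in Section 2.2 (for example, *wx*-*fgr*-Z22 for *wx*-*fgr*-S2/*wx*-*fgr*-R2), can be enumerated with a few lines of Python. This is only an illustrative sketch; the helper `hybrid_name` is hypothetical and simply encodes the naming pattern visible in the text.

```python
from itertools import product

STERILE_PARENTS = ["S2", "S4"]   # ZN1S mutant lines used as recipients
RESTORER_PARENTS = ["R2", "R6"]  # ZN1307 mutant lines used as donors


def hybrid_name(sterile: str, restorer: str) -> str:
    """Build the hybrid label from the numeric parts of both parent labels."""
    return f"wx-fgr-Z{sterile[1:]}{restorer[1:]}"


# Hybrid label -> (sterile parent, restorer parent) for every pairwise cross.
crosses = {
    hybrid_name(s, r): (f"wx-fgr-{s}", f"wx-fgr-{r}")
    for s, r in product(STERILE_PARENTS, RESTORER_PARENTS)
}
```

Two sterile and two restorer parents give four combinations, matching the four hybrids reported in the Results.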

#### *4.7. Microscopy*

In the iodine staining experiment, the mature seeds were first processed into milled rice, and the milled rice was cut with a sharp blade after being stained with iodine solution, and then observed and photographed with a stereo fluorescence microscope (SMZ25, Nikon, Tokyo, Japan). The appearance of milled rice was directly observed and photographed by a stereo fluorescence microscope (SMZ25, Nikon, Tokyo, Japan).

#### *4.8. Statistical Analysis*

The diagrams in this study were drawn using GraphPad Prism 8. One-way analysis of variance and Duncan's multiple comparisons were used to analyze differences between groups using IBM Statistical Package for Social Sciences v16.0 software. Microsoft Excel 2016 was used to perform the preliminary processing and analysis of phenotypic data.
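For readers unfamiliar with the first of these tests, the F statistic of a one-way analysis of variance can be computed by hand as in the Python sketch below. It is not a substitute for the SPSS analysis: Duncan's multiple comparison procedure is not implemented, the function name `one_way_anova_f` is hypothetical, and the three groups contain made-up numbers rather than measurements from this study.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the F statistic: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares around each group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up example with three groups of three replicates each.
toy_groups = [[10.0, 11.0, 12.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
toy_f = one_way_anova_f(toy_groups)
```

In this toy case the between-group mean square is 57 and the within-group mean square is 1, so the F statistic is 57 with 2 and 6 degrees of freedom.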

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms24010849/s1.

**Author Contributions:** G.L. wrote the manuscript. Y.T. and G.L. performed most of the experiments. Y.Z. constructed the CRISPR/Cas9 double-targeting vector and completed the preliminary genotype detection work. Y.L. provided guidance for determination of some quality traits. G.G. and Q.Z. participated in part of the field experiments. Y.H. designed experiments. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by grants from the Ministry of Science and Technology (2021YFF1000200), National Natural Science Foundation of China (U21A20211, 91935303), the Science and Technology Major Program of Hubei Province (2021ABA011), and China Agriculture Research System (CARS-01-03).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** For materials, please contact the corresponding author's email address.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Effect of Heading Date on the Starch Structure and Grain Yield of Rice Lines with Low Gelatinization Temperature**

**Naoko Crofts, Kaito Hareyama, Satoko Miura, Yuko Hosaka, Naoko F. Oitome and Naoko Fujita \***

Department of Biological Production, Akita Prefectural University, 241-438 Kaidobata-Nishi, Shimoshinjo-Nakano, Akita City 010-0195, Japan

**\*** Correspondence: naokof@akita-pu.ac.jp

**Abstract:** The early flowering trait is essential for rice cultivars grown at high latitudes since delayed flowering leads to seed development at low temperature, which decreases yield. However, early flowering at high temperature promotes the formation of chalky seeds with low apparent amylose content and high starch gelatinization temperature, thus affecting grain quality. Loss of starch synthase IIa (SSIIa) has effects opposite to those of high temperature: the *ss2a* mutant shows higher apparent amylose content and lower gelatinization temperature. *Heading date 1* (*Hd1*) is the major regulator of flowering time, and a nonfunctional *hd1* allele is required for early flowering. To understand the relationship among heading date, starch properties, and yield, we generated and characterized near-isogenic rice lines with *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* genotypes. The *ss2a Hd1* line showed the highest plant biomass; however, its grain yield varied by year. The *ss2a Hd1 hd1* line showed higher total grain weight than *ss2a hd1*. The *ss2a hd1* line produced the lowest number of premature seeds and showed higher gelatinization temperature and lower apparent amylose content than *ss2a Hd1*. These results highlight *Hd1* as the candidate gene for developing high-yielding rice cultivars with the desired starch structure.

**Keywords:** rice; heading date; starch; yield; low gelatinization temperature; starch synthase IIa

### **1. Introduction**

Heading date is one of the most important agricultural traits, particularly for rice (*Oryza sativa* L.) cultivars cultivated in high-latitude areas, because early flowering ensures seed development at optimum temperature during the short summer, thus maximizing yield [1]. Cultivars with different heading dates have been selected at different latitudes through natural and artificial means [1–3]. Several genes governing heading date have been identified in rice [4–15] and are shown in Figure 1. *Heading date 1* (*Hd1*) encodes a zinc-finger protein and is the major determinant of heading date [4]. *Hd1* represses the expression of florigen, *Hd3a*, under a long-day photoperiod but promotes its expression under short days [5,14,16]. Once heading is initiated, flowering generally occurs within a couple of days. Therefore, the nonfunctional *hd1* allele is required for early flowering under long-day conditions. Different rice cultivars have acquired several single nucleotide polymorphisms (SNPs) in *Hd1* during the process of domestication [2,3,17,18].

Starch, the major component of rice grain, is composed of glucose polymers of essentially linear amylose and precisely, but highly, branched amylopectin [19,20]. The ratio of amylose to amylopectin as well as the length and frequency of amylopectin branches affect the physicochemical properties of starch and transparency of grains, thus affecting the quality of rice [21–23]. The amylose found in rice endosperm is exclusively synthesized by granule-bound starch synthase I (GBSSI); thus, the expression level of *GBSSI* determines the amylose content of rice grains. Polymorphisms at the last nucleotide of the first intron of the *GBSSI* gene are commonly seen in japonica rice (*O. sativa* L. ssp. *japonica*) [24–27] and are known to reduce the splicing efficiency of *GBSSI* mRNA, especially under high temperature during seed development, which decreases GBSSI protein production and consequently amylose content [28–30].

**Citation:** Crofts, N.; Hareyama, K.; Miura, S.; Hosaka, Y.; Oitome, N.F.; Fujita, N. Effect of Heading Date on the Starch Structure and Grain Yield of Rice Lines with Low Gelatinization Temperature. *Int. J. Mol. Sci.* **2022**, *23*, 10783. https://doi.org/10.3390/ijms231810783

Academic Editors: Jinsong Bao and Jianhong Xu

Received: 25 August 2022 Accepted: 13 September 2022 Published: 15 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Amylopectin is synthesized in the rice endosperm by the synergistic and balanced actions of multiple isozymes of starch synthases (SSs), branching enzymes (BEs), and debranching enzymes, which form multiprotein complexes [31,32]. Chromosomal locations of genes encoding these isozymes are summarized in Figure 1. According to the current understanding of amylopectin biosynthesis, SSIIIa synthesizes long glucan chains (amylopectin backbone) with degree of polymerization (DP) > 30, and BEI generates long branches. BEIIb generates short amylopectin branches with DP 6–7, and SSI elongates these short branches to DP 8–12. SSIIa further elongates these branches to DP 12–24 in most indica rice (*O. sativa* L. ssp. *indica*) cultivars, but the SSIIa isozyme of japonica rice is less active than that of indica rice and produces fewer intermediate chains with DP 12–24 [33]. Unnecessary branches are trimmed off by debranching enzymes such as isoamylase 1 [31,34]. Rice lines lacking BEIIb exhibit fewer short amylopectin chains, more long amylopectin chains, and consequently higher gelatinization temperature than the wild type [35–38]. High temperature during seed development also impacts the expression level and activity of BEIIb [39–42], which increases the long amylopectin branch chains, gelatinization temperature, and chalky seed frequency and decreases the palatability of cooked rice [39,42].

The effects of SSIIa loss on amylopectin structure are opposite to those of BEIIb loss, although loss of either one of these enzymes has the same effect on amylose content. A *ss2a* null mutant rice line, EM204, was previously isolated from the N-nitroso-N-methylurea (NMU)-treated mutant panel of the japonica rice cultivar Kinmaze [43]. EM204 harbors a point mutation at the last nucleotide of intron 5 of *SSIIa*, resulting in the loss of exon 6 and no detectable SSIIa activity in developing seeds [43]. Loss of SSIIa activity increased short amylopectin branches with DP < 11 and lowered the gelatinization temperature by 5 °C compared with the parental line (Kinmaze), although Kinmaze and other typical japonica rice cultivars exhibit lower SSIIa activity than typical indica rice varieties [33,43]. In addition, loss of SSIIa activity increased the apparent amylose content to 24%, which was considerably higher than that of Kinmaze (20%) [43]. Both Kinmaze and EM204 flower in early September in Akita, Japan (39.7° N, 140.1° E). Although the starch of EM204 shows great potential as an anti-retrogradation agent, the agricultural traits of this mutant line, such as heading date and yield, need further improvement because nighttime temperature declines sharply in September, which can drastically reduce grain yield, depending on the harvest year. Thus, EM204 was backcrossed twice with a high-yielding elite rice cultivar, Akita 63 [44], which flowers in early August. Although more than half of the backcrossed lines flowered in early August, some flowered in September because the *SSIIa* and *Hd1* genes are located in close proximity to each other on chromosome 6 (Figure 1).

A previous study analyzed the effects of different *Hd1* alleles on agronomic traits and amylose content using multiple genetic backgrounds, such as glutinous rice, japonica rice, and indica rice. However, because these rice genotypes harbor different alleles of *SSIIa* and *GBSSI*, in addition to the genes responsible for plant biomass and yield components [18], the effects of *Hd1* alleles on starch properties could not be evaluated properly. Therefore, in this study, we used Kinmaze (the parental line of EM204) and Akita 63, both of which have *ss2a<sup>L</sup>*, to identify the allele(s) responsible for the differences in heading dates. In addition, to accurately evaluate the effects of different heading dates on starch properties and agricultural traits in the absence of SSIIa, we backcrossed EM204 (late-heading *ss2a* mutant) with Akita 63 (early-heading elite rice cultivar) and generated near-isogenic lines (NILs) with three different combinations: *ss2a ss2a Hd1 Hd1* (*ss2a Hd1*), *ss2a ss2a Hd1 hd1* (*ss2a Hd1 hd1*), and *ss2a ss2a hd1 hd1* (*ss2a hd1*). The effects of three *Hd1* genotypes on agricultural traits, apparent amylose content, amylopectin structure, and starch gelatinization temperature, in the absence of SSIIa, are discussed.


**Figure 1.** Chromosomal locations of genes responsible for the regulation of heading date and endosperm starch biosynthesis in rice. Genes controlling heading date are shown in black (*Hd1* is enlarged) and those involved in starch biosynthesis in the rice endosperm are highlighted in gray. Note that *Hd1* and *SSIIa* are located in close proximity to each other on chromosome 6.

#### **2. Results**


#### *2.1. Nucleotide Sequence of Hd1 in Kinmaze, Akita 63, and Akitakomachi*

The *SSIIa* and *Hd1* genotypes and heading dates of different rice accessions are summarized in Table 1. The genomic DNA sequence of *Hd1* was amplified from Kinmaze, Akita 63, Akitakomachi, and Nipponbare using primers #5 and #12 (Table S1 and Figure S1) and compared (Figure 2b–d). Kinmaze is the parental line of EM204, which flowers in early September; Akita 63 is the high-yielding elite rice cultivar used for backcrossing and flowers in early August; Akitakomachi is commonly grown in Akita, Japan, and flowers in late July (1 week before Akita 63); and Nipponbare is the model japonica rice cultivar that flowers in late August in Akita, Japan (Table 1). Locations of SNPs found in *Hd1* sequences and the resulting amino acid substitutions are summarized in Figure 2a. The results showed that the *Hd1* sequence of Akita 63 was identical to that of Akitakomachi but different from the *Hd1* sequences of Nipponbare and Kinmaze (Figure 2b–d). The *Hd1* sequence of Kinmaze was also different from that of Nipponbare. In addition, the *Hd1* sequences of Akita 63 and Akitakomachi were 4807 bp in length, while that of Kinmaze was 4850 bp. The *Hd1* of Nipponbare was 4814 bp in length and contained two exons (1325–2152 bp and 2790–3149 bp) (Figure 2a) [4]. The *Hd1* of Akita 63, Akitakomachi, and Kinmaze carried a cytosine to thymine polymorphism at the 1640th nucleotide relative to the *Hd1* of Nipponbare, resulting in a histidine to tyrosine substitution (Figure 2a,b,d). In addition, the *Hd1* of Akita 63, Akitakomachi, and Kinmaze harbored a 36-nucleotide insertion between the 1657th and 1658th nucleotides, resulting in a 12-amino-acid insertion between the 110th and 111th amino acid residues, compared with Nipponbare (Figure 2a,b,d). The remaining *Hd1* sequence in Kinmaze was the same as that in Nipponbare. Therefore, Kinmaze was predicted to produce a functional Hd1 protein (Figure 2b–d). In contrast, the *Hd1* of Akita 63 and Akitakomachi contained an additional 43-nucleotide deletion between the 2032nd and 2074th nucleotides compared with the *Hd1* of Nipponbare, resulting in a frame shift at the 236th amino acid and a premature stop codon at the end of exon 1 (Figure 2a–d). Akita 63 and Akitakomachi would theoretically produce only 259 of the 407 amino acids of the Hd1 protein (Figure 2d), although the truncated protein could be degraded. Therefore, Akita 63 and Akitakomachi were speculated to produce a nonfunctional hd1 protein.
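The sequence lengths and reading-frame consequences reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis; it assumes that the two exon ranges quoted for Nipponbare are inclusive genomic coordinates that run from the start codon to the stop codon, and all function and variable names are illustrative.

```python
# Values quoted in the text (Nipponbare Hd1 is the reference sequence).
NIPPONBARE_LEN = 4814
EXONS = [(1325, 2152), (2790, 3149)]  # assumed inclusive, start codon to stop codon
INSERTION_LEN = 36    # shared by Kinmaze, Akita 63, and Akitakomachi
DELETION_LEN = 43     # Akita 63 and Akitakomachi only
DELETION_START = 2032


def is_in_frame(indel_len: int) -> bool:
    """An indel keeps the reading frame only if its length is a multiple of 3."""
    return indel_len % 3 == 0


def codon_index(genomic_pos: int) -> int:
    """1-based codon containing an exon 1 position (assumes exon 1 starts at the start codon)."""
    cds_pos = genomic_pos - EXONS[0][0] + 1
    return (cds_pos + 2) // 3


# Expected genomic lengths after applying the reported indels.
kinmaze_len = NIPPONBARE_LEN + INSERTION_LEN
akita63_len = kinmaze_len - DELETION_LEN

# Protein lengths implied by the exon coordinates (one codon is the stop codon).
cds_len = sum(end - start + 1 for start, end in EXONS)
nipponbare_aa = cds_len // 3 - 1
kinmaze_aa = nipponbare_aa + INSERTION_LEN // 3

print("Kinmaze / Akita 63 length (bp):", kinmaze_len, akita63_len)
print("Insertion in frame:", is_in_frame(INSERTION_LEN))
print("Deletion in frame:", is_in_frame(DELETION_LEN))
print("Full-length protein with insertion (aa):", kinmaze_aa)
print("Codon containing deletion start:", codon_index(DELETION_START))
```

Under these assumptions the computed lengths (4850 and 4807 bp) agree with the reported values, the 36-nucleotide insertion is in frame, and the 43-nucleotide deletion is not, which is consistent with the frame shift described in the text.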



<sup>1</sup> Superscript L denotes the leaky mutation present in the *SS2a* allele of wild-type japonica rice. <sup>2</sup> Typical heading dates from 2017 to 2021 in Akita, Japan. The *hd1* allele of Kasalath is shown as *hd1Kas* to distinguish it from that of Akitakomachi, Akita 63, and *ss2a hd1*.



**Figure 2.** Comparisons of *Hd1* DNA sequences and deduced amino acid sequences in various rice lines. (**a**) Schematic representation of the *Hd1* gene structure in Akita 63. The positions of SNPs and resulting amino acid substitutions relative to Nipponbare are indicated. Ins, insertion; del, deletion; fs\*12, frame shift-generated stop codon after 12 amino acids. The letter 'g' followed by a number indicates the nucleotide position in genomic DNA. Similarly, letters 'c' and 'p' followed by numbers represent the nucleotide position in cDNA and amino acid position in protein, respectively. Numbers in brackets indicate the number of nucleotide or amino acid insertions. (**b**,**c**) DNA sequence alignments of *Hd1* from 1598 to 1764 bp (**b**) and from 2005 to 2117 bp (**c**). The nucleotide positions correspond to the *Hd1* sequence of Nipponbare. (**d**) Full-length amino acid sequence alignment of Hd1. DNA and protein sequences different from Nipponbare are indicated with gray boxes, and regions missing in Akita 63 and Akitakomachi are indicated by black boxes. Sequences used to create the alignments are as follows: Nipponbare (AB041838), Kinmaze (MK449352), Kasalath (AB041839), Akitakomachi (MK449350), and Akita 63 (MK449351). Asterisks indicate identical nucleotides (**b**,**c**) and amino acid residues (**d**).

#### *2.2. Genotyping and Western Blotting of Rice Accessions with Different Hd1 and SSIIa Alleles*

PCR markers for *Hd1* have been generated for the selection of the early-flowering trait in rice cultivars such as KantoHD1 [45] and Milky Summer [46], which were generated via the introduction of the nonfunctional *hd1Kas* allele from Kasalath. It is important to note that although both Kasalath and Akita 63 flower at the same time (early August in Akita, Japan), the *hd1Kas* allele of Kasalath is different from that of Akita 63 (Table 1, Figure 2d). Therefore, such selection markers would not be applicable to Akita 63 (Figure 2d). To distinguish the *hd1*, *Hd1 hd1*, and *Hd1* seedlings from the NILs generated by crossing EM204 and Akita 63, a new molecular marker was generated (Figure 3a, Table S1 and Figure S1). Early-flowering lines with the *hd1* allele (such as Akita 63, Akitakomachi, and *ss2a hd1*) generated 130 bp PCR products, whereas late-flowering lines with the *Hd1* allele (such as Nipponbare, Kinmaze, EM204, and *ss2a Hd1*) generated 173 bp PCR products (Figure 3a). Both 173 and 130 bp PCR products were detected in the heterozygous (*Hd1 hd1*) line (Figure 3a). The PCR products exhibited clear differences in migration patterns, thus enabling the distinction among the *Hd1*, *Hd1 hd1*, and *hd1* lines (Figure 3a). Presence of the *ss2a* allele was confirmed via the derived cleaved amplified polymorphic sequence (dCAPS) marker (Figure 3b, [43]); the 141 bp PCR product amplified from Akita 63 was not digested by *Bgl*II, while that amplified from *ss2a Hd1*, *ss2a Hd1 hd1*, *ss2a hd1*, and EM204 was digested into 111- and 30-bp products by *Bgl*II.
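The band-size logic of the two markers can be written down as a small lookup. This is an illustrative sketch only: the function names are hypothetical, band sizes are assumed to have already been read from the gel as integers, and banding patterns that the text does not describe (for example, a line heterozygous at *SSIIa*) are reported as undetermined rather than guessed.

```python
def call_hd1(band_sizes_bp):
    """Infer the Hd1 genotype from the PCR product sizes (bp) described in the text."""
    bands = set(band_sizes_bp)
    if bands == {173}:
        return "Hd1"        # functional allele, late flowering
    if bands == {130}:
        return "hd1"        # allele carrying the 43 bp deletion, early flowering
    if bands == {130, 173}:
        return "Hd1 hd1"    # heterozygous line shows both products
    return "undetermined"


def call_ss2a(band_sizes_after_bglii_bp):
    """Infer the SSIIa allele from dCAPS fragment sizes (bp) after BglII digestion."""
    bands = set(band_sizes_after_bglii_bp)
    if bands == {141}:
        return "ss2aL"      # undigested product, as in Akita 63
    if bands in ({111, 30}, {111}):
        return "ss2a"       # digested; the 30 bp fragment may not be scored on a gel
    return "undetermined"   # patterns not described in the text


# Example calls using the band sizes reported for the NILs and parents.
print(call_hd1([130]), "/", call_ss2a([111, 30]))
print(call_hd1([173, 130]), "/", call_ss2a([111, 30]))
print(call_hd1([130]), "/", call_ss2a([141]))
```

The 43 bp size difference between the 173 and 130 bp products corresponds to the deletion that distinguishes the *hd1* allele of Akita 63 from the functional *Hd1* allele.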


Western blotting of the total protein extracted from mature seeds using anti-SSIIa antibody confirmed the absence of SSIIa in *ss2a Hd1*, *ss2a Hd1 hd1*, *ss2a hd1*, and EM204 and the presence of SSIIa in Kinmaze and Akita 63 (Figure 4). The *Hd1* and *SSIIa* genotypes of rice accessions used in this study are summarized in Table 1. Differences in protein levels of SSI, GBSSI, and BEIIb are explained below (Sections 2.4 and 2.5).

**Figure 3.** PCR-based screening of rice accessions with variable *Hd1* and *SSIIa* genotypes. (**a**) Screening of *Hd1* alleles using #5 and #12 PCR primers listed in Table S1. Lines with the *hd1* allele generated 130 bp PCR products, while those with the *Hd1* allele generated 173 bp PCR products; the heterozygous (*Hd1 hd1*) line produced both PCR products. (**b**) Screening of the *ss2a* allele. Lines carrying the *ss2a* allele showed a 111 bp fragment after digestion with *Bgl*II, whereas Akita 63 (harboring the *ss2a<sup>L</sup>* allele) showed a 141 bp band (undigested by *Bgl*II).

**Figure 4.** Western blotting analysis of total protein extract prepared from the mature seeds of different rice accessions. Starch biosynthetic enzymes were detected using the corresponding antibodies.


#### *2.3. Effect of Hd1 Alleles on the Agricultural Traits of NILs*

The agricultural traits of *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* NILs were examined over 2 years (Figures 5 and 6, Tables S2 and S3). The three NILs were germinated or transplanted on the same respective dates and grown in the same paddy field under the same growth conditions. The heading dates of *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* lines were remarkably different (Figure 5a,b). Although the actual heading dates of NILs slightly differed between the two years, they showed the same trend (Figure 5b, Table 1). The *ss2a hd1* line showed the earliest heading date (early August; August 4 or 7), followed by *ss2a Hd1 hd1* (late August; August 21 or 26) and *ss2a Hd1* (early September; September 2 or 13). The flowering period of individual plants of the same genotype was well synchronized; plants of the same genotype flowered within 2–3 days. In addition, the seed development and maturation period showed the order *ss2a hd1* < *ss2a Hd1 hd1* < *ss2a Hd1*, and the *ss2a hd1*, *ss2a Hd1 hd1*, and *ss2a Hd1* lines took 40, 44–47, and 48–54 days, respectively, to reach maturity after heading (Figure 5b). Only the *ss2a Hd1* line was prematurely harvested on 1 November 2021, since no further seed development was expected because of the arrival of winter (Figure 5c). The vegetative phase of lines *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* was 104–112, 92–94, and 72–78 days, respectively (Figure 5b), and correlated well with the whole-plant dry weight, dry straw weight, plant height, and culm length (Figure 6a–d, Tables S2 and S3). The longer the vegetative period, the longer the culm and the heavier the straw weights. However, the duration of the vegetative phase did not influence the length and number of ears (Figure 6e,f, Tables S2 and S3).


**Figure 5.** Comparison of the heading dates of *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1*. (**a**) Photo of NILs showing the differences in their heading dates. Note that *ss2a hd1* is mature, *ss2a Hd1 hd1* is at the mid-developmental stage, and *ss2a Hd1* is still flowering. (**b**) Differences among rice NILs in the number of days to heading and to maturity. Numbers (month/day) below the ribbon represent the actual dates of sowing, transplanting, heading, flowering, and maturity. (**c**) Average day temperature during the period from the end of May (transplanting) to the beginning of November (harvesting) in 2020 (gray) and 2021 (black). (**d**) Minimum temperature for 2 weeks before the heading date of *ss2a Hd1* in 2020 (gray) and 2021 (black). Dashed line indicates the threshold temperature (17 °C) that reduces the fertility rate. (**e**) Average temperature for 2 weeks before the heading date of *ss2a hd1* (black) and *ss2a Hd1 hd1* (gray) in 2021.

**Figure 6.** Agricultural traits of *ss2a Hd1* (black), *ss2a Hd1 hd1* (gray), and *ss2a hd1* (stripe) NILs. (**a**) Whole-plant dry weight, (**b**) dry straw weight per plant, (**c**) culm length, (**d**) ear length, (**e**) ear number per plant, (**f**) total grain weight per plant, (**g**) dehulled grain weight per plant, (**h**) fertility rate, (**i**) percentage of green immature seeds. Data represent mean ± standard error (SE). The three bars on the left represent data from 2020, and those on the right represent data from 2021. Data collected during the same harvest year were statistically analyzed via the Tukey–Kramer method (*p* < 0.05). The number of *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* plants was 9, 20, and 8, respectively, in 2020, and 20 plants of each line were analyzed in 2021. Different lowercase letters above bars indicate significant differences.

The total grain weight and dehulled grain weight of *ss2a Hd1 hd1* were greater than those of *ss2a hd1* (Figure 6f,g). The values of *ss2a Hd1* differed between the 2 years: they were greater than those of *ss2a Hd1 hd1* in 2020 but lower than those of *ss2a hd1* in 2021. Some correlation was detected between plant biomass and grain yield; the higher the biomass, the better the yield, as long as the temperature during seed development remained optimal (Figures 5c and 6a,b,f,g). Reduction in the grain yield of *ss2a Hd1* in 2021 was likely caused by low temperature from mid-August to early September (Figure 5c,d). This delayed the heading date, which prolonged seed development and reduced starch synthesis, thus increasing the number of premature grains (Figure 6i). Low average day temperature is also known to prolong the seed maturation period [47]. In fact, the *ss2a Hd1* plants did not fully reach maturity in 2021 (Figure 6g, Tables S2 and S3). Although the effect of this phenomenon was minor, low temperature also led to reduced fertility rate (Figures 5d and 6h). Studies show that fertility rate declines when the minimum temperature remains under 17 °C for 2 weeks before the heading date [48] and when the temperature is too high [49]. These findings are consistent with the lower fertility rates of *ss2a Hd1* and *ss2a hd1* than that of *ss2a Hd1 hd1* (Figures 5e and 6h). Therefore, the functional *Hd1* allele is unsuitable for rice cultivars grown in high-latitude areas for the maintenance of stable grain quality and yield. However, if an increase in plant biomass is desired, especially for rice cultivars utilized as feed (straw) or for ethanol production, the functional *Hd1* allele is necessary for prolonging the vegetative phase.
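The cold-exposure criterion cited above (minimum temperature under 17 °C during the 2 weeks before heading) can be expressed as a simple screening rule. The sketch below is an assumption-laden illustration, not the method used in the study: the function names are hypothetical, the temperature series are made-up placeholder values rather than measurements, and the choice to report both the count of cold days and an all-days-cold flag is one possible reading of the criterion.

```python
COLD_THRESHOLD_C = 17.0  # threshold quoted in the text
WINDOW_DAYS = 14         # "2 weeks before the heading date"


def cold_days_before_heading(daily_min_temps_c,
                             threshold_c=COLD_THRESHOLD_C,
                             window_days=WINDOW_DAYS):
    """Count days in the last `window_days` entries whose minimum is below the threshold.

    `daily_min_temps_c` is a chronological list of daily minimum temperatures
    ending on the day before heading.
    """
    window = daily_min_temps_c[-window_days:]
    return sum(1 for temp in window if temp < threshold_c)


def persistent_cold(daily_min_temps_c,
                    threshold_c=COLD_THRESHOLD_C,
                    window_days=WINDOW_DAYS):
    """True only if a full window is available and every day in it is below the threshold."""
    window = daily_min_temps_c[-window_days:]
    return len(window) == window_days and all(temp < threshold_c for temp in window)


# Hypothetical placeholder series (not data from the study).
warm_series = [21.5] * 14
cool_series = [16.0] * 14

print(cold_days_before_heading(warm_series), persistent_cold(warm_series))
print(cold_days_before_heading(cool_series), persistent_cold(cool_series))
```

Applied to recorded daily minima such as those plotted in Figure 5d, a rule of this kind would flag heading dates at risk of cold-induced sterility; the high-temperature effect mentioned in the text would need a separate upper threshold, which is not specified here.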

The average weight of one dehulled seed of EM204 was only 16.5 mg [43], which is approximately 55% of that of *ss2a Hd1*. Therefore, the yield of the *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* lines generated in the present study was greatly improved, owing to backcrossing with the high-yielding parental line Akita 63.

#### *2.4. Effect of Hd1 Alleles on Apparent Amylose Content and GBSSI Expression Level*

Apparent amylose content affects texture of cooked rice and rice products [50,51], and the abundance of GBSSI, which is responsible for amylose synthesis, is affected by the temperature during seed development [30]. Since the temperature during seed development varied considerably among NILs, depending on their heading dates (Figure 5c, Table S4), the apparent amylose content was measured via gel filtration chromatography using a series of single HW-55S and triple HW-50S Toyopearl columns (Table 2, Figure S2). Amylose was eluted in fraction I, and the long and short chains of amylopectin were eluted in fractions II and III, respectively (Figure S2).

Apparent amylose content of Akita 63 (17–18%) was relatively low (Table 2) compared with that of Kinmaze (22%) [43,52]. This is partly because Akita 63 flowered in early August when the temperature was high (average day temperature = 25–30 °C) during seed development, while Kinmaze flowered in early September under lower temperature (average day temperature = ~20 °C). Although the apparent amylose content of EM204 (24%) was higher than that of Kinmaze, both rice accessions flowered at a similar time (early September; average day temperature = ~20 °C). Therefore, the absence of SSIIa resulted in an increase in apparent amylose content in the Kinmaze background. Similarly, the amylose contents of all three NILs were significantly higher than that of Akita 63, as determined using a pairwise *t*-test (Table 2).

To determine whether the different heading dates of NILs affect apparent amylose content in the absence of SSIIa in the Akita 63 background, the apparent amylose contents of *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* were compared (Table 2, Figure S2). The results showed that the apparent amylose contents of *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* were 27.0%, 25.1%, and 22.1%, respectively, in 2020, and 28.0%, 26.6%, and 24.7%, respectively, in 2021 (Table 2, Figure S2). Thus, the amylose content of *ss2a Hd1* was the highest among the three lines and was significantly higher than that of *ss2a hd1*. We found that the earlier the heading date, the higher the seed development temperature and the lower the amylose content (Tables 2, S2 and S3, Figure S2). The apparent amylose content of *ss2a hd1* was 3–5% lower than that of *ss2a Hd1* and 4–8% higher than that of Akita 63 (Table 2). This suggests that loss of SSIIa mitigates the reduction in amylose content, even if the temperature during seed development is high.

To determine whether apparent amylose content is correlated with the GBSSI protein level, we performed western blotting of NILs (Figure 4). The amount of GBSSI protein showed a strong correlation with the apparent amylose content (Figure 4). Additionally, the GBSSI protein was the least abundant in Akita 63, and the level of GBSSI in *ss2a Hd1* was greater than that in *ss2a hd1* (Figure 4).

**Table 2.** Apparent amylose content and ratio of short to long chains of amylopectin in different rice accessions.

| Rice Accession | Apparent Amylose Content (%) <sup>1</sup>, 2020 | Apparent Amylose Content (%) <sup>1</sup>, 2021 | Ratio of Short to Long Chains of Amylopectin <sup>1</sup>, 2020 | Ratio of Short to Long Chains of Amylopectin <sup>1</sup>, 2021 |
|---|---|---|---|---|

<sup>1</sup> Apparent amylose content and the ratio of short to long chains of amylopectin were calculated from fraction I and fraction III/fraction II in Figure S2. Data represent the mean ± SE of three replicates. Different lowercase letters indicate significant differences among *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* (Tukey-Kramer method; *p* < 0.05). Asterisk indicates significant differences relative to Akita 63 (*t*-test; *p* < 0.05).

#### *2.5. Effect of Hd1 Alleles on Amylopectin Structure*


The ratio of short amylopectin chains to long amylopectin chains (eluted in fraction III and fraction II, respectively, via gel filtration chromatography) was higher in *ss2a Hd1* than in *ss2a hd1* (Table 2). Therefore, the detailed amylopectin branch structure was analyzed via capillary electrophoresis using debranched starch purified from mature rice seeds (Figures 7 and S3). The differences in amylopectin structure were shown as a differential curve.

**Figure 7.** Differences in the amylopectin branch structure of NILs. (**a**,**b**) Subtraction curves showing the effects of *Hd1* alleles on amylopectin branch structure (**a**) and the effect of the loss of SSIIa on amylopectin structure (**b**). Each panel shows one typical representative data set of at least three replications. Data shown here were obtained from samples harvested in 2021, and the data of samples harvested in 2020 are shown in Figure S3.

To reveal the effect of heading date on amylopectin structure, values of chain length distribution of *ss2a hd1* were subtracted from those of *ss2a Hd1* or *ss2a Hd1 hd1* (Figures 7 and S3). The results showed that the number of short amylopectin chains (DP < 14) was larger in *ss2a Hd1* and *ss2a Hd1 hd1* seeds than in *ss2a hd1* seeds harvested in both years, whereas the number of long amylopectin chains (DP ≥ 15) was larger in *ss2a hd1* than in *ss2a Hd1* and *ss2a Hd1 hd1* (Figures 7a and S3a). The degree of difference was greater in *ss2a Hd1* than in *ss2a Hd1 hd1* (Figures 7a and S3a). The smaller number of short amylopectin chains and larger number of long amylopectin chains in *ss2a hd1* seeds probably resulted from a slight decrease in BEIIb levels in *ss2a hd1*, as shown via western blotting (Figure 4).

To reveal the effect of the loss of SSIIa on amylopectin structure, values of chain length distribution from Akita 63 were subtracted from those of *ss2a Hd1*, *ss2a Hd1 hd1*, or *ss2a hd1*. All three NILs, which lacked SSIIa, showed similar trends, i.e., a considerable increase in short amylopectin chains with DP 5–10 and a decrease in intermediate amylopectin chains with DP 12–24 (Figures 7b and S3b). These results are consistent with the role of SSIIa, which synthesizes intermediate chains [43].
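The subtraction curves described above can be illustrated with a minimal sketch. It assumes that each chain length distribution is first normalized to a percentage of the total peak area and that the reference line is then subtracted at each degree of polymerization (DP); the function names and the example peak areas are hypothetical and are not taken from the study.

```python
from typing import Dict


def to_percent(peak_areas: Dict[int, float]) -> Dict[int, float]:
    """Normalize peak areas (keyed by DP) to percentages of the total area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {dp: 100.0 * area / total for dp, area in peak_areas.items()}


def difference_curve(sample: Dict[int, float],
                     reference: Dict[int, float]) -> Dict[int, float]:
    """Subtract the normalized reference distribution from the sample.

    Positive values mean the sample has relatively more chains of that DP.
    A DP missing from one data set is treated as zero (an assumption).
    """
    s = to_percent(sample)
    r = to_percent(reference)
    return {dp: s.get(dp, 0.0) - r.get(dp, 0.0) for dp in sorted(set(s) | set(r))}


# Hypothetical peak areas for two DPs only, for illustration.
example = difference_curve({6: 30.0, 12: 70.0}, {6: 20.0, 12: 80.0})
```

In this toy case the sample is enriched in DP 6 chains and depleted in DP 12 chains relative to the reference, which is the kind of pattern a subtraction curve makes visible.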

#### *2.6. Effect of Hd1 Alleles on the Thermal Properties of Starch*

The gelatinization temperature of starch depends on the number of amylopectin branches with DP ≤ 24 [53,54]. An increase in short amylopectin branches lowers the gelatinization temperature [43], while an increase in long amylopectin branches raises the gelatinization temperature [52]. Therefore, we measured the gelatinization temperature of starch in *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* using differential scanning calorimetry and compared the results with the gelatinization temperature of starch in Akita 63 (Table 3).


**Table 3.** Peak gelatinization temperature (Tp) of starch purified from rice grains harvested in 2020 and 2021, as analyzed via differential scanning calorimetry.

<sup>1</sup> Data represent the mean ± SE of three replicates. Different lowercase letters indicate significant differences (Tukey-Kramer method; *p* < 0.05).

The gelatinization temperature of lines lacking SSIIa (*ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1*) was lower than that of Akita 63. To precisely evaluate the effect of the absence of SSIIa on starch gelatinization temperature, the gelatinization temperatures of starch in *ss2a hd1* and Akita 63 were compared since both rice accessions flowered in early August. The peak gelatinization temperature of *ss2a hd1* was 1.3–4.8 °C lower than that of Akita 63, although the heading dates of both these accessions were essentially the same. This is because *ss2a hd1* (owing to the loss of SSIIa) contained a higher number of short amylopectin chains with DP < 10 and a lower number of chains with DP ≥ 10 than Akita 63 (Figure 7b). This suggests that the loss of SSIIa lowers the gelatinization temperature of starch, even under high temperature during seed development.

In addition, the peak gelatinization temperature of NILs followed the order *ss2a hd1* > *ss2a Hd1 hd1* > *ss2a Hd1*, although exact values of each line differed between the years (Table 3). This trend of the peak gelatinization temperature of NILs may be explained by differences in the chain length distribution of amylopectin among the NILs: the number of amylopectin chains with DP < 15 showed the order *ss2a Hd1* > *ss2a Hd1 hd1* > *ss2a hd1* and that of amylopectin chains DP > 15 followed the order *ss2a Hd1* < *ss2a Hd1 hd1* < *ss2a hd1* (Figure 7a). This suggests that gelatinization temperature is affected by the heading date: the higher the temperature during seed development, the higher the gelatinization temperature of starch, even in the absence of SSIIa (Figure 5, Tables 3 and S4).

#### **3. Discussion**

#### *3.1. SNPs in Hd1*

In this study, SNPs responsible for the differences in the heading dates of Kinmaze (the parental line of the *ss2a* null mutant EM204), Akita 63, and Akitakomachi were identified. Furthermore, the precise effects of different heading dates, determined by *Hd1*, *Hd1 hd1*, and *hd1*, on the agricultural traits and starch properties of rice were evaluated in NILs (lacking SSIIa) generated using Akita 63, an elite rice cultivar, as the recurrent parent. Sequencing analyses revealed that Akita 63 carries a loss-of-function *hd1* allele, while Kinmaze harbors a functional *Hd1* allele. The heading date of *ss2a hd1* was the earliest, at 72–78 days after transplanting. Heading dates of *ss2a Hd1 hd1* and *ss2a Hd1* were 14–19 and 26–40 days later than that of *ss2a hd1*, respectively. These differences in the heading dates of *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* were likely caused by the different *Hd1* alleles, although several other genes are also involved in determination of the heading date (Figure 1).

In addition, analyses of *Hd1* gene sequences using the basic local alignment search tool (BLAST; https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 20 June 2019) and the alignment of Hd1 amino acid sequences revealed that the *hd1* allele of Akita 63 and Akitakomachi is identical to that of the HS66 mutant (AB041841; [4]) and Sasanishiki (AB433218) (Figure S4a) but different from that of Kasalath (AB041839; [4], Figure 2d), Ginbouzu (AB041840; [4]), and Koshihikari (AB375859; [6]) (Figure S4b). The *Hd1* allele of Koshihikari is identical to that of Nipponbare, while Ginbouzu shares the same *Hd1* sequence as Kinmaze (MK449352; this study), Hoshinoyume (AB353276; [7]), and Hayamasari (AB353275; [7]) (Figure S4c). The PCR marker generated in this study (Table S1, Figure S1) as well as other PCR markers generated by Mo et al. [18] will serve as useful tools for determining the different types of *Hd1* alleles, which will accelerate the breeding of new rice cultivars with different heading dates. Different *Hd1* alleles have already been utilized to distribute the workload of the peak harvesting hours. For example, low-amylose rice lines harboring the *Wxmq* gene, such as Milky Summer, Milky Queen, and Milky Autumn, are grown in the central to southern parts of Japan (https://www.naro.go.jp/publicity_report/press/laboratory/nics/079175.html, accessed on 28 June 2022). The choice of different *Hd1* alleles should be carefully considered, depending on the application (yield increase, starch property, or workload distribution).

#### *3.2. Effect of Hd1 Alleles on Grain Yield*

Differences in the heading date impacted the agricultural traits of NILs (Figure 6, Tables S2 and S3). The total grain yield of *ss2a Hd1 hd1* tended to be higher than that of *ss2a hd1*, although the difference was not statistically significant owing to a statistical outlier, while the total grain yield of *ss2a Hd1* varied depending on the year (Figure 6, Tables S2 and S3). The percentages of green immature grains were lowest in *ss2a hd1* and highest in *ss2a Hd1* (Figure 6, Tables S2 and S3). Presence of the *hd1* allele enabled efficient grain filling by promoting flowering at the appropriate temperature for starch biosynthesis during seed development, thus minimizing the time required for seed maturation and desiccation. However, because of the short vegetative period, the amount of stored photosynthetic products to be translocated from the culm might be decreased, which may lead to reduced yield. Thus, the heading date of *ss2a Hd1 hd1* seemed the most suitable for cultivation in Akita (Japan) as it showed stable high-level production of grains, judging from the limited data obtained under the extreme temperature conditions in 2020 and 2021, although the heterozygous allele (*Hd1 hd1*) would not be appropriate for commercial rice production as it would segregate in subsequent generations. The *ss2a hd1* NIL is also suitable for cultivation in northern Japan because the percentage of green immature grains of this genotype was the lowest, although its yield could be improved further. Increase in grain yield while maintaining seed quality should be possible by fine-tuning the combinations of other genes involved in the determination of the heading date, to ensure that the rice flowers in mid-August.

The *ss2a Hd1* NIL was not suitable for grain production because the heading date was too late and risked the early arrival of winter during seed development, which could lead to large yield differences between years. Moreover, if the heading date is delayed because of low temperature in August, there is a high chance that seed development may not be completed in time, resulting in drastic yield losses. However, *ss2a Hd1* showed the highest culm length and straw dry weight. Therefore, use of the *Hd1* allele would be beneficial for increasing the plant biomass, which could be used as feed for livestock or as a raw material for bioethanol production. Farmers generally prefer to grow rice varieties with relatively shorter culm length to avoid lodging. Short culm produces less waste, requires less fertilizer, and improves work efficiency. Therefore, cultivars with suitable *Hd1* alleles should be carefully considered, depending on whether the ultimate goal is to harvest grains or whole plants. The latitude and altitude of the planting area should also be taken into account when selecting rice cultivars with different *Hd1* alleles. Since *Hd1* functions by repressing heading under long days and promoting heading under short days [5,14,16], the effects of *Hd1* at different latitudes are expected to differ. The presence of the *hd1* allele likely prevents premature heading of rice plants grown near the equator and helps increase the tiller and ear numbers before transitioning to the reproductive phase. Growing *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* genotypes at different latitudes and temperatures will provide additional useful information for achieving high yields in the respective regions.

#### *3.3. Effect of Hd1 Alleles on Starch Structure*

High temperature during seed development reduces apparent amylose content by reducing the abundance of GBSSI and increases the gelatinization temperature of starch by decreasing the abundance of BEIIb, thus affecting the quality of rice [28,30,40,42]. Compared with the effects of high temperature, the loss of SSIIa activity has opposite effects; the *ss2a* mutant shows higher apparent amylose content and lower gelatinization temperature compared with its parental line [43]. To reveal whether the loss of SSIIa can mitigate the above-described effect on starch under high temperature during seed development, the starch properties of *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* NILs were evaluated since temperatures during the seed development of these lines were low, medium, and high, respectively, because of differences in their heading dates (Figure 5c and Table S4). The apparent amylose content of *ss2a hd1* was 22.1–24.7%, which was lower than that of *ss2a Hd1 hd1* (23.2–26.6%) and *ss2a Hd1* (27.0–28.0%) but higher than that of Akita 63 (17.1–18.1%) (Table 2). Nonetheless, both *ss2a hd1* and Akita 63 flowered at almost the same time and possessed an identical genetic background, except SSIIa. Therefore, the loss of SSIIa increased the apparent amylose content even if seed development occurred under high temperature. Increasing the apparent amylose content of rice grains can be used as one of the breeding strategies for increasing the health benefit of rice, since high apparent amylose content elevates the resistant starch content [52,55]. The *Hd1 hd1* and *Hd1* alleles are beneficial for increasing the apparent amylose content because these alleles delay flowering and facilitate seed development under cooler temperatures. However, to achieve high yield and avoid the risk of the early arrival of winter, heading dates should be no later than late August, especially if the rice is grown in the northern area of Japan.

#### *3.4. Effect of Hd1 Alleles on Starch Gelatinization Temperature*

The *ss2a hd1* NIL possessed a higher number of short amylopectin chains (DP < 15) than Akita 63, and its gelatinization temperature (57 °C) was lower than that of Akita 63 (62.0 °C) but higher than that of *ss2a Hd1 hd1* (55.5 °C) and *ss2a Hd1* (52.3 °C) (Figure 7, Table 3). One of the reasons why the gelatinization temperature of *ss2a hd1* was higher than that of *ss2a Hd1 hd1* and *ss2a Hd1* might be the relatively lower abundance of the BEIIb protein under high temperature, which increased the number of long amylopectin branches (Figure 4). The balance between amylopectin branch generation and removal is important for controlling amylopectin structure, and loss of BEIIb can be mitigated by the additional loss of isoamylase 1 [56]. Therefore, reduction in the isoamylase 1 level may be one way to counterbalance the reduction in BEIIb level under high temperature during seed development. Alternatively, delaying the heading date of *ss2a hd1* offers a more practical way. Possible target genes for delaying the flowering time of *ss2a hd1* are *Ghd7* and *OsPRR37*, since combinations of the presence or absence of these genes and that of *Hd1* alleles allow the heading date of rice to be further fine-tuned [57,58]. Rice with a low gelatinization temperature is expected to retrograde slowly and be tasty. Thus, introduction of the *ss2a* allele into rice lines cultivated near the equator (with high temperature during seed development) may improve the quality of rice and rice products produced in tropical regions. Analysis of the retrogradation properties of *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* lines will provide additional information for the use of these NILs in the food industry.

#### **4. Materials and Methods**

#### *4.1. Plant Materials*

Rice (*Oryza sativa* L.) *ss2a* mutant, EM204, was previously isolated from the NMU-mutagenized populations of the wild-type japonica cultivar, Kinmaze, which flowers late (early September) at high latitude [43]. EM204 harbors a mutation at the last nucleotide of intron 5, which inhibits splicing and results in the deficiency of 15 amino acids [43]. EM204 was backcrossed twice with the early-flowering, high-yielding elite japonica rice cultivar, Akita 63 [44]. The resulting F<sub>1</sub> seedlings were grown and self-pollinated to obtain the F<sub>2</sub> progeny. DNA was isolated from F<sub>2</sub> seedlings, and genotyping was performed as described previously [43]. The *ss2a Hd1 hd1* line was self-pollinated to obtain *ss2a Hd1*, *ss2a Hd1 hd1*, and *ss2a hd1* NILs. Theoretically, 87.5% of the genome in these three NILs was derived from Akita 63. Akitakomachi was obtained from Akita Prefectural Agricultural Experiment Station, Akita, Japan, and Kasalath and Nipponbare were obtained from the Genebank, National Agricultural and Food Research Organization, Tsukuba, Japan. All rice lines were grown in an experimental paddy field of Akita Prefectural University during the summer under natural light conditions.
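The theoretical value of 87.5% can be reproduced with a short calculation. The sketch below assumes the standard expectation that each cross to the recurrent parent halves the remaining donor genome, that "backcrossed twice" means an initial cross followed by two backcrosses, and that selection and linkage drag are ignored; the function name is illustrative only.

```python
from fractions import Fraction


def recurrent_parent_fraction(n_backcrosses: int) -> Fraction:
    """Expected genome fraction from the recurrent parent.

    Assumes one initial cross (50% recurrent parent) followed by
    n_backcrosses backcrosses, each halving the donor contribution.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    return 1 - Fraction(1, 2) ** (n_backcrosses + 1)


# Initial cross plus two backcrosses to Akita 63.
akita63_share = recurrent_parent_fraction(2)
akita63_percent = float(akita63_share) * 100
```

Subsequent self-pollination does not change this expected average, although individual plants will deviate from it.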

#### *4.2. Sequencing of the Hd1 Gene*

Genomic DNA was isolated from leaves of Akita 63, Akitakomachi, and Kinmaze. Approximately 3 cm of young leaf was powdered with liquid nitrogen using a Multi-beads Shocker (Yasui Kikai, Osaka, Japan). The powder was extracted with 400 µL of extraction buffer (200 mM Tris-HCl, pH 7.5, 250 mM NaCl, 25 mM EDTA, 0.5% SDS). After centrifugation, 300 µL of the supernatant was mixed with an equal volume of isopropanol, left to stand for 20 min or longer, and centrifuged. The DNA pellet was rinsed with 70% ethanol, dried, and resuspended in 25 µL of TE buffer (10 mM Tris-HCl, 1 mM EDTA). A 1 µL aliquot of DNA was used for each 10 µL PCR reaction. PCR amplification was carried out using the Quick Taq HS dye mix (TOYOBO, Osaka, Japan), dimethyl sulfoxide (DMSO; 5% final concentration), and sequence-specific primers (Table S1) under the following conditions: 94 °C for 2 min, and 38 cycles of 94 °C for 20 s, 50 °C for 20 s, and 68 °C for 20 s. The PCR products were sequenced at the Biotechnology Center in Akita Prefectural University, and the obtained sequences were aligned with that of the *Hd1* gene of Nipponbare using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/, accessed on 20 June 2019) and analyzed using BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 20 June 2019). The identified *Hd1* sequences were deposited in the NCBI GenBank database (https://www.ncbi.nlm.nih.gov/, accessed on 25 January 2019) under the following accession numbers: MK449350 (Akitakomachi), MK449351 (Akita 63), and MK449352 (Kinmaze).

#### *4.3. Genotyping of Hd1 and SS2a Alleles*

The *SSIIa* gene was genotyped as described [43]. To genotype the *Hd1* gene, PCR was performed using the Quick Taq HS dye mix (TOYOBO, Osaka, Japan), 5% DMSO, and sequence-specific primers (5′-GGCATGTATTTTGGTGAAGTCG-3′ and 5′-GTTGTCGTAGTACGAATTGTACCCGAC-3′) under the following conditions: 94 °C for 2 min, and 30 cycles of 94 °C for 20 s, 60 °C for 20 s, and 68 °C for 20 s. This enabled successful amplification, since the region was enriched in guanine and cytosine. PCR products were separated via electrophoresis on 15% acrylamide gel in 1× TBE buffer. The expected sizes of the PCR products were 170 bp for *Hd1* and 130 bp for *hd1*.
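The expected product sizes translate directly into a genotype call. The following sketch shows one way such a call could be automated from estimated band sizes; the 10 bp tolerance, the function name, and the "undetermined" fallback are assumptions made for illustration and are not part of the published protocol.

```python
from typing import Iterable

HD1_FUNCTIONAL_BP = 170  # expected product size for Hd1 (from the text)
HD1_LOSS_BP = 130        # expected product size for hd1 (from the text)


def call_hd1_genotype(band_sizes_bp: Iterable[float],
                      tolerance_bp: float = 10.0) -> str:
    """Call the Hd1 genotype from the band sizes observed in one lane.

    tolerance_bp is a hypothetical allowance for gel sizing error.
    """
    bands = list(band_sizes_bp)
    has_functional = any(abs(b - HD1_FUNCTIONAL_BP) <= tolerance_bp for b in bands)
    has_loss = any(abs(b - HD1_LOSS_BP) <= tolerance_bp for b in bands)
    if has_functional and has_loss:
        return "Hd1 hd1"  # heterozygous: both products present
    if has_functional:
        return "Hd1"
    if has_loss:
        return "hd1"
    return "undetermined"


# Hypothetical lanes, for illustration only.
calls = [call_hd1_genotype(lane) for lane in ([172], [168, 131], [130], [])]
```

A lane with both bands corresponds to the heterozygous *ss2a Hd1 hd1* line, which segregates on self-pollination.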

#### *4.4. Field Experiments and Agricultural Traits*

All rice lines were sown and transplanted on the same day, with a spacing of 20 cm between plants and 25 cm between rows. A total of 35 plants each of *ss2a Hd1* and *ss2a hd1* genotypes, 100 plants of the *ss2a Hd1 hd1* genotype, and 20 plants each of the parental lines were grown according to the local agricultural practices. Heading date was recorded when 50% of plants of a given genotype initiated heading. Maturation date was recorded when 90% of the panicles turned yellow. Plant height and ear length were measured prior to harvesting. After 2 weeks of desiccation, whole-plant dry weight and total grain weight were measured, and dry straw weight was calculated by subtracting the total grain weight from the whole-plant dry weight. Total grain weight was measured including empty seeds. Grains were dehulled and sieved through a mesh with 1.9 mm pore size using a sieving machine, TEST Grain Selector (TWSB, Satake, Tokyo, Japan), and the weight of grains above 1.9 mm thickness and width was measured as total dehulled grain weight. Fertility rate was calculated by counting and subtracting the number of empty seeds from the total number of seeds. Quality of brown rice was analyzed using the VIRGO Rice Grain Selector (ES-V; Shizuoka Seiki, Shizuoka, Japan) by detecting the green premature seeds. Data were obtained in 2020 and 2021.
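Two of the derived traits above are simple arithmetic and can be sketched as follows. The straw weight follows the subtraction stated in the text; expressing fertility rate as the percentage of filled seeds among all seeds is an assumption, since the text gives only the subtraction step. Function names and example values are hypothetical.

```python
def dry_straw_weight(whole_plant_dry_g: float, total_grain_g: float) -> float:
    """Dry straw weight = whole-plant dry weight minus total grain weight."""
    if total_grain_g > whole_plant_dry_g:
        raise ValueError("grain weight cannot exceed whole-plant dry weight")
    return whole_plant_dry_g - total_grain_g


def fertility_rate_percent(total_seeds: int, empty_seeds: int) -> float:
    """Percentage of filled seeds (assumed definition of fertility rate)."""
    if total_seeds <= 0 or not 0 <= empty_seeds <= total_seeds:
        raise ValueError("seed counts are inconsistent")
    return 100.0 * (total_seeds - empty_seeds) / total_seeds


# Hypothetical single-plant measurements.
straw_g = dry_straw_weight(50.0, 20.0)
fertility = fertility_rate_percent(200, 20)
```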

#### *4.5. Meteorological Data*

Meteorological data for 2021 were obtained from the Japan Meteorological Agency. Daily temperature data were extracted, and average temperature during seed development was calculated.
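As an illustration of this calculation, the sketch below averages daily mean temperatures over a seed development window. Treating the window as the inclusive span from heading date to maturation date is an assumption, because the text does not define it; the function name and the example values are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict


def mean_seed_development_temperature(daily_mean_c: Dict[date, float],
                                      heading: date,
                                      maturation: date) -> float:
    """Average daily mean temperature from heading to maturation, inclusive."""
    if maturation < heading:
        raise ValueError("maturation date precedes heading date")
    n_days = (maturation - heading).days + 1
    # A missing day raises KeyError rather than being silently skipped.
    temps = [daily_mean_c[heading + timedelta(days=i)] for i in range(n_days)]
    return sum(temps) / len(temps)


# Hypothetical five-day record, for illustration only.
record = {date(2021, 8, 1) + timedelta(days=i): 20.0 + i for i in range(5)}
window_mean = mean_seed_development_temperature(
    record, date(2021, 8, 1), date(2021, 8, 5))
```

Because the NILs head on different dates, the same daily record yields a different window mean for each line.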

#### *4.6. Western Blot Analysis*

Three mature seeds of each rice genotype were ground to a fine powder, and total protein was extracted using 20 volumes (*w*/*v*) of buffer containing 125 mM Tris-HCl (pH 6.8), 8 M urea, 4% (*w*/*v*) SDS, 5% (*v*/*v*) β-mercaptoethanol, and 0.05% (*w*/*v*) bromophenol blue. After centrifugation, proteins in the supernatants were subjected to sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) on 7.5% acrylamide gel and blotted onto a membrane. Membranes were incubated with the following primary antibodies: anti-SSI (1:3000 dilution [59]), anti-SSIIa (1:1000 dilution [60]), anti-GBSSI (1:5000 [59]), and anti-BEIIb (1:5000 [35]). Subsequently, secondary antibody incubation and protein detection were performed as described previously [60].

#### *4.7. Measurement of Apparent Amylose Content and Short to Long Chain Amylopectin Ratio*

Starch was purified using the cold-alkaline method as described previously [61,62]. Purified starch was debranched using *Pseudomonas* isoamylase (Hayashibara, Okayama, Japan) and analyzed via gel filtration chromatography (Toyopearl HW-55S and HW-50S×3; Tosoh, Tokyo, Japan) [63–65]. Amylose (fraction I), long amylopectin chains (fraction II), short amylopectin chains (fraction III), and apparent amylose content were quantified as described previously [63–65].
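The two quantities derived from the gel filtration profile can be sketched as below. The sketch assumes that apparent amylose content is fraction I expressed as a percentage of the summed carbohydrate in fractions I to III and that the short-to-long chain ratio is fraction III divided by fraction II, in line with the Table 2 footnote; the exact procedure is given in the cited methods, and the function names and input values here are hypothetical.

```python
def apparent_amylose_content(frac1: float, frac2: float, frac3: float) -> float:
    """Fraction I as a percentage of fractions I + II + III (assumed definition)."""
    total = frac1 + frac2 + frac3
    if total <= 0:
        raise ValueError("total carbohydrate must be positive")
    return 100.0 * frac1 / total


def short_to_long_ratio(frac2: float, frac3: float) -> float:
    """Short amylopectin chains (fraction III) over long chains (fraction II)."""
    if frac2 <= 0:
        raise ValueError("fraction II must be positive")
    return frac3 / frac2


# Hypothetical carbohydrate amounts per fraction (arbitrary units).
aac = apparent_amylose_content(25.0, 25.0, 50.0)
ratio = short_to_long_ratio(25.0, 50.0)
```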

#### *4.8. Analysis of Amylopectin Structure*

Debranched purified starch was fluorescently labeled and analyzed via capillary electrophoresis (P/ACE MDQ Plus Carbohydrate System; AB Sciex, Framingham, MA, USA), as described [66].

#### *4.9. Measurement of Gelatinization Temperature*

The thermal properties of purified starch were analyzed via differential scanning calorimetry (Seiko Instrument 6100; Seiko, Chiba, Japan) as described previously [59,67].

#### **5. Conclusions**

This study precisely evaluated the agricultural traits and starch properties of rice NILs (*ss2a hd1*, *ss2a Hd1 hd1*, and *ss2a Hd1*) lacking SSIIa and showing different heading dates, although the data were limited to two harvest years. These NILs were generated by crossing the elite rice cultivar Akita 63 (as the recurrent parent) with the *ss2a* null mutant EM204. Sequencing analyses revealed that Akita 63 carries a loss-of-function *hd1* allele, while Kinmaze (the parental line of EM204) possesses a functional *Hd1* allele. The *ss2a hd1* NIL was the first to initiate heading (early August), while the heading dates of *ss2a Hd1 hd1* and *ss2a Hd1* were approximately 2 and 4 weeks later, respectively, than that of *ss2a hd1*. The time required to reach maturity was the shortest in *ss2a hd1*, which reached maturation in mid-September, while the harvesting dates of *ss2a Hd1 hd1* and *ss2a Hd1* were approximately 4 and 6 weeks later, respectively, than that of *ss2a hd1*. Although *ss2a hd1* showed the lowest whole-plant dry weight, it also produced the lowest number of green immature seeds. Analyses of starch properties showed that the amylose content of *ss2a hd1* was lower than that of *ss2a Hd1*, but its gelatinization temperature was higher. Overall, this study provides useful information about the different heading dates, agricultural traits, and starch properties of rice accessions.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms231810783/s1.

**Author Contributions:** Conceptualization, N.C. and N.F.; methodology, N.C. and K.H.; resources, N.C., S.M., N.F.O. and Y.H.; writing—original draft preparation, N.C.; writing—review and editing, N.C. and N.F.; supervision, N.F.; project administration, N.F.; funding acquisition, N.C. and N.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Science and Technology Research Promotion Program for Agriculture, Forestry and Fisheries and Food Industry (25033AB and 28029C awarded to N.F.), the President's Funds of Akita Prefectural University (N.F. and N.C.), Grant-in-Aid for JSPS fellows from Japan Society for the Promotion of Science (#15J40176 and JP18J40020 awarded to N.C.), and Japan Society for the Promotion of Science (#16K18571, JP18K14438, and 20K05961 awarded to N.C.; and 19H01608 awarded to N.F.).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The sequence data generated in this study are freely available from the NCBI GenBank database (accession numbers: MK449352.1, MK449351.1, MK449350.1).

**Acknowledgments:** The authors thank Yuko Nakaizumi (Akita Prefectural University) for growing the rice plants. The authors also thank Toshihiro Kumamaru for providing EM204. *Pseudomonas* isoamylase, used for debranching amylopectin, was a kind gift from Hayashibara Co., Ltd.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **The Function of DNA Demethylase Gene ROS1a Null Mutant on Seed Development in Rice (***Oryza Sativa***) Using the CRISPR/CAS9 System**

**Faiza Irshad 1,†, Chao Li 1,2,†, Hao-Yu Wu 1,†, Yan Yan <sup>1</sup> and Jian-Hong Xu 1,2,3,\***


**Abstract:** The endosperm is the main nutrient source in cereals for humans, as it is a highly specialized storage organ for starch, lipids, and proteins, and plays an essential role in seed growth and development. Active DNA demethylation regulates plant developmental processes and is carried out by 5-methylcytosine (5-meC) DNA glycosylases. To determine the role of *OsROS1a* in seed development, a null mutant of *OsROS1a* was generated using the CRISPR/Cas9 system. The mutation was stable and heritable and affected the major agronomic traits, particularly those of rice seeds. The null mutant of *OsROS1a* showed longer and narrower grains, and its seeds were deformed, containing an underdeveloped endosperm that produced less starch and slightly irregularly shaped embryos. In contrast to the transparent grains of the wild type, the grains of the null mutant were slightly opaque and contained rounded starch granules with uneven shapes, sizes, and surfaces. A total of 723 differentially expressed genes (DEGs) were detected in the null mutant by RNA-Seq, of which 290 were downregulated and 433 were upregulated. The gene ontology (GO) terms with the top 20 enrichment factors were visualized for cellular components, biological processes, and molecular functions. The key genes enriched for these GO terms include starch synthesis genes (*OsSSIIa* and *OsSSIIIa*) and cellulose synthesis genes (*CESA2*, *CESA3*, *CESA6*, and *CESA8*). Genes involved in polysaccharide synthesis and genes encoding glutelins were found to be downregulated in the mutant endosperm. The reduction in glutelins was further verified by SDS-PAGE, suggesting that glutelin genes could be involved in the seed phenotype of the null mutant and that *OsROS1a* could have a key role in the regulation of glutelins. Furthermore, 378 genes with differential alternative splicing (AS) were identified in the null mutant, suggesting that the *OsROS1a* gene has an impact on AS events. Our findings indicate that *OsROS1a* could influence rice endosperm development by regulating gene expression and AS, which provides a basis for understanding the molecular mechanism by which *OsROS1a* regulates rice seed development.

**Keywords:** rice (*Oryza sativa*); *OsROS1a*; CRISPR/Cas9; seed storage protein; starch; RNA-Seq; alternative splicing

**Citation:** Irshad, F.; Li, C.; Wu, H.-Y.; Yan, Y.; Xu, J.-H. The Function of DNA Demethylase Gene ROS1a Null Mutant on Seed Development in Rice (*Oryza Sativa*) Using the CRISPR/CAS9 System. *Int. J. Mol. Sci.* **2022**, *23*, 6357. https://doi.org/10.3390/ijms23126357

Academic Editor: Yong-Gu Cho

Received: 18 May 2022 Accepted: 4 June 2022 Published: 7 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## **1. Introduction**

The development of seeds is an essential process in the angiosperm life cycle and involves both embryo and endosperm development. The rice endosperm, which occupies most of the seed, contains seed storage proteins (SSPs), starch, lipids, and additional trace substances, and provides energy and materials for seed germination and development. The production and quality of seed are directly determined by endosperm development at the filling stage. The storage of amino acids, sugars, and other key metabolites is important for the development of rice endosperm and affects grain quality and milling yield [1]. These metabolites are distributed to numerous biosynthetic pathways, mainly starch metabolism and the biosynthesis and storage of proteins. In addition, the endosperm is responsible for synthesizing proteins and starch in defined amounts and ratios [2].

Rice (*Oryza sativa*) is an excellent material for studying the biosynthesis of SSPs, as it is one of the few plants that synthesize and accumulate both major classes of SSPs, i.e., prolamins and glutelins. Glutelins account for more than 60% of total SSPs in rice and are encoded by 15 genes, while prolamins account for 20% to 30% of total SSPs and are encoded by 34 genes [3,4]. Based on amino acid sequence similarity, glutelins can be divided into four groups (GluA, GluB, GluC, and GluD) [3], and prolamins into three types: 10-, 13-, and 16-kDa [4]. Glutelins are synthesized as a precursor protein (proglutelin) in the endoplasmic reticulum (ER) and are transported through the Golgi apparatus into the protein storage vacuoles (PSV), where they form protein body II (PBII) [5,6]. Eventually, they are processed into mature 20-kDa basic and 37-kDa acidic subunits, which are linked by disulfide bonds [7].

A DNA methylation profiling study revealed that genes preferentially expressed in the endosperm mainly code for major SSPs and starch-synthesizing enzymes, and that CG and CHG hypomethylation are the most important mechanisms for gene activation in rice endosperm [8]. DNA methylation, an evolutionarily conserved epigenetic mechanism, controls many biological processes such as gene imprinting, tissue-specific gene expression, stress responses, and the inactivation of transposable elements. Cytosine methylation (5-meC) occurs in three contexts in plants (CG, CHG, and CHH, in which H represents A, T, or C) and is dynamically regulated by balanced methylation and demethylation [9]. In plants, active DNA demethylation is initiated by the REPRESSOR OF SILENCING (ROS1) DNA glycosylase gene family, including ROS1, DEMETER (DME), DEMETER-like 2 (DML2), and DML3 [10–12]. They are all capable of excising 5-meC, regardless of whether the methylation is in the CG, CHG, or CHH context [13–15]. Previous studies showed that loss of function of *OsROS1a* produced sterile rice through defects in both female and male gametogenesis [16,17]. *OsROS1a* demethylates both the vegetative cell genome and the central cell genome, which is vital for viable seed production [18]. A mutation of *OsROS1a* generated a new transcript with a 21-nt insertion, which increased the number of aleurone cell layers of the rice seed by causing hypermethylation of *rice seed beta-zipper1* (*RISBZ1*) and *rice prolamin-box binding factor* (*RPBF*) [19,20]. Furthermore, *ROS1* genes are also involved in the seed development of rice, wheat, and barley through their epigenetic influence on the accumulation of SSPs [21,22].

Alternative splicing (AS) is a key regulatory mechanism that directly contributes to the structural and functional diversity of mRNA and proteins [23]. Advancements in high-throughput technology enabled a global analysis of AS, which has been widely discovered in plants, including rice [24], maize (*Zea mays*) [25,26], *Arabidopsis* (*Arabidopsis thaliana*) [27], cotton (*Gossypium raimondii*) [28], pineapple (*Ananas comosus*) [29], and soybean (*Glycine max*) [30]. Furthermore, stage-dependent AS events are probably important and can considerably affect grain yield. One of the FLOWERING LOCUS T (FT) homologs, *FT2*, undergoes AS and produces two isoforms in brachypodium (*Brachypodium distachyon*), which perform different functions in the same flowering regulation pathway [31]. In rice, *OsbZIP74* mRNA can be alternatively spliced under treatment with ER stress-inducing agents; this splicing is also induced by heat stress and is involved in plant resistance against pathogens or parasites [32]. Based on RNA-Seq, a total of 16,995 AS events of lncRNAs were identified in tomato root, leaf, and flower tissues [33]. More than 1000 genes that experienced AS events were identified in nitrogen-treated maize roots, and one of the isoforms of the transcription factor *ZmNLP6* was found to have a strong ability to activate downstream genes [34], suggesting that AS plays a vital role in plant growth and development. There is emerging evidence that DNA methylation can regulate AS. Knockdown of DNA methyltransferase 3 (*Dnmt3*) in honey bees reduced global genomic methylation levels and induced global and diverse changes in AS in fat tissue [35]. The deficiency of DNA methylation in mouse embryonic stem cells has been shown to influence the splicing of more than 20% of alternative exons [36]. However, the effect of the DNA methylation pattern on AS has not been studied in rice.

To better define the role of *OsROS1a* in seed development, the null mutant of *OsROS1a* was generated using the CRISPR/Cas9 system, with a 75-nt deletion and a 1-nt substitution in the coding region, which destroys the permuted version of a methylated CpG-discriminating CXXC (Per-CXXC) domain and alters grain size. RNA-Seq analysis revealed that polysaccharide-related and glutelin-coding genes are downregulated in the endosperm of the mutant at 15 days after pollination (DAP), and 378 genes that experienced AS events were identified, indicating that *OsROS1a* could influence rice endosperm development through gene expression and AS.

## **2. Results**

#### *2.1. The Phenotypes of the Null Mutant of OsROS1a*

To better characterize the role of *OsROS1a* in active DNA demethylation and seed development, we created null mutants of *OsROS1a*, S1 and S2, using the CRISPR/Cas9 system. The homozygous mutant, S1, contains an in-frame deletion of 25 amino acid residues from the 1798th amino acid and an amino acid A to T substitution in the Per-CXXC domain, which could lead to the complete loss of function of this domain, while the RNA recognition motif fold (RRMF) domain, starting from the 1831st amino acid, was maintained (Figure 1). Another, biallelic gene-editing event was also obtained, with one allele carrying the same edit as S1 and the other carrying an 86-nt deletion. As the frameshift mutation of *OsROS1a* produces sterile rice, the S2 mutant of the T<sub>1</sub> generation has the same genotype as S1. Therefore, the S1 mutant was used for further analysis.

**Figure 1.** CRISPR/Cas9-induced *OsROS1a* gene editing. (**a**) Schematic of *OsROS1a* gene structure. Exons and introns are denoted as black blocks and lines, respectively. The translation initiation codon (ATG) and the termination codon (TAG) are shown. The recovered mutated allele is shown below the WT reference sequences. The target sites' nucleotides are indicated in black capital letters. The white dashes indicate the deleted nucleotides, and the substitution nucleotide A is shown in red. (**b**) The protein structure of WT and the null mutant S1. The predicted structure of WT contained the Per-CXXC domain and RRMF domain, while the Per-CXXC domain is not predicted in the null mutant S1.

Microscopic analysis of rice anthers showed that the T<sub>0</sub> generation of the null mutant of *OsROS1a* displays partial male sterility, with smaller anthers than WT (Figure 2a,g). The S1 mutant anthers had fewer pollen grains than WT, and only a few pollen grains were stained by an I<sub>2</sub>-KI solution (Figure 2b,h). To further examine the cellular defects in the S1 mutant, transverse section analysis was performed on the WT and S1 mutant anthers, which were observed at four different developmental stages. At stage 6 (the microspore mother cell stage), the epidermis, endothecium, tapetum, and microspore mother cells were clearly visible, and no obvious difference was observed between WT and the S1 mutant (Figure 2c,i). However, at stage 9 (the young microspore stage), the microspores in WT were globular, deeply stained, and densely distributed, while the microspores in the S1 mutant were lighter, scarcer, and more irregularly shaped (Figure 2d,j). At stage 10 (the vacuolated pollen stage), whereas WT microspores were round and vacuolated, only half of the S1 mutant microspores had a normal round shape comparable to WT, and the remaining microspores were degraded and irregularly shaped (Figure 2e,k). At stage 13 (the mature pollen stage), the WT anther locule was full of mature pollen grains with entirely formed pollen walls and accumulated starch. However, there were fewer microspores in the S1 mutant, and some of them appeared to be degenerated and partially sterile (Figure 2f,l). These results are consistent with the pollen staining, indicating that the S1 mutant has partial pollen sterility.

**Figure 2.** The phenotype of the S1 mutant. The spikelet of WT (**a**) and the S1 mutant (**g**). The pollen fertility of WT (**b**) and the S1 mutant (**h**) by iodine staining, magnified 20×. Transverse section analysis of the anther development in WT (**c**–**f**) and the S1 mutant (**i**–**l**). Locules from the anther sections of WT and the S1 mutant at stage 6 (the microspore mother cell stage), stage 9 (the young microspore stage), stage 10 (the vacuolated pollen stage), and stage 13 (the mature pollen stage). E, epidermis; En, endothecium; T, tapetum; MMC, microspore mother cells; Msp, microspores; MP, mature pollen; DMsp, degenerated microspores; DP, degenerated pollen. Scale bar: 50 µm.

S1 mutant plants of the T<sub>1</sub> generation were used to examine various agronomic traits. The plant height, panicle length, and primary branch number were not significantly changed (Figure 3). The plant height was reduced by only 2.4% in the S1 mutant as compared to WT (Figure 3a,c). Similarly, the panicle length was reduced by only 12.8% in the S1 mutant as compared to WT (Figure 3b,e). Likewise, the number of primary branches was reduced by 12.5% (Figure 3f). However, the tiller number was significantly increased (43.9%) (Figure 3d) and the seed fertility percentage was significantly reduced (23.0%) in the null mutant of *OsROS1a* when compared to WT (Figure 3g).

The null mutation of *OsROS1a* significantly changed the grain size. The seed length was significantly increased by 5.8% and the seed width was significantly decreased by 4.8% in the S1 mutant as compared to WT (Figure 4). To determine whether the grain shape and size affect the seed storage materials, transverse section analysis was performed on dehusked mature grains, which involved cutting the seed transversely into two halves. No detectable difference was observed in the aleurone layer between seeds of the S1 mutant and WT. However, the null mutant S1 had deformed seeds containing underdeveloped and less-starch-producing endosperms (Figure 5a), and the shape of the embryos was slightly irregular when compared to WT. In contrast to the semitransparent grains of the WT, the S1 mutant grains were slightly opaque (Figure 5b). The starch granules were further observed by SEM. WT grains had compound, similarly sized polygonal starch granules with sharp edges and flat surfaces. In contrast, the S1 mutant starch granules were variable in size and shape, with rounded and irregular surfaces (Figure 5c). Furthermore, the total starch content was decreased from 72.73% in WT to 69.58% in the S1 mutant, while the amylose content was increased from 11.24% in WT to 12.79% in the null mutant of *OsROS1a* (Figure 6).

**Figure 3.** Phenotypes of WT and the S1 mutant. (**a**) Phenotype comparison of the WT and the S1 mutant. (**b**) The panicle of WT and the S1 mutant. Statistical analysis of the agronomic traits: (**c**) plant height, (**d**) tiller number, (**e**) panicle length, (**f**) number of primary branches, and (**g**) fertile seed percentage. Lettering indicates statistical significance at *p* ≤ 0.01. Data are means ± standard deviation (SD) (*n* = 3).

**Figure 4.** Comparison of seed length (**A**) and seed width (**B**), and the statistical analysis of seed length (**C**) and seed width (**D**) between WT and the S1 mutant. Lettering indicates statistical significance at *p* ≤ 0.01. Data are means ± SD (*n* = 3).

**Figure 5.** Genetic screening of *OsROS1a* gene editing mutant by half-seed assay. (**a**) Transverse and longitudinal sections of dehusked, mature grains of WT and the S1 mutant, stained with Evans blue dye. Arrowheads indicate the aleurone. se, starchy endosperm; e, embryo. (**b**) Transversely sectioned and nonsectioned mature grains showed the opaque endosperm in the S1 mutant, as compared to the semitransparent endosperm in WT. Scale bar, 0.5 mm. (**c**) Scanning electron micrographs of the endosperm in transverse sections are shown with increasing magnification from left to right. Scanning electron microscopy (SEM) of WT grain revealed compound, similarly sized polygonal starch granules with sharp edges and smooth, flat surfaces, while the S1 granules are rounded, variable in size and shape, and have uneven surfaces.

**Figure 6.** Total starch and amylose content in the mature seed of WT and the S1 mutant. (**a**) Total starch content in the seeds of WT and S1 mutant. (**b**) Total amylose content in seeds of WT and S1 mutant. Lettering indicates the statistical difference at *p* ≤ 0.01. Data are means ± SD with three replicates.

#### *2.2. RNA-Seq Identifies Responsive Genes in the Null Mutant of OsROS1a*

Because the null mutant of *OsROS1a* can affect the rice grain, 15 DAP immature endosperms of the S1 mutant and WT were harvested for RNA-Seq. In total, more than 45,000,000 clean reads were generated for each sample, and the QC30 ratios were all above 94.2%, indicating the transcriptome was of high quality. The abundance of the expressed genes was then quantified using FPKM.

DEGs between WT and the S1 mutant were screened out based on the threshold of the log<sub>2</sub> fold change being either ≥1 or ≤−1 and a *p*-value < 0.05. Under these criteria, 723 DEGs were identified in total, of which 290 (40.1%) genes were downregulated and 433 (59.9%) genes were upregulated in the S1 mutant (Figure 7). Furthermore, a GO term enrichment analysis was performed using agriGO [37]. The GO terms with the top 20 enrichment factors consisted of eleven biological processes (BP), six molecular functions (MF), and three cellular components (CC). Ten of the eleven enriched BP terms are connected to polysaccharide synthesis. In CC, the GO term "macromolecular complex" (GO:0032991) was enriched, including ten glutelin genes (Figure 8).
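The screening rule above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_deg` and the four example genes are hypothetical, and only the cutoffs (log<sub>2</sub> fold change ≥1 or ≤−1, *p*-value < 0.05) and the counts 290 and 433 come from the text.

```python
def classify_deg(log2_fc, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' using the cutoffs quoted in the text."""
    if p_value < p_cutoff and log2_fc >= lfc_cutoff:
        return "up"
    if p_value < p_cutoff and log2_fc <= -lfc_cutoff:
        return "down"
    return "ns"  # not significant under these criteria

# Hypothetical genes: (log2 fold change, p-value); values are made up.
genes = {
    "geneA": (1.8, 0.003),
    "geneB": (-2.1, 0.010),
    "geneC": (0.4, 0.020),   # significant p-value but small fold change
    "geneD": (-1.5, 0.200),  # large fold change but p-value too high
}
labels = {name: classify_deg(lfc, p) for name, (lfc, p) in genes.items()}

# Proportions reported in the text: 290 down and 433 up out of 723 DEGs.
down, up = 290, 433
total = down + up
pct_down = round(100 * down / total, 1)
pct_up = round(100 * up / total, 1)
```

Both conditions must hold for a gene to be called a DEG, which is why `geneC` and `geneD` are labeled `ns` in this toy input.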

**Figure 7.** Differential expression of 15 DAP endosperm from WT and the S1 mutant. (**a**) The number of upregulated genes and downregulated genes. (**b**) Volcano plot of differentially expressed genes with the threshold of |log<sub>2</sub> fold change| > 1. Orange indicates upregulated genes, and blue indicates downregulated genes.

**Figure 8.** GO enrichment of DEGs. GO-term enrichment analysis was performed using agriGO [37], and the top 20 GO terms were shown that belonged to biological processes, cellular components, and molecular functions.

We found that the expression of seven starch-synthesis- and cellulose-synthesis-related genes was significantly reduced, and most of the starch-synthesis-related genes were downregulated in the S1 mutant (Figure 9, Table 1). Nine of the ten glutelin genes were significantly downregulated, which was then confirmed by SDS-PAGE in dry seeds. Both the 37–40-kDa acidic and the 20-kDa basic subunits of rice glutelin storage proteins were significantly decreased in the S1 mutant as compared to WT (Figure 10).

**Figure 9.** The expression of polysaccharide-related genes and the starch synthesis pathway. (**a**) The heat plot of differentially expressed polysaccharide-related genes. (**b**) The starch synthesis pathway. Downregulated genes are shown in blue; amylose, shown in red, had increased content, and amylopectin, shown in blue, had decreased content.

**Table 1.** Expression values of starch and cellulose-synthesis-related genes.


**Figure 10.** The gene expression and protein accumulation of glutelins. (**a**) The heat plot of significantly differentially expressed glutelin genes based on the RNA-Seq data. (**b**) SDS-PAGE analysis of glutelins. Glutelins extracted from mature seeds of WT and the S1 mutant were separated on a 12% SDS-PAGE gel and stained with Coomassie Brilliant Blue (CBB).

#### *2.3. RNA-Seq Identifies Differentially AS Events in the Null Mutant of OsROS1a*

As DNA methylation can regulate AS, we further investigated whether the 25 amino acid deletion, which removes the complete Per-CXXC domain of the demethylase gene *OsROS1a*, can regulate AS events occurring in rice 15 DAP endosperm. rMATS was used to identify both annotated and novel AS events in the S1 mutant, and the expression level of AS events was defined as the "exon inclusion level". In total, 378 differentially AS genes (234 up- and 144 downregulated) were identified, which belonged to the five major AS types: SE (skipped exon), A5SS (alternative 5' splice site), A3SS (alternative 3' splice site), MXE (mutually exclusive exon), and IR (intron retained) (Figure 11a). Among these, the majority of AS genes occurred via SE (271 genes, 71.69%), and the second most common type was IR (58/378, 15.34%) (Figure 11b).
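The percentages quoted above follow directly from the gene counts, as the short Python check below shows. Only the counts 378, 271, 58, 234, and 144 come from the text; the text does not itemize the A5SS, A3SS, and MXE genes, so they are pooled here as a remainder under the assumption that each gene is counted in exactly one AS type.

```python
total_as_genes = 378
counts = {"SE": 271, "IR": 58}

# Assumption: each gene falls in one type, so the rest are A5SS, A3SS, or MXE.
counts["A5SS+A3SS+MXE"] = total_as_genes - sum(counts.values())

# Share of each AS type, in percent, rounded to two decimals as in the text.
shares = {as_type: round(100 * n / total_as_genes, 2) for as_type, n in counts.items()}

# The up- and downregulated AS genes should add up to the same total.
up_regulated, down_regulated = 234, 144
```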

**Figure 11.** Alternative splicing affected by the mutation of *OsROS1a*. (**a**) Five types of alternative splicing events. (**b**) Statistics of differentially alternatively spliced genes between WT and the S1 null mutant. (**c**) GO enrichment of differentially alternatively spliced genes.

To study the functional trends of the AS genes, GO enrichment analyses were carried out for genes with differential AS events in rice 15 DAP endosperm. Of the top 20 enriched GO terms, 13 were in BP, 6 were in MF, and only 1 was in CC (Figure 11c). In BP, cellular nitrogen compound metabolic process (GO:0034641), nucleobase, nucleoside, nucleotide metabolic process (GO:0055086), and small molecule metabolic process (GO:0044281) are all relevant to small molecule metabolism; cellular localization (GO:0051641), protein localization (GO:0008104), and macromolecule localization (GO:0033036) are all relevant to cellular compound localization. In addition, we found that the genes enriched for protein localization (GO:0008104), macromolecule localization (GO:0033036), and establishment of protein localization (GO:0045184) have many genes in common, indicating that this function is likely to be influenced by *OsROS1a* through AS regulation in rice endosperm.

## **3. Discussion**

In cereals, the endosperm is a highly specialized storage organ for starch, proteins, and lipids, and plays a crucial role in seed growth and development. Active DNA demethylation in plants is initiated by the ROS1 DNA glycosylase gene family, including DME, ROS1, DML2, and DML3, which regulates plant developmental processes [10–12]. Loss of function of *OsROS1a* produced sterile rice through defects in both female and male gametogenesis [16,17]. Furthermore, only two heterozygous lines with one allele of 9-bp and 6-bp produced a few seeds, and all frameshift mutants, including six having only a truncated RRMF domain, failed to produce seeds [17], suggesting that the RRMF domain is necessary for fertility. We generated the null mutant of *OsROS1a* with a 75-bp deletion and a 1-bp substitution, which resulted in the complete loss of the Per-CXXC domain but the complete retention of the RRMF domain (Figure 1). This mutant showed semisterile pollen and reduced seed fertility, indicating that the Per-CXXC domain also plays a role in rice fertility. Besides fertility, the null mutant S1 also showed altered grain morphology, including long and narrow grains, deformed seeds containing underdeveloped endosperm and a lower amount of starch, a slightly irregular embryo shape, and slightly opaque grains (Figures 4 and 5). A mutation of *OsROS1a* with a 21-nt insertion generated a new transcript, *mOsROS1a*, with an insertion of seven amino acid residues (CSNVMRQ) in the RRMF domain; it produced a thick aleurone and improved rice grain nutrition, which resulted from the hypermethylation and reduced expression of *RISBZ1* and *RPBF* [19,20]. In contrast, the expression of *RISBZ1* and *RPBF* was similar between WT and the S1 mutant, suggesting that loss of the Per-CXXC domain does not alter the expression of these two important transcription factors (TFs). Overexpression of *BiP* suppresses SSPs and starch content and produces an opaque phenotype with shrunken and floury features in rice seeds [38,39].

A total of 723 DEGs were identified in the S1 mutant by RNA-Seq analysis. Most of these genes are related to starch and SSP synthesis, suggesting that the synthesis of starch, cellulose, and other types of polysaccharides in the endosperm is regulated by *OsROS1a*. Among the GO terms with the top 20 enrichment factors, 11 were BP terms, 6 were MF terms, and 3 were CC terms, so BP accounted for the largest proportion. Out of the 11 enriched BP terms, 10 are connected to polysaccharide synthesis. We identified key genes in starch synthesis, such as *starch synthase IIa* (*OsSSIIa*) and *OsSSIIIa*, and in cellulose synthesis, such as *cellulose synthase A2* (*CESA2*), *CESA3*, *CESA6*, and *CESA8*, that are enriched for these terms, providing more evidence for the hypothesis. *OsSSI* and *OsSSIIIa* account for a large portion of the overall SS enzyme activity in rice during endosperm development [40]. The SSIIIa protein is associated with other proteins in rice endosperm [41]. Similarly, a proportion of *SSIIIa* is present in a large complex containing *Pyruvate orthophosphate dikinase* (*PPDK*), *ADP-glucose pyrophosphorylase* (*AGPase*), *SSIIa*, *the starch branching enzyme gene IIa* (*SBEIIa*), and *SBEIIb* in maize [42]. Likewise, *OsSSIIa* is abundantly expressed at the starch filling stage [43,44] and modifies the quality of rice starch [45]. The GO term "macromolecular complex" (GO:0032991) was enriched in the cellular component category, and nine glutelin genes related to this term were significantly downregulated in the S1 mutant, which was further confirmed at the protein level by SDS-PAGE analysis. The decreased expression of these starch- and protein-associated genes in our study suggested that they might be involved in the seed phenotype of the null mutant of *OsROS1a* and that *OsROS1a* might have a main role in the regulation of these genes. The study of the DNA methylation profile during endosperm development revealed that genes expressed preferentially in the endosperm, which are related to key storage proteins and starch-synthesizing enzymes, are normally hypomethylated [8], and suggested that the demethylation of CG and CHG is the main mechanism for gene activation.

AS is an essential post-transcriptional process that generates several mRNA variants from a single pre-mRNA molecule and increases the coding and regulatory potential of the genome. Several studies have been carried out to find the AS events in different plant species, tissues, and environmental conditions [32–34]. However, the effect of the DNA methylation pattern on AS has not been reported in rice. Advancements in high-throughput technology enabled a global analysis of AS to study its functional characteristics in response to stress [46]. In this study, we identified 378 differentially AS genes in the 15 DAP endosperm of the S1 mutant, which belonged to SE, A5SS, A3SS, MXE, and IR, the five major AS types (Figure 11a), and SE (71.69%) and IR (15.34%) were the top two types (Figure 11b). In *Arabidopsis*, by contrast, AS had a low level of SE (<5%) and a high level of IR (65%) [47]. SE and IR were the predominant AS events in maize endosperm, accounting for 28.33% and 29.85%, respectively [48]. Earlier studies have revealed that AS changes are affected by developmental and environmental factors [49,50]. The SE type is comparatively rare (5%) in most maize tissues, but can increase to more than 27% under abiotic stress. Furthermore, during seed development of maize, AS isoforms alter considerably in seed, embryo, and endosperm [25]. Our results revealed that AS genes of the A5SS and the MXE types were relatively less common and less influenced by the mutation of *OsROS1a* during endosperm development, and that the functional modification of *OsROS1a* did not lead to AS type-specific changes. The mutation of the *OsROS1a* gene showed a major impact on the alteration of AS events of downregulated genes, and the impacts are not AS type-specific in the 15 DAP rice endosperm. Several GO terms indicating similar functions were enriched. Moreover, we identified that *OsROS1a* might influence the genes which are enriched for "protein localization" (GO:0008104), "macromolecule localization" (GO:0033036), and "establishment of protein localization" (GO:0045184) through AS regulation in the endosperm. These results suggested that rice endosperm development mediated by the *OsROS1a* gene could be influenced through AS regulation.

## **4. Materials and Methods**

#### *4.1. Generation of Null Mutants of ROS1a in Rice Using CRISPR/Cas9 System*

The CRISPR/Cas9 binary vector, pYLCRISPR/Cas9 Pubi-H, was used for targeted genome editing [51]. Two single guide RNAs (sgRNAs) were designed to precisely target the 13th exon of *OsROS1a*. Two sgRNA intermediate vectors, pYLsgRNA-OsU6a and pYLsgRNA-OsU6b, were used to express the sgRNAs under the rice OsU6a and OsU6b small nuclear RNA promoters, respectively. The sgRNA sequences were cloned into a binary vector that contained sgRNA and Cas9 expression cassettes, and the resulting construct was transformed into rice cultivar Nipponbare by *Agrobacterium* infection of callus explants.

#### *4.2. Pollen Fertility Examination and Histochemical Assay*

Floral organs were photographed with a dissecting microscope. For the analysis of pollen fertility, anthers from the WT (wild type, Nipponbare) and *osros1a* mutant plants were sampled from the spikelets just before flowering, and a 1% iodine-potassium iodide (I<sub>2</sub>-KI) solution was used to stain the pollen grains. The stained pollen grains were visualized and photographed with a Nikon Eclipse Ni fluorescence microscope. For transverse section analysis, spikelets at various anther development stages, defined according to a previous study [52], were collected from WT and the S1 mutant and frozen in OCT compound (Tissue-Tek; Sakura Finetek, Torrance, CA, USA). Samples were sectioned with a freezing microtome (Thermo Shandon Cryotome FE, Shandon, China). Then, 0.05% toluidine blue dye was used to stain the cross sections, and the microscopic images were captured with a Nikon Eclipse Ni fluorescence microscope.

#### *4.3. Agronomic Traits Analyses*

The agronomic traits of plant height, panicle length, tiller number, fertile seed percentage, and the number of primary and secondary branches in the main panicle of the *osros1a* mutant and WT were measured for plants grown in the greenhouse.

#### *4.4. Half-Seed Assay and Scanning Electron Microscopy (SEM)*

For the Evans Blue staining-based half-seed assay, mature grains from WT and the null mutant S1 were dehusked and transversely sectioned into two halves using a razor blade, then dipped in 0.1% (*w*/*v*) Evans Blue solution for 10 min and washed three times with distilled water. All samples were observed and photographed under a dissecting microscope (Nikon). Further, to study the morphological changes of the mature seeds and their starch granules, the WT and the S1 mutant seeds were transversely sectioned. Samples were then examined and photographed by SEM.

#### *4.5. RNA Extraction, Library Preparation, and RNA-Seq*

The S1 mutant was cultivated in the greenhouse, and total RNA was extracted from the mutant and WT endosperm at 15 DAP. For RNA-Seq analysis, 1 µg RNA per sample was used. The mRNA was enriched using magnetic beads with oligo-dT, then broken into short fragments using divalent cations in NEBNext First Strand Synthesis Reaction Buffer (5X) and reverse transcribed into cDNA. The 3′ ends of the DNA fragments were adenylated, and NEBNext Adaptors with a hairpin loop structure were ligated to prepare for hybridization. After purification, terminal modification, fragment selection, PCR amplification with Phusion High-Fidelity DNA polymerase, and another round of purification, two libraries (one for WT and one for the mutant) were constructed for Next-Generation Sequencing (NGS). The Agilent Bioanalyzer 2100 system was used to assess the library quality.

A cBot Cluster Generation System was used to cluster the index-coded samples with the TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina NovaSeq platform and 150 bp paired-end reads were generated.

#### *4.6. Quality Control and RNA-Seq Analysis*

The clean reads were obtained by removing adapter-containing reads, poly-N-containing reads, and low-quality reads from the raw data. Hisat2 (v2.0.5, Daehwan Kim, Dallas, TX, USA) was adopted to build the reference genome index and align the paired-end clean reads to the genome [53]. We chose IRGSP1.0 (https://rapdb.dna.affrc.go.jp/download/irgsp1.html, accessed on 20 December 2020) as the reference genome for its high quality and comprehensive annotation. Samtools (v1.4.1, Heng Li, London, UK) was then used to convert the SAM (Sequence Alignment/Map) file to its binary form, the BAM file [54], which is smaller and faster for downstream analysis.
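The indexing, alignment, and conversion steps described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the authors' commands: the helper `build_alignment_commands` and all file names are hypothetical, and only basic, well-known options of `hisat2-build`, `hisat2`, and `samtools view` are used. The code only assembles the command lines and does not run them.

```python
import shlex

def build_alignment_commands(genome_fa, index_prefix, reads_1, reads_2, sample):
    """Return the three command lines (as argument lists) for one sample."""
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    return [
        # 1. Build the HISAT2 index from the reference genome FASTA.
        ["hisat2-build", genome_fa, index_prefix],
        # 2. Align paired-end clean reads and write a SAM file.
        ["hisat2", "-x", index_prefix, "-1", reads_1, "-2", reads_2, "-S", sam],
        # 3. Convert SAM to its binary form (BAM).
        ["samtools", "view", "-b", "-o", bam, sam],
    ]

# Hypothetical file names for the WT library.
commands = build_alignment_commands(
    "IRGSP-1.0_genome.fasta", "irgsp1",
    "WT_1.clean.fq.gz", "WT_2.clean.fq.gz", "WT",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where the tools are installed; that step is left out here so the sketch runs anywhere.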

To count the number of reads mapped to each gene, featureCounts (v1.5.0-p3, Yang Liao, Melbourne, Australia) was used [55]. The FPKM (Fragments Per Kilobase of transcript sequence per Million mapped fragments) of each gene was then calculated based on the gene length and the read count mapped to this gene. The DESeq2 R package v1.26.0 was then used to determine DEGs (differentially expressed genes) [56]. The threshold for significant differential expression was set as fold change ≥2 and *p*-value ≤ 0.05. GO (Gene Ontology) enrichment analysis of DEGs was implemented using agriGO (v2.0, Tian Tian, Beijing, China) [37].
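As a worked illustration of the FPKM normalization, the sketch below applies the standard definition (fragments per kilobase of gene length per million mapped fragments). The function name `fpkm` and the example numbers are hypothetical and are not taken from the study's data.

```python
import math

def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)

# Hypothetical gene: 500 fragments, 2000 bp long, in a library of 40 million
# mapped fragments.
example = fpkm(fragments=500, gene_length_bp=2000, total_mapped_fragments=40_000_000)

# A fold change of 2 corresponds to a log2 fold change of 1, which links the
# threshold stated here to the log2-based threshold used in the Results.
log2_of_two = math.log2(2)
```

With these numbers the gene has an FPKM of 6.25; doubling the gene length or the library size would halve it, which is the point of the normalization.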

#### *4.7. SDS-PAGE Analysis*

Storage protein analysis was conducted by SDS-PAGE with the Laemmli method. Mature rice seeds of the S1 mutant and WT were ground to a fine powder with a mortar and pestle. Rice glutelin was extracted from powdered seeds (25 mg) with 0.2% NaOH after stepwise removal of albumins (with deionized water), globulins (with 0.5 M NaCl and 50 mM Tris-HCl; pH 6.8), and prolamins (with 70% (*v*/*v*) alcohol). SDS-PAGE was run on a 12% gel using a Standard Twin Mini Gel Unit (CA, USA) for 2.5 h at 120 V. The glutelin proteins were detected by staining the gel with a Coomassie Brilliant Blue R-250 (Sigma) staining solution for 30 min, followed by destaining with a solution of 20% methanol and 5% acetic acid in distilled water. The sizes of the protein bands were estimated by comparison with the protein ladder (Thermo Scientific PageRuler Prestained Protein Ladder, 10–250 kDa) in the electropherogram.

#### *4.8. Identification of Differential AS Events*

To investigate differences in AS patterns between WT and the mutant, rMATS (v4.1.1, Shihao Shen, CA, USA) was used to identify both annotated and novel AS events in the mutant [57], and the expression level of each AS event was defined as its "exon inclusion level". The threshold was set to 0.0001, and AS events with a *p*-value above 0.05 were excluded from further analysis. As a result, five major types of AS events, including SE, A5SS, A3SS, MXE, and IR, were identified. GO enrichment analysis of DAGs (differentially alternatively spliced genes) was implemented using DAVID [58].
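As an illustration of the "exon inclusion level" and the *p*-value filter mentioned above, the following Python sketch computes a length-normalized inclusion level from read counts and keeps only events with *p* ≤ 0.05. The formula follows the commonly described rMATS definition (inclusion and skipping read counts, each divided by the effective length of its isoform); the function names, the dictionary layout and the numbers are hypothetical and are not taken from the study.

```python
def inclusion_level(inc_reads, skip_reads, inc_len, skip_len):
    """Length-normalized exon inclusion level (often written as psi).

    inc_reads / skip_reads: reads supporting the inclusion / skipping isoform.
    inc_len / skip_len: effective lengths of the two isoform-specific regions.
    Returns None when no reads support either isoform.
    """
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    if inc + skip == 0:
        return None
    return inc / (inc + skip)


def keep_significant(events, max_p=0.05):
    """Drop AS events whose p-value is above the cutoff."""
    return [event for event in events if event["p_value"] <= max_p]


# Hypothetical events, for illustration only.
events = [
    {"id": "SE_1", "p_value": 0.001},
    {"id": "IR_7", "p_value": 0.30},
]
significant = keep_significant(events)
```

In this toy input only the event `SE_1` passes the filter, and an event with 30 inclusion reads over an effective length of 2 versus 10 skipping reads over an effective length of 1 has an inclusion level of 15 / (15 + 10).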

#### **5. Conclusions**

In summary, we found that loss of function of the *OsROS1a* gene altered rice grain size, producing longer and narrower grains than WT. Seeds of the *OsROS1a* null mutant were deformed, with an underdeveloped endosperm that produced less starch and a slightly irregularly shaped embryo. Furthermore, RNA-Seq analysis showed that many polysaccharide-related and glutelin-encoding genes were downregulated in the endosperm of the S1 mutant, suggesting that these genes might contribute to the S1 mutant seed phenotype and that *OsROS1a* might play a major role in their regulation. Moreover, AS analysis revealed that the *OsROS1a* gene has a major impact on the alteration of AS events in these downregulated genes. These findings provide a basis for understanding the molecular mechanism by which *OsROS1a* regulates rice seed development.

**Author Contributions:** Conceptualization, J.-H.X.; methodology, F.I., C.L. and H.-Y.W.; validation, C.L. and Y.Y.; formal analysis, F.I., C.L., H.-Y.W. and J.-H.X.; investigation, J.-H.X.; resources, J.-H.X.; data curation, F.I., C.L., H.-Y.W., Y.Y. and J.-H.X.; writing—original draft preparation, F.I. and H.-Y.W.; writing—review and editing, J.-H.X. and C.L.; supervision, J.-H.X.; funding acquisition, J.-H.X. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Shandong (Linyi) Institute of Modern Agriculture, Zhejiang University.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available within the article.

**Acknowledgments:** We are grateful to Li Xinxin for obtaining the *osros1a* mutants and to Yao-Guang Liu for kindly providing the CRISPR/Cas9 binary vector.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Loss of Function of the RRMF Domain in OsROS1a Causes Sterility in Rice (***Oryza sativa* **L.)**

**Jian-Hong Xu <sup>1,2,3,</sup>\*<sup>,†</sup>, Faiza Irshad <sup>2,†</sup>, Yan Yan <sup>2</sup> and Chao Li <sup>2,3,</sup>\***


**Abstract:** For crop seed production, anther development and male fertility are major agronomic traits and key biological processes in flowering plants. Active DNA demethylation regulates many plant developmental processes and is carried out by 5-meC DNA glycosylases. To investigate the role of *OsROS1a*, gene-editing mutants of *OsROS1a* were generated using the CRISPR/Cas9 system. The *osros1a* mutants had shrunken spikelets and smaller anthers and pollen grains; their pollen was not stained by iodine, and total soluble sugar and starch contents were significantly reduced compared to wild type (WT), resulting in complete male sterility. Similarly, the expression of genes involved in pollen and anther development was decreased in *osros1a* mutants compared to WT. Furthermore, bisulfite sequencing showed that CG and CHG methylation of the *OsPKS2* gene promoter was significantly increased in the *osros1a* mutant, which caused reduced expression of *OsPKS2* in *osros1a* mutants. DNA methylation of the *TDR* gene promoter was similar between WT and *osros1a* mutants, indicating that the effect of *OsROS1a* on DNA methylation was gene specific. The expression of *OsROS1a* in the mutants was not changed, but the frame-shift mutations truncated the Pem-CXXC and RRMF domains. Combined with previous studies, our findings suggest that the RRMF domain is the functional domain of OsROS1a and that loss of the RRMF domain causes sterility in rice.

**Keywords:** *ROS1*; DNA demethylation; CRISPR/Cas9; pollen fertility; bisulfite sequencing; rice (*Oryza sativa* L.)

### **1. Introduction**

DNA methylation plays a crucial role in plant growth, development, and stress responses [1]. In plants, DNA methylation usually occurs at the cytosine (C) base in CG, CHG and CHH contexts (H = A, C, or T), and is dynamically regulated by the balance between methylation and demethylation [1]. Plant 5-meC DNA glycosylases belong to the DEMETER-like (DML) family, which is part of the HhH-GPD superfamily, the most functionally diverse group of DNA glycosylases [2]. DML proteins, ranging from 1100 to over 2000 residues, are unusually large DNA glycosylases and are bifunctional enzymes with both DNA glycosylase and apurinic/apyrimidinic (AP) lyase activities [3]. Specific DNA glycosylases, including Repressor of silencing 1 (ROS1) [4], DEMETER (DME) [5], DML2 and DML3 [6], catalyze cytosine demethylation in *Arabidopsis*. DNA demethylation mediated by DME is important for plant reproduction, and inheritance of loss-of-function paternal or maternal mutant *dme* alleles results in reduced sperm transmission or seed abortion, respectively [5,7]. DME is present only in dicots, not in monocots. *ROS1* encodes a DNA glycosylase/lyase that represses DNA methylation of promoter DNA [8]. Loss of function of *ROS1* causes DNA hypermethylation at CG and CHH sites and decreases the expression of the affected genes in *Arabidopsis* [3,4]. The rice genome contains four *ROS1* paralogs (*OsROS1a*, *OsROS1b*, *OsROS1c*, and *OsROS1d*) that mediate DNA demethylation [9]. *OsROS1a* is highly expressed in rice anthers and pistils [10]. Mutants with a disrupted *OsROS1a* exhibited severe defects in both male and female gametogenesis, resulting in a sterile phenotype [10,11]. The genome of the rice vegetative cell is actively demethylated by *OsROS1a*, which is vital for viable seed production, and sperm non-CG methylation is indirectly promoted by DNA methylation in the vegetative cell, which suggests that dynamic DNA methylation reprogramming occurs during plant embryogenesis [12]. The aleurone layer is the most nutritious part of cereal grains and stores lipids, vitamins, proteins, and minerals. A point mutation in the fourteenth intron of *OsROS1* generates a new transcript, *mOsROS1*, with a 21-nt insertion, which leads to DNA hypermethylation and suppresses the expression of two transcription factors, *RISBZ1* and *RPBF*, increasing the number of aleurone cell layers [13,14]. Knockout or knockdown of *OsROS1b* causes DNA hypermethylation of *Tos17* retrotransposons in rice, and overexpression of *OsROS1b* extensively reduces DNA methylation of the rice genome [15]. Furthermore, *ROS1* is involved in the seed development of rice, wheat and barley through its epigenetic influence on the accumulation of seed storage proteins (SSPs) [15,16].

**Citation:** Xu, J.-H.; Irshad, F.; Yan, Y.; Li, C. Loss of Function of the RRMF Domain in OsROS1a Causes Sterility in Rice (*Oryza sativa* L.). *Int. J. Mol. Sci.* **2022**, *23*, 11349. https://doi.org/10.3390/ijms231911349

Received: 24 August 2022 Accepted: 19 September 2022 Published: 26 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Pollen fertility is important for successful seed production in flowering plants. However, defective anther development can lead to either the absence of pollen grains or the formation of non-functional pollen grains [17]. Successful development of male reproductive organs includes events such as specification of the meristem, cell differentiation, cell-to-cell communication, meiosis and mitosis [18–21]. Pollen development is controlled specifically by the four sporophytic cell layers of the anther (tapetum, middle layer, endothecium and epidermis), which surround the gametophytic pollen grains [22]. Many genes involved in the development of pollen and anthers have been identified in rice. A plant-specific type III polyketide synthase gene, *OsPKS2*, is involved in normal pollen wall formation, and mutation of *OsPKS2* causes male sterility in rice [23]. The rice tapetum degeneration retardation gene (*TDR*) has an essential role in regulating the transcriptional network for the development and degradation of the tapetum, and mutation of this gene causes male sterility [24,25]. The MEIOSIS ARRESTED AT LEPTOTENE1 (*MEL1*) gene has a crucial role in microsporogenesis, and abnormal accumulation of *MEL1* can produce a semi-sterile phenotype in rice [26]. The cytochrome P450 family member *CYP704B2* is specifically expressed in the tapetum and microspores and is required for the synthesis of cutin monomers; its mutant *cyp704B2* shows a male sterile phenotype with a swollen sporophytic tapetal layer and aborted pollen grains [27]. Pollen development is extremely sensitive to the cellular environment and is regulated by the successive expression of genes specifically expressed in reproductive tissues. Male gametogenesis becomes abnormal and inactive if the expression of genes related to anther and microspore development is imbalanced.

This study aimed to reveal the mechanism by which *OsROS1a* regulates pollen fertility. The CRISPR/Cas9 system was used to generate *OsROS1a* gene editing mutants, and the *osros1a* S6, S7, and S16 mutants exhibited complete male sterility, which resulted from the disrupted Pem-CXXC and RNA recognition motif fold (RRMF) domains. The qRT-PCR results showed that the expression of genes involved in pollen and anther development and in starch biosynthesis decreased in these mutants compared to WT. Furthermore, bisulfite sequencing showed that CG and CHG methylation of the *OsPKS2* gene promoter was significantly increased in the *osros1a* mutants. These findings could provide new insights into how DNA demethylation by *OsROS1a* affects fertility in rice.

#### **2. Results**

#### *2.1. Phylogenetic Analysis of OsROS1a Glycosylase Domain*

*OsROS1a* contains 17 exons that encode a protein of 1952 amino acids (aa), with 3′- and 5′-UTRs of 607 and 73 bp, respectively (Figure 1A). The OsROS1a protein contains three key domains, DNA glycosylase, Per-CXXC and RRMF (Figure 1B), with lengths of 141, 31 and 102 aa, respectively. In the OsROS1a protein sequence, these domains are located at residues 1472–1612 (DNA glycosylase), 1797–1827 (Per-CXXC), and 1831–1932 (RRMF) (Figure 1C).
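The domain lengths quoted above follow directly from the residue coordinates, since an inclusive interval from residue *start* to residue *end* spans *end* − *start* + 1 residues. A minimal Python check, in which the dictionary layout and the function name are illustrative choices rather than part of the original analysis:

```python
# Inclusive residue coordinates of the three domains, as given in the text.
domains = {
    "DNA glycosylase": (1472, 1612),
    "Per-CXXC": (1797, 1827),
    "RRMF": (1831, 1932),
}


def domain_length(start, end):
    """Number of residues in an inclusive interval [start, end]."""
    return end - start + 1


lengths = {name: domain_length(*span) for name, span in domains.items()}
```

The resulting lengths are 141, 31 and 102 aa, matching the values stated in the text.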

*Int. J. Mol. Sci.* **2022**, *23*, x FOR PEER REVIEW 3 of 15

**Figure 1.** Schematic of *OsROS1a* gene structure and protein features. (**A**) Exons and introns of the *OsROS1a* gene are denoted as black blocks and lines, respectively. The translation initiation codon (ATG) and termination codon (TAG) are shown. (**B**) The protein structure of the *OsROS1a* gene. The open reading frame (ORF) is 1952 amino acid residues. (**C**) The protein sequences of the *OsROS1a* gene. The DNA glycosylase, Per-CXXC and RRMF domains are shown in blue, green and red, respectively, and all other residues are shown in black.

To determine the evolutionary relationships of *OsROS1a* across species, its protein sequence was used as a query, and 159 homologous gene copies were identified in 52 plant species. The conserved DNA glycosylase domain (151 amino acids) of the 159 genes was then used to construct a phylogenetic tree with the neighbor-joining (NJ) method (Figure 2). The phylogenetic tree can be clustered into four clades: monocot *ROS1* (mROS1), dicot *DME* (dDME), dicot *ROS1* (dROS1) and dicot *DML2* (dDML2) (Figure 2). Multiple alignment of the deduced amino acid sequences revealed that all homologous *ROS1* genes contain the DNA glycosylase domain. The evolutionary analysis showed that the DNA demethylase gene copy number varied greatly among different species and families, ranging from 1 to 6 in different plants.

**Figure 2.** Phylogenetic analysis of the *ROS1* and *ROS1*-like DNA glycosylase family. A phylogenetic tree based on conserved DNA glycosylase domain was constructed. Monocot ROS1 (mROS1) proteins are shown in red color, while dicot DME (dDME), dicot ROS1 (dROS1) and dicot DML2 (dDML2) proteins are shown in blue, green and black, respectively. Protein sequences were downloaded from Phytozome (https://phytozome-next.jgi.doe.gov) (accessed on 19 October 2018).

The sequence logos of the identified DNA glycosylase domain in all 52 species were generated using the WebLogo program to further verify the conservation of aa residues (Figure 3A). A total of three conserved motifs were determined, including the Helix-hairpin-Helix (H-h-H), GPD and [4Fe-4S] motifs. The motif lengths ranged from 21 to 26 aa (Figure 3B). The [4Fe-4S] cluster motif contained four cysteine residues that hold a [4Fe-4S] cluster and was essential for 5mC excision.



**Figure 3.** Conserved domain analysis of DNA glycosylase. (**A**) The conserved domain analysis of DNA glycosylase by the WebLogo program. The height of each letter indicates the degree of conservation of that amino acid residue at the corresponding position. On the *x*-axis, numbers represent the sequence position in the corresponding conserved domains. The *y*-axis indicates the information content calculated in bits. (**B**) The consensus sequences of three motifs in the glycosylase domain.
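The information content plotted on the *y*-axis of a sequence logo can be sketched as follows: for each alignment column, it is the maximum possible entropy (log<sub>2</sub> 20 bits for proteins) minus the observed Shannon entropy of the residues in that column. The Python sketch below is a simplified illustration only. It assumes a gap-free alignment of equal-length sequences and omits the small-sample correction that logo programs typically apply; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. No small-sample correction is applied.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(sequences):
    """Per-column information content for equal-length, gap-free sequences."""
    return [column_information(column) for column in zip(*sequences)]


# Toy alignment: column 1 is fully conserved, column 2 is fully variable.
toy_alignment = ["CA", "CC", "CD", "CE"]
bits = alignment_information(toy_alignment)
```

In the toy alignment, the invariant cysteine column carries the maximum of log<sub>2</sub> 20 bits, whereas the column with four equally frequent residues carries 2 bits less, which is why conserved residues such as the four cysteines of the [4Fe-4S] motif appear as tall letters in a logo.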

#### *2.2. Generation of OsROS1a Gene Editing Mutants by CRISPR/Cas9*

Previous studies showed that disruption of *OsROS1a* caused severe defects in both male and female gametogenesis, resulting in a sterile phenotype [10,11]. To better characterize the function of *OsROS1a* in active DNA demethylation during pollen and seed development, stable *OsROS1a* gene editing mutants were created using the CRISPR/Cas9 system, and three stable mutants (S6, S7, S16) were obtained. The S6 and S7 mutants carried homozygous mutations at both target sites, while S16 had a homozygous mutation at the first target and a biallelic mutation at the second target (Figure 4A and Figure S1). All mutations caused a frame shift and generated premature stop codons in the thirteenth exon of *OsROS1a* (Figure 4B), with the frame shift starting from the 1798th amino acid. The S6 and S16 mutants created a stop codon at the 1807th amino acid, while S7 terminated translation with a stop codon at the 1820th amino acid because of a one-base deletion in the first target (Figure 4B). Both frame shifts occurred before the Pem-CXXC and RRMF domains and thus altered the function of cytosine DNA demethylation.
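How a one-base deletion shifts the reading frame and introduces a premature stop codon can be illustrated with a toy example. The Python sketch below uses the standard genetic code and a short hypothetical sequence; it is not the *OsROS1a* sequence, and the deletion position is chosen only to make the effect visible.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate from the first base until a stop codon or the end of the sequence.

    A trailing incomplete codon is ignored; the stop codon is not included.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_base(dna, index):
    """Return the sequence with the base at the given 0-based index removed."""
    return dna[:index] + dna[index + 1:]


# Hypothetical wild-type ORF: ATG GCA TTG ACC GGT AAA GCT TAA
wild_type = "ATGGCATTGACCGGTAAAGCTTAA"
# Deleting one base shifts the frame: ATG CAT TGA ...
mutant = delete_base(wild_type, 3)

wild_type_protein = translate(wild_type)
mutant_protein = translate(mutant)
```

In this toy case the wild-type sequence encodes seven residues, whereas the single-base deletion changes the second codon and creates a TGA stop codon at the third position, so everything downstream of the edit is lost. The same logic explains why the frame shifts in the *osros1a* mutants remove the C-terminal domains.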

#### *2.3. The osros1a Mutations Cause Complete Male Sterility*

All three *osros1a* mutants were completely male sterile, exhibiting shrunken spikelets and smaller anthers compared to WT controls (Figure 5A–C). To understand the male sterile phenotype of *osros1a* mutants, we examined the starch-staining capacity of pollen grains and found that the pollen grains of the mutants were not stained by I<sub>2</sub>-KI solution, indicating that they contained little or no starch, whereas WT pollen grains stained dark (Figure 5D–F). The fertile seed percentage was 0% in all mutants compared to 77% in WT controls (Figure 5G). Furthermore, at the harvest stage, agronomic traits were examined and no significant difference was found between the WT and *osros1a* mutants (Figure S2). Plant height increased by only 2.3% and 0.4% in the S6 and S16 mutants, respectively, compared to WT controls. Similarly, the primary branch number per panicle increased by 2.7% and 5.4% in S6 and S16, respectively, compared to WT controls. A 2.9% decrease in panicle length was observed only in S16, and the tiller number decreased by 1.3% only in S16, compared to WT (Figure S2).

**Figure 4.** CRISPR/Cas9-induced *OsROS1a* gene modification in rice. (**A**) Schematic of *OsROS1a* gene structure. Exons and introns are denoted as black blocks and lines, respectively. The translation initiation codon (ATG) and termination codon (TAG) are shown. The recovered mutated alleles are shown below the wild-type reference sequence. The target site nucleotides are indicated in black capital letters. The white dashes with black backgrounds indicate the deleted nucleotides. The white capital letters with black backgrounds indicate the inserted nucleotides. (**B**) Alignment of the protein sequence of *ROS1a* in mutant and wild type. The open reading frame (ORF) is 1952 amino acid residues in length in wild type. The dotted line indicates the position of the frame shift mutation caused by the insertion or deletion of nucleotides due to gene editing in the mutants. The gray color indicates the positions where the premature stop codon was generated in the mutant alleles.

To further investigate the cellular defects in the *osros1a* mutants, we performed transverse section analysis on the anthers of the WT and *osros1a* mutant. Previous studies have divided the development of rice anthers into 14 stages based on the morphological landmarks of cellular events [28,29]. The anther sections were observed at developmental stages 6, 9, 10 and 13 under light microscopy. At stage 6, normal epidermis, endothecium, middle layer, tapetum, and microspore mother cells (MMC) were found in both WT and the *osros1a* S16 mutant (Figure 6A,E). At stage 9, the microspores were released from tetrads, tapetal cells had deeply stained cytoplasm, and the middle layer appeared degenerated and almost invisible in both mutant and WT. However, the microspores of the WT were globular, whereas those of the *osros1a* mutant were irregularly shaped (Figure 6B,F). At stage 10, in contrast to the WT microspores that were vacuolated and round, the *osros1a* mutant microspores appeared degraded and irregularly shaped (Figure 6C,G). At stage 13, the WT anther locule was full of mature pollen grains with completely formed pollen walls and accumulated starch, which stained deeply with toluidine blue. By contrast, the pollen grains of the *osros1a* mutant were aborted (Figure 6D,H), indicating that pollen viability and function are mainly affected by the accumulation of starch and lipids [28,30].


**Figure 5.** The male fertility phenotype of WT and the *osros1a* mutants. (**A**) Comparison of the mature panicles between WT and *osros1a* mutants showed that *osros1a* mutants failed to produce seeds. Scale bars: 10 cm. Comparison of (**B**) the spikelets and (**C**) the anthers between WT and *osros1a* mutants before anthesis. Scale bars: 1 mm. (**D**–**F**) The pollen grains of WT and *osros1a* mutants stained with I<sub>2</sub>-KI. Compared to (**D**) WT, (**E**) S6 and (**F**) S16 failed to form viable pollen. Magnification: 20×. (**G**) The fertile seed percentage in WT and the S6, S7 and S16 mutants. \*\* denotes significant differences from WT at the *p* < 0.01 level.

**Figure 6.** Transverse section analysis of the anther development in WT and *osros1a* S16 mutant. Locules from the anther section of WT and S16 at (**A**,**E**) stage 6 (the microspore mother cells stage), (**B**,**F**) stage 9 (the young microspore stage), (**C**,**G**) stage 10 (the vacuolated pollen stage), and (**D**,**H**) stage 13 (the mature pollen stage). WT sections are shown in (**A**–**D**); *osros1a* S16 mutant in (**E**–**H**). E, epidermis; T, tapetum; En, endothecium; MMC, microspore mother cells; Msp, microspores; MP, mature pollen; DMsp, degenerated microspores; DP, the degenerated pollen. Scale bar: 50 µm.

Sugar supply and starch accumulation are essential for pollen grain development, which in turn affects seed maturation [31–33]. Therefore, the total soluble sugar and starch contents were analyzed in the anthers to investigate whether the *OsROS1a* gene affects sugar and starch synthesis in pollen grains. Soluble sugar content was reduced by 36.2% and 56.5% in the S6 and S16 mutants, respectively, compared to WT controls (Figure 7A), and starch content was similarly and significantly (*p* < 0.01) decreased by 37.1% and 57.9% in the two mutants, respectively (Figure 7B). This was consistent with the pollen staining and transverse section analysis (Figures 5 and 6), confirming that the *osros1a* mutation caused complete male sterility in rice.

**Figure 7.** Soluble sugar and starch contents in mature anthers of WT and *osros1a* mutants. (**A**) Total soluble sugar content in mature anthers of WT and *osros1a* mutants. (**B**) Starch content of mature anthers in WT and *osros1a* mutants. Anthers were sampled at stage 13 (the mature pollen stage). Data are means ± SD (*n* = 3). \*\* denotes significant differences from WT at the *p* < 0.01 level.


#### *2.4. Reduced Expression of Genes Involved in Anther and Pollen Development in Osros1a Mutants*

To identify the potential targets responsible for the *osros1a* mutant phenotype, qRT-PCR was performed to measure the expression of anther and pollen development-related genes in young panicles. The expression of *OsPKS2* and *CYP704B2* was reduced in S6 and S16 mutants compared to WT controls (Figure 8B,D), suggesting that *OsROS1a* may be involved in the activation of these genes through DNA demethylation. Interestingly, *OsROS1a* expression in the mutants was half that of WT controls (Figure 8A), suggesting that the frame-shift mutations that altered the Pem-CXXC and RRMF domains could cause the sterility phenotype in rice.

The expression of genes related to soluble sugar and starch synthesis was further investigated to validate the soluble sugar and starch content results. The expression of *CSA* was decreased in mutants compared to WT controls (Figure 8G), whereas the expression of the other genes was not significantly changed (Figure 8C,E,F,H). These results are consistent with the pollen staining and transverse section analysis (Figures 5 and 6), showing that the *osros1a* mutation promotes the pollen sterility phenotype.

#### *2.5. DNA Hypermethylation of OsPKS2 Gene Promoter in Osros1a Mutants*

To reveal the function of *OsROS1a* in DNA demethylation in pollen, the DNA methylation level of a 221 bp region of the *OsPKS2* gene promoter was investigated (Figure 9A). The methylation levels at both CG and CHG sites of the *OsPKS2* promoter were increased in *osros1a* mutants compared to WT controls (Figure 9B,C). At CG residues, methylation levels of 96.15% and 100% were observed in S6 and S16 mutants, respectively, compared with only 51.79% in WT controls. At CHG residues, methylation levels increased from 54.29% in WT controls to 66.15% and 67.69% in S6 and S16 mutants, respectively. However, no obvious difference was observed at CHH residues between WT and S16, whilst a decrease was observed in S6 mutants (Figure 9B,C). These results suggest that DNA hypermethylation in the promoter region of *OsPKS2* may repress its expression. In contrast, DNA methylation of the *TDR* promoter was not significantly changed between WT and *osros1a* mutants in any of the three contexts (Figure S3).

**Figure 8.** Expression patterns of anther and pollen development-related genes (**A**–**E**) and starch synthesis-related genes were analyzed by qRT-PCR. The young panicles (at stage 10, the vacuolated pollen stage) of WT, *osros1a* S6 and S16 mutants were sampled for RNA extraction. Data are shown as means ± standard deviations, and *OsACTIN* was used as an internal control.

**Figure 9.** The DNA methylation of *OsPKS2* gene promoter in young panicles. (**A**) The diagram of bisulfite sequencing of 221 bp *OsPKS2* promoter. (**B**) The statistics of bisulfite sequencing of 221 bp *OsPKS2* promoter in WT and *osros1a* mutants. (**C**) The dot plot comparison of DNA methylation between WT and *osros1a* mutants.

#### **3. Discussion**

Loss of function of the *OsROS1a* gene caused complete male sterility with degenerated pollen (Figures 5 and 6), which is consistent with previous studies [10–12]. Starch is the key storage reserve in mature pollen grains of cereal crops and is important for pollen germination and pollen tube growth, supplying energy and carbon skeletons. In pollen grains, starch granule accumulation starts at stage 11 and continues throughout stage 12 until maturation [29]. The *OsBZR1* gene is expressed in anther vascular tissue, tapetum and developing seeds, and *OsBZR1* directly promotes *CSA* expression, thus activating downstream gene expression [30]. *CSA* is a key transcriptional regulator of sugar partitioning during male reproductive development in rice, and mutation in *CSA* results in reduced carbohydrate levels in later anthers and male sterility [33]. Deficiency of starch synthesis in pollen grains can cause male sterility [34–37]. Our study showed that the soluble sugar and starch contents were significantly reduced in pollen from *osros1a* mutants compared to WT controls (Figure 7), which could be the reason for the male sterility caused by disrupted *OsROS1a* gene function.

Previous research has revealed that *OsPKS2* [23], *TDR* [25], *MEL1* [26], and *CYP704B2* [27] are involved in anther and pollen development and that mutations in these genes cause male sterility in rice. The expression of *OsPKS2* and *CYP704B2* was reduced in *osros1a* mutant anthers compared to WT controls (Figure 8), suggesting that *OsROS1a* may be involved in the activation of these genes; loss of function of *OsROS1a* could lead to tapetum degeneration and abnormal pollen wall formation, causing male sterility (Figures 5 and 6). The DNA demethylase genes contain a conserved DNA glycosylase domain, and DNA glycosylases can be classified into monofunctional and bifunctional types. In plants, the DNA glycosylase is bifunctional, removing the 5-meC base and then cleaving the DNA backbone at the abasic site [38]. ROS1 is a bifunctional DNA glycosylase that executes DNA demethylation, especially of gene promoters, through a base excision repair pathway [8,38]. The reduced expression of pollen and anther development-related genes and of starch- and soluble sugar-related genes in *osros1a* mutants suggests that *OsROS1a* may regulate the activation of these genes by controlling their DNA methylation. Bisulfite sequencing showed that CG and CHG methylation, but not CHH methylation, in the *OsPKS2* gene promoter was significantly increased in the *osros1a* mutants compared to WT controls (Figure 9), suggesting that *OsROS1a* facilitates CG and CHG demethylation. However, DNA methylation of the *TDR* promoter was not significantly changed between WT and *osros1a* mutants (Figure S3), indicating that the demethylation action of the *OsROS1a* gene on pollen and anther development-related genes is not global but gene-specific, as was also found for SSP genes in maize [39].

The expression of the *OsROS1a* gene in the mutants was half that of WT controls (Figure 8), and proteins of 1807, 1807 and 1820 amino acids were predicted for S6, S16 and S7, respectively, as a result of frame-shift mutations that truncate the Pem-CXXC and RRMF domains (Figures 1 and 3). The whole DNA glycosylase domain, followed by an EndIII\_4Fe-4S domain and two putative nuclear localization signals, remains intact [15]. The C-terminal region of DME and ROS1, including the divergent, circularly Per-CXXC and RRMF domains, is conserved [40] and functions in excising 5-meC in vitro [5,41,42]. The zinc finger CxxC (ZF-CxxC) in DNA methyltransferase 1 (*DNMT1*) can block the catalytic activity of *DNMT1* specifically on non-methylated DNA [43]. Furthermore, the *OsROS1a* knock-in mutant disrupted the whole gene, affecting both male and female gametophytes [10], while truncation of the RRMF domain in *OsROS1a* knock-out mutants generated with CRISPR/Cas9 induced pollen and embryo sac defects in rice [11]. Recently, a 75 bp deletion that caused the complete loss of the Per-CXXC domain but retained the whole RRMF domain was shown to produce partially sterile pollen in rice [44]. All these results suggest that the RRMF domain in OsROS1a could be the main determinant of sterility in rice. RNA-Seq and whole-genome bisulfite sequencing in future work should clarify the molecular mechanism by which the *OsROS1a* gene affects the sterility phenotype in rice through its DNA demethylation function.

#### **4. Materials and Methods**

#### *4.1. Identification of OsROS1a Homologous Genes and Phylogenetic Analysis*

The gene structure (exon-intron distribution) and protein (CDS) sequence of the DNA demethylase gene *OsROS1a* were obtained from Phytozome (https://phytozome-next.jgi.doe.gov/) (accessed on 19 October 2018). The positions of conserved domains in the OsROS1a protein were identified using InterPro. The protein sequence of OsROS1a was used to identify homologous gene copies in 52 plant species by BLAST search. The protein sequences of all identified *OsROS1a* homologs were aligned with MUSCLE using the default settings, with manual adjustments. Phylogenetic analysis was performed by the neighbor-joining (NJ) method using MEGA-X with the default parameters, and a bootstrap test with 1000 replicates was used to assess the confidence of the evolutionary tree [45]. The conserved DNA glycosylase domains (151 amino acid sequences) were analyzed with the WebLogo program (http://weblogo.berkeley.edu/) (accessed on 15 January 2019).

#### *4.2. OsROS1a Gene Editing Mutants Generated by CRISPR/Cas9*

The CRISPR/Cas9 system was used for generating *osros1a* mutants [46]. Two single guide RNAs (sgRNAs) were designed to precisely target the thirteenth exon of the *OsROS1a* gene by using the web tool (http://cbi.hzau.edu.cn/crispr/) (accessed on 19 December 2016), which were ligated with rice OsU6a and OsU6b small nuclear RNA promoters, respectively. The sgRNAs were cloned into a binary vector that contained the sgRNA and Cas9 expression cassettes. Then the binary vector was transferred into rice (Nipponbare) calluses by following the method of *Agrobacterium*-mediated transformation of *Japonica* Rice.
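The core step that sgRNA design tools automate can be sketched as a scan for candidate target sites. The minimal Python example below assumes an SpCas9-style NGG PAM and a 20-nt protospacer, neither of which is stated in the text, and searches the forward strand only. The function name and the example sequence are hypothetical; this is not the algorithm of the web tool cited above, and it performs no off-target or efficiency scoring.

```python
import re

def find_protospacers(seq, spacer_len=20):
    """Return (start, protospacer, pam) for each NGG PAM site on the forward strand.

    Assumption: SpCas9-style targeting (protospacer immediately 5' of an NGG PAM).
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Hypothetical 30-nt fragment for illustration; not taken from OsROS1a.
example = "ATGCATGCATGCATGCATGCAGGTTTTTTT"
for start, spacer, pam in find_protospacers(example):
    print(start, spacer, pam)
```

A real design workflow would also scan the reverse complement and filter candidates by predicted off-target sites, which is what dedicated tools provide.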

Genomic DNA was isolated from the leaves of the transgenic rice plants to detect the mutation types. Cas9-specific primers were used to identify successfully transformed plants by PCR (gel electrophoresis image in Figure S4A). Target gene-specific primers were used to amplify and sequence the DNA fragments containing the target sequences (Figure S4B). DSDecode (http://skl.scau.edu.cn/dsdecode/) (accessed on 12 September 2017), a web-based tool, was then used to further analyze the genotypes of the targeted mutations.

#### *4.3. Pollen Fertility Examination and Histochemical Assay*

Floral organs were photographed with a dissecting microscope. Anthers from the wild type (WT) and *osros1a* mutants were sampled from the spikelets just before flowering, and a 1% iodine-potassium iodide (I2-KI) solution was used to stain the pollen grains. A Nikon Eclipse Ni fluorescence microscope was used to visualize and photograph the stained pollen grains. For transverse section analysis, spikelets at various anther development stages, defined according to a previous study [29], were collected and frozen in OCT compound (Tissue-Tek) (Sakura Finetek, Torrance, CA, USA). Samples were sectioned with a freezing microtome (Shandon Cryotome FE) (ThermoFisher Scientific, Waltham, MA, USA). Then, 0.05% toluidine blue dye was used to stain the cross sections, and microscopic images were captured with a Nikon Eclipse Ni fluorescence microscope.

#### *4.4. Total Soluble Sugar and Starch Contents Analysis*

Anthers (0.05 g) from WT and *osros1a* mutants were weighed and ground in 5 mL of 80% ethanol, and soluble sugar was measured by the anthrone method [47]. The samples were centrifuged at 8000 rpm and 1 mL of supernatant was collected; the pellets were retained for starch analysis. Anthrone reagent (5 mL) was mixed with the 1 mL of supernatant and heated at 100 °C in a water bath for 10 min. The mixture was cooled, and the sugar content was then determined with a spectrophotometer at a wavelength of 620 nm. For starch analysis, the pellets were digested with perchloric acid and the starch content was determined spectrophotometrically at 630 nm.
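Converting an absorbance reading into a sugar content typically goes through a standard curve. The sketch below assumes a linear glucose standard curve, which the text does not describe, and assumes that standards and samples go through the same anthrone reaction so that no extra dilution factor is needed. The standard values, the absorbance reading and all function names are hypothetical; only the 5 mL extract volume and 0.05 g sample mass come from the protocol above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sugar_mg_per_g(absorbance, slope, intercept, extract_ml, fresh_weight_g):
    """Convert an absorbance reading to mg sugar per g tissue via the standard curve."""
    conc_mg_per_ml = (absorbance - intercept) / slope
    return conc_mg_per_ml * extract_ml / fresh_weight_g

def percent_reduction(wt_value, mutant_value):
    """Reduction of the mutant relative to WT, in percent."""
    return 100.0 * (wt_value - mutant_value) / wt_value

# Hypothetical glucose standards (mg/mL) and their absorbance at 620 nm.
std_conc = [0.0, 0.02, 0.04, 0.06, 0.08]
std_abs = [0.0, 0.1, 0.2, 0.3, 0.4]
slope, intercept = fit_line(std_conc, std_abs)

# Hypothetical sample reading; 5 mL extract from 0.05 g anthers as in the protocol.
content = sugar_mg_per_g(0.25, slope, intercept, extract_ml=5.0, fresh_weight_g=0.05)
print(round(content, 3))
```

The `percent_reduction` helper mirrors how reductions relative to WT, such as those reported in Figure 7, are computed from two content values.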

#### *4.5. Gene Expression by RT-PCR Analysis*

Total RNA was extracted from young panicles of WT and *osros1a* mutants using TRIzol® Reagent (ThermoFisher Scientific, Waltham, MA, USA). The PrimeScript™ RT reagent Kit with gDNA Eraser (Takara, Japan) was used to eliminate genomic DNA residues, and cDNA was then synthesized. For RT-PCR reactions, the first-strand cDNA was used with gene-specific primers, and the *OsACTIN* gene was used as an internal control (Table S1).
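The text does not state how relative expression was calculated from the qRT-PCR data. One widely used approach is the 2^(-ΔΔCt) method, sketched below under the assumptions that amplification efficiency is close to 100% and that a single reference gene (here standing in for *OsACTIN*) is used. The Ct values and the function name are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample relative to a control (2^-ddCt).

    Assumption: ~100% amplification efficiency for both target and reference.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)

# Hypothetical Ct values: the target crosses threshold one cycle later in the mutant.
fold = relative_expression(ct_target_sample=25.0, ct_ref_sample=18.0,
                           ct_target_control=24.0, ct_ref_control=18.0)
print(fold)  # 0.5
```

With these illustrative values, a one-cycle delay after normalization corresponds to half the WT expression level.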

#### *4.6. DNA Methylation Analysis*

For bisulfite sequencing of the *OsPKS2* and *TDR* gene promoters, genomic DNA was extracted from young panicles by the CTAB method. The EpiTect Bisulfite Kit from Qiagen was used for bisulfite conversion of genomic DNA following the manufacturer's instructions. The primers were designed using Methyl Primer Express v1.0 (Applied Biosystems) (Table S1). DNA samples were amplified by PCR and separated on a 1.5% agarose gel. PCR products were then purified with a gel purification kit, cloned into the pMD19-T vector and sequenced. The Kismeth tool (http://katahdin.mssm.edu/kismeth/revpage.pl) (accessed on 21 October 2019) was used for DNA methylation analysis.
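The logic behind per-context methylation percentages can be illustrated with a short sketch. After bisulfite conversion, unmethylated cytosines read as T while methylated cytosines remain C, so each reference cytosine is classified as CG, CHG or CHH (H = A, C or T) and the fraction of clones retaining C is counted. The code below is a simplified illustration and not the Kismeth algorithm: it assumes forward-strand clone reads already aligned to the reference without gaps, and the sequences and function names are hypothetical.

```python
def cytosine_context(ref, i):
    """Classify the cytosine at ref[i] as 'CG', 'CHG' or 'CHH' (None if undetermined)."""
    nxt = ref[i + 1:i + 3]
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) < 2:
        return None  # too close to the end to tell CHG from CHH
    return "CHG" if nxt[1] == "G" else "CHH"

def methylation_percent(ref, reads):
    """Percent methylation per context from gap-free, pre-aligned bisulfite reads."""
    counts = {"CG": [0, 0], "CHG": [0, 0], "CHH": [0, 0]}  # [methylated, total]
    for i, base in enumerate(ref):
        if base != "C":
            continue
        ctx = cytosine_context(ref, i)
        if ctx is None:
            continue
        for read in reads:
            if read[i] == "C":      # protected from conversion: methylated
                counts[ctx][0] += 1
                counts[ctx][1] += 1
            elif read[i] == "T":    # converted: unmethylated
                counts[ctx][1] += 1
    return {ctx: (100.0 * m / t if t else None) for ctx, (m, t) in counts.items()}

# Hypothetical reference and two clone reads; not the OsPKS2 promoter.
reference = "ACGACAGTCTTA"
clones = ["ACGACAGTTTTA", "ACGATAGTTTTA"]
print(methylation_percent(reference, clones))
```

In this toy example the single CG site is methylated in both clones, the CHG site in one of two, and the CHH site in neither.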

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms231911349/s1, Figure S1: DNA sequencing chromatograms of *osros1a* mutants; Figure S2: Phenotype comparison of the WT with *osros1a* mutants; Figure S3: The DNA methylation of the *TDR* gene promoter in young panicle; Figure S4: PCR of *OsROS1a* gene editing plants (A) with Cas9 primers and (B) with knockout target-specific primers; Table S1: Primers used in this study.

**Author Contributions:** Conceptualization, J.-H.X.; methodology, J.-H.X., F.I., and C.L.; validation, C.L. and Y.Y.; formal analysis, J.-H.X., F.I., and C.L.; investigation, J.-H.X.; resources, J.-H.X.; data curation, J.-H.X., F.I., Y.Y. and C.L.; writing—original draft preparation, F.I. and C.L.; writing—review and editing, J.-H.X. and C.L.; supervision, J.-H.X.; funding acquisition, J.-H.X. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the projects from Zhejiang Zhengjingyuan Pharmacy Chain Co., Ltd. (H20151699 and H20151788) and Shandong (Linyi) Institute of Modern Agriculture, Zhejiang University.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We are grateful to Xinxin Li for her obtaining *osros1a* mutants, and Yao-Guang Liu for kindly providing the CRISPR/Cas9 binary vector.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Review* **mRNA Localization to the Endoplasmic Reticulum in Plant Endosperm Cells**

**Laining Zhang <sup>1</sup> , Qidong Si <sup>1</sup> , Kejie Yang <sup>1</sup> , Wenwei Zhang <sup>1</sup> , Thomas W. Okita 2,\* and Li Tian 1,\***


**Abstract:** Subcellular mRNA localization is an evolutionarily conserved mechanism to spatially and temporally drive local translation and, in turn, protein targeting. Hence, this mechanism achieves precise control of gene expression and establishes functional and structural networks during cell growth and development as well as during stimuli response. Since its discovery in ascidian eggs, mRNA localization has been extensively studied in animal and yeast cells. Although our knowledge of subcellular mRNA localization in plant cells lags considerably behind other biological systems, mRNA localization to the endoplasmic reticulum (ER) has also been well established since its discovery in cereal endosperm cells in the early 1990s. Storage protein mRNA targeting to distinct subdomains of the ER determines efficient accumulation of the corresponding proteins in different endosomal storage sites and, in turn, underlies storage organelle biogenesis in cereal grains. The targeting process requires the presence of RNA localization elements, also called zipcodes, and specific RNA-binding proteins that recognize and bind these zipcodes and recruit other factors to mediate active transport. Here, we review the current knowledge of the mechanisms and functions of mRNA localization to the ER in plant cells and address directions for future research.

**Keywords:** mRNA localization; RNA-binding proteins; zipcode RNA; storage proteins

## **1. Introduction**

Asymmetric distribution of mRNAs was first discovered in ascidian eggs and embryos in 1983 [1], where β-actin mRNA was further observed to be specifically localized at the site where muscle-forming cells reside [2]. This observation, followed by investigations of maternal mRNA distributions in *Xenopus* and *Drosophila* oocytes [3–5], supported the previous proposal of prelocalized RNAs during early development [6,7] and raised the hypothesis that specific mRNA pools localize to particular subcellular areas to determine cell fate and tissue differentiation. The concept of subcellular mRNA localization was then proposed in 1986 by Lawrence and Singer [8], who applied the in situ hybridization technique to locate mRNAs in chicken fibroblasts. This phenomenon was subsequently observed in neurons [9], rice endosperm cells [10], and oligodendrocytes [11]. With the continual findings of mRNA localization in animals, plants, yeast, algae, and even bacteria [12], mRNA localization is proposed to be an ancient, prevalent, universal, and highly conserved mechanism.

Today, mRNA localization is referred to as a mechanism where mRNAs are specifically localized to discrete subcellular compartments. By creating local translation hotspots, mRNA localization provides a highly efficient process to concentrate newly synthesized proteins within a defined intracellular region. mRNA localization enables cells to fine-tune cell polarization, differentiation, and migration as well as quickly respond to intracellular and environmental stimuli.

In eukaryotic cells, localization of mRNAs involves multiple events in both the nucleus and cytoplasm. The process is initiated in the nucleus, where cis-acting elements within the RNAs, also called zipcode RNA elements, are recognized and bound by specific trans-acting factors, typically RNA-binding proteins (RBPs), to form an initial ribonucleoprotein (RNP) complex. The association of mRNAs with some of these RBPs may be maintained during RNA processing and maturation to form a primary messenger ribonucleoprotein (mRNP) complex competent for export from the nucleus. Once transported to the cytoplasm, the primary mRNP complex is remodeled by removing and recruiting one or more protein factors and linking to a cytoskeleton motor protein to initiate mRNA transport. When the mRNP complex arrives at the target location, further remodeling is required to activate local translation, turnover or storage. Thus, mRNA localization is a highly dynamic process. While zipcode elements are an essential prerequisite to determine the localization of mRNAs, multiple RBPs play extremely important roles throughout the process.

**Citation:** Zhang, L.; Si, Q.; Yang, K.; Zhang, W.; Okita, T.W.; Tian, L. mRNA Localization to the Endoplasmic Reticulum in Plant Endosperm Cells. *Int. J. Mol. Sci.* **2022**, *23*, 13511. https://doi.org/10.3390/ijms232113511

Academic Editors: Jinsong Bao and Jianhong Xu

Received: 6 September 2022 Accepted: 31 October 2022 Published: 4 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In the past four decades, subcellular mRNA localization has been extensively studied in yeast and metazoan cells, and much of our current understanding of mRNA targeting emanates from research on these organisms. Although mRNA localization in plants was discovered in the early 1990s, our understanding of its basis remains limited compared with other systems. The main reason lies in the structural characteristics of plant cells, which usually contain one or more large central vacuoles that squeeze the cytoplasm to the periphery of the cell and thus impede effective observation of mRNA localization in a defined area. To date, the phenomenon in plant cells has only been described for mRNA targeting to the cortical endoplasmic reticulum (ER) in cytoplasm-rich rice endosperm cells [13], mRNA enrichment on the mitochondrial surface in *Arabidopsis* [14–16] and potato [17], and localization of viral RNAs on the chloroplast envelope [18–20]. Among these different plant systems, localization of storage protein mRNAs to distinct ER subdomains in rice endosperm cells is relatively well established compared to bulk mRNA localization to mitochondria and viral mRNA localization to chloroplasts. To assist further research on mRNA localization in plant cells, we summarize our current studies and knowledge of mRNA localization to the ER, using rice endosperm cells as a model system.

#### **2. mRNA Targeting to the ER Subdomains Is Driven by Specific RNA Zipcodes**

The early view of protein translation in eukaryotes assumed that after nuclear export to the cytoplasm, mRNAs were translated at random locations within the cytosol. Proteins were localized to specific cellular compartments or organelles by peptide-based determinants. N-terminal transit peptides (TP) served as the targeting signals to the chloroplast and mitochondria, while signal peptides directed the growing nascent polypeptide chain during protein synthesis to the ER. In this latter instance, a signal recognition particle (SRP) recognizes and binds to the signal peptide of the nascent polypeptide to direct the association of a complex of mRNA, ribosome and nascent polypeptide chain to the ER membrane. Signal peptides were considered to be necessary and sufficient information to target proteins to the ER in both plant and animal cells [21–23].

In rice endosperm cells, large amounts of the storage proteins prolamine, glutelin and α-globulin are synthesized on the ER, where these proteins are translocated to the lumen. Prolamines are retained in the lumen, where they co-assemble as intracisternal granules, which mature into an organelle labeled as ER-derived protein body-I (PB-I, prolamine) [24]. By contrast, glutelins and α-globulins are exported from the ER lumen to the Golgi and then transported to protein storage vacuoles (PSV, containing glutelin and α-globulin) to form PB-II (Figure 1A) [24]. Due to the presence of signal peptide elements in these storage proteins, their synthesis on the ER and packaging into PB-I and PB-II were initially thought to be dependent on their signal peptides. In 1993, using in situ hybridization at the electron microscopy level, the Okita laboratory reported that *prolamine* and *glutelin* mRNAs were localized on two distinct subdomains of the ER [10]. *Prolamine* mRNAs were localized on the ER (PB-ER) that delimits PB-I, while *glutelin* mRNAs were distributed to the adjoining cisternal ER (cis-ER) (Figure 1A). Later, using an optimized in situ RT-PCR technique, the group further discovered that when the translation initiation codon or signal peptide sequences were removed, *prolamine* mRNAs remained targeted to the PB-ER [25]. These studies indicate that storage protein mRNAs are not randomly localized on the ER and that the localization process is RNA-based. This hypothesis was substantiated by expressing exogenous reporter genes containing *prolamine* or *glutelin* RNA sequences positioned at their 3′ UTR. The *β-glucuronidase (GUS)* mRNA, which by itself is normally targeted to the cis-ER, was redirected by *prolamine* RNA sequences to the PB-ER [25], while *GFP* mRNAs containing *prolamine* or *glutelin* RNA sequences were localized to the PB-ER and cis-ER, respectively [26].

**Figure 1.** Schematic model of mRNA transport to the cortical ER in developing rice endosperm cells. (**A**) Working model of three pathways to transport mRNAs to distinct ER subdomains in wildtype. The localization of storage protein mRNAs initiates in the nucleus, where newly transcribed mRNAs are recognized and bound by various sets of specific RBPs, forming heterogeneous nuclear ribonucleoprotein (hnRNP) complexes. After export from the nucleus to cytoplasm, the complexes undergo dynamic remodeling to form mRNPs that recruit molecular motor proteins for active transport to distinct ER subdomains, likely on actin filaments (grey lines). *Glutelin* mRNAs (red curve line) are targeted to the cis-ER via endosomal trafficking. Following translation, proglutelins are transported via the Golgi complex to the irregularly shaped PSV, where they are proteolytically processed to acidic and basic subunits and accumulated in the crystalline regions of the PSV (shown in red). *Prolamine* (blue curve line) and *α-globulin* (orange curve line) mRNAs are transported to the PB-ER, where the mRNAs are translated and newly synthesized prolamine polypeptides are assembled as an intracisternal granule to form multi-layered PB-I. The synthesized α-globulins are rapidly exported to the Golgi for subsequent transport, via dense vesicles, to the peripheral area of the PSVs. An additional default pathway transports zipcode-less mRNAs (green curve line) to the cis-ER. (**B**) Mutations of key RBPs or factors induce mistargeting of storage protein mRNAs and, in turn, their proteins. Mutants carrying mutations in *RBP-P*, *RBP-L* or *Tudor-SN* mistarget both *prolamine* and *glutelin* mRNAs and cause changes in the shape and/or protein components of PB-I and PSVs. *Got1B* mutations disrupt targeting of *prolamine* mRNAs to the PB-ER. Due to the dysfunction of endosomal factor Rab5a and its effector Rab5a-GEF in *glup6* and *glup4* mutants, respectively, mRNPs (light grey dots) carrying *glutelin* mRNAs are partially mislocalized from the cis-ER to the PB-ER and paramural bodies (PMBs). As endosomes are also involved in glutelin and *α*-globulin protein trafficking from the Golgi to PSVs, both α-globulin (red dots) and proglutelin (orange dots) proteins are found to partially mislocalize in PMBs.

The direct targeting of mRNAs to the ER is best established in yeast. A set of mRNAs, including *ASH1* mRNA, are co-transported on tubular ER that moves to the emerging bud or daughter cell [27–30]. The process is driven by multiple cis-acting elements within the mRNA sequences [29]. To investigate the cis-acting elements within the storage protein mRNAs, a series of transgenic rice lines carrying a reporter gene supplemented with partial storage protein mRNA sequences was constructed [26,31,32]. Exogenous GFP, whose mRNA was localized on the cis-ER by a default pathway, was used as a reporter gene to investigate *prolamine* localization elements [26]. The addition of various 5′ and 3′ deletions of the *prolamine* sequences led to the identification of two apparent cis-acting elements, one located downstream of the signal peptide coding sequence (zipcode 1) and the other in the 3′ UTR (zipcode 2, Figure 2A) [26]. The presence of a single zipcode resulted in only partial localization of *prolamine* mRNA to the PB-ER. Thus, two cis-acting elements, which share a conserved U-rich motif (Figure 2A), are required for restricted *prolamine* mRNA localization.

A similar strategy was applied to identify *glutelin* mRNA localization elements. When the maize *δ-zein* mRNA, a member of the cereal prolamine superfamily [33], was used as a reporter gene, two short sequences located at the 5′ and 3′ ends of the coding region as well as the 3′ UTR of *glutelin* mRNA were sufficient to redirect *δ-zein* mRNA from the PB-ER to the cis-ER (Figure 2B) [31]. These observations suggest that *glutelin* mRNA contains three cis-localization elements and that the glutelin zipcodes are dominant over the *δ-zein* mRNA zipcodes. Further sequence analysis suggests that these glutelin zipcode RNAs contain two conserved motifs (Figure 2B), with the U-rich motif 2 showing some homology to the *prolamine* zipcode [31].

In addition to prolamines and glutelins, rice endosperm cells accumulate small amounts of α-globulins. These saline-soluble proteins are synthesized on the ER membrane, processed by the Golgi and ultimately deposited together with glutelin in the PSV. Interestingly, in situ RT-PCR analysis shows that *α-globulin* mRNAs are distributed on the PB-ER (Figure 1A) and not on the cis-ER [32]. Placing the *α-globulin* sequence in the 3′ UTR of the *GFP* RNA likewise redirects the hybrid RNA from the cis-ER to the PB-ER [32], suggesting that the *α-globulin* mRNA sequence contains cis-acting elements for localization on the PB-ER. Sequence analysis revealed that *α-globulin* mRNA sequences possess three candidate zipcodes, located in both the coding and non-coding regions, that share high similarity with *prolamine* zipcode RNAs [32]. Collectively, these findings suggest that both *prolamine* and *α-globulin* mRNAs are directed to the cortical ER by specific zipcode RNAs.

Based on the results of these studies, three mRNA targeting pathways to the cortical ER exist in rice endosperm cells (Figure 1A). While *prolamine* and *α-globulin* mRNAs are localized to the PB-ER, *glutelin* mRNAs are targeted to the cis-ER. Both pathways are zipcode RNA-dependent. The third pathway is a default zipcode-independent pathway, which mediates RNAs, including *GFP* and GUS mRNAs, to the cis-ER. The three pathways are not independent but instead hierarchal and inter-related (Figure 3). The glutelin pathway is dominant, as its zipcodes can redirect the transport of *prolamine* and *α-globulin* mRNAs from the PB-ER to the cis-ER [27,28] (Figure 3). In turn, *prolamine* and *α-globulin* mRNAs are able to redirect *GFP* RNA from the default cis-ER to the PB-ER [26,32].
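The hierarchy described above can be summarized as a simple decision rule. The following Python sketch is purely illustrative: the function name `predict_destination`, the string labels for the zipcode classes, and the `Destination` enumeration are assumptions introduced here, not part of any published tool. The sketch is also qualitative, so it ignores the partial localization observed when only a single zipcode is present.

```python
from enum import Enum


class Destination(Enum):
    """ER subdomains named in the text."""
    CIS_ER = "cis-ER"
    PB_ER = "PB-ER"


# Illustrative labels for the zipcode classes discussed in the text.
GLUTELIN = "glutelin"
PROLAMINE = "prolamine"
ALPHA_GLOBULIN = "alpha-globulin"


def predict_destination(zipcodes):
    """Predict the ER subdomain for an mRNA from the zipcode classes it carries.

    Encodes the reported hierarchy:
    glutelin zipcodes > prolamine/alpha-globulin zipcodes > default pathway.
    """
    present = set(zipcodes)
    if GLUTELIN in present:
        # Glutelin zipcodes are dominant and direct the RNA to the cis-ER.
        return Destination.CIS_ER
    if present & {PROLAMINE, ALPHA_GLOBULIN}:
        # Prolamine or alpha-globulin zipcodes override the default pathway.
        return Destination.PB_ER
    # Zipcode-less RNAs (e.g., GFP) follow the default pathway to the cis-ER.
    return Destination.CIS_ER


if __name__ == "__main__":
    for rna, codes in [
        ("GFP", []),
        ("GFP + prolamine zipcodes", [PROLAMINE]),
        ("delta-zein + glutelin zipcodes", [PROLAMINE, GLUTELIN]),
    ]:
        print(f"{rna}: {predict_destination(codes).value}")
```

In this sketch, a hybrid RNA carrying both prolamine-type and glutelin zipcodes is routed to the cis-ER, mirroring the reported redirection of *δ-zein* mRNA by glutelin zipcodes.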

**Figure 2.** RNA zipcodes identified in *prolamine* (**A**) and *glutelin* (**B**) mRNAs (adapted from [34]). (**A**) The proximate locations (**left**) and the consensus motif sequence (**right**) of the two zipcode elements in *prolamine* mRNA. *Prolamine* zipcode elements are located in the coding region and 3′ UTR and consist of only a single zipcode motif (\*). (**B**) The proximate locations (**top**) and the consensus motif sequences (**bottom**) of the zipcode elements in *glutelin* mRNA. Glutelin mRNAs possess three zipcodes (zipcode 1, 2, 3), which consist of two motifs, zipcode motif 1 (orange triangles) and zipcode motif 2 (magenta triangles). 5′ and 3′ denote 5′ UTR and 3′ UTR, respectively.

**Figure 3.** The hierarchal relationship among the three mRNA transport pathways in developing rice endosperm (adapted from [35]). The *glutelin* mRNA localization pathway to the cis-ER (red) is dominant over the pathway targeting *prolamine/α-globulin* mRNAs to the PB-ER (blue), which, in turn, is dominant over the default pathway (gray). Studies from mutant rice lines expressing defective Got1 (*glup2*), Rab5 (*glup4*) and Rab5-GEF (*glup6*) indicate that membrane trafficking mediates the mRNA transport to the PB-ER or cis-ER. Another rice mutant, *glup5*, possessing an undefined genetic defect, misdirects *α-globulin* mRNAs to the cis-ER without affecting the localization of *prolamine* mRNAs to the PB-ER. The key RBPs responsible for *glutelin* and *prolamine* mRNA localization are marked under each pathway. The three RNA-transport pathways may share some common RBPs or factors for mRNA targeting to the cortical ER membrane, while additional specific factors are required for selective transport during each pathway.

#### **3. mRNA Targeting to the ER Subdomains Requires a Set of Trans-Acting RBPs**

Following the identification of cis-acting elements in RNAs, identification of trans-acting factors, mainly RNA-binding proteins (RBPs), was pursued. Two strategies were used to identify the RBPs required for *prolamine* and *glutelin* mRNA localization, respectively. Affinity chromatography using *prolamine* zipcode RNA as bait to pull down interacting RBPs was initially applied [36]. Fifteen unique RBPs with specific binding affinity to the *prolamine* zipcode were selectively captured under highly stringent washing and elution conditions. Five of these RBPs, A, I, J, K, and Q, were heterogeneous nuclear ribonucleoproteins (hnRNPs) containing two RNA recognition motifs (RRMs) and were selected for further functional analysis [37]. All five RBPs have binding capability to *prolamine* zipcode RNAs. They form multiple complexes in the nucleus and cytoplasm, suggesting that they mediate various steps during *prolamine* mRNA transport and localization (Figures 4 and 5). RBPs A-J-K and I-J-K assemble into two complexes associated with *prolamine* zipcodes in both the nucleus and cytoplasm, while RBP-Q is involved in the formation of an undefined third complex in the nucleus that is released in the cytoplasm.

**Figure 4.** A proposed working model of five RBPs involved in *prolamine* mRNA localization (adapted from [37]). Five RBPs, A, I, J, K, and Q, assemble into at least three different RBP complexes that recognize and bind to *prolamine* zipcodes. RBPs A, I, J and K form two cytoplasmic complexes, A-J-K and I-J-K. These two RBP complexes together with a third complex containing RBP-Q may also be present in the nucleus. In the nucleus, RBPs I and J may be associated with other proteins preventing their recognition by antibodies. When associated with RBP-A and RBP-K, RBP-Q is not accessible to its antibody, suggesting that it is bound by other proteins that comprise a third multiprotein family. Alternatively, the nucleus contains a simpler complex consisting of RBPs A and K as well as the RBP-Q complex.

**Figure 5.** Schematic structure of the available RBPs responsible for *prolamine* and/or *glutelin* mRNA localization. While RBP-A, I, J, K, Q, and P contain two RRMs, there are three RRMs in RBP-L and RBP208. Tudor-SN consists of four SN-like domains (SN1 to 4) followed by a Tudor domain and a fifth abbreviated SN-like domain (SN5). The mutation sites in RBP-P and Tudor-SN that cause mislocalization of *prolamine* and *glutelin* mRNAs are indicated by arrows labeled with the amino acid substitutions. a.a., amino acids.

While affinity chromatography using *prolamine* zipcode RNA was successful in capturing specific *trans*-RBPs, it failed to identify specific *glutelin* zipcode binding proteins. An alternative Northwestern blot approach was later undertaken, comparing *glutelin* zipcode RNA with non-zipcode RNA, which identified RBP-P as a key *glutelin* zipcode-binding protein [38]. RBP-P was also captured by *prolamine* zipcode RNA chromatography [36], suggesting a dual functional role of RBP-P in localization of both *glutelin* and *prolamine* mRNAs. Indeed, RBP-P, which contains two RRMs (Figure 5), showed highly specific binding to both *glutelin* and *prolamine* mRNAs, especially their zipcodes. Mutations in RBP-P led to a loss of RNA binding activity and caused partial mislocalization of both *glutelin* and *prolamine* mRNAs [39]. Collectively, these results indicate that RBP-P plays essential roles in mediating specific targeting of both *glutelin* and *prolamine* mRNAs.

Similar to the case of prolamine, multi-protein complexes are also required to determine the localization of *glutelin* mRNAs. Protein–protein interaction analysis revealed that RBP-P, RBP-L and RBP208 interact with each other, forming multiple complexes in the nucleus and/or cytoplasm [39]. RBP-L exhibits features similar to those of RBP-P (Figure 5), including binding to *glutelin* and *prolamine* mRNAs in vivo and in vitro [34]. When RBP-L expression is knocked down by a DNA segmental mutation in the 3′ UTR region, both *glutelin* and *prolamine* mRNAs are partially mislocalized [34], suggesting that RBP-L is also a key RBP in regulating the localization of *glutelin* and *prolamine* mRNAs. Given that the complexes formed by RBP-P and RBP-L are not RNA-dependent and are located in both the nucleus and cytoplasm, the two RBPs may form a primary complex that serves as a scaffold to bind other RBPs, forming a multi-protein complex that selectively targets *prolamine* and *glutelin* mRNAs to the cortical ER. In addition to RBP-P and RBP-L, a third candidate, RBP208, may also be involved. RBP208 interacts with RBP-P, and this interaction is weakened with several RBP-P mutant proteins. Hence, RBP208 may also be required for precise control of *glutelin* and *prolamine* mRNA localization. Like the RBP-P/RBP-L complex, RBP-P/RBP208 complexes are found in both the nucleus and cytoplasm but, unlike it, their interaction is RNA-dependent [39]. Hence, RBP208 may be specially recruited by RBP-P to the complex during mRNA localization. As RBP208 only interacts with RBP-L in the cytoplasm, the RBP group P/L/208 may also form multiple complexes to co-regulate *glutelin* and *prolamine* mRNA localization. However, the exact function of RBP208 in the process deserves further investigation.

The requirement of RBP-P, RBP-L and RBP208 for both *glutelin* and *prolamine* mRNA localization indicates that the two transport pathways are inter-related by sharing common trans-factors to assemble the required mRNP complexes. This feature further contributes to the close relationship between the *glutelin* and *prolamine* mRNA transport pathways. The current results suggest a scenario in which the primary scaffold formed by RBP-P and RBP-L, with an undefined contribution from RBP208, selects *glutelin* and *prolamine* mRNAs for specific targeting to the cortical ER, while the complexes formed by RBPs A, I, J, K and Q are involved in targeting *prolamine* mRNAs to the PB-ER. How the RBP group A/I/J/K/Q links to the RBP-P/L scaffold and what specific factors control the *glutelin* mRNA transport pathway require further study.

#### **4. A Possible Role for Myosin Motor Protein Driving mRNA Transport to the ER Subdomains on Actin Filaments**

How an mRNP complex carrying target mRNAs and a large collection of RBPs is transported to specific ER subdomains remains a mystery. Although passive mRNA diffusion and anchoring along with cytoplasmic streaming has been reported for *Nos* RNA during *Drosophila* oogenesis [40–42] and for most mRNAs in bacteria [12,43], active transport driven by cytoskeletal-associated motor proteins is the most common mode of mRNA localization in eukaryotic cells. In yeast, the well-known bud-localized *ASH1* mRNA is transported on actin filaments through the association of the RBPs She2 and She3 with the type V myosin motor Myo4 [44–47]. The She2–She3 complex also recognizes *CLB2*, *TCB2*, *TCB3*, and *IST2* mRNAs, and drives their active transport on actin filaments [48,49]. In mammalian cells, mRNAs are actively transported on both actin and microtubules [50–52]. For example, *Actb* mRNAs are transported to the leading edge of migrating fibroblasts on both actin and microtubules, where the zipcode binding protein 1 (ZBP1) mediates the transport via its interaction with motor proteins [53–55].

In rice endosperm cells, a preliminary study using a GFP-based RNA movement system suggested that the storage protein mRNAs were transported on actin filaments. In this modified 2-hybrid system, *prolamine* RNA particles were visualized by co-expressing a GFP-MS2 fusion protein and a hybrid *prolamine* RNA containing tandem MS2 binding sites. Microscopic analysis showed that the *prolamine* RNA/GFP-MS2 particles moved in a stop-and-go manner [56,57]. The movement was overall unidirectional but with occasional bidirectional, random, and oscillatory movements, a behavior consistent with transport along cytoskeletal elements. Treatment with cytochalasin D and latrunculin B, drugs known to disrupt the integrity of actin filaments, efficiently suppressed particle movement [56], indicating that *prolamine* mRNAs are likely transported via myosin along actin filaments.

#### **5. mRNA Transport to the ER Subdomains Meets Membrane Trafficking**

A mutant rice line, *glup2*, carrying a mutation in the *Golgi Transport 1* (*Got1B*) gene, exhibits mislocalization of *prolamine* and *α-globulin* mRNAs [58] (Figure 3). Got1B is usually found on coat protein complex II (COPII) vesicles and functions as a membrane trafficking-related protein to mediate anterograde transport from the ER to the Golgi [58–60]. In rice, Got1B was reported to interact with the COPII component Sec23 and thus mediate COPII vesicle formation at Golgi-associated ER exit sites [60], which may influence transport of glutelin polypeptides and cause abnormal accumulation of proglutelins in the relevant mutants [58,60]. It seems that, in addition to protein transport, Got1B-mediated COPII trafficking may also be involved in mRNA localization. Although it is not clear how *prolamine* and *α-globulin* mRNA transport is linked to COPII trafficking, analysis of rice mutant lines demonstrates that mRNA localization is linked to membrane trafficking.

The *glup4* and *glup6* rice lines carry mutations in the small GTPase Rab5 and its cognate guanine nucleotide exchange factor (Rab5-GEF), respectively. Rab5 plays multiple roles in membrane transport, ranging from early endosome formation to trafficking from the Golgi to the protein storage vacuole in rice endosperm [61]. Due to the critical function of Rab5 in endosomal transport, both *glup4* and *glup6* lines exhibit a pronounced aborted endocytosis phenotype, where extracellular paramural bodies (PMBs) are formed adjacent to the cytoplasmic membrane (Figure 1B). These paramural bodies contain protein markers for the ER, Golgi, prevacuolar compartment and plasma membrane [61] as well as glutelin, suggesting that normal membrane trafficking is disrupted. In addition, *glutelin* mRNAs were also detected in PMBs [61,62], suggesting a potential connection between endosomal trafficking and *glutelin* mRNA transport.

Long-distance endosomal transport of mRNAs is well studied in *Ustilago maydis*, a fungus that causes corn smut disease. The highly polarized growth of infectious hyphae depends heavily on motor-mediated endosomal transport along microtubules (Figure 6A) [27]. During the process, higher-order septin filaments are generated with gradients to set up polarized growth via endosomal transport of *septin* mRNAs, including *cdc3*, *cdc10*, *cdc11*, and *cdc12*. The endosomal transport of *septin* mRNAs requires their binding to the RNA binding protein Rrm4 [28,63,64], which interacts with a membrane-associated linker protein Upa1 through its FYVE domain to link mRNPs on endosomes. Thus, the complex of Rrm4 and Upa1 works together with the other two RBPs, Pab1 and Upa2 [63,64], to co-mediate endosomal transport of *cdc* mRNAs (Figure 6A). Apparently, specific adaptor proteins are required to hitch mRNPs onto endosomes for active transport of mRNAs.

**Figure 6.** Proposed models of endosome-coupled trafficking of mRNAs in different organisms. (**A**) Working model of *cdc* mRNA transport via trafficking endosomes in the fungus *U. maydis* (adapted from [64]). The mRNAs encoding four septins, Cdc3, 10, 11 and 12, are bound by Rrm4 and Pab1, which recruit Grp1, Upa1, and Upa2 as well as an unknown linker protein (grey color). The resulting mRNP complex is associated with a trafficking endosome through membrane protein Upa1. (**B**) Working model of *glutelin* mRNA transport via trafficking endosomes in rice endosperm cells (adapted from [65]). *Glutelin* mRNAs are recognized and bound by RBP-P and RBP-L, forming an mRNP complex. Through direct or NSF-mediated interaction of RBP-L and RBP-P, respectively, with the key endosomal factor Rab5a, the quaternary complex links the mRNP complex onto endosomes for active transport via the cytoskeleton. Other unknown RBPs or factors (light gray) may also be involved to stabilize the mRNP complex and define the connection to endosomes. (**C**) Working model of the transport of mRNAs encoding mitochondrial proteins via trafficking endosomes in rat neurons (adapted from preprint studies of [66,67]). A FERRY (Five-subunit Endosomal Rab5 and RNA/ribosome intermediary) complex, composed of Fy-1 to 5, is required for transport. Fy-2 directly binds to mRNAs through its coiled-coils and recruits Fy-4 and Fy-5 to form a clamp-like structure. Although Fy-1 and Fy-3 are not required for RNA binding, they assist Fy-2 to adopt the correct folding and conformation. In addition to functioning as the main binding protein to mRNAs, Fy-2 also interacts with the GTP-bound form of Rab5 via its C-terminal region, linking the mRNP complex onto trafficking endosomes for active mRNA transport.

In contrast to the working model in *U. maydis*, higher plants may employ a distinct set of proteins to accomplish endosomal trafficking of mRNAs. The abovementioned key RBPs, RBP-P and RBP-L, were found to directly interact with the membrane fusion factor N-ethylmaleimide-sensitive factor (NSF) and Rab5a [65], respectively, thus forming a quaternary complex linked to endosomes (Figure 6B). The complexes carry *glutelin* mRNAs through the specific binding activities of RBP-P and RBP-L, while NSF and Rab5a are recruited for active transport of *glutelin* mRNAs on endosomes to the cortical ER membrane. Mistargeting of *glutelin* mRNAs, along with the presence of the quaternary complex, to the PMBs in the *rab5a* mutant supports the endosomal transport of *glutelin* mRNAs.

Prior to these findings, the direct binding of RBPs to NSF or Rab5a had not been reported in any other organism. Such protein–protein interactions may profit from a gain in binding properties by Rab5a and NSF, as well as RBP-P and RBP-L, which allows for the highly selective recognition of an RBP by the NSF membrane fusion factor or the molecular switch Rab5. Due to the high conservation of RRM domains between RBP-P and RBP-L, the N- and C-terminal regions of RBP-P and RBP-L may directly contribute to their protein recognition. In the case of the RBP-P/NSF interaction, the N-terminal regions of RBP-P and NSF are likely responsible for their interaction. NSF is a soluble hexameric ATPase predominantly involved in membrane fusion events through its interaction with the soluble NSF attachment protein (SNAP) [68–71]. Due to the absence of SNAP in the assembly of the RBP-P/NSF complex [65], NSF may gain a special function in mRNA metabolism through its interaction with RBP-P. Within the quaternary complex, Rab5a interacts with both NSF and RBP-L, but not RBP-P. Given that the GTPase activity of Rab5a is required for the transport of *glutelin* mRNAs on endosomes, NSF and RBP-L may act as Rab5a effectors to regulate endosomal transport of mRNAs.

The identification of these key linker proteins that enable hitchhiking of mRNPs on trafficking endosomes in rice endosperm cells provides new insights on membrane trafficking-mediated mRNA transport in eukaryotes. However, whether the requirement of NSF-Rab5a-RBP machinery in endosomal mRNA transport commonly exists in other eukaryotic organisms or is unique to higher plants needs further investigation.

A recent preprint study in rat neurons reported a novel FERRY (Five-subunit Endosomal Rab5 and RNA/ribosome intermediary) complex that directly interacts with mRNAs and Rab5, and functions as a Rab5 effector to mediate mRNA transport to mitochondria via early endosomes [67,72] (Figure 6C). The FERRY complex is assembled from five subunits, Fy-1 to Fy-5, in which the flexible Fy-2 serves as a binding hub that interacts with Rab5 and connects all five subunits to mediate binding to specific mitochondrial mRNAs [66]. While the FERRY complex is present in some fungi and commonly exists as a complete five-subunit assembly in the Chordata [67], Fy proteins share very low sequence similarity with putative GTPase activator proteins in plants (lower than 28% when compared to *Arabidopsis*). Whether a similar mechanism exists in plant cells also requires future investigation.

### **6. mRNA Localization Plays a Determinant Role in Storage Organelle Biogenesis in Cereal Grains**

A prominent role of intracellular localization of mRNAs is to target the distribution of the encoded protein and, thereby, generate a high concentration at specific locales [73]. Specific localization of *prolamine* mRNAs on the PB-ER results in intensive local translation to generate a high concentration of prolamine polypeptides within the ER lumen, an environment conducive for self-assembly of prolamine polypeptides to form intracisternal granules, which eventually develop into ER-derived mature PB-I [74] (Figure 1A). Targeting of *glutelin* mRNAs to spatially separate cis-ER prevents the potential interaction of the newly synthesized glutelin with prolamine and α-globulin polypeptides in the ER lumen. This enables the proglutelin polypeptide to fold correctly and assemble into a quaternary structure competent for export from the ER lumen to the Golgi and subsequently to the PSVs.

Mislocalization of *glutelin* and *prolamine* mRNAs caused by mutations of key RBPs and other relevant factors results in mistargeting of the encoded proteins and the formation of abnormal storage organelles [32,34,39] (Figure 1B). For instance, the typical PB-I structure is spherical and contains an electron-dense core of cysteine-rich 10 kDa prolamine (CysR10), which is surrounded by an electron lucent layer of cysteine-poor 13 kDa prolamine (CysP13) [75]. When mRNA mislocalization occurs, the mistargeted glutelin or loss of target prolamine impairs the tight packaging of prolamine in PB-I, resulting in the loss of the electron-dense central core and appearance of irregular shaped PB-I [34,76] (Figure 1B).

Although α-globulin proteins are transported to the PSV similar to glutelins, *α-globulin* mRNAs are localized on the PB-ER. The separate transport pathways of *α-globulin* and *glutelin* mRNAs are likely responsible for the asymmetric distribution of α-globulin and glutelin proteins in the matrix and crystalloid regions, respectively, of the PSVs. Mislocalization of *α-globulin* mRNAs to the cis-ER disrupts the normal transport of α-globulin and the distinct, separate distribution of these storage proteins in PB-II. Taken together, the segregation of storage mRNAs to distinct ER subdomains plays a determinant role in precisely controlling the sorting, transport, and deposition of the encoded storage proteins within the storage organelles.

The asymmetric distribution of specific mRNAs on distinct ER subdomains is likely a prevalent and conserved mechanism in plant cells. In maize endosperm cells, mRNAs encoding the prolamine family zein proteins and the 11S globulin-type protein legumin-1 display a pattern similar to that seen in rice endosperm cells, i.e., they are localized on the ER-bounded zein protein bodies and the cis-ER, respectively [77]. The asymmetric distribution of *zein* and *legumin-1* mRNAs directly contributes to the deposition of zein proteins in the ER-derived protein body and legumin-1 in the PSV [77]. Given that cereal grains usually accumulate large amounts of storage proteins, mRNA localization serves as a major mechanism to drive storage protein targeting in cereal grains.

#### **7. Accessory RBPs Are Required for mRNA Localization as Well as Other Functions**

An earlier study isolated a cytoskeleton-PB-enriched fraction from developing rice seed lysate by sucrose density gradient centrifugation, which was then treated with high salt to solubilize cytoskeleton-associated proteins. This protein fraction was then subjected to poly(U)-Sepharose chromatography, which resulted in the identification of a cytoskeleton-associated 120 kDa protein with a prominent RNA-binding activity [78,79]. The protein was later named OsTudor-SN, an ortholog of the human transcriptional co-activator p100 [80]. OsTudor-SN possesses a multi-domain structure, consisting of four tandem staphylococcal nuclease (SN) domains (4SN module) and a Tudor domain followed by an abbreviated C-terminal SN (Tsn module). OsTudor-SN was found to specifically bind to the 3′ UTR regions of prolamine and glutelin mRNAs [78]. Involvement of OsTudor-SN in mRNA localization was evidenced by partial disruption of both *prolamine* and *glutelin* mRNA localization in OsTudor-SN mutant endosperm cells [81]. The N-terminal 4SN module is responsible for RNA-binding activity to storage protein mRNAs, while the C-terminal Tsn module acts as a scaffold for protein–protein interaction with other RBPs [81]. Thus, the two modular regions of OsTudor-SN cooperate in *prolamine* and *glutelin* mRNA localization. Although OsTudor-SN functions as a non-zipcode trans-factor [35], its association with both the cytoskeleton and storage protein mRNAs indicates that OsTudor-SN likely participates in the transport of target mRNAs.

In addition, OsTudor-SN is also required for storage protein expression, storage organelle biogenesis, and seed development. While a decrease in glutelin expression was observed in rice lines carrying mutations in the Tudor domain or loss of the Tsn module, mutation in the 4SN module caused elevated accumulation of the glutelin precursor [76]. The latter mutation also resulted in a strong reduction in prolamine expression and, in turn, abnormal formation of protein bodies [76]. The phenotype caused by the 4SN mutation is partially restored by complementation with the wild-type OsTudor-SN gene, suggesting that the 4SN module plays a crucial role in seed development. Data from transcriptome analysis indicate that OsTudor-SN also functions in regulating the expression of transcription factors and genes involved in seed development and stress response [76]. Collectively, the modular structure confers on OsTudor-SN multiple functions in mRNA localization as well as in other cellular processes.

#### **8. Transport of mRNAs Is Highly Selective and Occurs as Large Regulons**

In situ analysis of rice *glup4* and *glup6* mutants, which carry loss-of-function mutations in Rab5 and its cognate GEF, respectively, reveals that these mutations cause mistargeting of *glutelin* mRNAs to the PB-ER and PMBs while having no significant impact on the normal localization of *prolamine* and *α-globulin* mRNAs on the PB-ER. The study of these mutants demonstrates that mRNA transport is highly selective. Analysis of RNA transcripts isolated from purified PMBs from the *glup6* mutant reveals that, in addition to *glutelin* mRNAs, other mRNAs were also located in the purified PMBs [62]. The composition of PMB-associated mRNAs was found to be selective rather than random. Sets of mRNAs encoding cell wall, respiration, photosynthesis, and ribosome-related proteins were highly enriched in the PMB-associated mRNA pool, suggesting that specific types of mRNAs are transported together with *glutelin* mRNAs via the endosome-mediated transport pathway. This co-transport phenomenon is consistent with the "RNA regulon" hypothesis [82], in which sets of mRNAs that encode proteins with a similar intracellular location or related functions are co-transported and coordinately regulated.

Additional evidence of co-transport of mRNAs comes from the study of *calreticulin* and *zein* mRNAs in maize callus cells [83]. The mRNAs encoding the calcium-binding protein calreticulin, an ER-resident chaperone protein, were found to be selectively targeted to the PB-ER subdomain where *zein* mRNAs are located. The distribution pattern is distinct from the diffuse distribution of the control mRNAs that encode the actin monomer-binding protein profilin [83], further confirming that mRNA localization is a highly selective process. Calreticulin proteins were further found to localize in the ER-derived PBs containing zein. Therefore, *calreticulin* mRNAs were thought to be co-transported with *zein* mRNAs to the PB-ER subdomain, possibly directly determining the enrichment of calreticulin in the PBs and assisting zein retention and assembly of PBs within the ER. Further investigations on the co-transport of these mRNAs may help to identify the factors that regulate protein synthesis and localization.

#### **9. Future Perspectives**

The localization of mRNAs plays an essential role in governing gene expression and protein targeting, and thus determines cell fate, development, and polar growth. Although the phenomenon of mRNA localization in plant cells was discovered three decades ago, the underlying mechanism remains largely undefined. Studies of storage protein mRNA localization onto distinct subdomains of the ER in endosperm cells have identified the determinant zipcode RNA sequences and many key RBPs responsible for specific targeting pathways. Although a preliminary model of mRNA localization to the cortical ER in plant cells has been proposed, more questions are waiting to be addressed. Beyond the few examples discovered in endosperm or callus cells, are there similar mRNA localization processes in other plant cells and plant species? While endosome-mediated transport supports *glutelin* mRNA targeting to the cis-ER, what transport pathway regulates the localization of *prolamine* and *α-globulin* mRNAs to the PB-ER? Following the discovery of non-storage protein RNAs targeting to a specific subdomain of the ER, are these mRNAs co-transported as regulons to the same subcellular destination based on their similar function or intracellular location?

To address these questions, a combination of biochemical, cell biological, and high-throughput methods will allow for investigation of mRNA localization at a large-scale profiling level. Employment of RNA-protein immunoprecipitation combined with high-throughput sequencing and high-resolution mass spectrometry will help to further identify cis- and trans-factors responsible for specific mRNA localization. A more detailed network of co-transported mRNAs and the mechanism of assembly and remodeling of multi-RBP complexes to recognize and bind target mRNAs deserve further investigation. Application of fluorescence-based mRNA-labeling systems, such as boxB RNA stem-loop [84], bacteriophage PP7 aptamer tagging [85], GFP-MS2 [56,86], and spinach (*Spinacia oleracea*) tracking systems [13,87–89], in combination with high-resolution microscopy will enable visualization and simultaneous monitoring of mRNA transport in live cells. All these efforts will help to identify the machinery involved in mRNA targeting and transport and to address the general significance of mRNA localization in plant cells.

**Author Contributions:** Writing—original draft, L.Z., Q.S., K.Y., W.Z. and L.T.; review and editing, T.W.O. and L.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by Zhejiang A&F University Starting Funds of Scientific Research and Development (203402000101 and 203402001301) and the National Science Foundation (NSF) EAGER Grant (2029933).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Effects of Storage Temperature on Indica-Japonica Hybrid Rice Metabolites, Analyzed Using Liquid Chromatography and Mass Spectrometry**

**Lin Zhu <sup>1,†</sup>, Yu Tian <sup>2,†</sup>, Jiangang Ling <sup>1</sup>, Xue Gong <sup>2</sup>, Jing Sun <sup>1,\*</sup> and Litao Tong <sup>1,\*</sup>**


**Abstract:** The Yongyou series of indica-japonica hybrid rice has excellent production potential and storage performance. However, little is known about the underlying mechanism of its storage resistance. In this study, Yongyou 1540 rice (*Oryza sativa cv. yongyou 1540*) was stored at different temperatures, and the storability was validated through measuring nutritional components and apparent change. In addition, a broad-targeted metabolomic approach coupled with liquid chromatography–mass spectrometry was applied to analyze the metabolite changes. The study found that under high temperature storage conditions (35 °C), Yongyou 1540 was not significantly worse in terms of fatty acid value, whiteness value, and changes in electron microscope profile. A total of 19 key differential metabolites were screened, and lipid metabolites related to palmitoleic acid were found to affect the aging of rice. At the same time, two substances, guanosine 3′,5′-cyclophosphate and pipecolic acid, were beneficial in enhancing the resistance of rice under harsh storage conditions, thereby delaying the deterioration of its quality. Significant regulation of galactose metabolism, alanine, aspartate and glutamate metabolism, butyrate metabolism, and arginine and proline metabolism pathways was probably responsible for the good storage capacity of Yongyou 1540.

**Keywords:** indica/japonica hybrid rice; liquid chromatography–mass spectrometry; wide-targeted metabolomics; storage temperature; storage performance

## **1. Introduction**

Rice (*Oryza sativa* L.) is one of the major staple foods consumed all over the world, notably in China, India, Indonesia, Bangladesh, Vietnam, Thailand, Myanmar, and the Philippines. These countries contribute most highly to rice production, corresponding to 82% of global rice production and 69% of global rice consumption [1]. At the same time, rice also provides 35–60% of dietary calories for most people in the world [2]. However, in the next few decades, with the increase of population, the demand for food in China and the world will remain a serious challenge. Breeding high-yielding rice varieties and developing high-yielding cultivation techniques are considered two key approaches to address this challenge [3,4]. Indica-japonica hybrid rice is a new type of rice cultivar developed by hybridizing indica as the male parent and japonica as the female parent. Moreover, the hybrid rice plant structure has the ability to support super high yield, and the Yongyou series have a positive effect [5].

**Citation:** Zhu, L.; Tian, Y.; Ling, J.; Gong, X.; Sun, J.; Tong, L. Effects of Storage Temperature on Indica-Japonica Hybrid Rice Metabolites, Analyzed Using Liquid Chromatography and Mass Spectrometry. *Int. J. Mol. Sci.* **2022**, *23*, 7421. https://doi.org/10.3390/ijms23137421

Academic Editors: Jinsong Bao and Jianhong Xu

Received: 7 May 2022 Accepted: 30 June 2022 Published: 4 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The storage of rice is also an important part of its production and transactions, and only with good storage performance can its commercial value be maintained in the commodity transaction link. Starch, protein, lipid, mineral elements and vitamins, are the main nutritional components of rice [6–9], which is susceptible to microbial contamination and oxidative deterioration during storage, leading to reduced physiological activity, fat oxidation, and protein degradation [10]. The rice quality reduction was usually manifested in the weakening of grain respiration, the reduction of vitality, the change of physicochemical properties and the change of protoplasmic colloid structure [11]. There are many factors affecting the aging of rice, such as the variety, water content, temperature, humidity, storage time, storage conditions, and local climatic conditions.

Yongyou 1540 rice is a new variety jointly cultivated by the research group, and it has been proved that Yongyou 1540 has good storability. However, the underlying mechanisms are still largely unknown, which limits the promotion and application of the products. Metabolomics is the systemic study of the metabolites, that is, of all small molecules in a biological sample, to provide a snap-shot of the ongoing biochemical processes [12,13]. At the same time, it is also the science of qualitative and quantitative analysis of all low molecular weight (<1000) metabolites of biological cells or organisms in a specific physiological period [14].

In this study, in order to reveal the storage mechanism of Yongyou 1540, and to provide theoretical data support for the production and promotion of this series of indica-japonica hybrid rice, Yongyou 1540 was stored at different temperatures. The fatty acid and whiteness values of each sample were determined, and the rice surface was observed with an electron microscope. In addition, the changes of Yongyou 1540 metabolites under different storage conditions were investigated using a broad-targeted metabolomic technology based on LC-MS/MS. After multivariate statistical analysis such as principal component analysis, cluster analysis, and path analysis, significant differential metabolites were screened out. Finally, through enrichment analysis of metabolic pathways, the most relevant metabolic pathways were found. The results showed that the key differential metabolites of Yongyou 1540 that were significantly up-regulated were guanosine 3′,5′-cyclic phosphate, pipecolic acid and GABA. Based on the properties and effects of these substances, it is speculated that these substances increase the resistance of rice to relatively harsh storage conditions, thereby delaying the aging process and maintaining its quality.

#### **2. Results**

#### *2.1. Measurement Results of Fatty Acid Value and Whiteness Value of Rice*

As shown in Figure 1A, the higher the storage temperature, the higher was the fatty acid value (from 0.085 to 0.350 g kg<sup>−1</sup>). According to the Chinese national standard GB/T20569-2006, the fatty acid value of fresh rice should be less than 0.250 g kg<sup>−1</sup> and that of stored rice should be less than 0.350 g kg<sup>−1</sup> [15]. Even under the storage condition of 35 °C, the fatty acid value met the requirements, and the degree of quality deterioration of the Yongyou rice was not significant.

The test results of whiteness value are shown in Figure 1B. The whiteness value represents the level of whiteness of the rice surface. Obviously, with the prolongation of storage time, the whiteness value of rice at different storage temperatures showed a significant downward trend, and then entered a stable stage and continued to decrease slowly. Due to the influence of storage temperature, the higher the temperature, the greater is the decrease of the whiteness value. However, the whiteness values (from 68% to 76%) are still in the reasonable whiteness value range [16].

#### *2.2. Electron Microscope Section Observation Results*

The section images of rice at different storage temperatures were observed under different magnifications (left: 1 mm, right: 200 µm) (Figure 2). Through observation, it can be seen that with the continuous increase of the ambient temperature of the rice, the width and depth of the cracks on the cut surface of the rice samples increased significantly by the 90th day of storage. This may be due to the influence of temperature in the early storage period. The loss of nutrients was accelerated, as well as the rate of surface water loss and the degree of deterioration. On the 180th day of storage, there was no obvious further deterioration compared with the previous period. This may be due to the fact that when kept under unsuitable storage conditions for a long time, a certain metabolic system was activated by the rice itself, which enhanced its resistance to stress and storage performance.

**Figure 1.** Changes in fatty acid value (**A**) and whiteness value (**B**) at different storage temperatures.

**Figure 2.** Electron microscope profiles of rice (200 μm) at different storage temperatures. (**A**) Morphological images of rice stored on day 0. (**B**), (**C**), (**D**) are the morphological images of rice on the 90th day of storage at 15 °C, 25 °C, and 35 °C storage temperature, respectively. (**E**), (**F**), (**G**) are the morphological images of rice on the 180th day of storage at 15 °C, 25 °C, and 35 °C storage temperature, respectively.

#### *2.3. Metabolite Characterization of Samples*

The LC-MS-based wide-targeted metabolomics approach was employed in the study. By matching the substance database with conditions such as retention time and mass-to-charge ratio, the substance could be qualitatively identified [17]. A total of 177 metabolites were identified, as shown in Supplementary Table S1. In terms of quantity, alkaloids account for 14.12%, amino acids and their derivatives account for 9.60%, phenols and their derivatives account for 8.74%, organic acids and their derivatives account for 8.47%, fatty acyl groups account for 6.7%, and flavonoids account for 7.3%, carbohydrates and their derivatives accounted for 6.21%, terpenoids accounted for 6.21%, and other small amounts of substances such as steroids and their derivatives, plant hormones, coumarin, purines, pyridines, etc. accounted for 12.99%. Alkaloids accounted for the largest proportion, followed by amino acids and their derivatives, while steroids and their derivatives and terpenes were also present.

#### *2.4. Principal Component Analysis*

The PCA score scatter plot of all samples (including QC samples) is shown in Figure 3. The value of R2X in the scatter plot of the PCA model was 0.527, which further indicated that the experimental model was not over-fitting, and the experimental results were reliable. It could be clearly observed that there was a significant difference between HT\_6, LT\_6, NT\_6 groups and the CK group, while the difference between the HT\_6 group and CK group was the most significant. The comparisons between the LT\_6 group and the CK group were close in degree of difference, suggesting that storage temperature was an important factor affecting the metabolic activity of rice. However, there are a certain number of samples with no obvious changes in the principal component differences. This may indicate that the rice of this variety produces a certain substance that is self-regulating even under inappropriate storage conditions.

**Figure 3.** Scatter plot from PCA model for CK\_6, LT\_6, NT\_6, HT\_6, and QC groups. The abscissa PC[1] and ordinate PC[2] in the figure represent the scores of the first and second principal components, respectively. Each scatter represents a sample, and the color and shape of the scatter represent different groups. The samples were basically within the 95% confidence interval, and the QC group was tightly gathered and close to the middle of all samples, suggesting great system stability within the entire measurement queue. R2X represents the model's interpretation of the X variable and Q<sup>2</sup> represents the predictability of the model. The closer the two metrics are to 1, the better the model performs and the higher the interpretability.
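The two statistics named in the Figure 3 caption can be written out explicitly. The following is only a sketch using the definitions that are conventional in chemometrics; it is an assumption that the software used in this study follows them, since the computation is not described in the text. For a mean-centered, scaled data matrix with entries $x_{ij}$ and a model with $A$ components (scores $t_{ia}$, loadings $p_{ja}$; all symbols are introduced here for illustration):

$$
\hat{x}_{ij} = \sum_{a=1}^{A} t_{ia}\,p_{ja}, \qquad
R^2X = 1 - \frac{\sum_{i,j}\left(x_{ij}-\hat{x}_{ij}\right)^2}{\sum_{i,j} x_{ij}^2}, \qquad
Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}}
$$

Here PRESS is the sum of squared prediction errors for observations left out during cross-validation, and TSS is the total sum of squares of the centered quantity being predicted. Under these definitions, an R2X of 0.527 would mean that the fitted components reproduce roughly half of the variation in the metabolite data, and a Q<sup>2</sup> near 1 would mean that left-out samples are predicted almost as well as fitted ones.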

#### *2.5. OPLS-DA Analysis*

The abscissa t[1]P in the OPLS-DA scored graph represented the predicted principal component score of the first principal component, showing the difference between sample groups, and the ordinate t[1]O represented the orthogonal principal component score, showing the difference within the sample group. In the present study, OPLS-DA was modeled for the classification between the HT\_6 and CK\_6 groups, and the HT\_6 and LT\_6 groups, respectively. From the results of the score plot, it can be seen that the HT\_6 group was very significantly distinguished from the CK group and the LT\_6 group, and the samples were all within the 95% confidence interval (Hotelling's T-squared ellipse), indicating that the storage temperature changed the metabolites of the rice samples. According to the statistics, HT\_6 vs. CK groups Q<sup>2</sup> = 0.701, HT\_6 group vs. LT\_6 group Q<sup>2</sup> = 0.654, indicating that if new samples are added to the model, a relatively approximate distribution will be obtained. The permutation test randomly changed the arrangement order of the categorical variable Y, and established the corresponding OPLS-DA, modeled multiple times (*n* = 200 times) to obtain the R2Y and Q<sup>2</sup> values of the random model. The intercept of the regression line of Q<sup>2</sup> and the vertical axis was less than zero. At the same time, with the gradual decrease of the permutation retention, the proportion of the permuted Y variable increased, and the Q<sup>2</sup> of the random model gradually decreased. The results above showed that the original models of HT\_6 vs. CK groups and HT\_6 vs. LT\_6 groups could meet the acceptable requirements.

The Q<sup>2</sup> values of NT\_6 vs. CK groups, NT\_6 vs. LT\_6 groups, HT\_6 vs. NT\_6 groups were close to 0.5, indicating that there were differences between sample groups, but the degree of difference was not significant. The Q<sup>2</sup> < 0.5 of the LT\_6 group and the CK group indicated that the metabolic difference between the two samples was not significant within a certain range. This phenomenon shows that the metabolic activity of Yongyou 1540 rice is slow under the storage condition of 15 ◦C (see Figure 4). The Q2 values of NT\_6 vs. CK groups, NT\_6 vs. LT\_6 groups, HT\_6 vs. NT\_6 groups were close to 0.5, indicating that there were differences between sample groups, but the degree of difference was not significant. The Q2 < 0.5 of the LT\_6 group and the CK group indicated that the metabolic difference between the two samples was not significant within a certain range. This phenomenon shows that the metabolic activity of Yongyou 1540 rice is slow under the storage condition of 15 °C (see Figure 4).
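The logic of a label-permutation test can be illustrated with a minimal sketch. It does not reproduce the SIMCA procedure used in the study: the OPLS-DA refit (R<sup>2</sup>Y, Q<sup>2</sup>) is replaced here by a much simpler statistic, the absolute difference of group means, and the function name and intensity values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=200, seed=0):
    """Label-permutation test on the absolute difference of group means.

    Stand-in statistic: the study permutes class labels and refits OPLS-DA
    to obtain R2Y and Q2; here the group-mean difference replaces the model.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the class labels
        permuted = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if permuted >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical peak intensities for one metabolite, six replicates per group.
ht_6 = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
ck = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
p_value = permutation_p_value(ht_6, ck)
print(f"permutation p-value: {p_value:.3f}")
```

If the statistic from the real labels is rarely matched by the permuted labels, the separation is unlikely to be a chance result. This is the same reasoning that underlies comparing the original Q<sup>2</sup> with the Q<sup>2</sup> of the random models.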

**Figure 4.** Score plots and permutation test results of the OPLS-DA models for the HT\_6 vs. CK groups and the HT\_6 vs. LT\_6 groups. (**A**) Score plot of the OPLS-DA model for the HT\_6 vs. CK groups; (**B**) permutation test of the OPLS-DA model for the HT\_6 vs. CK groups; (**C**) score plot of the OPLS-DA model for the HT\_6 vs. LT\_6 groups; (**D**) permutation test of the OPLS-DA model for the HT\_6 vs. LT\_6 groups.

#### *2.6. Screening and Analysis of Differential Metabolites with Significant Changes in Content*

In order to explore the effects of different storage temperatures on rice metabolites and quality, the VIP value of the OPLS-DA model (threshold > 1) and the *p*-value of Student's *t*-test (threshold < 0.05) were used to screen for differential metabolites. The results showed 105 differential metabolites in HT\_6 vs. CK, 99 in HT\_6 vs. LT\_6, 81 in HT\_6 vs. NT\_6, 93 in NT\_6 vs. CK, 87 in NT\_6 vs. LT\_6, and 49 in LT\_6 vs. CK. Differential metabolites identified in this way are often biologically similar or complementary in function, or are positively or negatively regulated within the same metabolic pathway, and therefore show similar or opposite expression patterns across experimental groups. Hierarchical cluster analysis was used to explore similarities and differences between groups. Figure 5 shows the heatmap of the hierarchical cluster analysis of the experimental groups at different storage temperatures [18]. The differential metabolites of these groups were also subjected to matchstick analysis and chord analysis, which help to further screen the most closely related differential metabolites. Identifying differential metabolites helps to locate the relevant metabolic pathways. The HT\_6 vs. CK comparison was used as the core of the analysis. By comparing the results of the matchstick and chord analyses, the research group screened 19 key differential metabolites, which are listed in Table 1. Of these 19, 8 were also significant differential metabolites in the HT\_6 vs. NT\_6 comparison, and 14 in the HT\_6 vs. LT\_6 comparison. The metabolites of LT\_6 and CK were so similar that differential metabolites were difficult to find.

This result indicates that some reactions are triggered only at the higher temperatures of the HT\_6 and NT\_6 conditions. The two groups shared 8 differential metabolites, indicating that the corresponding metabolic pathways are activated at higher storage temperatures. The remaining 11 differential metabolites, found only in the HT\_6 vs. CK comparison, were produced by metabolic pathways stimulated in this rice variety under the harsher storage conditions. Of these 11, seven were up-regulated: adenosine 2′,3′-cyclic phosphate, guanosine 3′,5′-cyclic monophosphate, (S)-2-aceto-2-hydroxybutanoate, streptozotocin, palmitoylethanolamide, pimelic acid, and pipecolic acid. Stachyose, maltotetraose, kojibiose, and denudatine were down-regulated. The first three of these are carbohydrates, and their down-regulation indicates that energy substances were metabolized substantially under high-temperature storage. The HT\_6 vs. LT\_6 comparison had the highest number of differential metabolites overlapping with HT\_6 vs. CK. The HT\_6 vs. CK comparison was the focus of the analysis because it spans the larger temperature range and gives wider coverage.

#### *2.7. Effects of Storage on Metabolic Pathways of Rice*

The research group mapped the 19 key differential metabolites screened from the HT\_6 vs. CK comparison to authoritative metabolite databases such as KEGG and PubChem, and identified all the pathways involved in the regulation of these metabolites (Figures 6 and 7). A comprehensive analysis of metabolic pathways (including enrichment analysis and topology analysis) identified the key pathways most associated with the metabolite differences. As shown in Figure 6A, five of the most relevant pathways were located, including valine, leucine and isoleucine biosynthesis, galactose metabolism, lysine biosynthesis and degradation, and glutathione metabolism. The differential metabolites most relevant to the HT\_6 vs. LT\_6 comparison were also mapped, and the most closely related metabolic pathways are shown in Figure 6B. The only difference in metabolic pathways between the two comparisons was the high expression of starch and sucrose metabolism in the HT\_6 vs. LT\_6 comparison. The metabolic activity of rice stored at 15 °C was higher than that of rice stored at −80 °C. Energy metabolism is a normal physiological activity of rice.
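The enrichment step can be illustrated with a short sketch. The text does not state which statistical test was used, so the hypergeometric (over-representation) model below is an assumption, and the function names and counts are hypothetical. The second helper returns −ln(*p*), the quantity plotted on the ordinate of the bubble chart.

```python
from math import comb, log


def enrichment_p_value(hits, selected, pathway_size, background):
    """P(X >= hits) when `selected` metabolites are drawn from `background`
    and `pathway_size` of the background belong to the pathway."""
    upper = min(selected, pathway_size)
    favourable = sum(
        comb(pathway_size, i) * comb(background - pathway_size, selected - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(background, selected)


def neg_ln_p(p):
    """Negative natural logarithm of a p-value, as used for the bubble ordinate."""
    return -log(p)


# Hypothetical counts: 19 key metabolites, 3 of which fall in a pathway
# of 20 compounds, against a background of 500 annotated compounds.
p = enrichment_p_value(hits=3, selected=19, pathway_size=20, background=500)
print(f"p = {p:.4g}, -ln(p) = {neg_ln_p(p):.2f}")
```

A smaller *p* value gives a larger −ln(*p*), so more strongly enriched pathways sit higher on the chart.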

**Figure 5.** Hierarchical cluster analysis heatmap for all the experimental groups. The abscissa represents the different experimental groups, and the ordinate represents the differential metabolites compared in the group. The color blocks represent the relative expression levels of the metabolites at the corresponding positions, with red indicating high expression of the substance and blue indicating low expression.



**Table 1.** Key differential metabolites in HT\_6 group.


**Figure 6.** The key differential metabolites of the HT\_6 group were mapped to metabolic pathways for analysis, and the results of the metabolic pathway analysis are shown in the bubble chart. (**A**) HT\_6 vs. CK groups; (**B**) HT\_6 vs. LT\_6 groups. Each bubble represents a metabolic pathway. The abscissa and the size of each bubble represent the pathway impact factor in the topology analysis; the larger the bubble, the greater the impact. The ordinate and the color of each bubble represent the *p* value of the enrichment analysis as its negative natural logarithm, −ln(*p*); the darker the color, the smaller the *p* value and the more significant the enrichment.

**Figure 7.** The key differential metabolites of HT\_6 vs. CK groups and HT\_6 vs. LT\_6 groups were mapped to metabolic pathways, and the results of metabolic pathway analysis are shown in the KEGG pathway results.

#### **3. Discussion**

The basic quality of Yongyou 1540 was evaluated from three aspects: fatty acid value, whiteness value, and electron microscope observation of sections. The following conclusions can be drawn from the experimental results. The fatty acid value followed a different trend at each storage temperature: the higher the storage temperature, the more obvious the increase in fatty acid value. However, even under storage at 35 °C, the fatty acid value of the rice remained within 0.350 g kg<sup>−1</sup>, indicating that the rice was slightly unsuitable for storage at this temperature but that the quality deterioration of Yongyou rice was not significant. Similarly, the research team judged the change in the whiteness value of Yongyou 1540 to be within an acceptable range, with no obvious deterioration. On the other hand, sections of Yongyou 1540 were observed under an electron microscope at a scale of 200 µm. On the 90th day, the surface cracks of rice stored at 35 °C were indeed deeper and wider than those of rice at the other two storage temperatures. However, on the 180th day, the cracks differed little between storage temperatures, and their extent was lower than on the 90th day. Based on these observations, it can be speculated that Yongyou 1540 carried out certain metabolic activities and regulated certain pathways under unsuitable storage conditions, and that this activity was related to its storage resistance and stress resistance.

To further analyze the effects of metabolites on the storage performance of Yongyou 1540, the research group used broad-target metabolomics based on liquid chromatography and mass spectrometry to detect a total of 719 metabolites, of which 615 were recorded in public databases. Principal component analysis (PCA) and OPLS-DA were used in this study. PCA is a statistical method that transforms a set of observed, possibly correlated variables into linearly uncorrelated variables (i.e., principal components) through an orthogonal transformation. PCA can reveal the internal structure of the data and thus better explain the variation between groups [19]. However, LC-MS-based metabolomics data are characterized by high dimensionality (many types of metabolites detected) and small sample size (few samples measured). The variables include not only differential variables related to the categorical variable, but also a large number of mutually correlated non-differential variables [20]. Therefore, OPLS-DA, a supervised pattern recognition technique that outperforms PCA in class discrimination, was used to find the metabolites that differ between each pair of groups. In OPLS-DA, variation orthogonal (unrelated) to the categorical variable is filtered out, and the non-orthogonal and orthogonal variation are analyzed separately, giving more reliable information on the differences between groups and their magnitude. PCA supports the acquisition of related metabolites, and OPLS-DA supports the screening and identification of key differential metabolites. Finally, the 19 most important differential metabolites were selected with the help of matchstick analysis and chord analysis.

Different storage temperatures were set in the experiments. The highest storage temperature, 35 °C, was taken as the focus, and its key differential metabolites were analyzed. Some of these overlapped with differential metabolites screened at other storage temperatures. To explore the metabolic activities and reactions of Yongyou 1540 at higher storage temperatures, the research team analyzed the 11 non-overlapping differential metabolites one by one. Palmitoylethanolamide (PEA) is a natural amide of ethanolamine and palmitic acid. Pimelic acid exists in the free acid form and is synthetically assembled from fatty acids. These two substances are metabolites produced by the aging caused by lipid metabolism in rice during storage. Guanosine 3′,5′-cyclic monophosphate (cGMP), a second messenger discovered in the 1960s, is found in both prokaryotes and eukaryotes [21]. cGMP is synthesized from guanosine triphosphate (GTP) by guanylate cyclases (GCs) and is involved in various cellular responses, such as protein kinase activity, cyclic nucleotide-gated ion channels and cGMP-regulated cyclic nucleotide phosphodiesterases [22,23]. The accumulation of cGMP in plants is positively correlated with various developmental processes and with responses to abiotic and pathogenic stresses. Multiple groups have demonstrated that both NO-dependent and NO-independent cGMP signaling pathways are important in the activation of defense responses during biotic stress [24–28]. Furthermore, NO-cGMP-dependent signaling pathways have been reported to be involved in adventitious root development [29,30], stomatal closure during abiotic and biotic stresses [31,32], protein phosphorylation [33,34] and transcriptional regulation [35]. Endogenous cGMP production in rice is involved in various signaling processes in the plant, especially the control of stomatal pore size. In a high-temperature storage environment, this is important for withstanding water shortage, and this key metabolite helps rice maintain its water content and delay quality deterioration.

Pipecolic acid occurs as L-Pip (hereinafter referred to as Pip). The widespread occurrence of this non-protein amino acid in plants, animals, fungi, and microorganisms, and its biosynthetic origin from Lys in plants and animals, were recognized in the 1950s [36,37]. Pip, a common lysine catabolite in plants and animals, is a key regulator of induced plant immunity and one of several key metabolic mediators of induced resistance [38–40]. Based on the above analysis of differential metabolites with significant effects, it can be speculated that under higher-temperature storage Yongyou 1540 significantly accumulated pipecolic acid, which induced the rice to improve its stress resistance to relatively harsh conditions and maintain its quality.

In conclusion, a broad-target metabolomics technique based on liquid chromatography and mass spectrometry was used in this study to analyze the changes in the metabolites of Yongyou 1540 at different storage temperatures. Screening of key differential metabolites and metabolic pathway mapping showed that Yongyou 1540 produces palmitoleic acid-related lipid metabolites associated with rice aging when stored at 35 °C. At the same time, several key differential metabolites of Yongyou 1540 were significantly up-regulated: guanosine 3′,5′-cyclic monophosphate, pipecolic acid and GABA. Based on the properties and effects of these substances, it is speculated that they increase the resistance of rice to relatively harsh storage conditions, thereby delaying aging and maintaining quality [41,42]. The results show that Yongyou 1540 can regulate itself under unsuitable storage conditions and has good storage performance, which helps maintain quality through production and trade. With growing pressure on global staple food supplies, this research can provide a theoretical and scientific basis for the popularization and production of Yongyou 1540.

#### **4. Materials and Methods**

#### *4.1. Rice Materials and Sampling*

Yongyou 1540 rice was harvested in Xiangshan County, Ningbo City, Zhejiang Province. The harvested rice was dried (RH 13–14%) and then milled to first-grade rice according to the GB/T 1354 standard. All samples were divided into four groups. The control (CK) was frozen at −80 °C immediately after milling. The other three groups were packed after milling into PE bags (2.5 kg per bag) and stored in a constant temperature and humidity chamber (RH 60%) at 15 °C (LT\_6), 25 °C (NT\_6), or 35 °C (HT\_6) as the experimental groups. Each group had six replicate samples. All samples were sent for testing after 6 months of storage.

#### *4.2. Materials and Reagents*

Methanol (CAS: 67-56-1) and acetonitrile (CAS: 75-05-8) were supplied by CNW Technologies, and formic acid (CAS: 64-18-6) was from Sigma. All reagents were LC-MS grade. Ultrapure water was prepared in-house with a Milli-Q Integral water purification system (Millipore, Bedford, MA, USA). A Sciex ExionLC AD ultra-high-performance liquid chromatograph and a QTrap 6500+ high-sensitivity mass spectrometer were used. In addition, a Thermo centrifuge (Heraeus Fresco17), a Merck Millipore water purifier (Clear D24 UV) and a Waters chromatographic column (ACQUITY UPLC HSS T3, 1.8 µm, 2.1 × 100 mm) were used in the experiments.

#### *4.3. Metabolites Extraction*

The freeze-dried samples were crushed with a mixer mill for 240 s at 60 Hz. Aliquots of 400 mg of each sample were precisely weighed and transferred to an Eppendorf tube. After addition of 2000 µL of extraction solution (methanol/water = 3:1, precooled at −40 °C, containing internal standard), the samples were vortexed for 30 s, homogenized at 40 Hz for 4 min, and sonicated for 5 min in an ice-water bath. After the homogenization and sonication had been repeated three times, the samples were extracted overnight at 4 °C on a shaker. The resulting mixtures were centrifuged at 12,000 rpm (RCF = 13,800× *g*, R = 8.6 cm) for 15 min at 4 °C, and the supernatant was carefully filtered through a 0.22 µm microporous membrane and transferred to 2 mL glass vials. In addition, 20 µL aliquots of each sample were pooled to yield quality control (QC) samples, which were prepared in parallel with the rice samples. All samples were stored at −80 °C until the UHPLC-MS analysis.
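The stated centrifugation settings can be cross-checked with the standard conversion between rotor speed and relative centrifugal force, RCF = 1.118 × 10<sup>−5</sup> × R × N<sup>2</sup>, with R in cm and N in rpm. The function name below is illustrative and not part of the original protocol.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (in multiples of g) from rotor speed and radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


# Settings reported in the extraction protocol: 12,000 rpm, R = 8.6 cm.
rcf = rcf_from_rpm(12_000, 8.6)
print(f"RCF = {rcf:,.0f} x g")  # about 13,845 x g, which rounds to the stated 13,800 x g
```

The result agrees with the reported RCF of 13,800× *g*, so the speed of 12,000 is in rpm rather than in multiples of *g*.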

#### *4.4. LC-MS Analysis*

The UHPLC separation was carried out on an ExionLC system (Sciex). Mobile phase A was 0.1% formic acid in water, and mobile phase B was acetonitrile. The column temperature was set at 40 °C, the auto-sampler temperature at 4 °C, and the injection volume was 2 µL. A Sciex QTrap 6500+ (Sciex Technologies, Framingham, MA, USA) was used for assay development. Typical ion source parameters were: IonSpray Voltage: +5500/−4500 V, Curtain Gas: 35 psi, Temperature: 400 °C, Ion Source Gas 1: 60 psi, Ion Source Gas 2: 60 psi, DP: ±100 V. Because of the large number of samples in this study, acquisition ran for a long time, so it was important to monitor instrument stability and signal quality in real time during detection. To ensure the accuracy of the experimental results, the raw data included 3 quality control (QC) samples and 24 experimental samples. QC samples were made by mixing 20 µL of each experimental sample. The retention times and peak areas in the total ion chromatograms (TIC) of the QC samples overlapped well, indicating good instrument stability. In addition, the retention time and peak area of the internal standard, 2-chlorophenylalanine, were very stable, indicating that data acquisition was very stable (Figures S1 and S2). The experimental equipment was therefore reliable and the data credible [43–51].

#### *4.5. Data Analysis*

SCIEX Analyst Work Station Software (Version 1.6.3, Framingham, MA, USA) was employed for MRM data acquisition and processing. MS raw data (.wiff) files were converted to the TXT format using MSconverter. An in-house R program and database were used for peak detection and annotation. The collated data were then analyzed in SIMCA (V16.0.2, Sartorius Stedim Data Analytics AB, Umeå, Sweden) to screen for differential metabolites using principal component analysis (PCA), orthogonal partial least squares discriminant analysis (OPLS-DA), Student's *t*-test, and the variable importance in projection (VIP) of the OPLS-DA model. The screening criteria were a Student's *t*-test *p*-value less than 0.05 and a VIP of the first principal component of the OPLS-DA model greater than 1. The expression profiles of the differential metabolites summarize how metabolites change between groups, and can be interpreted in light of the biological background. For example, if some metabolites show the same or different trends among groups, combining them with metabolic pathway information can help to identify important pathways and regulatory relationships. Differential metabolites were annotated in the CAS and KEGG databases based on retention time and mass-to-charge ratio (*m*/*z*). Afterwards, a comprehensive analysis of the differential metabolite pathways (including enrichment analysis and topology analysis) was used to further screen the pathways and identify the key pathways most relevant to the differential metabolites [52–54].
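The two screening criteria can be expressed as a simple filter. The sketch below assumes that the VIP and *p* values have already been computed (in the study, by SIMCA); the record layout, function name and example values are hypothetical.

```python
def screen_differential(metabolites, vip_threshold=1.0, p_threshold=0.05):
    """Keep metabolites with VIP above and p-value below the thresholds."""
    return [
        m["name"]
        for m in metabolites
        if m["vip"] > vip_threshold and m["p_value"] < p_threshold
    ]


# Hypothetical statistics for four candidate metabolites.
candidates = [
    {"name": "metabolite_A", "vip": 1.8, "p_value": 0.003},
    {"name": "metabolite_B", "vip": 0.7, "p_value": 0.010},
    {"name": "metabolite_C", "vip": 1.3, "p_value": 0.200},
    {"name": "metabolite_D", "vip": 1.1, "p_value": 0.049},
]
selected = screen_differential(candidates)
print(selected)  # ['metabolite_A', 'metabolite_D']
```

Both conditions must hold at once: metabolite_B fails on VIP and metabolite_C fails on the *p*-value, so neither is retained.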

#### *4.6. Determination of Rice Quality*

4.6.1. Fatty Acid Value (Petroleum Ether Extraction Method)

Nutrients in rice, such as starch, protein, moisture, and lipids, deteriorate during storage under the influence of environmental and internal factors. Fatty acid value is an important indicator of rice quality. Following the Chinese national standard GB/T5510-2011 "Grain and Oil Inspection—Determination of Fatty Acid Value of Grain and Oilseeds", the research team measured the fatty acid value of rice to judge its degree of deterioration at different storage temperatures. Rice from each storage condition was cleaned, ground with a grinder, and passed through a 1.0 mm round-hole sieve. A 10 g sample (precision 0.01 g) was weighed into a 250 mL conical flask, 50 mL of petroleum ether was added with a pipette, and the flask was stoppered and shaken for several seconds. The stopper was opened to release the pressure, and the flask was then closed again and shaken on an oscillator for 10 min. The flask was removed, tilted and left to stand for 1–2 min, and the extract was filtered through filter paper in a short-necked glass funnel. The first few drops of filtrate were discarded, and more than 25 mL of filtrate was collected in a colorimetric tube, which was capped and stored (timing is important: the filtrate was kept at 4 °C and used within 24 h). A 25 mL aliquot of the filtrate was transferred with a pipette into a 150 mL conical flask, 75 mL of 50% ethanol solution was added with a measuring cylinder, and 4–5 drops of phenolphthalein indicator were added. After shaking, the mixture was titrated with potassium hydroxide solution until the lower ethanol layer turned slightly red and the color did not fade within 30 s, and the titrant volume (*V*<sub>1</sub>) was recorded. A 25 mL portion of petroleum ether was used as the blank control, and its titrant volume (*V*<sub>0</sub>) was recorded.

The fatty acid value (*A*<sub>K</sub>) is calculated as follows:

$$A_K = (V_1 - V_0) \times c \times 56.1 \times \frac{50}{25} \times \frac{100}{m(100 - w)} \times 100$$

*c*-potassium hydroxide concentration (mol/L)

*m*—Sample mass (g)

*w*—Sample moisture mass (per 100 g)

56.1—Potassium hydroxide molar mass (g/mol)

50—The volume of the extraction solution used to extract the sample(mL)

25—Volume of sample extract for titration (mL)

100—Converted to the mass of 100 g dry sample (g)
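As a worked illustration, the formula can be evaluated directly. The function name and the input values below are hypothetical and are not measurements from this study. From the units of the terms (mL × mol/L × g/mol), the result reads as mg KOH per 100 g of dry sample.

```python
def fatty_acid_value(v1_ml, v0_ml, c_mol_per_l, m_g, w_per_100g):
    """Fatty acid value A_K from titration volumes, following the formula above.

    v1_ml, v0_ml : titrant volumes for the sample and the blank (mL)
    c_mol_per_l  : potassium hydroxide concentration (mol/L)
    m_g          : sample mass (g)
    w_per_100g   : sample moisture mass per 100 g
    """
    return (
        (v1_ml - v0_ml) * c_mol_per_l * 56.1
        * (50 / 25)
        * 100 / (m_g * (100 - w_per_100g))
        * 100
    )


# Hypothetical inputs: V1 = 1.20 mL, V0 = 0.20 mL, c = 0.01 mol/L,
# m = 10 g, w = 13 g moisture per 100 g.
a_k = fatty_acid_value(1.20, 0.20, 0.01, 10.0, 13.0)
print(f"A_K = {a_k:.2f}")  # 12.90
```

The factor 50/25 scales the titrated 25 mL aliquot up to the full 50 mL extract, and 100/(100 − *w*) converts the result to a dry basis.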

4.6.2. Whiteness Value

The whiteness of rice under different storage conditions was measured with a whiteness meter, and each sample was measured three times.

4.6.3. Observation of Appearance Features in Slices under Electron Microscope

Cross sections (2 mm thick) were observed on samples taken at 90 d and 180 d. The instrument was a thermal field emission scanning electron microscope (Zeiss G300; model: GeminiSEM 300).

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23137421/s1.

**Author Contributions:** Y.T.: Methodology, Validation, Writing—Original Draft. L.Z. and L.T.: Conceptualization, Methodology, Investigation, Resources, Writing—Review and Editing, Supervision. J.L.: Investigation, Resources and Supervision. X.G.: Formal analysis and Data Curation. J.S.: Investigation, Formal analysis, Data Curation, Writing—Review and Editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Science and Technology Cooperative Innovation Project between Ningbo Academy of Agricultural Sciences and Chinese Academy of Agricultural Sciences (2019CXGC005).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare that they have no known competing financial interest or personal relationships that could have appeared to influence the work reported in this paper.

## **References**


## *Article* **Molecular Characterization and SNP-Based Molecular Marker Development of Two Novel High Molecular Weight Glutenin Genes from** *Triticum spelta* **L.**

**Yuemei Cao † , Junwei Zhang † , Ruomei Wang, Haocheng Sun and Yueming Yan \***

> Beijing Key Laboratory of Plant Gene Resources and Biotechnology for Carbon Reduction and Environmental Improvement, College of Life Science, Capital Normal University, Beijing 100048, China

**\*** Correspondence: yanym@cnu.edu.cn

† These authors contributed equally to this work.

**Abstract:** Spelt wheat (*Triticum spelta* L., 2n=6x=42, AABBDD) is a valuable source of new gene resources for wheat genetic improvement. In the present study, two novel high molecular weight glutenin subunits (HMW-GS), 1Ax2.1\* at *Glu-A1* and 1By19\* at *Glu-B1*, were identified from German spelt wheat. The encoding genes of both subunits were amplified and cloned by allele-specific PCR (AS-PCR), and the complete sequences of the open reading frames (ORF) were obtained. *1Ax2.1\** with 2478 bp and *1By19\** with 2163 bp encoded 824 and 720 amino acid residues, respectively. Molecular characterization showed that both subunits had a longer repetitive region and a high percentage of α-helices at the N- and C-termini, which are beneficial for forming superior gluten macropolymers. Protein modelling by AlphaFold2 revealed similar three-dimensional (3D) structure features of 1Ax2.1\* with two x-type superior quality subunits (1Ax1 and 1Ax2\*) and of 1By19\* with four y-type superior quality subunits (1By16, 1By9, 1By8 and 1By18). Four cysteine residues in the three x-type subunits (1Ax2.1\*, 1Ax1 and 1Ax2\*) and the cysteine in the intermediate repeat region of the y-type subunits were not expected to participate in intramolecular disulfide bond formation, but these cysteines might form intermolecular disulfide bonds with other glutenins and gliadins to enhance gluten macropolymer formation. SNP-based molecular markers for the *1Ax2.1\** and *1By19\** genes were developed and verified in different F2 populations and recombinant inbred lines (RILs) derived from crosses between spelt wheat and bread wheat cultivars. This study provides data on new glutenin genes and molecular markers for wheat quality improvement.

**Keywords:** spelt wheat; HMW-GS; three-dimensional structure; SNP marker; quality improvement

## **1. Introduction**

Wheat (*Triticum aestivum* L., 2n=6x=42, AABBDD) is one of the three major food crops globally with a long history of cultivation, accounting for almost 35% of the human population's dietary needs [1]. Wheat is an allohexaploid species with a huge genome (up to 17 Gb) that contains the A, B, and D subgenomes and a large amount of repetitive sequence [2]. Wheat flour can be processed into various foods such as bread, noodles and cookies, as well as non-food products, because of its versatile functional features and excellent textural properties, which are closely related to the structure of gluten and the interactions within the protein complex in grains [3,4].

The gluten proteins include water-insoluble monomeric gliadins and polymeric glutenins, which are considered the largest protein molecules in nature [5]. Gliadins act as a "plasticizer" that endows dough with extensibility or viscous flow. The glutenin proteins include high and low molecular weight glutenin subunits (HMW-GS, LMW-GS), of which HMW-GS serve as the "backbone" for interactions with other glutenin subunits and confer dough elasticity or strength [6,7]. HMW-GS are encoded by two tightly linked paralogous genes (larger x-type and smaller y-type) located at the *Glu-A1*, *Glu-B1* and *Glu-D1* loci on the long arms of chromosomes 1A, 1B and 1D, respectively. Theoretically, an individual hexaploid wheat cultivar can express 6 different HMW-GS, but usually 3–5 HMW-GS are present due to gene silencing [8,9]. The allelic variations at the *Glu-1* loci are closely associated with flour processing quality [10]. For example, the subunit pairs and subunits 1Dx5+1Dy10, 1Bx7+1By8, 1Bx13+1By16, 1Bx17+1By18, 1Ax1 and 1Ax2\* have positive effects on gluten quality, whereas 1Dx2+1Dy12 and 1Bx20 are related to poor dough strength [11–13].

**Citation:** Cao, Y.; Zhang, J.; Wang, R.; Sun, H.; Yan, Y. Molecular Characterization and SNP-Based Molecular Marker Development of Two Novel High Molecular Weight Glutenin Genes from *Triticum spelta* L. *Int. J. Mol. Sci.* **2022**, *23*, 11104. https://doi.org/10.3390/ijms231911104

Academic Editors: Jinsong Bao and Jianhong Xu

Received: 26 August 2022 Accepted: 11 September 2022 Published: 21 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Typical HMW-GS have three structural domains: a long and variable central repeat region that confers elasticity to the protein molecule, flanked by short and conserved N- and C-terminal domains in which most or all cysteine residues are present [14,15]. The repeat region is rich in β-turns and is flanked by globular conserved regions formed by α-helices [16]. Most of the cysteine residues form intra-chain disulfide bonds, whereas some form inter-chain disulfide bonds and directly affect dough formation and rheological characteristics [17–19]. Generally, the β-turn structure confers significant deformation resistance on protein molecules [20]. In particular, the central repeat domain can be folded under the action of protein disulfide isomerase, which is the basis of dough extension [21]. However, although different methods such as homology modelling and fold identification algorithms have been used to decipher the structural features of HMW-GS [22,23], the detailed 3D structures of wheat glutenin subunits and gluten proteins are still not clear because of the complexity of the protein composition and the difficulty of crystallization [24–26]. The recently developed protein structure prediction tool AlphaFold, which is considered the first computational method that can regularly predict protein structures with atomic accuracy even when no similar structure is known, has made it possible to dissect the 3D structure of glutenin subunits [27–29]. A recent study showed that AlphaFold2, which uses a transformer design, achieves high accuracy and is easy to use for 3D structure prediction of wheat MATE proteins [30].

In the past decades, many studies have shown that the allelic variation at *Glu-1* in bread wheat is limited. However, more extensive *Glu-1* allelic variation is present in wheat-related species such as spelt wheat [31,32], *T. dicoccum* [33], *Ae. tauschii* [34,35], *Ae. longissima* [36,37], *Ae. speltoides* and *Ae. kotschyi* [38]. Spelt wheat (*T. spelta* L., 2n=6x=42, AABBDD) is closely related to common wheat and belongs to the same species with few differences among the subspecies [39–41]. As a first-class gene resource for wheat, spelt wheat is rich in nutritional value, carries various disease resistance factors and tolerates various abiotic stresses, and its beneficial genes can be introduced into wheat for genetic improvement [42–45]. In particular, some novel HMW-GS and subunit combinations have been identified in European spelt wheat varieties, such as 1Ax2.1\*, 1Bx13\*+1By19\* and 1Bx6.1+1By22.1 [31]. These new allelic variations can be used as potential gene resources for improving wheat gluten quality. However, the molecular characteristics of these novel glutenin genes and their application value for wheat quality improvement are still unknown.

To create novel germplasm for the improvement of wheat quality, we amplified and cloned the genes encoding the 1Ax2.1\* and 1By19\* subunits from European spelt wheat varieties by AS-PCR, and investigated their molecular characteristics and phylogenetic relationships with other HMW-GS genes. AlphaFold modelling was used to reveal the 3D structural features of the 1Ax2.1\* and 1By19\* subunits and of previously characterized superior HMW subunits. On the basis of sequence variations in the promoter region, specific SNP-based markers for the *1Ax2.1\** and *1By19\** genes were developed and validated using a wide range of wheat cultivars, F2 populations and RILs from different crosses between spelt wheat and bread wheat cultivars. The results provide new gene resources and molecular markers for the improvement of wheat breadmaking quality.

## **2. Results and Discussion**

#### *2.1. Identification of Novel HMW-GS in Spelt Wheat*

HMW-GS compositions at the *Glu-1* loci in Spelt 137 and Spelt 6 were identified by SDS-PAGE (Figure 1). Spelt 137 contained the 1Ax2.1\*, 1Bx13+1By22\* and 1Dx2+1Dy12 subunits at the *Glu-A1*, *Glu-B1* and *Glu-D1* loci, respectively. The new x-type subunit 1Ax2.1\* encoded by *Glu-A1* migrated between 1Ax1 and 1Ax2\*, with a lower electrophoretic mobility and higher molecular weight than 1Ax2\* (Figure 1A). In addition, one new y-type subunit, 1By22\*, was present at the *Glu-B1* locus and had an electrophoretic mobility and molecular weight similar to those of the 1By18 subunit. The HMW-GS composition of Spelt 6 was null at *Glu-A1*, 1Bx13\*+1By19\* at *Glu-B1* and 1Dx2+1Dy12 at *Glu-D1*. The new subunits 1Bx13\* and 1By19\* encoded by *Glu-B1* showed electrophoretic mobilities and molecular weights similar to those of the 1Bx13 and 1By16 subunits, respectively (Figure 1B). The above two new subunits in spelt wheat were first found in our previous study [31]. These new allelic variations at the *Glu-1* loci provide potential gene resources for wheat quality improvement.

#### *2.2. Molecular Cloning and Characterization of 1Ax2.1\* and 1By19\* Genes from Spelt Wheat*

Two specific amplified fragments of about 2500 bp from Spelt 137 and 2200 bp from Spelt 6 were obtained by AS-PCR (Figure S1), corresponding to the sizes of x-type and y-type HMW-GS genes, respectively. Both PCR fragments were collected, cloned and sequenced. Their complete coding sequences, which showed the typical structural characteristics of previously characterized HMW-GS genes, were obtained and designated *1Ax2.1\** and *1By19\**. The *1Ax2.1\** and *1By19\** gene sequences were deposited in GenBank under the accession numbers MK395158 and MK395159, respectively. The nucleotide sequences of the *1Ax2.1\** (2478 bp) and *1By19\** (2163 bp) genes encoded 824 and 720 amino acid residues, respectively.
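As a quick consistency check on the lengths above, the following minimal sketch converts each ORF length into a codon count and compares it with the reported number of residues. The function name is illustrative, and reading the leftover codons as terminal stop codons is an assumption, not something stated in the text.

```python
def codon_summary(orf_bp: int, residues: int) -> dict:
    """Relate an ORF length in bp to the number of encoded residues."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = orf_bp // 3
    # Codons not accounted for by residues (assumed here to be stop codons).
    return {"codons": codons, "extra_codons": codons - residues}


# Lengths and residue counts as reported for the two cloned genes.
for name, bp, aa in [("1Ax2.1*", 2478, 824), ("1By19*", 2163, 720)]:
    print(name, codon_summary(bp, aa))
```

Both lengths are exact multiples of three, so the reported sizes are compatible with intact reading frames.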

The deduced amino acid sequence of the *1Ax2.1\** gene showed four distinct domains, like other x-type HMW-GSs (Figure 2): a signal peptide of 21 amino acid residues, an N-terminal sequence of 86 amino acid residues, a central repetitive domain of 672 residues and a C-terminal domain of 42 residues. The repeat region of the 1Ax2.1\* subunit consisted of tandem and interspersed repeats of 33 hexapeptide motifs (consensus PGQGQQ), 10 nonapeptide motifs (consensus GYYPTSPQQ and GYYPTSLQQ) and 23 tripeptide motifs (consensus GQQ). The deduced amino acid sequences of both genes were aligned with other HMW-GSs from wheat and related species to compare their sequence characteristics. Compared with the superior subunit 1Ax2\* [46], 1Ax2.1\* had 6 and 9 amino acid insertions, 6 amino acid deletions and 22 single amino acid substitutions, which ultimately increased the length of the repeat region. Differences in subunit size were mainly caused by variation in the repetitive polypeptide motifs resulting from base deletions or insertions [15,47]. Longer and more regular repeating units are usually more conducive to superior gluten quality [15,36]. 1Ax2.1\* contained a longer repetitive domain, with two additional hexapeptides (consensus PGQGQQ), than the superior quality subunit 1Ax2\*. In addition, 1Ax2.1\* had a 6-amino-acid deletion in the repeat domain compared with the superior quality subunit 1Ax1, but the tripeptides (GQQ), hexapeptides (PGQGQQ) and nonapeptides (GYYPTSPQQ) were highly consistent. As in 1Ax2.1\*, the number and distribution of cysteine residues were highly conserved among these subunits, including sites 31, 46 and 61 at the N-terminus and site 812, 818 or 803 at the C-terminus.

Similarly, the *1By19\** gene encoded 720 amino acid residues and showed a primary structure similar to that of other y-type HMW-GSs (Figure 3), including a signal peptide of 21 amino acid residues, an N-terminal domain of 104 amino acid residues, a repeat region of 553 amino acid residues and a C-terminal domain of 42 amino acid residues. The repeat region contained 8 nonapeptides and 34 hexapeptides. In addition, the 1Ax2.1\* subunit contained four conserved cysteine residues (three at the N-terminus and one at the C-terminus), while the 1By19\* subunit had seven conserved cysteine residues (five at the N-terminus, one in the repetitive domain and one at the C-terminus). Likewise, the repeat units in 1By19\* accounted for about 50% of the total amino acids in the repeat region, in which the number of PGQGQQ peptide segments was equivalent to that of the y-type superior quality subunit 1Dy10. Overall, 1By19\* had the same amino acid sequence length as the superior quality subunits 1By8 and 1By18, and its length fell between those of the superior quality subunits 1By9 and 1By16 owing to insertions and deletions of amino acids in the middle repeat region. Moreover, these subunits all contained the same number of nonapeptides (consensus GYYPTSLQQ), but 1By19\* contained 1–2 additional hexapeptides (consensus PGQGQQ). Each of these subunits contained seven cysteine residues: sites 31, 43, 65, 66 and 76 at the N-terminus, site 606, 591 or 624 in the repeat region near the C-terminus, and site 708, 693 or 726 at the C-terminus.
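The motif counts discussed above could be tallied along the following lines. This is only a sketch under stated assumptions: it counts exact, non-overlapping matches to the consensus motifs named in the text, whereas real repeat units often deviate from the consensus, so it would undercount on actual HMW-GS sequences. The toy fragment and function name are hypothetical.

```python
import re
from collections import Counter

# Consensus motifs named in the text. Longer motifs come first so that a
# GQQ embedded in PGQGQQ is not counted a second time.
MOTIFS = ["GYYPTSPQQ", "GYYPTSLQQ", "PGQGQQ", "GQQ"]
PATTERN = re.compile("|".join(MOTIFS))


def count_repeat_motifs(seq: str) -> Counter:
    """Count non-overlapping exact matches of the consensus motifs."""
    return Counter(match.group(0) for match in PATTERN.finditer(seq))


# Hypothetical toy fragment assembled from consensus motifs; not a real sequence.
toy = "".join(["PGQGQQ", "GYYPTSPQQ", "GQQ", "PGQGQQ", "GYYPTSLQQ"])
print(count_repeat_motifs(toy))
```

Listing the longest motifs first in the alternation is the design choice that keeps tripeptide counts from absorbing residues that belong to hexapeptides.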

#### *2.3. SNP and InDel Variations in 1Ax2.1\* and 1By19\* Genes*

Sequence variations resulting from point mutations and insertions/deletions (InDels) are the most common cause of wheat storage protein variation [48,49]. As listed in Table 1, 11 SNPs in the *1Ax2.1\** gene were present at different positions in the N-terminal domain and the repeat region. Among them, 8 SNPs were nonsynonymous and resulted in changes of the corresponding amino acid residues: H-Y, L-S/P/G, E-G/T/Q, R-G, R-G/L, A-P/G and L-S/P. The remaining three SNPs at positions 198 bp (T-C), 288 bp (G-A) and 1302 bp (T-A/C) were synonymous and did not cause amino acid residue changes. One nonsynonymous SNP was detected in the *1By19\** gene: an A/G-T transversion at position 1137 bp, which led to an amino acid residue change (Q/E/G-H). No InDel variations were found in the *1By19\** gene.
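To illustrate how a SNP is assigned to a codon and classified as synonymous or nonsynonymous, the sketch below uses the standard genetic code. The codons CAA, CAG and CAT are hypothetical choices that reproduce a Q-to-H change through an A/G-to-T substitution at the third codon position; the actual codons in the genes are not given in the text, and the function names are illustrative.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_position(nt_pos: int) -> tuple:
    """Map a 1-based ORF position to (codon number, position within codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def classify_snp(ref_codon: str, alt_codon: str) -> tuple:
    """Return (reference residue, alternative residue, SNP class)."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, kind


print(codon_position(1137))        # position reported for the 1By19* SNP
print(classify_snp("CAA", "CAT"))  # hypothetical codons, third-position change
print(classify_snp("CAA", "CAG"))  # hypothetical synonymous counterpart
```

Under this numbering, positions 198, 288, 1137 and 1302 all fall on the third base of a codon, where substitutions are most often silent.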

**Figure 2.** Comparison of the deduced amino acid sequence of the *1Ax2.1\** gene from spelt wheat with those of eight other x-type subunit genes. The blue boxes indicate the cysteine residues, and the red boxes indicate amino acid differences.

**Figure 3.** Comparison of the deduced amino acid sequence of the *1By19\** gene from spelt wheat with those of eight other y-type subunit genes. The blue boxes indicate the cysteine residues, and the red boxes indicate amino acid differences.

**Table 1.** Positions of SNPs identified between *1Ax2.1\** and other x-type HMW-GS genes.


#### *2.4. Verification of the Cloned 1Ax2.1\* and 1By19\* Genes from Spelt Wheat by Tandem Mass Spectrometry Analysis*

The corresponding HMW-GS bands on the SDS-PAGE gel were excised and digested, and then identified by MALDI-TOF/TOF-MS (Table 2). The results showed that 1Ax2.1\* matched the x-type subunit MK395158 well in five peptide segments of 35 (45–79), 24 (195–218), 49 (357–405), 6 (725–730) and 24 (792–815) amino acid residues, respectively (Figure 2). 1By19\* matched the y-type subunit MK395159 well in one peptide segment of 43 residues (QLQCERELQESSLEACRQVVDQQLAGRLPWSTGLQMRCCQQLR) at amino acid positions 28–70 (Figure 3). All peptide segments identified by tandem mass spectrometry had a high protein score (C.I.%) of 100, indicating the high reliability of the MS/MS analysis. These results further verified the validity of the cloned *1Ax2.1\** and *1By19\** genes.
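The positional bookkeeping for the 1By19\* peptide can be checked with a short sketch. Given the reported start position (28) and the peptide sequence quoted above, it derives the end position and the protein coordinates of the cysteine residues within the peptide. The function names are illustrative.

```python
def peptide_span(start: int, peptide: str) -> tuple:
    """Return (start, end) of a peptide given its 1-based start position."""
    return start, start + len(peptide) - 1


def cysteine_sites(start: int, peptide: str) -> list:
    """1-based protein positions of the cysteine residues in the peptide."""
    return [start + i for i, aa in enumerate(peptide) if aa == "C"]


# Peptide sequence and start position as reported for 1By19*.
PEPTIDE = "QLQCERELQESSLEACRQVVDQQLAGRLPWSTGLQMRCCQQLR"
print(peptide_span(28, PEPTIDE))
print(cysteine_sites(28, PEPTIDE))
```

The cysteines located this way coincide with four of the five N-terminal cysteine sites described for the y-type subunits in Section 2.2 (31, 43, 65 and 66); site 76 lies outside the peptide.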

**Table 2.** Identification of 1Ax2.1\* and 1By19\* subunits by MALDI-TOF/TOF-MS.


#### *2.5. Phylogenetic Analysis of 1Ax2.1\* and 1By19\* Genes*

The polymorphism of glutenin subunits is beneficial for understanding the origin and evolution of spelt wheat. An early study of the allelic variations at *Glu-1* and *Glu-3* in spelt wheat varieties supported the hypothesis of a secondary origin of spelt wheat from hybridization between cultivated emmer (*T. dicoccum*, AABB) and club wheat (*T. aestivum* subsp. *compactum*, AABBDD) [31]. Here, the complete coding sequences of the cloned *1Ax2.1\** and *1By19\** genes from spelt wheat and 27 other HMW-GS genes from common wheat and related species were used to construct a neighbor-joining phylogenetic tree (Figure 4). The x-type and y-type HMW-GS genes were clearly separated into two clades in the phylogenetic tree. In the x-type gene clade, the 1Bx genes, together with two Sx genes from the S genome, were classified into one subgroup, while the Dx and Ax genes were classified into another subgroup. In particular, the *1Ax2.1\** gene from spelt wheat showed a close phylogenetic relationship with the *1Ax1*, *1Ax2\** and *1Ax2.1* genes. In the y-type gene clade, the Dy and By genes were classified into different subgroups, and the *1By19\** gene from spelt wheat was closer to the *1By16*, *1By9*, *1By8* and *1By18* genes.

To further explore the evolutionary relationships between *1Ax2.1\** and *1By19\** and other HMW-GS genes, their divergence times (million years ago, MYA) were estimated, and the results are shown in Table S3. In general, the divergence times among x-type subunit genes and among y-type subunit genes at each locus were about 0–13 MYA and 0.2–6.0 MYA, respectively, while the divergence between x- and y-type subunit genes at each locus occurred about 14–17 MYA. The divergence of *1Ax2.1\** from *1Ax1* and *1Ax2\** occurred more recently, at about 0.92–1.00 MYA, while the earlier divergence between *1Ax2.1\** and *1Ax2.1* occurred at 2.31 MYA. *1By19\** displayed a high similarity with *1By16*, and the two diverged more recently, at 0.31 MYA. It is known that hexaploid wheat originated from hybridization between tetraploid wheat and *Ae. tauschii* about ten thousand years ago. Thus, the *1Ax2.1\** and *1By19\** genes should have emerged before the formation of spelt wheat.

**Figure 4.** Phylogenetic tree of the *1Ax2.1\** and *1By19\** genes and 27 other HMW-GS genes from the A, B and D genomes of common wheat and related species. Bootstrap values ≥28% are indicated above or below the branches. The red frames indicate the *1Ax2.1\** and *1By19\** genes. The 27 HMW-GS genes include X61009, M22208, KX454509, X13927, JN982368, KF733216, KJ579439, AB263219, AY553933, KF466259, AY159367, KJ144185, X61026, KF733215, KJ579440, EF540765, KF430649, X12929, BK006459 from *T. aestivum*; HQ380225 and HQ380224 from *Ae. speltoides*; KF995273 from *Triticum turgidum*; EU495302 from *T. aestivum* subsp. *yunnanense*; AJ437000, LN828972 and AY245797 from *T. turgidum* subsp. *durum*; HQ834308 from *T. monococcum* subsp. *monococcum*.

#### *2.6. Secondary Structure and 3D Structure Analysis of 1Ax2.1\* and 1By19\* Protein Subunits*

The secondary structure characteristics of glutenin proteins can be used to uncover the molecular mechanisms of gluten quality formation [50]. In general, a higher content of β-sheets and α-helices in glutenin subunits is helpful for forming a good gluten structure and superior breadmaking quality [51,52]. In this study, the secondary structures of the 1Ax2.1\* and 1By19\* subunits and eight other wheat HMW-GS (1Ax2\*, 1Ax1, 1By8, 1By9, 1By15, 1By16, 1By18 and 1By20) were predicted by the PSIPRED server, and their secondary structure features were compared. Among them, 1Ax2\*, 1Ax1, 1By8, 1By15, 1By16 and 1By18 are considered superior quality subunits, whereas 1By20 is a poor quality subunit [11–13]. As shown in Table 3, the 1Ax2.1\* subunit had seven α-helices (9.58%) and four β-strands (0.82%), a slightly higher α-helix content than the superior quality subunits 1Ax2\*, with seven α-helices (9.33%) and two β-strands (0.98%), and 1Ax1, with six α-helices (9.40%) and no β-strands. The 1By19\* subunit and the other six y-type subunits generally contained eight α-helices with different percentages, but no β-strands were present. 1By19\* had 12.36% α-helices, higher than 1By15 (12.03%), 1By16 (11.65%), 1By18 (12.22%) and 1By20 (11.85%), and lower than 1By8 (16.81%) and 1By9 (13.19%).
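Segment counts and percentages of the kind reported above can be derived from a per-residue prediction string. The following minimal sketch assumes a three-state string of H (helix), E (strand) and C (coil), one character per residue; the toy string and function name are hypothetical and do not correspond to any subunit in this study.

```python
from itertools import groupby


def summarize_secondary_structure(ss: str) -> dict:
    """Summarize a per-residue string of H (helix), E (strand) and C (coil)."""
    # Collapse runs of identical states so each segment is counted once.
    segments = [state for state, _ in groupby(ss)]
    n = len(ss)
    return {
        "helix_segments": segments.count("H"),
        "strand_segments": segments.count("E"),
        "helix_percent": round(100 * ss.count("H") / n, 2),
        "strand_percent": round(100 * ss.count("E") / n, 2),
    }


# Hypothetical 20-residue prediction for illustration only.
toy_prediction = "CCHHHHCCCCEECCHHHHCC"
print(summarize_secondary_structure(toy_prediction))
```

Reporting both the number of segments and the residue percentage matters because, as in the comparison of 1Ax2.1\* with 1Ax2\*, a subunit can have more β-strands yet a lower β-strand percentage.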


**Table 3.** Secondary structure prediction of 1Ax2.1\* and 1By19\* subunits and other eight wheat HMW-GS.

The 3D structures of the 1Ax2.1\* and 1By19\* subunits and six other superior quality subunits (1Ax2\*, 1Ax1, 1By8, 1By9, 1By16 and 1By18) were further predicted by AlphaFold2 (Figure 5). As a deep learning algorithm, AlphaFold2 can visualize, analyze and interpret 3D protein structures based on amino acid sequences [53]. The AlphaFold2 predictions over five cycles for the eight HMW-GS were almost identical in terms of three indicators: alignment errors, per-residue confidence scores and matching templates (Figure S2). The eight subunits had different confidence scores in the 3D structure models after five cycles of prediction, and the models were divided into five ranks according to their scores (Figure S3). The top-ranked model for each of the eight subunits is shown in Figure 5. The confidence score for 1Ax2.1\* was 33.9, close to those of 1Ax1 (34.9) and 1Ax2\* (34.4). The confidence score for 1By19\* was 36.9, close to those of 1By16 (37.4), 1By9 (37.7), 1By8 (37.2) and 1By18 (37.0). All predicted subunits had a typical 3D structure, including two non-repetitive structural domains (N-terminal and C-terminal) with a repetitive fragment in the middle and a large number of α-helices at the N- and C-termini. In general, the three x-type subunits (1Ax2.1\*, 1Ax1 and 1Ax2\*) showed similar 3D structural characteristics to one another, as did the five y-type subunits (1By19\*, 1By16, 1By9, 1By18 and 1By8). It is worth noting that the number and distribution of Cys residues play an important role in determining the formation of glutenin polymers and the subsequent rheological parameters of dough [54–56]. In particular, the three x-type subunits (1Ax2.1\*, 1Ax1 and 1Ax2\*) contained four cysteine residues (three in the N-terminal domain and one in the C-terminal domain), which did not form intramolecular disulfide bonds within the subunits. The five y-type subunits (1By19\*, 1By16, 1By18, 1By8 and 1By9) contained seven cysteine residues (five in the N-terminal domain, one in the C-terminal domain and one in the intermediate repeat domain) and formed three disulfide bonds: Cys 31~76, 43~65 and 66~706 in 1By19\*; Cys 31~76, 43~65 and 66~726 in 1By16; Cys 31~76, 43~65 and 66~693 in 1By9; Cys 31~76, 43~65 and 66~708 in 1By8; and Cys 31~76, 43~65 and 66~708 in 1By18. However, the cysteines in the intermediate repeat region were unable to form a disulfide bond (Figure 5). Meanwhile, the 1Ax2.1\* and 1By19\* subunits contained a large number of α-helices at the N- and C-termini, consistent with the secondary structure prediction results mentioned above as well as with previous reports [57,58]. Although the cysteines in the 1Ax2.1\*, 1Ax1 and 1Ax2\* subunits and three cysteines in the 1By19\*, 1By16, 1By9, 1By8 and 1By18 subunits did not form intramolecular disulfide bonds, they might form intermolecular disulfide bonds with other glutenins or gliadins to further form gluten macropolymers [59].
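One common heuristic for flagging candidate intramolecular disulfide bonds in a structural model is to look for pairs of cysteine sulfur (SG) atoms lying within a short distance of each other. The sketch below illustrates that idea only; the text does not state how the disulfide bonds were assigned in this study, the 2.5 Å cutoff is an assumed value, and the coordinates are made up for illustration.

```python
from itertools import combinations
from math import dist


def disulfide_candidates(sg_coords: dict, cutoff: float = 2.5) -> list:
    """Return residue pairs whose Cys SG atoms lie within `cutoff` angstroms."""
    return [
        (a, b)
        for a, b in combinations(sorted(sg_coords), 2)
        if dist(sg_coords[a], sg_coords[b]) <= cutoff
    ]


# Made-up SG coordinates (angstroms) keyed by residue number; not model output.
toy_sg = {
    31: (0.0, 0.0, 0.0),
    76: (2.0, 0.0, 0.0),
    43: (10.0, 0.0, 0.0),
    65: (10.0, 2.1, 0.0),
    606: (30.0, 30.0, 30.0),
}
print(disulfide_candidates(toy_sg))
```

In this toy input the residue placed far from all others is left unpaired, which mirrors the situation described for the repeat-region cysteine.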

**Figure 5.** Localized view of the 3D structures of eight wheat HMW-GS by AlphaFold2.

#### *2.7. Development and Validation of SNP-Based Molecular Markers for 1Ax2.1\* and 1By19\* Genes*

SNPs are a newer type of marker that is bi-allelic in nature, with a low expected heterozygosity [60,61]. In past years, a number of SNP-based molecular markers for glutenin genes have been developed, such as those for the *1Dx2* and *1Dx5* alleles [62], *1By18* [49], and *1Slx2.3\** and *1Sly16\** [36]. These markers can be used for rapid improvement of wheat gluten quality. Herein, we amplified and cloned the upstream promoter sequences of the *1Ax2.1\** and *1By19\** genes. BioEdit 7.0 was then used to compare their promoter sequences with those of other x- and y-type HMW-GS genes, and the specific SNP variations were identified. Ultimately, we used these SNP variations to develop SNP-based molecular markers for the *1Ax2.1\** and *1By19\** genes.

Specific primers were designed (Table S4) and used to amplify the upstream promoter sequences of the *1Ax2.1\** and *1By19\** genes, and two specific fragments of 905 bp and 728 bp were obtained from the *1Ax2.1\** and *1By19\** genes, respectively (Figure S4). After cloning and sequencing, the typical promoter sequences of the *1Ax2.1\** and *1By19\** genes were obtained and aligned with the promoter sequences of 12 other x- and y-type HMW-GS genes deposited in GenBank (Figures S5 and S6). Two specific SNP sites in the upstream promoter region of each gene were identified: −377 bp and −203 bp in *1Ax2.1\**, and −591 bp and −56 bp in *1By19\**. According to these specific SNP sites, two pairs of specific primers (2.1\*F/R and 19\*F/R) were designed (Table S5) and used to develop SNP-based molecular markers for the *1Ax2.1\** and *1By19\** genes, as shown in Figures S5 and S6, in combination with SDS-PAGE identification. The results from 47 bread wheat and spelt wheat varieties with different *Glu-1* allelic variations (Table S1) showed that one 209 bp specific fragment was amplified by the 2.1\*F/R primer pair from the varieties containing the 1Ax2.1\* subunit, whereas no amplified products were obtained from the varieties without the 1Ax2.1\* subunit (Figure 6A,B). Similarly, one 570 bp specific fragment was amplified by the 19\*F/R primer pair from varieties with the 1By19\* subunit, and it was absent in the varieties without the 1By19\* subunit (Figure 6C,D). These results were consistent with the SDS-PAGE identification. Further collection and sequencing of both amplified fragments also showed consistency with their amplified regions, as shown in Figures S5 and S6.

To further validate the developed SNP markers, the two pairs of specific primers were used to amplify the promoter sequences of the target genes in different hybrid populations and RILs. Meanwhile, SDS-PAGE identification was conducted to verify the PCR results. For marker validation, 200–250 grains from each F2 population and 50–80 grains from each RIL were tested. In the F2 populations of the Spelt 137×Zhongmai 175, Spelt 137×Ningchun 4 and Spelt 137×Zhongmai 8601 crosses, the 2.1\*F/R primer pair specifically amplified the 209 bp fragment in the F2 grains containing the 1Ax2.1\* subunit, and no amplified products were produced in the F2 grains without the 1Ax2.1\* subunit. At the same time, three F2 populations from the Spelt 6×Zhongmai 175, Spelt 6×Ningchun 4 and Spelt 6×Zhongmai 8601 crosses were used to verify the molecular marker for the *1By19\** gene, and similar results were obtained. The F2 grains carrying the 1By19\* subunit showed the 570 bp specific fragment when amplified by the 19\*F/R primer pair (Figure 7). Furthermore, seven RILs from the Spelt 137×Ningchun 4 cross and six RILs from the Spelt 6×Ningchun 4 cross were used to verify the developed molecular markers for the *1Ax2.1\** and *1By19\** genes. As expected, all RILs containing the 1Ax2.1\* and 1By19\* subunits exhibited the 209 bp and 570 bp specific fragments, respectively (Figure 8). These results were consistent with the SDS-PAGE identification, confirming that the developed SNP-based molecular markers have high specificity and accuracy for identifying the *1Ax2.1\** and *1By19\** genes. Therefore, these SNP-based markers are expected to be useful for breadmaking quality improvement via marker-assisted selection in wheat breeding programs.
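The agreement between marker calls and SDS-PAGE calls described above can be summarized per grain or per line. The following is a minimal sketch of such a tally, assuming each sample is scored as band present/absent and subunit present/absent; the six calls are hypothetical and are not data from this study.

```python
def concordance(calls: list) -> dict:
    """Compare marker band calls with SDS-PAGE subunit calls.

    Each item is (band_present, subunit_present) for one grain or line.
    """
    agree = sum(1 for band, subunit in calls if band == subunit)
    false_pos = sum(1 for band, subunit in calls if band and not subunit)
    false_neg = sum(1 for band, subunit in calls if not band and subunit)
    return {
        "agreement": agree / len(calls),
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


# Hypothetical calls for six grains; fully concordant by construction.
toy_calls = [
    (True, True), (True, True), (False, False),
    (False, False), (True, True), (False, False),
]
print(concordance(toy_calls))
```

Separating false positives from false negatives is useful for a presence/absence marker, since a missing band can also result from a failed PCR rather than from absence of the allele.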

**Figure 6.** Development of the SNP-based molecular markers for the *1Ax2.1\** and *1By19\** genes. (**A**) Identification of the 1Ax2.1\* subunit in different wheat varieties by SDS-PAGE. (**B**) PCR amplification of the *1Ax2.1\** gene from different wheat varieties with the primer pair 2.1\*F/R. The 209 bp specific amplified fragment is indicated. (**C**) Identification of the 1By19\* subunit in different wheat varieties by SDS-PAGE. (**D**) PCR amplification of the *1By19\** gene from different wheat varieties with the primer pair 19\*F/R. The 570 bp specific amplified fragment is indicated. The variety numbers in the figure are the same as those in Table S1. P<sup>1</sup> : Spelt 137; P<sup>2</sup> : Spelt 6; P<sup>3</sup> : Spelt 20; P<sup>4</sup> : Spelt 24.

**Figure 7.** Validation of the SNP-based molecular markers for the *1Ax2.1\** (**A**) and *1By19\** (**B**) genes in the F2 populations from crosses between spelt wheat and bread wheat varieties. The 1Ax2.1\* and 1By19\* subunits and their specific amplified fragments of 209 bp and 570 bp are indicated. P<sup>1</sup> : Spelt 137; P<sup>2</sup> : Spelt 6; ZM175: Zhongmai 175; NC4: Ningchun 4; ZM8601: Zhongmai 8601.

**Figure 8.** Validation of the SNP-based molecular markers for the *1Ax2.1\** and *1By19\** genes in the RILs derived from the Spelt 137×Ningchun 4 and Spelt 6×Ningchun 4 crosses. (**A**) HMW-GS from seven RILs (lanes 1–7) from Spelt 137×Ningchun 4 identified by SDS-PAGE; (**B**) PCR amplification of the *1Ax2.1\** gene from seven RILs (lanes 1–7) from Spelt 137×Ningchun 4; (**C**) HMW-GS from six RILs (lanes 1–6) from Spelt 6×Ningchun 4 identified by SDS-PAGE; (**D**) PCR amplification of the *1By19\** gene from six RILs (lanes 1–6) from Spelt 6×Ningchun 4. The 1Ax2.1\* and 1By19\* subunits and the specific amplified fragments of 209 bp and 570 bp are indicated. P<sup>1</sup> : Spelt 137; P<sup>2</sup> : Spelt 6; NC4: Ningchun 4.

## **3. Materials and Methods**

#### *3.1. Plant Materials*

The materials used in this work included two European spelt wheat varieties (Spelt 6 and Spelt 137); 45 bread wheat and spelt wheat varieties with different allelic variations at the *Glu-1* loci (Table S1); six F2 populations from the crosses Spelt 137×Zhongmai 175, Spelt 137×Ningchun 4, Spelt 137×Zhongmai 8601, Spelt 6×Zhongmai 175, Spelt 6×Ningchun 4 and Spelt 6×Zhongmai 8601; and seven and six F6 RILs derived from the Spelt 137×Ningchun 4 and Spelt 6×Ningchun 4 crosses, respectively, through consecutive selfing combined with screening and identification. All spelt wheat varieties were obtained from the Plant Breeding Institute, Technical University of Munich, Germany.

#### *3.2. HMW-GS Extraction and SDS-PAGE*

Seed HMW-GS were extracted according to a previous study with minor modifications [63]. First, 70% ethanol (*v/v*) and 55% isopropanol (*v/v*) were sequentially added to homogenized seeds to remove albumins, globulins and gliadins. Then, glutenins were extracted with a commonly used glutenin extraction buffer (50% isopropanol, 80 µL Tris-HCl, pH 8.0) containing 1% dithiothreitol (*w/v*) and 1.4% 4-vinylpyridine (*w/v*). The final extracted glutenin proteins were used for SDS-PAGE based on the reported method [34].

#### *3.3. DNA Isolation, AS-PCR Amplification and Sequencing*

Wheat seedling leaves were used to extract genomic DNA according to the improved CTAB method [37]. Two pairs of AS-PCR primers were designed according to published HMW-GS gene sequences and are listed in Table S2. High-fidelity polymerase (Vazyme Biotech, Nanjing, China) was used to amplify the complete coding sequences of the HMW-GS genes. PCR was performed on a CFX96 Real Time system (Bio-Rad Laboratories) programmed with an initial denaturation at 95 °C for 3 min, followed by 35 cycles of 95 °C for 15 s, 61 °C for 15 s and 72 °C for 150 s, and a final extension at 72 °C for 5 min. The PCR products were separated on a 1% agarose gel in Tris-acetic acid-EDTA buffer, and the fragments of the expected size were collected and purified using a Gel Extraction Kit (Omega, Bienne, Switzerland). The purified products were then ligated into the pMD18-T vector (TaKaRa Biotechnology, Dalian, China) and transformed into Trans-2-blue competent cells. Three positive clones were randomly selected to reduce sequencing errors and were sequenced by TaKaRa Biotechnology, Dalian, China.
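For planning purposes, the AS-PCR cycling program described above can be written as a small data structure and its programmed hold time summed. This is a sketch only: the dictionary layout and function name are illustrative, and ramping time between temperatures, which depends on the instrument, is ignored.

```python
# Cycling parameters as given in the text: (temperature in °C, seconds).
AS_PCR_PROGRAM = {
    "initial_denaturation": (95, 180),
    "cycles": 35,
    "cycle_steps": [(95, 15), (61, 15), (72, 150)],
    "final_extension": (72, 300),
}


def total_hold_seconds(program: dict) -> int:
    """Sum the programmed hold times; ramping between steps is ignored."""
    per_cycle = sum(seconds for _, seconds in program["cycle_steps"])
    return (
        program["initial_denaturation"][1]
        + program["cycles"] * per_cycle
        + program["final_extension"][1]
    )


print(total_hold_seconds(AS_PCR_PROGRAM) / 60, "min")
```

The 150 s extension step dominates the run time, at roughly one minute per kilobase for the 2.2–2.5 kb products reported in Section 2.2.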

#### *3.4. Sequence Alignment and SNP/InDel Identification*

Multiple sequence alignments of the cloned HMW-GS genes and previously characterized x-type and y-type subunit genes were conducted using BioEdit (version 7.0, Tom Hall, Scotts Valley, USA), and the SNP and InDel variations in the HMW-GS genes were identified. The 15 x-type and 12 y-type HMW-GS genes deposited in GenBank are as follows: *1Ax1* (X61009), *1Ax2\** (M22208), *1Ax2.1* (HQ834308), *1Bx6* (KX454509), *1Bx7* (X13927), *1Bx13* (JN982368), *1Bx14* (KF733216), *1Bx17* (AB263219), *1Bx20* (AJ437000), *1Bx23* (AY553933), *1Dx2* (KF466259), *1Dx2.2* (AY159367), *1Dx5* (KJ144185), *Sx1\** (HQ380225), and *Sx3\** (HQ380224), *1Ay* (FJ404595), *1By8* (AY245797), *1By9* (X61026), *1By15* (KF733215), *1By15\** (KJ579440), *1By16* (EF540765), *1By18* (KF430649), *1By20* (LN828972), *1Dy10* (X12929), *1Dy12* (BK006459), *Sy9\** (HQ380223), and *Sy18\** (HQ380222).

#### *3.5. MALDI-TOF/TOF-MS*

According to the previously described method [49,51], the corresponding HMW subunit bands on the SDS-PAGE gel were excised and then digested with trypsin. HMW-GS identification by tandem mass spectrometry was performed using an EASY-nLC 1000 (Thermo/Finnigan, San Jose, CA, USA) equipped with an Orbitrap Q Exactive mass spectrometer (Thermo/Finnigan).

#### *3.6. Construction of Phylogenetic Tree and Estimation of Divergence Time*

The full-length homologous nucleotide sequences were aligned using the Clustal W program, and the alignment file was used to construct a phylogenetic tree from the complete coding regions of the HMW-GS genes using MEGA software (version 6.0, Koichiro Tamura et al., Auckland, New Zealand). An evolutionary rate of 6.5 × 10<sup>−9</sup> substitutions/site/year was used to estimate the divergence times of the HMW-GS genes based on the reported method [64].
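The conversion between a pairwise distance and a divergence time can be sketched as follows. The relation T = K / (2r), where K is the number of substitutions per site between two sequences and r is the rate given above, is a commonly used molecular clock formula and is assumed here; the text cites the method [64] without stating the formula, so the function names and the relation itself should be read as assumptions.

```python
RATE = 6.5e-9  # substitutions per site per year, as given in the text


def divergence_time_mya(k: float, rate: float = RATE) -> float:
    """Divergence time in million years, assuming T = K / (2 * rate)."""
    return k / (2 * rate) / 1e6


def distance_for_time(t_mya: float, rate: float = RATE) -> float:
    """Inverse relation: pairwise distance implied by a divergence time."""
    return 2 * rate * t_mya * 1e6


# Distance implied by the 0.31 MYA estimate reported for 1By19* and 1By16.
k = distance_for_time(0.31)
print(k, divergence_time_mya(k))
```

Under this assumed relation, the divergence times in Table S3 correspond to small per-site distances, which is consistent with the high sequence similarity described for closely related alleles.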

#### *3.7. Secondary Structure and 3D Structure Prediction of HMW-GS*

Secondary structure prediction of the deduced amino acid sequences of the cloned HMW-GS genes was carried out with PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/psiform.html, accessed on 10 May 2022) [65,66]. The 3D structures of HMW-GS were predicted by AlphaFold2 [67,68] and then edited with PyMOL (version 1.7.4, Schrödinger, Warren Lyford DeLano, New York City, USA).

#### *3.8. Development and Validation of SNP-Based Molecular Markers*

Based on the SNP variations in the promoter sequences upstream of the coding regions of the cloned HMW-GS genes (Table S4), two pairs of AS-PCR primers were designed (Table S5). Genomic DNA was extracted using the Trans Fast Taq DNA Polymerase system (TransGen Biotech, Beijing, China). The PCR program consisted of 94 °C for 3 min for activation, followed by 35 cycles of 94 °C for 5 s, 58 °C/55 °C (Table S6) for 15 s, and 72 °C for 2/4 s, with a final extension at 72 °C for 5 min. The materials used for marker development and verification are described in the plant materials section.
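The cycling conditions above can be written down as a small data structure, which makes the two primer-specific variants easy to compare. The sketch below is only an illustration: the class and function names are hypothetical, and the pairing of each annealing temperature with a particular extension time is an assumption, since those details are given in Table S6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def as_pcr_program(anneal_c, extend_s, cycles=35):
    """Build the step list for one AS-PCR run from the stated conditions."""
    program = [Step("initial activation", 94.0, 180)]
    for _ in range(cycles):
        program.append(Step("denaturation", 94.0, 5))
        program.append(Step("annealing", anneal_c, 15))
        program.append(Step("extension", 72.0, extend_s))
    program.append(Step("final extension", 72.0, 300))
    return program


def hold_time_s(program):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return sum(step.seconds for step in program)


# Assumed pairings of annealing temperature and extension time
print(hold_time_s(as_pcr_program(58.0, 2)))  # 1250
print(hold_time_s(as_pcr_program(55.0, 4)))  # 1320
```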

#### **4. Conclusions**

Two novel HMW-GS, 1Ax2.1\* and 1By19\*, were identified in European spelt wheat, and the complete coding sequences of the corresponding genes were cloned and sequenced. Molecular characterization showed that both subunits had longer and more regular repeating units, and a high percentage of α-helices and β-sheets. AlphaFold2 prediction showed that 1Ax2.1\* and 1By19\* had 3D structural characteristics similar to those of three superior quality x-type subunits (1Ax2.1\*, 1Ax1 and 1Ax2\*) and four superior quality y-type subunits (1By16, 1By9, 1By8 and 1By18), respectively. Both subunits contained a large number of α-helices at the N- and C-termini. In particular, four cysteine residues in three x-type subunits (1Ax2.1\*, 1Ax1 and 1Ax2\*) and the cysteines in the intermediate repeat region were unable to form intramolecular disulfide bonds, but these cysteines might form intermolecular disulfide bonds with other glutenins and gliadins to enhance gluten macropolymer formation. SNP-based molecular markers for the *1Ax2.1\** and *1By19\** genes were developed from the SNP variations in the promoter regions and verified in different F2 populations as well as in different recombinant inbred lines derived from crosses between spelt wheat and bread wheat cultivars. These molecular markers showed high reliability and have good application prospects for improving bread-making quality through marker-assisted selection.

**Supplementary Materials:** The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms231911104/s1.

**Author Contributions:** Y.C., J.Z. and R.W. performed all of the experiments, data analysis and wrote the paper. H.S. performed protein identification and data analysis. Y.Y. designed and supervised experiments. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was financially supported by a grant from the National Natural Science Foundation of China (31971931).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** This manuscript has no financial or non-financial competing interests.

#### **Abbreviations**


#### **References**


## *Review* **Physiology and Molecular Breeding in Sustaining Wheat Grain Setting and Quality under Spring Cold Stress**

**Hui Su <sup>1</sup> , Cheng Tan <sup>1</sup> , Yonghua Liu <sup>2</sup> , Xiang Chen <sup>1</sup> , Xinrui Li <sup>1</sup> , Ashley Jones <sup>3</sup> , Yulei Zhu 1,\* and Youhong Song 1,4,\***


**Abstract:** Spring cold stress (SCS) compromises the reproductive growth of wheat and is a major constraint on achieving high grain yield and quality in winter wheat. To sustain wheat productivity under SCS conditions, breeding cold-tolerant cultivars is key. In this review, we examine how grain setting and quality traits are affected by SCS, which may occur at the pre-anthesis stage. We then examine the physiological and molecular mechanisms involved in floret and spikelet SCS tolerance, including protective enzymes that scavenge reactive oxygen species (ROS), hormonal adjustment, and carbohydrate metabolism. Lastly, we explore quantitative trait loci (QTLs) that regulate SCS responses in order to identify candidate genes for breeding. Existing SCS-tolerant cultivars were bred primarily on agronomic and morphophysiological traits, with little molecular investigation. Therefore, breeding novel wheat cultivars based on QTLs and associated genes underlying the fundamental resistance mechanism is urgently needed to sustain grain setting and quality under SCS.

**Keywords:** *Triticum aestivum* L.; spring frost; spikelet development; grain set and quality; QTLs

#### **1. Introduction**

Wheat provides approximately 20% of the food energy and protein produced for human consumption [1], and grain quality is an important indicator of market value and consumer acceptance [2,3]. Wheat grain quality is a complex combination of various traits, mainly controlled by genotypic and environmental factors [4]. Climate change is shifting temperatures and ecological landscapes in ways that negatively impact wheat yield and quality [5]. Over the last several decades, spring cold stress (SCS) has been reported to cause severe losses in wheat production and grain quality. For example, in Australia, SCS events that frequently occur at the reproductive stage of wheat typically result in yield losses of 10%, with losses exceeding 85% in various farmlands [6,7]. Nearly 85% of China's total area planted with winter wheat experiences widespread SCS [8,9]. Reports from North America and Europe indicated that late frost spells are one of the most economically damaging agricultural climate hazards, causing substantial economic losses in 2017 [10,11]. Consequently, SCS is an abiotic stress that threatens the safety of crop production systems worldwide. As climate change continues, wheat growth and development have been subjected to more frequent cold stress [12].

SCS events often occur during reproductive development in winter wheat [13]. Reproductive development comprises floral initiation, pollen grain and embryo development, pollination, fertilization, and grain setting [14]. When wheat suffers from frost during the reproductive growth period, the spike cells lose water and wither, which affects the normal development of the young spike and increases its mortality [15,16]. Malfunctions and irreversible abortion of male and female reproductive organs and gametophytes are the main reasons for cold-induced male and female infertility [17]. During SCS, the anthers display irregular hypertrophy and vacuolation of the tapetum, an unusual accumulation of starch and protein in the plastids, and poor pollen tube development [18,19]. Zhang et al. (2021) stated that low-temperature stress significantly reduced the expression and activity of the sucrose invertase (CWINV) coding gene in young ears at the booting stage, inhibited the transport of sucrose to the pollen sac, and thereby hindered normal pollen development [20]. SCS occurring at late reproductive growth resulted in smaller, dark-colored seeds with a wrinkled epidermis, and in poor seed setting and quality [21].

**Citation:** Su, H.; Tan, C.; Liu, Y.; Chen, X.; Li, X.; Jones, A.; Zhu, Y.; Song, Y. Physiology and Molecular Breeding in Sustaining Wheat Grain Setting and Quality under Spring Cold Stress. *Int. J. Mol. Sci.* **2022**, *23*, 14099. https://doi.org/10.3390/ijms232214099

Academic Editors: Jinsong Bao and Jianhong Xu

Received: 15 September 2022 Accepted: 12 November 2022 Published: 15 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Wheat responds to cold stress by regulating key physiological, biochemical, and molecular mechanisms [22]. Under cold stress, a wide range of chemicals or protective proteins are produced, including soluble carbohydrates, proline, and cold-resistance proteins [23], which are involved in regulating osmotic potential, preventing ice crystal formation, stabilizing cell membranes, and scavenging reactive oxygen species (ROS) [24]. At the molecular level, estimates of phenotypic plasticity were used to identify loci associated with stress tolerance. Candidate genes involved in phytohormone-mediated processes for stress tolerance have been shown to be involved in cold stress responses [25]. The acquisition of freezing tolerance under cold requires the orchestration of disparate physiological and biochemical changes, and these changes are mainly mediated through the differential expression of genes [26,27]. Some of these genes encode effector molecules directly involved in stress mitigation, and others encode proteins for signal transduction or transcription factors that control gene pool expression [26]. Genes involved in plant metabolism were differentially expressed to avoid injury and damage associated with SCS; these include genes encoding Ca2+-binding proteins, protein kinases, and inorganic pyrophosphatase [28].

Understanding the potential regulatory mechanisms behind SCS tolerance is necessary to breed wheat varieties with improved grain setting and quality under cold stress. In this review, we summarize the consequences of SCS and explore the potential mechanisms that sustain wheat grain setting and quality under SCS. The objectives are to (i) clarify the physiological and molecular mechanisms controlling grain setting and quality under SCS, and (ii) propose breeding strategies for combating SCS during the reproductive stage.

#### **2. Effects of SCS on Grain Number and Quality in Wheat**

Under varying climatic conditions, the SCS events have become more frequent, intense, and prolonged. The SCS events often occur during the reproductive stage of winter wheat, which is critical for the establishment of the panicle [29]. The SCS compromises the development of young spike and floret; nutrient distribution is altered, and floret stunting (or sterility) occurs, resulting in poor grain set and quality (Figure 1).

**Figure 1.** A schematic diagram visually demonstrating the impacts of SCS initiated at tetrad stage (**A**) on subsequent wheat growth and development at booting stage (**B**), anthesis stage (**C**) and maturation stage (**D**). (**F**) Indicates young spikelet development at tetrad stage. (**G**) Indicates tapetum degeneration and pollen sterility in the developing anthers at booting stage. (**H**) Indicates reduced pollen viability and thus spikelet fertility. (**I**) Indicates reduced grain-filling rate and period and enhanced grain abortion, and thus less grain number and quality (**E**).

#### *2.1. Grain Number*

Grain number is a significant factor in determining wheat grain yield [30]. The stages from jointing to flowering are critical for preventing florets from degenerating and for increasing the grain setting rate [31–33]. Under SCS conditions, lower spike number per plant and grain number per spike were primarily responsible for reduced grain production (Table 1; Figure 2) [34]. Compared with spring wheat cultivars, semi-winter wheat has stronger cold resistance. For example, under low temperature of −2–6 °C for 3 days at the jointing stage, the grain number per spike was lowered by 1.3–4.4% in Yangmai16 (spring wheat) but by 0.6–1.0% in Xumai30 (semi-winter wheat) [35]. Meanwhile, cold stress at the reproductive stage led to different yields in different genotypes [36]. Compared with the control, low temperature led to zero harvest in diploid genotypes and a significant yield decrease in tetraploid genotypes, while hexaploid genotypes maintained a relatively high proportion of grain yield among the three species [36]. Additionally, the yield loss caused by SCS depends on the intensity of the low temperature and its duration [37]. Ji et al. (2017) exposed two wheat cultivars at the booting stage to temperatures of 2, −2, −4 and −6 °C for 2–6 d in a convective freezing chamber, which caused a 13.9–85.2% grain yield reduction in spring wheat and a 3.2–85.9% grain yield loss in semi-winter wheat [35]. When the temperature declined to −5 °C and −7 °C at the vegetative growth stage, the grain yield decreased by 10–100% [38]. In each case, SCS events during reproductive development significantly affected the growth and development of young spikes and florets, causing pollen infertility and poor grain setting [39], and thereby decreasing the number of grains.


**Table 1.** Effects of SCS treatment at different stages on grain number per spike in wheat.

**Figure 2.** The photos of spikes (**A**,**C**) and grains (**B**,**D**) in wheat under the control (**A**,**B**) and spring cold stress (**C**,**D**) −2 °C for 6 h. The photo visually shows the effect of cold stress on the size and color of spike and grain number.

#### *2.2. Grain Quality*

Grain quality is primarily based on appearance and nutritional quality [43]. It is well known that spring cold stress events are most often encountered during the reproductive period in wheat, which seriously affects the absorption and distribution of nutrients [39]. Grain quality relative to its appearance refers to external morphological characteristics, including grain length, width, and aspect ratio [44]. For example, wheat responds to SCS (−4 °C for 12 h) at the jointing stage by increasing the ratio between grain length and width (L/W) by 0.4–14.2% while decreasing the equivalent diameter by 0.9–11.0% and grain area by 1.6–20.2% [45]. Compared to the cold-tolerant genotype, the grain width and L/W of the sensitive wheat genotype were more susceptible to low temperatures [46]. It was also reported that grain width is more sensitive to low temperatures than grain length [46].

In addition to affecting morphological appearance, SCS adversely affects the nutritional quality of grain [47]. For wheat grain nutritional quality, protein content is of key significance [48]. It has been noted that SCS limits the production of nitrogen compounds and nonstructural carbohydrates, which decreases the transit of protein and total soluble sugar from stems into grains, resulting in a decline in wheat quality [36]. Under low temperatures at the booting stage, the mean accumulation of total protein decreased by 4.8–6.9%, albumin by 5.8–9.6%, globulin by 8.4–15.4%, gliadin by 13.2–18.4%, and glutenin by 17.8–29.1% [49]. In addition, reductions in the concentrations of amylose, amylopectin and total starch were also observed under different low-temperature levels [46].

According to a recent report, the total starch in wheat grains, as well as the rate of accumulation of straight-chain and branched-chain starch, were closely related to the activities of starch branching enzyme (SBE), soluble starch synthase (SSS), granule-bound starch synthase (GBSS) and adenosine diphosphate glucose pyrophosphorylase (AGPase) [50], while the activity of essential starch synthesis enzymes is particularly sensitive to SCS during grain development [51]. Low temperature during the reproductive stage decreased the activities of crucial starch synthesis enzymes (AGPase, SSS, GBSS, and SBE) in the grain, thereby reducing the accumulation of starch and decreasing grain quality [52].
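The appearance traits discussed earlier in this section (L/W ratio, equivalent diameter, and grain area) and the percentage changes reported under SCS follow from simple formulas. The sketch below shows one way to compute them; it assumes that equivalent diameter means the diameter of a circle with the same area as the grain's projected image, which may differ from the definition used in [45], and all names and input values are hypothetical.

```python
import math


def grain_shape(length_mm, width_mm, area_mm2):
    """Return L/W ratio and equivalent diameter for one grain.

    Equivalent diameter is taken here as the diameter of a circle whose
    area equals the projected grain area (an assumed definition).
    """
    if min(length_mm, width_mm, area_mm2) <= 0:
        raise ValueError("all measurements must be positive")
    return {
        "lw_ratio": length_mm / width_mm,
        "equivalent_diameter_mm": math.sqrt(4.0 * area_mm2 / math.pi),
    }


def percent_change(control, stressed):
    """Relative change of a trait under stress, in percent of the control."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (stressed - control) / control * 100.0


# Hypothetical measurements, not data from the cited studies
print(grain_shape(6.0, 3.0, math.pi))
print(percent_change(50.0, 45.0))  # -10.0
```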

#### **3. Physiological Mechanism of Controlling Wheat Resistance to Cold Stress**

#### *3.1. Protective Enzymes for Oxidation*

Cold stress often leads to excess accumulation of reactive oxygen species (ROS) such as superoxide radical (O2−) and hydrogen peroxide (H2O2), which causes oxidative damage to DNA, proteins, and lipids, leading to the inhibition of wheat seed development [53,54]. Hence, maintaining a balanced level of ROS production within cells promotes normal growth, development, and cellular metabolism (Figure 3) [55].

The activation of subcellular antioxidant mechanisms can provide some resistance to SCS in wheat while also decreasing oxidative burst in the photosynthesis machinery [56]. Antioxidant enzymes, such as peroxidase (POD), superoxide dismutase (SOD), and catalase (CAT), play an essential role in protecting plants from oxidative damage by scavenging ROS [57,58]. Several studies have reported that alterations in the activity of numerous antioxidant defense system enzymes help wheat plants to handle oxidative stress [59,60]. For example, cold stress (4 °C and −4 °C) increased the activity of SOD by 6.8–68.3%, POD by 16.6–69.4%, and CAT by 6.0–53.8% in wheat spikelets, compared to the optimum temperature (16 °C) [61]. Furthermore, antioxidant chemicals, including proline, glutathione (GSH) and ascorbic acid (AsA), also play critical roles in protecting plants from ROS damage caused by cold stress [62]. Under SCS, the accumulation of proline eliminates oxygen free radicals, balances the osmotic state of the cell, and maintains the normal state of the membrane [63]. For example, the application of exogenous proline improved the cold tolerance of wheat through increased accumulation of free proline and sucrose, achieved by coordinating carbon and nitrogen metabolism [64]. The AsA–GSH cycle, which includes ascorbate peroxidase (APX), monodehydroascorbate reductase (MDHAR), dehydroascorbate reductase (DHAR), and glutathione reductase (GR), is very effective in improving wheat cold tolerance, particularly against ROS stress [65]. For example, AsA could induce the up-regulation of diverse antioxidants (SOD, POD, and CAT), thus offsetting the adverse effects of cold stress at the early and reproductive stages of wheat [66].

Another key mechanism in plant cold stress responses is the regulation of transcription by endogenous hormones and ROS [67]. Once induced by cold stress, hormones change ROS levels through increased transcription or post-translational modification/activation of proteins, thereby transforming ROS signaling [68]. For instance, it has been demonstrated that the ROS generated by RBOHs mediates an interaction between ABA and BRs, enhancing cold tolerance in *Arabidopsis* [69]. According to a recent study, the application of exogenous BRs increased antioxidant capability, directly reducing the oxidative damage caused by ROS bursts [70].


**Figure 3.** Overview of wheat responses to spring cold stress, which induces several protective measures to regulate grain setting and quality. Firstly, cold stress triggers multiple channel activation leading to the increased ABA, Ca2+ and ROS concentrations in the cytosol. The main components in the core ABA signaling transduction pathway include ABA receptor *TaPYL5*, *TaPP2C*, *TaSnRKs*, and the Ca2+ signaling transduction pathway include *TaCDPKs*, *TaCML*, *TaCaM*, which have a positive regulation of cold stress. Secondly, component changes in the MAPK cascade pathway were influenced by the activation of the ABA, ROS and Ca2+ pathway. Thirdly, cold stress response-induced signal transduction leads to the activation of multiple transcription factors, thereby regulating the metabolic hormone, protein, sucrose and antioxidant pathway. These alterations mitigate cell membrane damage and regulate intracellular osmotic balance, preventing the loss of grain yield and quality.

#### *3.2. Carbohydrate Metabolism*

Carbohydrate metabolism plays an essential role in energy availability for plant development and also has a role in temperature acclimation [71]. In plants, several soluble sugars, such as sucrose, glucose, fructose, raffinose and trehalose, act as membrane protectants by interacting with the lipid bilayer. This interaction reduces membrane damage, as the sugars function as osmoprotectants and support adaptation to the cold environment [72,73].

The soluble sugars sucrose, glucose, trehalose, and fructose start accumulating in response to cold stress, enhancing cold tolerance during the reproductive stage of crops [74]. For instance, the buildup of soluble sugars under SCS can raise the amount of proline, which controls osmotic pressure, scavenges reactive oxygen species, and stabilizes biomolecule structure, reducing low-temperature damage [75,76]. Fructans, which are highly water soluble, act as osmoregulatory substances that prevent the formation of ice crystals in the cytoplasm and improve membrane stability, enhancing crop cold tolerance [77]. Recent research has confirmed a high correlation between fructan accumulation and cold tolerance, attributed to increased transcript levels of the Cor (cold-responsive)/Lea (late-embryogenesis-abundant), C-repeat-binding factor (CBF), and fructan biosynthesis-related genes in the wheat family [77]. Trehalose has been found to act as an osmoprotectant and to stabilize protein integrity in plants [78]. Importantly, exogenous trehalose prevented floret degeneration under low-temperature conditions and increased floret fertility in young spikelets, minimizing the loss in grain number per spike [43].

Recently, the Sugars Will Eventually be Exported Transporters (SWEETs) have been reported to regulate abiotic stress tolerance, sugar transport, plant growth and development [79]. The SWEETs also play vital roles in oxidative and osmotic stress tolerance [80]. In wheat, the genome-wide analysis revealed 105 SWEETs, and 59% exhibited significant expression changes under abiotic stresses [81]. Importantly, *AtSWEET16* and *AtSWEET17* are two bidirectional vesicular fructose transporters that maintain glycan homeostasis and promote the accumulation of fructose in vacuoles, which may be beneficial in stress tolerance responses [82,83]. A further understanding of sugar metabolism and transport will be key in reducing any sugar starvation in crop reproductive development and enhancing seed setting rate.

#### *3.3. Hormones and Ca2+ Signals*

Plants adapt to low-temperature environments through a sequence of cellular reactions triggered by signaling molecules (e.g., hormone signals and Ca2+ signals), which result in plant defense and adaptability to adverse conditions [84,85]. Plant hormones, such as abscisic acid (ABA) [86], jasmonic acid (JA) [87], and salicylic acid (SA) [88], have been reported to play a significant role in regulating grain quality. Past findings revealed that many plants show higher endogenous ABA levels in response to cold stress [89,90]. In wheat, the application of exogenous ABA is reported to enhance cold tolerance by increasing the activities of antioxidant enzymes and reducing H2O2 contents under cold stress [91]. In particular, ABA-dependent gene expression, which involves the ABA receptors, protein phosphatases type-2C (PP2Cs), Snf1-related kinase 2s (SnRK2s), and the AREB/ABF regulon, is controlled by the raised ABA levels and helps plants adapt to cold stress [92]. According to Zhang et al. (2018), the significant up-regulation of *SnRK2.11*, serine/threonine-protein kinase and serine/threonine-protein phosphatase PP1-like was considered a significant reason for improved cold tolerance in wheat during the reproductive stage [28]. These genes were believed to function in ABA signaling in guard cells.

Additionally, JA synthesis and signaling mediate low-temperature tolerance [93]. For instance, endogenous JA levels were found to increase in wheat [94], rice [95], and *Arabidopsis* [96], enhancing the frost resistance of crops. JA functions as an upstream signal of the ICE-CBF pathway, positively modulating freezing responses [97]. *JAZ1* and *JAZ4* are negative regulators of JA signaling that interact with ICE1 and ICE2 to repress their transcriptional activity [98]. Subsequently, they regulate the expression of CBF and other low-temperature responsive genes, thus affecting wheat cold resistance [97].

It is well known that SA plays a vital role in responding to abiotic stresses, in addition to regulating crop growth, ripening and development [98,99]. SA activates reactive oxygen species before low-temperature exposure, and after SCS it promotes higher antioxidant enzyme activity, higher transcript levels of *TaFeSOD*, *TaMnSOD*, and *TaCAT*, and higher free proline content [100]. Under freezing stress during the reproductive stage, salicylic acid-primed wheat up-regulated the expression of the WRKY gene (*WRKY19*), heat shock transcription factor (*HSF3*), mitochondrial alternative oxidase (AOX1a), and heat shock protein (HSP70), which contributed to increased antioxidant capacity and protection of the photosystem, together with lower malonaldehyde content and superoxide radical production compared with non-primed wheat [101]. Further research has demonstrated that SA treatment reduces ice nucleation and induces antifreeze proteins, which inhibit the formation of ice crystals in plant cells [88].

Ca<sup>2+</sup> is an essential second messenger in the plant response to cold stress [102]. Ca<sup>2+</sup> sensors such as calmodulins (CaMs), CaM-like proteins (CMLs), Ca<sup>2+</sup>-dependent protein kinases (CPKs/CDPKs), and calcineurin B-like proteins (CBLs) are the primary transmitters of the Ca<sup>2+</sup> signal that is induced by cold stress [103–105]. For example, *OsCPK27*, *OsCPK25*, and *OsCPK17* activated MAPK, ROS, and nitric oxide pathways in response to cold stress [85,106]. Recently, genome-wide identification and expression analysis identified 18 *TaCaM* and 230 *TaCML* gene members in the wheat genome, of which *TaCML17*, *21*, *30*, *50*, *59*, and *75* were related to cold stress responses in wheat [107].

### *3.4. Transcription Factors*

Wheat genomes contain a large number of transcription factors that play important roles in cold-stress biological processes, including CBF [108], basic leucine zipper (bZIP) [109], MYB [110], and NAC [111].

The ICE-CBF-COR signaling pathway is widely recognized as essential for cold adaptation [112]. The receptor protein detects cold stress and initiates signal transduction, activating and regulating the ICE gene, which up-regulates the transcription and expression of the CBF gene [113]. Five ICE genes, 37 CBF genes, and 11 COR genes were identified in the wheat genome database [114]. Wheat CBF genes have been demonstrated to improve cold tolerance in other plants, as shown with transgenic barley containing *TaCBF14* and *TaCBF15* genes [115]. A wide variety of transcription factors are also important, such as CBF1, CBF2, and CBF3 [116] and C-type repeats (CTR) [117], which play crucial roles in abiotic stress responses in wheat. Previous studies reported that the cold-regulated transcriptional activator CBF3 positively regulates cold stress responses in wheat [118]. RNA-seq data and qRT-PCR revealed that the ICE, CBF, and COR genes have varying expression patterns in different wheat organs, with ICE genes mainly up-regulated in the grain, CBF in the root and stem, and COR in the leaf and grain [114]. All these results show that the ICE-CBF-COR cascade plays a crucial role in the response of wheat to cold stress (Figure 3).

The bZIP genes are involved in important regulatory processes of plant growth and physiological metabolism, such as promoting anthocyanin accumulation [119] and other signals [120]. The bZIP genes also have a variety of biological functions under abiotic stress, and 187 bZIP genes have been predicted in wheat [121]. The majority of bZIPs linked to frost tolerance in plants are positive regulators [122]. For example, phenotypic analysis and related physiological indicators of cold resistance showed that overexpression of *TaABI5* could enhance cold resistance [109]. In recent years, 15 bZIP genes with variable expression were found in early wheat spikes, and most showed an increase in expression in response to SCS [123]. Furthermore, the bZIP genes are involved in ABA signaling and play a role in responding to freezing stress in the later growth stages of wheat [109]. Similarly, MYB and NAC are crucial in controlling plant growth and cold stress responses [110,124].

#### **4. Breeding Strategies to Develop SCS-Resistant Wheat**

Superior wheat genotypes are needed for SCS resistance, and these can be obtained by breeding cold-resistant cultivars that maintain yield stability and high quality [125]. Appropriate measures need to be taken to cope with the consequences of SCS in wheat during the reproductive stage and to improve crop yield and quality. Strategies to strengthen SCS resistance include selecting cold-tolerant cultivars, identifying QTLs/genes, and exploiting closely linked markers in wheat.

#### *4.1. QTLs Associated with Cold Resistance*

Genetic components such as QTLs have great potential to accelerate traditional breeding processes [126]. QTLs related to cold tolerance and the underlying molecular mechanisms have been thoroughly studied in wheat [127,128]. Loci for cold resistance have been reported on chromosomes 1B, 1D, 2B, 2D, 4D, 5A, 5D, and 7A, with 5A and 5D suspected to carry significant genes of interest [129,130]. Wheat chromosome 5A plays a key role in cold acclimation and frost tolerance [119]. Three key genes responsible for SCS tolerance, *Fr-1* (e.g., *Fr-A1*, *Fr-B1*) and *Fr-2* (e.g., *Fr-D1*), were located on chromosomes 5A, 5B, and 5D [131,132], with two loci being mapped within a distance of approximately 30 cM [118]. *Fr-1* maps close to the vernalization locus Vernalization-1 (*Vrn-1*), and the two were reported to be highly homologous [133]. *Vrn-1* acts as a positive regulator of vernalization and regulates the transition from vegetative to reproductive growth in wheat [134]. The *Fr-Am2* locus is made up of a group of eleven CBF genes that are activated during vernalization, which in turn activate the COR genes necessary for enhanced cold tolerance of wheat [135,136].

Genome-wide association studies (GWAS) of traits related to wheat resistance and tolerance are essential to understanding their genetic structure and improving breeding selection efficiency [137]. In a GWAS of 276 winter wheat genotypes, 23 QTL regions located on 11 chromosomes (1A, 1B, 2A, 2B, 2D, 3A, 3D, 4A, 5A, 5B and 7D) were detected for frost tolerance, and eight novel QTLs were discovered on chromosomes 1B, 2D, 3A, 3D, 4A and 7D [129]. In another GWAS, eighty SNP loci distributed across all 21 chromosomes were associated with SCS resistance, explaining 16.6–36.2% of the phenotypic variation; six of these were stable loci associated with more than two traits, and multiple superior alleles were obtained from the associated loci related to SCS traits [138]. Nevertheless, the majority of the QTL intervals for low-temperature tolerance reported by GWAS are still large and contain too many candidate genes, so the causal genes for cold tolerance remain challenging to find.

Of the different genome editing approaches, the CRISPR/Cas9 system has emerged as a successful tool for modulating genes essential for developing high stress resistance in crops [139]. Meanwhile, CRISPR/Cas9 allows the manipulation of the wheat genome for improved agronomic performance, resistance to biotic and abiotic stresses, higher yields, and better grain quality [140]. For example, Tian et al. (2013) [141] cloned *TaSnRK2.3*, determined its expression patterns under freezing stress in wheat, and characterized its function in Arabidopsis. Overexpression of *TaSnRK2.3* significantly enhanced tolerance to freezing stress, enhancing the expression of cold stress-responsive genes and ameliorating physiological indices [141]. Additionally, overexpressing *TaFBA-A10* was shown to increase FBA activity and to regulate key enzymes in the Calvin cycle and the glycolysis rate, enhancing cold tolerance of wheat [142]. Therefore, acceptance and utilization of new plant breeding technologies involving genome editing offer opportunities for sustainable agriculture and global food security.

#### *4.2. Cultivars for SCS Resistance Based on Agronomic Traits*

Wheat yield is associated with several agronomic traits that have been used to develop better cultivars, increasing the yield and quality of wheat [143]. Given the high heritability of the traits and the relevance of wheat yield, agronomic traits can be used as selection criteria in breeding and cultivar development (Table 2) [144]. Cold stress affects agronomic traits at every developmental stage, but the reproductive stages are relatively more sensitive [145]. Specifically, cold stress affects the development of young spikes and flowers, grain characteristics and quality [146,147]. Some researchers have classified SCS injury into five grades based on the degree of damage to the spikelet: grade 1 for no apparent frost damage, grade 2 for frost damage less than 1/3, grade 3 for frost damage between 1/3 and 1/2, grade 4 for frost damage greater than 1/2, and grade 5 for all young spikes that died from freezing [148]. Frost damage also impaired stem development, resulting in lower plant height and a decreased number of spikes [149]. For example, the dead stem rate was used to classify 120 wheat cultivars into five classes ranging from very strong to very weak, and to determine the criteria for evaluating wheat spring frost resistance [150].
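The five-grade spikelet damage scale described above can be written as a small lookup. The following is a minimal sketch, not the procedure of the cited study [148]: the function name `frost_damage_grade` is hypothetical, damage is assumed to be expressed as the damaged fraction of the young spike (0 to 1), and the handling of the exact boundaries 1/3 and 1/2 (both placed in grade 3 here) is an assumption, since the text does not state it.

```python
def frost_damage_grade(damaged_fraction: float) -> int:
    """Map the damaged fraction of a young spike (0 to 1) to a grade from 1 to 5.

    Boundary handling is an assumption: exactly 1/3 and exactly 1/2
    are both assigned to grade 3.
    """
    if not 0.0 <= damaged_fraction <= 1.0:
        raise ValueError("damaged_fraction must lie between 0 and 1")
    if damaged_fraction == 0.0:
        return 1  # no apparent frost damage
    if damaged_fraction == 1.0:
        return 5  # young spike completely killed by freezing
    if damaged_fraction < 1 / 3:
        return 2  # damage below one third
    if damaged_fraction <= 1 / 2:
        return 3  # damage between one third and one half
    return 4      # damage above one half


# Hypothetical field scores for five plants
scores = [0.0, 0.2, 0.4, 0.75, 1.0]
grades = [frost_damage_grade(s) for s in scores]
print(grades)
```

A per-plot summary (for example, the share of plants at grade 4 or 5) could then be compared across cultivars, although the choice of summary statistic is also left open by the text.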



**Table 2.** Tolerant and sensitive wheat genotypes and their performances in response to spring cold stress.

Moreover, biomass accumulation is a significant source of grain yield and a growth process sensitive to cold stress [150]. SCS has adverse effects on several wheat metrics, including the mean leaf area index (MLAI), mean net assimilation rate (MNAR), harvest index (HI), biomass per plant (BPPM), and grain yield per plant (GYPP) [35]. These metrics can be utilized in wheat breeding programs to assist in developing cold-tolerant varieties.
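As an illustration of how two of these metrics relate, the harvest index is conventionally the ratio of grain yield to above-ground biomass. The sketch below assumes that BPPM denotes above-ground biomass per plant in the same units as GYPP; the function names and all numeric values are hypothetical and are not taken from the cited study [35].

```python
def harvest_index(grain_yield_per_plant: float, biomass_per_plant: float) -> float:
    """Harvest index = grain yield / above-ground biomass (same units)."""
    if biomass_per_plant <= 0:
        raise ValueError("biomass_per_plant must be positive")
    if not 0 <= grain_yield_per_plant <= biomass_per_plant:
        raise ValueError("grain yield must lie between 0 and the biomass")
    return grain_yield_per_plant / biomass_per_plant


def relative_change(control: float, stressed: float) -> float:
    """Fractional change of a metric under stress relative to the control."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (stressed - control) / control


# Hypothetical per-plant values in grams (illustrative only)
hi_control = harvest_index(2.0, 5.0)   # control plants
hi_stressed = harvest_index(1.2, 4.0)  # plants exposed to SCS
print(hi_control, hi_stressed, relative_change(hi_control, hi_stressed))
```

Expressing each metric as a relative change against the unstressed control puts traits with different units on a common scale before cultivars are compared.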

#### *4.3. Cultivars for SCS Resistance Based on Molecular Traits*

It is critical for breeding to understand the physiological features linked to genetic improvements in yield and quality [153]. When SCS harms wheat, a variety of complicated physiological and biochemical changes take place inside the plant that have an impact on yield and quality. Reactive oxygen [154], MDA content, antioxidant enzyme activity [56], carbohydrates [155], osmoregulatory substances [87], hormone content [91], starch content [156], and photosynthesis [157] are often used as physiological and biochemical indicators for identifying SCS tolerance in wheat (Table 2). According to Zhang et al. (2019), with grain number as the reference, POD activity, SOD activity, and MDA level can be used as indices of wheat cold resistance [158]. To determine the extent of freezing damage, Wang et al. (2022) used principal component-membership function-stepwise regression analysis to screen seven important physiological indicators: chlorophyll a, leaf water content, proline, Fv/Fm, soluble protein, MDA, and SOD. The resulting equation relating the predicted integrated freezing-damage index to yield had a coefficient of determination of 0.898 [159]. Following an abrupt temperature drop, cold-tolerant wheat cultivars showed increased expression of genes encoding antioxidant enzymes, improved antioxidant enzyme activity, and decreased ROS content, whereas cold-sensitive cultivars showed higher ROS content and some leaf death [160].
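To make the idea of an integrated index built from several physiological indicators more concrete, the sketch below uses the min-max form of the membership (affiliate) function that is common in stress-tolerance evaluation. It is a simplified illustration under stated assumptions, not the procedure of Wang et al. (2022) [159]: the function is applied to raw indicators rather than principal component scores, the weights are equal rather than derived from principal component contribution rates, the direction assigned to each indicator is an assumption, and all names and values are hypothetical.

```python
def membership_values(values, higher_is_better=True):
    """Min-max membership function: rescale one indicator to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("indicator does not vary across cultivars")
    scaled = [(v - lo) / (hi - lo) for v in values]
    # Invert indicators for which a larger value is assumed to signal more damage
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def composite_index(indicators, higher_is_better, weights):
    """Weighted sum of membership values, one score per cultivar."""
    lengths = {len(v) for v in indicators.values()}
    if len(lengths) != 1:
        raise ValueError("all indicators need one value per cultivar")
    total_weight = sum(weights[name] for name in indicators)
    scores = [0.0] * lengths.pop()
    for name, values in indicators.items():
        share = weights[name] / total_weight
        for i, u in enumerate(membership_values(values, higher_is_better[name])):
            scores[i] += share * u
    return scores


# Hypothetical measurements for three cultivars (illustrative only)
indicators = {"proline": [10.0, 20.0, 30.0], "mda": [8.0, 6.0, 4.0]}
direction = {"proline": True, "mda": False}  # assumed directions
weights = {"proline": 1.0, "mda": 1.0}       # assumed equal weights
index = composite_index(indicators, direction, weights)
print(index)
```

In a fuller analysis the composite score would then be regressed on the individual indicators (the stepwise regression step) to find a small subset that predicts it well.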

To enhance wheat tolerance to SCS and improve sustainability, many researchers focus on understanding the key molecular targets, regulatory pathways and signaling involved in genotype–environment interactions [161,162]. As an important research tool for functional genes, transcriptome sequencing has been employed in regulatory network investigations of plants under abiotic stress [163]. In wheat, 450 genes were found to have altered transcript abundance following 14 low-temperature treatments, including 130 candidate genes for transcription factors, protein kinases, ubiquitin ligases, and GTP-, RNA-, and Ca<sup>2+</sup>-binding proteins [164]. Transcriptome sequencing of cold stress during reproductive stages in wheat identified 562 up-regulated and 314 down-regulated differentially expressed genes, which were mainly involved in photosynthesis, lipid and carbohydrate synthesis, and amino acid and protein accumulation [165]. According to transcriptomics and metabolomics analysis, the ABA/JA phytohormone signaling and proline biosynthesis pathways play an important role in regulating cold tolerance in wheat [94]. Transcription is only part of the response; many researchers also employ proteomics for in-depth analysis of protein changes, offering global analysis of protein accumulation [166]. Proteomic analysis has been carried out in wheat under SCS [167], with various proteins being identified as having a role in cold tolerance and providing protection against cold damage [168]. For instance, the proteomic analysis of wheat under low temperatures revealed an up-regulation of proteins involved in signal transduction, carbohydrate metabolism, stress and defense responses, and phenylpropanoid biosynthesis [169].

#### **5. Conclusions and Future Perspectives**

SCS events occur more often under changing climatic conditions, posing a serious threat to wheat reproductive tissues and grain production. SCS is detrimental to the development of the floret and spikelet in wheat, thus compromising grain number and quality. A premium cultivar tolerating SCS is a prerequisite for sustaining wheat farming. The review shows that the protection of young, tender spikelet tissues in wheat from cold stress depends mainly on the collective contribution of antioxidant enzyme activity, carbohydrate accumulation, hormone signaling and transcriptional regulation. Efforts have been made to breed cultivars with simple agronomic and morpho-physiological traits to cope with cold stress; these efforts should be improved by identifying novel SCS-tolerant QTLs or genes with regard to floret and spikelet development in new breeding strategies that embrace fundamental mechanisms. Further studies on multi-omics, from genomics to phenomics, to identify the genes regulating cold tolerance will be necessary for future breeding programs.

**Author Contributions:** Y.S. conceived this review; H.S., Y.Z., C.T., X.C. and X.L. collected information and drafted this review; H.S., A.J., Y.L., Y.Z. and Y.S. finalized the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work was supported by grants from the National Key Research and Development Plan Program of China (No. 2017YFD0301307), Graduate Innovation Fund of Anhui Agricultural University (2021yjs-2), and the National Natural Science Foundation of China (No. 31901540).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** All of the data generated or analyzed during this study are included in this published article.

**Acknowledgments:** We thank Muhammad A. Hassan for providing valuable suggestions for the article.

**Conflicts of Interest:** We declare no conflict of interest.

#### **References**

