*Article* **Fine Mapping and Candidate Gene Analysis of Rice Grain Length QTL** *qGL9.1*

**Luomiao Yang † , Peng Li † , Jingguo Wang, Hualong Liu, Hongliang Zheng, Wei Xin and Detang Zou \***

> Key Laboratory of Germplasm Enhancement, Physiology and Ecology of Food Crops in Cold Region, Ministry of Education, Northeast Agricultural University, Harbin 150030, China

**\*** Correspondence: wrathion@neau.edu.cn

† These authors contributed equally to this work.

**Abstract:** Grain length (GL) is one of the crucial determinants of rice yield and quality. However, there is still a shortage of knowledge on the major genes controlling the inheritance of GL in *japonica* rice, which severely limits the improvement of *japonica* rice yields. Here, we systematically measured the GL of 667 F<sup>2</sup> and 1570 BC3F<sup>3</sup> individuals derived from two cultivated rice cultivars, Pin20 and Songjing15, in order to identify the major genomic regions associated with GL. A novel major QTL associated with GL, *qGL9.1*, was mapped on chromosome 9 using whole-genome re-sequencing with bulked segregant analysis. Local QTL linkage analysis with F<sup>2</sup> and fine mapping with recombinant plants revealed a 93-kb core region on *qGL9.1* containing 15 protein-coding genes. Only the expression level of *LOC\_Os09g26970* was significantly different between the two parents at different stages of grain development. Moreover, haplotype analysis revealed that the alleles of Pin20 contribute to the optimal GL (9.36 mm) and GL/W (3.31), suggesting that Pin20 is a cultivated species carrying the optimal GL variation of *LOC\_Os09g26970*. Furthermore, a functional-type mutation (16398989-bp, G>A) located on an exon of *LOC\_Os09g26970* could be used as a molecular marker to distinguish between long and short grains. Our experiments identified *LOC\_Os09g26970* as a novel gene associated with GL in *japonica* rice. This result is expected to further the exploration of the genetic mechanism of rice GL and improve GL in *japonica* rice varieties by marker-assisted selection.

**Keywords:** *Oryza sativa* L.; grain length; re-sequencing; fine mapping; P450 protein

### **1. Introduction**

Grain size, a complex quantitative trait involving grain length (GL), grain width (GW), grain thickness, and the grain length/width ratio (GL/W), is one of the determinants of grain weight, which affects not only the yield but also the appearance quality of rice [1,2]. Because grain shape is an important factor affecting rice yield and quality, mining grain-shape-related genes is an important means of understanding its molecular mechanism and genetic basis. As of 2023, at least 201 rice grain shape genes have been identified. They are located on all chromosomes of the rice genome, and most are distributed on chromosomes 1, 2, 5, 6, and 7 (https://pubmed.ncbi.nlm.nih.gov/) (accessed on 7 May 2023). Most of these 201 genes directly regulate the rice grain shape, and the remaining genes indirectly regulate the grain size through an interaction network between genes.

Previous studies have found that the major factors affecting grain size include the ubiquitination-protease pathway, G-protein signaling, mitogen-activated protein kinase signaling, phytohormone regulation, and various transcriptional regulators [3]. GRAIN WIDTH 2 (*GW2*), the first QTL cloned in rice, encodes a RING-type E3 ubiquitin ligase with ubiquitination and autoubiquitination activity located in the cytoplasm and nucleus [4]; G-proteins that regulate grain size in rice include the α-subunit encoded by *RGA1/D1* [5] and the β-subunit encoded by *DEP1* [6]. In addition, hormone-synthesis-related genes, such as *OsTAR1* [7] and *TGW6* [8], are also involved in the regulation of seed size. Moreover, several other transcriptional regulatory modules, such as *OsmiR396-OsGRFs* [9] and *AP2/ERF* modules [10], also play a key role in rice grain shape determination. In sum, rice grain shape is regulated by multiple factors. Even so, the current molecular network of rice grain shape is still insufficient to explain all of the genetic variations. It is of great importance to explore new major genes and allelic variations for the high-yield breeding of rice.

**Citation:** Yang, L.; Li, P.; Wang, J.; Liu, H.; Zheng, H.; Xin, W.; Zou, D. Fine Mapping and Candidate Gene Analysis of Rice Grain Length QTL *qGL9.1*. *Int. J. Mol. Sci.* **2023**, *24*, 11447. https://doi.org/10.3390/ijms241411447

Academic Editor: Zsófia Bánfalvi

Received: 20 June 2023 Revised: 7 July 2023 Accepted: 13 July 2023 Published: 14 July 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

It is well known that the grains of *indica* rice are longer than those of *japonica* rice. Consequently, the favorable allelic variations of several important GL genes cloned to date come from *indica* rice. For example, the loss-of-function variations of *GS3* [11] and *TGW6* [8] are mostly from *indica* rice. *GS5* is a large-grain allelic variation that was retained during the domestication of *indica* rice [12]. The overexpression of *LG3* [13] and *GLW7* [14] in *indica* rice can increase the grain length. These allelic variations can be used to improve the grain shape of *japonica* rice either by breeding offspring through *indica–japonica* hybridization or by directly obtaining mutants through gene editing. However, from the perspective of the geographical adaptability of the subspecies, it is difficult to obtain materials with an excellent *japonica* background from the offspring of *indica–japonica* hybridization; such materials are therefore difficult to produce and apply without extensive molecular breeding work. The materials obtained by gene editing also cannot be used as cultivated varieties because of a certain degree of growth defects [15]. Taken together, it is sensible and efficient to clone new grain shape genes from *japonica* rice and apply them to *japonica* rice breeding.

Heilongjiang Province, the main production area of early maturing *japonica* rice in China, had a planting area of about 6.43 million hectares in 2022, accounting for more than 15.5% of the national rice area. However, the grain length of *japonica* rice in this region is low compared to that of *indica* rice in southern China, which hinders yield improvement. Therefore, the identification of new alleles controlling grain length from local sources is important for increasing rice yield. Recent efforts combining QTL-seq and linkage analysis have led to the localization of several candidate genes in rice [16–19]. In this study, we used two *japonica* rice varieties with significant differences in grain shape, Pin20 and Songjing15 (SJ15), as parents to develop an F2:3 population and a BC3F<sup>3</sup> population. Through QTL-seq, linkage analysis, and fine mapping strategies, we identified a long-grain gene in the large-grain variety Pin20. Furthermore, KASP molecular markers were developed to identify plants with different grain types. This will facilitate the in-depth study of grain type improvement and regulatory mechanisms in *japonica* rice varieties.

### **2. Results**

#### *2.1. Screening and Evaluation of Grain Shape*

Phenotypic evaluation showed significant differences in grain length (GL) and length–width ratio (GL/W) between SJ15 and Pin20 (Table 1, Figure 1A,B). For the F2:3 population, the variation ranges of GL, grain width (GW), and GL/W were 0.68–1.09 cm, 0.33–0.43 cm, and 1.88–3.09, respectively (Table 1, Figure 1C–E). The absolute values of the skewness and kurtosis of GL and GL/W were less than 1, whereas the kurtosis of GW exceeded 1 (Table 1). GL and GL/W therefore showed continuous variation and an approximately normal distribution, indicating that these two traits conform to the genetic model of quantitative traits.
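The screening rule used above (absolute skewness and kurtosis below 1) can be sketched as follows. This is a minimal illustration, assuming population (biased) moments and excess kurtosis; the function names and the demonstration values are hypothetical and are not taken from the study data.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments.

    Assumes at least two distinct values, otherwise the variance is zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def looks_quantitative(values, limit=1.0):
    """Apply the |skewness| < 1 and |kurtosis| < 1 rule of thumb."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) < limit and abs(kurt) < limit


# Hypothetical grain lengths in cm, for illustration only.
demo_gl_cm = [0.70, 0.78, 0.82, 0.85, 0.85, 0.88, 0.91, 0.95, 1.02]
print(skewness_and_excess_kurtosis(demo_gl_cm), looks_quantitative(demo_gl_cm))
```

Statistical packages differ in whether they report bias-corrected estimates, so values computed this way may deviate slightly from those in Table 1.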

**Table 1.** Phenotypic analysis of grain-shape-related traits of parents and populations.

| Traits | SJ15 | Pin20 | F2:3 Mean ± SD | F2:3 Range | F2:3 Skewness | F2:3 Kurtosis |
|---|---|---|---|---|---|---|
| GL (cm) | 0.69 | 1.06 \*\* | 0.85 ± 0.00 | 0.68~1.09 | 0.61 | 0.95 |
| GW (cm) | 0.36 | 0.40 | 0.37 ± 0.00 | 0.33~0.43 | 0.27 | 1.46 |
| GL/W | 1.92 | 2.65 \*\* | 2.55 ± 0.02 | 1.88~3.09 | 0.33 | −0.05 |

\*\* indicates the significant difference detected at *p* < 0.01 level.

**Figure 1.** Phenotypic characteristics of grain shape of the parents and F2:3 population. (**A**) Performance of the mature spikes of the two parents; (**B**) Comparison of the mature grain types of the two parents; and (**C**–**E**) Frequency distribution of grain length (GL), grain width (GW), and length–width ratio (GL/W) of rice grains in the F2:3 population. The scale bars for (**A**,**B**) were 1 cm.

#### *2.2. Phenotypic Analysis of Extreme DNA Pools of Grain Type*

The differences in individual traits had an impact on the genotype frequency analysis of the DNA hybrid pool on the whole genome. To clarify the differences in the genetic background between the two DNA pools, 30 long-grain and 30 short-grain lines were analyzed for GL, GW, GL/W, spike number (PN), number of grains per spike (NGS), and spike weight per plant (SW). The results showed that there were highly significant differences in GL and GL/W between the long-grain and short-grain pools, while there were no significant differences in GW, PN, NGS, and SW (Figure 2). Therefore, the phenotypic differences between these two DNA mixing pools are distributed only in the GL and GL/W; thus, we selected 30 representative long-grain individuals and 30 short-grain individuals to prepare the GL-pool and GS-pool in order to map the candidate genomic loci using bulked segregant analysis (BSA) and re-sequencing analyses, respectively.

**Figure 2.** Box plot of panicle traits of individuals with significant differences in grain length. GL-pool, long-grain DNA pool; GS-pool, short-grain DNA pool; GL, grain length; GW, grain width; GL/W, length–width ratio; PN, spike number; NGS, number of grains per spike; and SW, spike weight per plant. \*\* indicates the significant difference detected at *p* < 0.01 level. Each black dot on the box plot represents the phenotypic value corresponding to a single independent plant.

#### *2.3. Identification of a Major QTL Controlling GL in Rice Using QTL-Seq*

A total of 315,277,920 clean reads and 47,291,688,000 bases were obtained by re-sequencing and data quality control of the two DNA pools and both parents (Supplementary Table S1). After aligning the reads to the Nipponbare reference genome, the G-statistic, the Euclidean distance (ED), and a two-tailed Fisher's exact test were calculated for the two bulks. At a statistical confidence level of *p* < 0.01 between the two extreme phenotypic bulks, a 4.21 Mb (14,240,001 bp–18,445,701 bp) genomic region on chromosome 9 was identified by overlapping the results of the three algorithms (Table 2, Figure 3). We designated this QTL as *qGL9.1*.
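To make two of the per-site statistics concrete, the sketch below computes the Euclidean distance and the G-statistic for a single biallelic SNP from read counts in the two pools. It is only an illustration under stated assumptions: the `(ref, alt)` count layout, the function names, and the example counts are hypothetical, and the smoothing and thresholding steps of a full QTL-seq pipeline are omitted.

```python
import math


def euclidean_distance(long_pool, short_pool):
    """ED between the allele-frequency vectors of two pools.

    Each pool is a tuple of read counts, here (ref, alt); both pools must
    have non-zero depth.
    """
    depth_long, depth_short = sum(long_pool), sum(short_pool)
    return math.sqrt(sum((a / depth_long - b / depth_short) ** 2
                         for a, b in zip(long_pool, short_pool)))


def g_statistic(long_pool, short_pool):
    """G = 2 * sum(obs * ln(obs / exp)) over the pools-by-alleles table."""
    table = [list(long_pool), list(short_pool)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero count contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / grand_total
                g += observed * math.log(observed / expected)
    return 2.0 * g


# Hypothetical read counts (ref, alt) at one SNP, for illustration only.
print(euclidean_distance((18, 2), (3, 17)), g_statistic((18, 2), (3, 17)))
```

Both statistics are zero when the pools share the same allele frequencies and grow as the frequencies diverge, which is why a linked region appears as a peak along the chromosome.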

**Table 2.** *qGL9.1* association results based on the G-statistic value, ED algorithm, and Fisher algorithm.

| QTL | Algorithm | Start (bp) | End (bp) | Size (Mb) | Gene Number | Threshold |
|---|---|---|---|---|---|---|
| *qGL9.1* | G-statistic | 14,185,459 | 18,445,701 | 4.26 | 623 | 0.01 |
| *qGL9.1* | ED | 14,240,001 | 18,760,000 | 4.52 | 669 | 0.01 |

**Figure 3.** The results of the QTL-Seq analysis. (**A**) The G-statistic value to map *qGL9.1* based on SNP. (**B**) The Euclidean distance algorithm to map *qGL9.1*. (**C**) The two-tailed Fisher's exact test to map *qGL9.1* based on Indel. The blue lines and the red line represent the threshold line, with confidence levels of 0.95 and 0.99, respectively. The number on the horizontal coordinate represents the chromosome number. The red wireframe represents the intervals covered by *qGL9.1* in different computational modes.

#### *2.4. Narrowing of qGL9.1 to a Fine Region*

To further map *qGL9.1*, eight KASP (kompetitive allele-specific PCR) markers were developed for linkage analysis based on the sequence information provided by the re-sequencing data from Pin20, SJ15, and the two pools. A significant peak interval was detected in a 448.7-kb region between SNP5 and SNP6 on chromosome 9 at a LOD threshold of 3.0 (Figure 4). *qGL9.1* explained 20.09% of the phenotypic variation for GL (Table 3). The positive-effect allele of *qGL9.1* was derived from Pin20.
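The LOD score and the phenotypic variation explained (PVE) reported for a linkage peak can be illustrated with a single-marker regression. The sketch below is a simplified stand-in and not the mapping software used in the study: it assumes genotypes coded as additive dosage (-1 for the SJ15 homozygote, 0 for the heterozygote, 1 for the Pin20 homozygote), and all names and values are hypothetical.

```python
import math


def single_marker_scan(genotypes, phenotypes):
    """Regress phenotype on marker dosage; return (LOD, PVE in %, additive effect).

    LOD = (n / 2) * log10(RSS_null / RSS_marker); PVE = 1 - RSS_marker / RSS_null.
    Assumes the marker is polymorphic and the fit is not perfect.
    """
    n = len(phenotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx  # additive effect per allele dose
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)  # mean-only model
    rss_marker = rss_null - slope * sxy  # residual after fitting the marker
    lod = (n / 2) * math.log10(rss_null / rss_marker)
    pve = 100.0 * (1.0 - rss_marker / rss_null)
    return lod, pve, slope


# Hypothetical dosages and grain lengths (cm), for illustration only.
demo_genotypes = [-1, -1, 0, 0, 1, 1, -1, 1]
demo_gl_cm = [0.72, 0.80, 0.84, 0.90, 0.95, 0.88, 0.86, 1.01]
lod, pve, add = single_marker_scan(demo_genotypes, demo_gl_cm)
print(f"LOD={lod:.2f} PVE={pve:.1f}% Add={add:.3f}")
```

A marker would be declared significant when its LOD exceeds the chosen threshold, which is 3.0 in the analysis above.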

**Figure 4.** Linkage analysis of grain length.

**Table 3.** QTL association results by linkage analysis.

| Trait | Chr. | Position (cM) | LOD | PVE (%) | Add | Left CI | Right CI |
|---|---|---|---|---|---|---|---|
| GL | 9 | 32.00 | 7.23 | 20.09 | 0.03 | 27.5 | 33.5 |

Note: GL, grain length; Chr., chromosome; cM, centimorgan; LOD, logarithm of the odds; PVE, phenotypic variation explained; Add, additive effect; Left CI, confidence interval on the left side of the linkage map; and Right CI, confidence interval on the right side of the linkage map.

To fine-map *qGL9.1*, we constructed a BC3F<sup>2</sup> population and genotyped it using the linkage markers of *qGL9.1* and six KASP markers consistent with the genetic background of SJ15 (Figure 5). Two recombinants were selected, and a BC3F<sup>3</sup> population containing 1570 lines was obtained after selfing. For the fine mapping of *qGL9.1*, three KASP markers between SNP5 and SNP6 were developed from the re-sequencing data (Supplementary Table S2, Figure 6A). A total of 21 recombinants were identified by scanning the genotypes of the 1570 BC3F<sup>3</sup> individuals, and these 21 recombinants were classified into seven groups (Figure 6B). After progeny tests, the grain lengths of recombinant groups one and two were biased toward the short-grain parent SJ15, and the remaining recombinant groups were close to the long-grain parent Pin20. *qGL9.1* was thus delimited to the 93.0-kb interval between the SNP10 and SNP11 markers (16.29–16.39 Mb). According to the MSU Rice Genome Annotation Project Release 7 [20], there are 15 protein-coding genes in the *qGL9.1* locus (Figure 6C), and information on the SNPs/InDels in the *qGL9.1* region (upstream, UTR3, downstream, and exonic) is listed in Supplementary Table S3. We found that 10 of the 15 genes had sequence differences in the promoter, exon, or downstream regions.
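The logic of delimiting a QTL with recombinants can be sketched as an interval-elimination procedure: an interval between two adjacent markers is ruled out whenever a recombinant group carries, at both flanking markers, the parental genotype that contradicts its grain phenotype. The code below is a simplified illustration that assumes homozygous recombinants with a single crossover; the marker names `M1` to `M5`, the genotype codes (`P` for Pin20, `S` for SJ15), and the example groups are all hypothetical.

```python
def delimit_interval(markers, groups):
    """Return the (left, right) markers flanking the intervals compatible with all groups.

    markers: marker names ordered along the chromosome.
    groups: (genotypes, phenotype_parent) pairs, where genotypes holds one
            'P' or 'S' per marker and phenotype_parent is the parent whose
            grain length the group resembles.
    Returns None if no interval is compatible with every group.
    """
    retained = set(range(len(markers) - 1))  # interval i lies between marker i and i+1
    for genotypes, phenotype_parent in groups:
        consistent = [g == phenotype_parent for g in genotypes]
        # Keep interval i only if at least one flanking marker matches the phenotype.
        retained &= {i for i in range(len(markers) - 1)
                     if consistent[i] or consistent[i + 1]}
    if not retained:
        return None
    # If the retained intervals are not contiguous, this reports their outer bounds.
    return markers[min(retained)], markers[max(retained) + 1]


# Hypothetical recombinant groups, for illustration only.
demo_markers = ["M1", "M2", "M3", "M4", "M5"]
demo_groups = [("PPSSS", "S"), ("SSSPP", "P"), ("PPPSS", "P")]
print(delimit_interval(demo_markers, demo_groups))  # ('M3', 'M4')
```

Each additional informative recombinant can only shrink the retained set, which is why screening a large population for crossovers near the target locus narrows the candidate region.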

**Figure 5.** Screening of heterozygous lines in the BC3F<sup>2</sup> population. Mb, million base pair; B, the genotype is consistent with SJ15; and H, the genotype is heterozygous.

**Figure 6.** Fine mapping of *qGL9.1*. (**A**) The genotype of the recombinant plant. (**B**) Grain length statistics of recombinant plants. (**C**) A total of 15 genes in the *qGL9.1* region were obtained through the annotation information on the *Nipponbare* genome. Letters a, b and c indicate significant differences between groups, significant difference detected at *p* < 0.05 level.

#### *2.5. Candidate Gene Analysis*

Through the qRT-PCR analysis of the 10 genes with sequence variation (Figure 7), significant differences in the relative expression of *LOC\_Os09g26970* between Pin20 and SJ15 were found in samples from the 2 cm, 5 cm, and 7 cm panicles, while no significant differences were found in the 13 cm panicles. The results showed that the relative expression of *LOC\_Os09g26970* in Pin20 was higher than that in SJ15 at the early stage of panicle development. The expression levels of the other nine genes were not significantly different at the grain development stage. We further analyzed the structural domains of the 15 genes through the Pfam database, annotated them using the Ensembl database, and found that *LOC\_Os09g26970* encodes a cytochrome P450 family protein, CYP92A8 (Supplementary Table S4).

In addition, using the results of gene annotation based on the re-sequencing data, pathway enrichment analysis showed that 616 genes in the 4.2 Mb interval were significantly enriched in arginine and proline metabolism (ko00330), nitrogen metabolism (ko00910), cysteine and methionine metabolism (ko00270), pentose phosphate pathway (ko00030), and glycolysis/gluconeogenesis (ko00010) (Figure 8). Among them, five genes (*LOC\_Os09g26940*, *LOC\_Os09g26950*, *LOC\_Os09g26960*, *LOC\_Os09g26970*, and *LOC\_Os09g26980*) in the candidate interval were significantly enriched in brassinosteroid biosynthesis (ko00905) (Supplementary Table S5). Genes encoding cytochrome P450 family proteins have been shown to play an important role in regulating rice grain shape; in particular, *D11* [21] and *GW10* [22] play an active role in controlling grain size through the brassinosteroid (BR) pathway. Therefore, because *LOC\_Os09g26970* encodes a P450 family protein and is significantly enriched in the BR pathway, we consider it a candidate gene for *qGL9.1*.

**Figure 7.** Expression of candidate genes during the development of the young panicles of both parents. The 2 cm, 5 cm, 7 cm, and 13 cm on the horizontal coordinates indicate the length of the developing young spike. The results were statistically analyzed using Student's *t*-test (\*, *p* < 0.05; \*\*, *p* < 0.01).

**Figure 8.** KO enrichment bubble plots for genes (Supplementary Table S5) in the *qGL9.1* interval. Horizontal coordinate: enrichment factor (number of differences in this pathway divided by all numbers); vertical coordinate: pathway name; bubble area size: number of genes belonging to this pathway in the target gene set; bubble color: enrichment significance. The redder the color, the smaller the P/Q value.
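The two quantities plotted in an enrichment bubble chart can be sketched as follows. The enrichment factor follows the caption's definition (target genes in a pathway divided by all genes annotated to that pathway); the significance test is assumed here to be a one-sided hypergeometric test, which is a common choice but is not confirmed by the text. All function names and counts are hypothetical.

```python
from math import comb


def enrichment_factor(target_in_pathway, background_in_pathway):
    """Share of a pathway's annotated genes that fall in the target gene set."""
    return target_in_pathway / background_in_pathway


def hypergeom_pvalue(target_in_pathway, target_size, background_in_pathway, background_size):
    """P(X >= k) when drawing target_size genes from the background without replacement."""
    k, n, K, N = target_in_pathway, target_size, background_in_pathway, background_size
    upper = min(n, K)  # cannot observe more hits than draws or annotated genes
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts, for illustration only: 5 of 600 target genes fall in a
# pathway that has 20 annotated genes among 30000 background genes.
print(enrichment_factor(5, 20), hypergeom_pvalue(5, 600, 20, 30000))
```

In practice the P values of all tested pathways would then be corrected for multiple testing to give the Q values referred to in the caption.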

#### *2.6. The Significant Association of LOC\_Os09g26970 with the SNP*

Sanger sequencing analysis identified 13 nSNPs on *LOC\_Os09g26970* (Supplementary Table S6). The 13 SNPs were previously identified in the 3010 Rice Genome Project and the Rice Functional and Genomic Breeding (RFGB) v2.0 database [23,24]. Among them, 10 SNPs (Chr9-16397163, Chr9-16397736, Chr9-16397760, Chr9-16397792, Chr9-16398197, Chr9-16398200, Chr9-16398479, Chr9-16398989, Chr9-16399274, and Chr9-16399673) constituted nine haplotypes. Hap1, Hap5, Hap6, Hap8, and Hap9 were mainly distributed in *indica* rice, whereas Hap2, Hap3, and Hap4 were mainly distributed in *japonica* rice (Supplementary Table S7). The germplasm of Hap9, consistent with the Pin20 genotype, and that of Hap2, consistent with the SJ15 genotype, differed significantly in GL and GW, and the other haplotypes caused significant phenotypic differences in GL, GW, and GL/W (Supplementary Table S8). It is worth noting that the haplotype Hap9 contributes to the optimal GL (9.36 mm) and GL/W (3.31), suggesting that Pin20 is a cultivated species carrying the optimal grain length variation of *LOC\_Os09g26970*.

In order to obtain a molecular marker that could distinguish the grain length phenotype, we designed a KASP marker for an nSNP of *LOC\_Os09g26970*. SNP10 accurately divided the genotypes of 92 individuals in the 94 BC3F<sup>3</sup> lines into Pin20 and SJ15 genotypes (Figure 9). These clustering results clearly distinguished the two alleles; therefore, the KASP8 marker was used to genotype the rice plants. Of the plants with the Pin20 allele, SNP10 identified 89.8% of the plants showing long-grain phenotypes. In contrast, SNP10 was able to identify 86.0% of the short-grain phenotype plants carrying the SJ15 genotype (Supplementary Table S9). This result implies that SNP10 can effectively distinguish the grain length of rice and can be used as an important molecular marker for breeding improvement.
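The marker accuracy figures quoted above are concordance rates between the allele call and the observed grain phenotype. A minimal sketch of that calculation is given below; the record layout, the function name, and the example calls are hypothetical and do not reproduce the data in Supplementary Table S9.

```python
def concordance_by_allele(records):
    """Percentage of plants per allele whose phenotype matches the expected class.

    records: iterable of (allele, phenotype) pairs, where allele is 'Pin20' or
             'SJ15' and phenotype is 'long' or 'short'.
    """
    expected = {"Pin20": "long", "SJ15": "short"}
    counts = {allele: [0, 0] for allele in expected}  # [matches, total]
    for allele, phenotype in records:
        counts[allele][1] += 1
        if phenotype == expected[allele]:
            counts[allele][0] += 1
    return {allele: 100.0 * hits / total
            for allele, (hits, total) in counts.items() if total}


# Hypothetical genotype calls and phenotypes, for illustration only.
demo_records = ([("Pin20", "long")] * 9 + [("Pin20", "short")]
                + [("SJ15", "short")] * 4 + [("SJ15", "long")])
print(concordance_by_allele(demo_records))  # {'Pin20': 90.0, 'SJ15': 80.0}
```

Reporting the rate separately for each allele, as the text does, shows whether the marker misclassifies long-grain and short-grain plants at different frequencies.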

**Figure 9.** A total of 94 BC2F<sup>4</sup> lines were genotyped with SNP10. (**A**) A demonstration of the genotyping effect of SNP10. Yellow fluorescence and red fluorescence indicate individuals with genotypes of Pin20 and SJ15, respectively. Black squares represent spotting holes without samples. The calls for the *LOC\_Os09g26970-Pin20* allele were clustered near the *Y*-axis, while the calls for the *LOC\_Os09g26970-SJ15* allele were clustered on the *X*-axis. (**B**) HEX, Hexachlorofluorescein; and FAM, 6-Carboxyfluorescein.

#### **3. Discussion**

*3.1. QTL-Seq Analysis Combined with a Screening of Recombinant Plants Can Efficiently Fine-Map Candidate Genes*

Grain length is a significant factor that limits rice yield. Improving and utilizing the large-effect genomic loci associated with GL is essential to increase rice yield. Previous studies have carried out extensive QTL analyses and localized a group of genes that are associated with GL in rice. For example, *PGL1* [25] and *BG1* [26] positively regulate GL by increasing cell size, whereas *SG1* [27], *SDF5* [28], *OsGDI1* [29], and *TGW6* [8] negatively regulate GL by reducing cell size. However, isolating genes by map-based cloning is time-consuming and labor-intensive. In recent years, with the development and application of high-throughput sequencing and bioinformatics analysis, the efficiency of QTL mining has significantly improved. The combination of traditional QTL mapping and QTL-seq can effectively and quickly identify major QTL intervals for GL. For example, the GL locus *qTGW5.3* was mapped to a 5 Mb physical interval by QTL-seq, and recombinants and progeny tests then delimited the candidate region of *qTGW5.3* to 1.13 Mb [30]. Due to the lack of further mapping populations and recombinant plants, the candidate genes of *qTGW5.3* have not been identified. In this study, *qGL9.1* was isolated from Pin20 by using the QTL-seq strategy based on the ED, Fisher algorithm, and G value methods, and *qGL9.1* was associated with a single strong peak in all three calculation models (Figure 3). This shows a significant difference in the allele ratio between the two mixed pools. To fine-map the *qGL9.1* candidate gene, several approaches were used to narrow down the genomic region associated with *qGL9.1*. Firstly, *qGL9.1* was fine-mapped to a 93 kb interval containing 15 annotated genes by using the recombinant plants to optimize the target interval (Figure 3). *LOC\_Os09g26970* was further anchored as the most reliable candidate for *qGL9.1* by expression analysis and functional annotation of the candidate genes. Therefore, *qGL9.1* can be considered the most significant target for GL in exploring candidate genes. Our study is a good example of using QTL-seq combined with fine mapping to mine candidate genes within major QTL intervals.
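The three statistics named above (ED, Fisher, and G value) each measure how strongly the allele ratio at a single SNP differs between the two bulks. The sketch below uses common textbook forms of these statistics; the exact formulas, smoothing, and thresholds used in this study are not given in this section and may differ. The function names and read counts are hypothetical.

```python
import math


def euclidean_distance(counts_a, counts_b):
    """ED between the allele-frequency vectors of two bulks at one SNP."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    return math.sqrt(sum((a / total_a - b / total_b) ** 2
                         for a, b in zip(counts_a, counts_b)))


def g_statistic(table):
    """G = 2 * sum(obs * ln(obs / exp)) over a 2x2 table of allele read counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += obs * math.log(obs / expected)
    return 2.0 * g


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table (hypergeometric sum)."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical read counts (allele 1, allele 2) at one SNP in each bulk.
long_bulk, short_bulk = (45, 5), (6, 44)
table = [list(long_bulk), list(short_bulk)]
print("ED:", euclidean_distance(long_bulk, short_bulk))
print("G:", g_statistic(table))
print("Fisher p:", fisher_exact_two_sided(table))
```

A SNP near a causal locus would be expected to score highly on all three measures, which is consistent with a single peak appearing in each of the three models.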
