1. Introduction
Rice is the most widely consumed food commodity in the world and is a staple for more than 50% of the world’s population [1]. Rice is also the third most produced agricultural commodity after sugarcane and maize [2]. It is a highly valuable grain crop for human nutrition and caloric intake, providing more than one-fifth of the calories consumed by humans worldwide [3]. The rapid growth of the human population is further boosting its demand. In some countries, rice is the only staple food, whereas in others it is eaten as a traditional dish and used as an important ingredient in various dishes. In the international market, rice grain quality is a strong indicator of price and marketability. Therefore, yield and grain quality are equally important parameters for varietal improvement in rice breeding programs.
The commercial success of a modern rice cultivar depends heavily on grain size related traits (e.g., grain length, grain width, and grain thickness) for quality, and on grain weight associated traits (most importantly, the thousand grain weight) for grain yield [4]. Grain size related traits determine the final market value of rice as defined by consumer preferences, which are a combination of grain size, length, and thickness (Figure 1). Some consumers prefer long grains, while others prefer short, bold grains. Likewise, grain yield is determined by three major components: grain weight, panicles per plant, and grains per panicle. Among these, the most closely associated trait is grain weight, which is measured as the 1000 grain weight. Therefore, grain length, width, and thickness, along with the 1000 grain weight, are central benchmarks for breeding grain appearance and improving yield in rice. However, because these traits are quantitatively inherited, breeders cannot rely on phenotypes alone to improve them [5]. The use of genetic markers is therefore considered superior to phenotyping [6], because such markers are not affected by the environment and are more efficient and reliable than phenotypic data.
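The yield-component relationship above can be written as a simple product. The following is a minimal sketch, assuming yield per plant is approximated as panicles per plant × grains per panicle × (TGW / 1000), since TGW is the mass in grams of 1000 grains. The optional filled-grain fraction, the function name `grain_yield_per_plant`, and the example numbers are illustrative assumptions, not part of this study.

```python
def grain_yield_per_plant(panicles_per_plant: float,
                          grains_per_panicle: float,
                          thousand_grain_weight_g: float,
                          filled_grain_fraction: float = 1.0) -> float:
    """Approximate grain yield per plant in grams from its yield components.

    Hypothetical helper (not from the study): TGW is the mass in grams of
    1000 grains, so the mass of a single grain is TGW / 1000.
    """
    if not 0.0 <= filled_grain_fraction <= 1.0:
        raise ValueError("filled_grain_fraction must be between 0 and 1")
    grains_per_plant = panicles_per_plant * grains_per_panicle * filled_grain_fraction
    return grains_per_plant * thousand_grain_weight_g / 1000.0


# Illustrative plant (made-up numbers): 10 panicles, 120 grains per panicle.
print(grain_yield_per_plant(10, 120, 25.0))  # 30.0 g
print(grain_yield_per_plant(10, 120, 28.0))  # 33.6 g: a heavier TGW alone raises yield
```

With panicle number and grain number held fixed, yield in this approximation scales linearly with TGW, which is why TGW serves as the main weight-related benchmark.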
A number of quantitative trait loci (QTLs) for grain appearance and weight have been reported [7,8,9,10,11,12,13,14,15,16,17,18,19,20]. Grain length, thickness, and width are generally regarded as the determinants of grain appearance, whereas the 1000 grain weight determines grain weight and, eventually, grain yield. These traits are controlled by several to many genes and are strongly influenced by environmental factors. Several major QTLs influencing grain appearance and grain weight have already been characterized, including qPE9-1 [17], GW2 [12], SLG7 [21,22], GW5 [23,24,25,26], GS3 [27,28,29], GS7 [30], GW8 [31], GS5 [32], and GS2 [33,34,35,36].
GS2 (GRAIN SIZE 2), also reported as GL2 (GRAIN LENGTH 2) or PT2 (PANICLE TRAIT 2), is a rare allele that directly controls two important grain size related traits in rice: grain width and grain length. GS2 encodes a transcriptional regulator, Growth-Regulating Factor 4 (OsGRF4), which is targeted by the microRNA OsmiR396, a negative regulator of OsGRF4 function. Several studies have shown that a 2 bp substitution in GS2 disrupts the binding of OsmiR396 to OsGRF4, resulting in its overexpression, which in turn enhances cell enlargement and cell division in grains and produces longer and wider rice grains [33,34,35,36].
GS3 (GRAIN SIZE 3) was among the first reported genes to have minor effects on grain thickness and width. Published studies demonstrated that GS3 is a negative regulator of grain size. Its encoded putative transmembrane protein contains a plant-specific organ size regulation (OSR) domain that acts as a negative regulatory motif, whose function is inhibited by its tumor necrosis factor receptor/nerve growth factor receptor (TNFR/NGFR) family cysteine-rich domain and its von Willebrand factor type C (VWFC) domain. All four domains have been reported to regulate cell division in the upper epidermis of the glume of the rice seed, causing minor effects on cell size [29].
GS5 (GRAIN SIZE 5) has been described by many researchers as a regulator of grain filling and weight. GS5 promotes cell division in the rice seed and, to some extent, elongation of the cells in the lemma and palea [32]. The protein encoded by GS5, a putative serine carboxypeptidase, acts as a positive regulator of a subset of G1-to-S transition genes of the cell cycle, increasing cell division and thereby enhancing grain filling and grain weight. Likewise, the GW2 (GRAIN WIDTH 2) gene encodes a RING-type protein with E3 ubiquitin ligase activity, which functions in the ubiquitin–proteasome degradation pathway. GW2 negatively regulates cell division by targeting its substrate(s) to proteasomes for regulated proteolysis. Loss of GW2 function through mutation enhances milk filling in grains and enlarges endosperm cells, resulting in a wider spikelet hull [12].
GW5 (GRAIN WIDTH 5), also reported as SW5 (SEED WIDTH 5) and GSE5, has been investigated by many researchers [23,24,25,26], who found that GW5 is negatively associated with rice grain width and weight. It was later revealed that GW5 encodes a calmodulin-binding protein and physically interacts with calmodulin OsCaM1-1, which is responsible for grain width in rice. Deletion or mutation of GW5 results in wider grains, indicating its negative effect on grain width. Likewise, SLG7 (GRAIN LENGTH 7), also known as GW7 (GRAIN WIDTH 7), has been found to encode a TONNEAU1-recruiting motif protein that increases cell division in the longitudinal direction and reduces cell division in the transverse direction [37]. By altering cell division, this gene influences grain appearance and also has significant effects on grain weight.
GW8 (GRAIN WIDTH 8) has been reported as a positive regulator of cell proliferation and is positively associated with seed width and seed weight [31]. It encodes SQUAMOSA promoter-binding protein-like 16 (OsSPL16), which regulates the expression of several genes involved in the G1-to-S transition, similar to the regulatory role of the GS5 gene [31,32]. Higher expression of GW8 promoted cell division and grain filling, resulting in increased grain width and higher grain yield.
GS7 (GRAIN SHAPE 7), a robust QTL known to regulate grain shape, has been reported to control grain length, roundness (thickness), and area (size) in rice [30]. Likewise, qPE9-1, also known as DEP1 (DENSE AND ERECT PANICLE 1), encodes a G protein γ subunit involved in regulating panicle erectness, grains per panicle, nitrogen uptake, and stress tolerance through a G protein signaling pathway [17,38]. In another study, the protein was also found to regulate plant architecture, grain size, and grain yield in rice. The qPE9-1 protein contains an N-terminal G gamma-like (GGL) domain, a putative transmembrane domain, and a C-terminal cysteine-rich domain [39]. Overexpression of qPE9-1 has been found to increase grain size and yield in rice.
In the past few years, PCR based InDel markers have gained popularity in diversity studies because of their reproducibility, ease of use, and co-dominant inheritance [40]. InDel markers have been used extensively as powerful phylogenetic markers for mapping and other genetic studies in different crops [41,42,43,44,45,46,47,48]. Here, InDel markers based on deletion/insertion polymorphisms (DIPs) were successfully deployed to study marker–trait associations and genetic variation. InDels are becoming increasingly popular because their genotyping requires low start-up costs and because they are efficient, relatively simple, and applicable to a wide range of species for which expressed sequence tag (EST) collections are available. Therefore, the main goals of this study were to assess the efficacy of InDels to (1) estimate the population structure, allelic frequencies, and genetic variation in a diverse germplasm comprising 204 rice genotypes; (2) classify the germplasm based on the distribution of the InDel marker loci; (3) assess the allele-based contributions of the target genes and their associations with individual traits; and (4) use InDel markers to understand the genetics of these traits for efficient breeding [49,50].
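Co-dominance means that both alleles of an InDel marker appear as PCR bands of different sizes, so heterozygotes can be distinguished from both homozygotes. The following is a minimal scoring sketch, assuming a biallelic marker with known amplicon sizes. The sizes, the ±2 bp sizing tolerance, and the function `call_indel_genotype` are hypothetical and are not the primers or scoring protocol of this study.

```python
from typing import Iterable, Optional


def call_indel_genotype(band_sizes_bp: Iterable[int],
                        allele_a_bp: int,
                        allele_b_bp: int,
                        tolerance_bp: int = 2) -> Optional[str]:
    """Score one co-dominant InDel marker from the PCR band sizes seen on a gel.

    Returns "A" or "B" for a homozygote, "H" when both allele bands are present
    (heterozygote), or None when neither expected band is seen (failed PCR).
    """
    if abs(allele_a_bp - allele_b_bp) <= 2 * tolerance_bp:
        raise ValueError("allele sizes are too close to resolve at this tolerance")
    sizes = list(band_sizes_bp)
    has_a = any(abs(size - allele_a_bp) <= tolerance_bp for size in sizes)
    has_b = any(abs(size - allele_b_bp) <= tolerance_bp for size in sizes)
    if has_a and has_b:
        return "H"
    if has_a:
        return "A"
    if has_b:
        return "B"
    return None


# Hypothetical marker: a 20 bp deletion gives a 180 bp (A) vs a 160 bp (B) amplicon.
for bands in ([181], [159], [180, 160], []):
    print(bands, call_indel_genotype(bands, allele_a_bp=180, allele_b_bp=160))
```

A dominant (presence/absence) marker would score the `[181]` and `[180, 160]` lanes identically. The size difference between the two amplicons is what allows InDel markers to separate all three genotype classes.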
3. Discussion
Before this study, the size and weight related traits of rice grains (grain length, grain width, grain thickness, and thousand grain weight) had not been explored simultaneously in a large and diverse germplasm collection using InDel markers. Next-generation sequencing (NGS) tools have yielded advanced, cheaper, and more efficient methods for developing such markers. This study demonstrates the effective use of deletion/insertion variations (DIVs) that occur naturally in the rice genome. InDel polymorphisms are the second most abundant form of genetic variation (after SNPs) in animals and plants. Moreover, previous studies only showed the contributions of the reported genes to these grain traits separately [11,15,16,19,20,59,60]. In contrast, the present study shows the modes of InDel-based allelic contributions to grain size and weight related traits, the allelic combinations of all the InDel marker loci involved in the final texture of rice grains, and the potential of each locus to regulate these traits in rice. This research can assist in selecting or deselecting genes in rapid breeding strategies. The correlation coefficients suggest that the thousand grain weight (TGW) was positively influenced by each of the studied grain size traits, but to different degrees. TGW can be improved through all three studied grain traits, most importantly GT, which contributes 49%, followed by GW (37.4%) and GL (24.9%). These findings agree with previous studies [19,54,61,62] showing that grain weight is significantly correlated with grain size. The present study further showed that grain thickness was most strongly correlated with grain weight, followed by width, whereas grain length was the least associated, suggesting that selecting for grain thickness is the most effective route to heavier grains (Table 2, Figure 4).
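Trait correlations like these are typically Pearson coefficients. The following is a minimal sketch of computing r between a grain size trait and TGW, together with r², one common way to express the share of TGW variance that a single trait explains on its own. Whether the percentages above were derived this way is not stated here, and the measurements below are made-up illustrations, not data from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two traits measured on the same genotypes."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a trait does not vary")
    return sxy / math.sqrt(sxx * syy)


# Made-up measurements for six hypothetical genotypes (not data from this study).
tgw = [21.0, 22.5, 23.9, 25.1, 26.8, 24.0]  # thousand grain weight, g
traits = {
    "GT": [1.90, 2.00, 2.10, 2.20, 2.30, 2.15],  # grain thickness, mm
    "GW": [2.60, 2.90, 2.70, 3.00, 3.10, 2.80],  # grain width, mm
    "GL": [8.9, 7.8, 8.4, 8.0, 8.6, 9.1],        # grain length, mm
}
for name, values in traits.items():
    r = pearson_r(values, tgw)
    print(f"{name}: r = {r:.3f}, r^2 = {r * r:.3f}")
```

Because GL, GW, and GT are themselves correlated, single-trait r² values are not additive and can sum to more than 100%. Partitioning TGW among the three traits would require a joint model, such as multiple regression or path analysis.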
In the present study, 204 rice genotypes were used to investigate alleles of nine different genes that regulate grain size and weight in rice, using InDel markers. PCR based InDel markers have gained popularity in variation studies because of their reproducibility, ease of use, and co-dominant inheritance [40]. Dendrograms were used to separate the germplasm according to grain size and weight and to divide the InDel marker data into distinct clusters that could be used for breeding preferred grain appearance and weight. The genetic diversity (D) and InDel-based polymorphism information content (PIC) values (Table 6) indicate that four InDel markers (GW8-InDel, GS5-InDel1A, GW8-InDel2B, and GS2-InDel1A) are highly informative (D ≥ 0.5; PIC ≥ 0.4) for the studied traits, whereas the remaining markers are moderately (D 0.3–0.5; PIC 0.2–0.4) to slightly (D ≤ 0.3; PIC ≤ 0.2) informative. These InDel markers can therefore be used efficiently to study genetic variation (DIVs) in rice germplasm. Only two InDel markers showed very low D and PIC values (D ≤ 0.3; PIC ≤ 0.2), indicating that the deployed markers were, overall, reasonably informative [20]. Furthermore, the InDel markers distinguished the diverse rice germplasm into distinct groups (Figure 6) with different combinations of grain length, width, thickness, and weight, which could be used to breed desirable genotypes with higher market value and heavier grains for better yield.
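Both gene diversity and PIC are computed from allele frequencies at each marker. The following is a minimal sketch, assuming one allele call per accession (as for largely homozygous inbred rice lines), Nei's gene diversity D = 1 − Σ pᵢ², and the PIC formula of Botstein et al. (1980), PIC = 1 − Σ pᵢ² − Σᵢ<ⱼ 2pᵢ²pⱼ². The study's exact formulas and software are not stated here, and the allele counts in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Optional


def allele_frequencies(calls: Iterable[Optional[str]]) -> Dict[str, float]:
    """Allele frequencies at one marker from per-accession allele calls (None = missing)."""
    counts = Counter(call for call in calls if call is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scored alleles at this marker")
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(freqs: Iterable[float]) -> float:
    """Nei's gene diversity (expected heterozygosity): D = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs: Iterable[float]) -> float:
    """Polymorphism information content after Botstein et al. (1980)."""
    p = list(freqs)
    return gene_diversity(p) - sum(2 * (a * a) * (b * b) for a, b in combinations(p, 2))


# Hypothetical marker scored on 204 accessions: 102 carry allele A, 102 carry B.
freqs = allele_frequencies(["A"] * 102 + ["B"] * 102)
print(gene_diversity(freqs.values()), pic_botstein(freqs.values()))  # 0.5 0.375
```

For a biallelic marker, this PIC formula peaks at 0.375 when both alleles are equally frequent. PIC values above that level can therefore only come from markers with three or more alleles, or from a different PIC definition. Some studies report PIC as 1 − Σ pᵢ², which is numerically equal to D.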
This study identified 25 InDel marker-derived loci significantly associated (p ≤ 0.05) with grain size and weight in rice (Table 4). A total of 38 alleles were identified, of which 27 were major alleles found in more than 20 genotypes. For GL, five markers (GW8-InDel, GW8-InDel1A, GW8-InDel2B, GS2-InDel1A, and GS2-InDel2B) corresponding to two genes (GW8 and GS2) showed a highly significant association at p ≤ 0.01. Similarly, two markers (GS3-InDel and GS5-InDel1A) corresponding to two genes (GS3 and GS5) showed a significant association with GL at p ≤ 0.05. For GT, the same five markers (GW8-InDel, GW8-InDel1A, GW8-InDel2B, GS2-InDel1A, and GS2-InDel2B), corresponding to GW8 and GS2, showed a significant association at p ≤ 0.01. Similarly, two markers (GS3-InDel and GS7-InDel), corresponding to GS3 and GS7, showed a significant association with GT at p ≤ 0.05.
For GW, five markers (GW8-InDel, GW8-InDel1A, GW8-InDel2B, GS2-InDel1A, and GS2-InDel2B), corresponding to genes GW8 and GS2, showed a significant association with GW at p ≤ 0.01. Similarly, one marker (GW5-InDel), corresponding to gene GW5, showed a significant association with GW at p ≤ 0.1. For TGW, one marker (GW8-InDel2B), corresponding to gene GW8, showed a significant association with TGW at p ≤ 0.01. Similarly, three markers (GW5-InDel, GS5-InDel1A, and GS5-InDel2B), corresponding to two genes (GW5 and GS5), showed a significant association with TGW at p ≤ 0.05.
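Each marker–trait association above tests whether genotypes carrying different alleles at a marker differ in mean trait value. The specific test and software behind the reported p-values are not restated here; single-marker ANOVA and general or mixed linear models are common choices. The sketch below substitutes a two-sided permutation test on the difference in allele-group means. This test makes no normality assumption, but like a plain single-marker test it does not correct for population structure or multiple testing. The function name, permutation count, and trait values are illustrative assumptions.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_p_value(allele_a_values: Sequence[float],
                        allele_b_values: Sequence[float],
                        n_permutations: int = 5000,
                        seed: int = 1) -> float:
    """Two-sided permutation p-value for a difference in trait means between
    genotypes carrying allele A and genotypes carrying allele B at one marker."""
    if len(allele_a_values) == 0 or len(allele_b_values) == 0:
        raise ValueError("both allele groups need at least one genotype")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(allele_a_values) - mean(allele_b_values))
    pooled = list(allele_a_values) + list(allele_b_values)
    n_a = len(allele_a_values)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign allele labels to trait values
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            as_extreme += 1
    # Counting the observed labelling itself keeps the p-value above zero.
    return (as_extreme + 1) / (n_permutations + 1)


# Made-up grain lengths (mm) for accessions carrying each allele (not study data).
gl_allele_a = [7.4, 7.5, 7.5, 7.6, 7.6, 7.7]
gl_allele_b = [8.3, 8.4, 8.4, 8.5, 8.5, 8.6]
print(permutation_p_value(gl_allele_a, gl_allele_b))
```

With hundreds of accessions and many markers, a structure-aware model and a multiple-testing correction would normally replace this per-marker test. The permutation version mainly illustrates what an allele-group association p-value measures.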
The SLG7 gene is known to regulate grain size in rice by increasing longitudinal cell division, resulting in longer grains [21]. In the present study, the SLG7-InDel marker showed a significant (p ≤ 0.05) association with thousand grain weight (TGW), which is in agreement with the results of other researchers. This gene encodes a TONNEAU1-recruiting motif protein that was found [37] to influence grain appearance by altering cell division, thereby also having significant effects on grain weight. Notably, GW5 showed a significant contribution to GW and TGW (p ≤ 0.05), consistent with previous studies [23,24,25,26] showing that this gene encodes a calmodulin-binding protein and that GW5 physically interacts with calmodulin OsCaM1-1, which is responsible for grain width in rice. These results also indicate that this gene regulates TGW (a significant association at p ≤ 0.05), as TGW is directly and strongly associated with GW (Table 4), supporting its use for yield improvement in rice breeding programs. Previous studies also identified this gene as controlling seed width and weight in rice [23,24,63].
Previous studies showed that GS3 was among the first reported genes to have minor effects on grain thickness and width. The domains of its encoded protein have been reported to regulate cell division in the upper epidermis of the glume of the rice seed, causing minor effects on cell size [29]. In the present study, the GS3-InDel marker was significantly associated (p ≤ 0.05) with grain length and grain thickness, which is consistent with previous reports [20,28,31]. The InDel marker for the GS7 gene (GS7-InDel) was also significantly (p ≤ 0.05) related to grain length and thickness. The investigated alleles of both GS3 and GS7 effectively regulated grain length and thickness in rice (Table 3). Previous studies [20,30] also reported that germplasm carrying different alleles of GS3 in different allele combinations with GS7 produced different grain lengths and thicknesses. Shao et al. [30] reported that GS7 is a strong QTL that regulates grain size and controls grain length, roundness (thickness), and area (size) in rice. Ngangkham et al. [20] also found this gene to be associated with GL and GT, thereby playing a significant role in regulating grain size. For the GW2 gene, no significant association with any trait was detected, although previous studies found that this gene controls grain width in rice [12]. Ngangkham et al. [20] also found no association of this gene with any of these traits using STS (sequence-tagged site) markers, suggesting that the markers used may be ineffective for this gene. This might be due to inter- and/or intra-allelic interactions, which warrant further study.
Among all the studied genes for grain size and weight, GW8 showed a highly significant (p ≤ 0.01) association with all grain size related traits, suggesting its great importance in regulating grain size in rice. Genotypes bearing GW8 were reported to have a greater grain length and grain length-to-width ratio [20]. Scanning electron microscopy of the lemma in near-isogenic lines (NILs) carrying GW8.1 showed that the inner epidermal cells were longer than in lines without this gene, indicating that GW8.1 might regulate cell elongation [64]. GW8 (OsSPL16) encodes a protein that is positively associated with cell proliferation [31]. Its higher expression promoted cell division and grain filling, consequently increasing grain width and yield in rice. Another study also suggested that GW8 suppresses the expression of the GW7 gene and plays a significant role in controlling grain size [21]. In the present study, GW8 was found to regulate grain length, width, and thickness. All three InDel markers separated the germplasm into two allelic groups: the A-allele, associated with shorter, thicker, and wider grains, and the B-allele, carried by genotypes with longer but narrower grains. This is consistent with the negative correlation of grain length (GL) with grain width (GW) and grain thickness (GT) shown by the correlation analysis in Table 2. Based on these results, germplasm carrying the B-allele may be selected to breed longer grains, and germplasm carrying the A-allele may be screened for broader, shorter, and thicker grains. However, all three traits (GL, GT, and GW) contributed to the thousand grain weight, as indicated by their positive correlations with it, suggesting that both alleles can contribute to increased yield. Two markers (GW8-InDel and GW8-InDel1A) showed highly significant (p ≤ 0.0001) associations with GL, GT, and GW, indicating the ample potential of InDel markers in variation studies and genome-wide association studies. The third marker for GW8 (GW8-InDel2B) also showed a highly significant (p ≤ 0.01) association with GL and GW (Table 5). However, unlike the other two InDel markers for GW8, GW8-InDel2B also showed a highly significant (p ≤ 0.0001) relationship with GT and the thousand grain weight (Table 4), indicating its potential for all four studied traits to improve grain size and grain weight in rice.
Two InDel based markers were used for the GS5 gene, and both showed a significant (p ≤ 0.05) association with only the thousand grain weight. These findings are partially inconsistent with other studies [32,65], which suggested that GS5 is associated with grain width and grain weight in rice. Previously, Lee et al. [63] obtained three types of alleles using markers developed from the promoter region of GS5, demonstrating the relationship of this gene with grain weight. However, in another study, this gene was reported to participate in the regulation of grain length and grain width [63]. This might be due to genetic and/or allelic interactions with other genes/alleles, which should be studied more comprehensively. This gene has been reported to be important in regulating grain yield, as concluded by Li et al. [32], who showed that its encoded protein, a putative serine carboxypeptidase, acts as a positive regulator of a subset of G1-to-S transition genes of the cell cycle, thereby increasing cell division and enhancing grain filling and grain weight.
The present study further found the previously reported gene GS2 to be highly associated (p ≤ 0.001) with all three grain size related traits. Of the three markers for this gene, two (GS2-InDel1A and GS2-InDel2B) showed potential for GL, GT, and GW, whereas GS2-InDel showed no association with any trait (Table 4). For both GS2-InDel1A and GS2-InDel2B, the germplasm was separated into two groups carrying the A- and B-alleles with different grain size traits. For GS2-InDel1A, the A-allele was associated (p ≤ 0.001) with shorter (7.66 ± 0.820 mm), thicker, and wider grains, whereas germplasm carrying its B-allele had longer (8.44 ± 0.908 mm), narrower, and more slender grains (Table 5). Conversely, for GS2-InDel2B, the A-allele was associated with the longest (9.32 ± 0.951 mm) and narrowest (GW 2.65 ± 0.303 mm) grains in the whole germplasm, whereas genotypes carrying the B-allele had shorter (7.81 ± 0.742 mm) and wider (3.01 ± 0.201 mm) grains (Table 4). This finding suggests that these InDel markers can be investigated further to breed for grain lengths >9 mm. Previous studies also showed that this gene directly controls two important grain size related traits in rice, grain width and grain length, and that its overexpression increased cell enlargement and enhanced cell division in the grain, producing longer and wider rice grains [33,34,35,36].
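Allele-group summaries such as those reported above (mean ± SD of a trait among accessions carrying each allele) can be produced by grouping accessions on their marker call. The following is a minimal sketch, assuming one allele call per accession and the sample standard deviation. The accession values are hypothetical, and the function `summarize_by_allele` is illustrative.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Optional, Sequence, Tuple


def summarize_by_allele(allele_calls: Sequence[Optional[str]],
                        trait_values: Sequence[float]) -> Dict[str, Tuple[int, float, float]]:
    """Return {allele: (n, mean, sample SD)} for one trait at one marker."""
    if len(allele_calls) != len(trait_values):
        raise ValueError("each accession needs one allele call and one trait value")
    groups: Dict[str, List[float]] = defaultdict(list)
    for allele, value in zip(allele_calls, trait_values):
        if allele is not None:  # skip accessions with a failed or missing call
            groups[allele].append(value)
    summary: Dict[str, Tuple[int, float, float]] = {}
    for allele in sorted(groups):
        values = groups[allele]
        sd = stdev(values) if len(values) > 1 else float("nan")  # SD needs n >= 2
        summary[allele] = (len(values), mean(values), sd)
    return summary


# Hypothetical grain lengths (mm) for eight accessions scored at one InDel marker.
calls = ["A", "A", "B", "B", None, "A", "B", "B"]
grain_length = [9.1, 9.5, 7.8, 7.6, 8.2, 9.4, 7.9, 8.0]
for allele, (n, m, sd) in summarize_by_allele(calls, grain_length).items():
    print(f"{allele}-allele: n = {n}, GL = {m:.2f} ± {sd:.3f} mm")
```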
This study further investigated favorable alleles in the studied germplasm for improving grain length (>8 mm) together with grain weight. Identifying beneficial alleles for target traits is one of the most important prerequisites for improving modern cultivars by introgressing favorable alleles from a vast gene pool through marker assisted selection. This study identified seven favorable alleles for grain length that can be used to improve grain size, in line with the current demand for longer grains with improved size and weight.
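Screening for favorable alleles against a grain length target can be expressed as a simple filter over allele-group means. The following is a minimal sketch, assuming an allele counts as favorable for GL when its group mean exceeds the 8 mm target. This single-trait rule and the function `favorable_alleles` are illustrative simplifications, since the study also considered grain weight. The example uses only the GS2 allele means reported in this section.

```python
from typing import Dict, List, Tuple


def favorable_alleles(allele_means_mm: Dict[str, Dict[str, float]],
                      threshold_mm: float) -> List[Tuple[str, str, float]]:
    """List (marker, allele, mean GL) for allele groups whose mean exceeds the
    threshold, longest grains first."""
    hits = [
        (marker, allele, mean_gl)
        for marker, per_allele in allele_means_mm.items()
        for allele, mean_gl in per_allele.items()
        if mean_gl > threshold_mm
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Mean grain lengths (mm) per allele group as reported for the GS2 markers.
gs2_means = {
    "GS2-InDel1A": {"A": 7.66, "B": 8.44},
    "GS2-InDel2B": {"A": 9.32, "B": 7.81},
}
print(favorable_alleles(gs2_means, threshold_mm=8.0))
# [('GS2-InDel2B', 'A', 9.32), ('GS2-InDel1A', 'B', 8.44)]
```

Extending the rule to grain weight would mean requiring, for example, that the same allele group also has a TGW mean above a second threshold. That criterion is an assumption and not one stated in the study.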
This research explored the associations of 7 genes and 11 InDel markers with grain size and weight related traits in rice. The present study further showed that InDel markers can be used efficiently in studies of genetic variation, genome-wide association, germplasm genetic characterization, and gene mapping. Such studies can make breeding easier and more efficient and help develop more desirable varieties that cope with climate change and food security risks.