**InDel Marker Based Estimation of Multi-Gene Allele Contribution and Genetic Variations for Grain Size and Weight in Rice (*Oryza sativa* L.)**

**Sadia Gull 1, Zulqarnain Haider 2, Houwen Gu 1, Rana Ahsan Raza Khan 2, Jun Miao 1, Tan Wenchen 1, Saleem Uddin 3, Irshad Ahmad 4 and Guohua Liang 1,\***


Received: 27 July 2019; Accepted: 24 September 2019; Published: 28 September 2019

**Abstract:** The market success of any rice cultivar depends heavily on its grain appearance and grain yield, which define its demand among consumers as well as growers. The present study was undertaken to explore the contribution of nine major genes, *qPE9-1*, *GW2*, *SLG7*, *GW5*, *GS3*, *GS7*, *GW8*, *GS5*, and *GS2*, in regulating four size and weight related traits, i.e., grain length (GL), grain width (GW), grain thickness (GT), and thousand grain weight (TGW), in 204 diverse rice germplasms using Insertion/Deletion (InDel) markers. The studied germplasm displayed wide-ranging variability in the four studied traits. Except for three genes, the remaining six genes showed considerable association with these traits, with varying strengths. The whole germplasm of 204 genotypes could be categorized into three major clusters with different grain sizes and weights that could be utilized in rice breeding programs where grain appearance and weight are under consideration. The study revealed that TGW was 24.9% influenced by GL, 37.4% influenced by GW, and 49.1% influenced by GT. This suggests a trait selection order of GT > GW > GL for improving TGW in rice yield enhancement programs. The InDel markers successfully identified a total of 38 alleles, out of which 27 alleles were major and were found in more than 20 genotypes. GL was associated with four genes (*GS3*, *GS7*, *GW8*, and *GS2*). GT was also found to be regulated by four genes (*GS3*, *GS7*, *GW8*, and *GS2*) out of the nine studied genes. GW was found to be under the control of three studied genes (*GW5*, *GW8*, and *GS2*), whereas TGW was found to be under the influence of four genes (*SLG7*, *GW5*, *GW8*, and *GS5*) in the germplasm under study. The Unweighted Pair Group Method with Arithmetic means (UPGMA) tree based on the studied InDel marker loci segregated the whole germplasm into three distinct clusters with dissimilar grain sizes and weights.
A two-dimensional scatter plot constructed using Principal Coordinate Analysis (PCoA) based on InDel markers further separated the 204 rice germplasms into four sub-populations with prominent demarcations of extra-long, long, medium, and short grain type germplasms that can be utilized in breeding programs accordingly. The present study could help rice breeders to select suitable InDel markers and to formulate breeding strategies for improving grain appearance, as well as weight, in order to develop rice varieties that meet international market demands with higher yield returns. This study also confirms the efficient application of InDel markers in studying diverse types of rice germplasm, allelic frequencies, multiple-gene allele contributions, marker-trait associations, and genetic variations that can be explored further.

**Keywords:** rice (*Oryza sativa* L.); grain size and weight; Insertion/Deletion (InDel) markers; multi-gene allele contributions; genetic variation; rice germplasm

### **1. Introduction**

Rice is the most widely consumed food commodity in the world and is used as a staple by more than 50% of the world's population [1]. Rice is also the third-highest produced agricultural commodity after sugarcane and maize [2]. It is a highly valuable grain crop with regard to human nutrition and caloric intake, providing more than one-fifth of the calories consumed by humans worldwide [3]. The rapid increase in the human population is further boosting its demand. In some countries, rice is the only staple food, whereas in other countries, rice is consumed as a traditional dish, as well as an important ingredient in different dishes. In the international market, the quality of rice grain strongly determines its price, which also reflects its salability. Therefore, both yield and grain quality are equally important parameters for varietal improvement in rice breeding programs.

The commercial success of a modern rice cultivar is highly dependent on grain size related traits (e.g., grain length, grain width, and thickness) for quality and on grain weight associated traits (most importantly thousand grain weight) for grain yield [4]. Grain size related traits determine the rice's final market value as defined by consumer preferences, which are a combination of grain length, width, and thickness (Figure 1). Some consumers prefer long grains, while some prefer short and bold grains. Likewise, grain yield is determined by three major components: grain weight, panicles per plant, and grains per panicle. Among these, the trait most closely associated with yield is grain weight, which is measured as the 1000 grain weight. Therefore, grain length, width, and thickness, along with the 1000 grain weight, are central benchmarks for breeding for grain appearance, as well as yield improvement, in rice. However, due to the quantitatively inherited nature of these traits, breeders can hardly rely on phenotypes alone for their improvement [5]. Therefore, the use of genetic markers is considered superior to phenotyping [6], because such markers are not affected by the environment and are more efficient and reliable compared to phenotypic data.

**Figure 1.** Longitudinal and radial-pattern cross section diagrams of rice grain showing the grain length (GL), grain width (GW), and grain thickness (GT).

A number of Quantitative Trait Loci (QTLs) for grain appearance and weight have already been investigated and reported by different scientists [7–20]. Grain length, thickness, and width are generally regarded as determinants of grain appearance, whereas the 1000 grain weight determines grain weight and eventually grain yield. As reported, these traits are under the control of several to many genes and are highly influenced by environmental factors. So far, several major QTLs influencing grain appearance and grain weight have been characterized. These major genes/QTLs include *qPE9-1* [17], *GW2* [12], *SLG7* [21,22], *GW5* [23–26], *GS3* [27–29], *GS7* [30], *GW8* [31], *GS5* [32], and *GS2* [33–36].

*GS2* (*GRAIN SIZE 2*), also reported as *GL2* (*GRAIN LENGTH 2*) or *PT2* (*PANICLE TRAIT 2*), is a rare allele directly controlling two important grain size related traits, grain width and grain length, in rice. *GS2* encodes a transcriptional regulator protein named Growth-Regulating Factor 4 (OsGRF4), which is targeted by OsmiR396, a microRNA that terminates OsGRF4 function. Several studies have shown that a 2 bp substitution mutation in *GS2* disturbs the binding of OsmiR396 to OsGRF4, resulting in its overexpression, which in turn increases cell enlargement and enhances cell division in grains, causing longer and wider rice grains [33–36]. *GS3* (*GRAIN SIZE 3*) was among the first reported genes to have minor effects on grain thickness and width. Published studies demonstrated *GS3* to be a negative regulator of grain size. Its encoded putative transmembrane protein contains a plant-specific organ size regulation (OSR) domain as a negative regulatory motif, whose function is inhibited by its tumor necrosis factor receptor/nerve growth factor receptor (TNFR/NGFR) family cysteine-rich domain and von Willebrand factor type C (VWFC) domain. All four domains have been reported to regulate cell divisions in the upper epidermis of the glume inside the rice seed, causing minor effects on the cell size [29].

*GS5* (*GRAIN SIZE 5*) has been reported by many researchers who described this gene as a regulator of grain filling and weight. *GS5* promotes cell division in rice seed and, to some extent, elongation of the cells located in the lemma and palea [32]. The encoded protein of *GS5* (i.e., putative serine carboxypeptidase) functions as a positive regulator of a subset of the G1-to-S transition genes of the cell cycle, causing increased cell divisions and resulting in enhanced grain filling and grain weight. Likewise, the *GW2* (*GRAIN WIDTH 2*) gene encodes a RING-type protein with E3 ubiquitin ligase activity, which functions in protein degradation through the ubiquitin–proteasome pathway. *GW2* negatively regulates cell division by targeting its substrate(s) to proteasomes for regulated proteolysis. Loss of *GW2* function through mutation causes enhanced milk filling in grains and enlarged endosperm cells, resulting in a wider spikelet hull [12].

*GW5* (*GRAIN WIDTH 5*), also reported as *SW5* (*SEED WIDTH 5*) and *GSE5*, was investigated by many researchers [23–26], who discovered that *GW5* is negatively associated with rice grain width and weight. Later, it was revealed that *GW5* encodes a calmodulin-binding protein that physically interacts with the calmodulin OsCaM1-1 and is responsible for grain width in rice. Deletion of *GW5* or mutations in it result in wider grains, indicating its negative effect on grain width. Likewise, *SLG7* (*GRAIN LENGTH 7*), also known as *GW7* (*GRAIN WIDTH 7*), has been identified to encode a TONNEAU1-recruiting motif protein responsible for increased cell division in the longitudinal direction and reduced cell division in the transverse direction [37]. This gene was found to influence grain appearance by altering cell divisions, thereby also having significant effects on grain weight.

*GW8* (*GRAIN WIDTH 8*) has been reported as a positive regulator of cell proliferation and has a positive association with seed width and seed weight [31]. It encodes SQUAMOSA promoter-binding protein-like 16 (OsSPL16), which was discovered to regulate the expression of several genes involved in the G1-to-S transition, similar to the regulatory role of the *GS5* gene [31,32]. It was revealed that a higher expression of the *GW8* gene promoted cell division and grain filling, resulting in increased grain width and a higher grain yield.

*GS7* (*GRAIN SHAPE 7*), a robust QTL known to regulate grain shape, has been reported [30] to control the grain length, roundness (thickness), and area (size) in rice. Likewise, another gene, *qPE9-1*, also known as *DEP1* (*DENSE AND ERECT PANICLE 1*), encodes a G protein γ subunit found to be involved in the regulation of erect panicles, grains per panicle, nitrogen uptake, and stress tolerance through a G protein signal pathway [17,38]. In another study, the protein was also found to regulate plant architecture, grain size, and grain yield in rice. The qPE9-1 protein contains an N-terminal G gamma-like (GGL) domain, a putative transmembrane domain, and a C-terminal cysteine-rich domain [39]. Overexpression of the qPE9-1 protein has been found to be responsible for increased grain size and yield in rice.

In the past few years, PCR based InDel markers have gained popularity in diversity studies because of their reproducibility, ease of use, and co-dominant inheritance [40]. InDel markers have been extensively utilized as powerful phylogenetic markers for mapping and other genetic studies in different crops [41–48]. Here, based on deletion insertion polymorphisms (DIPs), InDel markers were deployed successfully to study marker trait association and genetic variations. InDels are becoming more popular because their genotyping requires a low start-up cost and because they are efficient, relatively simple, and applicable to a wide range of species for which expressed sequence tag (EST) collections are available. Therefore, the leading goals of this study were to assess the efficacy of InDels to (1) estimate the population structure, allelic frequencies, and genetic variation in diverse germplasms comprising 204 rice genotypes; (2) sort the germplasms based on the distribution of the InDel marker loci; (3) assess the allele based contribution of the target genes and their association with individual traits; and (4) employ InDel markers to understand the genetics of traits for efficient breeding [49,50].

### **2. Results**

#### *2.1. Descriptive Statistics and Phenotypic Variability for Rice Grain Size and Weight*

Descriptive statistics (Table 1) were determined for all four studied traits, i.e., grain length (GL), grain thickness (GT), grain width (GW), and thousand grain weight (TGW), to elaborate the phenotypic variations of the respective traits in 204 rice germplasms. The collected data for the whole studied germplasm for each trait are given in Table S1 (Supplementary Material). The average values (Mean ± Standard Error) of the 204 rice genotypes for GL, GW, GT, and TGW were 8.162 ± 0.065 mm, 2.932 ± 0.019 mm, 2.156 ± 0.012 mm, and 25.858 ± 0.199 g, respectively. The germplasm showed an appreciable range of 4.640 mm for GL (from 6.01 to 10.65 mm). Conversely, GW and GT showed lower ranges (1.800 mm and 1.000 mm, respectively), from 2.05 to 3.85 mm and 1.86 to 2.86 mm, respectively. Likewise, TGW had a substantial range of 20 g (from 17 g to 37 g), depicting a wide range of variation that was also reflected in its higher variance (8.070) in the studied germplasm. On the other hand, the variance for GL (0.864) was higher than the variances of GW and GT (0.073 and 0.027, respectively) (Table 1). The coefficients of variation (CV%) for GL, GW, GT, and TGW were 11.4%, 9.2%, 7.7%, and 11%, respectively. Skewness and kurtosis indicate the mode of gene action [51] and the number of genes controlling the trait [52], respectively. Estimated values of the skewness and kurtosis for all the studied traits are given in Table 1. The skewness values for GL, GW, GT, and TGW were 0.514, −0.148, 1.075, and 1.501, while the estimated kurtosis values were −0.511, 1.092, 1.965, and 1.590, respectively (Table 1).
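The summary statistics reported here are linked by simple formulas: the standard error is the standard deviation divided by the square root of the sample size, and the CV% is the standard deviation expressed as a percentage of the mean. The following Python sketch illustrates these relationships. It assumes the sample variance (n − 1 denominator) and plain moment-based skewness and excess kurtosis; the estimators actually used in the study are not stated, and all function names are illustrative.

```python
import math
import statistics


def describe(values):
    """Summary statistics for one trait measured on n genotypes."""
    n = len(values)
    mean = statistics.mean(values)
    var = statistics.variance(values)  # sample variance, n - 1 denominator
    sd = math.sqrt(var)
    # Central moments with an n denominator (assumption: no small-sample
    # correction, which some statistical packages apply).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "se": sd / math.sqrt(n),
        "variance": var,
        "range": max(values) - min(values),
        "cv_percent": 100 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }


def cv_percent(mean, variance):
    """Coefficient of variation (%) from a reported mean and variance."""
    return 100 * math.sqrt(variance) / mean


def standard_error(variance, n):
    """Standard error of the mean from a reported variance and sample size."""
    return math.sqrt(variance / n)


# Consistency check with the values reported for GL (mean 8.162 mm,
# variance 0.864, n = 204).
print(round(cv_percent(8.162, 0.864), 1))      # 11.4
print(round(standard_error(0.864, 204), 3))    # 0.065
```

Applied to the reported GL mean and variance, the two helper functions reproduce the published CV% and standard error. Small last-digit differences can arise for other traits because the published variances are rounded.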

Figure 2 presents a score plot of the phenotypic variability within the germplasm on a biplot obtained by principal component analysis (PCA), with the first two components together explaining most of the total variation (PC1 = 56.8%, PC2 = 31.1%). This indicates that sufficient phenotypic variation is present in the germplasm to study its genetic variation [53].


**Table 1.** Descriptive statistics for grain length (GL), grain width (GW), grain thickness (GT), and thousand grain weight (TGW) of the two hundred and four (204) rice germplasms.

**Figure 2.** Score plot showing variability within the 204 rice germplasms on a biplot using principal component analysis, with the first two components representing the maximum proportion (87.9%) of the total phenotypic variation for the studied traits.
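The proportions of variation attributed to each principal component are the eigenvalues of the trait correlation (or covariance) matrix divided by their sum. The sketch below shows this calculation in plain Python under several assumptions: the traits are standardized (correlation-matrix PCA), which the text does not specify; the eigenvalues are obtained with a simple Jacobi rotation routine rather than whatever software the authors used; and the data rows are made up for illustration and are not taken from the study. In practice a linear-algebra library would normally be used for the eigendecomposition.

```python
import math
import statistics

# Hypothetical (made-up) rows: GL, GW, GT in mm and TGW in g.
DATA = [
    (6.4, 3.5, 2.3, 24.0),
    (7.1, 3.2, 2.2, 25.5),
    (8.0, 3.0, 2.1, 26.0),
    (8.9, 2.8, 2.2, 27.5),
    (9.6, 2.5, 2.0, 23.0),
    (10.2, 2.3, 1.9, 22.5),
    (7.8, 3.4, 2.5, 33.0),
    (8.3, 2.9, 2.1, 25.0),
]


def correlation_matrix(rows):
    """Pearson correlation matrix of the trait columns."""
    n = len(rows)
    z = []
    for col in zip(*rows):
        m, s = statistics.mean(col), statistics.stdev(col)
        z.append([(v - m) / s for v in col])
    k = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(k)] for i in range(k)]


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) off-diagonal entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def explained_variance(rows):
    """Proportion of total variance carried by each principal component."""
    eig = jacobi_eigenvalues(correlation_matrix(rows))
    return [e / len(eig) for e in eig]


proportions = explained_variance(DATA)
print([round(100 * p, 1) for p in proportions])  # percentages for PC1..PC4
```

Adding the first two entries of `proportions` gives the cumulative share for PC1 and PC2, which is the kind of figure quoted in the caption above.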

Based on clustering, 204 germplasms were classified into three distinct clusters (I, II, III), as depicted in Figure 3. The major cluster, i.e., Cluster I, consisted of 182 genotypes. Cluster II consisted of 8 genotypes, whereas Cluster III had only 14 genotypes. Cluster I was further subdivided into Cluster IA and Cluster IB for simplification. Cluster IA consisted of 78 entries, and Cluster IB contained 104 entries of germplasm. The average grain length of Cluster I was calculated to be 8.01 mm, whereas the average GW and GT were 2.96 mm and 2.16 mm, respectively. The thousand grain weight (TGW) of this group was 25.63 g. Cluster II showed the highest values for the average GL (8.83 mm), indicating that the entries of this group had the maximum grain length. This group also had the maximum GT (2.27 mm), GW (3.02 mm), and the heaviest grain, as indicated by its TGW value (i.e., 30.43 g). Cluster III contained genotypes with a medium grain length (8.25 mm), but their GW (2.71 mm), GT (2.03 mm), and TGW (21.70 g) were the lowest among the groups [54].

**Figure 3.** UPGMA dendrogram grouping the 204 rice germplasms into three distinct clusters for the studied traits, based on a similarity index computed from the Euclidean distances between the groups.
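The clustering summarized above can be illustrated with a small average-linkage (UPGMA) sketch on Euclidean distances between trait vectors. This is a minimal illustration, not the authors' pipeline: the seven trait vectors are made up (loosely inspired by the cluster means reported in the text), the raw trait values are used without scaling (the study does not state whether traits were standardized), and the function names are hypothetical.

```python
import math
import statistics
from itertools import combinations

# Hypothetical (made-up) genotypes: GL, GW, GT in mm and TGW in g.
GENOTYPES = [
    (8.0, 3.0, 2.2, 25.6),
    (7.9, 2.9, 2.1, 25.9),
    (8.1, 3.0, 2.2, 25.2),
    (8.8, 3.0, 2.3, 30.4),
    (8.9, 3.1, 2.3, 30.9),
    (8.3, 2.7, 2.0, 21.7),
    (8.2, 2.7, 2.0, 21.3),
]


def upgma_clusters(points, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # UPGMA distance: mean of all pairwise Euclidean distances.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


def cluster_means(points, clusters):
    """Per-cluster mean of each trait, as reported for Clusters I to III."""
    n_traits = len(points[0])
    return [
        tuple(statistics.mean([points[i][t] for i in c]) for t in range(n_traits))
        for c in clusters
    ]


clusters = upgma_clusters(GENOTYPES, 3)
print(clusters)  # [[0, 1, 2], [3, 4], [5, 6]]
for members, means in zip(clusters, cluster_means(GENOTYPES, clusters)):
    print(members, [round(m, 2) for m in means])
```

Because TGW is measured in grams and spans a wider numeric range than the millimetre traits, it dominates unscaled Euclidean distances; standardizing the traits first would be a reasonable alternative design choice.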
