#### *2.5. InDel Polymorphism and Assessment of the Genetic Relationship among the 204 Rice Germplasms*

To study the genetic relatedness among the 204 rice genotypes, fourteen Insertion/Deletion (InDel) markers developed from the nine genes related to grain shape, size, and weight were used. Most of the InDel markers amplified two bands (alleles), as depicted in Figure 5. Four markers (i.e., *qPE9~1*-InDel, *SLG7*-InDel, *GW5*-InDel, and *GS3*-InDel) showed two alleles per locus, whereas the rest of the markers showed three allele types for their respective genes (Table 5). The major allele frequency ranged from 0.5196 to 0.9069, with an average of 0.6903. The markers also revealed relatively high gene diversity (D), ranging from 0.1730 to 0.5246 with an average of 0.4073, indicating that the genotypes in the studied germplasm possess a considerable range of genetic variation that can be further exploited. This is further supported by the polymorphism information content (PIC) values, which averaged 0.3310 (range 0.1653–0.4359). The maximum PIC value was observed for the marker *GW8*-InDel (0.4359), followed by *GS5*-InDel1A (0.4145), *GW8*-InDel2B (0.4023), and *GS2*-InDel1A (0.4018), as listed in Table 6. The average observed heterozygosity was very low (0.0193), indicating that the rice germplasm under study consisted mostly of homozygous, uniform lines. Among all the studied markers, *GS7*-InDel (for the *GS7* gene) showed considerable heterozygosity compared to the other markers. The study showed that the InDel markers of the *GW8*, *GS5*, and *GS2* genes (i.e., *GW8*-InDel, *GW8*-InDel2B, *GS5*-InDel1A, and *GS2*-InDel1A) are highly informative for these traits and can be used to understand the genetic variation in the germplasm, whereas the rest of the markers are moderately to slightly informative in this study (Table 6).

**Table 6.** Estimation of the number of alleles per locus, the major allele frequency, gene diversity (D), expected heterozygosity, and the polymorphism information content (PIC) values in the 204 rice germplasms.


*A* = alleles per locus; *p*ma = major allele frequency; *D* = gene diversity; *H*Exp = expected heterozygosity; PIC = polymorphism information content.
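The per-marker statistics summarised in Table 6 can be illustrated with a minimal sketch. It assumes the commonly used definitions, which this section does not state explicitly: gene diversity as D = 1 − Σp², and PIC following the Botstein form, PIC = 1 − Σp² − ΣΣ 2p²q² over distinct allele pairs. The function name `marker_summary` and the toy genotypes are hypothetical and are not the study's data.

```python
from collections import Counter
from itertools import combinations


def marker_summary(genotypes):
    """Summarise one marker from diploid genotypes given as (allele, allele) tuples."""
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    total = len(alleles)
    freqs = [n / total for n in counts.values()]

    # Assumed definition: gene diversity D = 1 - sum of squared allele frequencies.
    gene_diversity = 1.0 - sum(p * p for p in freqs)
    # Assumed definition (Botstein form): subtract 2 * p^2 * q^2 for every allele pair.
    pic = gene_diversity - sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    # Observed heterozygosity: share of genotypes carrying two different alleles.
    observed_het = sum(1 for a, b in genotypes if a != b) / len(genotypes)

    return {
        "A": len(counts),        # alleles per locus
        "p_ma": max(freqs),      # major allele frequency
        "D": gene_diversity,
        "H_obs": observed_het,
        "PIC": pic,
    }


# Hypothetical toy data: 10 lines scored at one two-allele InDel marker.
toy = [("ins", "ins")] * 6 + [("del", "del")] * 3 + [("ins", "del")]
summary = marker_summary(toy)
print(summary)
```

In this toy case a single heterozygous line among ten gives a low observed heterozygosity even though the two alleles are fairly balanced, which is the pattern the text describes for largely homozygous germplasm.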

To understand the genetic variation and distinctiveness in the selected germplasm, an unweighted neighbor-joining tree was constructed (10,000 bootstrap replicates) based on a dissimilarity index calculated from allelic data in binary format (0 and 1 indicating the absence and presence of alleles, respectively) using the Otsuka–Ochiai coefficient [47,48]. The formula for calculating the dissimilarity was

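$$d_{ij} = 1 - \frac{a}{\sqrt{(a+b)(a+c)}},$$

where *a* is the number of alleles present in both genotypes *i* and *j*, *b* is the number present only in *i*, and *c* is the number present only in *j*.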
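As an illustration of the dissimilarity formula, the following sketch computes it for 0/1 allele profiles. The function names and the toy profiles are hypothetical, and returning 1.0 for a profile with no scored alleles is an assumption made here to avoid division by zero, not something stated in the text.

```python
import math


def ochiai_dissimilarity(x, y):
    """Return 1 - a / sqrt((a + b)(a + c)) for two equal-length 0/1 allele profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # present in both
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # present in x only
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # present in y only
    denom = math.sqrt((a + b) * (a + c))
    if denom == 0:
        # Assumption: a profile with no alleles present is treated as maximally dissimilar.
        return 1.0
    return 1.0 - a / denom


def dissimilarity_matrix(profiles):
    """Pairwise dissimilarities for a list of 0/1 profiles."""
    n = len(profiles)
    return [[ochiai_dissimilarity(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy profiles (three genotypes, six allele columns).
g1 = [1, 0, 1, 0, 1, 0]
g2 = [1, 0, 0, 1, 1, 0]
g3 = [0, 1, 0, 1, 0, 1]
d12 = ochiai_dissimilarity(g1, g2)   # a = 2, b = 1, c = 1
matrix = dissimilarity_matrix([g1, g2, g3])
```

Joint absences do not enter the formula, so two genotypes are not counted as similar merely because both lack an allele.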

Based on cluster analysis, the 204 germplasms were classified into six distinct clusters, encircled separately, as illustrated by the UPGMA tree in Figure 6. The three major and three minor clusters showed distinct grain characteristics. The major clusters were designated as Cluster I, Cluster II, and Cluster III. Cluster I consisted of 80 genotypes that can be further divided into two sub-clusters, IA and IB. Sub-cluster IA consisted of 35 genotypes and had the highest average grain length (GL), 9.58 mm. Its average grain width (GW) and average grain thickness (GT) were 2.70 mm and 2.09 mm, respectively, and its thousand grain weight (TGW) was 27.07 g. Likewise, sub-cluster IB consisted of 43 genotypes, with an average GL of 8.78 mm, an average GW of 2.76 mm, an average GT of 2.07 mm, and an average TGW of 25.52 g. Cluster I contains extra-long grain rice genotypes that can be utilized for breeding long grain rice varieties. Major Cluster II consisted of 35 genotypes, with an average GL, GW, GT, and TGW of 8.05 mm, 2.95 mm, 2.14 mm, and 25.36 g, respectively. This cluster contains germplasms with long and bolder grains that can be utilized in breeding programs in which grain length is considered along with grain width and thickness. Major Cluster III included 74 germplasms, with a mean GL of 7.38 mm, a mean GW of 3.09 mm, a mean GT of 2.29 mm, and a mean TGW of 25.81 g, showing that this group consisted of rice germplasms with shorter and bolder grains. This germplasm can be utilized for breeding short, thicker grains with a higher grain weight. The three minor clusters together contained 15 germplasms, as depicted in Figure 6.

**Figure 6.** InDel marker-based genetic relationship among the 204 rice germplasm entries, estimated using an unweighted neighbor-joining tree (10,000 bootstrap replicates).

#### *2.6. Estimation of Population Genetics for Grain Size and Weight Based on Fourteen InDel Markers*

To estimate the genetic relationship among populations of the 204 rice genotypes, the germplasm was divided into four sub-populations according to grain type: extra-long grains, with an average grain length (GL) of more than 9.5 mm (23 germplasms); long grains, with an average GL of 8.5–9.5 mm (48 germplasms); medium grains, with an average GL of 7.5–8.5 mm (78 germplasms); and short grains, with an average GL below 7.5 mm (55 germplasms). Principal Coordinate Analysis (PCoA) was used to establish the genetic relationship of all the germplasms based on the fourteen InDel markers, as depicted in Figure 7. A scatter plot was constructed from the first two coordinates, which collectively explain 57.3% of the total genetic variation and separate the germplasm into four distinct clusters according to grain type. These results agree with those of the UPGMA dendrogram.
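The idea behind PCoA can be sketched as classical scaling of a distance matrix: square the distances, double-centre them, and take the leading eigenvectors. The following is a minimal sketch assuming NumPy is available. The function name `pcoa`, the toy distances, and the choice to ignore negative eigenvalues are assumptions made here, since the software and settings used for Figure 7 are not described in this section.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical PCoA: coordinates and percent variation of the leading axes."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # Gower double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumption: negative eigenvalues (non-Euclidean distances) are set to zero.
    positive = np.clip(eigvals, 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = 100.0 * positive[:n_axes] / positive.sum()
    return coords, explained


# Hypothetical toy example: four samples lying on a line at 0, 1, 2, and 4.
points = np.array([0.0, 1.0, 2.0, 4.0])
toy_dist = np.abs(points[:, None] - points[None, :])
coords, explained = pcoa(toy_dist)
print(explained)   # share of variation captured by axis 1 and axis 2
```

The percentages returned per axis are the kind of quantity reported above for the first two coordinates. In the toy case the samples lie on a single line, so nearly all variation falls on the first axis.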

**Figure 7.** Principal Coordinate Analysis (PCoA) of the 204 rice germplasms based on 14 InDel markers. The 2D scatter plot distributes the 204 germplasms into four distinct groups according to average grain length (AGL). Extra-long grain germplasms (AGL > 9.5 mm) are shifted towards the right side, whereas short grain germplasms (AGL < 7.5 mm) are shifted towards the left side. Medium grain germplasms are dispersed over the central region of the plot.

These four sub-populations were further investigated using the binary InDel marker data in order to separate the total molecular variance into variance within and between sub-populations. The total molecular variance was partitioned into two components: most of the variance (89%) was found within the sub-populations, and the remainder (11%) was found between them, as shown in Figure 8. The AMOVA results (Table 7) showed that the highest (ΦPT = 0.449) and significant (*p* ≤ 0.001) genetic differentiation was between the short grain and extra-long grain sub-populations, while the lowest genetic differentiation (ΦPT = 0.035) was observed between the long grain and medium grain sub-populations.
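The variance partition behind ΦPT can be sketched from a distance matrix and a list of sub-population labels. The sketch below follows the standard one-level AMOVA decomposition, ΦPT = V_among / (V_among + V_within), with a label-shuffling permutation test. The function names and the toy data are hypothetical, and the code is not the procedure or software used to produce Table 7.

```python
import random


def phi_pt(dist, labels):
    """PhiPT from a distance matrix and per-sample population labels (one-level AMOVA)."""
    n = len(labels)
    pops = sorted(set(labels))
    k = len(pops)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]

    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(sq[i][j] for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    sizes = []
    for pop in pops:
        idx = [i for i, lab in enumerate(labels) if lab == pop]
        sizes.append(len(idx))
        # Within-population sum of squares: pairs inside the population, divided by its size.
        ss_within += sum(sq[i][j] for x, i in enumerate(idx) for j in idx[x + 1:]) / len(idx)
    ss_among = ss_total - ss_within

    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n - k)
    n0 = (n - sum(s * s for s in sizes) / n) / (k - 1)   # effective population size
    var_among = (ms_among - ms_within) / n0
    var_within = ms_within
    return var_among / (var_among + var_within)


def permutation_p_value(dist, labels, n_perm=999, seed=0):
    """Share of label shufflings giving a PhiPT at least as large as the observed one."""
    rng = random.Random(seed)
    observed = phi_pt(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_pt(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two well-separated groups of three samples on a line.
toy_points = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(p - q) for q in toy_points] for p in toy_points]
toy_labels = ["A", "A", "A", "B", "B", "B"]
phi = phi_pt(toy_dist, toy_labels)
```

A pairwise value such as those in Table 7 would be obtained, under these assumptions, by restricting the distance matrix and labels to the two sub-populations being compared.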

**Figure 8.** Percentages of molecular variation within and among populations.


**Table 7.** Pair-wise population ΦPT value estimates among the four sub-populations of 204 rice germplasms.

ΦPT values are given below the diagonal and were calculated using 10,000 permutations.
