*2.4. Analysis of Population Structures*

Based on the ∆K value, the population structure analysis assigned 97 of the 132 accessions to six subpopulations (Figures 1–3). Group 1 contained five diploid (D3) accessions collected from Santa Cruz Island. Group 2 had 32 accessions of *G. tomentosum* (AD3) obtained from a Hawaiian island. Group 3 was composed of 14 accessions and was defined by the *G. darwinii* (AD5) accessions collected from Isabella Island, but it also included one accession from the China Wild Cotton Germplasm Nursery. Group 4 had 10 accessions of *G. ekmanianum* (AD6) collected from the Dominican Republic, NPGS, USA. Group 5 contained 19 accessions of *G. darwinii* that were collected from San Cristobal. Group 6 had 17 accessions of *G. barbadense* (AD2), two of which were collected from the China Wild Cotton Germplasm Nursery (Supplementary Table S5). In a phylogenetic analysis using the Unweighted Pair-Group Method using Arithmetic average (UPGMA), the same accessions fell into distinct subgroups separated by significant genetic distances, in accordance with the geographical locations of collection. The results were further validated by using Shannon's information index to determine the genetic diversity among the six populations. Population 3 (Isabella) had the highest degree of heterozygosity, with 55.5% polymorphic loci (Figure 4).
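The ∆K criterion mentioned above can be illustrated with a minimal sketch. It assumes the commonly used Evanno-style formulation, in which ∆K is the absolute second-order difference of the mean log-likelihood L(K) divided by the standard deviation of L(K) across replicate STRUCTURE runs. The function name `delta_k` and all log-likelihood values are hypothetical and are not the data of this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each tested K to a list of replicate ln P(D) values.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    scores = {}
    for k in ks[1:-1]:
        if k - 1 not in means or k + 1 not in means:
            continue  # skip K values without both neighbours
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])
        if sd > 0:  # delta K is undefined when replicates are identical
            scores[k] = second_diff / sd
    return scores


# Hypothetical replicate log-likelihoods (illustration only).
example = {
    4: [-5200.0, -5210.0, -5190.0],
    5: [-4900.0, -4910.0, -4890.0],
    6: [-4500.0, -4502.0, -4498.0],
    7: [-4480.0, -4520.0, -4440.0],
    8: [-4470.0, -4530.0, -4410.0],
}
scores = delta_k(example)
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

In this toy input the peak falls at K = 6 because the likelihood stops improving sharply after that point while the replicates at K = 6 agree closely with each other.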

*Int. J. Mol. Sci.* **2018**, *19*, x FOR PEER REVIEW 5 of 21

**Figure 2.** Delta K for 132 accessions.

**Figure 3.** Q plot showing clustering of 132 accessions in 6 subpopulations based on an analysis of genotypic data using STRUCTURE software ver. 2.2. Each accession is indicated by a vertical bar. The colored segments within each vertical bar represent the membership coefficient (Q) of the accession in each group. Six groups were identified: I (red), II (lime), III (blue), IV (yellow), V (fuchsia), and VI (aqua).

**Figure 4.** Allelic patterns across populations.

#### *2.5. Genetic Diversity and Cluster Analysis of Phylogenetic Tree*

A total of 382 alleles, generated by 111 EST-SSRs, were used to run UPGMA for generating the dendrogram. Based on Nei's criteria [52], the genetic distance among wild cotton accessions ranged from 0.003 to 0.529 with an average of 0.325. The highest genetic distance (0.529) was between D3k-21-3 and AD5-lz. The phylogenetic tree was in agreement with the structure results with the exception that *G. hirsutum* and *G. stephensii* sit in different clusters in the phylogenetic tree but in the structure analysis these were grouped together. In order to see how the results correspond to each other between the STRUCTURE and phylogenetic analyses, the dendrogram was manually edited to show the STRUCTURE grouping (Figure 5 and Supplementary Figure S1). Six groups identified in the structure analysis were also clustered together in the phylogenetic tree analysis. Overall, there was good agreement between the two estimates. The clustering pattern also showed agreement with relationships based on pedigree studies [53]. The first two axes of the principal coordinate analysis (PCoA) accounted for 42.2% of the variation (Figure 6). This indicates a high level of genetic diversity in the *Gossypium* germplasm with continuous variation between and within the subgroups. Analysis of molecular variance (AMOVA) revealed highly significant variation between the six groups identified by the structure analysis, with 49% of the total variation contributing to between-group differences. However, a larger amount of variation (51%) was due to diversity within the groups having different populations (Table 2). Pairwise FST analysis revealed that accessions from Pop 3 (Isabella region) are closer to accessions from the San Cristobal (Pop 5) and Santa Cruz regions (Pop 6) as compared with the Hawaiian accessions. The highest genetic differentiation was observed among tetraploid populations between accessions from the Hawaiian (Pop 2) and Santa Cruz (Pop 6) regions with a pairwise FST of 0.752 (*p* < 0.001) (Table 3).

The dendrogram was truncated at a genetic distance level of (0.05) and divided 132 cotton genotypes into seven clusters (Supplementary Figure S1). A cluster analysis clearly discriminated diploid wild-type cotton from other tetraploid wild-types. These accessions were collected from different locations, namely the Galapagos Islands, Hawaii, the Dominican Republic, Wake Atoll, and the Wild Cotton Germplasm Nursery of China.
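To make the genetic distance values reported in this section more concrete, the following is a minimal sketch of Nei's standard genetic distance, D = −ln(J_xy / √(J_x J_y)), between two populations. The text cites Nei's criteria [52] without stating which of Nei's measures was applied, so the choice of the standard distance is an assumption. The function name `nei_standard_distance` and the allele frequencies are hypothetical and serve only as an illustration.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x and freqs_y are lists with one dict per locus, each mapping
    an allele label to its frequency in that population.
    """
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        alleles = set(locus_x) | set(locus_y)
        # Sum of squared frequencies within each population (homozygosity).
        jx += sum(locus_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(locus_y.get(a, 0.0) ** 2 for a in alleles)
        # Sum of frequency products across the two populations.
        jxy += sum(locus_x.get(a, 0.0) * locus_y.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


# Hypothetical allele frequencies at two loci (illustration only).
pop_a = [{"A": 1.0}, {"B": 0.5, "C": 0.5}]
pop_b = [{"A": 0.5, "a": 0.5}, {"B": 1.0}]

d_ab = nei_standard_distance(pop_a, pop_b)
d_aa = nei_standard_distance(pop_a, pop_a)
print(round(d_ab, 3), round(d_aa, 3))
```

A pairwise matrix of such distances is the kind of input on which UPGMA clustering operates. Identical populations give a distance of zero, and the distance grows as the populations share fewer alleles.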

**Figure 5.** Dendrogram of 132 wild cotton accessions by Unweighted Pair-Group Method using Arithmetic average (UPGMA) analysis. Colors in the dendrogram lines correspond to *Gossypium* accession populations as identified by structure analysis while the colors in the circle represent the seven species. A membership threshold of 70% was used to assign accessions to different clusters in this dendrogram based on structure analysis.

**Figure 6.** Three-dimensional principal coordinate analysis (PCoA) of a *Gossypium* accessions diversity panel genotyped with expressed sequence tags (EST) and genomic simple sequence repeats (SSRs). The different colors in the figure correspond to six clusters: red (Cluster I), orange (Cluster II), yellow (Cluster III), bright green (Cluster IV), sky blue (Cluster V), and blue (Cluster VI).

**Table 2.** Analysis of molecular variance for wild cotton accessions among and within six populations as identified by STRUCTURE.


(PhiPT = 0.493; \*\* significance at *p* < 0.001).
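The PhiPT value in the footnote can be related to the variance partition reported in the text. As a sketch, assuming the usual AMOVA definition in which PhiPT is the share of the total variance found among populations (the symbols $V_{AP}$ and $V_{WP}$ for the among-population and within-population variance components are notation introduced here, not taken from the source):

$$
\Phi_{PT} = \frac{V_{AP}}{V_{AP} + V_{WP}} \approx \frac{0.49}{0.49 + 0.51} = 0.49
$$

With the rounded percentages from the text (49% between groups, 51% within groups), this agrees with the value of 0.493 given in the footnote.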

**Table 3.** Pairwise Fst estimates for the five groups corresponding to six regions of accession collections as identified by STRUCTURE.

