**Assessment of Genetic Diversity, Population Structure, and Evolutionary Relationship of Uncharacterized Genes in a Novel Germplasm Collection of Diploid and Allotetraploid** *Gossypium* **Accessions Using EST and Genomic SSR Markers**

**Allah Ditta <sup>1,2,†</sup>, Zhongli Zhou <sup>1,†</sup>, Xiaoyan Cai <sup>1</sup>, Xingxing Wang <sup>1</sup>, Kiflom Weldu Okubazghi <sup>1,3</sup>, Muhammad Shehzad <sup>1</sup>, Yanchao Xu <sup>1</sup>, Yuqing Hou <sup>1</sup>, Muhammad Sajid Iqbal <sup>1</sup>, Muhammad Kashif Riaz Khan <sup>2</sup>, Kunbo Wang <sup>1,\*</sup> and Fang Liu <sup>1,\*</sup>**


Received: 23 June 2018; Accepted: 13 August 2018; Published: 14 August 2018

**Abstract:** This study evaluated the genetic diversity and population structure of a novel cotton germplasm collection comprising 132 accessions of diploid cotton, including *Gossypium klotzschianum*, and allotetraploid cotton, including *Gossypium barbadense*, *Gossypium darwinii*, *Gossypium tomentosum*, *Gossypium ekmanianum*, and *Gossypium stephensii*, from Santa Cruz, Isabella, San Cristobal, Hawaiian, Dominican Republic, and Wake Atoll islands. A total of 111 expressed sequence tag (EST) and genomic simple sequence repeat (gSSR) markers produced 382 polymorphic loci, with an average of 3.44 polymorphic alleles per SSR marker. Polymorphism information content values ranged from 0.08 to 0.82, with an average of 0.56. Analysis of a genetic distance matrix revealed values of 0.003 to 0.53, with an average of 0.33, in the wild cotton collection. Phylogenetic analysis supported the subgroups identified by STRUCTURE and corresponded well with the results of principal coordinate analysis, which had a cumulative variation of 45.65%. A total of 123 unique alleles were observed among all accessions, of which 31 were identified only in *G. ekmanianum*. Analysis of molecular variance revealed highly significant variation among the six groups identified by structure analysis: 49% of the total variation was among groups, and 51% was due to diversity within the groups. The highest genetic differentiation among tetraploid populations was observed between accessions from the Hawaiian and Santa Cruz regions, with a pairwise F<sub>ST</sub> of 0.752 (*p* < 0.001). An uncharacterized DUF819-containing gene named yjcL, linked to genomic markers, was found to be highly related to tryptophan-aspartic acid (W-D) repeats in a superfamily of genes. RNA sequence expression data showed that the yjcL-linked gene Gh\_A09G2500 was upregulated under drought and salt stress conditions. The genetic diversity, gene characterization, and variation documented in this novel germplasm collection will be a landmark addition to the genetic study of cotton germplasm.

**Keywords:** novel accessions; PIC; PCR; EST-gSSRs; genes; genetic distance
