*4.5. Gene Mining and Phylogenetic Analysis of DUF819 Proteins*

The complete sequences of markers SWU15000–SWU15194, mapped to chromosome 6, were downloaded from the genome database of *G. raimondii* [84], and blastx was used to identify homologous genes in the genome sequences of *G. raimondii*, *G. arboreum*, *G. hirsutum*, and *G. barbadense*. Mining genes from marker regions is a widely used approach; see, for instance, Kirungu et al. [43] and Magwanga et al. [85]. The uncharacterized gene yjcL of DUF819 (PF008654) was selected for the evolutionary study of genes in the sequenced *Gossypium* species. The full-length sequences of DUF819 (PF005684) were downloaded from the Pfam database (http://pfam.xfam.org/). The dendrogram was constructed using Molecular Evolutionary Genetics Analysis (MEGA) version 7.0 [86]. Functional descriptions of the domains of the uncharacterized proteins were predicted from the protein sequences of 116 genes downloaded from the Cotton Functional Genomics Database (www.cottonfgd.org) [87]. The evolutionary relationships among all selected genes were then summarized to provide a clear picture of their functions with reference to the upregulated genes of the WDR superfamily.

Of the 116 genes, 115 were grouped into six major clusters, and their evolutionary history was inferred using the Neighbor-Joining method [88]. The optimal tree, with a sum of branch lengths equal to 41.56, is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches [89]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method [90] and are in units of the number of amino acid substitutions per site. The analysis involved 115 amino acid sequences. All ambiguous positions were removed for each sequence pair. There were a total of 1466 positions in the final dataset. Evolutionary analyses were conducted in MEGA7 (Figure 7).
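The distance calculation described above can be illustrated with a minimal Python sketch. It is not the MEGA7 implementation. It assumes the standard form of the Poisson correction, d = -ln(1 - p), where p is the proportion of differing amino acid sites, and it approximates the removal of ambiguous positions for each sequence pair by skipping any aligned site where either sequence carries a gap or ambiguity symbol. The function names, the set of ambiguity symbols, and the toy sequences are hypothetical and chosen only for illustration.

```python
import math

# Symbols treated as missing or ambiguous (an assumption for this sketch).
AMBIGUOUS = set("-?X")


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences.

    Sites where either sequence is ambiguous are skipped for this pair only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0   # sites usable for this pair
    different = 0  # usable sites where the residues differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in AMBIGUOUS or b in AMBIGUOUS:
            continue
        compared += 1
        if a != b:
            different += 1

    if compared == 0:
        raise ValueError("no comparable sites for this sequence pair")

    p = different / compared  # proportion of differing sites
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


def distance_matrix(alignment):
    """Pairwise distances for a dict mapping sequence name to aligned sequence."""
    names = list(alignment)
    return {
        (x, y): poisson_distance(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


if __name__ == "__main__":
    # Toy alignment with hypothetical names, not data from this study.
    toy = {"seq1": "MKTAYIAK", "seq2": "MKTVYIAK", "seq3": "MK-VYLAK"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 4))
```

A matrix of this kind is the input to Neighbor-Joining; in the analysis reported here, both the distances and the tree were produced by MEGA7.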

The identified genes were analyzed for gene features, protein characteristics, and RNA expression using the Cotton Functional Genomics Database (https://cottonfgd.org/search/), while GO functional classification was performed with agriGO v2.0, using *Gossypium hirsutum* as the reference genome. The RNA expression data were then used to construct a heatmap in the R statistical software.
