*2.1. Identification and Characterization of TCP Proteins in G. hirsutum*

To identify the *TCP* genes in the *G. hirsutum* genome, protein sequences of Arabidopsis and rice TCPs were used as BLAST search queries, and a multiple sequence alignment was performed. A total of 73 *TCP* genes were identified. All candidate *TCP* genes were confirmed to encode the conserved TCP domain using InterProScan and the NCBI Conserved Domain Database (CDD) [24]. Seven *GhTCP* genes were found to possess the R domain. These characteristic features suggested that they were members of the TCP gene family. Detailed characteristics of the TCP transcription factors in *G. hirsutum* are provided in Table S1. The GhTCP proteins differed in length, molecular weight (Mw), and theoretical isoelectric point (pI). The mean length and Mw of these proteins were 347 amino acids and 37.58 kDa, respectively. The pI ranged from 5.80 (GhTCP9) to 10.07 (GhTCP38), with an average of 8.09. All the GhTCP proteins were predicted to localize to the nucleus. Proteins must be localized to the appropriate subcellular compartment to perform their function [3,25].
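As an illustration of how the length and Mw summary above can be derived from protein sequences, the following minimal sketch computes the average molecular weight of an unmodified peptide from standard average residue masses. The function names and the toy sequences are hypothetical and are not taken from the study; the text does not state which tool was used for these calculations.

```python
from statistics import mean

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight_kda(seq: str) -> float:
    """Average molecular weight of an unmodified peptide, in kDa."""
    total = sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER
    return total / 1000.0


def summarize(proteins: dict) -> dict:
    """Mean length (residues) and mean Mw (kDa) over a set of sequences."""
    lengths = [len(s) for s in proteins.values()]
    weights = [molecular_weight_kda(s) for s in proteins.values()]
    return {"mean_length": mean(lengths), "mean_mw_kda": mean(weights)}


# Hypothetical toy peptides for demonstration only (not real GhTCP proteins).
toy = {"toyA": "MGGR", "toyB": "MKRRAA"}
print(summarize(toy))
```

As a quick consistency check on the reported values, 37.58 kDa divided by 347 residues gives about 108 Da per residue, which is in the usual range for average residue mass.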

Unrooted phylogenetic trees were constructed based on the multiple sequence alignment of the 73 GhTCP protein sequences and their Arabidopsis and rice homologs. The TCP transcription factors from the three species were distributed in almost all clades, indicating that the TCP family diversified before the divergence of these plants. The phylogenetic tree placed the GhTCPs into two classes (Figure 1), as has been found for all species examined so far. Class I is named the TCP-P or PCF class, and class II is named the TCP-C class. The class II genes were further divided into two groups: CYC/TB1 and CIN. In *G. hirsutum*, the CYC/TB1 and CIN groups were larger than in the other two species: the CYC/TB1 group was approximately twice the size of those of Arabidopsis and rice, and the CIN group was approximately five times the size. Seven GhTCP genes belonged to the CYC/TB1 group; in Arabidopsis and rice, three of 24 AtTCPs and three of 21 OsTCPs were grouped into this subfamily. Fifty GhTCPs belonged to the PCF group, and 13 AtTCPs and 10 OsTCPs were also grouped into this subfamily. CYC/TB1-type proteins were divided into two subgroups. One subgroup contained four *G. hirsutum* TCPs but only one Arabidopsis TCP and none from rice, which indicated that this subgroup was either acquired after the divergence of monocots and dicots or lost in rice. In *G. hirsutum*, the number of TCP genes was significantly higher than in tomato, *Citrullus lanatus*, *Arabidopsis*, rice, and *Prunus mume* (Figure 2).

*Int. J. Mol. Sci.* **2018**, *19*, x FOR PEER REVIEW 4 of 14

**Figure 1.** Phylogenetic analysis of TCP proteins from *G. hirsutum*, *Arabidopsis*, and rice. The deduced full-length amino acid sequences were aligned using ClustalX 2.0 and the phylogenetic tree was constructed using MEGA 6.0 by the Neighbor-Joining (NJ) method with 1000 bootstrap replicates. The three subclasses are indicated with different colors.

**Figure 2.** TCP family members of *G. hirsutum*, *G. raimondii*, *G. arboreum*, tomato, *Citrullus lanatus*, Arabidopsis, rice, and *Prunus mume*. Different colors represent the different subclasses, and the number of genes in each subclass is shown. Green: *CYC/TB1* genes; Red: *CIN* genes; Blue: *PCF* genes.


#### *2.2. Genomic Distribution, Gene Structural Organization, and Domain Analysis of GhTCP Genes*

The complete genome sequence provided an overview of the chromosomal distribution of these TCP genes. Among the 73 *G. hirsutum* TCPs, 67 members were located on 22 chromosomes, and the other six were located on six unmapped scaffolds. *GhTCP* genes were unevenly distributed across 22 of the 26 *G. hirsutum* chromosomes, with the number of *TCP* genes per chromosome ranging from 0 to 8 (Figure S1). Chromosomes A12 and D11 contained eight and seven genes, respectively, while chromosomes A02, A06, D03, D06, and D13 had no TCP genes.
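The per-chromosome tally described above can be sketched as follows. This is a minimal illustration under stated assumptions: the chromosome labels follow the A01–A13 and D01–D13 naming used in the text, while the gene names, the scaffold label, and the `tally` helper are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

# The 26 G. hirsutum chromosomes, named A01-A13 and D01-D13 as in the text.
CHROMOSOMES = [f"{sub}{i:02d}" for sub in "AD" for i in range(1, 14)]


def tally(gene_locations):
    """Count genes per chromosome and collect genes on unmapped scaffolds."""
    counts = Counter({chrom: 0 for chrom in CHROMOSOMES})
    unmapped = []
    for gene, location in gene_locations.items():
        if location in counts:
            counts[location] += 1
        else:
            # Anything that is not one of the 26 chromosomes is a scaffold.
            unmapped.append(gene)
    return counts, unmapped


# Hypothetical gene-to-location map for demonstration only.
toy_locations = {
    "geneA": "A12",
    "geneB": "A12",
    "geneC": "D11",
    "geneD": "scaffold_x",
}

counts, unmapped = tally(toy_locations)
empty = [chrom for chrom in CHROMOSOMES if counts[chrom] == 0]
print(counts["A12"], counts["D11"], unmapped, len(empty))
```

Initializing every chromosome to zero makes chromosomes without any gene explicit, so they can be listed directly instead of being inferred from missing keys.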

To better understand the gene structures of the GhTCP family, we analyzed their exon–intron organization. Overall, 88% of the GhTCPs contained only one exon (Figure S2). Seven GhTCP genes contained one intron and two exons: *GhTCP3*, *GhTCP23*, *GhTCP26*, *GhTCP56*, *GhTCP64*, and *GhTCP66*. Only *GhTCP33* in the CYC/TB1 group possessed four introns and five exons. Losses or gains of exons were identified during the evolution of the PCF group genes. *GhTCP19* comprised seven introns and eight exons, whereas *GhTCP13* consisted of four introns and five exons. Comparing their structural patterns showed the loss of an exon in the middle of the *GhTCP13* sequence. Two PCF class genes contained one intron and two exons, and the remaining PCF class genes contained only one exon. Analysis of the pattern of exon–intron junctions can provide important insight into the evolution of gene families. Our results suggested that TCP genes maintained a relatively constant exon–intron composition during the evolution of the *G. hirsutum* genome.
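Exon and intron counts of this kind are typically read from a genome annotation. The sketch below counts `exon` features per parent transcript in GFF3-formatted text and derives intron numbers as exons minus one. It assumes the annotation is available as GFF3 with `Parent` attributes on exon rows; the source does not describe the annotation format or the tool used, and the transcript identifiers here are hypothetical.

```python
from collections import Counter


def exons_per_transcript(gff3_text):
    """Count exon features per parent transcript in GFF3-formatted text."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # GFF3 rows have nine tab-separated columns
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # An exon may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                counts[parent] += 1
    return counts


# Hypothetical toy annotation for demonstration only.
toy_gff3 = "\n".join([
    "##gff-version 3",
    "\t".join(["chrToy", "demo", "exon", "1", "900", ".", "+", ".",
               "ID=e1;Parent=toy1.t"]),
    "\t".join(["chrToy", "demo", "exon", "2000", "2300", ".", "+", ".",
               "ID=e2;Parent=toy2.t"]),
    "\t".join(["chrToy", "demo", "exon", "2500", "2900", ".", "+", ".",
               "ID=e3;Parent=toy2.t"]),
])

counts = exons_per_transcript(toy_gff3)
introns = {transcript: n - 1 for transcript, n in counts.items()}
single_exon_fraction = sum(n == 1 for n in counts.values()) / len(counts)
print(dict(counts), introns, single_exon_fraction)
```

The fraction of single-exon transcripts computed this way corresponds to the kind of summary reported above for the GhTCP family.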

The conserved motifs of the TCP proteins in *G. hirsutum* were investigated using Clustal X. The sequences were found to encode putative TCP-domain proteins containing a bHLH-type motif at the N-terminus (Figure S3). The composition of the loop and of helices I and II differed considerably between class I and class II proteins. Within the TCP domain, several putative residues involved in DNA binding were located in the basic region, and several putative hydrophobic residues were located in helices I and II. In the basic region, the CIN and CYC/TB1 type proteins contained an insertion of four amino acids. The R domain, an arginine-rich motif of 18–20 residues, was absent from all class I proteins and was mainly present in CYC/TB1 group proteins.
