Article

Identification of QTNs and Their Candidate Genes for Boll Number and Boll Weight in Upland Cotton

by Xiaoshi Shi 1,†, Changhui Feng 2,†, Hongde Qin 2, Jingtian Wang 1, Qiong Zhao 1, Chunhai Jiao 3,* and Yuanming Zhang 1,*
1 College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
2 Institute of Industrial Crops, Hubei Academy of Agricultural Sciences, Wuhan 430064, China
3 Hubei Academy of Agricultural Sciences, Wuhan 430064, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Genes 2024, 15(8), 1032; https://doi.org/10.3390/genes15081032
Submission received: 2 July 2024 / Revised: 2 August 2024 / Accepted: 5 August 2024 / Published: 5 August 2024
(This article belongs to the Section Plant Genetics and Genomics)

Abstract

Genome-wide association studies (GWAS) have identified numerous significant loci for boll number (BN) and boll weight (BW), which play an essential role in cotton (Gossypium spp.) yield. The North Carolina design II (NC II) genetic mating population exhibits a greater number of genetic variations than other populations, which may facilitate the identification of additional genes. Accordingly, the 3VmrMLM method was employed for the analysis of upland cotton (Gossypium hirsutum L.) in an incomplete NC II genetic mating population across three environments. A total of 204 quantitative trait nucleotides (QTNs) were identified, of which 25 (24.75%) BN and 30 (29.13%) BW QTNs were of small effect (<1%) and 24 (23.76%) BN and 20 (19.42%) BW QTNs were rare (<10%). In the vicinity of these QTNs, two BN-related genes and two BW-related genes reported in previous studies were identified, in addition to five BN candidate genes and six BW candidate genes, which were obtained using differential expression analysis, gene function annotation, and haplotype analysis. Among these, six candidate genes were identified as homologs of Arabidopsis genes. The present study helps to address the problem of missing heritability and uncovers several new candidate genes. The findings of this study can provide a basis for further research and marker-assisted selection in upland cotton.

1. Introduction

Cotton is a major cash crop, and the yield of cotton is crucial in maintaining the stability of the global textile industry’s supply chain and fostering the growth of ancillary industries [1,2]. As living standards increase, the demand for cotton fiber across a range of sectors is rising in parallel. The cultivation of high-yielding cotton varieties remains a central objective in both cotton breeding programmes and agricultural practices [3].
The boll number (BN) and boll weight (BW) are significant components of cotton lint yield and have been demonstrated to be quantitative traits that are polygenically controlled [4]. A substantial number of genes influencing cotton yield-related traits have been identified. GhLYI-A02 and GhLYI-D08 have been demonstrated to influence lint percentage and boll number per plant [5], GhCEN has been shown to influence boll number [6], while GhADF1 has been found to influence boll number and boll weight [7]. Based on the above findings, numerous studies have employed marker-assisted breeding to refine cotton breeding programmes [8,9]. Consequently, the identification and utilization of related genes provides an effective approach to enhancing cotton yield.
Genome-wide association study (GWAS) is an effective genetic method for the identification of associations between genetic markers and quantitative traits [10,11]. It can rapidly and precisely identify quantitative trait loci (QTLs) and provide detailed information that contributes to the identification and understanding of quantitative trait nucleotides (QTNs) and candidate genes, thereby enhancing the precision of genetic studies [12,13]. GWAS has been extensively employed to identify genes associated with a range of cotton traits, including GhZF14, which affects fiber length [14], and GhRD2 and GhNAC4, which affect cold resistance [15], demonstrating the potential for improving agronomic traits. These findings have laid a crucial theoretical foundation for the development of superior cotton varieties, especially those with the potential for increased yield [16,17,18,19,20].
At present, numerous GWAS methods and corresponding software packages have been developed [21,22,23,24,25,26,27,28]. Li et al. [28] established a 3VmrMLM model for detecting and estimating all the possible effects of QTNs while controlling for all the possible polygenic backgrounds, and they developed the IIIVmrMLM software package to dissect the genetic basis of complex traits [29]. This method reduces the number of variance components in the mixed linear model and increases the power of identifying the genes with allelic substitution effect close to zero, small effects, and rare allele frequency [28,29,30], thereby enabling comprehensive analysis of the genetic basis of quantitative traits.
The North Carolina II (NC II) genetic mating design comprises two parental groups, one serving as the male parents and the other as the female parents. The two groups are crossed to produce hybrid offspring [31]. The NC II population carries more genetic variation than other populations, which may facilitate the identification of additional genes [32,33,34]. However, most of the research conducted on NC II populations employed conventional GWAS methods that are not optimal for this design, which may have resulted in missing heritability. Thus, the 3VmrMLM method was used to analyze the data in the NC II design.
In this study, we employed 3VmrMLM to associate BN and BW phenotypes in three environments with 3,480,274 single nucleotide polymorphisms (SNPs) in an incomplete NC II population of upland cotton. The objective of this study was to identify candidate genes for BN and BW to provide a foundation for further research and marker-assisted selection for cotton yield.

2. Results

2.1. Phenotypic Variation and Analysis of Variance

A statistical analysis, comprising descriptive statistics and analysis of variance, was performed on the phenotypic data of BN and BW of upland cotton using R v4.3.3 software (Table 1; Figure 1A,B). The mean BN ranged from 259.55 to 264.19 (counts), while the mean BW ranged from 4.88 g to 5.74 g. The standard deviation for BN was between 31.74 and 38.28, while that for BW was between 0.34 g and 0.41 g. The BN and BW phenotypic distributions were continuous and slightly skewed (Figure 1A), as expected for quantitative traits. The coefficients of variation ranged from 12.05% to 14.49% for BN and from 6.85% to 7.29% for BW. The coefficients of variation of best linear unbiased prediction (BLUP) values for both traits were smaller due to the removal of environmental variation. The broad-sense heritabilities for BN and BW were 69.79% and 74.55%, respectively, with mean BLUP values of 262.40 counts and 5.20 g. These results indicated that the two traits, BN and BW, exhibited high inter-individual phenotypic variability, rendering this population suitable for GWAS.
BW exhibited highly significant differences among the three environments (F = 59.83~377.80; p < 0.001), and BW in 2019 Ezhou (5.74 ± 0.41 g) was significantly higher than 2018 Wuhan (4.98 ± 0.34 g) and 2019 Wuhan (4.88 ± 0.36 g). However, the differences in three environments for BN were not significant (F = 0.83; p = 0.437), as illustrated in Figure 1B.
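As a rough check on the coefficients of variation quoted in this section, the CV is the standard deviation divided by the mean. The minimal sketch below recomputes it from the rounded BW means and standard deviations given above for the three environments. Because these inputs are rounded to two decimals, the results only approximate the published range of 6.85% to 7.29%. The function and variable names are illustrative and not taken from the study.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return the coefficient of variation (SD / mean) as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return sd / abs(mean) * 100.0


# Rounded BW summaries (mean, SD; grams) as quoted in the text.
bw_summary = {
    "2018 Wuhan": (4.98, 0.34),
    "2019 Wuhan": (4.88, 0.36),
    "2019 Ezhou": (5.74, 0.41),
}

# CV per environment, computed from the rounded inputs.
bw_cv = {env: coefficient_of_variation(m, s) for env, (m, s) in bw_summary.items()}

for env, cv in bw_cv.items():
    print(f"{env}: CV = {cv:.2f}%")
```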

2.2. Identification of QTNs for Boll Number and Boll Weight Using the Single Environment Analysis Module of 3VmrMLM

GWAS was conducted using single environment analysis module of 3VmrMLM on the genotypic data of 240 upland cotton cultivars/lines, along with their BN and BW phenotypic data in the three environments and their BLUP values. A total of 101 QTNs were identified for BN (Figure 2) and 103 for BW (Figure 3; Table 2). The additive effects of QTNs for BN ranged from −34.75 to 33.14, and the dominant effects ranged from −30.09 to 26.11, explaining 0.18% to 7.87% of the total phenotypic variance. The additive effects of QTNs for BW ranged from −0.60 to 0.35, and the dominant effects ranged from −0.46 to 0.54, explaining 0.21% to 8.48% of the total phenotypic variance.
The total variance explained by the QTLs in three environments were 638.13, 503.94, and 137.73 for BN, and 0.038, 0.054, and 0.097 for BW, respectively, which were 69.79% (BN) and 74.55% (BW) to the sum of QTN size (Table 3). These results indicate that a significant proportion of trait heritabilities (48.12% for BN and 59.64% for BW on average) is explained by the GWAS results.
Three QTNs were identified repeatedly in multiple datasets. Marker639503 was identified in both 2019 Wuhan and BLUP values for BN, while marker1655415 and marker2032522 were detected in both 2019 Ezhou and BLUP values for BW.

2.3. Candidate Gene Prediction for Boll Number and Boll Weight

Candidate genes were predicted by differential expression analysis, Gene Ontology (GO) annotation, and haplotype analysis. The cotton ovary contains numerous ovules, with the elongated epidermal cells of each ovule differentiating to form cotton fibers. The development of the ovule affects the development of the cotton boll, which in turn has an important effect on BN and BW [35]. For the remaining QTNs without related genes reported in previous studies, their nearby differentially expressed genes (DEGs) in ovules were analyzed. A total of 79, 110, 50, and 84 DEGs were identified around significant and suggested QTNs for BN in the 2018 Wuhan, 2019 Wuhan, 2019 Ezhou, and BLUP datasets, respectively, while 59, 36, 148, and 142 DEGs were identified around significant and suggested QTNs for BW in the same four datasets, respectively. Subsequent GO annotation of these DEGs yielded 37 and 48 potential candidate genes for BN and BW, respectively.
The 2 kb sequence upstream of the transcription start site is recognised as the promoter region, and SNPs within this region and potential candidate genes have been extracted [36]. Haplotype analysis revealed the significant associations between five candidate genes and BN (Figure 2) and between six candidate genes and BW (Figure 3; Table 3).
Table 3. Candidate genes in proximity to QTNs for boll number (BN) and boll weight (BW) in upland cotton. I: 2018 Wuhan; II: 2019 Wuhan; III: 2019 Ezhou.

| Trait | Chr | Posi (bp) | LOD score (I, II, III, or BLUP) | r2 (%) | Gene_ID | log2 (Fold Change) | p-Value (differential expression) | p-Value (haplotype analysis) | GO_ID | GO_Name | E-Value | Reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BN | A10 | 108368842 | 15.92 | 4.22 | GH_A10G2258 | 3.77 | 7.75 × 10^−13 | 5.53 × 10^−3 | GO:0009739 | response to gibberellin | <1.00 × 10^−300 | [37] |
| BN | A11 | 121669424 | 9.27 | 1.18 | GH_A11G3751 | 4.48 | 3.07 × 10^−4 | 1.27 × 10^−4 | GO:0009738 | abscisic acid-activated signaling pathway | <1.00 × 10^−300 | [38] |
| BN | D01 | 15378670 | 13.94 | 1.52 | GH_D01G1053 | −2.08 | 1.73 × 10^−12 | 1.38 × 10^−2 | GO:0009409 | response to cold | <1.00 × 10^−300 | [39] |
| BN | D04 | 51658044 | 7.01 | 1.75 | GH_D04G1642 | −2.83 | 1.26 × 10^−10 | 4.73 × 10^−2 | GO:0048481 | plant ovule development | 6.22 × 10^−150 | |
| BN | D08 | 5543570 | 14.97 | 1.28 | GH_D08G0523 | 4.19 | 0 | 2.50 × 10^−3 | GO:0071215 | cellular response to abscisic acid stimulus | <1.00 × 10^−300 | [38] |
| BW | A03 | 110026226 | 18.97 | 3.20 | GH_A03G2175 | −2.40 | 2.78 × 10^−3 | 1.10 × 10^−2 | GO:0009737 | response to abscisic acid | <1.00 × 10^−300 | [38] |
| BW | A04 | 81070268 | 13.91 | 2.37 | GH_A04G1254 | −3.70 | 3.33 × 10^−10 | 1.44 × 10^−3 | GO:0009409 | response to cold | 3.98 × 10^−169 | [39] |
| BW | A07 | 12833357 | 14.36 | 1.98 | GH_A07G0935 | −2.47 | 8.52 × 10^−7 | 5.03 × 10^−4 | GO:0009651 | response to salt stress | 5.38 × 10^−219 | [40] |
| BW | A08 | 2669815 | 27.56 | 8.48 | GH_A08G0292 | 2.12 | 1.55 × 10^−2 | 2.23 × 10^−4 | GO:0009738 | abscisic acid-activated signaling pathway | 1.31 × 10^−143 | [38] |
| BW | D08 | 51949701 | 149.89 | 1.81 | GH_D08G1604 | 2.56 | 5.12 × 10^−3 | 2.79 × 10^−2 | GO:0009409 | response to cold | 4.27 × 10^−273 | [39] |
| BW | D11 | 2810467 | 15.53 | 2.01 | GH_D11G0356 | −3.67 | 2.15 × 10^−6 | 2.72 × 10^−3 | GO:0009409 | response to cold | 8.41 × 10^−278 | [39] |

2.4. Alignment of Candidate Genes and Arabidopsis Homologous Sequences

The BLAST tool of TAIR (https://www.arabidopsis.org, accessed on 19 May 2024) was employed to identify the Arabidopsis thaliana homologs of the candidate genes. A comparison with these Arabidopsis homologs revealed three key candidate genes for BN and three for BW (Table 4).

3. Discussion

3.1. QTNs Detection in This Study Recovered Some Heritability

In this study, 3VmrMLM was applied to analyze the BN and BW of cotton incomplete NC II data. The total genetic variances explained by all the QTLs for BN and BW were 33.58% and 44.46%, respectively, on average in the three environments, accounting for 48.12% BN and 59.64% BW heritabilities, respectively (Table 1 and Table 2). As we know, quantitative traits are controlled by genes and gene-by-gene interactions and modified by environments and gene-by-environment interactions, where these genes include some major genes and many polygenes each with a relatively small effect. Based on our study in Wang et al. [30], it is difficult for conventional GWAS methods to identify small allele substitution effects, dominant effects, and rare loci. In this study, only the QTNs with additive and dominant effects were identified. Although the loss of heritability of complex traits is common in GWAS [47], the above proportions (48.12% and 59.64%) are relatively high, indicating that this study recovered some heritability. More importantly, a total of four trait-related genes and six candidate genes around 204 QTNs were identified by the new method (Figure 2 and Figure 3), offering novel insights for cotton breeding and GWAS in incomplete NC II populations.
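The relationship between the figures quoted above can be checked with simple arithmetic: the share of heritability recovered is the variance explained by the detected loci divided by the broad-sense heritability. The short sketch below reproduces the two reported proportions from the percentages given in the text. The function name is illustrative.

```python
def share_of_heritability(explained_percent: float, heritability_percent: float) -> float:
    """Percentage of the heritability accounted for by the detected loci."""
    if heritability_percent <= 0:
        raise ValueError("heritability must be positive")
    return explained_percent / heritability_percent * 100.0


# (variance explained by all detected loci, broad-sense heritability), both in %,
# as reported in the text.
traits = {"BN": (33.58, 69.79), "BW": (44.46, 74.55)}

shares = {trait: round(share_of_heritability(*values), 2) for trait, values in traits.items()}
print(shares)  # {'BN': 48.12, 'BW': 59.64}
```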

3.2. Related Genes around QTNs for Boll Number and Boll Weight

Previous studies have reported some genes associated with BW and BN. Trait-related genes were identified within a 500 kb region upstream and downstream of the QTNs [36]. Flower bud differentiation represents the developmental basis of cotton bud appearance, flowering, and boll setting and is an important factor affecting cotton yield [48]. Two BN-related and two BW-related genes reported in previous studies were identified for BN and BW, respectively (Table 5; Figure 2 and Figure 3). GhMADS37 is highly expressed in apical buds and flowers. GhMADS27 is highly expressed in flowers [49]. Overexpression of GhMADS22 delays senescence and abscission of floral organs and is responsive to abscisic acid [50]. In addition, the inhibition of GhGlu19 expression significantly increases BW, BN, as well as lint percentage, and also enhances seed vigor, resulting in a significant increase in cotton yield [51].

3.3. The Interaction of Boll Number and Boll Weight with Environment in Upland Cotton

The phenotypic data for BN exhibited no significant divergence across the three environments, whereas those for BW did (Figure 1B), suggesting that BW may be more susceptible to environmental influences than BN. A multi-environmental joint analysis of the phenotypic data for BW in the three environments was therefore conducted using 3VmrMLM (Table S2). However, only one potential candidate gene-environment interaction was identified, namely GH_D09G2342. The growth of cotton bolls is significantly influenced by ambient temperature [39], and GH_D09G2342 plays a role in regulating the response of upland cotton to heat, which may be involved in interactions between upland cotton and the environment.
In this study, several candidate genes for BW have also been identified as being associated with environmental responses. GH_A03G2175 regulates abscisic acid response, GH_A07G0935 regulates response to salt stress, and GH_A04G1254, GH_D08G1604, and GH_D11G0356 regulate response to cold. The aforementioned results indicate that these genes are likely to play a role in the interaction between BW and the environment.

3.4. New Candidate Genes for Boll Number and Boll Weight in Upland Cotton

Some candidate genes are considered to be key, such as GH_D04G1642, which is involved in ovule development. Its homologue in Arabidopsis thaliana, TT16, regulates ovule development, and tt16 mutants exhibit reduced endosperm development following fertilisation [43]. The potential application of such a gene in marker-assisted selection could facilitate the development of cotton varieties with enhanced yield stability under diverse environmental conditions.
Furthermore, some candidate genes may regulate BN and BW through phytohormones. GH_A11G3751 is involved in the abscisic acid signalling pathway, and its Arabidopsis homologue, CPK11, positively regulates calcium-mediated abscisic acid signalling. Seedlings overexpressing CPK11 exhibit a significant increase in sensitivity to abscisic acid [41], a trait that could be harnessed to develop cotton varieties with enhanced resilience to stress. GH_A08G0292 is involved in the abscisic acid signalling pathway, and its homologue, LEC1, is crucial for development of Arabidopsis embryo. Together with ABI3, LEC2, and FUS3, it constitutes the LAFL regulatory network, which controls key processes in seed development and maturation, with ABI3 playing a supporting role in the abscisic acid response [44]. Regulation of these genes could lead to enhanced cotton yield and quality.
In addition to the aforementioned genes, other candidate genes are involved in the response to cold stimuli, indicating that multiple genes may influence cotton boll development through the response to temperature. GH_D01G1053 may be a valuable marker for the selection of cotton with enhanced cold tolerance, given that its homologue, CAT3, is involved in the regulation of the biological clock in Arabidopsis and is sensitive to changes in temperature and light conditions, with expression peaking especially in the early evening hours. In contrast, flowering time is significantly prolonged in the cat3 mutant [42]. The homologue of GH_D08G1604, LCBK2, is involved in PHS-P formation in Arabidopsis. The lcbk2 mutant exhibits a blocked PHS-P formation under cold stimulation, indicating an important role for LCBK2 in cold signalling. GH_D08G1604 could be a key gene for enhancing cold tolerance in upland cotton through marker-assisted selection [45]. Furthermore, GH_D11G0356, which has an Arabidopsis homologue, MDH, that displays increased expression under cold conditions, may also serve as a potential target for enhancing cold resistance in upland cotton [46].
The above genes may account for the sensitivity of upland cotton boll development to environmental changes. Future studies should aim to validate the role of these candidate genes in the environmental response of BW and explore their specific mechanisms of action under various environmental conditions. Furthermore, studies should be expanded to include yield-related traits, as well as other traits, and their interactions with environmental factors.

3.5. Comparison of GWAS Results with and without PCA

Incorporating the principal component analysis (PCA) result as the population structure identified 19, 24, 19, and 20 QTNs for BN and 19, 20, 20, and 20 QTNs for BW (Table S4). Trait-related genes were identified only in the 2019 Ezhou dataset, with two BN-related and two BW-related genes (Table S5), indicating that including PCA as the population structure had a negative impact on the accuracy of the GWAS results. Therefore, PCA was not considered in this study.

4. Materials and Methods

4.1. Plant Material and Phenotypic Data

In this study, a collection of 60 upland cotton cultivars, sourced from diverse geographical regions, was selected to serve as parental lines (Table S6). Specifically, the first 30 cultivars were designated as male parents, while cultivars numbered 31 to 60 were used as female parents. In Figure 4, each X corresponds to an F1 hybrid, and its row and column give the numbers of its paternal and maternal parents, respectively. These hybrid combinations were designed in Hainan during 2017. The 60 parents, along with their 180 F1 hybrid progeny, which collectively represent 240 upland cotton cultivars/lines, were subsequently planted.
To evaluate the performance of upland cotton under varied conditions, a total of 240 upland cotton cultivars/lines were cultivated in three distinct experimental settings. The seedling raising of these cultivars/lines commenced on 17 April 2018 and 2019 at the Wuhan Experimental Base of the Institute of Economic Crops, Hubei Academy of Agricultural Sciences, followed by transplanting on 11 May. Furthermore, in 2019, a parallel planting was conducted at the Ezhou Experimental Base of the same institute, utilising direct seeding on 25 April. In all environments, a consistent randomised block design was implemented, featuring single-row plots, each with an area of 3.34 m² and row spacing of 0.76 m, and triplicate repetitions for each cultivar/line.
In both 2018 and 2019, a comprehensive survey was conducted during the period around 20 September to enumerate the total boll count in each plot. From the central portion of each row, a sample of 50 bolls that had fluffed normally was collected for further analysis. Subsequently, the average weight of this boll sample was calculated.

4.2. Statistical Analysis and Analysis of Variance of Phenotypic Data

Descriptive statistical analysis and variance analysis were conducted to evaluate the performance of upland cotton phenotypic data across different environments. The statistical measures, including maximum, minimum, mean, standard deviation (SD), coefficient of variation (CV), kurtosis, and skewness, for BN and BW in the 2018 Wuhan, 2019 Wuhan, and 2019 Ezhou datasets were calculated using the R package psych v2.4.3.
The R package lme4 v1.1-35.2 was employed to calculate the BLUP values of each individual and the heritability of each trait. The BLUP values were calculated using the model $y_{ijk} = \mu + G_k + E_i + R_{ji} + GE_{ik} + \epsilon_{ijk}$, where $y_{ijk}$ is the observed value of the jth repetition of the kth cultivar/line in the ith environment, $\mu$ is the overall mean, $G_k$ is the fixed effect of the kth cultivar/line, $E_i$ is the random effect of the ith environment, $R_{ji}$ is the random effect of the jth repetition nested in the ith environment, $GE_{ik}$ is the random effect of the interaction between the ith environment and kth cultivar/line, and $\epsilon_{ijk}$ is the residual error. The heritability was calculated using the following formula: $h_B^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2 / l} \times 100\%$, where $\sigma_g^2$ represents the genetic variance, $\sigma_e^2$ represents the error variance, and $l$ represents the number of environments.
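To make the heritability formula concrete, the sketch below evaluates it for given variance components: the genetic variance is divided by the genetic variance plus the error variance averaged over the number of environments. The study itself obtained the variance components with lme4 in R. The function name and the numeric inputs here are hypothetical and serve only to illustrate the arithmetic.

```python
def broad_sense_heritability(var_g: float, var_e: float, n_env: int) -> float:
    """Broad-sense heritability in percent: var_g / (var_g + var_e / n_env) * 100."""
    if var_g < 0 or var_e < 0:
        raise ValueError("variance components must be non-negative")
    if n_env <= 0:
        raise ValueError("number of environments must be positive")
    denominator = var_g + var_e / n_env
    if denominator == 0:
        raise ValueError("total variance is zero; heritability is undefined")
    return var_g / denominator * 100.0


# Hypothetical variance components (not values from the study), three environments.
h2 = broad_sense_heritability(var_g=2.0, var_e=3.0, n_env=3)
print(f"h2 = {h2:.2f}%")
```

With more environments the error term shrinks, so the same variance components yield a higher heritability estimate.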
The R function aov was employed to identify significant differences in BN and BW over the different environments. Subsequently, the R package ggplot2 v3.5.0 was employed to generate box plots visualizing these differences.

4.3. Genotype Data

Genomic DNA was extracted from young fresh leaves of the 60 upland cotton parents using the CTAB method, as previously described [52]. A paired-end (PE) library was then constructed for each sample according to the Illumina Library Construction Protocol. This involved randomly shearing the genomic DNA into fragments of 300–500 bp, repairing the ends of these fragments, and ligating adapters to create the sequencing junctions. DNA clusters were then prepared on sequencing chips, and the libraries were sequenced using the Illumina HiSeq 4000 platform with a PE150 sequencing strategy, yielding genomic data for each cultivar.
The raw sequencing data were base-called to generate raw reads, which were then subjected to quality control using the fastp v0.23.4 software [53] to produce clean reads suitable for further analysis. The clean reads were aligned to the reference genome using the BWA v0.7.17 software [54], and poor-quality reads were discarded. The resulting SAM files were sorted, and PCR duplicates were identified and removed using the Picard tool. The sorted BAM files were then indexed to facilitate subsequent analysis. Read counts for each sample were determined using the SAMtools v1.11 software [55]. Population-wide SNP and indel detection and genotyping were performed using the GATK v4.1.3.0 software [56].
This process yielded 3,480,274 SNP markers for each upland cotton cultivar. The genotypes of the 180 F1 hybrids were inferred based on the genotypes of the parental cultivars. The sequencing of the upland cotton cultivars was completed by Wuhan Gooalgene Technology Co. (Wuhan, China).
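The text does not spell out how the F1 genotypes were inferred from the parents. One simple way to picture it, assuming biallelic SNPs and homozygous parental lines (an assumption made here for illustration, not a statement of the authors' procedure), is that each F1 hybrid receives one allele from each parent. The sketch below follows that assumption and returns no call when a parent is heterozygous or missing. All names are hypothetical.

```python
from typing import Optional


def infer_f1_genotype(father: str, mother: str) -> Optional[str]:
    """Infer the F1 genotype at one SNP from two parental genotypes.

    Genotypes are two-letter strings such as "AA" or "GG". The inference is
    only unambiguous when both parents are homozygous; otherwise None is
    returned (assumption: no attempt is made to resolve heterozygous parents).
    """
    for genotype in (father, mother):
        if len(genotype) != 2 or genotype[0] != genotype[1] or genotype[0] not in "ACGT":
            return None
    # One allele from each parent; sort so that "GA" and "AG" are written the same way.
    return "".join(sorted(father[0] + mother[0]))


print(infer_f1_genotype("AA", "GG"))  # AG
print(infer_f1_genotype("TT", "TT"))  # TT
print(infer_f1_genotype("AG", "GG"))  # None
```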

4.4. Multi-Locus GWAS

The R package IIIVmrMLM [29] was employed to conduct GWAS on 3,480,274 SNP markers and phenotypic values of both BN and BW, along with their BLUP values across 3 environments, in 240 upland cotton cultivars/lines. All parameters were set to their default values. The K-matrix was calculated using the IIIVmrMLM v1.0 software; the population structure was not included in the mixed model. Significant and suggested QTNs were determined by p ≤ 0.05/m and LOD ≥ 3.0, respectively, where m is the number of markers [28].
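The two reporting criteria described above can be written out as a small classification rule: a QTN counts as significant when its p-value is at or below 0.05/m, and as suggested when its LOD score is at least 3.0. The sketch below is an illustration of that rule only and does not call the IIIVmrMLM package. The function names and the example p-values and LOD scores are hypothetical.

```python
import math

N_MARKERS = 3_480_274  # number of SNP markers reported in the text


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-marker p-value threshold alpha / m."""
    if n_tests <= 0:
        raise ValueError("number of tests must be positive")
    return alpha / n_tests


def classify_qtn(p_value: float, lod: float, n_markers: int = N_MARKERS,
                 alpha: float = 0.05, lod_cutoff: float = 3.0) -> str:
    """Label a locus as 'significant', 'suggested', or 'not reported'."""
    if p_value <= bonferroni_threshold(alpha, n_markers):
        return "significant"
    if lod >= lod_cutoff:
        return "suggested"
    return "not reported"


threshold = bonferroni_threshold(0.05, N_MARKERS)
print(f"p threshold = {threshold:.3e}, -log10 = {-math.log10(threshold):.2f}")

# Hypothetical loci, for illustration only.
print(classify_qtn(p_value=1e-9, lod=10.0))  # significant
print(classify_qtn(p_value=1e-5, lod=4.2))   # suggested
print(classify_qtn(p_value=1e-2, lod=1.0))   # not reported
```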

4.5. Identification of Related Genes for Boll Number and Boll Weight

The genes related to the BN and BW have been summarised through gene annotations available on CottonFGD (https://cottonfgd.org/, accessed on 22 April 2024) and CottonGVD (https://db.cngb.org/cottonGVD, accessed on 22 April 2024), as well as by reviewing relevant literature. Genes around QTNs were mined within a 500 kb region upstream and downstream [36], using the reference genome of upland cotton from the GCGI (http://cotton.hzau.edu.cn, accessed on 22 April 2024).
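The 500 kb search described above amounts to an interval-overlap query around each QTN. The sketch below shows one way to express it, assuming 1-based gene coordinates and treating a gene as a hit when any part of it falls inside the window. The QTN position is taken from Table 3, but the gene records and all names are hypothetical placeholders rather than annotations from the reference genome.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


WINDOW_BP = 500_000  # search distance on each side of the QTN


def genes_near_qtn(qtn_chrom: str, qtn_pos: int, genes: List[Gene],
                   window: int = WINDOW_BP) -> List[Gene]:
    """Return genes on the same chromosome that overlap [pos - window, pos + window]."""
    lower = max(1, qtn_pos - window)
    upper = qtn_pos + window
    return [g for g in genes
            if g.chrom == qtn_chrom and g.start <= upper and g.end >= lower]


# Hypothetical gene annotations, for illustration only.
annotations = [
    Gene("gene_a", "D04", 51_200_000, 51_203_000),  # about 455 kb upstream: inside
    Gene("gene_b", "D04", 52_300_000, 52_302_000),  # about 642 kb downstream: outside
    Gene("gene_c", "A10", 51_600_000, 51_601_000),  # different chromosome
]

hits = genes_near_qtn("D04", 51_658_044, annotations)
print([g.gene_id for g in hits])  # ['gene_a']
```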

4.6. Candidate Genes Prediction for Boll Number and Boll Weight

The ovule transcriptome data between upland cotton TM-1 and sea island cotton Hai7124 (accession number: GSE119184) [57] were used to perform differential expression analysis using the R package DEGseq v1.48 [58]. The threshold for significance was set at p < 0.05 and $|\log_2(\text{Fold Change})| > 2$ [59]. The DEGs from ovules at 10 DPA and 20 DPA were pooled for subsequent analyses. The upland cotton genomic data in CottonFGD (https://cottonfgd.net/about/download.html, accessed on 14 April 2024) were used as the reference genome.
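The DEG criterion above can be sketched as a simple filter. The sketch assumes the fold-change cutoff applies to the absolute value of log2 fold change, which is consistent with the negative values listed for candidate genes in Table 3. It is not a substitute for DEGseq, which computes the statistics themselves. The first two records reuse values from Table 3, and the third is a hypothetical gene added to show a rejected case.

```python
from typing import List, Tuple


def is_deg(log2_fold_change: float, p_value: float,
           p_cutoff: float = 0.05, lfc_cutoff: float = 2.0) -> bool:
    """True when p < p_cutoff and |log2 fold change| > lfc_cutoff."""
    return p_value < p_cutoff and abs(log2_fold_change) > lfc_cutoff


# (gene ID, log2 fold change, p-value)
records: List[Tuple[str, float, float]] = [
    ("GH_D01G1053", -2.08, 1.73e-12),       # from Table 3
    ("GH_A08G0292", 2.12, 1.55e-2),         # from Table 3
    ("hypothetical_gene", 1.50, 1.0e-6),    # illustrative: fold change too small
]

degs = [gene_id for gene_id, lfc, p in records if is_deg(lfc, p)]
print(degs)  # ['GH_D01G1053', 'GH_A08G0292']
```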
The protein sequence information of the upland cotton TM-1 genome was downloaded from IBI (http://ibi.zju.edu.cn/, accessed on 14 May 2024). The protein sequences of the differentially expressed genes in the ovules from the previous step were extracted using TBtools [60] and then submitted to AgBase (https://agbase.arizona.edu, accessed on 14 May 2024) [61] for GO annotation, with the E-value threshold set to $10^{-50}$, to identify genes whose biological processes are associated with bud differentiation or flowering and which were therefore retained as potential candidate genes.
Sequences 2 kb upstream of the transcription start site were considered as promoter regions, and SNPs were extracted within potential candidate genes and their promoter regions [36]. Significant SNP information within the above potential candidate genes and their 2 kb upstream regions was extracted using R v4.3.3. ANOVA was performed using the R function aov, and the significance level was set at 0.05.
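A point worth making explicit is that "2 kb upstream" depends on gene orientation: for a gene on the forward strand the promoter lies at smaller coordinates than the transcription start site, and for a reverse-strand gene it lies at larger coordinates. The sketch below illustrates this under the assumption of 1-based inclusive coordinates. The strand handling is an assumption made here for illustration, since the text does not describe how orientation was treated, and all names and positions are hypothetical.

```python
from typing import List, Tuple


def promoter_interval(tss: int, strand: str, upstream: int = 2000) -> Tuple[int, int]:
    """Return the (start, end) of the promoter, 1-based inclusive, excluding the TSS itself."""
    if strand == "+":
        return max(1, tss - upstream), tss - 1
    if strand == "-":
        return tss + 1, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def snps_in_promoter(snp_positions: List[int], tss: int, strand: str,
                     upstream: int = 2000) -> List[int]:
    """Keep the SNP positions that fall inside the promoter interval."""
    start, end = promoter_interval(tss, strand, upstream)
    return [pos for pos in snp_positions if start <= pos <= end]


# Hypothetical SNP positions around a hypothetical TSS at position 10,000.
snps = [7_999, 8_000, 9_500, 10_000, 10_500]
print(snps_in_promoter(snps, tss=10_000, strand="+"))  # [8000, 9500]
print(snps_in_promoter(snps, tss=10_000, strand="-"))  # [10500]
```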

4.7. Homologous Genes in Arabidopsis thaliana

The candidate gene sequences of upland cotton were extracted from CottonFGD (https://cottonfgd.net/about/download.html, accessed on 14 May 2024) and then entered into TAIR (https://www.arabidopsis.org) for comparison with their homologous genes in Arabidopsis thaliana.

5. Conclusions

3VmrMLM was employed to associate the BN and BW phenotypes in three environments and their BLUP values with 3,480,274 SNPs in 240 upland cotton cultivars/lines. A total of 204 QTNs explaining 0.18% to 8.48% of the phenotypic variance were identified to be associated with BN and BW in upland cotton. Of the identified QTNs, 25 (24.75%) BN and 30 (29.13%) BW QTNs were small (<1%), while 24 (23.76%) BN and 20 (19.42%) BW QTNs were rare (<10%). A total of four trait-related genes, GhMADS37, GhMADS27, GhMADS22, and GhGlu19, were identified in the vicinity of the QTNs. Six key candidate genes were identified: GH_A11G3751, GH_D01G1053, GH_D04G1642, GH_A08G0292, GH_D08G1604, and GH_D11G0356. This study addresses the challenge of missing heritability and expands the genetic repertoire available for marker-assisted selection. The newly identified candidate genes provide a solid foundation for advancing research and for the improvement of cotton yield through marker-assisted breeding strategies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes15081032/s1. Tables S1 and S2: Detailed information of QTNs and QEIs for BN and BW in upland cotton GWAS; Table S3: PCA of 240 upland cotton cultivars/lines; Tables S4 and S5: QTNs and trait-related genes for BN and BW in upland cotton GWAS with PCA; Table S6: Names and origins of 60 parents.

Author Contributions

C.J., H.Q. and Y.Z. conceived and managed the research and revised the manuscript. X.S., C.F., J.W. and Q.Z. analyzed datasets and wrote the draft. X.S. drew all the figures and tables. C.F., H.Q. and C.J. measured the phenotypes of these traits and the genotypes of molecular markers. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 32070557 and No. 32270673).

Institutional Review Board Statement

The study did not involve humans or animals.

Informed Consent Statement

This study did not involve human studies.

Data Availability Statement

The data used in this study and the original analysis results are available from the Online Tables or from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Khan, M.A.; Wahid, A.; Ahmad, M.; Tahir, M.T.; Ahmed, M.; Ahmad, S.; Hasanuzzaman, M. World Cotton Production and Consumption: An Overview. In Cotton Production and Uses: Agronomy, Crop Protection, and Postharvest Technologies; Ahmad, S., Hasanuzzaman, M., Eds.; Springer: Singapore, 2020; pp. 1–7. ISBN 9789811514722.
  2. Han, T.; Liu, G.; Zhang, L. The Global Cotton Trade Network Reveals a Shift in the Cotton Import Center to the Global South from 1986 to 2020. J. Rural Stud. 2024, 108, 103262.
  3. Zhang, T.; Xuan, L.; Mao, Y.; Hu, Y. Cotton Heterosis and Hybrid Cultivar Development. Theor. Appl. Genet. 2023, 136, 89.
  4. Yang, X.; Wang, Y.; Zhang, G.; Wang, X.; Wu, L.; Ke, H.; Liu, H.; Ma, Z. Detection and Validation of One Stable Fiber Strength QTL on C9 in Tetraploid Cotton. Mol. Genet. Genomics 2016, 291, 1625–1638.
  5. Fang, L.; Wang, Q.; Hu, Y.; Jia, Y.; Chen, J.; Liu, B.; Zhang, Z.; Guan, X.; Chen, S.; Zhou, B.; et al. Genomic Analyses in Cotton Identify Signatures of Selection and Loci Associated with Fiber Quality and Yield Traits. Nat. Genet. 2017, 49, 1089–1098.
  6. Liu, D.; Teng, Z.; Kong, J.; Liu, X.; Wang, W.; Zhang, X.; Zhai, T.; Deng, X.; Wang, J.; Zeng, J.; et al. Natural Variation in a CENTRORADIALIS Homolog Contributed to Cluster Fruiting and Early Maturity in Cotton. BMC Plant Biol. 2018, 18, 286.
  7. Qin, L.; Zhang, H.; Li, J.; Zhu, Y.; Jiao, G.; Wang, C.; Wu, S. Down-Regulation of GhADF1 in Cotton (Gossypium hirsutum) Improves Plant Drought Tolerance and Increases Fiber Yield. Crop J. 2022, 10, 1037–1048.
  8. Hulse-Kemp, A.M.; Lemm, J.; Plieske, J.; Ashrafi, H.; Buyyarapu, R.; Fang, D.D.; Frelichowski, J.; Giband, M.; Hague, S.; Hinze, L.L.; et al. Development of a 63K SNP Array for Cotton and High-Density Mapping of Intraspecific and Interspecific Populations of Gossypium spp. G3 Genes|Genomes|Genet. 2015, 5, 1187–1209.
  9. Gapare, W.; Liu, S.; Conaty, W.; Zhu, Q.-H.; Gillespie, V.; Llewellyn, D.; Stiller, W.; Wilson, I. Historical Datasets Support Genomic Selection Models for the Prediction of Cotton Fiber Quality Phenotypes Across Multiple Environments. G3 Genes|Genomes|Genet. 2018, 8, 1721–1732.
  10. Risch, N.; Merikangas, K. The Future of Genetic Studies of Complex Human Diseases. Science 1996, 273, 1516–1517.
  11. Klein, R.J.; Zeiss, C.; Chew, E.Y.; Tsai, J.-Y.; Sackler, R.S.; Haynes, C.; Henning, A.K.; SanGiovanni, J.P.; Mane, S.M.; Mayne, S.T.; et al. Complement Factor H Polymorphism in Age-Related Macular Degeneration. Science 2005, 308, 385–389.
  12. Gupta, P.K.; Rustgi, S.; Kulwal, P.L. Linkage Disequilibrium and Association Studies in Higher Plants: Present Status and Future Prospects. Plant Mol. Biol. 2005, 57, 461–485.
  13. Khan, S.U.; Saeed, S.; Khan, M.H.U.; Fan, C.; Ahmar, S.; Arriagada, O.; Shahzad, R.; Branca, F.; Mora-Poblete, F. Advances and Challenges for QTL Analysis and GWAS in the Plant-Breeding of High-Yielding: A Focus on Rapeseed. Biomolecules 2021, 11, 1516.
  14. Wang, M.; Qi, Z.; Thyssen, G.N.; Naoumkina, M.; Jenkins, J.N.; McCarty, J.C.; Xiao, Y.; Li, J.; Zhang, X.; Fang, D.D. Genomic Interrogation of a MAGIC Population Highlights Genetic Factors Controlling Fiber Quality Traits in Cotton. Commun. Biol. 2022, 5, 60.
  15. Li, B.; Chen, L.; Sun, W.; Wu, D.; Wang, M.; Yu, Y.; Chen, G.; Yang, W.; Lin, Z.; Zhang, X.; et al. Phenomics-Based GWAS Analysis Reveals the Genetic Architecture for Drought Resistance in Cotton. Plant Biotechnol. J. 2020, 18, 2533–2544.
  16. Zhang, T.; Qian, N.; Zhu, X.; Chen, H.; Wang, S.; Mei, H.; Zhang, Y. Variations and Transmission of QTL Alleles for Yield and Fiber Qualities in Upland Cotton Cultivars Developed in China. PLoS ONE 2013, 8, e57220.
  17. Wang, P.; He, S.; Sun, G.; Pan, Z.; Sun, J.; Geng, X.; Peng, Z.; Gong, W.; Wang, L.; Pang, B.; et al. Favorable Pleiotropic Loci for Fiber Yield and Quality in Upland Cotton (Gossypium hirsutum). Sci. Rep. 2021, 11, 15935.
  18. Zhu, G.; Hou, S.; Song, X.; Wang, X.; Wang, W.; Chen, Q.; Guo, W. Genome-Wide Association Analysis Reveals Quantitative Trait Loci and Candidate Genes Involved in Yield Components under Multiple Field Environments in Cotton (Gossypium hirsutum). BMC Plant Biol. 2021, 21, 250.
  19. Sun, F.; Yang, Y.; Wang, P.; Ma, J.; Du, X. Quantitative Trait Loci and Candidate Genes for Yield-Related Traits of Upland Cotton Revealed by Genome-Wide Association Analysis under Drought Conditions. BMC Genom. 2023, 24, 531.
  20. Shui, G.; Lin, H.; Ma, X.; Zhu, B.; Han, P.; Aini, N.; Gou, C.; Wu, Y.; Pan, Z.; You, C.; et al. Identification of SSR Markers Linked to the Abscission of Cotton Boll Traits and Mining Germplasm in Cotton. J. Cotton Res. 2024, 7, 20.
  21. Zhang, Z.; Ersoz, E.; Lai, C.-Q.; Todhunter, R.J.; Tiwari, H.K.; Gore, M.A.; Bradbury, P.J.; Yu, J.; Arnett, D.K.; Ordovas, J.M.; et al. Mixed Linear Model Approach Adapted for Genome-Wide Association Studies. Nat. Genet. 2010, 42, 355–360.
  22. Kang, H.M.; Sul, J.H.; Service, S.K.; Zaitlen, N.A.; Kong, S.-Y.; Freimer, N.B.; Sabatti, C.; Eskin, E. Variance Component Model to Account for Sample Structure in Genome-Wide Association Studies. Nat. Genet. 2010, 42, 348–354.
  23. Zhou, X.; Stephens, M. Genome-Wide Efficient Mixed-Model Analysis for Association Studies. Nat. Genet. 2012, 44, 821–824.
  24. Liu, X.; Huang, M.; Fan, B.; Buckler, E.S.; Zhang, Z. Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. PLoS Genet. 2016, 12, e1005767.
  25. Wang, S.-B.; Feng, J.-Y.; Ren, W.-L.; Huang, B.; Zhou, L.; Wen, Y.-J.; Zhang, J.; Dunwell, J.M.; Xu, S.; Zhang, Y.-M. Improving Power and Accuracy of Genome-Wide Association Studies via a Multi-Locus Mixed Linear Model Methodology. Sci. Rep. 2016, 6, 19444.
  26. Wen, Y.-J.; Zhang, H.; Ni, Y.-L.; Huang, B.; Zhang, J.; Feng, J.-Y.; Wang, S.-B.; Dunwell, J.M.; Zhang, Y.-M.; Wu, R. Methodological Implementation of Mixed Linear Models in Multi-Locus Genome-Wide Association Studies. Brief. Bioinform. 2018, 19, 700–712.
  27. Jiang, L.; Zheng, Z.; Qi, T.; Kemper, K.E.; Wray, N.R.; Visscher, P.M.; Yang, J. A Resource-Efficient Tool for Mixed Model Association Analysis of Large-Scale Data. Nat. Genet. 2019, 51, 1749–1755.
  28. Li, M.; Zhang, Y.; Zhang, Z.; Xiang, Y.; Liu, M.; Zhou, Y.; Zuo, J.; Zhang, H.; Chen, Y.; Zhang, Y. A Compressed Variance Component Mixed Model for Detecting QTNs and QTN-by-Environment and QTN-by-QTN Interactions in Genome-Wide Association Studies. Mol. Plant 2022, 15, 630–650.
  29. Li, M.; Zhang, Y.; Xiang, Y.; Liu, M.; Zhang, Y. IIIVmrMLM: The R and C++ Tools Associated with 3VmrMLM, a Comprehensive GWAS Method for Dissecting Quantitative Traits. Mol. Plant 2022, 15, 1251–1253.
  30. Wang, J.T.; Chang, X.Y.; Zhao, Q.; Zhang, Y.M. FastBiCmrMLM: A Fast and Powerful Compressed Variance Component Mixed Logistic Model for Big Genomic Case-Control Genome-Wide Association Study. Brief. Bioinform. 2024, 25, bbae290.
  31. Comstock, R.E.; Robinson, H.F. The Components of Genetic Variance in Populations of Biparental Progenies and Their Use in Estimating the Average Degree of Dominance. Biometrics 1948, 4, 254–266.
  32. Li, L.; Sun, C.; Chen, Y.; Dai, Z.; Qu, Z.; Zheng, X.; Yu, S.; Mou, T.; Xu, C.; Hu, Z. QTL Mapping for Combining Ability in Different Population-Based NCII Designs: A Simulation Study. J. Genet. 2013, 92, 529–543.
  33. Wang, H.; Xu, C.; Liu, X.; Guo, Z.; Xu, X.; Wang, S.; Xie, C.; Li, W.X.; Zou, C.; Xu, Y. Development of a Multiple-Hybrid Population for Genome-Wide Association Studies: Theoretical Consideration and Genetic Mapping of Flowering Traits in Maize. Sci. Rep. 2017, 7, 40239.
  34. Xu, Y.; Wang, X.; Ding, X.; Zheng, X.; Yang, Z.; Xu, C.; Hu, Z. Genomic Selection of Agronomic Traits in Hybrid Rice Using an NCII Population. Rice 2018, 11, 32.
  35. Zhang, M.; Zheng, X.; Song, S.; Zeng, Q.; Hou, L.; Li, D.; Zhao, J.; Wei, Y.; Li, X.; Luo, M.; et al. Spatiotemporal Manipulation of Auxin Biosynthesis in Cotton Ovule Epidermal Cells Enhances Fiber Yield and Quality. Nat. Biotechnol. 2011, 29, 453–458.
  36. Wang, Y.; Guo, X.; Xu, Y.; Sun, R.; Cai, X.; Zhou, Z.; Qin, T.; Tao, Y.; Li, B.; Hou, Y.; et al. Genome-Wide Association Study for Boll Weight in Gossypium hirsutum Races. Funct. Integr. Genomics 2023, 23, 331.
  37. Ren, G.; Dong, H.; Chen, Y.; Zhuang, Y.; Shao, F.; Liu, Z. Studies on Endogenous Hormone Changes in the Stem Terminal of Gossypium hirsutum during Flower Bud Differentiation. Acta Bot. Boreali-Occident. Sin. 2002, 22, 321–326.
  38. Ren, G.; Chen, Y.; Dong, H.; Chen, S. Studies on Flower Bud Differentiation and Changes of Endogenous Hormones of Gossypium hirsutum. Acta Bot. Boreali-Occident. Sin. 2000, 20, 847–851.
  39. Reddy, K.R.; Davidonis, G.H.; Johnson, A.S.; Vinyard, B.T. Temperature Regime and Carbon Dioxide Enrichment Alter Cotton Boll Development and Fiber Properties. Agron. J. 1999, 91, 851–858.
  40. Ju, F.; Pang, J.; Huo, Y.; Zhu, J.; Yu, K.; Sun, L.; Loka, D.A.; Hu, W.; Zhou, Z.; Wang, S.; et al. Potassium Application Alleviates the Negative Effects of Salt Stress on Cotton (Gossypium hirsutum L.) Yield by Improving the Ionic Homeostasis, Photosynthetic Capacity and Carbohydrate Metabolism of the Leaf Subtending the Cotton Boll. Field Crops Res. 2021, 272, 108288.
  41. Zhu, S.; Yu, X.; Wang, X.; Zhao, R.; Li, Y.; Fan, R.; Shang, Y.; Du, S.; Wang, X.; Wu, F.; et al. Two Calcium-Dependent Protein Kinases, CPK4 and CPK11, Regulate Abscisic Acid Signal Transduction in Arabidopsis. Plant Cell 2007, 19, 3019–3036.
  42. Michael, T.P.; Salome, P.A.; McClung, C.R. Two Arabidopsis Circadian Oscillators Can Be Distinguished by Differential Temperature Sensitivity. Proc. Natl. Acad. Sci. USA 2003, 100, 6878–6883.
  43. Prasad, K.; Zhang, X.; Tobón, E.; Ambrose, B.A. The Arabidopsis B-Sister MADS-Box Protein, GORDITA, Represses Fruit Growth and Contributes to Integument Development. Plant J. 2010, 62, 203–214.
  44. Tian, R.; Wang, F.; Zheng, Q.; Niza, V.M.A.G.E.; Downie, A.B.; Perry, S.E. Direct and Indirect Targets of the Arabidopsis Seed Transcription Factor ABSCISIC ACID INSENSITIVE3. Plant J. 2020, 103, 1679–1694.
  45. Dutilleul, C.; Benhassaine-Kesri, G.; Demandre, C.; Rézé, N.; Launay, A.; Pelletier, S.; Renou, J.-P.; Zachowski, A.; Baudouin, E.; Guillas, I. Phytosphingosine-Phosphate Is a Signal for AtMPK6 Activation and Arabidopsis Response to Chilling. New Phytol. 2012, 194, 181–191.
  46. Goulas, E.; Schubert, M.; Kieselbach, T.; Kleczkowski, L.A.; Gardeström, P.; Schröder, W.; Hurry, V. The Chloroplast Lumen and Stromal Proteomes of Arabidopsis thaliana Show Differential Sensitivity to Short- and Long-Term Exposure to Low Temperature. Plant J. 2006, 47, 720–734.
  47. Yang, J.; Benyamin, B.; McEvoy, B.P.; Gordon, S.; Henders, A.K.; Nyholt, D.R.; Madden, P.A.; Heath, A.C.; Martin, N.G.; Montgomery, G.W.; et al. Common SNPs Explain a Large Proportion of the Heritability for Human Height. Nat. Genet. 2010, 42, 565–569.
  48. Fang, S.; Gao, K.; Hu, W.; Snider, J.L.; Wang, S.; Chen, B.; Zhou, Z. Chemical Priming of Seed Alters Cotton Floral Bud Differentiation by Inducing Changes in Hormones, Metabolites and Gene Expression. Plant Physiol. Biochem. 2018, 130, 633–640.
  49. Jiang, S.; Pang, C.; Song, M.; Wei, H.; Fan, S.; Yu, S. Analysis of MIKCC-Type MADS-Box Gene Family in Gossypium hirsutum. J. Integr. Agric. 2014, 13, 1239–1249.
  50. Zhang, W.; Fan, S.; Pang, C.; Wei, H.; Ma, J.; Song, M.; Yu, S. Molecular Cloning and Function Analysis of Two SQUAMOSA-Like MADS-Box Genes from Gossypium hirsutum L. J. Integr. Plant Biol. 2013, 55, 597–607.
  51. Guo, W.; Wang, H.; Zhou, X. A cotton GhGlu19 gene and its application in improving cotton yield. Patent CN201911281742.5, 17 March 2023.
  52. Paterson, A.H.; Brubaker, C.L.; Wendel, J.F. A Rapid Method for Extraction of Cotton (Gossypium spp.) Genomic DNA Suitable for RFLP or PCR Analysis. Plant Mol. Biol. Report. 1993, 11, 122–127.
  53. Chen, S.; Zhou, Y.; Chen, Y.; Gu, J. Fastp: An Ultra-Fast All-in-One FASTQ Preprocessor. Bioinformatics 2018, 34, i884–i890.
  54. Li, H.; Durbin, R. Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform. Bioinformatics 2009, 25, 1754–1760.
  55. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map Format and SAMtools. Bioinformatics 2009, 25, 2078–2079.
  56. McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M.; et al. The Genome Analysis Toolkit: A MapReduce Framework for Analyzing Next-Generation DNA Sequencing Data. Genome Res. 2010, 20, 1297–1303.
  57. Hu, Y.; Chen, J.; Fang, L.; Zhang, Z.; Ma, W.; Niu, Y.; Ju, L.; Deng, J.; Zhao, T.; Lian, J.; et al. Gossypium barbadense and Gossypium hirsutum Genomes Provide Insights into the Origin and Evolution of Allotetraploid Cotton. Nat. Genet. 2019, 51, 739–748.
  58. Wang, L.; Feng, Z.; Wang, X.; Wang, X.; Zhang, X. DEGseq: An R Package for Identifying Differentially Expressed Genes from RNA-Seq Data. Bioinformatics 2010, 26, 136–138.
  59. Tian, Z.; Zhang, Y.; Zhu, L.; Jiang, B.; Wang, H.; Gao, R.; Friml, J.; Xiao, G. Strigolactones Act Downstream of Gibberellins to Regulate Fiber Cell Elongation and Cell Wall Thickness in Cotton (Gossypium hirsutum). Plant Cell 2022, 34, 4816–4839.
  60. Chen, C.; Chen, H.; Zhang, Y.; Thomas, H.R.; Frank, M.H.; He, Y.; Xia, R. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol. Plant 2020, 13, 1194–1202.
  61. McCarthy, F.M.; Wang, N.; Magee, G.B.; Nanduri, B.; Lawrence, M.L.; Camon, E.B.; Barrell, D.G.; Hill, D.P.; Dolan, M.E.; Williams, W.P.; et al. AgBase: A Functional Genomics Resource for Agriculture. BMC Genom. 2006, 7, 229.
Figure 1. Analysis of boll number (BN) and boll weight (BW) phenotypes in 240 upland cotton cultivars/lines. (A) Histograms of phenotypic distribution; (B) Box plots. * and ****: significant at the 0.05 and 0.0001 probability levels, respectively; ns: not significant at the 0.05 probability level. Yellow dots: average value.
Figure 2. Manhattan plots for boll number in upland cotton. Trait-related genes around QTNs are marked in blue, and candidate genes around QTNs are marked in dark green. Key candidate genes are marked with an asterisk (*). (A) 2019 Wuhan; (B) 2019 Ezhou; (C) BLUP.
Figure 3. Manhattan plots for boll weight in upland cotton. Trait-related genes around QTNs are marked in blue, and candidate genes around QTNs are marked in dark green. Key candidate genes are marked with an asterisk (*). (A) 2018 Wuhan; (B) 2019 Wuhan; (C) 2019 Ezhou.
Figure 4. Parental combinations of 60 upland cotton cultivars in the partial NC II genetic mating design. X: the cross was made.
Table 1. The descriptive statistics of phenotypic data for boll number (BN) and boll weight (BW) in 240 upland cotton cultivars/lines.
| Trait | Location | Max | Min | Mean | SD | Median | Kurtosis | Skewness | CV (%) | h_B^2 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| BN | 2018 Wuhan | 369.17 | 72.40 | 264.19 | 38.28 | 266.00 | 2.49 | −0.63 | 14.49 | 69.79 |
| | 2019 Wuhan | 353.50 | 143.33 | 259.55 | 34.29 | 262.20 | 0.71 | −0.46 | 13.21 | |
| | 2019 Ezhou | 375.67 | 168.67 | 263.46 | 31.74 | 263.17 | 1.22 | 0.35 | 12.05 | |
| | BLUP | 307.70 | 169.98 | 262.40 | 19.29 | 263.27 | 2.63 | −0.83 | 7.35 | |
| BW | 2018 Wuhan | 5.64 | 3.27 | 4.98 | 0.34 | 5.00 | 2.66 | −0.89 | 6.85 | 74.55 |
| | 2019 Wuhan | 5.80 | 3.62 | 4.88 | 0.36 | 4.90 | 0.87 | −0.46 | 7.29 | |
| | 2019 Ezhou | 7.15 | 4.57 | 5.74 | 0.41 | 5.70 | 0.91 | 0.25 | 7.18 | |
| | BLUP | 5.77 | 4.20 | 5.20 | 0.23 | 5.21 | 1.73 | −0.70 | 4.35 | |
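As a quick consistency check on Table 1, the CV column can be recomputed from the Mean and SD columns. The sketch below assumes the usual definition CV = SD / Mean × 100, which the table does not state explicitly; the names `TABLE_1`, `cv_percent`, and `cv_check` are illustrative only. Small differences from the reported values are expected because the tabulated means and SDs are rounded to two decimals.

```python
# Mean, SD, and reported CV (%) transcribed from Table 1.
# Assumption: CV (%) = SD / Mean * 100 (not stated in the table itself).
TABLE_1 = [
    ("BN", "2018 Wuhan", 264.19, 38.28, 14.49),
    ("BN", "2019 Wuhan", 259.55, 34.29, 13.21),
    ("BN", "2019 Ezhou", 263.46, 31.74, 12.05),
    ("BN", "BLUP", 262.40, 19.29, 7.35),
    ("BW", "2018 Wuhan", 4.98, 0.34, 6.85),
    ("BW", "2019 Wuhan", 4.88, 0.36, 7.29),
    ("BW", "2019 Ezhou", 5.74, 0.41, 7.18),
    ("BW", "BLUP", 5.20, 0.23, 4.35),
]


def cv_percent(mean, sd):
    """Coefficient of variation expressed as a percentage of the mean."""
    return sd / mean * 100.0


def cv_check(rows):
    """Return (trait, location, recomputed CV, reported CV) for each row."""
    return [
        (trait, loc, cv_percent(mean, sd), reported)
        for trait, loc, mean, sd, reported in rows
    ]


for trait, loc, computed, reported in cv_check(TABLE_1):
    print(f"{trait} {loc}: recomputed {computed:.2f}%, reported {reported:.2f}%")
```

The recomputed BN values agree with the reported ones to two decimals, while the BW values differ slightly in the first or second decimal, which is consistent with rounding of the tabulated inputs.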
Table 2. Numbers of all the QTNs and their genetic variances and heritabilities for boll number (BN) and boll weight (BW) of upland cotton in different environments.
| Trait | Environment | Numbers of QTNs | Total Genetic Variance Explained by the QTNs | Heritabilities of All the QTNs (%) |
|---|---|---|---|---|
| BN | 2018 Wuhan | 23 | 638.13 | 43.74 |
| | 2019 Wuhan | 28 | 503.94 | 43.37 |
| | 2019 Ezhou | 18 | 137.73 | 13.64 |
| | BLUP | 32 | | |
| BW | 2018 Wuhan | 21 | 0.038 | 32.87 |
| | 2019 Wuhan | 20 | 0.054 | 42.05 |
| | 2019 Ezhou | 35 | 0.097 | 58.46 |
| | BLUP | 27 | | |
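The per-environment QTN counts in Table 2 can be tallied against the totals and percentages quoted in the abstract (204 QTNs in total; 25 small-effect and 24 rare BN QTNs; 30 small-effect and 20 rare BW QTNs). The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the counts are read directly from Table 2 and the abstract rather than recomputed from any underlying data.

```python
# QTN counts per environment, read from Table 2.
qtn_counts = {
    "BN": {"2018 Wuhan": 23, "2019 Wuhan": 28, "2019 Ezhou": 18, "BLUP": 32},
    "BW": {"2018 Wuhan": 21, "2019 Wuhan": 20, "2019 Ezhou": 35, "BLUP": 27},
}

# Small-effect (<1%) and rare (<10%) QTN counts, as quoted in the abstract.
small_effect = {"BN": 25, "BW": 30}
rare = {"BN": 24, "BW": 20}


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(part / whole * 100.0, 2)


totals = {trait: sum(by_env.values()) for trait, by_env in qtn_counts.items()}
grand_total = sum(totals.values())

print("QTNs per trait:", totals)
print("All QTNs:", grand_total)
for trait in totals:
    print(
        f"{trait}: small-effect {percent(small_effect[trait], totals[trait])}%, "
        f"rare {percent(rare[trait], totals[trait])}%"
    )
```

Summing the rows gives 101 BN and 103 BW QTNs, and the resulting shares reproduce the percentages given in the abstract, so those percentages are taken within each trait rather than over all 204 QTNs.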
Table 4. Key candidate genes of boll number (BN) and boll weight (BW) in upland cotton and their homologous genes in Arabidopsis thaliana.
| Trait | Candidate Genes in Cotton | Homology (Arabidopsis thaliana) | Gene Name | E-Value | Reference |
|---|---|---|---|---|---|
| BN | GH_A11G3751 | AT1G35670 | CPK11 | <1 × 10^−100 | [41] |
| | GH_D01G1053 | AT1G20620 | CAT3 | <1 × 10^−100 | [42] |
| | GH_D04G1642 | AT5G23260 | TT16 | 3 × 10^−90 | [43] |
| BW | GH_A08G0292 | AT1G21970 | LEC1 | 7 × 10^−58 | [44] |
| | GH_D08G1604 | AT2G46090 | LCBK2 | <1 × 10^−100 | [45] |
| | GH_D11G0356 | AT3G47520 | MDH | <1 × 10^−100 | [46] |
Table 5. Related genes, reported in previous studies, around QTNs for boll number (BN) and boll weight (BW) in upland cotton. I: 2018 Wuhan; II: 2019 Wuhan; III: 2019 Ezhou. Trait-related genes with blue were repeatedly identified.
| Trait | Chr | Position (bp) | LOD Scores | r2 (%) | Related Genes | Distance (kb) | Reference |
|---|---|---|---|---|---|---|---|
| BN | A06 | 4,326,968 | 36.29 6.28 | 0.66~2.72 | GhMADS37 | 324.57 | [49] |
| | D02 | 66,249,932 | 4.58 | 1.60 | GhMADS27 | 27.16 | [49] |
| BW | A07 | 9,022,998~9,339,027 | 8.26; 15.49 | 1.18~1.86 | GhMADS22 | 76.59~392.61 | [50] |
| | D05 | 61,069,275 | 13.26 | 1.66 | GhGlu19 | 173.58 | [51] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shi, X.; Feng, C.; Qin, H.; Wang, J.; Zhao, Q.; Jiao, C.; Zhang, Y. Identification of QTNs and Their Candidate Genes for Boll Number and Boll Weight in Upland Cotton. Genes 2024, 15, 1032. https://doi.org/10.3390/genes15081032
