1. Introduction
Eggplant (
Solanum melongena L.), a member of the
Solanaceae family, is a popular vegetable in Africa, Asia, and Southern Europe [
1]. In India and China, eggplant is the third most important solanaceous crop after potatoes and tomatoes [
2]. Eggplant is a reliable source of vitamins, minerals, and antioxidants in the human diet. Eggplant shares many breeding objectives with other vegetable and fruit crops, mainly yield and resistance or tolerance to biotic and abiotic stresses. However, some objectives are specific to eggplant, including the development of prickleless (stem, leaf, and calyx) varieties and the reduction of fruit bitterness [
3].
The availability of diverse genetic materials is critical for the development of new crop varieties [
4]. Crops with a narrow genetic basis are vulnerable to new diseases and other constraints that reduce production, which can result in significant declines in areas of adaptation [
5]. It is becoming increasingly important to develop new eggplant varieties with higher yields and improved agronomic characteristics such as optimal plant architecture and fruit shape, low risk of deterioration during transport, and longer storability. Despite the economic importance of eggplant improvement, its genome has received less attention than that of closely related
Solanaceae species: tomato, potato, and pepper [
3]. However, eggplant breeders have recently begun using marker-assisted selection.
Linkage mapping has revealed the genetic basis of certain fruit and plant morphological traits in both intra-specific [
3] and inter-specific [
6,
7,
8] populations. In a pioneering attempt to apply a genome-wide association (GWA) approach, Ge et al. [
9] identified some phenotype-genotype associations for eight fruit-related traits. Quantitative trait loci (QTLs) associated with several agronomic traits have been identified in eggplant, and the construction of genetic linkage maps has improved. For example, several QTLs for anthocyanin pigmentation, fruit morphology (weight, length, diameter, metabolic content, and shape), and prickliness have been identified using an intraspecific F2 population and a 238-loci linkage map [
2,
3,
6,
10,
11]. However, when compared to other vegetable crops such as tomato and cucumber, the identification and characterization of QTLs and functional genes underlying important agronomic traits in eggplant has lagged significantly, owing in part to the lack of a genetic linkage map with high-density markers. So far, with the help of next-generation sequencing (NGS) technologies, four eggplant reference genomes have been published [
12,
13,
14,
15], which should greatly facilitate the development of large numbers of SNP markers for genetic map construction and improve the efficiency of fine mapping.
A genome-wide association study (GWAS) is a powerful technique for deciphering the genetic basis of complex phenotypes by exploiting naturally occurring genetic variability [
16]. GWAS enables the detection of relationships between molecular markers and desirable traits with better mapping resolution than standard bi-parental populations and has been used to identify markers associated with desired traits in a variety of crops [
17,
18,
19]. GWAS involves an assessment of the population structure of the diversity panel to determine the genetic relatedness of individuals and rule out erroneous associations [
16,
20] and relies on the use of a sufficiently large number of markers. Recent advancements in next-generation sequencing technology and SNP genotyping have given breeders more tools for characterizing genetic variation at high resolution and selecting desired traits when developing new varieties.
Therefore, the purpose of this study was to characterize the phenotypic features of eggplant germplasm and identify SNP markers associated with agro-morphological traits. The GWAS panel comprised 288 eggplant accessions from different species, and SNP markers significantly associated with some agro-morphological traits were identified.
2. Results
2.1. Phenotypic Variation and Correlations of Eggplant Core Collection
A total of 587 eggplant resources collected from 50 countries, including 80 from the Philippines, 44 from China, and 16 from Korea, were used to establish the core collection. From these 587 Solanum accessions, 288 were selected based on 52 SNP markers together with agro-morphological traits. Phenotype data for 17 traits were included in the selection because core sets selected using only genotype data could not represent the diversity of the entire collection, presumably owing to the limited number of SNP markers used. This core collection was then used for the genome-wide association study.
Phenotypic characterization of 17 qualitative and quantitative agro-morphological traits was performed (
Table 1 and
Table 2) for 288 germplasms. Of the eggplant collections evaluated, 260 accessions (90.28%) had an intermediate growth habit, 15 (5.21%) had an upright growth habit, and 13 (4.51%) had a prostrate growth habit. Most eggplants lacked anthocyanin pigmentation on the hypocotyl and fully developed stems. Also, the majority of the eggplant accessions had no prickles on the stem, leaf, or calyx. Regarding flower size, 33 (11.46%) accessions had small (2 cm) flowers, 250 (86.81%) had medium (2–3 cm) flowers, and 5 (1.74%) had large flowers. Flower colors were purple (61.11%), light purple (21.18%), white (16.32%), and white and purple (mixed) (1.39%). The predominant immature fruit colors of eggplant germplasm were green and purple with 38.19% and 36.11%, respectively. As for fruit color at maturity, purple (40.28%) and green (22.57%) were the two most common colors among eggplant germplasm. The majority of eggplant germplasms had light brown (tan) and yellow fruit at harvest (47.22% and 37.50%, respectively).
Table 2 presents the minimum, maximum, mean, and standard deviation of the quantitative agro-morphological traits for the 288 accessions of the eggplant core collection. Plant height ranged from 13.20 cm to 210 cm. The average plant height, fruit width, fruit length, days to flowering, and days to maturity were 87.76 cm, 5.77 cm, 16.80 cm, 110 days, and 156 days, respectively (
Table 2).
The correlation between agro-morphological characteristics is shown in
Figure 1. Anthocyanin pigmentation of the hypocotyl and stem showed a positive correlation (r = 0.23 ***). Of 288 eggplant germplasm samples, 58 had pigmented hypocotyls and 230 did not. Similarly, a large number of accessions (202) lacked anthocyanin pigmentation on the stems, whereas the remaining 86 accessions had pigmented stems. There was a significant, strong positive correlation between stem prickles and leaf prickles (r = 0.83***). The majority of eggplant genetic resources did not have prickles on the stems (266 germplasms) or leaves (257 germplasms). A strong positive correlation (r = 0.61***) was found between days to flowering and days to maturity. As shown in
Figure 1, the agro-morphological traits were grouped into four main clusters according to their correlation coefficients. The first cluster (I) comprised five agro-morphological traits: fruit color at harvest, stem prickles, leaf prickles, days to flowering, and days to maturity. There was a strong positive correlation among traits within the first cluster. The second cluster (II) included flower color, immature fruit color, flower size, and fruit shape; correlations within this cluster were positive and moderate. The third cluster (III) contained hypocotyl anthocyanin, calyx prickles, fruit length, and fruit width, whereas the fourth cluster (IV) comprised stem anthocyanin, mature fruit color, growth habit, and plant height. There was a moderate to high negative correlation between the traits of clusters I and II. The traits of clusters I and III had weak positive to weak negative correlations, whereas those of clusters I and IV had weak positive to moderate negative correlations. Correlations between the traits of clusters II and III ranged from moderate positive to weak negative.
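Whether a coefficient is flagged as significant depends on the panel size as well as on r itself. As a minimal sketch, assuming the coefficients are Pearson correlations computed on numerically coded traits and that *** denotes p < 0.001 (neither is stated in this section), the following Python computes r and the corresponding t statistic. The function names are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant trait has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Reported hypocotyl/stem anthocyanin correlation with n = 288 accessions.
t_anthocyanin = t_statistic(0.23, 288)
```

With r = 0.23 and n = 288 the statistic is about 4.0, above the two-sided critical value of roughly 3.3 for p = 0.001, which is consistent with even this modest correlation being marked ***.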
A principal component analysis (PCA) plot was generated using the phenotypic data of the 288 eggplant accessions (
Figure 2). The first five PCs explained 57.6% of the total variance. PC1 accounted for 22.2% of total phenotypic variation. Stem prickles, immature fruit color, flower size, fruit shape, and flower color were the top five contributors of agro-morphological-related traits to PC1. Meanwhile, PC2, which was primarily associated with calyx prickles, hypocotyl anthocyanin, stem prickles, and flower color, explained 11.2% of the total variance. The positively and negatively correlated agro-morphological traits and the corresponding individual eggplant genetic resources are visualized in
Figure 2A,B. The fruit color at harvest (L) was positively correlated and showed a wide distance from other variables (
Figure 2A) and most of the germplasm (
Figure 3) corresponding to fruit color at harvest (codes 363, 155, 467, 349, 341, 504, etc.) had red fruits at the ripening stage.
2.2. Genotyping-by-Sequencing and SNP Calling
The genotyping-by-sequencing (GBS) library was constructed from 288 eggplant accessions and sequenced on the Illumina HiSeq 2000 platform (Illumina, Madison, WI, USA), generating approximately 2.2 billion reads with an average mapping depth of 25.41× per accession.
Table 3 and
Table 4 present a summary of these sequencing results. The summary of the reference genome, including chromosome length (bp), number of transcripts, transcript length (bp), and CDS length (bp) for each chromosome is presented in
Supplementary Table S1. Genotyping of the eggplant core collection detected 1,859,683 SNPs across the 12 chromosomes. A total of 114,981 SNPs were retained after filtering for minor allele frequency (>5%) and missing data (<30%) (
Table 5). The number of SNPs retained on each chromosome is presented in
Figure 3.
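The two filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for a missing call, the function names are hypothetical, and the toy SNPs are not data from the study. The thresholds (minor allele frequency > 5%, missing data < 30%) are those reported in the text.

```python
def missing_rate(calls):
    """Fraction of accessions with no genotype call (None)."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from alternate-allele counts (0, 1, 2), ignoring missing calls."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filter(calls, min_maf=0.05, max_missing=0.30):
    """Keep a SNP only if MAF > min_maf and missing rate < max_missing."""
    return (minor_allele_frequency(calls) > min_maf
            and missing_rate(calls) < max_missing)


# Toy SNPs across ten hypothetical accessions (not data from the study).
toy_snps = {
    "snp_common": [0, 0, 1, 2, None, 0, 0, 1, 0, 0],
    "snp_monomorphic": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "snp_sparse": [None, None, None, None, 1, 0, 2, 1, 0, 1],
}
retained = [name for name, calls in toy_snps.items() if passes_filter(calls)]
```

Only `snp_common` survives here: the monomorphic SNP fails the frequency filter and the sparse SNP fails the missing-data filter. In the study itself, the 114,981 retained SNPs correspond to about 6.18% of the 1,859,683 detected.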
2.3. Population Structure and Phylogenetic Tree Analysis
The population structure of the 288 eggplant genetic resources was inferred using STRUCTURE software (v. 2.3.4) (Pritchard et al., 2000). Admixture model-based simulations were carried out by varying K from 1 to 10 with 10 iterations. The estimated likelihood (lnP(D)) was greatest for K = 3 (
Supplementary Figure S1), suggesting the presence of three main populations in the eggplant genetic resources panel (
Figure 4). The results of principal component analysis (PCA) and discriminant analysis of principal components (DAPC) of the eggplant population are presented in
Figure 5A,B. The first three principal components explained approximately 71.6% of the total variation and allowed the population to be categorized into three groups. The first PC explained 45%, whereas the second and third explained 24% and 2.6%, respectively. The eggplant genetic resources were divided into three groups (blue, red, and green) in both the PCA and DAPC.
Supplementary Table S2 contains information on the Admixture groups. The neighbor-joining (NJ) analysis of the entire population (288 eggplant accessions) is presented in
Figure 6. As shown in the phylogenetic tree, many clusters were formed based on 114,981 SNPs.
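Choosing K by the greatest lnP(D) amounts to averaging the log-likelihood over the replicate runs at each K and taking the maximum. A minimal sketch follows. The lnP(D) values are invented placeholders rather than STRUCTURE output from this study, and the three-replicate layout and the function name are assumptions made for brevity.

```python
from statistics import mean


def best_k(lnpd_by_k):
    """Return the K whose replicate runs have the highest mean lnP(D)."""
    means = {k: mean(runs) for k, runs in lnpd_by_k.items()}
    return max(means, key=means.get), means


# Placeholder lnP(D) values per K (hypothetical, for illustration only).
hypothetical_lnpd = {
    1: [-52000.0, -52010.0, -51990.0],
    2: [-48000.0, -48050.0, -47950.0],
    3: [-45000.0, -45100.0, -44900.0],
    4: [-45500.0, -46500.0, -45800.0],
}
chosen_k, mean_lnpd = best_k(hypothetical_lnpd)
```

With these placeholder values the mean lnP(D) peaks at K = 3, mirroring the pattern the study reports. The spread across replicate runs at each K is usually worth inspecting alongside the mean.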
2.4. Genome-Wide Association Analysis
A genetic association study was conducted to identify SNPs associated with qualitative and quantitative agro-morphological traits. The GWAS results of 17 agro-morphological traits were visualized in Manhattan (
Figure 7) and QQ plots (
Supplementary Figure S2). Among the 17 agro-morphological traits, significantly associated SNPs were found for six traits (
Supplementary Table S3 and
Figure 7). The Bonferroni-corrected threshold (-log10
p > 6.34) was used as a cut-off to identify marker-trait associations. A total of 377 significant SNPs associated with six agro-morphological traits were identified. These six traits (number of SNPs) were: days to maturity (51), flower size (121), fruit width (20), harvest fruit color (42), leaf prickles (38), and stem prickles (105). All SNPs significantly linked to six agro-morphological traits are presented in
Supplementary Table S3. Among the significantly associated SNPs, the top 10 SNPs based on the log10
p-value for six agro-morphological traits are presented in
Table 6. The largest fraction of significant SNPs (11.94%) was found on Ch01, followed by Ch07 and Ch06 with 11.67% and 10.08%, respectively. The smallest fraction of significant SNP markers (4.24% with 16 SNPs) was found on Ch12 for days to maturity (two), flower size (seven), fruit color at harvest (two), leaf prickle (one), and stem prickle (four). Except for Ch07 and Ch11, SNPs that were significantly associated with leaf prickles were found on all chromosomes.
The numbers of significant SNPs associated with leaf prickles were seven on Ch02, six each on Ch01 and Ch05, four each on Ch04 and Ch06, three each on Ch03, Ch08, and Ch10, and one each on Ch09 and Ch12. After flower size, the second highest number of significantly associated SNPs was found for stem prickles, and these were located across all 12 chromosomes. The numbers of SNPs significantly associated with stem prickles on Ch01, Ch08, Ch07, Ch10, and Ch05 were 14, 12, 11, 11, and 10, respectively. The highest number of significantly associated SNPs (121) was found for flower size, across all 12 chromosomes. Of these SNPs, 15 were on Ch01, 14 on Ch04, 13 each on Ch06 and Ch07, 12 on Ch03, and 11 each on Ch10 and Ch11. For fruit width, significantly associated SNPs were found on only seven chromosomes: Ch01 (three), Ch02 (one), Ch04 (one), Ch05 (two), Ch07 (nine), Ch09 (three), and Ch11 (one). Among the nine SNPs associated with fruit width on Ch07, two were located in intergenic regions and another two were in genes that encode proteins with unknown functions. Among the SNPs associated with harvest color, seven were on Ch06, five each on Ch01 and Ch07, and four each on Ch03, Ch05, Ch08, and Ch10. In addition, two SNPs each were located on Ch02, Ch09, Ch11, and Ch12. One SNP associated with harvest color was found in a gene that encodes the subtilisin-like protease SBT3. SNPs significantly associated with days to maturity were found on all chromosomes. Eight SNPs were located on Ch08, seven on Ch10, and six each on Ch03 and Ch07. Relatively few SNPs associated with days to maturity were found on Ch05, Ch11, Ch01, Ch09, Ch12, and Ch02.
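The threshold and the per-chromosome shares reported in this section follow from simple arithmetic, sketched below in Python. Two assumptions are made that the text does not state: a family-wise alpha of 0.05, and a number of tests equal to the 114,981 retained SNPs. Under those assumptions the Bonferroni threshold comes to about 6.36 on the -log10 scale, slightly above the 6.34 used in the study, so the analysis may have been based on a somewhat different test count. The function names are illustrative.

```python
import math


def bonferroni_neg_log10(alpha, n_tests):
    """Bonferroni-corrected significance threshold on the -log10(p) scale."""
    return -math.log10(alpha / n_tests)


def percent_share(count, total):
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Significant SNPs per trait, as reported in this section.
snps_per_trait = {
    "days to maturity": 51,
    "flower size": 121,
    "fruit width": 20,
    "harvest fruit color": 42,
    "leaf prickles": 38,
    "stem prickles": 105,
}
total_significant = sum(snps_per_trait.values())

# alpha = 0.05 and n_tests = 114,981 are assumptions, not stated in the text.
threshold = bonferroni_neg_log10(0.05, 114_981)

# Share of the 16 significant SNPs located on Ch12.
ch12_share = percent_share(16, total_significant)
```

The per-trait counts sum to the 377 significant SNPs reported, and the 16 SNPs on Ch12 give the stated 4.24%.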
3. Discussion
The genetic diversity of plant genetic resources (PGRs), which provide useful alleles linked to plant development and improvement, is critical for the conservation and utilization of germplasm conserved in a gene bank [
21,
22]. DNA molecular markers provide valuable information for analyzing genetic diversity, genetic relationships, population structure, and core collections in a variety of crop species [
23,
24,
25,
26,
27,
28]. Representative core collections have been selected in various crops using different sampling strategies and clustering methods [
29,
30,
31,
32,
33,
34]. The M strategy was reported to be a useful approach for selecting a core collection with high genetic diversity and a reasonable size [
32]. In this study, a representative core collection was established by selecting 288 eggplant resources from 587
Solanum accessions for efficient germplasm management and further studies. The greater the genetic diversity of germplasm, the greater the likelihood of success in breeding for desirable traits. Understanding how variation in agro-morphological traits is associated with variable genetic sites may assist in the selection and transfer of desirable traits when developing new cultivars through breeding programs. Diverse agro-morphological variation (fruit and leaf) in eggplant germplasm was found in previous studies [
35,
36,
37]. Similarly, in this study, eggplant genetic resources collected from different countries possessed diverse agro-morphological characteristics. The correlation between agro-morphological traits was estimated and a strong positive correlation was observed between some agro-morphological traits such as stem prickles and leaf prickles, days to flowering and days to maturity, and immature fruit color and mature fruit color.
SNP markers are regarded as potentially promising breeding tools for use in genetic mapping and marker-assisted selection since they can be scored in parallel experiments at a low cost [
38]. In this study, SNP markers were used to assess population structure, construct a phylogenetic tree, and detect marker-trait associations. The phylogenetic tree reflects the evolutionary relationships among germplasm inferred from the SNPs identified in this study. Population structure and kinship analysis allowed the clustering of eggplant germplasm into three broad groups. The majority of the germplasms used in this study (240 germplasms) belonged to
S. melongena. Population 1 (Pop1) and 2 (Pop2) were mainly germplasm belonging to
S. melongena, and a few unknown (
S. spp.) species were also clustered. As shown by the PCA and DAPC, the first two clusters were not entirely separated from each other. Germplasms belonging to other species, each represented by one to five genotypes, were clustered in Group 3 (42 germplasms). Hybridization of genetic material (natural or by breeders) and the movement of genetic resources from place to place could explain the formation of subpopulations within the same species.
Genome-wide association studies have proven efficient at finding genomic regions linked with economically important agronomic traits in several crops, including wheat [
39,
40,
41,
42], eggplant [
36], potato [
43], and soybean [
44,
45]. There are important agro-morphological traits to be improved in eggplant, including the development of prickleless varieties. Although prickly varieties are preferred in some areas due to their perceived improved organoleptic quality, prickles are generally regarded as undesirable since they can puncture the skin of the fruits and are problematic during harvesting and storage [
46]. Previous research on raspberry and blackberry prickles has revealed that they are epidermal tissue outgrowths of modified glandular trichomes (GTs); once the outermost cells become lignified, lignification continues inward and downward until the prickles become completely lignified and thus mature [
47,
48]. A phenotypic assessment of prickles in
Solanum viarum Dunal indicated that they may be initiated by GTs or triggered by GT-derived signals [
49]. Transcriptome studies in raspberry and
S. viarum revealed several transcription factors (TFs) that may be involved in prickle development [
49,
50]. In this study, three SNPs in three transcription factor genes (Trihelix transcription factor ASIL2, Probable WRKY transcription factor 35, and Probable transcription factor At5g28040) were significantly associated with stem prickles. One of the three, located on Ch01 (14,404,622 bp) in the Trihelix transcription factor ASIL2 gene, was associated with both leaf and stem prickles. The SNP located on Ch05 (2,527,410 bp) was found mainly in eggplant genetic resources that have prickles on the stem. Several QTLs for prickles have been found in eggplant on chromosomes 2, 6, 7, and 8 [
3,
8,
51,
52]. A recent study genetically mapped a
Pl locus to chromosome 6 and developed a 0.5 kb presence/absence variant marker for prickleless eggplant selection [
53].
Interestingly, one SNP on Ch01 was strongly associated with fruit color at harvest and was located in a gene that encodes the enzyme acetylserotonin O-methyltransferase (ASMT). ASMT is involved in various aspects of plant growth and development. It is the final enzyme in melatonin biosynthesis and may have a rate-limiting role in plant melatonin production. Several recent studies have confirmed that tryptophan decarboxylase (TrpDC), tryptamine 5-hydroxylase (T5H), serotonin N-acetyltransferase (SNAT), and ASMT are involved in melatonin synthesis in plants [
54,
55]. Sun et al. found that an exogenous melatonin treatment promoted ripening and improved tomato fruit quality after harvest [
56]. Similarly, exogenous melatonin induced strawberry ASMT expression and accelerated strawberry fruit ripening via the ABA pathway [
57]. Melatonin-deficient ASMT rice, on the other hand, showed accelerated senescence in detached flag leaves as well as a significantly lower yield [
58].
In a previous study, it was indicated that the width and length of each flower organ affect the entire flower size [
59]. Also, another study showed flower disc diameter was positively correlated with disc area in sunflower [
60]. Among the total of 121 SNPs associated with flower size, 22 SNPs were found in the intergenic regions and others were in protein-coding genes with known (82 SNPs) and unknown (17 SNPs) functions. In this study, 20 SNPs significantly associated with fruit width were found. In a previous study, seven SNPs were identified on Ch01 (1), Ch02 (2), Ch03 (1), Ch09 (1), and Ch12 (1) that were linked with tomato fruit width (two) [
61]. Some of the most significantly associated SNPs with flower size were found in genes encoding pentatricopeptide repeat-containing protein At5g14770, probable histone chaperone ASF1A, Ultraviolet-B receptor UVR8, MACPF domain-containing protein At1g14780, G2/mitotic-specific cyclin-1, two-component response regulator ORR21, and adenosine triphosphatase (ARSA1 ATPase) (
Table 6).
The number of days to maturity is an important agronomic trait for identifying and selecting early- and late-maturing crops. Early-flowering plants had a shorter maturity period, as supported by the strong positive correlation between days to flowering and days to maturity (r = 0.64***). In previous studies, several SNP markers associated with days to maturity have been found in different crops, such as Kersting’s groundnut [
62]. In this study, a total of 51 SNPs were associated with days to maturity, and one SNP was located in a gene that codes for pentatricopeptide repeat-containing protein (PPR). Mutations in these PPR protein-coding genes lead to the dysfunction of mitochondria and/or chloroplasts, thereby resulting in growth retardation, pollen abortion, and seed development defects in plants [
63], indicating the important roles of PPR proteins in plant growth and development [
64]. As presented in
Table 6, some of the highly significantly associated SNPs with days to maturity were found in genes that encode DNA ligase 4 (LIG4) (Ch03 at 2.5 Mbp), PPL1 PsbP-like protein 1 chloroplastic (Ch03 at 8.8 Mbp), 4-coumarate--CoA ligase-like 5 (4CLL5) (Ch05 at 3.8 Mbp), Actin-7 (Ch05 at 4.0 Mbp), PHYC Phytochrome C (Ch07 at 126.0 Mbp), and PAL5 phenylalanine ammonia-lyase. DNA ligase enzymes perform crucial roles in DNA replication and repair processes by catalyzing the joining of adjacent polynucleotides [
65]. Eukaryotes have multiple DNA ligases with unique roles in DNA metabolism, with clear differences in the functions of DNA ligase orthologues in mammals, yeast, and plants. DNA ligase 4 (LIG4) is found in all eukaryotes and facilitates the final step in the DSB repair pathway known as non-homologous end joining (NHEJ) [
65]. Waterworth et al. [
66] studied the role of DNA ligases in seed germination in terms of vigor and viability after storage under suboptimal conditions, as seen in much of the developing world. The identification of DNA repair mechanisms critical for rapid germination and seed lifespan can help forecast seed lot storage and germination performance, and these DNA repair pathways represent prospects for crop development with improved seed storability and germination performance features [
66]. Three other SNPs significantly associated with days to maturity were located on Ch06 (9.7 Mbp) and Ch12 (2.6 Mbp and 9.3 Mbp) in genes that encode proteins with unknown functions (
Table 6).