Article

Identification of Candidate Genes for Soybean Storability via GWAS and WGCNA Approaches

Key Laboratory of Soybean Biology in Chinese Education Ministry, Northeast Agricultural University, Harbin 150030, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Agronomy 2024, 14(11), 2457; https://doi.org/10.3390/agronomy14112457
Submission received: 28 September 2024 / Revised: 20 October 2024 / Accepted: 21 October 2024 / Published: 22 October 2024
(This article belongs to the Section Plant-Crop Biology and Biochemistry)

Abstract

Soybean (Glycine max (L.) Merr.) is an important crop for both food and feed, playing a significant role in agricultural production and the human diet. During long-term storage, soybean seeds often exhibit reduced quality, decreased germination, and lower seedling vigor, ultimately leading to significant yield reductions in soybean crops. Seed storage tolerance is a complex quantitative trait controlled by multiple genes and is also influenced by environmental factors during seed formation, harvest, and storage. This study aimed to evaluate soybean germplasms for their storage tolerance, identify quantitative trait nucleotides (QTNs) associated with seed storage tolerance traits, and screen for candidate genes. The storage tolerance of 168 soybean germplasms was evaluated, and 23,156 high-quality single nucleotide polymorphism (SNP) markers were screened and analyzed through a genome-wide association study (GWAS). Ultimately, 14 QTNs were identified as being associated with seed storage tolerance and were distributed across eight soybean chromosomes, with five QTNs (rs25887810, rs27941858, rs33981296, rs44713950, and rs18610980) being newly reported loci in this study. In the linkage disequilibrium regions of these SNPs, 256 genes were identified. By combining GWAS and weighted gene co-expression network analysis (WGCNA), eight hub genes (Glyma.03G058300, Glyma.04G192100, Glyma.04G192600, Glyma.04G192900, Glyma.07G002000, Glyma.08G329400, Glyma.16G074600, Glyma.16G091400) were jointly identified. Through the analysis of expression patterns, two candidate genes (Glyma.03G058300, Glyma.16G074600) potentially involved in seed storage tolerance were ultimately identified. Additionally, haplotype analysis revealed that natural variations in Glyma.03G058300 could affect seed storage tolerance. The findings of this research provide a theoretical foundation for understanding the regulatory mechanism underlying soybean storage.

1. Introduction

Soybean (Glycine max (L.) Merr.) seeds contain approximately 40% protein, 20% oil, and 30% carbohydrate, making them highly valuable for food, feed, and processing applications. However, during prolonged storage, soybean seeds often experience a decline in quality and seedling viability and a reduced germination capacity, ultimately leading to significant yield losses in soybean crops [1]. The phenomenon of seed deterioration involves numerous biochemical and physiological changes, including the loss of enzymatic activities and membrane integrity and genetic alterations. However, the mechanisms by which the aging process affects soybean seed viability have not yet been fully elucidated [2]. Seed deterioration is greatly affected by the storage conditions, with high temperatures and high relative humidity being the two most important abiotic factors [3]. Different soybean varieties (or genotypes) exhibit varying levels of resistance to adverse storage conditions [4]. Furthermore, the external environment and genetic background significantly influence seed storability [5]. Thus, screening and breeding cultivars with resistance to adverse storage conditions is the most promising way to combat soybean seed deterioration in storage [6]. Seed storability is a complex quantitative trait controlled by multiple genes/quantitative trait loci (QTLs) [7], which is also affected by genotype and environmental factors during seed formation, harvest, and storage [8]. Studies have shown that the fatty acid composition in soybean seeds is one of the key factors determining seed viability. High fatty acid content and fragile seed coats can accelerate the reduction in seed viability [9]. The loss of soybean seed viability is also associated with phospholipase D activity. During the natural aging process of soybeans, phospholipase D causes the separation of the plasma membrane from the cell wall complex and the disintegration of oil bodies, ultimately reducing seed viability [10]. 
Jong et al. [11] conducted a comprehensive analysis using transcriptomics, proteomics, and metabolomics and identified 13 differentially expressed genes associated with seed storability in pathways such as photosynthesis and energy metabolism, sugar metabolism, lipid metabolism, protein post-translational modification, and storage protein accumulation. Significant progress has been made in the study of seed storage tolerance in plants such as Arabidopsis thaliana [12,13], rice [14,15], and maize [16,17], yet research on the mechanisms of soybean seed storage tolerance is very limited.
Advances in molecular marker technologies have provided an efficient means of identifying QTLs for seed storability, which accelerates the breeding of cultivars with improved seed storability through marker-assisted selection (MAS). Natural aging and artificial aging are two commonly used methods in the study of seed storability, with seed viability directly influencing the storability of seeds [18]. Singh et al. [19] identified four independent SSR markers that were significantly associated with seed storability under natural aging conditions using an F2:3 mapping population across two tested locations. Li et al. [20] detected six QTLs linked to seed storability on six chromosomes in rice, using a population of 182 backcross RILs derived from the Koshihikari/Kasalath cross under natural aging conditions. Reports indicated that 34 novel QTLs had been identified for three seed germination-related traits used to assess seed storability [21,22]. Wang et al. [23] identified two and eight QTLs associated with soybean seed viability under −20 °C storage and accelerated aging conditions, respectively, yet no common QTLs were detected in RIL populations stored under both conditions. These QTLs were difficult to reproduce in other mapping populations because of the limited QTL resolution and the low density of the molecular markers used in these studies.
A genome-wide association study (GWAS) is a statistical method used to identify associations between specific traits and genetic variations. It is employed to detect correlations between millions of single nucleotide polymorphisms (SNPs) across the entire genome and specific phenotypes [24]. GWAS serves as an alternative to linkage analysis and is often utilized to identify markers associated with a particular trait by using linkage disequilibrium (LD) between alleles within natural populations, with no need to develop bi-parental populations [25]. Meanwhile, weighted gene co-expression network analysis (WGCNA) is an important approach for investigating the functional significance of key genes in large expression datasets [26]. WGCNA is a bioinformatics method that uses high-throughput sequencing data to divide genes into multiple co-expression modules. It studies the biological relevance between co-expression modules and target traits, explores the association between gene networks and traits, and identifies hub genes [27]. WGCNA can cluster genes based on their expression profiles across different sample datasets, and through the identification of modules of co-expressed genes among various biological samples, it can further elucidate key genes that are highly correlated with traits and metabolomic data [28]. Recently, the combination of GWAS and WGCNA has been utilized in soybean research to identify genes associated with isoflavone accumulation in soybean seeds and oil content [29,30]. Nonetheless, no studies to date have applied GWAS and WGCNA to unravel the gene networks and molecular regulatory mechanisms that control soybean seed storability.
Soybean seed storability varies among different varieties and is involved in multiple pathways, controlled by multiple genes. To clarify the intrinsic mechanisms of soybean seed storability, refine the regulatory network of seed storability, and identify candidate genes for seed storage tolerance, GWAS was conducted for seed storability-related traits in soybean, using 23,156 SNPs and 168 accessions collected globally. WGCNA analysis was performed using transcriptomic data from 15 soybean seeds with high storability and 15 seeds with low storability. This study employed a combined strategy of GWAS and WGCNA analysis to identify the putative regulatory genes controlling soybean seed storability and to search for superior haplotypes associated with the storage tolerance of soybean seeds.

2. Materials and Methods

2.1. Plant Materials and Evaluation of Germination Test

We collected 168 soybean germplasm accessions globally to construct a phenotypic diversity association panel. The associated population was composed of soybean germplasm resources with significant storage differences from around the world. The origins and quantities of the soybean germplasm from each country were as follows: China (152), America (11), Japan (2), Belgium (1), Germany (1), and Myanmar (1), including 36 local varieties and 132 improved varieties (Table S1). Subsequently, the germination-related traits of the 168 varieties were evaluated. These varieties were sown and harvested in 2014, 2015 and 2016, stored in cold storage, and their germination-related traits were measured in 2020. According to the methods described by Zhang et al. [7,31], 100 seeds from each cultivar were germinated in filter paper rolls wrapped with plastic film in a 25 °C germinator. This process was repeated three times. Data from the same inter-annual period were averaged, and descriptive statistical analysis of the phenotypic data was performed using Microsoft Excel v2021 (Table S1). Germinability (GA) was recorded on the fifth day, germination rate (GR) on the eighth day, and the dry weight of individual seedlings (S) on the eighth day. The germinability index (GI) and vitality index (VI) were calculated using GI = ∑(Gt/Dt) and VI = GI × ∑S [21,32], where Gt was the number of germinated seeds on Day t, and Dt was the time corresponding to Gt in days.
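The two index formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' script: the function names, the dictionary layout, and the reading of Gt as the number of seeds newly germinated on day t are assumptions, and the example counts are made up.

```python
def germinability_index(daily_counts):
    """GI = sum(Gt / Dt).

    daily_counts maps the day number Dt to Gt, assumed here to be the
    number of seeds that newly germinated on that day.
    """
    return sum(g_t / d_t for d_t, g_t in daily_counts.items())


def vitality_index(gi, seedling_dry_weights_g):
    """VI = GI x sum(S), with S the dry weight (g) of individual seedlings."""
    return gi * sum(seedling_dry_weights_g)


# Hypothetical counts for one replicate (illustrative only).
counts = {2: 10, 4: 20}
gi = germinability_index(counts)       # 10/2 + 20/4
vi = vitality_index(gi, [0.5, 1.5])    # GI x (0.5 + 1.5)
```

Under this reading, seeds that germinate early contribute more to GI than seeds that germinate late, which is why GI and VI separate accessions that reach the same final germination rate at different speeds.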

2.2. SNP Genotyping Data Collection

Genomic DNA from each tested accession was isolated using the SDS protocol [33], and partial sequencing was performed based on the specific locus amplified fragment sequencing (SLAF-seq) methodology. Two restriction enzymes, Mse I (EC 3.1.21.4) and Hae III (EC 3.1.21.4) (Thermo Fisher Scientific Inc., Waltham, MA, USA), were used to produce more than 50,000 sequencing tags of 300–500 bp from each tested accession based on preliminary analysis of the reference genome. The sequencing tags obtained from each accession were distributed in unique genomic regions of the 20 soybean chromosomes and were then used to construct the sequencing libraries. SOAP2 software version 2.21 was utilized to align the 45-bp sequence reads at both ends of the sequencing tags from each accession library to the soybean reference genome [34,35]. The raw reads from the same genomic position were used to define the SLAF groups, utilizing more than 58,000 high-quality sequencing tags from each tested sample. A minor allele frequency (MAF) threshold of ≥0.04 was used to identify SNPs. The genotype was regarded as heterozygous when the ratio of the minor allele depth to the total depth of the sample was ≥1/3.
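The heterozygosity rule and the MAF filter described above can be illustrated with a short Python sketch. It is written under stated assumptions and is not the pipeline used in the study: the function names and genotype labels are hypothetical, and "minor allele depth" is taken to mean the read depth of the less-covered allele within one sample.

```python
from fractions import Fraction

HET_RATIO = Fraction(1, 3)   # threshold from the text: minor depth / total depth >= 1/3
MAF_MIN = 0.04               # MAF threshold from the text


def call_genotype(ref_depth, alt_depth):
    """Return 'ref', 'alt', 'het', or None (no reads) for one sample at one SNP."""
    total = ref_depth + alt_depth
    if total == 0:
        return None
    minor = min(ref_depth, alt_depth)
    if Fraction(minor, total) >= HET_RATIO:
        return "het"
    return "ref" if ref_depth > alt_depth else "alt"


def minor_allele_frequency(genotypes):
    """MAF over called genotypes; missing calls (None) are ignored."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_alleles = sum(2 if g == "alt" else 1 if g == "het" else 0 for g in called)
    freq = alt_alleles / (2 * len(called))
    return min(freq, 1 - freq)


def passes_maf(genotypes):
    return minor_allele_frequency(genotypes) >= MAF_MIN
```

Exact fractions are used for the depth ratio so that a sample sitting precisely on the 1/3 boundary is classified consistently.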

2.3. Population Structure Evaluation and Linkage Disequilibrium (LD) Analysis

The principal component analysis (PCA) program of GAPIT software version 3.0 was used to evaluate the population structure of the association panel [36,37]. TASSEL 3.0 was used to calculate LD, expressed as r² (squared allele frequency correlations), between SNPs (MAF > 0.05 and missing data < 10%) [38]. In contrast to the GWAS, missing SNP genotypes were not imputed with the major allele before LD analysis. The parameters in the program included MAF (>0.04) and the integrity of each SNP (>80%).
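As a rough illustration of the r² statistic mentioned above, the following Python sketch computes the squared allele frequency correlation between two biallelic SNPs. It is not TASSEL's implementation: it assumes homozygous lines coded 0/1 (so each genotype can be treated as a haplotype), and it drops pairs with a missing call instead of imputing them, in line with the statement that missing genotypes were not imputed before LD analysis. The function name and coding scheme are assumptions.

```python
def ld_r2(snp_a, snp_b):
    """r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), with D = pAB - pA * pB.

    snp_a, snp_b: per-accession allele codes (0 or 1), None for missing.
    Returns None when r^2 is undefined (no data or a monomorphic SNP).
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(a * b for a, b in pairs) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b
    return d * d / denom
```

Plotting r² against the physical distance between SNP pairs gives the LD decay curve from which a decay distance can be read.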

2.4. Association Analysis and Candidate Gene Prediction and Annotation

Association signals for seed storage-related traits were detected using 168 accessions collected globally and 23,156 SNPs. The significance threshold (p ≤ 1.58 × 10⁻⁴) was set at the 0.05 level based on the Bonferroni method and was used to declare significant association signals [39]. Genes located within 100 kb upstream and downstream of each significant SNP were selected as candidate genes and annotated using the soybean reference genome (Wm82.a2.v1, http://www.soybase.org) (accessed on 12 June 2024). The GO (https://www.geneontology.org/) (accessed on 13 June 2024) enrichment analysis was conducted based on the SoyBase database. The KEGG (https://www.genome.jp/kegg/) (accessed on 14 June 2024) database was utilized for conducting pathway enrichment analysis of candidate genes. Using the SoyMD online platform (https://yanglab.hzau.edu.cn/SoyMD/#/) (accessed on 2 July 2024), we predicted the expression levels of candidate genes at various stages of seed development.
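The candidate-gene window described above (100 kb on either side of a significant SNP) amounts to an interval-overlap query. The Python sketch below shows one way to express it; the gene records, identifiers, and coordinates are hypothetical placeholders, not entries from the Wm82.a2.v1 annotation, and counting a gene that only partly overlaps the window is an assumption.

```python
WINDOW_BP = 100_000  # 100 kb upstream and downstream, as stated in the text


def genes_near_snp(snp_chrom, snp_pos, genes, window=WINDOW_BP):
    """Return IDs of genes overlapping [snp_pos - window, snp_pos + window]."""
    lo, hi = snp_pos - window, snp_pos + window
    return [
        g["id"]
        for g in genes
        if g["chrom"] == snp_chrom and g["start"] <= hi and g["end"] >= lo
    ]


# Hypothetical annotation records (illustrative coordinates only).
annotation = [
    {"id": "geneA", "chrom": "Chr03", "start": 950_000, "end": 955_000},
    {"id": "geneB", "chrom": "Chr03", "start": 1_099_000, "end": 1_104_000},
    {"id": "geneC", "chrom": "Chr03", "start": 1_200_000, "end": 1_203_000},
    {"id": "geneD", "chrom": "Chr04", "start": 1_000_000, "end": 1_002_000},
]
hits = genes_near_snp("Chr03", 1_000_000, annotation)
```

Here geneB is kept because it starts inside the window even though it ends outside it; a stricter rule requiring full containment would be an equally defensible choice.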

2.5. Metabolomic and Transcriptomics Data Processing and Analysis

In this experiment, 30 soybean varieties were selected, including 15 storage-tolerant (average vitality index: 209.80 g–563.42 g) and 15 storage-intolerant (average vitality index: 1.64 g–43.62 g) soybean varieties (Table S1). Non-targeted metabolomics was conducted on the seeds of these 30 soybean varieties, resulting in the detection of 11,672 metabolites (Table S4). After normalizing the metabolite data, metabolites with an average difference in content and a maximum difference of ≥1.5 between the storage-tolerant and storage-intolerant varieties were selected as differential metabolites for analysis. These differential metabolites were then annotated and analyzed in the KEGG database (https://www.kegg.jp/) (accessed on 3 July 2024) [40].
Total RNA was extracted from soybean seeds using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) [41]. The purity and concentration of the RNA samples were assessed, and cDNA was synthesized from mRNA using the ReverTra Ace® qPCR RT Master Mix (TOYOBO, Life Science Department, Okayama, Japan). The cDNA library was constructed on the Illumina HiSeq sequencing platform [42]. High-quality reads were aligned to the reference genome (Glycine max Wm82.a2.v1) using Hisat2 v2.0.5 software [43]. Transcripts per million (TPM) values were used for gene/transcript-level quantification. The RNA-Seq data are available at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1139955 (accessed on 12 August 2024).
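For readers unfamiliar with TPM, the standard definition can be sketched as follows. This is a generic illustration of the normalization, assuming raw read counts and gene lengths in base pairs as inputs; it does not reproduce the quantification software used in the study, and the function name is hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    Each count is divided by gene length in kilobases, then the
    length-normalized rates are rescaled so that they sum to one million.
    """
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [r / total * 1_000_000 for r in rates]


# Two hypothetical genes: the second has twice the reads but twice the length.
values = tpm([10, 20], [1000, 2000])
```

Because every sample sums to the same total, TPM values are comparable across samples, which is convenient when correlating expression profiles across the 30 varieties.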

2.6. Weighted Gene Co-Expression Network Analysis

WGCNA was performed using transcriptomic data from 30 soybean varieties, including 15 storage-tolerant and 15 storage-intolerant soybean varieties. WGCNA was conducted using R software version 4.2.2 and the WGCNA package to construct a weighted gene co-expression network [26,44]. The soft thresholding was determined based on the scale-free network distribution principle. The threshold parameter β value was selected when the fitting curve first approached 0.9 to create an adjacency matrix [30]. The topological overlap matrix (TOM) similarity function was utilized to transform the adjacency matrix into a TOM, and a gene connection network was constructed [45]. Finally, gene modules were generated based on high gene significance (GS, association between gene expression and traits) and module membership (MM, correlation between gene expression and module eigengene) [29]. These gene modules were clustered using the dynamic cutoff method, and the co-expression network was visualized using the Cytoscape version 3.9.0 software package [46].
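The adjacency and TOM steps described above can be made concrete with a small pure-Python sketch. It is not the WGCNA R package: it assumes an unsigned network (adjacency = |Pearson correlation| raised to the power β), uses the commonly cited topological overlap formula, and takes β as a given input instead of choosing it from a scale-free fit. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency: a_ij = |cor(gene_i, gene_j)| ** beta."""
    n = len(expr)
    return [
        [1.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), u != i, j."""
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Rows are genes, columns are samples (made-up values).
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [1, 3, 2, 4]]
adj = adjacency(expr, beta=2)
tom = topological_overlap(adj)
```

Raising correlations to a power suppresses weak links while keeping strong ones, and the topological overlap then rewards gene pairs that also share neighbours; 1 − TOM is the dissimilarity typically passed to hierarchical clustering to define modules.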

3. Results

3.1. Seed-Vigor-Related Traits in Soybean

The germination-related traits (GA, GR, GI, and VI) of 168 soybean cultivars stored over different years were analyzed for variation (Table S1). Across storage years, germinability (GA) ranged from 0 to 85.67%, germination rate (GR) from 7.67% to 92.67%, germinability index (GI) from 2.71 to 95.93, and vitality index (VI) from 1.5 g to 580.26 g. The average values for GA, GR, GI, and VI, calculated as the mean across different years, were 28.65%, 39.37%, 20.7, and 144.34 g, respectively (Table 1 and Figure 1). The average coefficients of variation of the four traits were 63.17, 49.57, 56.13, and 95.4, respectively, and no significant differences were found for the association panel. No significant skewness or kurtosis was observed, indicating continuous variation and an approximately normal distribution (Table 1 and Figure 1). The results indicated that the variation in seed-vigor-related traits in soybean aligned with the genetic characteristics of quantitative traits, demonstrating extensive variation within the natural population examined in this study.

3.2. Distribution of Markers and Analysis of Mapping Population

Based on a minor allele frequency > 0.05 and missing data of less than 0.03, a total of 23,156 SNPs were developed on all 20 chromosomes of the soybean genome. The SNPs were unevenly distributed on the 20 chromosomes, with the most on Chr.18 and the fewest on Chr.11, covering a total of 947.89 Mbp of the soybean genome. The average marker density was 42.13 SNPs per Mbp (Figure 2). Based on these SNPs, principal component analysis (PCA) and kinship analysis were conducted for the association panel, which showed that the first principal components (PCs) could explain 13.57% of the whole genetic variation, and that the genetic relatedness among the tested varieties was low (Figure 2).

3.3. Quantitative Trait Nucleotide (QTN) Associated with Soybean Seed-Vigor-Related Traits by GWAS

The compressed mixed linear model (CMLM), implemented in an R package, was used to identify association signals for germination-related traits. In this study, a total of 25, 27, 25, and 88 quantitative trait nucleotides (QTNs) were identified on 20 chromosomes, associated with GA, GR, GI, and VI, respectively, over the span of three test years (Figure 3). Among these, 14 QTNs were consistently detected in more than two environments (Table 2). Notably, one QTN, rs46769428, located on Chr.08, coincided with a site previously reported to be associated with storage tolerance in soybeans. Additionally, eight QTNs (rs8468280 on Chr.03, rs12451980 and rs46324094 on Chr.04, rs249786 and rs5985722 on Chr.07, rs14369289 and rs7372359 on Chr.16, and rs42627530 on Chr.19) were found to be in close proximity to loci that have been reported to be associated with storage tolerance. The remaining five QTNs (rs25887810, rs27941858, rs33981296, rs44713950, and rs18610980) are reported for the first time in this study as being associated with storage tolerance in soybean seeds. Evaluating the different alleles and their effects at the identified QTNs revealed that these alleles significantly influenced the germination-related traits of the tested varieties (Table 2). Consequently, these alleles can be used to develop markers that effectively assist in the selection of storable soybean varieties.

3.4. Gene Enrichment Analysis of Candidate Genes

Candidate genes for storage tolerance were selected from the 200-kbp flanking region of the peak SNP of each of the 14 associated QTNs, according to the LD decay distance. A total of 317 genes were included in these regions, among which 53 genes had no functional annotation and 11 genes had unknown functions (Table S2). The remaining 253 genes were potentially involved in regulating the storage tolerance of soybean seeds (Table S2). These were related to catalytic activity, biological regulation, cellular process, cell part, protein-containing complex, and other functions (Figure 4). Among the identified candidate genes, Glyma.19G163900 and Glyma.19G164100, located near rs42627530 on Chr.19, encode ethylene response factors. Such factors are capable of binding to dehydration response elements and integrating hormone signals such as ethylene, abscisic acid, and jasmonates, regulating the plant’s defense responses to abiotic stress [48]. Glyma.16G073100, located near rs7372359 on Chr.16, encodes a monodehydroascorbate reductase. Monodehydroascorbate reductase has been reported to negatively regulate the accumulation of ascorbic acid in plant tissues [49], and its overexpression enhances plant tolerance to ozone, salt, and drought, thus improving seed storability [50]. Additionally, Glyma.16G071800, near rs7372359 on Chr.16, belongs to the oleosin family of proteins. Oleosins have been shown to increase seed viability during overwintering by preventing the abnormal fusion of oil bodies [51].

3.5. Identification of Key Modules Possessing Candidate Genes via WGCNA

To further identify novel genes regulating soybean seed storage tolerance, we selected 15 varieties with high storability and 15 varieties with low storability. Subsequently, transcriptomic and metabolomic analyses were performed on these 30 soybean varieties, resulting in the detection of 55,589 genes and 11,672 metabolites. After normalizing the metabolomics data, a total of 15 metabolites were obtained, including 10 known metabolites and 5 unknown metabolites. Metabolome analysis identified 10 differential metabolites, including neg01773 (10-hydroperoxy-8E,12Z-octadecadienoic acid), which is associated with linoleic acid metabolism, pos00019 ((−)-Epicatechin), which is involved in flavonoid biosynthesis, and pos00350 (Delphinidin 3-glucoside), which participates in anthocyanin biosynthesis (Table S3). Subsequently, WGCNA was conducted using the transcriptomic data and the 10 differential metabolites, resulting in the identification of 14 co-expression modules and 14,965 genes, with the ‘lightcyan’ module containing the highest number of genes (5055) and the ‘darkseagreen3’ module the fewest (40) (Figure 5).
To analyze the association between differential metabolites and modules, a correlation analysis was conducted and a heatmap was generated (df = 29) (Figure 5). The results showed that the ‘coral1’ module had a significant positive correlation with the differential metabolite neg01773 (10-hydroperoxy-8E,12Z-octadecadienoic acid) (r = 0.63, p = 2 × 10⁻⁴). Additionally, the ‘darkseagreen3’ module exhibited a significant negative correlation with the differential metabolite pos00678 (Quercetagetin 3′-methylether 7-glucoside) (r = −0.47, p = 0.008). Based on the absolute value of the correlation between modules and significant differential metabolites, modules with a significant association (p ≤ 0.01) were selected, including the ‘coral1’ module, ‘darkseagreen3’ module, ‘darkviolet’ module, ‘brown2’ module, and ‘darkgreen’ module. It was speculated that the genes within these modules were related to the storability of soybeans.
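The module-metabolite correlation test can be sketched in Python as follows. This is an illustrative calculation only: it assumes a Pearson correlation between a module eigengene and a metabolite level across n = 30 samples, with the usual t statistic on n − 2 degrees of freedom. The function names are hypothetical, and the critical value is the standard two-sided 0.01 table value for 28 degrees of freedom, not a figure from the study.

```python
from math import sqrt

T_CRIT_P01_DF28 = 2.763  # two-sided p = 0.01, 28 degrees of freedom (t table)


def pearson_r(x, y):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); undefined for |r| = 1."""
    return r * sqrt((n - 2) / (1 - r * r))


def significant_at_p01(r, n=30):
    """True when |t| exceeds the tabulated critical value (valid for n = 30 only)."""
    return abs(t_statistic(r, n)) > T_CRIT_P01_DF28
```

With n = 30, a correlation of 0.63 gives t of about 4.29 and a correlation of −0.47 gives t of about −2.82, so both clear the 0.01 cutoff under these assumptions, while the sign of r carries the direction of the association.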
To understand the biological significance of the co-expression networks and further determine the relationship between genes within the modules and seed storage tolerance, KEGG enrichment analysis was conducted on the candidate genes within the ‘coral1’, ‘darkseagreen3’, ‘darkviolet’, ‘brown2’, and ‘darkgreen’ modules (Figure 6). The results revealed that these genes were significantly enriched in metabolic pathways such as photosynthesis, carbon metabolism, biosynthesis of amino acids, spliceosome, glycolysis/gluconeogenesis, and ribosome biogenesis in eukaryotes. Subsequently, GO enrichment analysis was performed on the genes within the modules (Figure 6). The most significant annotation terms were identified in the biological process category, with the top five enriched terms related to photosynthesis (GO:0042440, GO:0009765, GO:0019684), generation of precursor metabolites and energy (GO:0006091), and ribosome biogenesis (GO:0042254).

3.6. Prediction of Candidate Genes for Storage Tolerant Traits

To obtain key genes regulating seed storability, we combined the results of GWAS and WGCNA screening and ultimately identified eight candidate genes for storability, including Glyma.03G058300 (located near rs8468280 on Chr.03) from the ‘coral1’ module, Glyma.07G002000 (located near rs249786 on Chr.07) from the ‘darkviolet’ module, and Glyma.16G091400 (located near rs14369289 on Chr.16), Glyma.16G074600 (located near rs7372359 on Chr.16), Glyma.04G192100 (located near rs46324094 on Chr.04), Glyma.04G192600 (located near rs46324094 on Chr.04), Glyma.04G192900 (located near rs46324094 on Chr.04), and Glyma.08G329400 (located near rs44713950 on Chr.08) from the ‘darkgreen’ module (Figure 7). To analyze the expression patterns of these candidate genes, soybean RNA-seq data from the SoyMD platform were used. Two genes (Glyma.03G058300, Glyma.16G074600) were found to be highly expressed in seeds, suggesting that these two genes may be associated with soybean storability (Figure 7).
In order to determine the role of candidate genes in the storage tolerance of soybean, haplotype analysis of the candidate genes was conducted using the GLM method based on gene association (Figure 8). The results revealed 14 SNPs in the Glyma.03G058300 gene and 17 SNPs in the Glyma.16G074600 gene (Table 3). Among these, the Glyma.03G058300 gene was found to be associated with seed storability, while the Glyma.16G074600 gene was found to show no significant correlation with seed storability.

4. Discussion

Soybean seeds, rich in vegetable oil and plant protein, are prone to deterioration under high temperatures and high relative humidity, resulting in reduced germination ability and seedling vigor, thereby affecting yield [52]. Different soybean cultivars (or genotypes) exhibit significantly different levels of resistance to adverse storage conditions [53]. Therefore, screening and breeding cultivars with resistance to adverse storage conditions is the most promising way to combat soybean seed deterioration in storage. Currently, the storage tolerance of soybean seeds is mainly assessed based on the germination rate of seeds [54,55]. Therefore, 168 soybean accessions stored in different years were evaluated based on the germination rate and related characteristics. Among them, only three cultivars showed high germination rates in different storage years. Thus, these three accessions may be useful in future breeding.
Seed storability is a complex quantitative trait controlled by multiple genes/QTLs [56]. In particular, soybean seed traits are considered complex due to multi-gene control and are highly influenced by the environment and hence difficult to manipulate. QTLs for storage-related traits have been identified in plants such as Arabidopsis thaliana [8,57], rice [58], maize [59], barley [60], wheat [61], and tomato [62], but there has been limited research on soybean. In the research on QTLs for soybean storage tolerance, a total of 17 QTLs associated with seed storability were identified and distributed across 15 chromosomes [21,23], but no overlapping QTL regions were observed among these QTLs [61,63]. These QTLs were detected through seed storage indicators such as germination rate, seed coat permeability, and electrical conductivity. In this study, based on earlier obtained phenotypic data and 23,156 SNPs, a total of 14 QTNs were detected to be associated with the storage tolerance of soybean seeds. Among them, rs46769428 largely overlapped with Satt538 detected by Singh et al. [47]. Nine loci, including rs8468280, overlap or are in close proximity to the QTLs for seed storage tolerance traits identified by previous researchers [64,65,66]. The remaining four QTNs (rs25887810, rs27941858, rs33981296, and rs18610980) were new loci reported in this study. Earlier genotype analyses commonly used SSR and other low-density molecular markers, which limited the accuracy of QTL localization [67,68]. Meanwhile, genotyping techniques based on high-throughput sequencing can greatly improve the detection of high-density SNP markers, which supports the fine mapping of storage tolerance genes and the identification of candidate genes in soybean seeds [69]. This study could lay a foundation for revealing the genetic mechanism of storage tolerance and the breeding potential of soybean seeds [70].
WGCNA is an effective technique for classifying transcriptomic data into co-expression modules to reduce the number of potential candidate genes. This study used metabolomic data to screen for differential metabolites, and enrichment analysis showed that the selected metabolites were involved in the linoleic acid metabolism, anthocyanin biosynthesis, and flavonoid biosynthesis pathways. The storability of soybean seeds is influenced by various factors. Naflath et al. [71] reported that the storability of seeds was related to seed coat color and lignin content. Yu et al. [72] used high-throughput transcriptome sequencing to screen for genes in high- and low-protein and oil lines and found that these were associated with the accumulation of substances during seed storage and seed growth and development. Lee et al. [73] found that the concentration of flavonoid metabolites in seeds significantly decreased with an increase in storage time. Therefore, the differential metabolites selected in this study could help identify genes related to soybean seed storability. In this study, out of 14 modules, 5 modules showed a significant correlation with the levels of six differential metabolites. The ‘coral1’ module had a highly significant positive correlation with the metabolite neg01773 (10-hydroperoxy-8E,12Z-octadecadienoic acid) in the linoleic acid pathway. The genes present in these modules were involved in carbon metabolism and the production of precursor metabolites and energy. In summary, the identification of genes participating in these modules provided new genetic resources for a better understanding of soybean seed storability.
Seed aging reduces seed viability and germination, and soybean seeds trigger a series of protective mechanisms in response. The processes involved include the oxidation of cellular biomolecules [74], DNA repair [75,76], and protein denaturation [77].
Although GWAS can identify the relationship between SNPs and traits, it may not accurately pinpoint candidate genes. Therefore, a combined analysis strategy using both GWAS and WGCNA can enhance the identification of candidate genes. In this study, a total of eight overlapping candidate genes were identified, with Glyma.03G058300 and Glyma.16G074600 showing high expression levels in seeds, suggesting their potential association with seed storability. Additionally, this study provides SNP markers for breeding for seed storability. In this research, the Glyma.03G058300 gene was found to have two haplotypes in the exon and upstream regions, with the superior haplotype exhibiting a higher vitality index level. The results indicate that the Glyma.03G058300 gene may be valuable for marker-assisted selection (MAS) of soybean seed storability.
Vijayakumar et al. [78] found that the storage tolerance of soybean seeds differs among varieties. Seed storability is regulated through multiple pathways and is controlled by multiple genes. To date, no studies have applied GWAS and WGCNA together to elucidate the gene networks and molecular regulatory mechanisms controlling the storability of soybean seeds. Based on seed germination rate and other traits, this study combined GWAS and WGCNA and, through expression pattern analysis and haplotype analysis, ultimately identified the candidate gene Glyma.03G058300. Glyma.03G058300 encodes the cation exchanger 3 protein (CAX3). CAX3 belongs to the cation exchanger (CAX) gene family, which plays a crucial role in plant growth, development, and responses to biotic and abiotic stresses. CAX3 actively transports cytosolic Ca2+ into the vacuole, thereby helping to maintain intracellular calcium homeostasis. Studies have reported that in soybeans, the CAX3 encoded by the Glyma.13G343300 gene can form complexes with CAX1 and play a role in various processes, including phosphate homeostasis and heavy metal tolerance [79,80]. Park et al. [81] reported that expressing H+/Ca2+ transporters in tomatoes increased calcium content and prolonged shelf life, offering an alternative to CaCl2 application for prolonging the shelf life of agricultural products. However, the functional mechanism by which Glyma.03G058300 regulates the storability of soybean seeds needs further study.

5. Conclusions

This study suggests that using natural populations for GWAS is an effective strategy for identifying candidate genes in soybean. By integrating GWAS and WGCNA with expression pattern analysis, two candidate genes potentially involved in seed storability were identified. Natural variation in Glyma.03G058300 was associated with differences in soybean storability. Functional analysis of Glyma.03G058300 would provide new insights into the mechanisms underlying soybean seed storability.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agronomy14112457/s1, Table S1: Evaluation of storage tolerance traits; Table S2: Gene models in the flanking regions of peak SNPs; Table S3: Differentially metabolized compounds related to soybean storage tolerance; Table S4: Metabolome data for the 30 varieties studied.

Author Contributions

Conceptualization, X.W., Y.W. and Z.Y.; methodology, J.X.; software, H.L.; validation, H.L., Y.L. and W.T.; formal analysis, J.X. and Z.Y.; investigation, Y.L.; resources, X.Z.; data curation, Y.H.; writing—original draft preparation, X.W. and Y.W.; writing—review and editing, Y.Z.; visualization, W.T.; supervision, X.Z. and Y.Z.; project administration, Y.H.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This study was conducted in the Key Laboratory of Soybean Biology of the Chinese Education Ministry, the Soybean Research & Development Center (CARS), and the Key Laboratory of Northeastern Soybean Biology and Breeding/Genetics of the Chinese Agriculture Ministry. It was financially supported by the National Key Research & Development Project (2021YFD1201103); the Key Laboratory of Soybean Mechanized Production, Ministry of Agriculture and Rural Affairs, People’s Republic of China (Grant No. SMP202206); the Chinese National Natural Science Foundation (31971967, U22A20473); the Youth Leading Talent Project of the Ministry of Science and Technology in China (2015RA228); the National Ten-thousand Talents Program; the Postdoctoral Fund in Heilongjiang Province (LBH-Q20004); and a national project (CARS-04-PS06).

Data Availability Statement

The RNA-Seq data (BioProject ID: PRJNA1139955) are accessible at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1139955 (accessed on 12 August 2024). All figures and data are included in the manuscript and Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pirredda, M.; Fañanás-Pueyo, I.; Oñate-Sánchez, L.; Mira, S. Seed Longevity and Ageing: A Review on Physiological and Genetic Factors with an Emphasis on Hormonal Regulation. Plants 2024, 13, 41.
  2. Souza, A.D.V.; Santos, D.; Rodrigues, A.A.; Zuchi, J.; Vieira, M.C.; Sales, J.F. Physical and Physiological Soybean Seed Qualities Stored under Different Environmental Conditions and Storage Bag Depths. Braz. J. Biol. 2023, 83, e277916.
  3. Shu, Y.; Zhou, Y.; Mu, K.; Hu, H.; Chen, M.; He, Q.; Huang, S.; Ma, H.; Yu, X. A Transcriptomic Analysis Reveals Soybean Seed Pre-Harvest Deterioration Resistance Pathways under High Temperature and Humidity Stress. Genome 2020, 63, 115–124.
  4. Zhou, W.; Chen, F.; Luo, X.; Dai, Y.; Yang, Y.; Zheng, C.; Yang, W.; Shu, K. A Matter of Life and Death: Molecular, Physiological, and Environmental Regulation of Seed Longevity. Plant Cell Environ. 2020, 43, 293–302.
  5. Nagel, M.; Kranner, I.; Neumann, K.; Rolletschek, H.; Seal, C.E.; Colville, L.; Fernández-Marín, B.; Börner, A. Genome-Wide Association Mapping and Biochemical Markers Reveal That Seed Ageing and Longevity Are Intricately Affected by Genetic Background and Developmental and Environmental Conditions in Barley. Plant Cell Environ. 2015, 38, 1011–1022.
  6. Zheng, Q.; Teng, Z.; Zhang, J.; Ye, N. ABA Inhibits Rice Seed Aging by Reducing H2O2 Accumulation in the Radicle of Seeds. Plants 2024, 13, 809.
  7. Dargahi, H.; Tanya, P.; Srinives, P. Mapping of the Genomic Regions Controlling Seed Storability in Soybean (Glycine max L.). J. Genet. 2014, 93, 365–370.
  8. Bentsink, L.; Alonso-Blanco, C.; Vreugdenhil, D.; Tesnier, K.; Groot, S.P.; Koornneef, M. Genetic Analysis of Seed-Soluble Oligosaccharides in Relation to Seed Storability of Arabidopsis. Plant Physiol. 2000, 124, 1595–1604.
  9. Ramtekey, V.; Cherukuri, S.; Kumar, S.; Kudekallu V., S.; Sheoran, S.; Bhaskar K., U.; Naik K., B.; Kumar, S.; Singh, A.N.; Singh, H.V. Seed Longevity in Legumes: Deeper Insights Into Mechanisms and Molecular Perspectives. Front. Plant Sci. 2022, 13, 918206.
  10. Lee, J.; Welti, R.; Roth, M.; Schapaugh, W.T.; Li, J.; Trick, H.N. Enhanced Seed Viability and Lipid Compositional Changes during Natural Ageing by Suppressing Phospholipase Dα in Soybean Seed. Plant Biotechnol. J. 2012, 10, 164–173.
  11. Jong, C.; Yu, Z.; Zhang, Y.; Choe, K.; Uh, S.; Kim, K.; Jong, C.; Cha, J.; Kim, M.; Kim, Y.; et al. Multi-Omics Analysis of a Chromosome Segment Substitution Line Reveals a New Regulation Network for Soybean Seed Storage Profile. Int. J. Mol. Sci. 2024, 25, 5614.
  12. Nakajima, S.; Ito, H.; Tanaka, R.; Tanaka, A. Chlorophyll b Reductase Plays an Essential Role in Maturation and Storability of Arabidopsis Seeds. Plant Physiol. 2012, 160, 261–273.
  13. Rissel, D.; Losch, J.; Peiter, E. The Nuclear Protein Poly(ADP-Ribose) Polymerase 3 (AtPARP3) Is Required for Seed Storability in Arabidopsis Thaliana. Plant Biol. 2014, 16, 1058–1064.
  14. Yan, S.; Huang, W.; Gao, J.; Fu, H.; Liu, J. Comparative Metabolomic Analysis of Seed Metabolites Associated with Seed Storability in Rice (Oryza sativa L.) during Natural Aging. Plant Physiol. Biochem. 2018, 127, 590–598.
  15. Yuan, Z.; Fan, K.; Xia, L.; Ding, X.; Tian, L.; Sun, W.; He, H.; Yu, S. Genetic Dissection of Seed Storability and Validation of Candidate Gene Associated with Antioxidant Capability in Rice (Oryza sativa L.). Int. J. Mol. Sci. 2019, 20, 4442.
  16. Gong, S.; Ding, Y.; Huang, S.; Zhu, C. Identification of miRNAs and Their Target Genes Associated with Sweet Corn Seed Vigor by Combined Small RNA and Degradome Sequencing. J. Agric. Food Chem. 2015, 63, 5485–5491.
  17. Song, Y.; Lv, Z.; Wang, Y.; Li, C.; Jia, Y.; Zhu, Y.; Cao, M.; Zhou, Y.; Zeng, X.; Wang, Z.; et al. Identification of miRNAs Mediating Seed Storability of Maize during Germination Stage by High-Throughput Sequencing, Transcriptome and Degradome Sequencing. Int. J. Mol. Sci. 2022, 23, 12339.
  18. Groot, S.P.C.; Surki, A.A.; de Vos, R.C.H.; Kodde, J. Seed Storage at Elevated Partial Pressure of Oxygen, a Fast Method for Analysing Seed Ageing under Dry Conditions. Ann. Bot. 2012, 110, 1149–1159.
  19. Singh, R.K.; Raipuria, R.K.; Bhatia, V.S.; Rani, A.; Pushpendra; Husain, S.M.; Chauhan, D.; Chauhan, G.S.; Mohapatra, T. SSR Markers Associated with Seed Longevity in Soybean. Seed Sci. Technol. 2008, 36, 162–167.
  20. Li, L.; Lin, Q.; Liu, S.; Liu, X.; Wang, W.; Hang, N.T.; Liu, F.; Zhao, Z.; Jiang, L.; Wan, J. Identification of Quantitative Trait Loci for Seed Storability in Rice (Oryza sativa L.). Plant Breed. 2012, 131, 739–743.
  21. Zhang, X.; Hina, A.; Song, S.; Kong, J.; Bhat, J.A.; Zhao, T. Whole-Genome Mapping Identified Novel “QTL Hotspots Regions” for Seed Storability in Soybean (Glycine max L.). BMC Genom. 2019, 20, 499.
  22. Zhou, Z.; Jiang, Y.; Wang, Z.; Gou, Z.; Lyu, J.; Li, W.; Yu, Y.; Shu, L.; Zhao, Y.; Ma, Y.; et al. Resequencing 302 Wild and Cultivated Accessions Identifies Genes Related to Domestication and Improvement in Soybean. Nat. Biotechnol. 2015, 33, 408–414.
  23. Wang, R.; Wu, F.; Xie, X.; Yang, C. Quantitative Trait Locus Mapping of Seed Vigor in Soybean under −20 °C Storage and Accelerated Aging Conditions via RAD Sequencing. Curr. Issues Mol. Biol. 2021, 43, 1977–1996.
  24. Hayes, B. Overview of Statistical Methods for Genome-Wide Association Studies (GWAS). Methods Mol. Biol. 2013, 1019, 149–169.
  25. He, J.; Gai, J. Genome-Wide Association Studies (GWAS). Methods Mol. Biol. 2023, 2638, 123–146.
  26. Langfelder, P.; Horvath, S. WGCNA: An R Package for Weighted Correlation Network Analysis. BMC Bioinform. 2008, 9, 559.
  27. Wang, Y.; Wang, Y.; Liu, X.; Zhou, J.; Deng, H.; Zhang, G.; Xiao, Y.; Tang, W. WGCNA Analysis Identifies the Hub Genes Related to Heat Stress in Seedling of Rice (Oryza sativa L.). Genes 2022, 13, 1020.
  28. Zong, J.; Chen, P.; Luo, Q.; Gao, J.; Qin, R.; Wu, C.; Lv, Q.; Zhao, T.; Fu, Y. Transcriptome-Based WGCNA Analysis Reveals the Mechanism of Drought Resistance Differences in Sweetpotato (Ipomoea batatas (L.) Lam.). Int. J. Mol. Sci. 2023, 24, 14398.
  29. Azam, M.; Zhang, S.; Li, J.; Ahsan, M.; Agyenim-Boateng, K.G.; Qi, J.; Feng, Y.; Liu, Y.; Li, B.; Qiu, L.; et al. Identification of Hub Genes Regulating Isoflavone Accumulation in Soybean Seeds via GWAS and WGCNA Approaches. Front. Plant Sci. 2023, 14, 1120498.
  30. Zhao, X.; Zhang, Y.; Wang, J.; Zhao, X.; Li, Y.; Teng, W.; Han, Y.; Zhan, Y. GWAS and WGCNA Analysis Uncover Candidate Genes Associated with Oil Content in Soybean. Plants 2024, 13, 1351.
  31. Zhang, Z.-H.; Yu, S.-B.; Yu, T.; Huang, Z.; Zhu, Y.-G. Mapping Quantitative Trait Loci (QTLs) for Seedling-Vigor Using Recombinant Inbred Lines of Rice (Oryza sativa L.). Field Crops Res. 2005, 91, 161–170.
  32. Zhang, X.; Xu, M.; Hina, A.; Kong, J.; Gai, J.; He, X.; Zhao, T. Seed Storability of Summer-Planting Soybeans under Natural and Artificial Aging Conditions. Legume Res. 2019, 42, 250–259.
  33. Xia, Y.; Chen, F.; Du, Y.; Liu, C.; Bu, G.; Xin, Y.; Liu, B. A Modified SDS-Based DNA Extraction Method from Raw Soybean. Biosci. Rep. 2019, 39, BSR20182271.
  34. Wang, K.; Li, M.; Hakonarson, H. ANNOVAR: Functional Annotation of Genetic Variants from High-Throughput Sequencing Data. Nucleic Acids Res. 2010, 38, e164.
  35. Quinlan, A.R. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr. Protoc. Bioinform. 2014, 47, 11.12.1–11.12.34.
  36. Ma, L.; Qing, C.; Zhang, M.; Zou, C.; Pan, G.; Shen, Y. GWAS with a PCA Uncovers Candidate Genes for Accumulations of Microelements in Maize Seedlings. Physiol. Plant 2021, 172, 2170–2180.
  37. Lipka, A.E.; Tian, F.; Wang, Q.; Peiffer, J.; Li, M.; Bradbury, P.J.; Gore, M.A.; Buckler, E.S.; Zhang, Z. GAPIT: Genome Association and Prediction Integrated Tool. Bioinformatics 2012, 28, 2397–2399.
  38. Wang, J.; Zhao, X.; Wang, W.; Qu, Y.; Teng, W.; Qiu, L.; Zheng, H.; Han, Y.; Li, W. Genome-Wide Association Study of Inflorescence Length of Cultivated Soybean Based on the High-throughout Single-Nucleotide Markers. Mol. Genet. Genom. 2019, 294, 607–620.
  39. Sedgwick, P. Multiple Hypothesis Testing and Bonferroni’s Correction. BMJ 2014, 349, g6284.
  40. Kanehisa, M.; Furumichi, M.; Sato, Y.; Kawashima, M.; Ishiguro-Watanabe, M. KEGG for Taxonomy-Based Analysis of Pathways and Genomes. Nucleic Acids Res. 2023, 51, D587–D592.
  41. Rio, D.C.; Ares, M.; Hannon, G.J.; Nilsen, T.W. Purification of RNA Using TRIzol (TRI Reagent). Cold Spring Harb. Protoc. 2010, 2010, pdb.prot5439.
  42. Jeon, S.A.; Park, J.L.; Kim, J.-H.; Kim, J.H.; Kim, Y.S.; Kim, J.C.; Kim, S.-Y. Comparison of the MGISEQ-2000 and Illumina HiSeq 4000 Sequencing Platforms for RNA Sequencing. Genom. Inf. 2019, 17, e32.
  43. Kim, D.; Paggi, J.M.; Park, C.; Bennett, C.; Salzberg, S.L. Graph-Based Genome Alignment and Genotyping with HISAT2 and HISAT-Genotype. Nat. Biotechnol. 2019, 37, 907–915.
  44. Wang, R.; Wang, Y.; Yao, W.; Ge, W.; Jiang, T.; Zhou, B. Transcriptome Sequencing and WGCNA Reveal Key Genes in Response to Leaf Blight in Poplar. Int. J. Mol. Sci. 2023, 24, 10047.
  45. DiLeo, M.V.; Strahan, G.D.; den Bakker, M.; Hoekenga, O.A. Weighted Correlation Network Analysis (WGCNA) Applied to the Tomato Fruit Metabolome. PLoS ONE 2011, 6, e26683.
  46. Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N.S.; Wang, J.T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 2003, 13, 2498–2504.
  47. Singh, R.K.; Raipuria, R.K.; Bhatia, V.S.; Rani, A.; Pushpendra; Husain, S.M.; Tara Satyavathi, C.; Chauhan, G.S.; Mohapatra, T. Identification of SSR Markers Associated with Seed Coat Permeability and Electrolyte Leaching in Soybean. Physiol. Mol. Biol. Plants 2008, 14, 173–177.
  48. Müller, M.; Munné-Bosch, S. Ethylene Response Factors: A Key Regulatory Hub in Hormone and Stress Signaling. Plant Physiol. 2015, 169, 32–41.
  49. Jia, D.; Gao, H.; He, Y.; Liao, G.; Lin, L.; Huang, C.; Xu, X. Kiwifruit Monodehydroascorbate Reductase 3 Gene Negatively Regulates the Accumulation of Ascorbic Acid in Fruit of Transgenic Tomato Plants. Int. J. Mol. Sci. 2023, 24, 17182.
  50. Murthy, S.S.; Zilinskas, B.A. Molecular Cloning and Characterization of a cDNA Encoding Pea Monodehydroascorbate Reductase. J. Biol. Chem. 1994, 269, 31129–31133.
  51. Shao, Q.; Liu, X.; Su, T.; Ma, C.; Wang, P. New Insights Into the Role of Seed Oil Body Proteins in Metabolism and Plant Development. Front. Plant Sci. 2019, 10, 1568.
  52. Xu, W.; Wang, Q.; Zhang, W.; Zhang, H.; Liu, X.; Song, Q.; Zhu, Y.; Cui, X.; Chen, X.; Chen, H. Using Transcriptomic and Metabolomic Data to Investigate the Molecular Mechanisms That Determine Protein and Oil Contents during Seed Development in Soybean. Front. Plant Sci. 2022, 13, 1012394.
  53. Ludwig, V.; Berghetti, M.R.P.; Ribeiro, S.R.; Rossato, F.P.; Wendt, L.M.; Thewes, F.R.; Thewes, F.R.; Brackmann, A.; Both, V.; Wagner, R. The Effects of Soybean Storage under Controlled Atmosphere at Different Temperatures on Lipid Oxidation and Volatile Compounds Profile. Food Res. Int. 2021, 147, 110483.
  54. Kim, H.T.; Choi, U.-K.; Ryu, H.S.; Lee, S.J.; Kwon, O.-S. Mobilization of Storage Proteins in Soybean Seed (Glycine max L.) during Germination and Seedling Growth. Biochim. Biophys. Acta 2011, 1814, 1178–1187.
  55. Fleming, M.B.; Hill, L.M.; Walters, C. The Kinetics of Ageing in Dry-Stored Seeds: A Comparison of Viability Loss and RNA Degradation in Unique Legacy Seed Collections. Ann. Bot. 2019, 123, 1133–1146.
  56. Wu, F.; Luo, X.; Wang, L.; Wei, Y.; Li, J.; Xie, H.; Zhang, J.; Xie, G. Genome-Wide Association Study Reveals the QTLs for Seed Storability in World Rice Core Collections. Plants 2021, 10, 812.
  57. Hartanto, M.; Joosen, R.V.L.; Snoek, B.L.; Willems, L.A.J.; Sterken, M.G.; de Ridder, D.; Hilhorst, H.W.M.; Ligterink, W.; Nijveen, H. Network Analysis Prioritizes DEWAX and ICE1 as the Candidate Genes for Major eQTL Hotspots in Seed Germination of Arabidopsis Thaliana. G3 2020, 10, 4215–4226.
  58. Liu, F.; Li, N.; Yu, Y.; Chen, W.; Yu, S.; He, H. Insights into the Regulation of Rice Seed Storability by Seed Tissue-Specific Transcriptomic and Metabolic Profiling. Plants 2022, 11, 1570.
  59. Li, L.; Wang, F.; Li, X.; Peng, Y.; Zhang, H.; Hey, S.; Wang, G.; Wang, J.; Gu, R. Comparative Analysis of the Accelerated Aged Seed Transcriptome Profiles of Two Maize Chromosome Segment Substitution Lines. PLoS ONE 2019, 14, e0216977.
  60. Shvachko, N.A.; Khlestkina, E.K. Molecular Genetic Bases of Seed Resistance to Oxidative Stress during Storage. Vavilovskii Zhurnal Genet. Sel. 2020, 24, 451–458.
  61. Shi, H.; Guan, W.; Shi, Y.; Wang, S.; Fan, H.; Yang, J.; Chen, W.; Zhang, W.; Sun, D.; Jing, R. QTL Mapping and Candidate Gene Analysis of Seed Vigor-Related Traits during Artificial Aging in Wheat (Triticum aestivum). Sci. Rep. 2020, 10, 22060.
  62. Bizouerne, E.; Ly Vu, J.; Ly Vu, B.; Diouf, I.; Bitton, F.; Causse, M.; Verdier, J.; Buitink, J.; Leprince, O. Genetic Variability in Seed Longevity and Germination Traits in a Tomato MAGIC Population in Contrasting Environments. Plants 2023, 12, 3632.
  63. Tian, R.; Kong, Y.; Shao, Z.; Zhang, H.; Li, X.; Zhang, C. Discovery of Genetic Loci and Causal Genes for Seed Germination via Deep Re-Sequencing in Soybean. Mol. Breed. 2022, 42, 45.
  64. Cao, Y.; Zhang, X.; Jia, S.; Karikari, B.; Zhang, M.; Xia, Z.; Zhao, T.; Liang, F. Genome-Wide Association among Soybean Accessions for the Genetic Basis of Salinity-Alkalinity Tolerance during Germination. Crop Pasture Sci. 2021, 72, 255–267.
  65. Zhang, W.; Xu, W.; Zhang, H.; Liu, X.; Cui, X.; Li, S.; Song, L.; Zhu, Y.; Chen, X.; Chen, H. Comparative Selective Signature Analysis and High-Resolution GWAS Reveal a New Candidate Gene Controlling Seed Weight in Soybean. Theor. Appl. Genet. 2021, 134, 1329–1341.
  66. Liu, Z.; Li, H.; Gou, Z.; Zhang, Y.; Wang, X.; Ren, H.; Wen, Z.; Kang, B.-K.; Li, Y.; Yu, L.; et al. Genome-Wide Association Study of Soybean Seed Germination under Drought Stress. Mol. Genet. Genom. 2020, 295, 661–673.
  67. Nissan, N.; Hooker, J.; Pattang, A.; Charette, M.; Morrison, M.; Yu, K.; Hou, A.; Golshani, A.; Molnar, S.J.; Cober, E.R.; et al. Novel QTL for Low Seed Cadmium Accumulation in Soybean. Plants 2022, 11, 1146.
  68. Hu, Z.; Zhang, D.; Zhang, G.; Kan, G.; Hong, D.; Yu, D. Association Mapping of Yield-Related Traits and SSR Markers in Wild Soybean (Glycine soja Sieb. and Zucc.). Breed. Sci. 2014, 63, 441–449.
  69. Alam, M.; Wang, Y.; Chen, J.; Lou, G.; Yang, H.; Zhou, Y.; Luitel, S.; Jiang, G.; He, Y. QTL Detection for Rice Grain Storage Protein Content and Genetic Effect Verifications. Mol. Breed. 2023, 43, 89.
  70. Scheben, A.; Batley, J.; Edwards, D. Genotyping-by-Sequencing Approaches to Characterize Crop Genomes: Choosing the Right Tool for the Right Application. Plant Biotechnol. J. 2017, 15, 149–161.
  71. Naflath, T.V.; Rajendraprasad, S.; Ravikumar, R.L. Evaluation of Diverse Soybean Genotypes for Seed Longevity and Its Association with Seed Coat Colour. Sci. Rep. 2023, 13, 4313.
  72. Yu, J.-Y.; Zhang, Z.-G.; Huang, S.-Y.; Han, X.; Wang, X.-Y.; Pan, W.-J.; Qin, H.-T.; Qi, H.-D.; Yin, Z.-G.; Qu, K.-X.; et al. Analysis of miRNAs Targeted Storage Regulatory Genes during Soybean Seed Development Based on Transcriptome Sequencing. Genes 2019, 10, 408.
  73. Lee, S.J.; Ahn, J.K.; Kim, S.H.; Kim, J.T.; Han, S.J.; Jung, M.Y.; Chung, I.M. Variation in Isoflavone of Soybean Cultivars with Location and Storage Duration. J. Agric. Food Chem. 2003, 51, 3382–3389.
  74. Bailly, C. The Signalling Role of ROS in the Regulation of Seed Germination and Dormancy. Biochem. J. 2019, 476, 3019–3032.
  75. Waterworth, W.; Balobaid, A.; West, C. Seed Longevity and Genome Damage. Biosci. Rep. 2024, 44, BSR20230809.
  76. Pagano, A.; Araújo, S.D.S.; Macovei, A.; Leonetti, P.; Balestrazzi, A. The Seed Repair Response during Germination: Disclosing Correlations between DNA Repair, Antioxidant Response, and Chromatin Remodeling in Medicago Truncatula. Front. Plant Sci. 2017, 8, 1972.
  77. Min, C.W.; Lee, S.H.; Cheon, Y.E.; Han, W.Y.; Ko, J.M.; Kang, H.W.; Kim, Y.C.; Agrawal, G.K.; Rakwal, R.; Gupta, R.; et al. In-Depth Proteomic Analysis of Glycine max Seeds during Controlled Deterioration Treatment Reveals a Shift in Seed Metabolism. J. Proteom. 2017, 169, 125–135.
  78. Vijayakumar, H.P.; Vijayakumar, A.; Srimathi, P.; Somasundaram, G.; Prasad, S.R.; Natarajan, S.; Dhandapani, R.; Boraiah, K.M.; Vishwanath, K. Biochemical Changes in Naturally Aged Seeds of Soybean Genotypes with Good and Poor Storability. Legume Res. 2019, 42, 782–788.
  79. Manohar, M.; Shigaki, T.; Mei, H.; Park, S.; Marshall, J.; Aguilar, J.; Hirschi, K.D. Characterization of Arabidopsis Ca2+/H+ Exchanger CAX3. Biochemistry 2011, 50, 6189–6195.
  80. Cheng, N.-H.; Pittman, J.K.; Shigaki, T.; Lachmansingh, J.; LeClere, S.; Lahner, B.; Salt, D.E.; Hirschi, K.D. Functional Association of Arabidopsis CAX1 and CAX3 Is Required for Normal Growth and Ion Homeostasis. Plant Physiol. 2005, 138, 2048–2060.
  81. Park, S.; Cheng, N.H.; Pittman, J.K.; Yoo, K.S.; Park, J.; Smith, R.H.; Hirschi, K.D. Increased Calcium Levels and Prolonged Shelf Life in Tomatoes Expressing Arabidopsis H+/Ca2+ Transporters. Plant Physiol. 2005, 139, 1194–1206.
Figure 1. The germination-related traits of soybean stored in different years.
Figure 2. Mapping of SNP genetic data in associated populations. (A) Distribution of SNP markers among 20 chromosomes. (B) LD decay of the genome-wide association study (GWAS) population. (C) Population structure of soybean germplasm. (D) The first three principal components reflected by SNPs used in the GWAS. (E) A heatmap of the kinship matrix of the 168 soybean accessions.
Figure 3. Results of association mapping for storage tolerance of soybean. (A) Germinability rate; (B) Germinability; (C) Germinability index; (D) Vitality index. The black line on each subgraph indicates the −log10(p-value) significance threshold.
Figure 4. Functional categories of the genes in 100-kb flanking regions around peak SNPs.
Figure 5. WGCNA reveals modules associated with differentially metabolized genes. (A) Dendrogram of average network adjacency for identifying gene co-expression modules. (B) Pie chart of the number of genes in each module. (C) Analysis of the correlation between gene modules and traits.
Figure 6. KEGG enrichment and GO annotation of genes within the significantly different modules. (A) KEGG enrichment for genes in the five significantly different modules. (B) GO annotation for genes in the five significantly different modules.
Figure 7. Mining candidate genes by integrating GWAS and WGCNA. (A) The number of overlapping genes obtained by combining GWAS and WGCNA. (B) Co-expression network analysis of candidate genes in the ‘coral1’ module. Red triangles represent overlapping genes. (C) Co-expression network analysis of candidate genes in the ‘darkviolet’ module. (D) Co-expression network analysis of candidate genes in the ‘darkgreen’ module. (E) Analysis of the expression pattern of candidate genes in the seeds.
Figure 8. Candidate gene association and haplotype analysis. * indicates significance in variance analysis at p < 0.05.
Table 1. Statistical analysis and variation of seed vigor under different traits.
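| Trait | Year | Min a | Max a | Mean a | SD b | CV (%) c | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|---|
| Germinability | 2014 | 0.00 | 60.00 | 17.39 | 15.18 | 87.26 | 0.52 | −0.48 |
| | 2015 | 0.00 | 97.00 | 31.35 | 18.69 | 59.64 | 0.74 | 0.64 |
| | 2016 | 0.00 | 100.00 | 37.21 | 15.86 | 42.62 | 0.32 | 1.25 |
| | Average | 0.00 | 85.67 | 28.65 | 17.70 | 63.17 | 0.53 | 0.47 |
| Germinability rate | 2014 | 0.00 | 78.00 | 26.19 | 19.30 | 67.58 | 0.59 | −0.06 |
| | 2015 | 10.00 | 100.00 | 44.58 | 17.90 | 43.30 | 0.60 | −0.05 |
| | 2016 | 13.00 | 100.00 | 47.33 | 16.40 | 37.83 | 0.62 | 0.23 |
| | Average | 7.67 | 92.67 | 39.37 | 16.82 | 49.57 | 0.60 | 0.04 |
| Germinability index | 2014 | 0.00 | 100.75 | 25.18 | 16.91 | 65.13 | 1.01 | 2.09 |
| | 2015 | 4.06 | 107.43 | 28.27 | 112.37 | 59.51 | 1.06 | 2.28 |
| | 2016 | 4.06 | 79.61 | 38.66 | 108.51 | 43.74 | −0.05 | −0.59 |
| | Average | 2.71 | 95.93 | 30.70 | 178.79 | 56.13 | 0.67 | 1.26 |
| Vitality index | 2014 | 0.00 | 490.27 | 105.33 | 15.18 | 106.69 | 1.45 | 1.61 |
| | 2015 | 2.01 | 495.74 | 112.57 | 18.69 | 96.40 | 1.60 | 2.65 |
| | 2016 | 2.49 | 754.77 | 215.11 | 15.86 | 83.11 | 0.87 | −0.11 |
| | Average | 1.50 | 580.26 | 144.34 | 17.70 | 95.40 | 1.31 | 1.38 |

a Units: germinability and germination rate, %; germinability index, unitless; vitality index, gram. b Standard deviation. c Coefficient of variation.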
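The coefficient of variation in Table 1 is conventionally defined as the standard deviation divided by the mean, expressed as a percentage. The short Python sketch below illustrates that definition using the rounded mean and SD printed for germinability in 2014. It is an illustration of the formula only, not a re-analysis of the data, and the function name is hypothetical. For this row the result is about 87.3%, which agrees with the tabulated 87.26% to within the rounding of the printed inputs.

```python
def coefficient_of_variation(mean, sd):
    """Return the CV as a percentage: 100 * SD / mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * sd / mean


# Rounded mean and SD for germinability in 2014, as printed in Table 1.
cv_2014 = coefficient_of_variation(mean=17.39, sd=15.18)
print(f"CV = {cv_2014:.1f}%")
```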
Table 2. Peak SNP and beneficial allele of the index of seed vigor resistance identified by GWAS.
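| SNP | Chr. | Position | Trait | Year | −Log10(P) | MAF | R² | Allele 1 | Allele 2 | Allele Effect | References |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rs25887810 | 1 | 25887810 | germination index | 2015 | 3.30 | 0.066 | 0.100 | C | T | 9.354 | |
| | | | vitality index | | 3.42 | | 0.109 | | | 60.898 | |
| rs27941858 | 1 | 27941858 | germination rate | 2015 | 3.31 | 0.048 | 0.131 | G | A | −12.509 | |
| | | | germination | | 3.37 | | 0.128 | | | −12.301 | |
| rs33981296 | 1 | 33981296 | germination index | 2016 | 3.07 | 0.054 | 0.104 | G | A | −9.849 | |
| | | | vitality index | | 3.67 | | 0.118 | | | −116.141 | |
| rs8468280 | 3 | 8468280 | vitality index | 2015 | 3.04 | 0.093 | 0.098 | G | A | −50.777 | |
| | | | | 2016 | 3.45 | | 0.112 | | | −90.582 | |
| rs12451980 | 4 | 12451980 | vitality index | 2014 | 3.22 | 0.111 | 0.123 | C | A | −51.458 | |
| | | | | 2015 | 3.14 | | 0.101 | | | −49.905 | |
| rs46324094 | 4 | 46324094 | vitality index | 2014 | 3.28 | 0.144 | 0.124 | T | C | −45.041 | |
| | | | | 2015 | 3.19 | | 0.102 | | | −44.039 | |
| rs249786 | 7 | 249786 | germination | 2016 | 3.57 | 0.147 | 0.113 | A | T | −6.939 | |
| | | | germination index | 2016 | 5.17 | | 0.164 | | | −9.130 | |
| rs5985722 | 7 | 5985722 | germination | 2016 | 3.08 | 0.380 | 0.100 | G | T | 4.471 | |
| | | | germination rate | 2016 | 5.12 | | 0.134 | | | 6.886 | |
| rs44713950 | 8 | 44713950 | germination rate | 2014 | 3.41 | 0.263 | 0.115 | C | T | −5.464 | |
| | | | germination index | 2014 | 3.29 | | 0.112 | | | −5.141 | |
| rs46769428 | 8 | 46769428 | germination rate | 2014 | 3.50 | 0.257 | 0.118 | C | A | −6.006 | Singh R.K., et al. [47] |
| | | | germination index | 2014 | 3.54 | | 0.119 | | | 1.877 | |
| rs18610980 | 14 | 18610980 | germination rate | 2014 | 3.16 | 0.251 | 0.108 | T | C | −5.662 | |
| | | | germination index | 2015 | 3.21 | | 0.097 | | | −5.806 | |
| rs14369289 | 16 | 14369289 | vitality index | 2014 | 3.07 | 0.084 | 0.118 | G | T | 60.125 | |
| | | | | 2015 | 3.23 | | 0.103 | | | 61.370 | |
| rs7372359 | 16 | 7372359 | vitality index | 2014 | 5.85 | 0.063 | 0.197 | A | T | 93.122 | |
| | | | | 2015 | 4.46 | | 0.138 | | | 78.454 | |
| rs42627530 | 19 | 42627530 | germination index | 2015 | 3.09 | 0.162 | 0.094 | G | T | 6.383 | |
| | | | vitality index | 2015 | 3.34 | | 0.106 | | | 45.477 | |
| | | | | 2014 | 3.57 | | 0.132 | | | 43.023 | |
| | | | germination | 2015 | 4.19 | | 0.151 | | | 8.425 | |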
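For readers who prefer raw P values, the −Log10(P) scores reported in Table 2 can be converted back with P = 10^(−score). The snippet below is a minimal illustration of that conversion for two of the tabulated peak SNPs. The helper name is hypothetical and the snippet makes no assumption about the significance threshold used in the study.

```python
def p_from_neg_log10(neg_log10_p):
    """Convert a -log10(P) score back to a P value."""
    return 10.0 ** (-neg_log10_p)


# -Log10(P) scores as printed in Table 2 for two peak SNPs.
for snp, score in [("rs25887810", 3.30), ("rs7372359", 5.85)]:
    print(f"{snp}: P = {p_from_neg_log10(score):.2e}")
```

Larger scores correspond to smaller P values, so a score of 5.85 indicates a much stronger association than a score of 3.30.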
Table 3. Gene-based association of candidate genes.
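| Gene ID | Chromosome | Physical Position (bp) | Region | Trait | Alleles | −Log10(P) | R² (%) | Functional Annotation |
|---|---|---|---|---|---|---|---|---|
| Glyma.03G058300 | 3 | 8286280 | intronic | VI15 | C/A | 2.581699 | 0.455 | cation exchanger 3 |
| | | 8287385 | intronic | VI16 | G/T | 2.853872 | 0.422 | |
| | | 8287566 | intronic | VI14 | G/A | 2.636388 | 0.397 | |
| | | 8287572 | intronic | VI15 | G/A | 5.137404 | 0.627 | |
| | | 8287590 | intronic | VI14 | A/G | 3.044288 | 0.362 | |
| | | 8287592 | intronic | VI15 | A/C | 5.137404 | 0.627 | |
| | | 8287668 | intronic | VI14 | G/A | 3.044288 | 0.362 | |
| | | 8287672 | intronic | VI14 | A/G | 3.044288 | 0.362 | |
| | | 8287673 | intronic | VI14 | T/C | 3.044288 | 0.362 | |
| | | 8287675 | intronic | VI14 | A/G | 3.044288 | 0.362 | |
| | | 8287680 | intronic | VI15 | C/A | 2.747147 | 0.41 | |
| | | 8287786 | intronic | VI15 | G/A | 3.31502 | 0.471 | |
| | | 8280562 | upstream | GR15 | C/T | 2.928118 | 0.349 | |
| | | 8288104 | UTR3 | VI15 | G/T | 2.576754 | 0.39 | |
| Glyma.16G074600 | 16 | 7490751 | intronic | VI15 | G/A | 3.035858 | 0.442 | breast basic conserved 1 |
| | | 7490754 | intronic | VI15 | G/A | 3.035858 | 0.442 | |
| | | 7492051 | intronic | GR14 | T/A | 2.530178 | 0.385 | |
| | | 7492068 | intronic | VI15 | A/G | 2.59176 | 0.456 | |
| | | 7492080 | intronic | VI15 | G/A | 2.537602 | 0.449 | |
| | | 7492199 | downstream | VI15 | T/A | 2.66354 | 0.464 | |
| | | 7492973 | downstream | GR14 | A/C | 3.672518 | 0.428 | |
| | | 7493438 | downstream | GR14 | G/A | 2.527244 | 0.302 | |
| | | 7493443 | synonymous | VI15 | G/A | 2.50307 | 0.299 | |
| | | 7493464 | synonymous | GR14 | T/C | 3.335631 | 0.473 | |
| | | 7493761 | synonymous | VI15 | T/A | 2.832683 | 0.419 | |
| | | 7494765 | UTR5 | VI15 | A/T | 2.982967 | 0.498 | |
| | | 7494812 | UTR5 | VI15 | C/T | 3.468981 | 0.486 | |
| | | 7495467 | UTR5 | VI15 | A/T | 3.19135 | 0.458 | |
| | | 7495873 | UTR5 | VI15 | A/T | 3.525653 | 0.492 | |
| | | 7495915 | UTR5 | VI15 | T/A | 2.653647 | 0.463 | |
| | | 7496419 | upstream | VI15 | C/T | 2.546682 | 0.45 | |
| | | 7496426 | upstream | VI15 | A/G | 2.821023 | 0.418 | |
| | | 7496457 | upstream | VI15 | T/C | 2.860121 | 0.423 | |
| | | 7496477 | upstream | GR15 | T/A | 2.605548 | 0.393 | |
| | | 7496478 | upstream | GR15 | G/A | 2.605548 | 0.393 | |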
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, X.; Wang, Y.; Xie, J.; Yang, Z.; Li, H.; Li, Y.; Teng, W.; Zhao, X.; Zhan, Y.; Han, Y. Identification of Candidate Genes for Soybean Storability via GWAS and WGCNA Approaches. Agronomy 2024, 14, 2457. https://doi.org/10.3390/agronomy14112457
