Article

Quantitative Trait Loci (QTL) Analysis of Seed Protein and Oil Content in Wild Soybean (Glycine soja)

1 Department of Applied Plant Science, Chonnam National University, Gwangju 61186, Republic of Korea
2 BK21 FOUR Center for IT-Bio Convergence System Agriculture, Chonnam National University, Gwangju 61186, Republic of Korea
3 National Institute of Crop Science, Rural Development Administration (RDA), Wanju 55365, Republic of Korea
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2023, 24(4), 4077; https://doi.org/10.3390/ijms24044077
Submission received: 30 January 2023 / Revised: 10 February 2023 / Accepted: 13 February 2023 / Published: 17 February 2023
(This article belongs to the Special Issue Genetics and Novel Techniques for Soybean Yield Enhancement)

Abstract

Soybean seeds consist of approximately 40% protein and 20% oil, making soybean one of the world’s most important cultivated legumes. However, the levels of these two compounds are negatively correlated and are controlled by quantitative trait loci (QTL) involving multiple genes. In this study, a total of 190 F2 and 90 BC1F2 plants derived from a cross between Daepung (Glycine max) and GWS-1887 (G. soja, a high-protein source) were used for QTL analysis of protein and oil content. In the F2:3 population, the average protein and oil contents were 45.52% and 11.59%, respectively. A QTL associated with protein content was detected at Gm20_29512680 on chromosome 20, with a logarithm of the odds (LOD) score of 9.57 and an R2 of 17.2%. A QTL associated with oil content was detected at Gm15_3621773 on chromosome 15 (LOD: 5.80; R2: 12.2%). In the BC1F2:3 population, the average protein and oil contents were 44.25% and 12.14%, respectively. A QTL associated with both protein and oil content was detected at Gm20_27578013 on chromosome 20 (LOD: 3.77 and 3.06; R2: 15.8% and 10.7%, respectively). In the BC1F3:4 population, a crossover affecting protein content was identified at SNP marker Gm20_32603292. Based on these results, two genes were identified, Glyma.20g088000 (S-adenosyl-l-methionine-dependent methyltransferase) and Glyma.20g088400 (oxidoreductase, 2-oxoglutarate-Fe(II) oxygenase family protein), in which an InDel in the exon region altered the amino acid sequence and generated a premature stop codon.

1. Introduction

Soybean (Glycine max L.) is an important crop worldwide and a major source of protein and vegetable oil for human food and animal feed. As of 2021, soybean was the world’s largest source of protein meal, at 243.6 million t, and the second-largest source of vegetable oil (58.7 million t) after palm oil (http://soystats.com/, accessed on 12 February 2023). In Asian countries, soybean seeds are used to produce a number of food products, including soymilk, tofu, soybean paste, natto, and soy sauce. In the West, soybean is mainly processed into soybean meal and seed oil. Soybean seeds generally consist of 40% protein and 20% oil [1], and these two traits are negatively correlated [2,3]. For this reason, it is very difficult to improve both traits simultaneously. In addition, because seed yield and protein content are also negatively correlated [4], high-protein varieties must be developed without sacrificing yield. Besides being easily influenced by environmental factors, the protein and oil content of soybean seeds is controlled by polygenes and inherited quantitatively [5,6]. These polygenes can be divided into major genes, which have a large effect on these traits and are less influenced by the environment, and minor genes, which have weaker effects.
Wild soybean (G. soja Sieb. and Zucc.), the ancestor of cultivated soybean, has high genetic diversity and is thus a valuable genetic resource for soybean breeding programs [7,8]. Various studies have used wild soybean to improve biotic stress resistance, abiotic stress tolerance, nutritional quality, and yield [9]. The average protein content of wild soybean is reported to be higher than that of cultivated soybean, although this may reflect its correlations with yield or oil content [9]. In the study by Chen and Nelson (2004), the protein content of wild and cultivated soybean lines was about 47% and 40%, respectively, while the oil content was 15% and 11%, respectively [10].
After Schmutz et al. (2010) published the first soybean genome sequence [11], Ha et al. (2012) [12] advanced genomic research further by integrating the physical maps of G. max and G. soja. QTL mapping typically uses F2, backcross (BC), or recombinant inbred line (RIL) populations derived from bi-parental crosses. In many soybean populations, QTLs for protein and oil content have been mapped to genomic regions on chromosomes 15 and 20 [13,14,15,16,17]. Diers et al. (1992) identified major QTLs for protein and oil content using RFLP markers in an F2 population derived from a cross between G. max (A81-356022) and G. soja (PI 468916) [18]. The QTL on chromosome 15 has been fine-mapped to a 535 kb interval between simple sequence repeat (SSR) markers (Kim et al. 2016) [19], while the candidate gene for the QTL on chromosome 20 has been cloned as Glyma.20G85100, which encodes a CCT domain protein [20].
To date, 255 protein-related and 322 oil-related QTLs have been identified using bi-parental populations (https://www.soybase.org/, accessed on 12 February 2023). However, these QTLs may include many duplicate detections, so the Soybean Genetics Committee has emphasized the importance of experimentally confirming QTLs and adding the prefix “cq” to the original QTL name to indicate confirmation [20,21]. In total, 16 QTLs each have been confirmed for protein and oil content (https://www.soybase.org/), and these are distributed across 11 chromosomes, including chromosomes 15 and 20 [18,19,20]. Of these, the only QTLs derived from wild soybean are cqpro-003 and cqoil-004 [22]. Therefore, the purpose of this study was to discover new genes for protein and oil content in wild soybean using two progeny populations derived from a high-protein wild soybean line.

2. Results

2.1. Phenotypic Variation in the Protein and Oil Content

In the present study, seed oil and protein content were measured using seeds harvested from F3 and BC1F3 progeny lines in 2019 and 2020, respectively. In 2019, the protein content of Daepung and GWS-1887, the parents of the F2:3 population, was 37.10% and 50.37%, respectively, while their oil content was 19.81% and 5.83%, respectively. Among the 190 F2:3 lines, protein content ranged from 40.08% to 50.96% (mean 45.52%), and oil content ranged from 7.84% to 16.61% (mean 11.59%) (Table 1).
In 2020, the protein content of Daepung and GWS-1887, the parents of the BC1F2:3 population, was 40.05% and 49.28%, respectively, and their oil content was 16.53% and 5.34%, respectively. In the BC1F2:3 population, protein content ranged from 31.50% to 49.54% (mean 44.25%), and oil content ranged from 7.73% to 14.84% (mean 12.14%) (Table 2).
The phenotypic variation of the F2 and BC1F2 populations followed a normal distribution (Figure 1 and Figure 2, respectively).

2.2. Linkage Maps and QTL Analysis

Linkage maps for the F2 and BC1F2 populations were constructed using polymorphic SNP markers from the SoySNP6K Illumina BeadChip (Figures S1 and S2, respectively). In the F2 population, 2592 polymorphic markers were used, with an average chromosome length of 95 cM and an average of 130 markers on each of the 20 chromosomes (Table 3).
In the BC1F2 population, 1063 polymorphic markers were used, with an average chromosome length of 60 cM and an average of approximately 56 markers on each of the 19 mapped chromosomes (chromosome 12 showed no polymorphism) (Table 4).
The average genetic interval between markers was 1.1 cM in both the F2 and BC1F2 populations. QTLs for protein and oil content in both populations were identified using multiple QTL mapping (MQM). In the F2:3 population, QTLs for protein and oil content were identified on chromosomes 20 and 15, respectively (Figure 3); these QTLs explained 17.2% and 12.2% of the phenotypic variation, with additive effects of −1.10 and 0.59, respectively (Table 5).
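The additive effect and the proportion of phenotypic variance explained (R2) reported above were estimated by MQM in MapQTL. As a simplified illustration only, a single-marker regression of phenotype on allele dosage yields the same two kinds of quantities. The Python sketch below is not the MQM procedure used in the study; it assumes genotypes coded 0/1/2 by the number of alleles from one chosen parent and uses hypothetical data.

```python
from statistics import mean


def additive_regression(dosages, phenotypes):
    """Regress phenotype on allele dosage (0, 1, 2) at one marker.

    Returns (additive_effect, intercept, r2): the slope is the change in trait
    value per allele substitution, and r2 is the share of phenotypic variance
    explained by the marker. The model has no dominance term.
    """
    gx, py = mean(dosages), mean(phenotypes)
    sxx = sum((g - gx) ** 2 for g in dosages)
    syy = sum((p - py) ** 2 for p in phenotypes)
    sxy = sum((g - gx) * (p - py) for g, p in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = py - slope * gx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Hypothetical F2 data: dosage = number of alleles from the counted parent.
dosages = [0, 0, 1, 1, 2, 2]
protein = [44.0, 44.4, 45.1, 45.3, 46.2, 46.4]
effect, intercept, r2 = additive_regression(dosages, protein)
print(f"additive effect = {effect:.2f}, R2 = {r2:.1%}")
```

Counting the other parent's allele instead flips the sign of the additive effect while leaving R2 unchanged, so negative values such as those in Tables 5 and 6 are interpretable only together with the allele-coding convention.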
By contrast, in the BC1F2:3 population, QTLs for both protein and oil content were identified on chromosome 20 (Figure 4), explaining 15.8% and 10.7% of the phenotypic variation, with additive effects of −1.49 and 0.62, respectively (Table 6).

2.3. Crossover Detection

From the BC1F3 population, two lines that were heterozygous at the position expected to harbor a high-protein gene on chromosome 20 were selected and advanced to the BC1F3:4 generation, followed by genotyping and protein content measurement. As shown in Table 7, in one line a crossover occurred at 33,049,242 bp; plants with the Daepung genotype had a mean protein content of 44.10%, whereas plants with the GWS-1887 genotype had 47.58%. In the other line, a crossover occurred at 32,603,292 bp; plants with the Daepung genotype had a mean protein content of 45.25%, whereas plants with the GWS-1887 genotype had 48.54%. Based on the genotypes of these two lines, the gene(s) associated with high protein content were predicted to lie downstream of position 32,603,292 bp.
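Locating a crossover of this kind amounts to scanning a line's ordered marker calls for a change of parental genotype and then comparing trait means between genotype classes. The following Python sketch illustrates that logic under stated assumptions: calls are coded "D" (Daepung), "W" (GWS-1887), or "H" (heterozygous), `None` marks a missing call, and the marker positions and protein values are hypothetical rather than the data in Table 7.

```python
from statistics import mean


def find_breakpoints(positions, calls):
    """Return (left_bp, right_bp) marker intervals in which the genotype call changes.

    positions must be sorted in ascending order; markers with a missing call
    (None) are skipped so that each crossover is bracketed by informative markers.
    """
    informative = [(pos, call) for pos, call in zip(positions, calls) if call is not None]
    breakpoints = []
    for (left_pos, left_call), (right_pos, right_call) in zip(informative, informative[1:]):
        if left_call != right_call:
            breakpoints.append((left_pos, right_pos))
    return breakpoints


def mean_by_genotype(records):
    """Average trait value for each genotype class in (genotype, value) records."""
    groups = {}
    for genotype, value in records:
        groups.setdefault(genotype, []).append(value)
    return {genotype: mean(values) for genotype, values in groups.items()}


# Hypothetical line: the 32.0 Mbp call is missing, so the W-to-D switch
# is bracketed by the informative markers at 31.5 and 32.5 Mbp.
positions = [31_000_000, 31_500_000, 32_000_000, 32_500_000, 33_000_000]
calls = ["W", "W", None, "D", "D"]
print(find_breakpoints(positions, calls))  # [(31500000, 32500000)]
print(mean_by_genotype([("D", 44.0), ("D", 45.0), ("W", 48.0)]))
```

The breakpoint shows where the parental segments meet, and the class means indicate which segment carries the allele associated with higher protein.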

2.4. Genome Re-Sequencing

Whole-genome re-sequencing was conducted on the high-protein wild soybean GWS-1887. About 260 million reads (about 39 billion bp in total) were obtained, giving a sequencing depth of 38.6× and covering 95.3% of the reference genome. Compared with the Williams 82 reference genome, approximately 4.7 million SNPs and 0.9 million InDels were identified in GWS-1887 (Table 8).
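The read count, total yield, and depth reported above are related by simple arithmetic: total bases = number of reads × read length, and mean depth = total bases / genome size. The Python sketch below assumes that every read keeps the full 151 bp length given in Section 4.5 and that the 260 million reads count both mates of each pair. The genome size is not stated in the text, so it is back-calculated from the reported figures rather than taken from an assembly.

```python
def total_bases(n_reads: int, read_length_bp: int) -> int:
    """Sequencing yield in bp, assuming every read keeps its full length."""
    return n_reads * read_length_bp


def mean_depth(total_bp: float, genome_size_bp: float) -> float:
    """Average number of reads covering each position of the genome."""
    return total_bp / genome_size_bp


yield_bp = total_bases(260_000_000, 151)
print(f"yield = {yield_bp / 1e9:.2f} Gbp")  # close to the reported ~39 billion bp

# Genome size implied by ~39 billion bp at 38.6x (an inference, not a reported value).
implied_genome_bp = 39e9 / 38.6
print(f"implied genome size = {implied_genome_bp / 1e9:.2f} Gbp")
```

An implied genome size of roughly 1 Gbp is close to the size of the soybean genome [11], which suggests that the reported figures are mutually consistent.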
A total of 21 protein-related candidate genes with InDels were detected between Gm20_27578013, which is a molecular marker identified as a result of QTL mapping in the BC1F2:3 population, and Gm20_32603292, which was identified as the crossover site in BC1F3:4 (Table 9).
Of these, large InDels were identified in five genes (Glyma.20G085300, Glyma.20G085450, Glyma.20G085800, Glyma.20g087000, and Glyma.20G088000). Small InDels were most frequent in Glyma.20G085300 (40 loci), and Glyma.20G088000 contained three large InDels. Notably, InDels occurred in the exon regions of Glyma.20g088000 and Glyma.20g088400, causing frameshifts that introduced premature stop codons in both genes (Figure 5).
In particular, the premature stop codon in Glyma.20g088000 is expected to truncate the encoded protein and greatly simplify its structure (Figure 6).
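To illustrate how an exonic InDel can create a premature stop codon, the Python sketch below translates a short hypothetical coding sequence with the standard genetic code before and after a 1 bp deletion. The sequences are invented for illustration and are not taken from Glyma.20g088000 or Glyma.20g088400; the study itself predicted variant effects with snpEff.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence from its first base, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical coding sequence: ATG CTG ACC TGG AAA TAA
reference = "ATGCTGACCTGGAAATAA"
# Delete the 4th base (a 1 bp deletion), which shifts the reading frame downstream.
mutant = reference[:3] + reference[4:]

print(translate(reference))  # MLTWK
print(translate(mutant))     # M  (ATG is now followed directly by TGA)
```

Because the deletion is not a multiple of three, every downstream codon is read in a new frame; in this toy case the very next codon becomes TGA, truncating the protein after its first residue.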

3. Discussion

QTLs for protein and oil content in soybean have been extensively studied. The present study searched for QTLs related to high protein content using F2 and BC1F2 populations derived from a cross between the cultivated soybean variety Daepung and the wild soybean accession GWS-1887. The protein content of cultivated soybean is typically about 40% [1], whereas wild soybean can reach protein levels close to 50% [9,10,23], as observed for GWS-1887. This suggests that GWS-1887 may be useful for mapping genetic regions associated with high protein content in soybean. However, crosses between G. max and G. soja may introduce linkage drag, leading to undesirable introgressions such as reduced yield [9]. Daepung showed greater year-to-year variation in protein content (37.10% in 2019 and 40.05% in 2020) than did GWS-1887 (50.37% in 2019 and 49.28% in 2020) (Table 1 and Table 2). Several studies have reported that soil moisture deficit reduces the protein content of soybean [24,25,26]. Rainfall in the experimental area in August 2019, during seed development, was below normal, whereas rainfall in August 2020 was above average (http://www.kma.go.kr/, accessed on 12 February 2023), which is consistent with the lower protein content of Daepung in 2019. Therefore, the protein content of wild soybean appears to be more environmentally stable than that of cultivated soybean.
Previous QTL analyses of seed protein and oil content in soybean have used various populations, and the identified QTLs are distributed across all 20 chromosomes (https://www.soybase.org/). Among them, major QTLs for protein and oil content are located on chromosomes 15 and 20 [18], and several researchers have attempted to narrow down their precise locations [13,19,20,21,22,27]. In the present study, the QTLs related to protein and oil content in the F2:3 population were identified at Gm20_29512680 and Gm15_3621773, respectively, whereas in the BC1F2:3 population, marker Gm20_27578013 was identified for both protein and oil. Kim et al. (2016) fine-mapped a backcross line of Williams 82 and PI 407788A with 96 BARCSOYSSR markers and found that the QTL related to protein and oil content on chromosome 15 lies in a 535 kb region between physical positions 3.59 Mbp and 4.12 Mbp [19]. This region is consistent with SNP marker Gm15_3621773 (physical position 3.63 Mbp), which was detected for oil content in the F2:3 population in the present study. Recently, cqSeed protein-003 on chromosome 20 was narrowed down by fine-mapping to a 77.8 kb region between markers BARCSOYSSR_20_0670 and BARCSOYSSR_20_0674 (31.74 to 31.82 Mbp), and Glyma.20G85100, which encodes a CCT domain protein, was identified as a candidate gene for protein content [20].
In our results, protein-related QTLs were mapped to Gm20_29512680 (30.61 Mbp) and Gm20_27578013 (28.69 Mbp) on chromosome 20 in the F2:3 and BC1F2:3 populations, respectively. These positions are consistent with the 24.55–32.91 Mbp range reported by Bolon et al. (2010) [15] and the 28.7–31.1 Mbp range reported by Hwang et al. (2014) [13], but not with the 32.71–33.08 Mbp range identified by Vaughn et al. (2014) [14] or the 31.74–31.82 Mbp range found by Fliege et al. (2022) [20]. These inconsistencies could be due to low linkage disequilibrium (LD) with the surrounding markers [20]; wild soybean, which was used as a parent in this study, is known to have lower LD than cultivated soybean [28]. To locate the QTL more precisely, crossovers around the QTL detected in BC1F2:3 were sought but not found. High-protein lines were therefore advanced one generation, and crossovers were identified at markers Gm20_33049242 and Gm20_32603292 in two BC1F3:4 lines. On this basis, the protein-related gene was predicted to be located downstream of Gm20_32603292.
Based on the QTL mapping results, InDels between the Williams 82 reference genome and GWS-1887 were then searched for in candidate genes located around 30 Mbp. Interestingly, no mutation was detected in Glyma.20G85100, which encodes a CCT motif family protein and was recently cloned as a major protein-related gene [20]. This suggests that other major protein-related genes may be present in the same genomic region. Wang et al. (2021) selected Glyma.20g088000 as a protein-related candidate gene through differentially expressed gene (DEG) analysis using RNA-seq and found that its sequence differed between the high-protein variety Nanxiadou 25 and the low-protein variety Tongdou 11 because of InDels [16]. Glyma.20g088000 (S-adenosyl-l-methionine-dependent methyltransferase) was also selected as a candidate gene in the present study because small and large InDels occurred in several regions of the gene. In addition, Lee et al. (2019) identified Glyma.20g086900 (aldehyde dehydrogenase-related) and Glyma.20g088400 (oxidoreductase, 2-oxoglutarate-Fe(II) oxygenase family protein) in a GWAS of seed protein content in soybeans from maturity groups I to IV [17]. These two genes were also identified as candidate genes in this study.
In the present study, Glyma.20g088000 and Glyma.20g088400 were found to carry a large InDel in the first (5′) exon and a small InDel in the third (3′) exon, respectively (Figure 5). Nonsense mutations that create premature stop codons, and frameshift mutations that alter the downstream amino acid sequence, can disrupt gene function [29]. For example, truncated polypeptides resulting from nonsense mutations led to the loss of anthocyanin pigments associated with soybean flower color [30]. In particular, the premature stop codon in Glyma.20g088000 is expected to greatly simplify the structure of the protein (Figure 6) and is therefore likely to have a significant effect on its function. Although these candidate genes have potential roles in metabolism, how they relate to seed composition requires further study. Taken together, the results suggest that protein content may be regulated by complex interactions among multiple genes located around 30 Mbp on chromosome 20.

4. Materials and Methods

4.1. Plant Materials

In the present study, 180 F3 and 90 BC1F4 populations derived from a cross between Daepung and GWS-1887 were analyzed. Daepung, used as the female and recurrent parent, is an elite Korean variety with strong resistance to disease and shattering and high yield [31], while GWS-1887, used as the male parent, was selected from the core collection of wild soybean accessions of the Rural Development Administration (RDA) because its protein content is 50% or higher. In the summer of 2018, F1 seeds were obtained by artificial crossing in an experimental field at Chonnam National University (Gwangju, 36°17′ N, 126°39′ E, Republic of Korea). The F1 seeds were planted in a greenhouse during the 2018–2019 winter season to obtain F2 seeds, and the population was then advanced from F2 to F3 in the summer of 2019. In parallel, F1 plants were backcrossed to Daepung in the summer of 2019 to obtain BC1F1 seeds. The BC1F1 seeds were planted in a greenhouse during the 2019–2020 winter to obtain BC1F2 seeds. Finally, in the summer of 2020, the BC1F2 generation was advanced to obtain BC1F3 seeds.
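Because the BC1 population was backcrossed once to Daepung before selfing, its expected genotype frequencies differ from those of the F2. The following Python sketch derives them with exact fractions for a single locus under Mendelian segregation, assuming no selection or segregation distortion; "A" denotes the Daepung allele and "B" the GWS-1887 allele, labels chosen here for illustration.

```python
from fractions import Fraction


def self_pollinate(freqs):
    """One generation of selfing: AA and BB breed true; AB gives 1/4 AA, 1/2 AB, 1/4 BB."""
    aa, ab, bb = freqs["AA"], freqs["AB"], freqs["BB"]
    return {"AA": aa + ab / 4, "AB": ab / 2, "BB": bb + ab / 4}


def backcross_to_aa(freqs):
    """Cross every plant to the AA recurrent parent: AA -> AA, AB -> 1/2 AA + 1/2 AB, BB -> AB."""
    aa, ab, bb = freqs["AA"], freqs["AB"], freqs["BB"]
    return {"AA": aa + ab / 2, "AB": ab / 2 + bb, "BB": Fraction(0)}


F1 = {"AA": Fraction(0), "AB": Fraction(1), "BB": Fraction(0)}
F2 = self_pollinate(F1)
BC1F1 = backcross_to_aa(F1)
BC1F2 = self_pollinate(BC1F1)

for name, freqs in [("F2", F2), ("BC1F2", BC1F2)]:
    print(name, {genotype: str(freq) for genotype, freq in freqs.items()})
# F2 {'AA': '1/4', 'AB': '1/2', 'BB': '1/4'}
# BC1F2 {'AA': '5/8', 'AB': '1/4', 'BB': '1/8'}
```

Only about one-eighth of BC1F2 plants are expected to be homozygous for the GWS-1887 allele at a given locus, compared with one-quarter in the F2, so donor-allele homozygotes are sampled less often in the backcross-derived population.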

4.2. Analysis of Protein and Oil Content

All harvested seed samples were dried at 40 °C for 7 d and then ground using a coffee grinder to obtain 3 g of powder per sample for subsequent analysis. Crude protein content was measured using the Kjeldahl method. Reagents required for digestion, distillation, and titration were prepared, including 0.1 N hydrochloric acid, a digestion accelerator (containing 10 g of potassium sulfate and 1 g of copper sulfate), 40% sodium hydroxide solution, and 1% boric acid solution containing 100 mL of bromocresol green and 70 mL of methyl red indicator solutions. The sample solution was prepared by mixing 0.7–1.0 g of ground seed and 7–8 g of the digestion accelerator with 10 mL of sulfuric acid in a digestion flask. Digestion was carried out by heating the sample solution at a slow ramp rate until no visible bubbles remained and the solution became transparent. The digest was then analyzed using a Kjeltec 1030 Autoanalyzer (FOSS Tecator AB, Höganäs, Sweden) following the manufacturer’s instructions.
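The text does not state how the titration result was converted into crude protein. A common Kjeldahl calculation, sketched below in Python, converts the volume of standard HCl consumed (sample minus blank) into percent nitrogen and multiplies it by a nitrogen-to-protein conversion factor. The default factor of 6.25 is an assumption (5.71 is sometimes used for soybean), the example titration values are hypothetical, and the Kjeltec instrument may report the result directly.

```python
NITROGEN_MG_PER_MMOL = 14.007  # molar mass of nitrogen, g/mol = mg/mmol


def kjeldahl_nitrogen_percent(titrant_ml, blank_ml, acid_normality, sample_g):
    """Percent N from titrating borate-trapped ammonia with standard HCl.

    (mL of acid) x (mol/L) = mmol NH3 = mmol N; x 14.007 mg/mmol = mg N;
    dividing by the sample mass in mg and multiplying by 100 gives percent N.
    """
    if sample_g <= 0:
        raise ValueError("sample mass must be positive")
    mmol_n = (titrant_ml - blank_ml) * acid_normality
    mg_n = mmol_n * NITROGEN_MG_PER_MMOL
    return mg_n / (sample_g * 1000) * 100


def crude_protein_percent(nitrogen_percent, factor=6.25):
    """Crude protein = %N x conversion factor (6.25 is an assumed default)."""
    return nitrogen_percent * factor


# Hypothetical titration: 38.5 mL of 0.1 N HCl for a 0.8 g sample, 0.2 mL blank.
n_pct = kjeldahl_nitrogen_percent(38.5, 0.2, 0.1, 0.8)
print(f"N = {n_pct:.2f}%, crude protein = {crude_protein_percent(n_pct):.2f}%")
```

Keeping the conversion factor as a parameter makes it easy to recompute results if a soybean-specific factor is preferred.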
Crude oil content was measured by ether extraction. An oil metering bottle was pre-dried at 95–100 °C for 2 h and then cooled in a desiccator for 30 min. Next, 2–3 g of sample wrapped in No. 2 filter paper was dried at the same temperature and for the same duration as the bottle. After drying, the sample was placed in a Soxtec 1043 instrument (FOSS Tecator AB, Höganäs, Sweden) and extracted with ether at 80 °C for 8 h. The ether extract was collected in the oil metering bottle, which was then dried (95–100 °C for 3 h), cooled in a desiccator (40 min), and weighed. The oil mass was obtained by subtracting the weight of the empty bottle from the weight of the bottle containing the extract, and crude oil content was expressed as a percentage of the sample weight:
$$\text{Crude oil (\%)} = \frac{\text{Oil bottle weight after extraction} - \text{Oil bottle weight before extraction}}{\text{Sample weight}} \times 100$$
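The crude oil formula translates directly into code. This minimal Python sketch assumes all masses are in grams and uses hypothetical weighings; it divides the extracted oil mass by the sample mass and rejects physically impossible inputs.

```python
def crude_oil_percent(bottle_after_g, bottle_before_g, sample_g):
    """Crude oil (%) = (bottle weight after - bottle weight before) / sample weight x 100."""
    if sample_g <= 0:
        raise ValueError("sample weight must be positive")
    oil_g = bottle_after_g - bottle_before_g
    if oil_g < 0:
        raise ValueError("bottle weight decreased; check the weighings")
    return oil_g / sample_g * 100


# Hypothetical weighings: 2.5 g sample, 0.300 g of oil recovered.
print(f"{crude_oil_percent(50.300, 50.000, 2.5):.2f}%")  # 12.00%
```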

4.3. DNA Extraction and SNP Genotyping

Fresh leaf tissue was collected at an early growth stage for DNA extraction and ground in a mortar with liquid nitrogen. Genomic DNA was isolated from 20 mg of lyophilized leaf tissue using a DNeasy Plant Mini Kit (QIAGEN, Valencia, CA, USA) according to the manufacturer’s protocol. The quality and quantity of the extracted DNA were verified using a Nano-MD UV-Vis spectrophotometer (Scinco, Seoul, Republic of Korea), and the DNA was stored at −80 °C until use. A total of 270 samples (180 F2 and 90 BC1F2 plants), together with two replicates of each parent (Daepung and GWS-1887), were genotyped using the SoySNP6K Illumina BeadChip (Illumina, San Diego, CA, USA) at TNT Research Co. (Anyang, Republic of Korea). SNP alleles were called using Illumina GenomeStudio (Illumina, Inc., San Diego, CA, USA).

4.4. Genetic Linkage Analysis

Genetic linkage maps were constructed using the Kosambi mapping function in JoinMap 4.1 (Kyazma, Wageningen, The Netherlands). QTL analysis was performed by multiple QTL mapping (MQM) in MapQTL 6.0 (Kyazma, Wageningen, The Netherlands). In the F2 population, the genome-wide significance threshold for LOD scores was determined by 1000 permutations. In the BC1F2 population, a LOD score of ≥3.0 was used as the threshold for declaring a QTL. LOD graphs and QTL location maps were created with MapChart 2.2.
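Two computations described in this subsection can be sketched in Python. The Kosambi function converts a recombination fraction r into map distance, d = 25 ln[(1 + 2r)/(1 − 2r)] cM. The permutation threshold shuffles phenotypes relative to genotypes, records the maximum LOD across markers for each shuffle, and takes the 95th percentile of those maxima. For simplicity, this sketch scores markers by single-marker regression, LOD = (n/2) log10(RSS0/RSS1), rather than the MQM model in MapQTL, and the example data are hypothetical.

```python
import math
import random
from statistics import mean


def kosambi_cm(r):
    """Kosambi map distance in cM for recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def marker_lod(dosages, phenotypes):
    """Single-marker regression LOD: (n/2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    gx, py = mean(dosages), mean(phenotypes)
    sxx = sum((g - gx) ** 2 for g in dosages)
    sxy = sum((g - gx) * (p - py) for g, p in zip(dosages, phenotypes))
    rss_null = sum((p - py) ** 2 for p in phenotypes)
    rss_marker = rss_null - sxy ** 2 / sxx
    return (n / 2) * math.log10(rss_null / rss_marker)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Genome-wide LOD threshold: the (1 - alpha) quantile of max LOD over permutations."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        max_lods.append(max(marker_lod(m, shuffled) for m in markers))
    max_lods.sort()
    index = max(0, round((1 - alpha) * n_perm) - 1)
    return max_lods[index]


# Hypothetical data: 30 plants, two markers coded as allele dosage.
markers = [[0, 1, 2] * 10, [2, 1, 1, 0, 2, 0] * 5]
phenotypes = [40 + (i * 7) % 11 for i in range(30)]
print(f"Kosambi distance for r = 0.1: {kosambi_cm(0.1):.2f} cM")
print(f"permutation LOD threshold: {permutation_threshold(markers, phenotypes, n_perm=200):.2f}")
```

A fixed seed makes the threshold reproducible. For the BC1F2 population the study used a fixed LOD of 3.0 instead, so no permutation step applies there.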

4.5. Re-Sequencing

Re-sequencing was outsourced to Insilicogen (Yongin, Republic of Korea) and performed on an Illumina NovaSeq 6000 platform. Libraries for 151 bp paired-end reads were constructed using a DNA Sample Prep Kit (Illumina) following the manufacturer’s instructions. Genome-wide variants were detected from the sequencing data using the nf-core/sarek workflow [32]. Variants were annotated and their effects predicted with snpEff, using a database built from the Glycine max cv. Williams 82 reference genome [11]. The whole-genome sequencing data of GWS-1887 have been deposited in NCBI under BioProject PRJNA915129.

5. Conclusions

In this study, QTL mapping of seed protein and oil content in soybean was conducted using two progeny populations derived from a high-protein wild soybean line. A QTL was detected in the region of cqPRO-003, which has previously been reported as a major QTL for protein content; however, re-sequencing revealed no sequence difference in the recently cloned candidate gene for cqPRO-003 (Glyma.20G85100). Instead, new candidate genes containing InDels, Glyma.20g088000 and Glyma.20g088400, were discovered. This suggests that protein content may be regulated by complex interactions among multiple genes, including genes other than those previously reported.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms24044077/s1.

Author Contributions

Writing—original draft preparation, W.J.K.; methodology, B.H.K., C.Y.M., S.K. and S.S.; writing—review and editing, S.C.; resources, M.-S.C., S.-K.P. and J.-K.M.; supervision, B.-K.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2018R1D1A1B07048126 and NRF-2015R1C1A1A02036757).

Data Availability Statement

The original contributions presented in the study are publicly available.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Natarajan, S.; Luthria, D.; Bae, H.; Lakshman, D.; Mitra, A. Transgenic Soybeans and Soybean Protein Analysis: An Overview. J. Agric. Food Chem. 2013, 61, 11736–11743. [Google Scholar] [CrossRef] [PubMed]
  2. Kim, H.-K.; Kang, S.-T.; Choung, M.-G.; Jung, C.-S.; Oh, K.-W.; Baek, I.-Y.; Son, B.-G. Simple sequence repeat markers linked to quantitative trait loci controlling seed weight, protein and oil contents in soybean. J. Life Sci. 2006, 16, 949–954. [Google Scholar]
  3. Kim, H.; Kang, S. Identification of Quantitative Trait Loci (QTLs) Associated with Oil and Protein Contents in Soybean (Glycine max L.). J. Life Sci. 2004, 14, 453–458. [Google Scholar] [CrossRef] [Green Version]
  4. Wilcox, J.R.; Cavins, J.F. Backcrossing High Seed Protein to a Soybean Cultivar. Crop Sci. 1995, 35, 1036–1041. [Google Scholar] [CrossRef]
  5. Hu, G.; Chen, Q.; Liu, C.; Jiang, H.; Wang, J.; Qi, Z. Integration of major QTLs of important agronomic traits in soybean. In Soybean–Molecular Aspects of Breeding; Sudaric, A., Ed.; InTech: London, UK, 2011; pp. 81–118. [Google Scholar]
  6. Wilcox, J. Breeding soybeans for improved oil quantity and quality. In World Soybean Research Conference III: Proceedings; CRC Press: Boca Raton, FL, USA, 2022. [Google Scholar]
  7. Kuroda, Y.; Tomooka, N.; Kaga, A.; Wanigadeva, S.M.S.W.; Vaughan, D.A. Genetic diversity of wild soybean (Glycine soja Sieb. et Zucc.) and Japanese cultivated soybeans [G. max (L.) Merr.] based on microsatellite (SSR) analysis and the selection of a core collection. Genet. Resour. Crop Evol. 2009, 56, 1045–1055. [Google Scholar] [CrossRef]
  8. Lee, J.D.; Yu, J.-K.; Hwang, Y.-H.; Blake, S.; So, Y.-S.; Lee, G.-J.; Nguyen, H.T.; Shannon, J.G. Genetic diversity of wild soybean (Glycine soja Sieb. and Zucc.) accessions from South Korea and other countries. Crop Sci. 2008, 48, 606–616. [Google Scholar] [CrossRef]
  9. Kofsky, J.; Zhang, H.; Song, B.-H. The Untapped Genetic Reservoir: The Past, Current, and Future Applications of the Wild Soybean (Glycine soja). Front. Plant Sci. 2018, 9, 949. [Google Scholar] [CrossRef] [Green Version]
  10. Chen, Y.; Nelson, R. Genetic variation and relationships among cultivated, wild, and semiwild soybean. Crop Sci. 2004, 44, 316–325. [Google Scholar] [CrossRef] [Green Version]
  11. Schmutz, J.; Cannon, S.B.; Schlueter, J.; Ma, J.; Mitros, T.; Nelson, W.; Hyten, D.L.; Song, Q.; Thelen, J.J.; Cheng, J.; et al. Genome sequence of the palaeopolyploid soybean. Nature 2010, 463, 178–183. [Google Scholar] [CrossRef] [Green Version]
  12. Ha, J.; Abernathy, B.; Nelson, W.; Grant, D.; Wu, X.; Nguyen, H.T.; Stacey, G.; Yu, Y.; Wing, R.A.; Shoemaker, R.C.; et al. Integration of the draft sequence and physical map as a framework for genomic research in soybean (Glycine max (L.) Merr.) and wild soybean (Glycine soja Sieb. and Zucc.). G3 2012, 2, 321–329. [Google Scholar] [CrossRef] [Green Version]
  13. Hwang, E.-Y.; Song, Q.; Jia, G.; E Specht, J.; Hyten, D.L.; Costa, J.; Cregan, P.B. A genome-wide association study of seed protein and oil content in soybean. BMC Genom. 2014, 15, 1. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Vaughn, J.; Nelson, R.L.; Song, Q.; Cregan, P.B.; Li, Z. The Genetic Architecture of Seed Composition in Soybean Is Refined by Genome-Wide Association Scans Across Multiple Populations. G3 2014, 4, 2283–2294. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Bolon, Y.-T.; Joseph, B.; Cannon, S.B.; A Graham, M.; Diers, B.W.; Farmer, A.D.; May, G.D.; Muehlbauer, G.J.; Specht, J.E.; Tu, Z.J.; et al. Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean. BMC Plant Biol. 2010, 10, 41. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Wang, J.; Mao, L.; Zeng, Z.; Yu, X.; Lian, J.; Feng, J.; Yang, W.; An, J.; Wu, H.; Zhang, M.; et al. Genetic mapping high protein content QTL from soybean ‘Nanxiadou 25’and candidate gene analysis. BMC Plant Biol. 2021, 21, 1–13. [Google Scholar] [CrossRef]
  17. Lee, S.; Van, K.; Sung, M.; Nelson, R.; LaMantia, J.; McHale, L.K.; Mian, M.A.R. Genome-wide association study of seed protein, oil and amino acid contents in soybean from maturity groups I to IV. Theor. Appl. Genet. 2019, 132, 1639–1659. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  18. Diers, B.W.; Keim, P.; Fehr, W.R.; Shoemaker, R.C. RFLP analysis of soybean seed protein and oil content. Theor. Appl. Genet. 1992, 83, 608–612. [Google Scholar] [CrossRef]
  19. Kim, M.; Schultz, S.; Nelson, R.L.; Diers, B.W. Identification and Fine Mapping of a Soybean Seed Protein QTL from PI 407788A on Chromosome 15. Crop Sci. 2016, 56, 219–225. [Google Scholar] [CrossRef]
  20. Fliege, C.E.; Ward, R.A.; Vogel, P.; Nguyen, H.; Quach, T.; Guo, M.; Viana, J.P.G.; dos Santos, L.B.; Specht, J.E.; Clemente, T.E.; et al. Fine mapping and cloning of the major seed protein quantitative trait loci on soybean chromosome 20. Plant J. 2022, 110, 114–128. [Google Scholar] [CrossRef]
  21. Fasoula, V.A.; Harris, D.K.; Boerma, H.R. Validation and Designation of Quantitative Trait Loci for Seed Protein, Seed Oil, and Seed Weight from Two Soybean Populations. Crop Sci. 2004, 44, 1218–1225. [Google Scholar] [CrossRef]
  22. Nichols, D.M.; Glover, K.D.; Carlson, S.R.; Specht, J.E.; Diers, B.W. Fine Mapping of a Seed Protein QTL on Soybean Linkage Group I and Its Correlated Effects on Agronomic Traits. Crop Sci. 2006, 46, 834–839.
  23. Leamy, L.J.; Zhang, H.; Li, C.; Chen, C.Y.; Song, B.-H. A genome-wide association study of seed composition traits in wild soybean (Glycine soja). BMC Genom. 2017, 18, 18.
  24. Specht, J.; Chase, K.; Macrander, M.; Graef, G.; Chung, J.; Markwell, J.; Germann, M.; Orf, J.; Lark, K. Soybean Response to Water: A QTL Analysis of Drought Tolerance. Crop Sci. 2001, 41, 493–509.
  25. Boydak, E.; Alpaslan, M.; Hayta, M.; Gerçek, S.; Simsek, M. Seed Composition of Soybeans Grown in the Harran Region of Turkey As Affected by Row Spacing and Irrigation. J. Agric. Food Chem. 2002, 50, 4718–4720.
  26. Carrera, C.; Martínez, M.J.; Dardanelli, J.; Balzarini, M. Water Deficit Effect on the Relationship between Temperature during the Seed Fill Period and Soybean Seed Oil and Protein Concentrations. Crop Sci. 2009, 49, 990–998.
  27. Sebolt, A.M.; Shoemaker, R.C.; Diers, B.W. Analysis of a Quantitative Trait Locus Allele from Wild Soybean That Increases Seed Protein Concentration in Soybean. Crop Sci. 2000, 40, 1438–1444.
  28. Zhou, Z.; Jiang, Y.; Wang, Z.; Gou, Z.; Lyu, J.; Li, W.; Yu, Y.; Shu, L.; Zhao, Y.; Ma, Y.; et al. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat. Biotechnol. 2015, 33, 408–414.
  29. Du, Y.; Luo, S.; Li, X.; Yang, J.; Cui, T.; Li, W.; Yu, L.; Feng, H.; Chen, Y.; Mu, J.; et al. Identification of Substitutions and Small Insertion-Deletions Induced by Carbon-Ion Beam Irradiation in Arabidopsis thaliana. Front. Plant Sci. 2017, 8, 1851.
  30. Takahashi, R.; Benitez, E.R.; Oyoo, M.E.; Khan, N.A.; Komatsu, S. Nonsense Mutation of an MYB Transcription Factor Is Associated with Purple-Blue Flower Color in Soybean. J. Hered. 2011, 102, 458–463.
  31. Park, K.; Kim, Y.H.; Lee, E.S.; Ha, K.S. A new soybean cultivar for fermented soyfood and tofu with high yield, “Daepung”. Korean J. Breed. 2005, 37, 111–112.
  32. Garcia, M.; Juhos, S.; Larsson, M.; Olason, P.I.; Martin, M.; Eisfeldt, J.; DiLorenzo, S.; Sandgren, J.; De Ståhl, T.D.; Ewels, P.; et al. Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants. F1000Research 2020, 9, 63.
Figure 1. Distribution of the seed protein and oil content in the F2:3 mapping population derived from a cross between Daepung and GWS-1887 in 2019.
Figure 2. Distribution of the seed protein and oil content in the BC1F2:3 mapping population derived from a cross between Daepung and GWS-1887 in 2020.
Figure 3. Logarithm of odds (LOD) plot for seed protein and oil content in the F2:3 population derived from a cross between Daepung and GWS-1887. The vertical dotted line marks the LOD threshold of 3.0.
Figure 4. Logarithm of odds (LOD) plot for seed protein and oil content in the BC1F2:3 population derived from a cross between Daepung and GWS-1887. The vertical dotted line marks the LOD threshold of 3.0.
Figure 5. InDels in the Glyma.20g088400 and Glyma.20g088000 genes from GWS-1887.
Figure 6. Predicted 3D structure of the Glyma.20g088000 protein from GWS-1887.
Table 1. Seed protein and oil content of the F2:3 mapping population and its parental lines Daepung and GWS-1887 in 2019.
Trait | Daepung | GWS-1887 | F2:3 Min | F2:3 Max | F2:3 Average | F2:3 Skew | F2:3 Kurt
Protein (%) | 37.10 | 50.37 | 40.08 | 50.96 | 45.52 | 0.08 | 0.04
Oil (%) | 19.81 | 5.83 | 7.84 | 16.61 | 11.59 | 0.15 | 1.04
Min, minimum; Max, maximum; Skew, skewness; Kurt, kurtosis.
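Tables 1 and 2 report skewness and kurtosis without naming the estimator. As a minimal sketch, assuming the moment-based (population) formulas and excess kurtosis (so that a normal distribution scores 0), both statistics can be computed from per-line phenotype values as follows. The input values are hypothetical and are not data from this study.

```python
# Moment-based skewness and excess kurtosis of a phenotype distribution.
# Assumption: population (biased) moment estimators and excess kurtosis;
# the article does not state which estimator or software produced Tables 1-2.
from statistics import fmean
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Third standardized moment, m3 / m2**1.5."""
    mean = fmean(values)
    m2 = fmean((x - mean) ** 2 for x in values)
    m3 = fmean((x - mean) ** 3 for x in values)
    return m3 / m2 ** 1.5


def excess_kurtosis(values: Sequence[float]) -> float:
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    mean = fmean(values)
    m2 = fmean((x - mean) ** 2 for x in values)
    m4 = fmean((x - mean) ** 4 for x in values)
    return m4 / m2 ** 2 - 3.0


# Hypothetical protein values (%), NOT measurements from this study.
example = [44.1, 45.0, 45.6, 46.2, 47.3]
print(f"skew={skewness(example):.2f}, kurt={excess_kurtosis(example):.2f}")
```

Statistical packages differ here: some apply small-sample bias corrections or report raw rather than excess kurtosis, so the same data can yield slightly different values depending on the tool.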
Table 2. Seed protein and oil content of the BC1F2:3 mapping population and its parental lines Daepung and GWS-1887 in 2020.
Trait | Daepung | GWS-1887 | BC1F2:3 Min | BC1F2:3 Max | BC1F2:3 Average | BC1F2:3 Skew | BC1F2:3 Kurt
Protein (%) | 40.05 | 49.28 | 31.50 | 49.54 | 44.25 | −1.32 | 4.40
Oil (%) | 16.53 | 5.34 | 7.73 | 14.84 | 12.14 | −0.47 | 0.55
Min, minimum; Max, maximum; Skew, skewness; Kurt, kurtosis.
Table 3. Summary of the genetic linkage map for the F2 mapping population derived from a cross between Daepung and GWS-1887.
Chr. | Marker No. | Length (cM) | Average Spacing (cM)
1 | 105 | 129.0 | 1.2
2 | 110 | 168.4 | 1.5
3 | 130 | 146.7 | 1.1
4 | 92 | 131.1 | 1.4
5 | 92 | 126.8 | 1.4
6 | 140 | 173.7 | 1.2
7 | 158 | 141.5 | 0.9
8 | 182 | 181.6 | 1.0
9 | 113 | 133.6 | 1.2
10 | 153 | 158.2 | 1.0
11 | 142 | 154.2 | 1.1
12 | 106 | 134.3 | 1.3
13 | 165 | 161.1 | 1.0
14 | 138 | 126.6 | 0.9
15 | 134 | 141.5 | 1.1
16 | 110 | 113.7 | 1.0
17 | 131 | 138.6 | 1.1
18 | 140 | 136.9 | 1.0
19 | 138 | 151.0 | 1.1
20 | 116 | 148.9 | 1.3
Total | 2595 | 2897.4 | 1.1
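The totals row of Table 3 can be checked against the per-chromosome values. This sketch assumes that average spacing is map length divided by marker count; that reading is inferred from the numbers (for example, chromosome 6 in Table 4 gives 10.8 cM / 11 markers ≈ 1.0 rather than 10.8 / 10 ≈ 1.1), not a definition stated in the article.

```python
# Recompute the Table 3 summary from per-chromosome marker counts and lengths.
# Assumption (inferred, not stated): average spacing = length / marker count.

# Chromosome -> (marker count, map length in cM), transcribed from Table 3.
TABLE_3 = {
    1: (105, 129.0), 2: (110, 168.4), 3: (130, 146.7), 4: (92, 131.1),
    5: (92, 126.8), 6: (140, 173.7), 7: (158, 141.5), 8: (182, 181.6),
    9: (113, 133.6), 10: (153, 158.2), 11: (142, 154.2), 12: (106, 134.3),
    13: (165, 161.1), 14: (138, 126.6), 15: (134, 141.5), 16: (110, 113.7),
    17: (131, 138.6), 18: (140, 136.9), 19: (138, 151.0), 20: (116, 148.9),
}


def map_summary(table):
    """Return (total markers, total length in cM, average spacing in cM)."""
    markers = sum(n for n, _ in table.values())
    length = sum(cm for _, cm in table.values())
    return markers, round(length, 1), round(length / markers, 1)


def per_chromosome_spacing(table):
    """Average spacing per chromosome, rounded to one decimal as in the table."""
    return {chrom: round(cm / n, 1) for chrom, (n, cm) in table.items()}


print(map_summary(TABLE_3))  # (2595, 2897.4, 1.1)
```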
Table 4. Summary of the genetic linkage map for the BC1F2 mapping population derived from a cross between Daepung and GWS-1887.
Chr. | Marker No. | Length (cM) | Average Spacing (cM)
1 | 38 | 51.9 | 1.4
2 | 54 | 48.8 | 0.9
3 | 73 | 70.1 | 1.0
4 | 59 | 70.0 | 1.2
5 | 78 | 97.2 | 1.2
6 | 11 | 10.8 | 1.0
7 | 23 | 20.7 | 0.9
8 | 57 | 57.0 | 1.0
9 | 44 | 44.2 | 1.0
10 | 67 | 68.6 | 1.0
11 | 49 | 48.7 | 1.0
12 | - | - | -
13 | 112 | 125.1 | 1.1
14 | 40 | 36.5 | 0.9
15 | 90 | 104.8 | 1.2
16 | 43 | 45.6 | 1.1
17 | 30 | 33.0 | 1.1
18 | 81 | 70.6 | 0.9
19 | 31 | 30.0 | 1.0
20 | 83 | 101.8 | 1.2
Total | 1063 | 1135.4 | 1.1
Table 5. Effects of the SNP markers associated with seed protein and oil content in the F2:3 population.
SNP | Chr. | LOD | Homo AA | Homo BB | R² (%)
Protein
Gm20_29512680 | 20 | 9.57 | 44.53 | 46.68 | 17.2
Oil
Gm15_3621773 | 15 | 5.80 | 12.05 | 10.88 | 12.2
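SNP, single nucleotide polymorphism; Chr, chromosome; LOD, logarithm of odds; Homo AA, mean of lines homozygous for the Daepung allele; Homo BB, mean of lines homozygous for the GWS-1887 allele; R², percentage of phenotypic variance explained.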
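Table 5 gives the homozygous class means but not an additive effect. Under the common convention a = (BB − AA)/2, which is assumed here rather than taken from the article, a positive value means the GWS-1887 allele increases the trait. Because the heterozygote mean is not listed, a dominance effect cannot be derived from the table.

```python
# Additive QTL effect from homozygous class means (Table 5).
# Assumed convention: a = (mean_BB - mean_AA) / 2, where AA is the Daepung
# homozygote and BB the GWS-1887 homozygote. Not reported in the article.

TABLE_5 = {
    # SNP: (trait, mean of Homo AA, mean of Homo BB)
    "Gm20_29512680": ("protein", 44.53, 46.68),
    "Gm15_3621773": ("oil", 12.05, 10.88),
}


def additive_effect(mean_aa: float, mean_bb: float) -> float:
    """Half the difference between the two homozygous class means."""
    return (mean_bb - mean_aa) / 2


for snp, (trait, aa, bb) in TABLE_5.items():
    print(f"{snp} ({trait}): a = {additive_effect(aa, bb):+.3f} percentage points")
```

With these means the GWS-1887 allele raises protein at Gm20_29512680 and lowers oil at Gm15_3621773, in line with the negative protein–oil relationship described in the abstract.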
Table 6. Effects of the SNP markers associated with seed protein and oil content in the BC1F2:3 population.
SNP | Chr. | LOD | Homo AA | Homo BB | R² (%)
Protein
Gm20_27578013 | 20 | 3.29 | 42.66 | 45.63 | 15.8
Oil
Gm20_27578013 | 20 | 3.06 | 12.70 | 11.47 | 10.7
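SNP, single nucleotide polymorphism; Chr, chromosome; LOD, logarithm of odds; Homo AA, mean of lines homozygous for the Daepung allele; Homo BB, mean of lines homozygous for the GWS-1887 allele; R², percentage of phenotypic variance explained.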
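An LOD score is the base-10 logarithm of the likelihood ratio comparing a QTL at the tested position with no QTL, so an LOD of 3.0 corresponds to odds of 1000:1. A short sketch converting the LOD values listed in Tables 5 and 6 back to likelihood ratios and comparing them with the 3.0 threshold used in Figures 3 and 4:

```python
# LOD = log10(likelihood with QTL / likelihood without QTL).
# LOD values transcribed from Tables 5 and 6; threshold from Figures 3 and 4.

LOD_THRESHOLD = 3.0

LOD_SCORES = {
    ("F2:3", "protein", "Gm20_29512680"): 9.57,
    ("F2:3", "oil", "Gm15_3621773"): 5.80,
    ("BC1F2:3", "protein", "Gm20_27578013"): 3.29,
    ("BC1F2:3", "oil", "Gm20_27578013"): 3.06,
}


def likelihood_ratio(lod: float) -> float:
    """Undo the log10 transform to recover the likelihood ratio."""
    return 10.0 ** lod


for key, lod in LOD_SCORES.items():
    status = "above" if lod >= LOD_THRESHOLD else "below"
    print(key, f"LOD {lod:.2f} -> LR {likelihood_ratio(lod):.3g} ({status} threshold)")
```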
Table 7. Marker genotypes and seed protein content of BC1F4 lines derived from two high-protein lines selected in the BC1F3 generation.
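Marker name | BC1F4 population no.
 | 141-1 | 141-3 | 141-5 | 141-6 | 141-7 | 141-8 | 141-10 | 141-12 | 141-14 | 145-1 | 145-3 | 145-4 | 145-5 | 145-7 | 145-9 | 145-10 | 145-11 | 145-14 | 145-15
BARC_1.01_Gm20_26500747 | A | A | A | A | B | A | A | A | A | A | B | A | A | A | B | A | B | B | B
BARC_1.01_Gm20_26785339 | A | A | A | A | B | A | A | A | A | A | B | A | A | A | B | A | B | B | B
BARC_1.01_Gm20_27578013 | A | A | A | A | B | A | A | A | A | A | B | A | A | A | B | A | B | B | B
BARC_1.01_Gm20_28017701 | A | A | A | A | B | A | A | A | A | A | B | A | A | A | B | A | B | B | B
BARC_1.01_Gm20_28070832 | A | A | A | A | B | A | A | A | A | A | B | A | A | A | B | A | B | B | B
BARC_1.01_Gm20_28389137 | A | A | A | A | B | A | A | A | A | A | B | A | A | A | B | A | B | B | B
BARC_1.01_Gm20_29512680 | A | A | A | A | B | A | A | A | A | A | B | A | A | A | B | A | B | B | B
BARC_1.01_Gm20_29594697 | A | A | A | A | B | A | A | A | A | A | B | A | A | A | B | A | B | B | B
BARC_1.01_Gm20_32603292 | A | A | A | A | B | A | A | A | A | A | B | A | A | A | B | A | B | B | B
BARC_1.01_Gm20_32784352 | A | A | A | A | B | A | A | A | A | A | A | A | A | A | A | A | A | A | A
BARC_1.01_Gm20_33001386 | A | A | A | A | B | A | A | A | A | A | A | A | A | A | A | A | A | A | A
BARC_1.01_Gm20_33049242 | A | A | A | A | B | A | A | A | A | A | A | A | A | A | A | A | A | A | A
BARC_1.01_Gm20_33580029 | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A
BARC_1.01_Gm20_33634187 | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A
BARC_1.01_Gm20_33727273 | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A
BARC_1.01_Gm20_33770596 | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A
BARC_1.01_Gm20_33842096 | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A
BARC_1.01_Gm20_34057184 | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A
Protein content (%) | 44.40 | 44.81 | 45.47 | 43.90 | 47.58 | 44.32 | 43.38 | 43.60 | 42.94 | 45.87 | 50.17 | 45.55 | 45.66 | 43.92 | 47.43 | 45.24 | 48.60 | 48.04 | 48.48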
‘A’ indicates that the BC1F4-derived line was homogeneous for the Daepung allele; ‘B’ indicates that the line was homogeneous for the GWS-1887 allele.
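The recombination breakpoints in Table 7 can be located by scanning each line's genotype calls along chromosome 20. This sketch transcribes the table, reports where each line switches between the GWS-1887 ('B') and Daepung ('A') alleles, and compares mean protein content by genotype at BARC_1.01_Gm20_32603292. Breakpoints are resolved only to the interval between adjacent markers, and the helper names are illustrative, not from the article.

```python
# Locate genotype switches (recombination breakpoints) per BC1F4 line and
# compare mean protein by genotype at one marker. Data transcribed from Table 7.
from statistics import fmean

LINE_IDS = ["141-1", "141-3", "141-5", "141-6", "141-7", "141-8", "141-10",
            "141-12", "141-14", "145-1", "145-3", "145-4", "145-5", "145-7",
            "145-9", "145-10", "145-11", "145-14", "145-15"]
PROTEIN = [44.40, 44.81, 45.47, 43.90, 47.58, 44.32, 43.38, 43.60, 42.94,
           45.87, 50.17, 45.55, 45.66, 43.92, 47.43, 45.24, 48.60, 48.04,
           48.48]

P1 = "AAAABAAAAABAAABABBB"   # shared by markers 26500747 .. 32603292
P2 = "AAAABAAAAAAAAAAAAAA"   # shared by markers 32784352 .. 33049242
ALL_A = "A" * len(LINE_IDS)  # markers 33580029 .. 34057184

# (physical position in bp, genotype string across LINE_IDS), in map order.
MARKERS = (
    [(p, P1) for p in (26500747, 26785339, 27578013, 28017701, 28070832,
                       28389137, 29512680, 29594697, 32603292)]
    + [(p, P2) for p in (32784352, 33001386, 33049242)]
    + [(p, ALL_A) for p in (33580029, 33634187, 33727273, 33770596,
                            33842096, 34057184)]
)


def breakpoints(line_index):
    """Return (left_marker, right_marker) pairs where the genotype changes."""
    calls = [(pos, geno[line_index]) for pos, geno in MARKERS]
    return [(left[0], right[0])
            for left, right in zip(calls, calls[1:])
            if left[1] != right[1]]


def mean_protein_by_genotype(marker_pos):
    """Mean protein (%) of lines carrying 'A' vs 'B' at the given marker."""
    geno = dict(MARKERS)[marker_pos]
    groups = {"A": [], "B": []}
    for call, protein in zip(geno, PROTEIN):
        groups[call].append(protein)
    return {allele: round(fmean(vals), 2) for allele, vals in groups.items()}


for i, line in enumerate(LINE_IDS):
    if bp := breakpoints(i):
        print(line, "switches genotype between", bp)
print(mean_protein_by_genotype(32603292))
```

With the transcribed values, the 145-x lines carrying 'B' switch to 'A' between Gm20_32603292 and Gm20_32784352, and at Gm20_32603292 the 'B' group averages about 48.4% protein against about 44.5% for the 'A' group.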
Table 8. Statistics for the high-quality reads from GWS-1887 mapped to the reference soybean genome.
Statistic | GWS-1887
Total reads | 260,761,086
Total size (bp) | 39,374,923,986
Mapped reads (%) | 98.8
Genome coverage (%) | 95.3
Sequencing depth | 38.6×
Number of SNPs | 4,750,431
Number of InDels | 897,005
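The Table 8 figures are linked by the coverage identity depth = total sequenced bases / genome length, and dividing total bases by total reads gives the mean read length. In this sketch only the three inputs come from Table 8; the genome length is back-calculated and should be treated as a rough estimate, because the article does not state whether depth was computed over all bases or mapped bases only, nor which genome length was used.

```python
# Relate Table 8 statistics via depth = total bases / genome length.
# Inputs from Table 8; the genome length is back-calculated, not reported.

TOTAL_READS = 260_761_086
TOTAL_BASES = 39_374_923_986
DEPTH = 38.6  # fold coverage as reported (rounded to one decimal)


def mean_read_length(total_bases: int, total_reads: int) -> float:
    """Average number of bases per read."""
    return total_bases / total_reads


def implied_genome_length(total_bases: int, depth: float) -> float:
    """Genome length (bp) implied by the reported depth."""
    return total_bases / depth


print(f"mean read length: {mean_read_length(TOTAL_BASES, TOTAL_READS):.1f} bp")
print(f"implied genome length: {implied_genome_length(TOTAL_BASES, DEPTH) / 1e9:.2f} Gb")
```

The division gives exactly 151 bp per read and an implied genome length of roughly 1.02 Gb.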
Table 9. Candidate genes for seed protein content and InDels between the reference genome and GWS-1887.
Gene | Gene Position (bp) | Annotation | Position of Genetic Variation (bp) | GWS-1887
Glyma.20G082450 | 31,080,736..31,082,822 | Ammonium transporter in embryo development | 31082569 | small indel
Glyma.20G084000 | 31,486,240..31,488,766 | Small nuclear ribonucleoprotein F | 31486478 | missing allele
  |  |  | 31486484 | missing allele
  |  |  | 31486498 | missing allele
  |  |  | 31486737 | ref
  |  |  | 31487037 | ref
  |  |  | 31487188 | ref
  |  |  | 31487573 | ref
  |  |  | 31488329 | ref
  |  |  | 31488590 | ref
Glyma.20G084051 | 31,490,248..31,494,204 | Protein FAR1-RELATED SEQUENCE 5-like | 31490350 | small indel
  |  |  | 31493215 | small indel
Glyma.20G084100 | 31,538,100..31,541,615 | Tetratricopeptide repeat (TPR)-like superfamily protein | 31541401 | small indel
  |  |  | 31541587 | small indel
Glyma.20G084500 | 31,616,022..31,624,831 | Pre-mRNA-processing factor 17-like isoform X1 | 31624215 | small indel
Glyma.20G085100 | 31,724,592..31,729,626 | CCT motif family protein | 31727019 | ref
Glyma.20G085250 | 31,749,953..31,750,702 |  | 31750696 | small indel
Glyma.20G085300 | 31,772,592..31,776,697 | Uncharacterized protein LOC100814338 isoform X3 | 31772869 | small indel
  |  |  | 31775624 | small indel
  |  |  | 31775992 | large indel
Glyma.20G085400 | 31,778,024..31,782,508 | 60S ribosomal L23-like protein | 31778202 | small indel
  |  |  | 31778572 | small indel
  |  |  | 31779221 | small indel
  |  |  | 31779277 | small indel
  |  |  | 31780211 | small indel
  |  |  | 31780626 | small indel
  |  |  | 31781580 | small indel
  |  |  | 31782506 | small indel
Glyma.20G085450 | 31,783,157..31,786,246 | Uncharacterized protein LOC102669815 | 31783447 | large indel
  |  |  | 31784986 | small indel
Glyma.20G085500 | 31,799,904..31,802,040 | HAD superfamily | 31801435 | small indel
Glyma.20G085700 | 31,931,514..31,933,982 | Unknown protein | 31931542 | small indel
  |  |  | 31932782 | small indel
  |  |  | 31933822 | small indel
Glyma.20G085800 | 31,936,253..31,941,630 | Eukaryotic translation initiation factor 4E | 31940258 | small indel
  |  |  | 31940854 | large indel
Glyma.20G086100 | 31,963,336..31,965,943 |  | 31965594 | ref
Glyma.20G086800 | 32,280,129..32,280,768 | PA domain | 32280386 | small indel
Glyma.20G086900 | 32,344,475..32,346,588 | Aldehyde dehydrogenase | 32345329 | small indel
  |  |  | 32345819 | ref
  |  |  | 32346510 | small indel
Glyma.20G087000 | 32,482,132..32,486,538 | Signal transduction histidine kinase | 32482188 | small indel
  |  |  | 32482242 | small indel
  |  |  | 32482250 | small indel
  |  |  | 32482352 | small indel
  |  |  | 32482410 | small indel
  |  |  | 32482424 | small indel
  |  |  | 32482464 | small indel
  |  |  | 32482474 | small indel
  |  |  | 32482480 | small indel
  |  |  | 32482487 | small indel
  |  |  | 32482549 | small indel
  |  |  | 32482603 | small indel
  |  |  | 32482697 | small indel
  |  |  | 32482745 | small indel
  |  |  | 32482761 | small indel
  |  |  | 32482765 | small indel
  |  |  | 32482811 | small indel
  |  |  | 32482815 | small indel
  |  |  | 32482850 | small indel
  |  |  | 32482968 | small indel
  |  |  | 32482972 | small indel
  |  |  | 32482985 | small indel
  |  |  | 32483011 | small indel
  |  |  | 32483034 | small indel
  |  |  | 32483036 | small indel
  |  |  | 32483056 | small indel
  |  |  | 32483091 | small indel
  |  |  | 32483140 | small indel
  |  |  | 32483171 | small indel
  |  |  | 32483226 | small indel
  |  |  | 32483233 | small indel
  |  |  | 32483237 | small indel
  |  |  | 32483278 | small indel
  |  |  | 32483319 | small indel
  |  |  | 32483325 | small indel
  |  |  | 32483396 | small indel
  |  |  | 32483411 | large indel
  |  |  | 32483458 | small indel
  |  |  | 32483552 | small indel
  |  |  | 32485633 | small indel
  |  |  | 32486435 | small indel
Glyma.20G087600 | 32,693,094..32,698,696 | Target SNARE coiled-coil domain protein | 32694336 | small indel
  |  |  | 32696653 | small indel
Glyma.20G088000 | 32,881,065..32,886,678 | S-adenosyl-l-methionine-dependent methyltransferases | 32881614 | small indel
  |  |  | 32882108 | small indel
  |  |  | 32882111 | large indel
  |  |  | 32882633 | small indel
  |  |  | 32882909 | small indel
  |  |  | 32883949 | ref
  |  |  | 32883973 | small indel
  |  |  | 32884174 | small indel
  |  |  | 32884324 | small indel
  |  |  | 32884601 | small indel
  |  |  | 32885578 | ref
  |  |  | 32885582 | small indel
  |  |  | 32885642 | ref
  |  |  | 32885647 | ref
  |  |  | 32885839 | ref
  |  |  | 32885841 | small indel
  |  |  | 32885843 | small indel
  |  |  | 32885941 | small indel
  |  |  | 32886302 | large indel
  |  |  | 32886312 | small indel
  |  |  | 32886569 | large indel
Glyma.20G088100 | 32,887,478..32,888,573 | (S)-coclaurine N-methyltransferase-like | 32887547 | small indel
  |  |  | 32887824 | ref
Glyma.20G088400 | 32,909,361..32,910,905 | Oxidoreductase | 32910800 | small indel
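As a minimal sketch for working with Table 9, the variants listed for the two genes named in the abstract, Glyma.20G088000 and Glyma.20G088400, can be tallied by class and checked against the gene coordinates. Here 'ref' is read as a GWS-1887 call matching the reference genome; that is an interpretation of the table, not a definition given in the article.

```python
# Tally GWS-1887 variant classes per candidate gene and check that every
# listed position falls inside the gene interval. Data transcribed from Table 9.
from collections import Counter

GENES = {
    # gene: (start, end, [(variant position, class), ...])
    "Glyma.20G088000": (32_881_065, 32_886_678, [
        (32881614, "small indel"), (32882108, "small indel"),
        (32882111, "large indel"), (32882633, "small indel"),
        (32882909, "small indel"), (32883949, "ref"),
        (32883973, "small indel"), (32884174, "small indel"),
        (32884324, "small indel"), (32884601, "small indel"),
        (32885578, "ref"), (32885582, "small indel"),
        (32885642, "ref"), (32885647, "ref"), (32885839, "ref"),
        (32885841, "small indel"), (32885843, "small indel"),
        (32885941, "small indel"), (32886302, "large indel"),
        (32886312, "small indel"), (32886569, "large indel"),
    ]),
    "Glyma.20G088400": (32_909_361, 32_910_905, [
        (32910800, "small indel"),
    ]),
}


def summarize(gene):
    """Return (counts per variant class, positions outside the gene bounds)."""
    start, end, variants = GENES[gene]
    outside = [pos for pos, _ in variants if not start <= pos <= end]
    return Counter(kind for _, kind in variants), outside


for gene in GENES:
    counts, outside = summarize(gene)
    print(gene, dict(counts), "outside gene bounds:", outside)
```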
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Kim, W.J.; Kang, B.H.; Moon, C.Y.; Kang, S.; Shin, S.; Chowdhury, S.; Choi, M.-S.; Park, S.-K.; Moon, J.-K.; Ha, B.-K. Quantitative Trait Loci (QTL) Analysis of Seed Protein and Oil Content in Wild Soybean (Glycine soja). Int. J. Mol. Sci. 2023, 24, 4077. https://doi.org/10.3390/ijms24044077

AMA Style

Kim WJ, Kang BH, Moon CY, Kang S, Shin S, Chowdhury S, Choi M-S, Park S-K, Moon J-K, Ha B-K. Quantitative Trait Loci (QTL) Analysis of Seed Protein and Oil Content in Wild Soybean (Glycine soja). International Journal of Molecular Sciences. 2023; 24(4):4077. https://doi.org/10.3390/ijms24044077

Chicago/Turabian Style

Kim, Woon Ji, Byeong Hee Kang, Chang Yeok Moon, Sehee Kang, Seoyoung Shin, Sreeparna Chowdhury, Man-Soo Choi, Soo-Kwon Park, Jung-Kyung Moon, and Bo-Keun Ha. 2023. "Quantitative Trait Loci (QTL) Analysis of Seed Protein and Oil Content in Wild Soybean (Glycine soja)" International Journal of Molecular Sciences 24, no. 4: 4077. https://doi.org/10.3390/ijms24044077

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
