Review

Harnessing Multi-Omics Strategies and Bioinformatics Innovations for Advancing Soybean Improvement: A Comprehensive Review

by Siwar Haidar 1,2, Julia Hooker 1,2,†, Simon Lackey 1,2,†, Mohamad Elian 1,2,†, Nathalie Puchacz 1, Krzysztof Szczyglowski 3, Frédéric Marsolais 3, Ashkan Golshani 2, Elroy R. Cober 1 and Bahram Samanfar 1,2,*
1 Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON K1A 0C6, Canada
2 Department of Biology, Ottawa Institute of Systems Biology, Carleton University, Ottawa, ON K1S 5B6, Canada
3 Agriculture and Agri-Food Canada, London Research and Development Centre, London, ON N5V 4T3, Canada
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Plants 2024, 13(19), 2714; https://doi.org/10.3390/plants13192714
Submission received: 1 September 2024 / Revised: 26 September 2024 / Accepted: 26 September 2024 / Published: 28 September 2024

Abstract

Soybean improvement has entered a new era with the advent of multi-omics strategies and bioinformatics innovations, enabling more precise and efficient breeding practices. This comprehensive review examines the application of multi-omics approaches in soybean, encompassing genomics, transcriptomics, proteomics, metabolomics, epigenomics, and phenomics. We first explore pre-breeding and genomic selection as tools that have laid the groundwork for advanced trait improvement. Subsequently, we detail the specific contributions of each -omics field, highlighting how bioinformatics tools and resources have facilitated the generation and integration of multifaceted data. The review emphasizes the power of integrating multi-omics datasets to elucidate complex traits and drive the development of superior soybean cultivars. Emerging trends, including novel computational techniques and high-throughput technologies, are discussed in the context of their potential to revolutionize soybean breeding. Finally, we address the challenges associated with multi-omics integration and propose future directions to overcome these hurdles, aiming to accelerate the pace of soybean improvement. This review serves as a crucial resource for researchers and breeders seeking to leverage multi-omics strategies for enhanced soybean productivity and resilience.

1. Introduction

Soybean (Glycine max (L.) Merr.) is an important agricultural crop worldwide, with vast implications for human diets, animal feed, and sustainable agricultural practices. Through a symbiotic relationship with soil bacteria, soybean can fix inert atmospheric nitrogen into more biologically available nitrogen compounds. Incorporating soybeans into crop rotations reduces the need for nitrogen fertilizers, which minimizes the release of nitrogen compounds as environmental pollutants. With high seed oil and protein content, soybean is used in a wide range of consumer-end products and animal feed. With some of the highest seed protein content among legumes, soybean is also one of the most widely consumed crops in the world due to its dense nutritional value. In fact, much evidence suggests that expanding soybean production is an excellent avenue to address the caloric and protein requirements of a growing population [1].
Soybean research supports efforts to navigate changing and unpredictable climates, as well as emerging and spreading agricultural pests and disease, and the needs of a growing population. One area of study is driving soybean agriculture to more northern locations in Canada, China, and the United States [2]. With widespread global interest in soybean agriculture, a suite of ever-growing resources for soybean improvement and breeding strategies has become invaluable to the soybean research community.
Historical improvement of the soybean has relied largely on the use of classical genetics, phenotypic selection, and the pedigree method. Large segregating populations are grown and observed for phenotypic variation, from which individuals that display traits conferring desirable agronomic and seed quality characteristics are selected to continue in subsequent generations, until segregation is limited and most traits are considered fixed, usually in four or five generations. At this point, and once adequate seed is available, cultivars can be bulked for testing at a larger scale, where agronomic performance, especially yield, as well as resistance to abiotic and biotic stressors, can be assessed in multiple environments. Only the highest performers are selected for release to growers, representing a tiny fraction of the original segregating population. This process, from the original hybridization to cultivar release, can take 10 years or more, unless it is accelerated by costly greenhouse generational advancement or winter nurseries in the opposite hemisphere.
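The observation that most traits are considered fixed after four or five generations follows from the arithmetic of self-pollination. The following is a minimal sketch, assuming strict selfing, no selection, and an F1 that is heterozygous at every segregating locus; these assumptions and the function name are illustrative and are not taken from the studies cited here.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected fraction of heterozygous loci in generation F_n.

    Assumes the F1 is heterozygous at every segregating locus and that
    each round of selfing halves the heterozygous fraction.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1 (F1)")
    return 0.5 ** (generation - 1)


# Print the expected heterozygous fraction for F1 through F6.
for n in range(1, 7):
    print(f"F{n}: {expected_heterozygosity(n):.4f}")
```

Under these assumptions, only about 6% of loci are still expected to segregate by the F5, which is consistent with treating lines as largely fixed at that stage.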
Multi-omics refers to the integrated analysis of data from multiple “omics” disciplines to obtain a comprehensive view of biological processes. In recent years, multi-omics approaches coupled with bioinformatics tools have emerged as powerful strategies for enhancing soybean improvement efforts [3]. These integrated methodologies offer a comprehensive understanding of the complex molecular networks underlying various traits of interest in soybean, including yield, seed quality, and abiotic and biotic stress tolerance [4]. By simultaneously analyzing multiple layers of biological information such as genomics, transcriptomics, proteomics, metabolomics, epigenomics, and phenomics, researchers can unravel the complex interactions between genes, proteins, and metabolites, thereby providing valuable insights into the mechanisms governing soybean traits [5].
While the application of multi-omics approaches in soybean research has attracted significant attention, similar efforts have been undertaken in other important crops, including maize [6], rice [7], and wheat [8], as well as fruit crops like tomato [9]. Recent reviews have highlighted the utility of multi-omics techniques in elucidating the genetic and molecular basis of key agronomic traits in these crops, paving the way for targeted breeding strategies [10,11,12]. These reviews underscore the broad applicability of multi-omics approaches across diverse plant species and emphasize their potential to revolutionize agricultural practices worldwide.
In light of these biotechnological advancements, the objective here is to delve into the latest developments in multi-omics techniques and bioinformatics applications specifically for soybean research. By reviewing recent studies and discussing the potential implications for soybean breeding programs, we aim to provide insights into how multi-omics approaches can contribute to the accelerated development of agronomically enhanced soybean varieties. Through a comprehensive exploration of an array of omics approaches and powerful data processing tools, we review the current state of multi-omics and bioinformatics in soybean biology.

2. Pre-Breeding and Genomic Selection

Efforts must be made early in the pipeline to streamline the process and guarantee a greater chance of success. One way plant breeders are tackling this long timeline is to carry out pre-breeding or germplasm development. Pre-breeding is the process of introgressing desirable alleles from a source with characteristics that often make it unsuitable for commercial release into a background that can then be used as a suitable donor into more commercially acceptable cultivars [13]. The initial cross between the exotic donor and the intermediary is never intended for commercial release; rather, it acts as a stepping stone from which the trait can be transferred into multiple backgrounds depending on the other traits the breeder is trying to capture [14]. By developing these intermediaries in advance, breeders ensure that a larger proportion of the early generation material will be suitable for selection, reducing the time and effort required between the cross and the cultivar’s release.
Genomic selection employs and builds on the key aspects of phenotypic selection while adding genotypic information to assist the breeder in selecting high-performing individuals [15]. The basics of this selection method are fairly straightforward: populations are divided into a training set and a testing (breeding) set. Both sets are genotyped, after which only the training set is grown and phenotyped. The resulting phenotypic data are then used in conjunction with the genotypic data to develop the genomic selection model, which is used to predict the performance of the testing set; these predictions are called genomic estimated breeding values (GEBVs). The breeder then selects lines observed and/or predicted to be the highest performers. Importantly, genomic selection has been shown to improve genetic gain in breeding populations while reducing loss of potentially important genetic diversity [16].
Marker-assisted selection (MAS) is an important tool contributing to the success of both pre-breeding and genomic selection strategies [17]. MAS allows breeders to selectively screen parental lines for one or a few desirable alleles, making choices at the outset that guarantee a larger number of desirable offspring regardless of the selection method employed. Several techniques exist for MAS, such as CAPS/dCAPS, SSR, and SNP-based KASP [18]. Cleaved amplified polymorphic sequences (CAPS) and derived cleaved amplified polymorphic sequences (dCAPS) rely on gene-specific DNA amplification, with or without inclusion of the target SNP in the primer sequence. After restriction enzyme digestion, the amplicons produce diagnostic bands on an agarose gel whose sizes vary depending on the alleles present [19,20]. Simple sequence repeat (SSR) markers, sometimes referred to as microsatellite markers, are another PCR-based approach in which varying numbers of repeated sequences are diagnostic for the allele present, with results again read from a gel [21]. The Kompetitive Allele Specific PCR (KASP) assay is diagnostic for the SNP present in a DNA sample, with primers specific to the gene and allele suspected to be present in the sample. KASP is significantly less ambiguous than CAPS/dCAPS and SSR markers and does not require gel electrophoresis to make genotype calls. KASP is seen as a more modern technique for MAS due to its scalability, repeatability, and specificity [22].
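The training/testing workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses ridge regression on SNP allele counts (one of several model families used for genomic prediction, chosen here for brevity and not named in the cited studies), the data are invented toy values, and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear_system(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_marker_effects(genotypes: Matrix, phenotypes: List[float],
                       ridge: float) -> Tuple[List[float], List[float]]:
    """Fit ridge-regression marker effects on the training set.

    Returns (marker_effects, marker_means). Use ridge > 0 so the system
    stays solvable even with more markers than phenotyped lines.
    """
    n_lines, n_markers = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n_lines for j in range(n_markers)]
    z = [[row[j] - means[j] for j in range(n_markers)] for row in genotypes]
    mean_y = sum(phenotypes) / n_lines
    y = [p - mean_y for p in phenotypes]
    # Normal equations: (Z'Z + ridge * I) beta = Z'y
    lhs = [[sum(r[i] * r[j] for r in z) + (ridge if i == j else 0.0)
            for j in range(n_markers)] for i in range(n_markers)]
    rhs = [sum(z[k][i] * y[k] for k in range(n_lines)) for i in range(n_markers)]
    return solve_linear_system(lhs, rhs), means


def predict_gebv(genotypes: Matrix, effects: List[float],
                 means: List[float]) -> List[float]:
    """Predict breeding values as deviations from the training-set mean."""
    return [sum((g - m) * e for g, m, e in zip(row, means, effects))
            for row in genotypes]


# Toy data (invented): rows are lines, columns are SNPs coded as 0/1/2.
training_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2], [2, 2, 0]]
training_yield = [2.9, 3.4, 4.1, 2.7, 4.0]               # phenotyped lines
candidate_genotypes = [[2, 1, 0], [0, 0, 2], [1, 2, 1]]  # genotyped only

effects, means = fit_marker_effects(training_genotypes, training_yield, ridge=1.0)
gebv = predict_gebv(candidate_genotypes, effects, means)
ranking = sorted(range(len(gebv)), key=lambda i: gebv[i], reverse=True)
print("Candidates ranked by predicted value:", ranking)
```

The candidates are never phenotyped; they are ranked purely on marker effects learned from the training set, which is where the savings in field plots come from.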

3. Multi-Omics Approaches and Bioinformatics Tools and Resources in Soybean Research

3.1. Genomics

Genomics is the study of the genes within an organism, the ways those genes function and interact, and how that behavior changes under different environmental conditions [23]. All cells within an organism such as soybean contain a complete copy of the organism’s genome, yet they carry out a vast array of functions and interactions that allow the organism to thrive in the environment to which it is adapted. Studying and understanding the genomics of an organism helps to explain basic biological functions and improves our understanding of how genetics fine-tunes and controls the organism’s phenotype [24].
The study of soybean genomics has the potential to greatly improve our understanding of soybean biology, especially as it relates to agronomic performance and farmgate profitability. Soybean is an ancient paleopolyploid whose genome was first sequenced by Schmutz et al. [25], revealing 1.1 gigabase pairs with a predicted 46,000 protein-coding regions. More recently, reference-grade genome assemblies have been reported by Song et al. [26] and Valliyodan et al. [27], who improved on the originally published reference assembly of the Williams 82 cultivar using updated techniques and technologies. Furthermore, Valliyodan et al. [27] reported a reference-grade assembly of the Southern US variety Lee, Shen et al. [28] published a complete genome of the Chinese cultivar Zhonghuang 13, and Xie et al. [29] reported a reference genome of Glycine soja W05, a representative of the soybean’s wild crop relative. Having quality reference genomes, with the ability to choose the one most appropriate for the study being carried out, is fundamental to the advancement of the field of soybean genomics.
Reference-grade genomes are essential, but the assembly of pan-genomes allows for a more critical look at the genomics of the soybean, and at the differences between groups of individuals and their conserved and dispensable genomic regions [30]. The concept of the pan-genome was intended to better reveal and describe the totality of the genome of a species, rather than the genome of a single individual. Several key pan-genomes have been assembled and reported for the soybean. Torkamaneh et al. [31] assembled a pan-genome of only cultivated soybean, from diverse geographies and genetic backgrounds, revealing that more than 90% of the genome is conserved and demonstrating that the genomic regions available for genetic gains in adapted cultivars are highly restricted. Bayer et al. [32] and Liu et al. [33] assembled pan-genomes that included a greater diversity of individuals. Bayer et al. [32] collected over 1100 genome sequences to assemble their pan-genome, showing that significant loss of allelic diversity occurred during domestication, and that only a small number of founders have contributed most of the genetic variability employed in today’s commercial cultivars. In contrast, Liu et al. [33] employed phylogenetic analysis to select 26 individuals for a pan-genome, capturing wild accessions, landraces, and commercial cultivars. The three studies outlined above all agree that the soybean genome is highly conserved between individuals, that the genetic bottlenecks of domestication represent a major loss of potentially economically important alleles, and that pan-genomes should be employed to focus crop improvement efforts on specific genomic regions that are highly variable, provided that these regions determine useful trait variation.
Next-Generation Sequencing (NGS) methods have allowed for a vast increase in the amount of genomic data available for soybean research [34]. This increase in available data heightens the demand for tools that can process and annotate extensive amounts of information more quickly [35]. New bioinformatics tools have been developed to fill this processing need. The Extensive De novo Transposable elements Annotator (EDTA), for example, is a comprehensive pipeline used to simplify the process of creating a transposable element library for annotation [36]. This was accomplished by benchmarking many of the most commonly used programs using various metrics and incorporating the best-performing options into the pipeline [36]. Bioinformatics tools are also used to identify proteins and sequences responsible for certain regulatory functions. iTAK is a computational program designed to identify transcription factors (TFs), transcription regulators (TRs), and protein kinases (PKs) from a given sequence. This was achieved by comparing TF/TR classification rules in various databases and deriving a consensus set of rules that can be applied based on existing literature [37]. The identification of structural rearrangements (inversions, duplications) and mutations (SNPs, indels) is also vital for understanding gene function [38]. The Synteny and Rearrangement Identifier (SyRI) tool was developed to identify these variations and structural rearrangements within all syntenic and structurally rearranged regions of related genomes [39]. Other tools such as MCScanX [40] and i-ADHoRe [41] also provide a means for synteny detection, visualization, and comparison.
Pinpointing SNPs associated with phenotypic changes in a genome as large as the soybean’s requires the use of large and diverse collections of individuals along with mathematical and statistical techniques. Genome-wide association studies (GWAS) use a large panel of individuals that represent the diversity employed by plant breeders in their crop improvement efforts, as well as wild crop relatives and historical landraces [42,43]. GWAS identify SNPs that are significantly associated with a phenotype and have been widely used to suggest candidate genomic regions and hotspots. The diversity of phenotypes studied using GWAS points toward their acceptance in the field of genomics and their applicability for crop improvement: resistance to soybean mosaic virus [44], primary root length [45], photosynthesis [46], protein content [47], flowering time [48], mineral element uptake [49], stem pushing and lodging resistance [50], and tolerance to cold imbibition stress [51], to name a few.
Unlike GWAS, genomic selection is an applied genomics technique that feeds directly into the plant breeding pipeline by reducing the effort required for variety development and release. While this technique is being applied with increasing frequency, studies reporting its use are still somewhat limited. Miller et al. [52] used large breeding populations tested in multiple environments with relatively low genotyping density, revealing that factors such as population relatedness, marker density, and the choice of genomic model all influence the success or failure of the technique. Kaler et al. [53] compared the ratio of training to testing set populations, finding that it was more important to choose more significant SNPs in genotyping than to alter the ratio of training to testing sets. Bandillo et al. [54] assessed whether genomic selection was useful in improving yield, using seven years of yield data to show that it was just as effective as standard phenotypic selection while reducing the overall number of lines needing to be grown each year.
Genomics provides a detailed understanding of the genetic architecture and functional mechanisms of soybeans. The advancements in soybean genomics, from the initial sequencing to the development of reference-grade assemblies and pan-genomes, have significantly enhanced our knowledge of soybean biology and its potential for crop improvement. Techniques such as NGS generate the data used in approaches like GWAS and genomic selection, which are crucial for identifying key genetic variations, understanding complex trait inheritance, and improving breeding efficiency. As research progresses, the integration of advanced bioinformatics tools with multi-omics approaches will continue to contribute to sustainable agricultural practices and food security.
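The distinction between conserved (core) and dispensable regions can be made concrete with a presence/absence table. The sketch below is an illustration only: the gene and accession names are invented, the function name is hypothetical, and published pan-genome studies use considerably more elaborate criteria than exact presence in every accession.

```python
from typing import Dict, List, Set


def classify_pan_genome(gene_presence: Dict[str, Set[str]],
                        accessions: List[str]) -> Dict[str, List[str]]:
    """Split genes into core (present in every accession) and dispensable."""
    all_accessions = set(accessions)
    result: Dict[str, List[str]] = {"core": [], "dispensable": []}
    for gene, carriers in sorted(gene_presence.items()):
        key = "core" if all_accessions <= carriers else "dispensable"
        result[key].append(gene)
    return result


# Invented example: which accessions carry each gene.
accessions = ["cultivar_1", "cultivar_2", "landrace_1", "wild_1"]
gene_presence = {
    "gene_A": {"cultivar_1", "cultivar_2", "landrace_1", "wild_1"},
    "gene_B": {"cultivar_1", "cultivar_2", "landrace_1", "wild_1"},
    "gene_C": {"landrace_1", "wild_1"},
    "gene_D": {"wild_1"},
}

classes = classify_pan_genome(gene_presence, accessions)
core_fraction = len(classes["core"]) / len(gene_presence)
print(classes, f"core fraction = {core_fraction:.2f}")
```

In this toy table, the dispensable genes are carried only by the landrace and wild accessions, mirroring the loss of diversity during domestication described above.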
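At its simplest, a GWAS tests each SNP in turn for association with the phenotype. The following sketch shows such a single-marker scan on invented data, using the squared correlation between allele dosage and phenotype and a permutation test for significance. It is a simplification under stated assumptions: real analyses also correct for population structure, kinship, and multiple testing, none of which is shown here, and all names are hypothetical.

```python
import math
import random
from typing import Dict, List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant trait
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(dosages: List[float], phenotypes: List[float],
                        n_permutations: int = 1000, seed: int = 0) -> float:
    """Two-sided p-value from shuffling phenotypes across individuals."""
    rng = random.Random(seed)
    observed = abs(pearson_r(dosages, phenotypes))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(dosages, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def single_marker_scan(genotypes: Dict[str, List[float]],
                       phenotypes: List[float]) -> Dict[str, Dict[str, float]]:
    """Test every SNP independently against the phenotype."""
    results = {}
    for snp, dosages in genotypes.items():
        r = pearson_r(dosages, phenotypes)
        results[snp] = {"r_squared": r * r,
                        "p_value": permutation_p_value(dosages, phenotypes)}
    return results


# Invented data: allele dosages (0/1/2) for eight lines and one trait.
genotypes = {
    "snp_1": [0, 0, 1, 1, 2, 2, 0, 2],
    "snp_2": [0, 1, 2, 0, 1, 2, 1, 0],
}
trait = [10, 11, 15, 14, 20, 19, 10, 21]

scan = single_marker_scan(genotypes, trait)
for snp, stats in scan.items():
    print(snp, f"r2={stats['r_squared']:.3f}", f"p={stats['p_value']:.4f}")
```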

3.2. Transcriptomics

Development of high-throughput sequencing technologies has brought about a new era of molecular biology. No longer constrained to real-time quantitative PCR (RT-qPCR) and Sanger sequencing, researchers now have access to a flood of information that provides a snapshot of a sample under any given condition. This snapshot of transcript information allows researchers a peek into the ongoing biology in a sample at any given time. In combination with whole genome sequence assembly, reference genomes support transcriptomic analysis, offering a universal reference for researchers to compare gene expression data. However, the value of transcriptomics extends beyond reference genomes; RNA sequencing (RNA-seq) reads can be assembled de novo should a reference genome be unavailable or inappropriate for the given study. This circumvents a limitation of microarray chips, which require known sequence information for chip development.
Transcriptomics opens a world of possibilities for pairwise comparative studies, known as differential expression (DE) analysis. Tools such as DESeq2 [55] and edgeR [56] are useful for calculating DE from RNA-seq data. DE analysis quantifies the relative expression levels of active genes between two groups, such as different tissue types of one sample, identical samples under different treatments, or different samples under the same treatment, among other possible comparisons.
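The core quantity in a pairwise comparison is the fold change in normalized expression between the two groups. The sketch below computes library-size-normalized counts per million (CPM) and a log2 fold change on invented counts. It is deliberately not how DESeq2 or edgeR work internally (those tools fit count-based statistical models and report significance); it only illustrates what "relative expression between two groups" means, and all names are hypothetical.

```python
import math
from typing import Dict, List

Counts = Dict[str, int]


def counts_per_million(sample: Counts) -> Dict[str, float]:
    """Scale raw read counts by library size."""
    total = sum(sample.values())
    return {gene: count * 1_000_000 / total for gene, count in sample.items()}


def mean_cpm(samples: List[Counts]) -> Dict[str, float]:
    """Average CPM per gene across the replicates of one group."""
    normalized = [counts_per_million(s) for s in samples]
    genes = normalized[0].keys()
    return {g: sum(s[g] for s in normalized) / len(normalized) for g in genes}


def log2_fold_changes(control: List[Counts], treated: List[Counts],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(treated / control); the pseudocount avoids division by zero."""
    c, t = mean_cpm(control), mean_cpm(treated)
    return {g: math.log2((t[g] + pseudocount) / (c[g] + pseudocount)) for g in c}


# Invented counts: two replicates per group, two genes.
control = [{"gene_a": 10, "gene_b": 90}, {"gene_a": 20, "gene_b": 180}]
treated = [{"gene_a": 40, "gene_b": 60}, {"gene_a": 80, "gene_b": 120}]

for gene, lfc in log2_fold_changes(control, treated).items():
    print(f"{gene}: log2FC = {lfc:+.2f}")
```

Note that the second replicate in each group has twice the sequencing depth of the first; CPM normalization removes that difference before the groups are compared.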
RNA-seq datasets can be vast, but they are highly informative in the right hands. A single RNA-seq dataset can be investigated holistically or be sliced into genes or gene families of interest for a targeted approach. The development of user-friendly RNA-seq atlases puts the power of RNA-seq in the hands of any user, bypassing any requirement for plant growth and wet-lab work (assuming the data are suitable for the research question). This means that a very large number of RNA-seq and DE studies can be carried out at minimal cost. Users can download RNA-seq data to carry out pairwise DE analyses tailored to their own research interests.
Severin et al. [5] developed a tissue-specific and temporally variable RNA-seq database, publicly available through SoyBase (https://soybase.org/soyseq/ (accessed on 24 September 2024) [5,57]). The RNA-seq atlas allows soybean researchers to search candidate genes and receive corresponding transcriptomic data across all tissues and stages of development. More recently, Almeida-Silva et al. [58] developed the Soybean Expression Atlas v2, a publicly available RNA-seq database for 5481 soybean samples. The atlas provides both transcript-level expression data, which measure the expression of individual RNA variants, and gene-level expression data, which combine the expression of all variants of a gene [58].
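The relationship between transcript-level and gene-level expression can be shown with a short aggregation step. This is a minimal sketch on invented values; the identifiers are hypothetical placeholders, and the atlas itself may derive gene-level values differently.

```python
from collections import defaultdict
from typing import Dict


def gene_level_expression(transcript_tpm: Dict[str, float],
                          transcript_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Sum the expression of all transcript variants belonging to each gene."""
    totals: Dict[str, float] = defaultdict(float)
    for transcript, tpm in transcript_tpm.items():
        totals[transcript_to_gene[transcript]] += tpm
    return dict(totals)


# Hypothetical identifiers and values, for illustration only.
transcript_to_gene = {"gene_1.t1": "gene_1", "gene_1.t2": "gene_1", "gene_2.t1": "gene_2"}
transcript_tpm = {"gene_1.t1": 12.5, "gene_1.t2": 3.5, "gene_2.t1": 7.0}

print(gene_level_expression(transcript_tpm, transcript_to_gene))
```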
Liu et al. [59] integrated single-cell RNA-seq (scRNA-seq) and spatial transcriptomics to produce a cell atlas across multiple histological features of a soybean root nodule when inoculated with rhizobia [59]. The cell atlas allows the researchers to identify expression profiles of cell types within different parts of the root and nodule and use this information to identify rare cell subtypes that play key roles in maturation and function of the nodule. This study also uncovered functionally distinct and transitional cellular subgrouping between infected and uninfected cells. This cell atlas is publicly available (https://zhailab.bio.sustech.edu.cn/single_cell_soybean (accessed on 24 September 2024), [59]) to facilitate soybean root and nodule research through cell type-specific transcriptomics.
Further, RNA-seq data and phenotype data can be computationally combined using Weighted Gene Co-expression Network Analysis (WGCNA), an analysis technique that groups samples by phenotype and uses the corresponding RNA-seq data to identify co-expressed gene networks that differ between phenotypic groups. Typical network analyses assess co-expressed genes regardless of phenotypic trait status; WGCNA differs from these by sub-setting the expression data by phenotypic groups to look for gene networks that differ between groups, which is important for identifying trait-associated gene networks [60]. WGCNA has been used in soybean pathology; in a recent study on plant–pathogen interaction between G. max and soybean mosaic virus, hub genes and key gene regulatory pathways involved in pathogen response were identified [61]. WGCNA has also been used to identify genetic factors that have implications for embryo size and hormone signaling pathways [62]. More recently, an advanced multi-WGCNA package has been released that allows for trait-specific WGCNA using more than one phenotypic trait [63], which is valuable in complex transcriptomic experiments.
In a bottom-up or targeted approach, gene family analysis can be conducted using RNA-seq data, providing direct information on the transcriptional activity of a family of sequentially or functionally related genes across all given samples. As an example, an investigation into the Glycine max Sugars Will Eventually be Exported Transporter (GmSWEET) family was carried out to expand on gmsweet39, the gene recently identified as underlying a seed protein QTL [64] and found to be positively correlated with seed oil content [65]. Using transcriptomic analysis of 10 soybean lines over four years, this study assessed expression variability across all known GmSWEET genes between soybeans grown in two geographically distinct regions of Canada [66]. Interestingly, a difference in the expression of gmsweet29 and its paralog gmsweet34 pointed to a putative link to the seed protein content differences observed between locations [66]. In a study on the environmental effects on seed protein in soybean, large-scale transcriptomics was used for pathway mapping of DE genes involved in the alanine–aspartate–glutamate metabolism pathway. This study identified an important difference in the expression of genes underlying asparagine metabolism in soybeans grown in cooler, drier climates, which likely influences seed protein content [67,68].
In another study, Lopes-Caitar et al. [69] conducted an in silico investigation of the Hsp20 genes in soybean in response to temperature and nematode infection. Multiple soybean expression databanks, including the SoyBase RNA-seq Atlas [5], were combined to obtain expression data in silico across all the identified Hsp20 genes. In addition to identifying groups of Hsp20 genes induced under temperature stresses (hot and cold), this study identified a conserved promoter sequence unique to Hsp20 genes that respond to nematode stress [69].
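The "weighted" part of a weighted co-expression network can be illustrated with its first computational step: pairwise gene correlations are raised to a soft-thresholding power so that strong correlations dominate and weak ones shrink toward zero. The sketch below uses invented expression values and an illustrative power of 6; it stops well short of a full WGCNA workflow (no module detection or trait correlation), and all names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expression: Dict[str, List[float]],
                             beta: int = 6) -> Dict[Tuple[str, str], float]:
    """Unsigned adjacency |cor(gene_i, gene_j)| ** beta for every gene pair."""
    return {(a, b): abs(pearson_r(expression[a], expression[b])) ** beta
            for a, b in combinations(sorted(expression), 2)}


def connectivity(adjacency: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Sum of edge weights per gene; highly connected genes are hub candidates."""
    totals: Dict[str, float] = {}
    for (a, b), weight in adjacency.items():
        totals[a] = totals.get(a, 0.0) + weight
        totals[b] = totals.get(b, 0.0) + weight
    return totals


# Invented expression profiles across four samples.
expression = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],
    "gene_c": [4.0, 1.0, 3.0, 2.0],
}

adjacency = soft_threshold_adjacency(expression)
print(adjacency)
print(connectivity(adjacency))
```

Here gene_a and gene_b are perfectly co-expressed and keep an edge weight of 1, while the weaker correlation with gene_c (|r| = 0.4) is pushed close to zero.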
Undoubtedly, transcriptomics has had a profound impact on accelerating soybean research, among many other fields of research, due to its widespread application. Here we provide an overview of novel uses of transcriptomics in soybean research, paving an ever-expanding path for the usage of RNA-seq tools and techniques.

3.3. Proteomics

Proteins are the foundation of biological processes and are fundamental in understanding soybean biology. Proteomics, the large-scale study of proteins and particularly their structures, functions, and interactions within a biological system, offers a detailed evaluation of all the proteins that are present at a particular time. The evolution of proteomics has greatly aided soybean research, especially regarding growth, stress responses, and nodule development and functioning [4]. Soybean proteome studies initially depended on 2D gel electrophoresis, prior to the revolutionary impact of liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) on high-throughput proteomics, which improved both efficiency and accuracy [70]. Hajduch et al. [71] constructed high-resolution proteome maps of soybean seed filling, which provided a guide for subsequent proteomics advancements.
Subsequent studies, like Afroz et al. [72], have explored tissue-specific analyses in soybean seedlings, revealing key proteins in various organs such as leaves, hypocotyls, and roots [72]. Analyses of root hairs and developing seeds using isobaric tags for relative and absolute quantitation (iTRAQ) have identified specific proteins associated with root hair and seed development [73,74]. Proteomics has also been employed to study enzyme expression and the regulatory mechanisms involved in their accumulation during seed storage [75]. For instance, Xu et al. [76] used proteomics to compare protein expression between high-oil and high-protein soybean cultivars, shedding light on differences in oil synthesis [76]. It was found that when GmDGAT1-2 was overexpressed, there was a significant alteration of total fatty acid content through an upregulation of oleosin and a downregulation of the enzyme lipoxygenase [77].
Proteomics has revealed critical proteins involved in stress responses. Studies such as those by Xu et al. [77] and Wang et al. [78,79] used advanced proteomic techniques, including LC–MS/MS, to identify and analyze tissue-specific abiotic stress responses in soybean, highlighting the mechanisms at play. In 2017, Wang et al. [79] conducted proteomic analyses focused on specific tissues of soybean seedlings exposed to flooding conditions. In another study, they documented the stress responses of young plants and seedlings exposed to combined stresses in a tissue-specific manner and identified sensitive tissues based on protein profiles during different developmental stages [80]. In 2021, Wang et al. [78] used quantitative LC–MS/MS proteomics to identify and quantify changes in protein expression in response to varying calcium levels, reporting the effects of calcium on soybean radicle protrusion during germination. Their results showed that low levels of calcium promoted radicle protrusion, whereas higher levels suppressed it [78]. Aside from abiotic stress responses, Yadav and Singh [81] discussed how soybean plants change their protein expression in response to infestation by the common cutworm Spodoptera litura. Using LC–MS, they found 390 differentially abundant proteins involved in the response to infestation. These proteins participate in secondary metabolism, reactive oxygen homeostasis, and signaling pathways and are possible candidates for further analysis [81].
Moreover, recent quantitative proteomic analyses have revealed insights into soybean seed traits under various stressors, paving the way for targeted breeding efforts. Islam et al. [82] carried out a quantitative proteomic analysis comparing low-linolenic-acid transgenic soybean seeds silenced for the GmFAD3 fatty acid desaturase gene with control seeds. They found disruptions in proteins involved in fatty acid metabolic pathways, noting a decreased abundance of proteins related to fatty acid initiation, elongation, desaturation, and the β-oxidation of α-linolenic acids [82]. Wei et al. [83] studied the impact of temperature and humidity stress on the vigor of soybean cotyledons, embryos, leaves, and pods using a combination of quantitative proteomics with physiological data [83].
Given that proteins are responsible for many cellular functions, a great deal of unique information can be gleaned from understanding a plant’s proteome [84]. Bioinformatics plays a vital role in developing novel analysis strategies for the ever-increasing amount of proteomics data [85]. Data-independent acquisition (DIA) is an emerging mass-spectrometry-based technique for collecting proteomics data. However, identifying peptides directly from DIA is difficult as the data are often highly multiplexed. PECAN is a library-free tool developed to accurately detect peptides directly from DIA data based on ion product scoring [86]. Bioinformaticians have also developed platforms dedicated to bridging the gap between informatics and biological researchers. For example, Perseus is a user-friendly computational platform dedicated to the analysis of proteomic data, including protein expression, post-translational modifications, interaction proteomics, and time-series data [87]. Traditionally, protein quantification has primarily been achieved using isotope-based labeling methods. Labeling requires several preparation steps; it is therefore of interest to develop methods that quantify proteins without prior labeling. MaxLFQ was developed as a label-free protein quantification tool. It was implemented into the MaxQuant platform and uses a novel protein identification approach to maximize the ratio information from peptide signals, thus increasing the accuracy of the quantification [88].
While soybean proteome research lags behind that of other crops, it serves as a foundational platform for functional genomics studies. Soybean research could be advanced through the development of a reference map for the soybean proteome [4]. Proteomics data can aid in identifying new proteins, analyzing the expression patterns of their corresponding genes, and facilitating their molecular cloning. The integration of proteomics with other -omics data holds promise for identifying elite alleles and developing molecular markers, which are crucial for soybean molecular breeding. Despite challenges in proteomics, analytical techniques continue to progress, offering greater insights into protein-specific influences on cellular processes. Nevertheless, understanding the molecular mechanisms underlying soybean traits requires efficient, high-throughput techniques for analyzing protein expression and interactions, which will drive soybean research into new areas [89].
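The idea of drawing ratio information from peptide signals without labels can be illustrated for the simplest case of two samples: the protein-level ratio is taken as the median of the ratios of the peptides detected in both. The sketch below is a simplified illustration on invented intensities and is not the MaxLFQ algorithm itself, which combines such pairwise information across many samples; the function and peptide names are hypothetical.

```python
from statistics import median
from typing import Dict


def protein_ratio(sample_a: Dict[str, float], sample_b: Dict[str, float]) -> float:
    """Median peptide intensity ratio (sample_b / sample_a) over shared peptides.

    Using the median makes the estimate robust to a single outlying peptide.
    """
    shared = [p for p in sample_a
              if p in sample_b and sample_a[p] > 0 and sample_b[p] > 0]
    if not shared:
        raise ValueError("no peptides quantified in both samples")
    return median(sample_b[p] / sample_a[p] for p in shared)


# Invented peptide intensities for one protein in two samples.
control = {"pep1": 100.0, "pep2": 200.0, "pep3": 50.0, "pep4": 80.0}
stressed = {"pep1": 200.0, "pep2": 400.0, "pep3": 500.0}

print(f"Protein ratio (stressed / control): {protein_ratio(control, stressed):.2f}")
```

In this example pep3 shows a tenfold change while the other shared peptides double; the median reports a twofold change, and pep4 is ignored because it was not measured in the second sample.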

3.4. Metabolomics

Metabolomics is an emerging field within systems biology, and it offers a novel approach to analyzing small metabolites in plant tissues [90]. Techniques employed for quantifying plant metabolites include thin-layer chromatography (TLC), liquid chromatography–electrochemistry–mass spectrometry (LC–EC–MS), gas/liquid chromatography–mass spectrometry (GC/LC–MS), nuclear magnetic resonance (NMR) spectroscopy, Fourier transform infrared (FT–IR) spectroscopy, direct infusion mass spectrometry (DIMS), and capillary electrophoresis–LC–MS; with LC–MS, GC–MS, NMR, and capillary electrophoresis MS being the most prevalent in plant studies [12,91]. These techniques allow for direct links between metabolites and observable traits, as metabolites can affect both gene transcription and protein expression [92]. By revealing metabolic fingerprints through metabolite detection and bioinformatics analysis, researchers gain insights into plant metabolism, shedding light on crucial pathways like the tricarboxylic acid cycle and glycolysis [93]. Additionally, single-cell mass spectrometry offers insights into cellular physiology and hidden phenotypes by allowing the analysis of gene expression, protein function, and metabolite levels in individual cells [94]. Its application, although still evolving, holds potential for deciphering cellular mechanisms in both biomedical and environmental contexts [95].
Researchers have examined how seed dry weight, seed coat color, and maturity impact metabolite levels. For example, studies on black soybean seeds have explored how metabolite levels vary at different maturity stages, revealing that many metabolites change as seeds mature and that isoflavone content is closely linked to seed maturity [96]. Furthermore, by manipulating specific metabolic pathways, it was possible to increase the nutritional value of genetically modified soybeans by boosting isoflavone accumulation in developing seeds [96,97]. Metabolomics also aids in understanding how soybeans respond to abiotic stress, highlighting the role of certain metabolites, like glycine and proline, as osmoprotectants [98]. Moreover, it identifies metabolic markers indicative of drought stress, offering potential applications in crop improvement and breeding by providing insights into the metabolites involved, thus providing new opportunities to increase crop yields [98].
In soybean, metabolomics has revealed 169 metabolites crucial for seed development, including those involved in amino acid biosynthesis, antioxidant utilization, and lipid oxidation [99]. Additionally, there were notable differences in metabolite levels and distinct correlations between metabolites across various soybean cultivars under different shade/light conditions. The isoflavone profiles of soybean germplasm underscored the wide range of isoflavones found in soybeans [100], and various aglycones were linked to differing degrees of shade tolerance in seedlings [101]. Moreover, an NMR-based approach was used to observe the influence of silver nanoparticles on the metabolites of transgenic soybean varieties, and it was reported that silver application does not play a role in secondary metabolites [102]. In a targeted metabolomics approach, Nguyen et al. (2024) revealed the metabolomes of 64 seed lines of G. soja [103]. In an untargeted metabolomics approach, the differences between high and low poly(γ-glutamic acid) production were examined, and it was found that among thousands of differentially expressed metabolites, 257 were significantly upregulated or downregulated [104].
Despite these advancements, challenges persist in metabolomics, including the complexity of plant metabolic pathways and the need for integration with other approaches [105]. While metabolomics has accelerated our understanding of plant metabolism and genetic architecture, gaps remain between research findings and practical applications. Efforts to resolve methodological issues and integrate metabolomics with other disciplines hold promise for future breakthroughs [4].

3.5. Epigenomics

Epigenetics refers to heritable changes in gene expression that do not alter the underlying genetic sequence. Chromatin conformation and DNA accessibility are modulated through histone modifications, and DNA markers such as methylation and acetylation. The collection of epigenetic variations across the genome is referred to as the epigenome, and the soybean epigenome is a unique landscape due to its paleopolyploid history [25,106,107]. The major contribution of epigenomic studies to understanding soybean biology has been the comprehensive mapping of the epigenetic repatterning within its genomic landscape. Studies have consistently found that not only do differentially methylated regions exist within the soybean genome, but these regions also correlate with differential expression patterns. Pericentromeric and centromeric genomic regions are hypermethylated at a higher rate than non-pericentromeric regions, and they have lower gene densities and decreased transcription levels [107,108,109,110]. A significant portion of the soybean genome is composed of hypomethylated regions termed DNA methylation valleys (DMVs), which have been found to be enriched with transcription factor genes [109]. Transposable elements are preferentially methylated in all three contexts (pericentromeric, centromeric, and non-pericentromeric regions) compared to protein coding genes and, consequently, have decreased expression levels [107,108,109]. The evolutionary development of the intrinsically linked genome and epigenome has dictated the genomic architecture and dynamic regulatory responses that modern soybean uses to modulate its phenotypic responses, providing the plants with the ability to grow, adapt, and survive.
Epigenetic regulation is implicated in the facilitation of various physiological processes in soybean, with dynamic changes in marker levels documented during seed maturation, germination, root nodulation, and flowering [108,109,111,112,113]. Using whole-genome bisulfite sequencing (WGBS), it has been observed that DNA methylation levels change significantly in cotyledons as they mature, with transcribed genes approximately two-fold more likely to be differentially methylated than non-transcribed genes [108]. In addition to DNA methylation, fluctuations in histone modifications appear to be an integral part of the gene regulatory network underlying seed maturation. Chromatin immunoprecipitation (ChIP) assays found genes located in DMVs enriched in histone marks on several lysine residues at all seed and germination stages, with no significant enrichment of the histone marks found at any developmental stage [109]. Importantly, the proportion of histone marks on DMV genes varied throughout development and in parallel with gene expression levels.
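To make the WGBS-based comparisons described above more concrete, the following minimal sketch computes a per-region methylation level as the fraction of reads supporting a methylated cytosine and flags regions whose level differs between two developmental stages. The region names, read counts, and the 0.2 difference threshold are illustrative assumptions; published analyses rely on dedicated statistical tests rather than a fixed cutoff.

```python
def methylation_level(methylated_reads, total_reads):
    """Fraction of reads supporting methylation; None when there is no coverage."""
    if total_reads == 0:
        return None
    return methylated_reads / total_reads


def differential_regions(stage_a, stage_b, min_delta=0.2):
    """Return {region: level_b - level_a} for regions passing the threshold.

    Each stage maps region -> (methylated_reads, total_reads).
    min_delta is an assumed, illustrative cutoff on the absolute difference.
    """
    hits = {}
    for region in sorted(stage_a.keys() & stage_b.keys()):
        level_a = methylation_level(*stage_a[region])
        level_b = methylation_level(*stage_b[region])
        if level_a is None or level_b is None:
            continue  # skip regions without coverage in either stage
        delta = level_b - level_a
        if abs(delta) >= min_delta:
            hits[region] = round(delta, 3)
    return hits


# Hypothetical read counts for three regions at two cotyledon stages.
early = {"region_1": (80, 100), "region_2": (10, 100), "region_3": (0, 0)}
late = {"region_1": (30, 100), "region_2": (15, 100), "region_3": (5, 50)}

dmrs = differential_regions(early, late)
print(dmrs)  # negative values indicate loss of methylation at the later stage
```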
Modulation of epigenetic marks has also been shown to play a role in soybean’s response to various stressors [114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129]. Near-isogenic lines (NILs) derived from resistant and susceptible soybean lines were found to have significant differences in their methylome profiles, both before and following infection with soybean cyst nematode (SCN) [124]. In response to SCN infection, both NILs exhibited an induction of genome-wide DNA hypermethylation of miRNA genes, but with susceptible lines having a much higher occurrence of differentially methylated miRNAs. Consistent trends have also been observed between differential N6-methyladenosine (m6A) modifications and the pattern of transcriptional responses initiated by the soybean when exposed to stressors such as Meloidogyne incognita infection or heavy metals such as lead or cadmium; all showed dynamic redistribution of m6A modifications across transcripts, and genes with m6A modification generally had higher expression levels than those without this modification [118,119,129]. Conjoint analysis of differentially methylated genes (DMGs) with DEGs in each of these studies provided candidates that may play important roles in modulating these responses. In addition to helping plants overcome environmental perturbations, soybeans pre-exposed or “primed” to stressors have shown more prompt responses, long-term increased resistance, and decreased functional consequences, as well as epigenetic memory (the ability to pass favorable phenotypic responses to their progeny) [125,128,130,131,132].
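At its simplest, a conjoint analysis of DMGs and DEGs amounts to intersecting the two gene lists and inspecting the direction of change in each layer. The sketch below shows that idea only; the gene identifiers and direction labels are hypothetical, and the cited studies may apply additional filters that are not represented here.

```python
def conjoint_candidates(methylation_change, expression_change):
    """Genes present in both lists, with their (methylation, expression) direction.

    Each argument maps gene ID -> direction label for that data layer.
    """
    shared = methylation_change.keys() & expression_change.keys()
    return {
        gene: (methylation_change[gene], expression_change[gene])
        for gene in sorted(shared)
    }


# Hypothetical differentially methylated genes (DMGs) and DEGs.
dmgs = {"gene_A": "hyper", "gene_B": "hyper", "gene_C": "hypo", "gene_D": "hypo"}
degs = {"gene_B": "down", "gene_D": "up", "gene_E": "up"}

candidates = conjoint_candidates(dmgs, degs)
for gene, (meth, expr) in candidates.items():
    print(gene, meth, expr)
```

Genes that appear in only one list (here gene_A, gene_C, and gene_E) are dropped, which narrows a long list to candidates supported by both layers.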
An epigenetic regulatory mechanism for conferring salt tolerance in soybean has been elucidated, whereby salt stress induces the expression and accumulation of a nuclear factor Y subunit, which directly interacts to destabilize and promotes histone acetylation and subsequent activation of salt-responsive genes to enhance salt tolerance [122]. Liu et al. [133] identified and molecularly characterized six soybean LSD-like genes, GmLDLs, which were found to exhibit catalytic demethylase activity, elucidating their potential function in acting as chromatin remodeling factors of stress-inducible genes to regulate expression in response to different abiotic stresses.
Importantly, while studies focusing on epigenomic mapping have been immensely beneficial to furthering our knowledge of soybean biology, there remain significant challenges in establishing causal relationships due to the complex interplay between environmental, genetic, and epigenetic factors. Moving forward, more emphasis should be placed on testing hypothesized causal relationships between identified epigenetic marks and their suspected functional consequences, which can only be facilitated through the integration of a multi-omics approach and the implementation of epigenome editing tools. More researchers are utilizing an integrative approach to comprehensively analyze and construct the complex regulatory networks controlling phenotypic responses in soybean [113,123,124,131,132,134].

3.6. Phenomics

Since the development of high-throughput phenotyping technologies, research in the area of crop phenotyping is now being acknowledged as ‘phenomics’ [135]. Manual study of plant phenotypes is inefficient and costly on a large scale and prone to errors due to human bias. In order to address these challenges, plant phenomics is playing a key role in helping to improve efficiency and accuracy in studying plant characteristics [136], and it has been observed that the study of phenomics has increased in recent years [4].
Soybean cultivars differ based on many factors, including but not limited to resistance to stress and metabolic capacity, which influences resulting seed attributes like size and composition. Significant changes in structural characteristics can arise from even the most minor developmental differences [137]. Morrison et al. (2021) used 3D depth cameras to measure canopy height in soybean and compared it to single point systems operated by hand, and found that there were significant differences [138]. Plant phenomics uses advanced, high-throughput phenotyping technologies to gather detailed phenotype data throughout plant growth. This approach generates extensive trait data, allowing for the division of a single trait into several smaller, testable components [139]. Using fixed, quantitative, and uniform standards for data collection enables automated, high-throughput analysis [140,141]. This improves the precision of crop phenotype identification and boosts the efficiency of plant breeding and cultivation management [141,142].
Industrial unmanned aerial vehicles (UAVs) equipped with advanced high-definition dual-camera multispectral systems can accurately estimate soybean yields and classify pod maturity by analyzing time-series multispectral images [143]. Integration of high-spatial-resolution red green blue (RGB), multispectral, and thermal data from UAVs has enhanced the accuracy of assessing soybean physiological and biochemical parameters, such as chlorophyll content and nitrogen concentration, as well as biophysical traits like leaf area index and biomass [144]. Additionally, early-season RGB images of the soybean canopy have been used to predict yield, maturity, and seed size based on color and texture features [145].
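One example of a color feature that can be derived from RGB canopy images is the excess green index, computed from normalized red, green, and blue values. The sketch below is illustrative only: the cited studies do not necessarily use this index, and the pixel values are made up.

```python
def excess_green(r, g, b):
    """Excess green index, 2g - r - b, on chromatic (sum-normalized) coordinates."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: no color information
    rn, gn, bn = r / total, g / total, b / total
    return 2 * gn - rn - bn


def mean_excess_green(pixels):
    """Average excess green over an iterable of (r, g, b) pixels."""
    pixels = list(pixels)
    return sum(excess_green(*p) for p in pixels) / len(pixels)


# Hypothetical pixels sampled from a canopy patch and from bare soil.
canopy = [(40, 160, 40), (50, 150, 50)]
soil = [(120, 100, 80), (110, 100, 90)]

print(mean_excess_green(canopy))  # clearly positive for green vegetation
print(mean_excess_green(soil))    # close to zero for these soil pixels
```

Because the index works on normalized coordinates, it is less sensitive to overall brightness than raw green intensity, which helps when images are taken under varying light.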
These UAVs, outfitted with costly multispectral, hyperspectral, and thermal imaging equipment, are increasingly utilized for precise sampling of soybean canopy traits, including height, area, temperature, and leaf wilting [146,147,148]. Despite early success studying several phenotypes, obtaining high-dimensional phenotypic traits from 2D images remains challenging, with some morphological trait estimates requiring calibration. To address this challenge, 3D reconstruction of plant morphology from 2D images using open-source structure from motion (OpenSfM) and multi-view stereo (MVS) methods has been employed [135]. This approach allows for the generation of high-density 3D point clouds to capture plant height and growth phenotype data [149]. A notable study used 3D reconstruction technology to analyze the phenotypic fingerprint and growth patterns of soybean plants throughout their development, demonstrating a cost-effective alternative to expensive laser scanners and potential for automating certain processes [150].
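Once a 3D point cloud has been reconstructed, plant height can be estimated from the vertical spread of the points. The sketch below takes the difference between a high and a low percentile of the z coordinates, so that a few stray points do not dominate the estimate. The percentile choices and coordinates are assumptions made for illustration, not the procedure of the cited studies.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def canopy_height(points, ground_q=5, top_q=95):
    """Height as the gap between the top and ground percentiles of z.

    points is an iterable of (x, y, z) tuples in consistent units (e.g. meters).
    ground_q and top_q are assumed, illustrative percentile choices.
    """
    z = [p[2] for p in points]
    return percentile(z, top_q) - percentile(z, ground_q)


# Hypothetical z values (m): ground points, plant points, one stray outlier.
z_values = [0.0, 0.01, 0.02, 0.02, 0.30, 0.45, 0.50, 0.55, 0.60, 0.62,
            0.65, 0.66, 0.68, 0.70, 0.71, 0.72, 0.73, 0.74, 0.75, 1.50]
points = [(0.0, 0.0, z) for z in z_values]

height = canopy_height(points)
print(height)  # the 1.50 m outlier is ignored by the 95th percentile
```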
High-throughput phenotyping methods can allow for effective assessment of plant responses to stressors and subsequently the identification of novel genes [151]. For instance, in studying stress-induced effects on plants, a combination of approaches, namely ground vehicles, unmanned aerial systems, and smartphone-captured digital images, can be employed to routinely and directly evaluate the severity of iron deficiency chlorosis (IDC) in real time, thereby enabling large-scale field screening for soybean IDC tolerance [152,153]. Zhou et al. (2018) demonstrated the feasibility of an automated plant phenotyping system in a greenhouse setting, using digital cameras to continuously capture images and extract features such as chlorophyll content and salt tolerance, showcasing its potential for analysis [154]. Additionally, for stressors like flooding, Zhou et al. (2021) applied multispectral and infrared thermal imaging cameras to take canopy images from varying flight altitudes, employing deep learning techniques to estimate soybean flooding damage scores, thus facilitating the identification of genetic features linked to abiotic stress and the development of resilient soybean varieties [147]. Plant phenomics research is experiencing rapid growth due to advancements in remote sensing, robotics, visualization, and artificial intelligence (AI), which present unprecedented challenges in managing large datasets and numerous images. Machine learning (ML), particularly deep learning, is being employed to address these challenges, using techniques such as convolutional neural networks (CNNs), multilayer perceptrons (MLPs), and recurrent neural networks (RNNs) [155,156]. CNNs, in particular, have found widespread application in plant phenotyping tasks, supported by advances in cloud and graphics processing unit (GPU) computing [155].
In soybean yield prediction, ML, multimodal data fusion, and deep learning techniques have been applied to UAV-collected data with notable success [157,158,159]. A fusion architecture that utilizes multi-view images has been developed to monitor soybean pods and estimate yield, demonstrating effectiveness [160]. Similarly, a novel image analysis method has been devised to rapidly assess the morphology and color of soybean seeds [161]. Algorithms such as random forests [162] and deep CNNs are well suited for vision-based tasks, enabling the detection of materials in harvested soybeans on a large scale [163]. Furthermore, for below-ground root phenotype data, a fully automated pipeline employing deep learning architectures has significantly reduced labor and inconsistency in nodule detection and segmentation (i.e., separation of the image into regions with similar features), allowing for earlier assessment of genetic and environmental impacts on nodules [164].
While soybean phenomics research has seen significant progress, its focus has predominantly been on describing external physical traits of plants, neglecting internal and biochemical characteristics. This limitation has hindered its practical application in plant breeding. However, private breeders are increasingly utilizing advanced technologies, such as drones, to gather comprehensive phenotypic data. To fully leverage the potential of plant phenomics in advancing plant breeding, it is crucial to integrate the study of internal physiological traits and biochemical characteristics alongside external traits.

4. Integration of Multi-Omics Data for Soybean Trait Improvement

Combining multi-omics data—genomics, transcriptomics, proteomics, metabolomics, epigenomics, and phenomics—is essential for a comprehensive understanding of biological processes in soybean. Each omics layer provides unique insights into the different aspects of cell function and regulation. Genomics reveals the genetic blueprint, while transcriptomics highlights active gene expression. Proteomics uncovers the proteins and their interactions, metabolomics identifies metabolic pathways and their products, epigenomics sheds light on gene regulation mechanisms beyond the DNA sequence, and phenomics links these molecular data to observable traits [4,10,11,91,136]. Integrating these datasets allows for a holistic view of how genes and their products interact within the complex biological networks of soybean.
A multi-omics approach is especially helpful in understanding stress responses and adaptation mechanisms in soybean; by integrating these data, candidate genes and regulatory networks that contribute to key agronomic traits can be identified and harnessed for strategic breeding. For example, under drought conditions, changes in gene expression and metabolite levels were correlated to identify key pathways that confer drought tolerance [165]. This integrated approach enables the identification of biomarkers for breeding and genetic engineering, which can improve soybean varieties for enhanced stress tolerance and yield [98,165].
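Correlating gene expression with metabolite levels across the same samples can be sketched with a plain Pearson correlation. The example below is a minimal illustration: the gene and metabolite names, the values, and the 0.9 cutoff are hypothetical, and real studies would also correct for multiple testing.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def correlated_pairs(expression, metabolites, min_abs_r=0.9):
    """List (gene, metabolite, r) for pairs with |r| at or above the assumed cutoff."""
    pairs = []
    for gene, g_values in expression.items():
        for met, m_values in metabolites.items():
            r = pearson(g_values, m_values)
            if r is not None and abs(r) >= min_abs_r:
                pairs.append((gene, met, round(r, 3)))
    return pairs


# Hypothetical measurements across five samples of increasing drought severity.
expression = {
    "gene_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "gene_3": [3.0, 3.0, 3.0, 3.0, 3.0],  # constant: correlation undefined
}
metabolites = {"metabolite_X": [2.0, 4.0, 6.0, 8.0, 10.0]}

pairs = correlated_pairs(expression, metabolites)
print(pairs)
```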
One notable example of multi-omics integration in soybean research is the identification of novel genetic markers associated with soybean seed weight and oil content. By combining genomics and transcriptomics, researchers identified QTLs and candidate genes that influence both seed weight and oil content [166]. Kumar et al. (2021) used an integrated approach combining genomics, transcriptomics, and proteomics to identify molecular mechanisms behind seed oil and protein content [167]. Another example is the study of soybean resistance to biotic stress, specifically resistance to the soybean cyst nematode (SCN), a major pest affecting soybean yields [168,169,170]. An integrated approach combining transcriptomics with metabolomics was used to identify resistance genes and understand their expression patterns in resistance to SCN, where significantly different metabolites involved in specific pathways related to sensitive or resistant varieties were revealed [169]. Moreover, a study employed ML to integrate genomic data from GWAS and phenotypic data from hyperspectral reflectance, identifying genetic loci linked to yield, and this allowed the development of predictive models with improved accuracy [159].
Additionally, multi-omics integration was used to study soybean responses to phosphorus deficiency, a common nutrient stress. Transcriptomics analysis showed how the genes identified were associated with phosphorus uptake and utilization, and how they were differentially expressed under phosphorus stress [171]. Proteomics and metabolomics analyses identified proteins and metabolites involved in phosphorus metabolism. This integrated approach led to the identification of key genetic and biochemical pathways involved in phosphorus use efficiency, offering targets for improving soybean nutrient management [171].
Multi-omics can be especially useful when used together to characterize novel whole genome sequences and subsequently investigate gene expression to gain a comprehensive appreciation for genetically important and/or active areas of the genome. Gupta et al. (2023) used DNA sequencing to assemble the complete genomes of three different isolates of soybean rust pathogen (Phakopsora pachyrhizi) collected from different areas of South America [172]. This study investigated the evolutionary history and the influence of the high transposable element content found in the P. pachyrhizi genomes. These researchers used raw RNA-seq data to determine heterozygosity and perform SNP analysis across the three different exomes [172]. SNP analysis was used to point toward unique nonsynonymous mutations for each fungal isolate, and transcriptomics was used to determine expression activity of each of the corresponding genes [172]. This study used a combination of omics techniques to draw meaningful conclusions on the natural history of this large and detrimental fungal family, and the implications on adaptation and genetic plasticity as a result of high transposable elements within their genomes. This is also an excellent example of using RNA-seq data beyond the usual DE analysis.
CRISPR (clustered regularly interspaced short palindromic repeats) technology has significantly enhanced the integration of multi-omics techniques by providing a precise and efficient method for genome editing. For example, CRISPR has been used to knock out or modify genes in soybean, enabling researchers to observe changes in gene expression, protein levels, and metabolic pathways resulting from these edits [173,174,175]. These targeted modifications help in understanding the functional roles of specific genes and their contributions to complex traits such as plant architecture and flowering time.
The use of CRISPR also facilitates the validation of candidate genes identified through multi-omics integration. By editing these genes, researchers can confirm their roles in specific biological processes and pathways. For instance, CRISPR has been used to validate genes involved in soybean’s response to abiotic stresses such as drought and salinity [176]. This approach not only accelerates the identification and validation of key genetic markers but also aids in the development of improved soybean varieties with desired traits. Thus, CRISPR technology plays a crucial role in advancing multi-omics research by enabling precise genetic manipulations and facilitating the comprehensive study of gene function and regulation [177].
This technology allows researchers to introduce specific genetic modifications and study their effects across various -omics layers [178,179]. Wan et al. (2022) used CRISPR/Cas9 to mutate the E1 early maturity locus in soybean to confirm its functional role in plant development. Phenotypic evidence identified the role of E1 on initiation of terminal flowering, determinate growth habit, and branch number. Transcriptomics and DE analysis between the wildtype (WT) and mutant pointed to a compensatory response in the expression of E1 homologs, E1-La and E1-Lb. Expression profiles of WT and mutant plants identified that the mutants had a significant decrease in Dt1 expression and significant upregulation of Dt2 expression, both growth habit regulator genes [180,181], suggesting that E1 regulates growth habit using the Dt2-Dt1 signaling pathway [182]. This research combines the power of CRISPR/Cas9 gene editing with transcriptional profiling to identify a key regulatory mechanism with important agronomic implications.
Large-scale multi-omics methods are an excellent approach to identifying the etiology of a gene or trait. Liang et al. (2022) used GWAS to identify a locus on chromosome 18, Dt2, with a predominant association to shoot branching in soybean, an agronomically valuable trait that influences yield [183]. Using transcriptomics, the authors pointed to a difference in expression of Dt2 corresponding with natural variations in sequence between major haplotypes. Expression data from 20 random natural accessions identified a negative correlation between Dt2 expression and branch number [184]. The authors used CRISPR/Cas9 to knock out Dt2 and observed that the mutant had an increased number of branches, delayed flowering and maturity, with increased main stem node number and plant height [184], consistent with previously reported functions [181,185]. The authors used DE to identify the interaction between Dt2 and the promoter region of GmAp1 gene family representatives. This study exhibits a seamless flow from genomics, to transcriptomics, to functional validation of the underlying genetic basis of an agronomic trait.
Bioinformatics tools have been developed for each of the -omics disciplines and have helped in data analysis and processing. However, in order to gain a holistic understanding of soybeans, these platforms need to be integrated and examined together [186]. Yang et al. developed SoyMD, a multi-omics database that allows for the querying of information regarding a gene of interest including functional annotation, homology, genetic variation, and epigenetic signals. SoyMD integrates several -omics tools to achieve the highest level of detail for its assemblies [187]. This includes annotations using Gene Ontology (GO), RNA-seq data that were aligned using HISAT2, and epigenetic signals that were predicted using Basenji, a deep CNN approach [188,189,190]. SoyMD is implemented as a web application and provides an invaluable tool for multi-omics analysis of soybean [187]. SoyOmics is another publicly available integrated multi-omics database that aims to provide a more holistic understanding of soybean. SoyOmics gathers assembled genomes, RNA-seq data, and DNA methylation datasets. These data were used to create toolkits such as easy GWAS for quick-start GWAS analysis, and ExpPattern for expression pattern analysis. SoyOmics is available as a web application and provides additional resources for multi-omics analyses for soybeans [191]. There are also various bioinformatics tools that can be used to interact with these multi-omics platforms. Pathway Tools (PTools) interacts with Pathway/Genome Databases (PGDBs) and allows for genome analysis, metabolic modeling, and analysis of high-throughput data from these databases. This includes data querying and data visualization, which allow researchers to obtain their data of interest quickly [192].
A comprehensive understanding using an integrated multi-omics approach is crucial for developing strategies to optimize soybean growth and productivity under diverse environmental challenges, ultimately contributing to food security and sustainable agriculture. Table 1 presents a summary table of the main findings discussed in this section as well as their references.

5. Novel Approaches and Emerging Trends

Emerging techniques in soybean -omics are revolutionizing our understanding of this essential crop. scRNA-seq, as discussed above and in Liu et al. (2023), is emerging as a powerful tool to study gene expression at the individual cell level, offering unique resolution in understanding cellular heterogeneity and developmental processes in soybeans [168]. It is often difficult to distinguish technical artifacts in scRNA-seq data, such as noisy cells that produce a distinct gene expression pattern, from actual biological heterogeneity between cells. The Single-cell RNA-seq Quality Control (SinQC) tool was developed to identify these technical artifacts; it uses the cells of the main population as controls to establish quality cutoff points [193]. Another promising technique is GWAS, which leverages large datasets to identify genetic variations linked to important agronomic traits [194]. These techniques, combined with advanced bioinformatics tools, are enabling researchers to dissect complex genetic traits and accelerate the development of superior soybean varieties [179].
The integration of novel -omics techniques holds great potential for accelerating soybean improvement efforts and overcoming longstanding agricultural challenges. High-throughput sequencing and GWAS enable the identification of genetic markers associated with desirable traits, which can be utilized in MAS to expedite the breeding process [51]. This accelerates the development of soybean varieties that are better adapted to diverse environmental conditions. By integrating these advanced techniques, researchers can tackle existing challenges in soybean cultivation, leading to increased productivity and sustainability.
Less common than GWAS, genome-wide epistatic studies (GWES) use similar concepts of diverse panels of individuals along with mathematical and statistical techniques, but they point toward pairs of SNPs or markers that in combination are statistically associated with significant phenotypic differences [195]. GWAS excels at detecting single polymorphisms in the genome associated with the phenotype but does not consider that highly repetitive genomes like that of soybean may have multiple mutations working in conjunction to control phenotypic differences [195,196]. GWES can act as a stepping stone between soybean genomics and the genomics of model organisms, as it can reveal more complex gene networks that are not readily observable and can point out upstream or downstream functional information that contributes to the field of functional genomics more readily than the results of a simple GWAS. Moellers et al. [196] used GWES to explore resistance of soybean to Sclerotinia sclerotiorum, demonstrating that even though single loci with high effect are elusive, multiple loci working in collaboration can provide important levels of resistance while at the same time shedding light on the immune system of the soybean. Assefa et al. [197] used GWES to explore IDC in soybean, showing that GWES also has applicability in studying abiotic stressors. The limitations of GWES lie in its computational cost, which is significantly higher than that of GWAS, and in the need to functionally validate two loci working in conjunction rather than a single locus alone.
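Two points from the paragraph above can be sketched briefly: why testing marker pairs is far more expensive than testing single markers, and what a pairwise interaction looks like numerically. The example below counts the marker pairs for a given panel size and computes a simple 2x2 interaction contrast for two biallelic markers. The 0/1 genotype coding (as for homozygous inbred lines), the data, and the contrast itself are illustrative assumptions; published GWES use formal statistical models with significance testing.

```python
from math import comb
from statistics import mean


def n_pair_tests(n_markers):
    """Number of distinct marker pairs an exhaustive pairwise scan must test."""
    return comb(n_markers, 2)


def interaction_contrast(geno_a, geno_b, phenotype):
    """Deviation from additivity for two markers coded 0/1.

    Computed as (m11 - m10) - (m01 - m00), where mXY is the mean phenotype of
    lines carrying allele X at marker A and allele Y at marker B.
    Returns None if any genotype combination is unobserved.
    """
    groups = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for a, b, y in zip(geno_a, geno_b, phenotype):
        groups[(a, b)].append(y)
    if any(not values for values in groups.values()):
        return None
    m = {key: mean(values) for key, values in groups.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Single-marker scan vs. exhaustive pairwise scan for a hypothetical panel.
print(n_pair_tests(50_000))  # pairs to test, versus 50,000 single-marker tests

# Hypothetical lines: the phenotype changes only when both alleles are present.
geno_a = [0, 0, 0, 0, 1, 1, 1, 1]
geno_b = [0, 0, 1, 1, 0, 0, 1, 1]
phenotype = [5, 5, 5, 5, 5, 5, 9, 9]
contrast = interaction_contrast(geno_a, geno_b, phenotype)
print(contrast)  # non-zero: the two loci do not act additively
```

In this toy dataset neither marker shows an effect in the absence of the other, yet the contrast is non-zero. This is the kind of joint effect a pairwise scan is designed to detect.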
Analysis of latent phenotypes (i.e., nuanced variations pervasive in regulatory circuits) and GWES represent cutting-edge approaches that fill critical gaps in current soybean research and breeding strategies. Latent phenotype analysis involves the use of advanced imaging or computational techniques to uncover hidden traits that are not easily observable through traditional phenotyping methods [198]. This approach can reveal subtle variations in plant morphology, physiology, and development, providing a more comprehensive understanding of trait penetrance and genetic diversity in soybean populations [199]. By identifying these latent phenotypes, traits that may contribute to improved yield, stress tolerance, and overall plant performance can be selected. By incorporating latent phenotype analysis and GWES into soybean research, scientists can gain a deeper understanding of the genetic basis of complex traits and develop more effective breeding programs to enhance soybean productivity and resilience [199].
Machine learning and AI are revolutionizing soybean research and breeding by providing advanced analytical tools to manage and interpret large datasets. These technologies enable the identification of patterns in data and the modeling of complex relationships that are too difficult to accomplish with traditional methods [200]. Machine learning methods, such as the random forest classifier, have been used to identify important phenotypic predictor variables related to seed yield. This information can then be used by breeders to improve breeding selection, thus optimizing selection and placement in agro-management systems [201]. AI has also seen a surge in application to multi-omics analysis [202]. More specifically, deep learning, a subset of ML, has been used to develop pDeep, a neural-network-based model for mass-spectrometry-based proteomics that predicts the intensity distribution of product ions, and thus the fragmentation spectrum, of any peptide using only its sequence [203]. Deep learning has also been used in the development of DeepNovo, a neural network model that can perform de novo peptide sequencing using mass spectrometry data without the use of an existing database [204]. AI is also seeing expanded uses in genomics, where deep learning was used to develop DeepBind, a software tool used to predict the sequence specificity of DNA- and RNA-binding proteins with high accuracy [205]. Deep learning was also used to develop DeepSEA, another tool that can predict chromatin effects of sequence alterations with single-nucleotide precision [206]. AI and ML are also used to optimize existing processes. CRISTA uses an ML-based algorithm to help with sgRNA design for use in CRISPR-Cas9 genome editing. The CRISTA algorithm has been shown to have higher accuracy than traditional methods for predicting the propensity of a genomic site to be cleaved by the designed sgRNA [207].

6. Challenges and Future Directions

While multi-omics investigations offer near-endless potential for advanced bioinformatic exploration, the difficulty of integrating information across multiple platforms can compromise results by, for example, introducing ambiguity, reducing confidence, or causing miscommunication between platforms.
Updated reference genomes, such as the soybean research community's move from Williams 82.a2.v1 to Williams 82.a4.v1 and/or Williams 82.a6.v1, change the standardized baseline (genome) on which a vast number of -omics studies depend. While updates to the reference genome are crucial to the advancement of soybean research, discrepancies between older and newer versions limit the cross-talk between datasets generated against different references. This can lead to inaccurate gene and/or marker mapping, reducing the value of such data for soybean breeding programs.
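The marker-mapping problem can be illustrated with a small coordinate-conversion sketch. It assumes a table of collinear, same-strand blocks shared by an older and a newer assembly; the block coordinates, the `BLOCKS` table, and the function `lift_position` are hypothetical and chosen only for illustration. In practice, such blocks would be derived from a whole-genome alignment, and inversions or rearrangements would need additional handling.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical collinear blocks between an old and a new assembly of one
# chromosome: (old_start, old_end, new_start), 1-based inclusive, sorted by
# old_start. Real blocks would come from a whole-genome alignment.
BLOCKS = {
    "Chr01": [(1, 1000, 1), (1201, 3000, 1051), (3500, 5000, 2900)],
}


def lift_position(chrom: str, pos: int, blocks=BLOCKS) -> Optional[int]:
    """Map a position from the old assembly to the new one.

    Returns None when the position lies outside every aligned block, which
    flags a marker that cannot be transferred between versions safely.
    """
    chrom_blocks = blocks.get(chrom, [])
    starts = [block[0] for block in chrom_blocks]
    # Index of the last block whose start is <= pos.
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    old_start, old_end, new_start = chrom_blocks[i]
    if pos > old_end:
        return None  # position falls in a gap between blocks
    return new_start + (pos - old_start)


if __name__ == "__main__":
    for marker_pos in (500, 1100, 1500, 4000):
        print(marker_pos, "->", lift_position("Chr01", marker_pos))
```

Returning `None` for unmapped positions, rather than guessing the nearest coordinate, keeps markers with no reliable counterpart in the newer assembly visible to the analyst.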
Large-scale -omics studies quickly generate vast amounts of data, which poses issues for optimal storage and organization (both long and short term). Furthermore, bioinformatic algorithms may not be suitable beyond a particular data size, so scalability needs to be considered, and this threshold may vary across -omics platforms. Tools and databases are being developed at a rate that is overwhelming for many researchers, and it can therefore be difficult to ensure that all of these new tools adhere to findable, accessible, interoperable, and reusable (FAIR) research standards [208]. Additionally, multi-omics approaches often produce larger datasets than many researchers are used to working with. To maximize the potential of these approaches, greater emphasis will need to be placed on the interpretation of large datasets, coupled with a strong understanding of the biological systems being studied [11]. Challenges also persist within the individual -omics techniques that hinder multi-omics integration. For example, RNA-seq data are often generated from complex plant tissues, which introduces noise from transcripts originating outside the region of interest. Single-cell RNA-seq offers a potential solution by excluding signal from unwanted cells, but it is currently cost-prohibitive in many cases [209].
Large-scale multi-platform data also pose a challenge for the interpretation of results, which needs to be conducted within the biological context of the data. Tissue type, developmental stage, sample type (DNA, mRNA, protein, metabolite, etc.), and experimental condition must be carefully coordinated and interpreted so that this context is not lost. Data visualization across multi-omics platforms raises the further problem of graphically presenting information taken from multiple contexts. For example, in a combined transcriptomics and metabolomics study, transcriptional activity and metabolite distribution can each easily be displayed alone, but it is less clear how these data can be shown effectively within the same graphical context.
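One simple and widely used way to bring measurements from different -omics layers onto a shared axis is to standardize each feature across samples (z-scores) before plotting. The sketch below illustrates this under stated assumptions: the sample values, the feature names, and the function `z_scores` are hypothetical, and standardization is only one of several possible scaling choices.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one feature across samples to mean 0 and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no variation to display.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical measurements on the same four samples, in unrelated units.
    layers = {
        "transcript_abundance": [12.0, 48.0, 30.0, 6.0],        # e.g. TPM
        "metabolite_intensity": [2.1e5, 9.8e5, 5.5e5, 1.2e5],   # peak area
    }
    # After scaling, both rows are unitless and can share one heat-map axis.
    for name, values in layers.items():
        print(name, [round(z, 2) for z in z_scores(values)])
```

Scaling makes relative patterns comparable but discards absolute magnitudes, so the original units should still be reported alongside any combined figure.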
While adopting a multi-omics approach to study the intricacies of biological systems is attractive because of the large amount of data generated from multiple layers of study, it is not without significant challenges. Not only does each -omics method generate substantial volumes of data, but the data from different -omics techniques each have their own structures and scales, resulting in high-dimensional data. This presents computational challenges in data handling, in which we are restricted by algorithm and tool development, processing power, and storage capacity. Additionally, as an emerging field, multi-omics integration currently has no standardization regarding experimental protocols, data collection, data processing and statistical modeling, presentation, and result dissemination. This lack of standardization leads to issues in the reproducibility of results and in the application of findings to other experimental conditions. The large amounts of high-dimensional and heterogeneous data also create statistical challenges: large numbers of non-independent variables increase noise, the risk of false positives, the tendency toward overfitting, and multicollinearity in statistical models, resulting in decreased power and difficulty establishing thresholds and cutoffs. While statistical significance does not always indicate biological relevance, the lack of accurate statistical models makes interpreting the results of complex interactions between -omics layers more difficult and hinders strategies for selecting candidates for further functional validation. Importantly, even with the progress being made, we remain limited by the functional annotation of genes, proteins, and metabolites, with a strong bias toward well-annotated genes persisting.
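The false-positive problem that arises when many features are tested at once is commonly addressed by controlling the false discovery rate. As an illustration, the sketch below implements the Benjamini-Hochberg adjustment in plain Python; the function name `benjamini_hochberg` and the example p-values are assumptions, and this standard procedure is shown as one option rather than as a method prescribed in the studies discussed here. It also assumes independent or positively dependent tests, which may not hold for correlated -omics features.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the tests sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-value decreases (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four features tested in one experiment.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

Features whose adjusted value falls below the chosen threshold (0.05 is a common convention) would then be carried forward as candidates for functional validation.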
The future of multi-omics investigative studies carries enormous potential for gaining a holistic understanding of biological activity from multiple perspectives at the same time. More work is needed to advance the seamless flow from one -omics platform to another. As the field of bioinformatics progresses, new and enhanced algorithms will be developed, leading to more robust bioinformatic analyses. One example would be a platform in which the effects of genetic or epigenetic modifications can be pre-tested in silico before experimental verification. Such a platform would revolutionize the approach to plant research, allowing for precise predictions and optimizations.
Recent technological advances in throughput, multiplexing, resolution, and accuracy now allow the simultaneous profiling of multiple -omes from single cells with unprecedented clarity and sensitivity, providing a comprehensive view of complex cellular processes. The application of AI-assisted techniques, such as machine learning and deep learning, to biological analyses has improved, and continues to improve, our ability to overcome the limitations of traditional analysis methods. Integrating multi-omics studies with advanced AI models will provide a framework for efficiently identifying patterns, extracting meaningful relationships from extensive datasets, interpreting complex data layers, and ultimately accelerating multiple facets of plant research. Hardware will also advance as we look toward the future of -omics and agriculture; for example, new unmanned aerial vehicles (UAVs) will be equipped with cutting-edge cameras, sensors, and data-relay capabilities. Multi-omics analysis has opened an entirely new world of analytics in molecular biology, microbiology, agronomy, and other fields. Making use of these advances will also require a focus on education, including training at the university level to equip the next generation of scientists and researchers with the necessary skills.
Emphasis now needs to be placed on creating more intuitive bioinformatics pipelines that are capable of multi-omics analysis. GenPipes and MOMIC, both recently developed pipelines that perform multi-omics analyses, represent a promising beginning [210,211]. Complementing such pipelines with additional tools for data visualization, and potentially including graphical user interfaces in new pipelines to improve accessibility, could bring multi-omics analysis to new heights of applicability.

7. Conclusions

The integration of multi-omics strategies with bioinformatics innovations is a transformative approach in soybean improvement. These techniques collectively offer a multidimensional, holistic view of soybean biology, encompassing the genetic blueprint (genomics), gene expression patterns (transcriptomics), protein functions and interactions (proteomics and interactomes), metabolic pathways (metabolomics), epigenetic modifications (epigenomics), and observable traits (phenomics). By integrating these diverse data layers, researchers can unravel the complex regulatory networks and molecular mechanisms driving soybean growth, development, and responses to environmental stresses.
Such comprehensive insights enable precision breeding strategies that are more targeted and efficient. By pinpointing specific genes and pathways associated with desirable traits, breeders can develop soybean varieties with improved yield, quality, and resilience. This is especially crucial in the context of climate change, where crops must adapt to increasingly variable and extreme conditions. Enhanced stress tolerance in soybeans, for instance, can lead to varieties that maintain productivity under drought, heat, or pest pressures.
Moreover, the integration of multi-omics data with bioinformatics tools accelerates the discovery and functional validation of candidate genes, reducing the time and resources required for developing new soybean varieties. This innovation also supports sustainable agricultural practices by facilitating the creation of crops that require fewer inputs like fertilizers and pesticides, thereby minimizing environmental impact. Advanced bioinformatics tools can sift through vast amounts of data to identify patterns and correlations that might not be apparent through traditional methods, making it possible to predict how different genetic variations will affect the plant’s phenotype. This predictive power is crucial for breeding programs aiming to produce robust soybean varieties capable of thriving under various environmental conditions.
The multi-omics approach also aids in understanding the epigenetic modifications that regulate gene expression in response to environmental factors. These modifications can have profound effects on plant traits and can be harnessed to develop soybeans with enhanced adaptability and performance. By leveraging epigenomics, researchers can identify epigenetic markers associated with desirable traits and use this information either to select parent plants for breeding programs or to temporarily influence plant performance in situ [212].
Furthermore, phenomics, which involves the comprehensive measurement of phenotypes on a large scale, enables the detailed assessment of how genetic variations manifest in observable traits. High-throughput phenotyping allows for the rapid and accurate collection of data on plant characteristics, facilitating the identification of phenotypic traits linked to specific genetic and epigenetic variations. This integration of phenomics data with other omics data provides a complete picture of the plant’s biology, guiding breeders in their efforts to enhance soybean varieties.
Ultimately, the continued advancement and application of multi-omics and bioinformatics in soybean research are vital for ensuring global food security and sustainability. Collaboration among scientists, breeders, and bioinformaticians is essential to fully exploit the potential of these technologies, leading to strong and resilient soybean varieties that can meet the growing demands of the world’s population while preserving environmental health. This collaborative effort will drive innovation in agricultural practices, paving the way for a future where soybeans and other crops are optimized for both productivity and sustainability.

Author Contributions

Conceptualization, B.S.; resources, B.S.; writing—original draft preparation, S.H., J.H., S.L., M.E., and N.P.; writing—review and editing, S.H., J.H., S.L., M.E., N.P., K.S., F.M., A.G., E.R.C., and B.S.; supervision, B.S., E.R.C., and A.G.; funding acquisition, B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

Siwar Haidar would like to express her deep appreciation to Ahmed Kharrouby, for his support and encouragement.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Messina, M. Perspective: Soybeans can help address the caloric and protein needs of a growing global population. Front. Nutr. 2022, 9, 909464. [Google Scholar] [CrossRef]
  2. Hoffman, A.; Kemanian, A.; Forest, C.E. The response of maize, sorghum, and soybean yield to growing-phase climate revealed with machine learning. Environ. Res. Lett. 2020, 15, 094013. [Google Scholar] [CrossRef]
  3. Pazhamala, L.T.; Kudapa, H.; Weckwerth, W.; Millar, A.H.; Varshney, R.K. Systems biology for crop improvement. Plant Genome 2021, 14, e20098. [Google Scholar] [CrossRef]
  4. Cao, P.; Zhao, Y.; Wu, F.; Xin, D.; Liu, C.; Wu, X.; Lv, J.; Chen, Q.; Qi, Z. Multi-Omics Techniques for Soybean Molecular Breeding. Int. J. Mol. Sci. 2022, 23, 4994. [Google Scholar] [CrossRef]
  5. Severin, A.J.; Woody, J.L.; Bolon, Y.-T.; Joseph, B.; Diers, B.W.; Farmer, A.D.; Muehlbauer, G.J.; Nelson, R.T.; Grant, D.; Specht, J.E.; et al. RNA-Seq Atlas of Glycine max: A guide to the soybean transcriptome. BMC Plant Biol. 2010, 10, 160. [Google Scholar] [CrossRef]
  6. Han, L.; Zhong, W.; Qian, J.; Jin, M.; Tian, P.; Zhu, W.; Zhang, H.; Sun, Y.; Feng, J.-W.; Liu, X.; et al. A multi-omics integrative network map of maize. Nat. Genet. 2022, 55, 144–153. [Google Scholar] [CrossRef]
  7. Iqbal, Z.; Iqbal, M.S.; Khan, M.I.R.; Ansari, M.I. Toward Integrated Multi-Omics Intervention: Rice Trait Improvement and Stress Management. Front. Plant Sci. 2021, 12, 741419. [Google Scholar] [CrossRef]
  8. Da Ros, L.; Bollina, V.; Soolanayakanahally, R.; Pahari, S.; Elferjani, R.; Kulkarni, M.; Vaid, N.; Risseuw, E.; Cram, D.; et al. Multi-omics atlas of combinatorial abiotic stress responses in wheat. Plant J. 2023, 116, 1118–1135. [Google Scholar] [CrossRef]
  9. Cigliano, R.A.; Aversano, R.; Di Matteo, A.; Palombieri, S.; Termolino, P.; Angelini, C.; Bostan, H.; Cammareri, M.; Consiglio, F.M.; Della Ragione, F.; et al. Multi-omics data integration provides insights into the post-harvest biology of a long shelf-life tomato landrace. Hortic. Res. 2022, 9, uhab042. [Google Scholar] [CrossRef]
  10. Großkinsky, D.K.; Syaifullah, S.J.; Roitsch, T. Integration of multi-omics techniques and physiological phenotyping within a holistic phenomics approach to study senescence in model and crop plants. J. Exp. Bot. 2017, 69, 825–844. [Google Scholar] [CrossRef]
  11. Mahmood, U.; Li, X.; Fan, Y.; Chang, W.; Niu, Y.; Li, J.; Qu, C.; Lu, K. Multi-omics revolution to promote plant breeding efficiency. Front. Plant Sci. 2022, 13, 1062952. [Google Scholar] [CrossRef] [PubMed]
  12. Yang, Y.; Saand, M.A.; Huang, L.; Abdelaal, W.B.; Zhang, J.; Wu, Y.; Li, J.; Sirohi, M.H.; Wang, F. Applications of Multi-Omics Technologies for Crop Improvement. Front. Plant Sci. 2021, 12, 563953. [Google Scholar] [CrossRef] [PubMed]
  13. Sharma, S.; Upadhyaya, H.D.; Varshney, R.K.; Gowda, C.L.L. Pre-breeding for diversification of primary gene pool and genetic enhancement of grain legumes. Front. Plant Sci. 2013, 4, 309. [Google Scholar] [CrossRef] [PubMed]
  14. Kashyap, A.; Garg, P.; Tanwar, K.; Sharma, J.; Gupta, N.C.; Ha, P.T.T.; Bhattacharya, R.C.; Mason, A.S.; Rao, M. Strategies for utilization of crop wild relatives in plant breeding programs. Theor. Appl. Genet. 2022, 135, 4151–4167. [Google Scholar] [CrossRef] [PubMed]
  15. Crossa, J.; Pérez-Rodríguez, P.; Cuevas, J.; Montesinos-López, O.; Jarquín, D.; de los Campos, G.; Burgueño, J.; González-Camacho, J.M.; Pérez-Elizalde, S.; Beyene, Y.; et al. Genomic Selection in Plant Breeding: Methods, Models, and Perspectives. Trends Plant Sci. 2017, 22, 961–975. [Google Scholar] [CrossRef]
  16. Li, Y.; Kaur, S.; Pembleton, L.W.; Valipour-Kahrood, H.; Rosewarne, G.M.; Daetwyler, H.D. Strategies of preserving genetic diversity while maximizing genetic response from implementing genomic selection in pulse breeding programs. Theor. Appl. Genet. 2022, 135, 1813–1828. [Google Scholar] [CrossRef]
  17. Bassi, F.M.; Sanchez-Garcia, M.; Ortiz, R. What plant breeding may (and may not) look like in 2050? Plant Genome 2024, 17, e20368. [Google Scholar] [CrossRef]
  18. Thomson, M.J. High-Throughput SNP Genotyping to Accelerate Crop Improvement. Plant Breed. Biotechnol. 2014, 2, 195–212. [Google Scholar] [CrossRef]
  19. Konieczny, A.; Ausubel, F.M. A procedure for mapping Arabidopsis mutations using co-dominant ecotype-specific PCR-based markers. Plant J. 1993, 4, 403–410. [Google Scholar] [CrossRef]
  20. Neff, M.M.; Neff, J.D.; Chory, J.; Pepper, A.E. dCAPS, a simple technique for the genetic analysis of single nucleotide polymorphisms: Experimental applications in Arabidopsis thaliana genetics. Plant J. 1998, 14, 387–392. [Google Scholar] [CrossRef]
  21. Zietkiewicz, E.; Rafalski, A.; Labuda, D. Genome Fingerprinting by Simple Sequence Repeat (SSR)-Anchored Polymerase Chain Reaction Amplification. Genomics 1994, 20, 176–183. [Google Scholar] [CrossRef] [PubMed]
  22. He, C.; Holme, J.; Anthony, J. SNP Genotyping: The KASP Assay. In Crop Breeding: Methods and Protocols; Fleury, D., Whitford, R., Eds.; Springer: New York, NY, USA, 2014; pp. 75–86. [Google Scholar]
  23. National Human Genome Research Institute. A Brief Guide to Genomics. 16 August 2022. Available online: https://www.genome.gov/about-genomics/fact-sheets/A-Brief-Guide-to-Genomics (accessed on 6 July 2024).
  24. National Human Genome Research Institute. Genetics vs. Genomics Fact Sheet. 7 September 2018. Available online: https://www.genome.gov/about-genomics/fact-sheets/Genetics-vs-Genomics (accessed on 6 July 2024).
  25. Schmutz, J.; Cannon, S.B.; Schlueter, J.; Ma, J.; Mitros, T.; Nelson, W.; Hyten, D.L.; Song, Q.; Thelen, J.J.; Cheng, J.; et al. Genome sequence of the palaeopolyploid soybean. Nature 2010, 463, 178–183. [Google Scholar] [CrossRef] [PubMed]
  26. Song, Q.; Jenkins, J.; Jia, G.; Hyten, D.L.; Pantalone, V.; Jackson, S.A.; Schmutz, J.; Cregan, P.B. Construction of high resolution genetic linkage maps to improve the soybean genome sequence assembly Glyma1.01. BMC Genom. 2016, 17, 33. [Google Scholar] [CrossRef] [PubMed]
  27. Valliyodan, B.; Cannon, S.B.; Bayer, P.E.; Shu, S.; Brown, A.V.; Ren, L.; Jenkins, J.; Chung, C.Y.; Chan, T.; Daum, C.G.; et al. Construction and comparison of three reference-quality genome assemblies for soybean. Plant J. 2019, 100, 1066–1082. [Google Scholar] [CrossRef]
  28. Shen, Y.; Du, H.; Liu, Y.; Ni, L.; Wang, Z.; Liang, C.; Tian, Z. Update soybean Zhonghuang 13 genome to a golden reference. Sci. China Life Sci. 2019, 62, 1257–1260. [Google Scholar] [CrossRef]
  29. Xie, M.; Chung, C.Y.-L.; Li, M.-W.; Wong, F.-L.; Wang, X.; Liu, A.; Wang, Z.; Leung, A.K.-Y.; Wong, T.-H.; Tong, S.-W.; et al. A reference-grade wild soybean genome. Nat. Commun. 2019, 10, 1216. [Google Scholar] [CrossRef]
  30. Tettelin, H.; Medini, D. The Pangenome; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
  31. Torkamaneh, D.; Lemay, M.; Belzile, F. The pan-genome of the cultivated soybean (PanSoy) reveals an extraordinarily conserved gene content. Plant Biotechnol. J. 2021, 19, 1852–1862. [Google Scholar] [CrossRef]
  32. Bayer, P.E.; Valliyodan, B.; Hu, H.; Marsh, J.I.; Yuan, Y.; Vuong, T.D.; Patil, G.; Song, Q.; Batley, J.; Varshney, R.K.; et al. Sequencing the USDA core soybean collection reveals gene loss during domestication and breeding. Plant Genome 2021, 15, e20109. [Google Scholar] [CrossRef]
  33. Liu, Y.; Du, H.; Li, P.; Shen, Y.; Peng, H.; Liu, S.; Zhou, G.-A.; Zhang, H.; Liu, Z.; Shi, M.; et al. Pan-Genome of Wild and Cultivated Soybeans. Cell 2020, 182, 162–176. [Google Scholar] [CrossRef]
  34. Bhat, J.A.; Yu, D. High-throughput NGS-based genotyping and phenotyping: Role in genomics-assisted breeding for soybean improvement. Legume Sci. 2021, 3, e81. [Google Scholar] [CrossRef]
  35. Basantani, M.K.; Gupta, D.; Mehrotra, R.; Mehrotra, S.; Vaish, S.; Singh, A. An update on bioinformatics resources for plant genomics research. Curr. Plant Biol. 2017, 11–12, 33–40. [Google Scholar] [CrossRef]
  36. Ou, S.; Su, W.; Liao, Y.; Chougule, K.; Agda, J.R.A.; Hellinga, A.J.; Lugo, C.S.B.; Elliott, T.A.; Ware, D.; Peterson, T.; et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019, 20, 275. [Google Scholar] [CrossRef]
  37. Zheng, Y.; Jiao, C.; Sun, H.; Rosli, H.G.; Pombo, M.A.; Zhang, P.; Banf, M.; Dai, X.; Martin, G.B.; Giovannoni, J.J.; et al. iTAK: A Program for Genome-wide Prediction and Classification of Plant Transcription Factors, Transcriptional Regulators, and Protein Kinases. Mol. Plant 2016, 9, 1667–1670. [Google Scholar] [CrossRef]
  38. Henikoff, S.; Comai, L. Single-Nucleotide Mutations for Plant Functional Genomics. Annu. Rev. Plant Biol. 2003, 54, 375–401. [Google Scholar] [CrossRef]
  39. Goel, M.; Sun, H.; Jiao, W.-B.; Schneeberger, K. SyRI: Finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019, 20, 277. [Google Scholar] [CrossRef]
  40. Wang, Y.; Tang, H.; DeBarry, J.D.; Tan, X.; Li, J.; Wang, X.; Lee, T.-H.; Jin, H.; Marler, B.; Guo, H.; et al. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012, 40, e49. [Google Scholar] [CrossRef]
  41. Proost, S.; Fostier, J.; De Witte, D.; Dhoedt, B.; Demeester, P.; Van de Peer, Y.; Vandepoele, K. i-ADHoRe 3.0—Fast and sensitive detection of genomic homology in extremely large data sets. Nucleic Acids Res. 2012, 40, e11. [Google Scholar] [CrossRef]
  42. Morrell, P.L.; Buckler, E.S.; Ross-Ibarra, J. Crop genomics: Advances and applications. Nat. Rev. Genet. 2011, 13, 85–96. [Google Scholar] [CrossRef]
  43. Zhang, J.; Song, Q.; Cregan, P.B.; Jiang, G.-L. Genome-wide association study, genomic prediction and marker-assisted selection for seed weight in soybean (Glycine max). Theor. Appl. Genet. 2015, 129, 117–130. [Google Scholar] [CrossRef]
  44. Pu, Y.; Yan, R.; Jia, D.; Che, Z.; Yang, R.; Yang, C.; Wang, H.; Cheng, H.; Yu, D. Identification of soybean mosaic virus strain SC7 resistance loci and candidate genes in soybean [Glycine max (L.) Merr.]. Mol. Genet. Genom. 2024, 299, 54. [Google Scholar] [CrossRef]
  45. Li, Y.; Gu, J.; Zhao, B.; Yuan, J.; Li, C.; Lin, Y.; Chen, Y.; Yang, X.; Wang, Z.-Y. Identification and confirmation of novel genetic loci and domestication gene GmGA20ox1 regulating primary root length in soybean seedling stage. Ind. Crop. Prod. 2024, 217, 118814. [Google Scholar] [CrossRef]
  46. Hu, D.; Zhao, Y.; Zhu, L.; Li, X.; Zhang, J.; Cui, X.; Li, W.; Hao, D.; Yang, Z.; Wu, F.; et al. Genetic dissection of ten photosynthesis-related traits based on InDel- and SNP-GWAS in soybean. Theor. Appl. Genet. 2024, 137, 96. [Google Scholar] [CrossRef]
  47. Zhao, X.; Zhu, H.; Liu, F.; Wang, J.; Zhou, C.; Yuan, M.; Zhao, X.; Li, Y.; Teng, W.; Han, Y.; et al. Integrating Genome-Wide Association Study, Transcriptome and Metabolome Reveal Novel QTL and Candidate Genes That Control Protein Content in Soybean. Plants 2024, 13, 1128. [Google Scholar] [CrossRef]
  48. Yao, X.; Zhang, D. Genome-Wide Association Analysis of Active Accumulated Temperature versus Flowering Time in Soybean [Glycine max (L.) Merr.]. Agronomy 2024, 14, 833. [Google Scholar] [CrossRef]
  49. Dhingra, A.; Shinde, S.; D’agostino, L.; Devkar, V.; Shinde, H.; Rajurkar, A.B.; Sonah, H.; Vuong, T.D.; Siebecker, M.G.; Jiao, Y.; et al. Identification of novel germplasm and genetic loci for enhancing mineral element uptake in soybean. Environ. Exp. Bot. 2024, 219, 105643. [Google Scholar] [CrossRef]
  50. Kato, S.; Samanfar, B.; Morrison, M.J.; Bekele, W.A.; Torkamaneh, D.; Rajcan, I.; O’donoughue, L.; Belzile, F.; Cober, E.R. Genome-wide association study to identify soybean stem pushing resistance and lodging resistance loci. Can. J. Plant Sci. 2021, 101, 663–670. [Google Scholar] [CrossRef]
  51. Haidar, S.; Lackey, S.; Charette, M.; Yoosefzadeh-Najafabadi, M.; Gahagan, A.C.; Hotte, T.; Belzile, F.; Rajcan, I.; Golshani, A.; Morrison, M.J.; et al. Genome-wide analysis of cold imbibition stress in soybean, Glycine max. Front. Plant Sci. 2023, 14, 1221644. [Google Scholar] [CrossRef]
  52. Miller, M.J.; Song, Q.; Fallen, B.; Li, Z. Genomic prediction of optimal cross combinations to accelerate genetic improvement of soybean (Glycine max). Front. Plant Sci. 2023, 14, 1171135. [Google Scholar] [CrossRef]
  53. Kaler, A.S.; Purcell, L.C.; Beissinger, T.; Gillman, J.D. Genomic prediction models for traits differing in heritability for soybean, rice, and maize. BMC Plant Biol. 2022, 22, 87. [Google Scholar] [CrossRef]
  54. Bandillo, N.B.; Jarquin, D.; Posadas, L.G.; Lorenz, A.J.; Graef, G.L. Genomic selection performs as effectively as phenotypic selection for increasing seed yield in soybean. Plant Genome 2023, 16, e20285. [Google Scholar] [CrossRef]
  55. Love, M.I.; Huber, W.; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014, 15, 550. [Google Scholar] [CrossRef]
  56. Robinson, M.D.; McCarthy, D.J.; Smyth, G.K. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010, 26, 139–140. [Google Scholar] [CrossRef]
  57. Brown, A.V.; Conners, S.I.; Huang, W.; Wilkey, A.P.; Grant, D.; Weeks, N.T.; Cannon, S.B.; Graham, M.A.; Nelson, R.T. A new decade and new data at SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res. 2020, 49, D1496–D1501. [Google Scholar] [CrossRef]
  58. Almeida-Silva, F.; Pedrosa-Silva, F.; Venancio, T.M. The Soybean Expression Atlas v2: A comprehensive database of over 5000 RNA-seq samples. Plant J. 2023, 116, 1041–1051. [Google Scholar] [CrossRef]
  59. Liu, Z.; Kong, X.; Long, Y.; Liu, S.; Zhang, H.; Jia, J.; Cui, W.; Zhang, Z.; Song, X.; Qiu, L.; et al. Integrated single-nucleus and spatial transcriptomics captures transitional states in soybean nodule maturation. Nat. Plants 2023, 9, 515–524. [Google Scholar] [CrossRef]
  60. Langfelder, P.; Horvath, S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinform. 2008, 9, 559. [Google Scholar] [CrossRef]
  61. Niu, J.; Zhao, J.; Guo, Q.; Zhang, H.; Yue, A.; Zhao, J.; Yin, C.; Wang, M.; Du, W. WGCNA Reveals Hub Genes and Key Gene Regulatory Pathways of the Response of Soybean to Infection by Soybean mosaic virus. Genes 2024, 15, 566. [Google Scholar] [CrossRef]
  62. Du, J.; Wang, S.; He, C.; Zhou, B.; Ruan, Y.L.; Shou, H. Identification of regulatory networks and hub genes controlling soybean seed set and size using RNA sequencing analysis. J. Exp. Bot. 2017, 68, 1955–1972. [Google Scholar] [CrossRef]
  63. Tommasini, D.; Fogel, B.L. multiWGCNA: An R package for deep mining gene co-expression networks in multi-trait expression data. BMC Bioinform. 2023, 24, 115. [Google Scholar] [CrossRef]
  64. Zhang, H.; Goettel, W.; Song, Q.; Jiang, H.; Hu, Z.; Wang, M.L.; An, Y.-Q.C. Selection of GmSWEET39 for oil and protein improvement in soybean. PLoS Genet. 2020, 16, e1009114. [Google Scholar] [CrossRef]
  65. Miao, L.; Yang, S.; Zhang, K.; He, J.; Wu, C.; Ren, Y.; Gai, J.; Li, Y. Natural variation and selection in GmSWEET39 affect soybean seed oil content. New Phytol. 2019, 225, 1651–1666. [Google Scholar] [CrossRef] [PubMed]
  66. Hooker, J.C.; Nissan, N.; Luckert, D.; Zapata, G.; Hou, A.; Mohr, R.M.; Glenn, A.J.; Barlow, B.; Daba, K.A.; Warkentin, T.D.; et al. GmSWEET29 and Paralog GmSWEET34 Are Differentially Expressed between Soybeans Grown in Eastern and Western Canada. Plants 2022, 11, 2337. [Google Scholar] [CrossRef] [PubMed]
  67. Hooker, J.C.; Smith, M.; Zapata, G.; Charette, M.; Luckert, D.; Mohr, R.M.; Daba, K.A.; Warkentin, T.D.; Hadinezhad, M.; Barlow, B.; et al. Differential gene expression provides leads to environmentally regulated soybean seed protein content. Front. Plant Sci. 2023, 14, 1260393. [Google Scholar] [CrossRef]
  68. Pandurangan, S.; Pajak, A.; Molnar, S.J.; Cober, E.R.; Dhaubhadel, S.; Hernández-Sebastià, C.; Kaiser, W.M.; Nelson, R.L.; Huber, S.C.; Marsolais, F. Relationship between asparagine metabolism and protein concentration in soybean seed. J. Exp. Bot. 2012, 63, 3173–3184. [Google Scholar] [CrossRef]
  69. Lopes-Caitar, V.S.; de Carvalho, M.C.; Darben, L.M.; Kuwahara, M.K.; Nepomuceno, A.L.; Dias, W.P.; Abdelnoor, R.V.; Marcelino-Guimarães, F.C. Genome-wide analysis of the Hsp 20 gene family in soybean: Comprehensive sequence, genomic organization and expression profile analysis under abiotic and biotic stresses. BMC Genom. 2013, 14, 577. [Google Scholar] [CrossRef]
  70. Jorrin-Novo, J.V.; Komatsu, S.; Sanchez-Lucas, R.; de Francisco, L.E.R. Gel electrophoresis-based plant proteomics: Past, present, and future. Happy 10th anniversary Journal of Proteomics! J. Proteom. 2019, 198, 1–10. [Google Scholar] [CrossRef]
  71. Hajduch, M.; Ganapathy, A.; Stein, J.W.; Thelen, J.J. A Systematic Proteomic Study of Seed Filling in Soybean. Establishment of High-Resolution Two-Dimensional Reference Maps, Expression Profiles, and an Interactive Proteome Database. Plant Physiol. 2005, 137, 1397–1419. [Google Scholar] [CrossRef] [PubMed]
  72. Afroz, A.; Hashiguchi, A.; Khan, M.R.; Komatsu, S. Analyses of the Proteomes of the Leaf, Hypocotyl, and Root of Young Soybean Seedlings. Protein Pept. Lett. 2010, 17, 319–331. [Google Scholar] [CrossRef] [PubMed]
  73. Nguyen, T.H.N.; Brechenmacher, L.; Aldrich, J.T.; Clauss, T.R.; Gritsenko, M.A.; Hixson, K.K.; Libault, M.; Tanaka, K.; Yang, F.; Yao, Q.; et al. Quantitative Phosphoproteomic Analysis of Soybean Root Hairs Inoculated with Bradyrhizobium japonicum. Mol. Cell. Proteom. 2012, 11, 1140–1155. [Google Scholar] [CrossRef]
  74. Qin, J.; Gu, F.; Liu, D.; Yin, C.; Zhao, S.; Chen, H.; Zhang, J.; Yang, C.; Zhan, X.; Zhang, M. Proteomic analysis of elite soybean Jidou17 and its parents using iTRAQ-based quantitative approaches. Proteome Sci. 2013, 11, 12. [Google Scholar] [CrossRef]
  75. Hajduch, M.; Matusova, R.; Houston, N.L.; Thelen, J.J. Comparative proteomics of seed maturation in oilseeds reveals differences in intermediary metabolism. Proteomics 2011, 11, 1619–1629. [Google Scholar] [CrossRef] [PubMed]
  76. Xu, X.P.; Liu, H.; Tian, L.; Dong, X.B.; Shen, S.H.; Qu, L.Q. Integrated and comparative proteomics of high-oil and high-protein soybean seeds. Food Chem. 2015, 172, 105–116. [Google Scholar] [CrossRef]
  77. Xu, Y.; Yan, F.; Liu, Y.; Wang, Y.; Gao, H.; Zhao, S.; Zhu, Y.; Wang, Q.; Li, J. Quantitative proteomic and lipidomics analyses of high oil content GmDGAT1-2 transgenic soybean illustrate the regulatory mechanism of lipoxygenase and oleosin. Plant Cell Rep. 2021, 40, 2303–2323. [Google Scholar] [CrossRef] [PubMed]
  78. Wang, X.; Hu, H.; Li, F.; Yang, B.; Komatsu, S.; Zhou, S. Quantitative proteomics reveals dual effects of calcium on radicle protrusion in soybean. J. Proteom. 2020, 230, 103999. [Google Scholar] [CrossRef]
  79. Wang, X.; Khodadadi, E.; Fakheri, B.; Komatsu, S. Organ-specific proteomics of soybean seedlings under flooding and drought stresses. J. Proteom. 2017, 162, 62–72. [Google Scholar] [CrossRef]
  80. Wang, X.; Komatsu, S. Proteomic approaches to uncover the flooding and drought stress response mechanisms in soybean. J. Proteom. 2018, 172, 201–215. [Google Scholar] [CrossRef] [PubMed]
  81. Yadav, M.; Singh, A. Reprogramming of Glycine max (Soybean) Proteome in Response to Spodoptera litura (Common Cutworm)-Infestation. J. Plant Growth Regul. 2024, 43, 1934–1953. [Google Scholar] [CrossRef]
  82. Islam, N.; Bates, P.D.; John, K.M.M.; Krishnan, H.B.; Zhang, Z.J.; Luthria, D.L.; Natarajan, S.S. Quantitative Proteomic Analysis of Low Linolenic Acid Transgenic Soybean Reveals Perturbations of Fatty Acid Metabolic Pathways. Proteomics 2019, 19, e1800379. [Google Scholar] [CrossRef]
  83. Wei, J.; Liu, X.; Li, L.; Zhao, H.; Liu, S.; Yu, X.; Shen, Y.; Zhou, Y.; Zhu, Y.; Shu, Y.; et al. Quantitative proteomic, physiological and biochemical analysis of cotyledon, embryo, leaf and pod reveals the effects of high temperature and humidity stress on seed vigor formation in soybean. BMC Plant Biol. 2020, 20, 127. [Google Scholar] [CrossRef]
  84. Al-Amrani, S.; Al-Jabri, Z.; Al-Zaabi, A.; Alshekaili, J.; Al-Khabori, M. Proteomics: Concepts and applications in human medicine. World J. Biol. Chem. 2021, 12, 57–69. [Google Scholar] [CrossRef]
  85. Chen, C.; Hou, J.; Tanner, J.J.; Cheng, J. Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis. Int. J. Mol. Sci. 2020, 21, 2873. [Google Scholar] [CrossRef] [PubMed]
  86. Ting, Y.S.; Egertson, J.D.; Bollinger, J.G.; Searle, B.C.; Payne, S.H.; Noble, W.S.; MacCoss, M.J. PECAN: Library-free peptide detection for data-independent acquisition tandem mass spectrometry data. Nat. Methods 2017, 14, 903–908. [Google Scholar] [CrossRef] [PubMed]
  87. Tyanova, S.; Temu, T.; Sinitcyn, P.; Carlson, A.; Hein, M.Y.; Geiger, T.; Mann, M.; Cox, J. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 2016, 13, 731–740. [Google Scholar] [CrossRef]
  88. Cox, J.; Hein, M.Y.; Luber, C.A.; Paron, I.; Nagaraj, N.; Mann, M. Accurate Proteome-wide Label-free Quantification by Delayed Normalization and Maximal Peptide Ratio Extraction, Termed MaxLFQ. Mol. Cell. Proteom. 2014, 13, 2513–2526. [Google Scholar] [CrossRef] [PubMed]
  89. Mergner, J.; Kuster, B. Plant Proteome Dynamics. Annu. Rev. Plant Biol. 2022, 73, 67–92. [Google Scholar] [CrossRef]
  90. Weckwerth, W. Metabolomics in Systems Biology. Annu. Rev. Plant Biol. 2003, 54, 669–689. [Google Scholar] [CrossRef] [PubMed]
  91. Dikobe, T.; Masenya, K.; Manganyi, M. Molecular technologies ending with ‘omics’: The driving force toward sustainable plant production and protection [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2023, 12, 480. [Google Scholar] [CrossRef]
  92. Taylor, J.; King, R.D.; Altmann, T.; Fiehn, O. Application of metabolomics to plant genotype discrimination using statistics and machine learning. Bioinformatics 2002, 18 (Suppl. S2), S241–S248. [Google Scholar] [CrossRef]
  93. Xiao, Q.; Mu, X.; Liu, J.; Li, B.; Liu, H.; Zhang, B.; Xiao, P. Plant metabolomics: A new strategy and tool for quality evaluation of Chinese medicinal materials. Chin. Med. 2022, 17, 45. [Google Scholar] [CrossRef] [PubMed]
  94. Dolatmoradi, M.; Samarah, L.Z.; Vertes, A. Single-Cell Metabolomics by Mass Spectrometry: Opportunities and Challenges. Anal. Sens. 2021, 2, e202100032. [Google Scholar] [CrossRef]
  95. Lanekoff, I.; Sharma, V.V.; Marques, C. Single-cell metabolomics: Where are we and where are we going? Curr. Opin. Biotechnol. 2022, 75, 102693. [Google Scholar] [CrossRef] [PubMed]
  96. Lee, J.; Hwang, Y.-S.; Chang, W.-S.; Moon, J.-K.; Choung, M.-G. Seed maturity differentially mediates metabolic responses in black soybean. Food Chem. 2013, 141, 2052–2059. [Google Scholar] [CrossRef] [PubMed]
  97. Wilcox, J.R.; Shibles, R.M. Interrelationships among Seed Quality Attributes in Soybean. Crop Sci. 2001, 41, 11–14. [Google Scholar] [CrossRef]
  98. Feng, Z.; Ding, C.; Li, W.; Wang, D.; Cui, D. Applications of metabolomics in the research of soybean plant under abiotic stress. Food Chem. 2019, 310, 125914. [Google Scholar] [CrossRef] [PubMed]
  99. Lin, H.; Rao, J.; Shi, J.; Hu, C.; Cheng, F.; Wilson, Z.A.; Zhang, D.; Quan, S. Seed metabolomic study reveals significant metabolite variations and correlations among different soybean cultivars. J. Integr. Plant Biol. 2014, 56, 826–836. [Google Scholar] [CrossRef]
  100. Kim, J.K.; Kim, E.-H.; Park, I.; Yu, B.-R.; Lim, J.D.; Lee, Y.-S.; Lee, J.-H.; Kim, S.-H.; Chung, I.-M. Isoflavones profiling of soybean [Glycine max (L.) Merrill] germplasms and their correlations with metabolic pathways. Food Chem. 2013, 153, 258–264. [Google Scholar] [CrossRef]
  101. Liu, J.; Hu, B.; Liu, W.; Qin, W.; Wu, H.; Zhang, J.; Yang, C.; Deng, J.; Shu, K.; Du, J.; et al. Metabolomic tool to identify soybean [Glycine max (L.) Merrill] germplasms with a high level of shade tolerance at the seedling stage. Sci. Rep. 2017, 7, 42478. [Google Scholar] [CrossRef]
  102. Quintela, A.L.; Santos, M.F.; de Lima, R.F.; Mayer, J.L.; Marcheafave, G.G.; Arruda, M.A.; Tormena, C.F. Influence of Silver Nanoparticles on the Metabolites of Two Transgenic Soybean Varieties: An NMR-Based Metabolomics Approach. J. Agric. Food Chem. 2024, 72, 12281–12294. [Google Scholar] [CrossRef]
  103. Nguyen, K.-O.T.; Do, T.N.; Dang, K.P.; Sato, M.; Hirai, M.Y. Single-grain-based widely targeted metabolomics profiling of sixty-four accessions of Japanese wild soybean (Glycine soja Sieb. et Zucc.). Int. J. Food Sci. Technol. 2024, 59, 4251–4262. [Google Scholar] [CrossRef]
  104. Yan, D.; Huang, L.; Mei, Z.; Bao, H.; Xie, Y.; Yang, C.; Gao, X. Untargeted metabolomics revealed the effect of soybean metabolites on poly(γ-glutamic acid) production in fermented natto and its metabolic pathway. J. Sci. Food Agric. 2023, 104, 1298–1307. [Google Scholar] [CrossRef]
  105. Hall, R.D.; D’Auria, J.C.; Ferreira, A.C.S.; Gibon, Y.; Kruszka, D.; Mishra, P.; Van de Zedde, R. High-throughput plant phenotyping: A role for metabolomics? Trends Plant Sci. 2022, 27, 549–563. [Google Scholar] [CrossRef] [PubMed]
  106. Lavin, M.; Herendeen, P.S.; Wojciechowski, M.F. Evolutionary Rates Analysis of Leguminosae Implicates a Rapid Diversification of Lineages during the Tertiary. Syst. Biol. 2005, 54, 575–594. [Google Scholar] [CrossRef] [PubMed]
  107. Kim, K.D.; El Baidouri, M.; Abernathy, B.; Iwata-Otsubo, A.; Chavarro, C.; Gonzales, M.; Libault, M.; Grimwood, J.; Jackson, S.A. A Comparative Epigenomic Analysis of Polyploidy-Derived Genes in Soybean and Common Bean. Plant Physiol. 2015, 168, 1433–1447. [Google Scholar] [CrossRef] [PubMed]
  108. An, Y.C.; Goettel, W.; Han, Q.; Bartels, A.; Liu, Z.; Xiao, W. Dynamic Changes of Genome-Wide DNA Methylation during Soybean Seed Development. Sci. Rep. 2017, 7, 12263. [Google Scholar] [CrossRef]
  109. Chen, M.; Lin, J.-Y.; Hur, J.; Pelletier, J.M.; Baden, R.; Pellegrini, M.; Harada, J.J.; Goldberg, R.B. Seed genome hypomethylated regions are enriched in transcription factor genes. Proc. Natl. Acad. Sci. USA 2018, 115, E8315–E8322. [Google Scholar] [CrossRef]
  110. Wang, L.; Jia, G.; Jiang, X.; Cao, S.; Chen, Z.J.; Song, Q. Altered chromatin architecture and gene expression during polyploidization and domestication of soybean. Plant Cell 2021, 33, 1430–1446. [Google Scholar] [CrossRef]
  111. Manoharlal, R.; Saiprasad, G.V.S. Assessment of germination, phytochemicals, and transcriptional responses to ethephon priming in soybean [Glycine max (L.) Merrill]. Genome 2019, 62, 769–783. [Google Scholar] [CrossRef]
  112. Wang, Q.; Yung, W.-S.; Wang, Z.; Lam, H.-M. The histone modification H3K4me3 marks functional genes in soybean nodules. Genomics 2020, 112, 5282–5294. [Google Scholar] [CrossRef]
  113. Zhai, H.; Wan, Z.; Jiao, S.; Zhou, J.; Xu, K.; Nan, H.; Liu, Y.; Xiong, S.; Fan, R.; Zhu, J.; et al. GmMDE genes bridge the maturity gene E1 and florigens in photoperiodic regulation of flowering in soybean. Plant Physiol. 2022, 189, 1021–1036. [Google Scholar] [CrossRef]
  114. Cadavid, I.C.; Balbinott, N.; Margis, R. Beyond transcription factors: More regulatory layers affecting soybean gene expression under abiotic stress. Genet. Mol. Biol. 2023, 46 (Suppl. S1), e20220166. [Google Scholar] [CrossRef] [PubMed]
  115. Chen, R.; Li, M.; Zhang, H.; Duan, L.; Sun, X.; Jiang, Q.; Zhang, H.; Hu, Z. Continuous salt stress-induced long non-coding RNAs and DNA methylation patterns in soybean roots. BMC Genom. 2019, 20, 730. [Google Scholar] [CrossRef] [PubMed]
  116. Chu, S.; Zhang, X.; Yu, K.; Lv, L.; Sun, C.; Liu, X.; Zhang, J.; Jiao, Y.; Zhang, D. Genome-Wide Analysis Reveals Dynamic Epigenomic Differences in Soybean Response to Low-Phosphorus Stress. Int. J. Mol. Sci. 2020, 21, 6817. [Google Scholar] [CrossRef] [PubMed]
  117. Feng, P.; Sun, X.; Liu, X.; Li, Y.; Sun, Q.; Lu, H.; Li, M.; Ding, X.; Dong, Y. Epigenetic Regulation of Plant Tolerance to Salt Stress by Histone Acetyltransferase GsMYST1 From Wild Soybean. Front. Plant Sci. 2022, 13, 860056. [Google Scholar] [CrossRef] [PubMed]
  118. Han, X.; Shi, Q.; He, Z.; Song, W.; Chen, Q.; Qi, Z. Transcriptome-wide N6-methyladenosine (m6A) methylation in soybean under Meloidogyne incognita infection. aBIOTECH 2022, 3, 197–211. [Google Scholar] [CrossRef]
  119. Han, X.; Wang, J.; Zhang, Y.; Kong, Y.; Dong, H.; Feng, X.; Li, T.; Zhou, C.; Yu, J.; Xin, D.; et al. Changes in the m6A RNA methylome accompany the promotion of soybean root growth by rhizobia under cadmium stress. J. Hazard. Mater. 2023, 441, 129843. [Google Scholar] [CrossRef]
  120. Hossain, M.S.; Kawakatsu, T.; Kim, K.D.; Zhang, N.; Nguyen, C.T.; Khan, S.M.; Batek, J.M.; Joshi, T.; Schmutz, J.; Grimwood, J.; et al. Divergent cytosine DNA methylation patterns in single-cell, soybean root hairs. New Phytol. 2017, 214, 808–819. [Google Scholar] [CrossRef]
  121. Jiang, L.; Yang, X.; Gao, X.; Yang, H.; Ma, S.; Huang, S.; Zhu, J.; Zhou, H.; Li, X.; Gu, X.; et al. Multiomics Analyses Reveal the Dual Role of Flavonoids in Pigmentation and Abiotic Stress Tolerance of Soybean Seeds. J. Agric. Food Chem. 2024, 72, 3231–3243. [Google Scholar] [CrossRef]
  122. Lu, L.; Wei, W.; Tao, J.; Lu, X.; Bian, X.; Hu, Y.; Cheng, T.; Yin, C.; Zhang, W.; Chen, S.; et al. Nuclear factor Y subunit GmNFYA competes with GmHDA13 for interaction with GmFVE to positively regulate salt tolerance in soybean. Plant Biotechnol. J. 2021, 19, 2362–2379. [Google Scholar] [CrossRef]
  123. Ma, C.; Ma, S.; Yu, Y.; Feng, H.; Wang, Y.; Liu, C.; He, S.; Yang, M.; Chen, Q.; Xin, D.; et al. Transcriptome-wide m6A methylation profiling identifies GmAMT1;1 as a promoter of lead and cadmium tolerance in soybean nodules. J. Hazard. Mater. 2024, 465, 133263. [Google Scholar] [CrossRef]
  124. Rambani, A.; Hu, Y.; Piya, S.; Long, M.; Rice, J.H.; Pantalone, V.; Hewezi, T. Identification of Differentially Methylated miRNA Genes During Compatible and Incompatible Interactions Between Soybean and Soybean Cyst Nematode. Mol. Plant-Microbe Interact. 2020, 33, 1340–1352. [Google Scholar] [CrossRef]
  125. Rambani, A.; Pantalone, V.; Yang, S.; Rice, J.H.; Song, Q.; Mazarei, M.; Arelli, P.R.; Meksem, K.; Stewart, C.N.; Hewezi, T. Identification of introduced and stably inherited DNA methylation variants in soybean associated with soybean cyst nematode parasitism. New Phytol. 2020, 227, 168–184. [Google Scholar] [CrossRef] [PubMed]
  126. Sun, L.; Song, G.; Guo, W.; Wang, W.; Zhao, H.; Gao, T.; Lv, Q.; Yang, X.; Xu, F.; Dong, Y.; et al. Dynamic Changes in Genome-Wide Histone3 Lysine27 Trimethylation and Gene Expression of Soybean Roots in Response to Salt Stress. Front. Plant Sci. 2019, 10, 1031. [Google Scholar] [CrossRef] [PubMed]
  127. Yung, W.; Huang, C.; Li, M.; Lam, H. Changes in epigenetic features in legumes under abiotic stresses. Plant Genome 2022, 16, e20237. [Google Scholar] [CrossRef]
  128. Yung, W.; Wang, Q.; Huang, M.; Wong, F.; Liu, A.; Ng, M.; Li, K.; Sze, C.; Li, M.; Lam, H. Priming-induced alterations in histone modifications modulate transcriptional responses in soybean under salt stress. Plant J. 2021, 109, 1575–1590. [Google Scholar] [CrossRef]
  129. Zhang, Y.; Han, X.; Su, D.; Liu, C.; Chen, Q.; Qi, Z. An analysis of differentially expressed and differentially m6A-modified transcripts in soybean roots treated with lead. J. Hazard. Mater. 2023, 453, 131370. [Google Scholar] [CrossRef] [PubMed]
  130. Kim, Y.K.; Chae, S.; Oh, N.I.; Nguyen, N.H.; Cheong, J.J. Recurrent Drought Conditions Enhance the Induction of Drought Stress Memory Genes in Glycine max L. Front. Genet. 2020, 11, 576086. [Google Scholar] [CrossRef]
  131. Raju, S.K.K.; Shao, M.; Sanchez, R.; Xu, Y.; Sandhu, A.; Graef, G.; Mackenzie, S. An epigenetic breeding system in soybean for increased yield and stability. Plant Biotechnol. J. 2018, 16, 1836–1847. [Google Scholar] [CrossRef]
  132. Wang, W.; Zhang, T.; Liu, C.; Liu, C.; Jiang, Z.; Zhang, Z.; Ali, S.; Li, Z.; Wang, J.; Sun, S.; et al. A DNA demethylase reduces seed size by decreasing the DNA methylation of AT-rich transposable elements in soybean. Commun. Biol. 2024, 7, 613. [Google Scholar] [CrossRef]
  133. Liu, M.; Jiang, J.; Han, Y.; Shi, M.; Li, X.; Wang, Y.; Dong, Z.; Yang, C. Functional Characterization of the Lysine-Specific Histone Demethylases Family in Soybean. Plants 2022, 11, 1398. [Google Scholar] [CrossRef]
  134. Yang, C.; Shen, W.; Chen, H.; Chu, L.; Xu, Y.; Zhou, X.; Liu, C.; Chen, C.; Zeng, J.; Liu, J.; et al. Characterization and subcellular localization of histone deacetylases and their roles in response to abiotic stresses in soybean. BMC Plant Biol. 2018, 18, 226. [Google Scholar] [CrossRef]
  135. Zhao, C.; Zhang, Y.; Du, J.; Guo, X.; Wen, W.; Gu, S.; Wang, J.; Fan, J. Crop Phenomics: Current Status and Perspectives. Front. Plant Sci. 2019, 10, 714. [Google Scholar] [CrossRef]
  136. Shen, Y.; Zhou, G.; Liang, C.; Tian, Z. Omics-based interdisciplinarity is accelerating plant breeding. Curr. Opin. Plant Biol. 2022, 66, 102167. [Google Scholar] [CrossRef] [PubMed]
  137. Lube, V.; Noyan, M.A.; Przybysz, A.; Salama, K.; Blilou, I. MultipleXLab: A high-throughput portable live-imaging root phenotyping platform using deep learning and computer vision. Plant Methods 2022, 18, 38. [Google Scholar] [CrossRef] [PubMed]
  138. Morrison, M.J.; Gahagan, A.C.; Lefebvre, M.B. Measuring canopy height in soybean and wheat using a low-cost depth camera. Plant Phenome J. 2021, 4, e20019. [Google Scholar] [CrossRef]
  139. Andrade-Sanchez, P.; Gore, M.A.; Heun, J.T.; Thorp, K.R.; Carmo-Silva, A.E.; French, A.N.; Salvucci, M.E.; White, J.W. Development and evaluation of a field-based high-throughput phenotyping platform. Funct. Plant Biol. 2014, 41, 68–79. [Google Scholar] [CrossRef]
  140. Cooper, L.; Meier, A.; Elser, J.L.; Preece, J.; Xu, X.; Kitchen, R.S.; Qu, B.; Zhang, E.; Todorovic, S.; Jaiswal, P. The Planteome Project. In ICBO/BioCreative; Oregon State University: Corvallis, OR, USA, 2016. [Google Scholar]
  141. Von Gillhaussen, P. International Plant Phenotyping Network (IPPN). Germany. Available online: https://www.plant-phenotyping.org/ (accessed on 26 September 2024).
  142. Fiorani, F.; Schurr, U. Future Scenarios for Plant Phenotyping. Annu. Rev. Plant Biol. 2013, 64, 267–291. [Google Scholar] [CrossRef]
  143. Yu, N.; Li, L.; Schmitz, N.; Tian, L.F.; Greenberg, J.A.; Diers, B.W. Development of methods to improve soybean yield estimation and predict plant maturity with an unmanned aerial vehicle based platform. Remote Sens. Environ. 2016, 187, 91–101. [Google Scholar] [CrossRef]
  144. Maimaitijiang, M.; Ghulam, A.; Sidike, P.; Hartling, S.; Maimaitiyiming, M.; Peterson, K.; Shavers, E.; Fishman, J.; Peterson, J.; Kadam, S.; et al. Unmanned Aerial System (UAS)-based phenotyping of soybean using multi-sensor data fusion and extreme learning machine. ISPRS J. Photogramm. Remote Sens. 2017, 134, 43–58. [Google Scholar] [CrossRef]
  145. Yuan, W.; Wijewardane, N.K.; Jenkins, S.; Bai, G.; Ge, Y.; Graef, G.L. Early Prediction of Soybean Traits through Color and Texture Features of Canopy RGB Imagery. Sci. Rep. 2019, 9, 14089. [Google Scholar] [CrossRef]
  146. Toda, Y.; Kaga, A.; Kajiya-Kanegae, H.; Hattori, T.; Yamaoka, S.; Okamoto, M.; Tsujimoto, H.; Iwata, H. Genomic prediction modeling of soybean biomass using UAV-based remote sensing and longitudinal model parameters. Plant Genome 2021, 14, e20157. [Google Scholar] [CrossRef]
  147. Zhou, J.; Mou, H.; Zhou, J.; Ali, M.L.; Ye, H.; Chen, P.; Nguyen, H.T. Qualification of Soybean Responses to Flooding Stress Using UAV-Based Imagery and Deep Learning. Plant Phenomics 2021, 2021, 9892570. [Google Scholar] [CrossRef] [PubMed]
  148. Zhou, J.; Zhou, J.; Ye, H.; Ali, M.L.; Nguyen, H.T.; Chen, P. Classification of soybean leaf wilting due to drought stress using UAV-based imagery. Comput. Electron. Agric. 2020, 175, 105576. [Google Scholar] [CrossRef]
  149. Zhou, J.; Fu, X.; Zhou, S.; Zhou, J.; Ye, H.; Nguyen, H.T. Automated segmentation of soybean plants from 3D point cloud using machine learning. Comput. Electron. Agric. 2019, 162, 143–153. [Google Scholar] [CrossRef]
  150. Zhu, R.; Sun, K.; Yan, Z.; Yan, X.; Yu, J.; Shi, J.; Hu, Z.; Jiang, H.; Xin, D.; Zhang, Z.; et al. Analysing the phenotype development of soybean plants using low-cost 3D reconstruction. Sci. Rep. 2020, 10, 7055. [Google Scholar] [CrossRef]
  151. Finkel, E. With ‘Phenomics’, Plant Scientists Hope to Shift Breeding Into Overdrive. Science 2009, 325, 380–381. [Google Scholar] [CrossRef] [PubMed]
  152. Dobbels, A.A.; Lorenz, A.J. Soybean iron deficiency chlorosis high-throughput phenotyping using an unmanned aircraft system. Plant Methods 2019, 15, 97. [Google Scholar] [CrossRef]
  153. Naik, H.S.; Zhang, J.; Lofquist, A.; Assefa, T.; Sarkar, S.; Ackerman, D.; Singh, A.; Singh, A.K.; Ganapathysubramanian, B. A real-time phenotyping framework using machine learning for plant stress severity rating in soybean. Plant Methods 2017, 13, 23. [Google Scholar] [CrossRef] [PubMed]
  154. Zhou, J.; Chen, H.; Zhou, J.; Fu, X.; Ye, H.; Nguyen, H.T. Development of an automated phenotyping platform for quantifying soybean dynamic responses to salinity stress in greenhouse environment. Comput. Electron. Agric. 2018, 151, 319–330. [Google Scholar] [CrossRef]
  155. Arya, S.; Sandhu, K.S.; Singh, J.; Kumar, S. Deep learning: As the new frontier in high-throughput plant phenotyping. Euphytica 2022, 218, 47. [Google Scholar] [CrossRef]
  156. Urbina, F.; Ekins, S. The commoditization of AI for molecule design. Artif. Intell. Life Sci. 2022, 2, 100031. [Google Scholar] [CrossRef]
  157. Herrero-Huerta, M.; Rodriguez-Gonzalvez, P.; Rainey, K.M. Yield prediction by machine learning from UAS-based multi-sensor data fusion in soybean. Plant Methods 2020, 16, 78. [Google Scholar] [CrossRef] [PubMed]
  158. Maimaitijiang, M.; Sagan, V.; Sidike, P.; Hartling, S.; Esposito, F.; Fritschi, F.B. Soybean yield prediction from UAV using multimodal data fusion and deep learning. Remote Sens. Environ. 2020, 237, 111599. [Google Scholar] [CrossRef]
  159. Yoosefzadeh-Najafabadi, M.; Torabi, S.; Tulpan, D.; Rajcan, I.; Eskandari, M. Genome-Wide Association Studies of Soybean Yield-Related Hyperspectral Reflectance Bands Using Machine Learning-Mediated Data Integration Methods. Front. Plant Sci. 2021, 12, 777028. [Google Scholar] [CrossRef] [PubMed]
  160. Riera, L.G.; Carroll, M.E.; Zhang, Z.; Shook, J.M.; Ghosal, S.; Gao, T.; Singh, A.; Bhattacharya, S.; Ganapathysubramanian, B.; Singh, A.K.; et al. Deep Multiview Image Fusion for Soybean Yield Estimation in Breeding Applications. Plant Phenomics 2021, 2021, 9846470. [Google Scholar] [CrossRef] [PubMed]
  161. Baek, J.; Lee, E.; Kim, N.; Kim, S.L.; Choi, I.; Ji, H.; Chung, Y.S.; Choi, M.-S.; Moon, J.-K.; Kim, K.-H. High Throughput Phenotyping for Various Traits on Soybean Seeds Using Image Analysis. Sensors 2019, 20, 248. [Google Scholar] [CrossRef]
  162. Strobl, C.; Zeileis, A. Party on!–A new, conditional variable importance measure for random forests available in party. R J. 2021, 1, 14–17. [Google Scholar] [CrossRef]
  163. Momin, M.A.; Yamamoto, K.; Miyamoto, M.; Kondo, N.; Grift, T. Machine vision based soybean quality evaluation. Comput. Electron. Agric. 2017, 140, 452–460. [Google Scholar] [CrossRef]
  164. Jubery, T.Z.; Carley, C.N.; Singh, A.; Sarkar, S.; Ganapathysubramanian, B.; Singh, A.K. Using Machine Learning to Develop a Fully Automated Soybean Nodule Acquisition Pipeline (SNAP). Plant Phenomics 2021, 2021, 9834746. [Google Scholar] [CrossRef]
  165. Zhao, B.; Zhang, S.; Yang, W.; Li, B.; Lan, C.; Zhang, J.; Yuan, L.; Wang, Y.; Xie, Q.; Han, J.; et al. Multi-omic dissection of the drought resistance traits of soybean landrace LX. Plant Cell Environ. 2021, 44, 1379–1398. [Google Scholar] [CrossRef]
  166. Yuan, X.; Jiang, X.; Zhang, M.; Wang, L.; Jiao, W.; Chen, H.; Mao, J.; Ye, W.; Song, Q. Integrative omics analysis elucidates the genetic basis underlying seed weight and oil content in soybean. Plant Cell 2024, 36, 2160–2175. [Google Scholar] [CrossRef]
  167. Kumar, V.; Vats, S.; Kumawat, S.; Bisht, A.; Bhatt, V.; Shivaraj, S.M.; Padalkar, G.; Goyal, V.; Zargar, S.; Gupta, S.; et al. Omics advances and integrative approaches for the simultaneous improvement of seed oil and protein content in soybean (Glycine max L.). Crit. Rev. Plant Sci. 2021, 40, 398–421. [Google Scholar] [CrossRef]
  168. Bisht, A.; Saini, D.K.; Kaur, B.; Batra, R.; Kaur, S.; Kaur, I.; Jindal, S.; Malik, P.; Sandhu, P.K.; Kaur, A.; et al. Multi-omics assisted breeding for biotic stress resistance in soybean. Mol. Biol. Rep. 2023, 50, 3787–3814. [Google Scholar] [CrossRef] [PubMed]
  169. Shi, X.; Chen, Q.; Liu, S.; Wang, J.; Peng, D.; Kong, L. Combining targeted metabolite analyses and transcriptomics to reveal the specific chemical composition and associated genes in the incompatible soybean variety PI437654 infected with soybean cyst nematode HG1.2.3.5.7. BMC Plant Biol. 2021, 21, 217. [Google Scholar] [CrossRef] [PubMed]
  170. Nissan, N.; Mimee, B.; Cober, E.R.; Golshani, A.; Smith, M.; Samanfar, B. A Broad Review of Soybean Research on the Ongoing Race to Overcome Soybean Cyst Nematode. Biology 2022, 11, 211. [Google Scholar] [CrossRef]
  171. Mo, X.; Liu, G.; Zhang, Z.; Lu, X.; Liang, C.; Tian, J. Mechanisms Underlying Soybean Response to Phosphorus Deficiency through Integration of Omics Analysis. Int. J. Mol. Sci. 2022, 23, 4592. [Google Scholar] [CrossRef]
  172. Gupta, Y.K.; Marcelino-Guimarães, F.C.; Lorrain, C.; Farmer, A.; Haridas, S.; Ferreira, E.G.C.; Lopes-Caitar, V.S.; Oliveira, L.S.; Morin, E.; Widdison, S.; et al. Major proliferation of transposable elements shaped the genome of the soybean rust pathogen Phakopsora pachyrhizi. Nat. Commun. 2023, 14, 1835. [Google Scholar] [CrossRef]
  173. Bao, A.; Chen, H.; Chen, L.; Chen, S.; Hao, Q.; Guo, W.; Qiu, D.; Shan, Z.; Yang, Z.; Yuan, S.; et al. CRISPR/Cas9-mediated targeted mutagenesis of GmSPL9 genes alters plant architecture in soybean. BMC Plant Biol. 2019, 19, 131. [Google Scholar] [CrossRef]
  174. Cai, Y.; Chen, L.; Liu, X.; Sun, S.; Wu, C.; Jiang, B.; Han, T.; Hou, W. CRISPR/Cas9-Mediated Genome Editing in Soybean Hairy Roots. PLoS ONE 2015, 10, e0136064. [Google Scholar] [CrossRef] [PubMed]
  175. Zhao, F.; Lyu, X.; Ji, R.; Liu, J.; Zhao, T.; Li, H.; Liu, B.; Pei, Y. CRISPR/Cas9-engineered mutation to identify the roles of phytochromes in regulating photomorphogenesis and flowering time in soybean. Crop J. 2022, 10, 1654–1664. [Google Scholar] [CrossRef]
  176. Du, Y.-T.; Zhao, M.-J.; Wang, C.-T.; Gao, Y.; Wang, Y.-X.; Liu, Y.-W.; Chen, M.; Chen, J.; Zhou, Y.-B.; Xu, Z.-S.; et al. Identification and characterization of GmMYB118 responses to drought and salt stress. BMC Plant Biol. 2018, 18, 320. [Google Scholar] [CrossRef]
  177. Roychowdhury, R.; Das, S.P.; Gupta, A.; Parihar, P.; Chandrasekhar, K.; Sarker, U.; Kumar, A.; Ramrao, D.P.; Sudhakar, C. Multi-Omics Pipeline and Omics-Integration Approach to Decipher Plant’s Abiotic Stress Tolerance Responses. Genes 2023, 14, 1281. [Google Scholar] [CrossRef] [PubMed]
  178. Chilcoat, D.; Liu, Z.-B.; Sander, J. Use of CRISPR/Cas9 for crop improvement in maize and soybean. Prog. Mol. Biol. Transl. Sci. 2017, 149, 27–46. [Google Scholar] [PubMed]
  179. Razzaq, M.K.; Aleem, M.; Mansoor, S.; Alam Khan, M.; Rauf, S.; Iqbal, S.; Siddique, K.H.M. Omics and CRISPR-Cas9 Approaches for Molecular Insight, Functional Gene Analysis, and Stress Tolerance Development in Crops. Int. J. Mol. Sci. 2021, 22, 1292. [Google Scholar] [CrossRef] [PubMed]
  180. Liu, B.; Watanabe, S.; Uchiyama, T.; Kong, F.; Kanazawa, A.; Xia, Z.; Nagamatsu, A.; Arai, M.; Yamada, T.; Kitamura, K.; et al. The Soybean Stem Growth Habit Gene Dt1 Is an Ortholog of Arabidopsis TERMINAL FLOWER1. Plant Physiol. 2010, 153, 198–210. [Google Scholar] [CrossRef] [PubMed]
  181. Liu, Y.; Zhang, D.; Ping, J.; Li, S.; Chen, Z.; Ma, J. Innovation of a regulatory mechanism modulating semi-determinate stem growth through artificial selection in soybean. PLoS Genet. 2016, 12, e1005818. [Google Scholar] [CrossRef]
  182. Wan, Z.; Liu, Y.; Guo, D.; Fan, R.; Liu, Y.; Xu, K.; Zhu, J.; Quan, L.; Lu, W.; Bai, X.; et al. CRISPR/Cas9-mediated targeted mutation of the E1 decreases photoperiod sensitivity, alters stem growth habits, and decreases branch number in soybean. Front. Plant Sci. 2022, 13, 1066820. [Google Scholar] [CrossRef]
  183. Wang, B.; Smith, S.M.; Li, J. Genetic Regulation of Shoot Architecture. Annu. Rev. Plant Biol. 2018, 69, 437–468. [Google Scholar] [CrossRef]
  184. Liang, Q.; Chen, L.; Yang, X.; Yang, H.; Liu, S.; Kou, K.; Fan, L.; Zhang, Z.; Duan, Z.; Yuan, Y.; et al. Natural variation of Dt2 determines branching in soybean. Nat. Commun. 2022, 13, 6429. [Google Scholar] [CrossRef]
  185. Ping, J.; Liu, Y.; Sun, L.; Zhao, M.; Li, Y.; She, M.; Sui, Y.; Lin, F.; Liu, X.; Tang, Z.; et al. Dt2 Is a Gain-of-Function MADS-Domain Factor Gene That Specifies Semideterminacy in Soybean. Plant Cell 2014, 26, 2831–2842. [Google Scholar] [CrossRef]
  186. Planell, N.; Lagani, V.; Sebastian-Leon, P.; van der Kloet, F.; Ewing, E.; Karathanasis, N.; Urdangarin, A.; Arozarena, I.; Jagodic, M.; Tsamardinos, I.; et al. STATegra: Multi-Omics Data Integration-A Conceptual Scheme With a Bioinformatics Pipeline. Front. Genet. 2021, 12, 620453. [Google Scholar] [CrossRef]
  187. Yang, Z.; Luo, C.; Pei, X.; Wang, S.; Huang, Y.; Li, J.; Liu, B.; Kong, F.; Yang, Q.-Y.; Fang, C. SoyMD: A platform combining multi-omics data with various tools for soybean research and breeding. Nucleic Acids Res. 2023, 52, D1639–D1650. [Google Scholar] [CrossRef] [PubMed]
  188. Kim, D.; Langmead, B.; Salzberg, S.L. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods 2015, 12, 357–360. [Google Scholar] [CrossRef] [PubMed]
  189. Ashburner, M.; Ball, C.A.; Blake, J.A.; Botstein, D.; Butler, H.; Cherry, J.M.; Davis, A.P.; Dolinski, K.; Dwight, S.S.; Eppig, J.T.; et al. Gene ontology: Tool for the unification of biology. Nat. Genet. 2000, 25, 25–29. [Google Scholar] [CrossRef]
  190. Kelley, D.R.; Reshef, Y.A.; Bileschi, M.; Belanger, D.; McLean, C.Y.; Snoek, J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 2018, 28, 739–750. [Google Scholar] [CrossRef]
  191. Liu, Y.; Zhang, Y.; Liu, X.; Shen, Y.; Tian, D.; Yang, X.; Liu, S.; Ni, L.; Zhang, Z.; Song, S.; et al. SoyOmics: A deeply integrated database on soybean multi-omics. Mol. Plant 2023, 16, 794–797. [Google Scholar] [CrossRef]
  192. Karp, P.D.; Midford, P.E.; Billington, R.; Kothari, A.; Krummenacker, M.; Latendresse, M.; Ong, W.K.; Subhraveti, P.; Caspi, R.; Fulcher, C.; et al. Pathway Tools version 23.0 update: Software for pathway/genome informatics and systems biology. Briefings Bioinform. 2019, 22, 109–126. [Google Scholar] [CrossRef] [PubMed]
  193. Jiang, P.; Thomson, J.A.; Stewart, R. Quality control of single-cell RNA-seq by SinQC. Bioinformatics 2016, 32, 2514–2516. [Google Scholar] [CrossRef]
  194. Korte, A.; Farlow, A. The advantages and limitations of trait analysis with GWAS: A review. Plant Methods 2013, 9, 29. [Google Scholar] [CrossRef] [PubMed]
  195. Mackay, T.F.C. Epistasis and quantitative traits: Using model organisms to study gene–gene interactions. Nat. Rev. Genet. 2013, 15, 22–33. [Google Scholar] [CrossRef]
  196. Moellers, T.C.; Singh, A.; Zhang, J.; Brungardt, J.; Kabbage, M.; Mueller, D.S.; Grau, C.R.; Ranjan, A.; Smith, D.L.; Chowda-Reddy, R.V.; et al. Main and epistatic loci studies in soybean for Sclerotinia sclerotiorum resistance reveal multiple modes of resistance in multi-environments. Sci. Rep. 2017, 7, 3554. [Google Scholar] [CrossRef]
  197. Assefa, T.; Zhang, J.; Chowda-Reddy, R.V.; Lauter, A.N.M.; Singh, A.; O’Rourke, J.A.; Graham, M.A.; Singh, A.K. Deconstructing the genetic architecture of iron deficiency chlorosis in soybean using genome-wide approaches. BMC Plant Biol. 2020, 20, 42. [Google Scholar] [CrossRef] [PubMed]
  198. Neupane, S.; Wright, D.M.; Martinez, R.O.; Butler, J.; Weller, J.L.; Bett, K.E. Focusing the GWAS Lens on days to flower using latent variable phenotypes derived from global multienvironment trials. Plant Genome 2022, 16, e20269. [Google Scholar] [CrossRef]
  199. Lackey, S. Genome-Wide Analysis Contributes to and Promotes Adaptation of the Soybean, Glycine max, to Canadian Agriculture Landscapes; Carleton University: Ottawa, ON, Canada, 2024; 179p. [Google Scholar]
  200. Yoosefzadeh-Najafabadi, M.; Tulpan, D.; Eskandari, M. Application of machine learning and genetic optimization algorithms for modeling and optimizing soybean yield using its component traits. PLoS ONE 2021, 16, e0250665. [Google Scholar] [CrossRef]
  201. Parmley, K.A.; Higgins, R.H.; Ganapathysubramanian, B.; Sarkar, S.; Singh, A.K. Machine Learning Approach for Prescriptive Plant Breeding. Sci. Rep. 2019, 9, 17132. [Google Scholar] [CrossRef] [PubMed]
  202. Caudai, C.; Galizia, A.; Geraci, F.; Le Pera, L.; Morea, V.; Salerno, E.; Via, A.; Colombo, T. AI applications in functional genomics. Comput. Struct. Biotechnol. J. 2021, 19, 5762–5790. [Google Scholar] [CrossRef] [PubMed]
  203. Zhou, X.-X.; Zeng, W.F.; Chi, H.; Luo, C.; Liu, C.; Zhan, J.; He, S.-M.; Zhang, Z. pDeep: Predicting MS/MS Spectra of Peptides with Deep Learning. Anal. Chem. 2017, 89, 12690–12697. [Google Scholar] [CrossRef]
  204. Tran, N.H.; Zhang, X.; Xin, L.; Shan, B.; Li, M. De novo peptide sequencing by deep learning. Proc. Natl. Acad. Sci. USA 2017, 114, 8247–8252. [Google Scholar] [CrossRef]
  205. Alipanahi, B.; Delong, A.; Weirauch, M.T.; Frey, B.J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 2015, 33, 831–838. [Google Scholar] [CrossRef]
  206. Zhou, J.; Troyanskaya, O.G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 2015, 12, 931–934. [Google Scholar] [CrossRef]
  207. Abadi, S.; Yan, W.X.; Amar, D.; Mayrose, I. A machine learning approach for predicting CRISPR-Cas9 cleavage efficiencies and patterns underlying its mechanism of action. PLoS Comput. Biol. 2017, 13, e1005807. [Google Scholar] [CrossRef]
  208. Krassowski, M.; Das, V.; Sahu, S.K.; Misra, B.B. State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing. Front. Genet. 2020, 11, 610798. [Google Scholar] [CrossRef] [PubMed]
  209. Luo, F.; Yu, Z.; Zhou, Q.; Huang, A. Multi-Omics-Based Discovery of Plant Signaling Molecules. Metabolites 2022, 12, 76. [Google Scholar] [CrossRef] [PubMed]
  210. Bourgey, M.; Dali, R.; Eveleigh, R.; Chen, K.C.; Letourneau, L.; Fillon, J.; Michaud, M.; Caron, M.; Sandoval, J.; Lefebvre, F.; et al. GenPipes: An open-source framework for distributed and scalable genomic analyses. GigaScience 2019, 8, giz037. [Google Scholar] [CrossRef] [PubMed]
  211. Madrid-Márquez, L.; Rubio-Escudero, C.; Pontes, B.; González-Pérez, A.; Riquelme, J.C.; Sáez, M.E. MOMIC: A Multi-Omics Pipeline for Data Analysis, Integration and Interpretation. Appl. Sci. 2022, 12, 3987. [Google Scholar] [CrossRef]
  212. Tonosaki, K.; Fujimoto, R.; Dennis, E.S.; Raboy, V.; Osabe, K. Will epigenetics be a key player in crop breeding? Front. Plant Sci. 2022, 13, 958350. [Google Scholar] [CrossRef]
Table 1. A summary of the main findings and the reference, as well as which -omics technique was used.
Main Finding | Genomics | Transcriptomics | Proteomics | Metabolomics | Epigenomics | Phenomics | Reference
Key pathways that confer drought tolerance ✓ ✓ [165]
Novel QTLs and candidate genes that influence both seed weight and oil content✓✓ [166]
Molecular mechanisms behind seed oil and protein content✓✓✓ [167]
Resistance genes and understanding their expression patterns in resistance to SCN ✓ ✓ [168,169,170]
Genetic loci linked to yield✓ ✓[159]
Key genetic and biochemical pathways involved in phosphorus use efficiency ✓✓✓ [171]
Natural history and implications on adaptation and genetic plasticity of Phakopsora pachyrhizi✓✓ [172]
GmSPL9 genes in different combinations alter plant architecture in soybean✓✓ ✓[173]
Roles of phytochromes in regulating photomorphogenesis and flowering time✓✓ ✓[175]
Validation of genes involved in soybean’s response to abiotic stresses such as drought and salinity✓✓ ✓[176]
E1 early maturity locus in soybean regulates stem growth using the Dt2-Dt1 signaling pathway✓✓ ✓[180,181,182]
Interaction between Dt2 and the promoter region of GmAp1 gene family representatives✓✓ [183,184]
SoyMD, a multi-omics database for information regarding a gene of interest including functional annotation, homology, genetic variation, and epigenetic signals✓✓ ✓✓[187]
SoyOmics, an integrated multi-omics publicly available database that aims to gain a more holistic understanding of soybean✓✓ ✓ [191]
Pathway tools (PTools) allow for genome analysis, metabolic modeling, and analysis of high-throughput data from this database✓✓ ✓ [192]

Share and Cite

MDPI and ACS Style

Haidar, S.; Hooker, J.; Lackey, S.; Elian, M.; Puchacz, N.; Szczyglowski, K.; Marsolais, F.; Golshani, A.; Cober, E.R.; Samanfar, B. Harnessing Multi-Omics Strategies and Bioinformatics Innovations for Advancing Soybean Improvement: A Comprehensive Review. Plants 2024, 13, 2714. https://doi.org/10.3390/plants13192714

