1. Introduction
Soil and water salinization are among the global challenges limiting the growth and production of crop plants [
1]. Elevated salinity levels can significantly impact plants and animals, particularly in communities that are located near sea water and in environments with high concentrations of salt. The sustainability of terrestrial animals can be negatively influenced by salinity [
2], unlike the case for aquatic animals such as fish, crabs, and waterfowl that are acclimated to this environment. Additionally, higher salt concentrations could severely impact plant life that grows in these shoreline regions [
3]. Furthermore, the use of irrigation systems in farming for watering practices may affect plants by the quantity of nutrients introduced to them, having a downstream effect on the surrounding environment. Excess nutrients, such as nitrogen and phosphorus, can be carried through runoff and land in nearby waterways, lakes, or groundwater, impacting unintended areas through eutrophication [
4]. Due to the adverse effects of high salt concentrations, areas surrounding high-salinity volumes of water are less desirable in the planning and dedication to planting crops [
5]. High salt concentrations can present challenges in many dietary staple crop species such as rice (
Oryza sativa), which is the most salt-sensitive cereal crop [
6,
7,
8], and is necessary for the sustainability of many populations across the globe; notably in the Southeast Asian region [
9]. Here, increased salinity can account for up to 69% yield loss in some instances. An additional crop that can be impacted by high salt is the lima bean (
Phaseolus lunatus), which is a staple grown in the state of Delaware. Salinity can negatively impact lima beans and cause physiological changes in lima beans exposed to increased levels of salt in water irrigation systems [
10].
Transcriptome studies can help to reveal the genes and regulatory mechanisms associated with abiotic stresses such as salt stress by examining upregulation (increase in gene expression) and downregulation (decrease in gene expression) in response to stresses [
11]. Over the past decades, transcriptome analyses have been a key factor in examining gene expression data for numerous organisms, including various plant species. Thale cress or mouse ear cress, more commonly known as Arabidopsis (
Arabidopsis thaliana), is one of the earliest publicly available sequenced genomes and transcriptomes and still serves as a vital reference for comparative studies among plant species. Many of the publicly available sequence databases such as the National Center for Biotechnology Information (
www.ncbi.nlm.nih.gov), the European Bioinformatics institute (
www.ebi.ac.uk), and the Plant Comparative Genomics portal of the Department of Energy’s Joint Genome Institute (
phytozome-next.jgi.doe.gov) include data assembled from Arabidopsis, as it is one of the easier plants to propagate and has a smaller genome. To this effect, several transcriptomic studies have been conducted using Arabidopsis as a reference including fungal-rust-challenged common bean (
Phaseolus vulgaris) [
12]; switchgrass (
Panicum virgatum) under heat and drought stress [
13]; and our current study in salt marsh grass, also identified as cord grass or smooth cord grass (
Spartina alterniflora), naturally occurring under saltwater conditions.
Salt marsh grass (
S. alterniflora), more recently known as
Sporobolus alterniflorus [
14], is a halophytic, salt-tolerant plant, growing in areas in which most plant species would be negatively impacted by the high salinity concentrations. Salt marsh grass thrives in seawater, which can be as salt concentrated as 3.5% (
w/
w) salinity [
15]. Salt marsh grass can also act as a physical barrier along coast lines, useful for its restorative properties in areas of erosion [
10] and for protection of the shoreline [
16]. Salt marsh grass may also provide a living environment for some aquatic animal species such as turtles, birds, and fish, as well as serving as a key species to study salt tolerance. The salt marsh grass used in the current study grows in Delaware coastal regions where the salinity varies depending upon the location, with northern shorelines typically presenting a lower salt concentration. Areas in the Southern Delaware coastal regions tend to be more highly salt concentrated, which does not appear to have a negative physiological impact on the plant’s ability to flourish and develop. Because of this, salt marsh grass may offer insight into the plant’s ability to withstand high salinity levels, which will be critical as sea levels continue to rise [
17]. In Li et al., 2018, the comparative effects of salinity stress on the marsh grass types
S. alterniflorus and
Phragmites australis compounded with water logging stress revealed negative effects on the photosynthetic process and shoot growth in
P. australis but negligible effects on
S. alterniflorus. Therefore, it can be determined that Sporobolus has a high salt tolerance; moreover, the high salinity can also serve as a major factor in its nutrient uptake and sequestration. In 2010, the Deepwater Horizon explosion and oil spill negatively impacted plants and wildlife in the vicinity of the spill. Interestingly, salt marsh grass that flourished near the region of the oil spill proved resilient in its growth [
18]. This information may also be beneficial in exploring the impact of salinity and mitigating techniques on crops cultivated for human consumption [
19]. Our capability to improve and maintain crop growth in adverse conditions such as global temperature changes and increased drought conditions has the capacity to be impactful to human survival. Although there are additional transcriptome studies identifying the differential expression of salinity genes in Sporobolus species or salt marsh grass [
20,
21], there is still room for the development of additional information on the
S. alterniflorus species. Furthermore, our sample analyses were conducted using naturally occurring marsh grass plants derived from the uncontrolled environment. The samples used in this study give a snapshot of the environmental conditions in the Delaware coastal regions.
In this study, our key objectives are to (1) develop a comprehensive reference transcriptome for salt marsh grass from naturally growing samples collected from high (>25 ppt), medium (~15–20 ppt), and low (<5 ppt) salinity concentrations of water along the Delaware coastal regions, (2) identify differentially expressed genes that are associated with the salinity conditions derived from various salt concentrations in the environment in which the collected plants naturally grew, and (3) analyze the genetic basis of the salt stress response. In addition to identifying genes associated with salt stress tolerance, we identified Gene Ontology (GO) terms and pathways related to the salt stress response. Furthermore, molecular genetic analysis of the samples, used in conjunction with transcriptomic data gained from this study, may help in identifying genes that are associated with salinity tolerance such as nutrient-uptake-, desiccation-tolerance-, and drought-related genes. The goal of the current research is to identify the differential effects of various saltwater concentrations on salt marsh grass gene expression. We intend to identify the genes associated with a gradient salinity concentration by analyzing the transcriptomic sequence data derived from salt marsh grass tissues collected along the Delaware coastline. Information gained from this study, such as differentially expressed genes (DEGs), functional enrichment, and identification of Kyoto Encyclopedia of Genes and Genomes (KEGG) categories, may help in elucidating the beneficial and unfavorable effects of salinity tolerance and susceptibility in other related crops that are used for human consumption such as rice and millet.
3. Discussion
The focus of the reported study was to develop a reference transcriptome and to identify differential expression in salt-related genes in marsh grass samples collected from areas varying in salinity concentrations. We successfully developed a high-quality reference transcriptome utilizing environmental samples collected at Delaware Inland Bays and beaches that are representative of salt concentration gradients in the local region. Of the approximately 211 K contiguous sequences that were identified and annotated, more than 400 are recognized as being associated with salinity, salt, and sodium for one or more of the reference datasets used for comparison. Our BLAST summary statistics detailed in
Table 1 identify non-redundant Viridiplantae, rice, and foxtail millet as being 75% or more aligned to the more than 200 K contigs. This demonstrates marsh grass’s similarity to foxtail millet and rice for this comparative study.
We employed two methods, DESeq2 and EBSeq, to identify differential expressions between sample types. Read mapping uncertainty is frequent among de novo assembled transcripts. To account for possible ambiguity, we used an empirical Bayesian (RSEM-EBSeq pipeline) analysis tool in addition to DESeq and presented the results in
Table 2. and
Figure 1. When the RSEM-EBSeq pipeline was run to find DE contigs for multiple conditions, additional files of “Patterns”, which define all expression patterns over the conditions, and “condition means”, which give normalized mean values for each transcript at each condition, resulted. These condition mean values were used to calculate the fold change (FC) and log2FC between the two conditions and are provided in
Table 3.
Since the study sought to identify differential expression in salt-related genes, the focal point of the qPCR analysis centered on choosing contigs associated with gene expression in genes relating to salinity. In
Figure 2, we created a heatmap dedicated to the sequence data derived from more than 400 contigs associated with salt-relatedness over the LS, MS, and HS samples. We selected three contigs (
Table 4) using comparative expression analysis as the focal point to perform further analysis through qPCR and created a smaller heatmap dedicated to those data (
Figure 3).
Figure 4 presents our qPCR expression in each sample across the three targets. Contrary to the sequence data, we observed higher expression in the HS sample in targets 32115 and 256589. We also noted that there was variation among the MS replicate samples. This was apparent through visual observations of the overall 419-contig heatmap (
Figure 2).
In addition to the targets designed from the contigs, we used BLAST analysis to identify other genes for qPCR analysis. Sequences were chosen from comparisons (HS vs. LS) exhibiting a higher than two log2 fold change from the differential gene expression files derived from transcriptome development. The chosen sequences were BLASTed against rice, foxtail millet, and other related species to identify candidate genes for target primer development. Again, we anticipated higher variability in the expression data because of the variation in the transcriptome-derived data.
Table 5 details the comparative expression analysis for the contig-derived primers. The table details data for sample name, target name, RQ (relative quantification), RQMIN (RQ minimum), RQMAX (RQ maximum), CT (cycle threshold), CTSD (CT standard deviation), DCT mean, and DDCT.
Although we anticipated identifying differential expression among the samples mostly within the genes associated with salt, we found many genes with a differential expression. The comparative data from the three sample gradients including biological replicates indicate an overlap of expression across sample types. The transcriptome analysis revealed the regulation of genes for transcription factors (TFs) and stress-responsive proteins related to salt stress as well as other genes, such as those associated with the KEGG IDs identified in
Table 6. However, there were several TFs that were unassociated with KEGG IDs, for example, the contig “tri_2672_c0_g1_i1” annotated as a basic helix–loop–helix (bHLH) DNA-binding superfamily protein transcription factor was differentially expressed in HS vs. LS, HS vs. MS, and MS vs. LS samples from both DESeq2 and EBSeq analyses. The contig had the best hit in the rice genome as LOC_Os04g41570.1 was downregulated in both HS vs. LS and HS vs. MS samples. Interestingly, the contig tra_67737_c0_g1_i1 with rice hit LOC_Os08g41320.1, also annotated as the bHLH transcription factor, was upregulated in salt marsh grass. Several TFs including bHLH have been identified to play a critical role in abiotic stresses such as drought, salinity, and cold stress [
29].
The contig “tri_13190_c0_g1_i1” with rice hit LOC_Os02g08440.4 encoding WRKY DNA-binding protein 40 was upregulated in both HS vs. LS and HS vs. MS samples. Similarly, the contig “tri_247732_c0_g1_i1” with rice hit LOC_Os01g18584.1, annotated as a WRKY family TF, was also upregulated in MS vs. LS samples. WRKYs are significant family TFs associated with plant development, defense regulation, and stress responses [
30]. The WRKY family of TFs has been identified to respond to salt stress. Interestingly, contrary to our findings, ZmWRKY
114 was downregulated in maize during salt stress [
31]. In addition, ion transport genes such as NRAMP metal ion transporter family protein with the contig tri_244640_c0_g1_i1 and a rice hit LOC_Os07g06130.1 was upregulated more than two-fold under high salinity stress in HS vs. LS samples. This finding supports previous studies [
20,
21] on the expression of salt-sensitive genes. Our study has revealed not only salt-tolerant genes but also key genes and transcription factors involved in abiotic stresses associated with salt. The current study identifies the Gene Ontology (GO) signatures associated with salinity relatedness, characterizing salt and sodium biological processes. As mentioned previously, the two GO terms most represented in the analyzed samples were GO:0055085 and GO:0016021, which are annotated as “sodium/calcium exchanger proteins” in foxtail millet and as “cation exchanger” in rice. Additionally, the KEGG pathway annotation of the DEGs or contigs showed several stress-response pathways, such as phenylpropanoid biosynthesis, amino acid biosynthesis, photosynthesis, glutathione metabolism, and MAPK signaling pathways. These significant pathways have been similarly reported in a previous salt-response study [
1,
21] and in drought and heat stress [
13].
As the focal point for the current study is the impact of sodium on salt marsh grass and its ability to tolerate salt stress, there is still room for detailed analyses and characterization of other nutrients and their effects on related plant species. In our qPCR analyses, we encountered variation among biological replicates from the same collection site and salinity profile. We found variations and juxtapositions between the transcriptomic and qPCR analyses that displayed conflicting gene expression values between samples. In essence, the environmental samples present unregulated influences from factors that are not related to salt stress or tolerance but may have an impact on downstream molecular analyses. Unregulated influences such as human interaction, accessibility, or uncontrolled rain events can have an impact on sample collection and variation. When collecting samples, we took great efforts to collect during periods in which rain events had not occurred for five or more days. Water samples were measured before collecting the marsh grass samples to ensure the salinity concentration was at a desired level. Future steps can include controlled treatments of related plant species under similar conditions to mimic the effects of salt on the physiological aspects and gene expression profiles. We can theorize that plant species related to marsh grass may exhibit similar characteristics under parallel salt gradients. The implications may be very beneficial to the planning and planting of staple crops in traditionally less desirable landscapes for the sake of sequestering more amounts of useful land. This will require additional research including plant growth, treatment, sequence analysis, candidate gene identification, and primer design for molecular analysis.
4. Material and Methods
4.1. Sample Collection
Salt marsh grass leaf tissue samples were collected from three Delaware coastal regions identified as having high (>25 parts per thousand, ppt), medium (15–20 ppt), and low (0–5 ppt) salinity and served as three biological replicates. The three regions included Rehoboth Bay (Rehoboth watershed) latitude: 38.62922° N; longitude: 075.0718° W, Cedar Creek (Mispillion River watershed), latitude: 39.04792° N; longitude: 075.39253° W, and Murderkill River, latitude: 39.01350° N; longitude: 075.46378° W, for high-, medium-, and low-salinity samples, respectively. Three technical replicates were collected for each of the three biological replicates. Three individual plants were collected from three different sites totaling nine samples (three from high salt, three from medium salt, and three from low salt concentration). The Samples were collected for each and transported to the laboratory for debris removal and further processing. The leaves were sorted to remove dry or brown leaf debris and briefly rinsed in type 2 Millipore water. Next, the samples were flash frozen in liquid nitrogen and stored in a −80 freezer for the subsequent steps. The samples were stored by biological replicate and technical replicate for each. Tissues from the technical replicates were ground with liquid nitrogen in a mortar and pestle and split into two batches: one for downstream RNA-Seq transcriptome analysis and one for ChIP-seq epigenome analysis.
4.2. Library Preparation and Sequencing
The total RNA was isolated for each technical replicate from the tissues aliquoted for transcriptome analysis by using the TRIzol reagent (Invitrogen, Waltham, MA, USA). RNA samples were quantified by Qubit (New York, NY, USA) and checked for quality using gel electrophoresis. The quantified RNA samples were normalized to 2 µg to achieve consistent concentrations across the samples. Sequence libraries were constructed for each sample using the TruSeq RNA Library Prep Kit v2 from Illumina (San Diego, CA, USA). The completed libraries were checked for successful library development by assessment on gel electrophoresis. Samples were delivered to the Delaware Biotechnology Institute at the University of Delaware for sequencing. The completed sequencing data were forwarded to the Bioinformatics Core at Purdue University for bioinformatic analysis.
4.3. Transcriptome Assembly
KmerGenie (v 1.6982) [
32] was used to find the best possible kmer for assembly based on the quality-control reads. Kmer 67 was determined to be the best for performing transcriptome assembly.
The quality-trimmed reads were assembled using three tools—Trinity (v 2.10) [
22], Transabyss (v 2.0.1), and SOAPdenovo-Trans (v 1.0.4) [
23]. Assembly using Trinity was performed using the default parameters except for read normalization. Trinity does not have an option to change the kmer value, so the default kmer value of 25 was used. The SOAPdenovo-Trans and Transabyss assemblies were also performed using default parameters and kmer 67.
The contigs with 500 nucleotides or more from all the assembly programs were concatenated using the “merge assembly” script, which uses TransDecoder LongOrf (v 3.0.1) and cd-hit-est (v 4.8.1) [
33]. This script has the following parameters: 100 m, i.e., at least 100 amino acids peptide size after running TransDecoder will be retained, and aS 1.00, i.e., an alignment coverage threshold for shortest sequence will be used for cd-hit clustering.
4.4. Differential Expression Analysis
The quality-trimmed reads were mapped against the transcriptome using bowtie2 (v 2.3.5.1) [
34] with default parameters. RSEM (RNA-Seq by expectation maximization) (v 1.3.3) [
27] within the Trinity plugin was used to estimate transcript abundance at the gene level. RSEM generates a normalized measure of transcript expression that considers the transcript length, the number of mapped reads to the transcript, and the total number of reads that mapped to any transcript. RSEM also reports other normalized values such as “fragments per kilobase transcript length” (FPKMs) and “transcripts per million transcripts” (TPMs). These estimated fragment counts (expected counts) at the gene level (based on RSEM) are used by DESeq2 and EBSeq to find differentially expressed contigs. RSEM-EBSeq (rsem-run-ebseq script) can find DE (differentially expressed) contigs for two conditions (e.g., HS vs. MS) at a time as well as find DE contigs for multiple conditions (HS, MS, and LS) at once. DE contigs at a false discovery rate (FDR) of ≤0.05 can be obtained using the rsem-controlfdr script. When the RSEM-EBSeq pipeline was run to find DE contigs for multiple conditions, additional files of “Patterns”, which define all expression patterns over the conditions, and “condition means”, which give normalized mean values for each transcript at each condition, were compiled. These condition mean values were used to calculate the log (base 2) fold change (log2FC) and fold change (FC) between the two conditions. The result files from the two-condition testing and the multiple-condition testing are provided in the
Supplementary Materials. Differential expression analysis among the samples was conducted using “R” (Version 4.0.2). A basic exploration of the expected counts generated from RSEM, such as accessing data range and library sizes, was performed to ensure data quality. DESeq2 (v 1.28.1) [
26] was used to identify differentially expressed features, in which normalized factors were calculated and gene-wise dispersion was estimated. Empirical Bayes shrinkage was also applied for the dispersion estimation. In DESeq2, differences between the negative binomial distribution of counts for the two experimental conditions were evaluated to identify differential gene expression (DGE) with
p-value and adjusted
p-values of FDR to correct for multiple tests. The differentially expressed contigs detected by the two methods (DESeq2 and EBSeq) were compared in a Venn diagram created using Venny (
http://bioinfogp.cnb.csic.es/tools/venny) (accessed 1 February 2023). The Venn diagrams and common and unique differentially expressed contig IDs between these methods are provided here.
The resulting sequences were selected by salt/sodium/salinity relatedness and analyzed for upregulation vs. downregulation by two-way comparison between the samples. Log2 fold change values were sorted from highest to lowest to determine upregulation vs. downregulation. For example, in the Excel datasheet “MVLDESEQ”, medium-salinity samples vs. low-salinity samples are compared and sorted by log2 fold change to determine gene expression values. This was carried out for all the comparative datasets including “HVLDESEQ”, which is high vs. low, and “HVMDESEQ”, which is high vs. medium.
4.5. cDNA Preparation
Complementary DNA (cDNA) samples were produced for qPCR analysis using ProtoScript® II First Strand cDNA Synthesis Kit (New England Biolabs, Ipswich, MA, USA) following the manufacture’s protocol. Each sample began with 1 μg of total RNA and was quantified with NanoDrop after cDNA conversion. In addition to reverse-transcribed RNA, a no-template water control and a no-enzyme control were both included.
4.6. qPCR Analysis
The real-time quantitative polymerase chain reaction (qPCR) was performed using Applied Biosystems™ 2X SYBR™ Green PCR Master Mix (Thermo Fisher Scientific, Waltham, MA, USA). We followed a protocol published in
Nature Protocols for obtaining relative quantification using the DDCT method [
35]. An Applied Biosystems 7500 Real-Time PCR system was used to perform qPCR analysis with primer sets and samples. For each qPCR analysis, samples were performed in triplicate for each LS, MS, and HS in addition to a water control. In addition to the target gene primers, actin gene primers were used as an endogenous control. The run method for qPCR included two initial hold cycles of 50° for two minutes followed by 95° for ten minutes, then 40 cycles of 95° for 15 s followed 60° for one minute.