1. Introduction
The advent of high-throughput genomic platforms has provided various insights into gene regulation. These insights suggest that expansion of the regulatory potential of the noncoding parts of the genome drives the evolution of developmental processes that underlie organismal complexity [1,2]. Only a small fraction of the genome (~1–2%) codes for proteins, while many regulatory elements are transcribed into non-coding RNAs (ncRNAs). Among these, long non-coding RNAs (lncRNAs) are the most prevalent. They are involved in various biological processes and may function as RNA molecules rather than being translated into proteins [3].
Noncoding RNAs, which include microRNAs (miRNAs) and lncRNAs, are functional transcripts involved in gene regulation that are not translated into proteins. LncRNAs are generally longer than 200 nucleotides and are involved in DNA methylation, chromatin remodeling and enhancing the expression of miRNA-targeted mRNAs [4]. ncRNAs can therefore be divided into two categories: lncRNAs and small ncRNAs (shorter than 200 nucleotides). An alternative definition describes lncRNAs as RNA transcripts that function as either primary or spliced transcripts, independent of any length threshold [5]. This definition accounts for the presence in lncRNAdb of some lncRNAs, such as BC1 and snaR, that are shorter than 200 nucleotides.
Numerous studies show that lncRNAs have important regulatory and functional roles [6,7], including splicing, transcription, translation, cell cycle, protein localization, flowering, pollen development, sex differentiation, cellular structural integrity, heat shock response, cancer progression and stress responses, and they exhibit cell-specific expression [8]. They are also involved in chromatin regulation [9] and in floral organ and root development [10]. In addition, lncRNAs are poorly conserved across species compared with mRNAs, small nucleolar RNAs (snoRNAs) and miRNAs. Like mRNAs, lncRNAs are transcribed by RNA polymerase II, 5′ capped, 3′ polyadenylated and often multi-exonic [11].
The literature reports lncRNAs in plants such as Arabidopsis [12], rice [13], wheat [14], maize [15], rapeseed [16] and cassava [17], suggesting that they play roles in many biological processes underlying plant development and responses to various stresses. Several studies have provided evidence for roles of lncRNAs in abiotic stress, including drought stress, but such studies remain few [17]. LncRNAs have been reported to control drought stress tolerance in both monocots and dicots, and drought-responsive lncRNAs have been described in monocots such as rice [18], maize [19], foxtail millet [20] and switchgrass [21]. These lncRNAs mediate drought responses through mechanisms such as endogenous target mimicry (eTM), antisense transcription-mediated modulation and chromatin modulation, and they may also directly regulate the transcription of drought-responsive genes [22,23,24].
Owing to the growing importance of lncRNA discovery, many lncRNA repositories and databases have been launched during the last 6–7 years. PLncDB V2.0, for example, contains 1,246,372 lncRNAs from 80 plant species based on 13,834 RNA-Seq datasets [25], and almost 200,000 putative lncRNAs from 50 species have been cataloged in the Green Non-Coding Database (Gallart et al., 2016) [26]. Other lncRNA databases include EVLncRNAs [27], lncRNAdb [28], PLNlncRbase [29], CANTATAdb [30] and RNAcentral [31].
Pearl millet (Pennisetum glaucum L.) is the world’s sixth most important cereal crop and is grown mostly by poor and marginal farmers in the arid and semi-arid tropics of Asia and Africa, owing to its less intensive agronomic requirements, such as low fertilizer and limited irrigation inputs [32]. It is cultivated mainly for its grain but has multifarious other uses: as fodder for animals, as feed for cattle, poultry and fish, as an ingredient in bread and cookies (ICRISAT, 2003), as a probiotic fermented food [33], in vegetable salads (http://sprigandvine.in/modern-millet-salad/) (accessed on 4 August 2021), in the nutraceutical industry [34] and as a biofuel crop. The crop performs well under drought conditions in which most cereals, such as wheat, rice and maize, fail, so understanding the molecular mechanisms of its responses to adverse conditions is important. Key candidate genes controlling the drought response in Indian pearl millet have been discovered and reported [35], but its lncRNAs remain unexplored. The expression of such candidate genes is controlled by miRNAs, transcription factors (TFs) and lncRNAs. This study aims to identify drought-responsive lncRNAs and to develop a web genomic resource providing user-friendly access to the findings, which is otherwise lacking for this crop. The discovered lncRNAs could be used to improve important traits, as in rice [36], and such studies could further be combined with CRISPR-Cas technology to edit ncRNAs for plant trait improvement [37]. All of this information has been cataloged in the pearl millet drought-responsive long non-coding RNA database (PMDlncRDB), freely accessible at http://webtom.cabgrid.res.in/pmdlncrdb (accessed on 10 January 2021). This information can be used in pearl millet improvement programs aimed at higher production and combating drought.
3. Results
3.1. Data Pre-Processing
The single-end RNA-Seq data used in this study comprised a total of 12.14 million raw reads. After trimming and adapter removal, the clean reads were mapped to the pearl millet reference genome using HISAT2, giving overall alignment rates of 73.46%, 73.53%, 69.81% and 72.95% for the leaf control (LC), leaf treated (LT), root control (RC) and root treated (RT) samples, respectively, and recovering 3.35 million mapped reads (Table 1). StringTie assembled a total of 39,978 transcripts. Table 2 summarizes the transcriptome assembly statistics.
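The per-sample alignment rates above are the figure HISAT2 prints as the last line of its alignment summary. A minimal sketch, assuming each sample's summary has been captured as text (for example by redirecting standard error to a file), that extracts that rate:

```python
import re

# HISAT2 ends its alignment summary (written to stderr) with a line such as
# "73.46% overall alignment rate"; this pattern captures the percentage.
OVERALL_RATE = re.compile(r"([0-9]+(?:\.[0-9]+)?)% overall alignment rate")


def overall_alignment_rate(summary_text: str) -> float:
    """Return the overall alignment rate (percent) found in a HISAT2 summary."""
    match = OVERALL_RATE.search(summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line in summary")
    return float(match.group(1))


if __name__ == "__main__":
    # Only the final summary line is shown for each sample; the values are
    # the per-sample rates reported in the text (Table 1).
    summaries = {
        "LC": "73.46% overall alignment rate",
        "LT": "73.53% overall alignment rate",
        "RC": "69.81% overall alignment rate",
        "RT": "72.95% overall alignment rate",
    }
    for sample, text in summaries.items():
        print(sample, overall_alignment_rate(text))
```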
3.2. Candidate lncRNA Prediction
All transcripts were compared with the reference annotation using Gffcompare, and filtering by class code retained 4834 transcripts: 288 intronic, 4458 intergenic and 88 with exonic overlap on the opposite strand. After sequences shorter than 200 nucleotides were discarded using Perl scripts and ORFs were predicted with OrfPredictor, 3170 sequences with ORFs shorter than 100 amino acids and 9 sequences with no ORF were retained for further filtering. The CPC2 online tool, with a coding-probability threshold of 0.5, classified these 3179 sequences into 36 coding and 3143 non-coding sequences. BLAST searches against housekeeping RNAs (rRNA, tRNA, etc.), Pfam and NCBI-nr retained 879 sequences that did not match any database entry. Under the leaf control condition, 57 lncRNAs were expressed (6 intronic and 51 intergenic); under leaf treated, 285 (33 intronic, 246 intergenic and 6 with exonic overlap on the opposite strand); under root control, 494 (67 intronic, 408 intergenic and 19 with exonic overlap on the opposite strand); and under root treated, 210 (30 intronic, 179 intergenic and 1 with exonic overlap on the opposite strand).
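The filtering cascade above can be summarized as a single predicate applied to each assembled transcript. A minimal sketch, assuming the outputs of Gffcompare, OrfPredictor, CPC2 and the BLAST searches have already been merged into one record per transcript (the `Transcript` record and its field names are hypothetical); the thresholds follow the text, and treating a CPC2 probability of exactly 0.5 as coding is an assumption:

```python
from dataclasses import dataclass
from typing import Optional

# Gffcompare class codes kept as putative lncRNA loci:
# "u" = intergenic, "i" = fully contained within a reference intron,
# "x" = exonic overlap with a reference transcript on the opposite strand.
KEPT_CLASS_CODES = {"u", "i", "x"}


@dataclass
class Transcript:
    transcript_id: str
    class_code: str                # gffcompare class code
    length_nt: int                 # transcript length in nucleotides
    longest_orf_aa: Optional[int]  # None when no ORF was predicted
    coding_probability: float      # CPC2 coding probability
    has_db_hit: bool               # BLAST hit to housekeeping RNAs, Pfam or NCBI-nr


def is_candidate_lncrna(t: Transcript) -> bool:
    """Apply the Section 3.2 filters in order; a transcript must pass all of them."""
    if t.class_code not in KEPT_CLASS_CODES:
        return False
    if t.length_nt < 200:  # discard transcripts shorter than 200 nt
        return False
    if t.longest_orf_aa is not None and t.longest_orf_aa >= 100:
        return False       # keep only ORFs < 100 aa, or no ORF at all
    if t.coding_probability >= 0.5:  # CPC2 threshold (boundary handling assumed)
        return False
    return not t.has_db_hit  # drop anything matching a known sequence
```

For example, a 450-nt intergenic transcript with a 62-aa ORF, a CPC2 probability of 0.08 and no database hit passes, whereas a transcript with class code `=` (matching a reference transcript) is rejected at the first step.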
Of these, 195 (22.2%), 393 (44.7%), 34 (3.9%) and 126 (14.3%) lncRNAs were unique to the leaf treated, root control, leaf control and root treated conditions, respectively, while 3 lncRNAs (0.3%) were common to all four conditions, i.e., expressed under every condition.
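The "unique" and "common" counts above are set differences and intersections over the lncRNAs detected in each condition, and each percentage equals the count divided by the 879 candidate lncRNAs (e.g. 195/879 ≈ 22.2%). A minimal sketch with toy identifiers:

```python
def unique_per_condition(expressed: dict[str, set[str]]) -> dict[str, set[str]]:
    """lncRNAs detected in one condition and in none of the others."""
    unique = {}
    for name, ids in expressed.items():
        others = set().union(*(s for other, s in expressed.items() if other != name))
        unique[name] = ids - others
    return unique


def shared_by_all(expressed: dict[str, set[str]]) -> set[str]:
    """lncRNAs detected in every condition."""
    return set.intersection(*expressed.values())


def percent_of_total(count: int, total: int) -> float:
    """Share of all candidate lncRNAs, in percent."""
    return 100.0 * count / total


if __name__ == "__main__":
    toy = {"LC": {"a", "z"}, "LT": {"b", "c", "z"}, "RC": {"c", "d", "z"}, "RT": {"e", "z"}}
    print(unique_per_condition(toy))
    print(shared_by_all(toy))
    print(round(percent_of_total(195, 879), 1))
```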
Low exon number and short transcript length are typical features of lncRNAs [55]. Under all four conditions, most lncRNAs fell in the 200–400 bp length range, followed by 400–600 bp. Most identified lncRNAs had 1 to 4 exons, consistent with Li et al., 2014 [15]; Shuai et al., 2014 [56]; and Pauli et al., 2012 [57].
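Length and exon-number distributions like these are simple tallies over the candidate set. A minimal sketch, assuming half-open 200-nt bins (the exact bin edges used for the length plots are not stated) and hypothetical input lists:

```python
from collections import Counter


def length_bin(length_nt: int, width: int = 200) -> str:
    """Label of the half-open bin [lower, lower + width) containing length_nt."""
    lower = (length_nt // width) * width
    return f"{lower}-{lower + width}"


def length_distribution(lengths: list[int], width: int = 200) -> Counter:
    """Number of transcripts per length bin."""
    return Counter(length_bin(n, width) for n in lengths)


def exon_distribution(exon_counts: list[int]) -> Counter:
    """Number of transcripts per exon count."""
    return Counter(exon_counts)
```

With these bins, a 400-nt transcript falls in "400-600" rather than "200-400"; a different boundary convention would shift only transcripts whose length is an exact multiple of the bin width.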
3.3. Identification of Differentially Expressed lncRNAs, Their Characterization and Annotation
Differential expression of the common lncRNAs was assessed using NOISeq [50], which uses data-adaptive, non-parametric methods to identify differentially expressed transcripts. NOISeq controls false discoveries by combining fold-change values with absolute expression differences to build a null (noise) distribution, and then comparing the observed data against it. It is well suited to data with low expression values and few replicates.
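NOISeq's central idea can be illustrated roughly as follows. For every feature, compute the log2 ratio M and the absolute difference D between conditions. Compute the same statistics between replicates of one condition to form the noise distribution. A feature's probability of differential expression is then the fraction of noise points it exceeds in both M and D. This is a deliberately simplified Python illustration, not the NOISeq R/Bioconductor implementation, which adds normalization and other refinements. The pseudocount and example values are assumptions:

```python
import math


def m_and_d(x1: float, x2: float, pseudocount: float = 0.5) -> tuple[float, float]:
    """Log2 ratio (M) and absolute difference (D) of two expression values."""
    a, b = x1 + pseudocount, x2 + pseudocount
    return math.log2(a / b), abs(a - b)


def noise_distribution(rep1, rep2, pseudocount=0.5):
    """(M, D) pairs between two replicates of the SAME condition, used as noise."""
    return [m_and_d(a, b, pseudocount) for a, b in zip(rep1, rep2)]


def prob_de(x1, x2, noise, pseudocount=0.5):
    """Fraction of noise points less extreme than the observed (M, D) in both coordinates."""
    m, d = m_and_d(x1, x2, pseudocount)
    below = sum(1 for m_n, d_n in noise if abs(m_n) < abs(m) and d_n < d)
    return below / len(noise)


if __name__ == "__main__":
    noise = noise_distribution([10, 20, 30, 40, 50], [11, 19, 33, 38, 52])
    print(prob_de(100, 5, noise))  # large M and D: exceeds all noise points
    print(prob_de(12, 10, noise))  # small change: exceeds only some noise points
```

A feature would then be called differentially expressed when this probability exceeds a chosen threshold. Because both M and D must beat the noise, a large fold change between two tiny counts is not enough on its own.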
A total of 209, 198, 115 and 194 differentially expressed lncRNAs were found for LC:RC, LT:RT, LC:LT and RC:RT, respectively. Of these, 67, 123, 46 and 116 were upregulated and 142, 75, 69 and 78 were downregulated. An adjusted p-value cutoff of 0.05 was used to select differentially expressed lncRNAs. The up- and downregulated lncRNAs for all four comparisons are shown in volcano plots (Figure 2), in which the dots mark the up- and downregulated lncRNAs of each comparison.
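The volcano plots place each lncRNA at (log2 fold change, -log10 adjusted p-value) and color it by direction. A minimal sketch of that labeling, assuming the adjusted p-value cutoff of 0.05 from the text, no additional fold-change cutoff (none is stated) and a sign convention in which a positive log2 fold change means upregulated; the record layout is hypothetical:

```python
import math
from collections import Counter


def classify(log2fc: float, padj: float, padj_cutoff: float = 0.05, lfc_cutoff: float = 0.0) -> str:
    """Return 'up', 'down' or 'ns' (not significant) for one lncRNA."""
    if padj >= padj_cutoff or abs(log2fc) <= lfc_cutoff:
        return "ns"
    return "up" if log2fc > 0 else "down"


def volcano_points(records, padj_cutoff: float = 0.05):
    """Map (id, log2fc, padj) records to (id, x, y, label) volcano-plot points."""
    points = []
    for rid, lfc, padj in records:
        y = -math.log10(max(padj, 1e-300))  # floor avoids log10(0)
        points.append((rid, lfc, y, classify(lfc, padj, padj_cutoff)))
    return points


if __name__ == "__main__":
    records = [("a", 2.0, 0.01), ("b", -1.5, 0.001), ("c", 3.0, 0.2)]
    print(Counter(label for *_, label in volcano_points(records)))
```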
Figure 3A shows the distribution of the identified lncRNAs along the pearl millet chromosomes: red dots represent intergenic lncRNAs, green dots intronic lncRNAs and blue dots lncRNAs with exonic overlap on the opposite strand. Figure 3B gives a graphical overview of lncRNA distribution over the chromosomes, which can be used in designing different types of assay chemistry, multiplexing, etc. The identified lncRNAs could therefore serve as markers in marker-assisted breeding.
The analysis showed 24 (5.6%), 76 (17.6%), 53 (12.3%) and 59 (13.7%) lncRNAs uniquely differentially expressed in LC:LT, RC:RT, LC:RC and LT:RT, respectively, while 2 (0.5%) were common to all comparisons (Figure 4). The top 50 differentially expressed lncRNAs in the LC:LT and RC:RT comparisons are shown as heatmaps (Figure 5).
To test whether these lncRNAs could regulate the expression of protein-coding genes as long molecules, they were searched for homology to CDS sequences by BLAST with >87% identity and >83% query coverage. One lncRNA closely matched eight CDS sequences, suggesting that it might regulate the expression of eight proteins by inducing transcriptional or post-transcriptional gene silencing. Details of these lncRNA–CDS matches are presented in Table 3.
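The homology filter above reduces to thresholding tabular BLAST hits and grouping the surviving CDS by lncRNA. A minimal sketch, assuming BLAST+ tabular output produced with `-outfmt "6 qseqid sseqid pident qcovs evalue"`, where `qcovs` is query coverage per subject (the exact BLAST settings used are not stated). Thresholds are strict inequalities as written in the text, and the IDs are hypothetical:

```python
from collections import defaultdict


def parse_hits(lines):
    """Yield (query, subject, pident, qcovs, evalue) from tabular BLAST output.

    Assumed column order: qseqid sseqid pident qcovs evalue.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        qseqid, sseqid, pident, qcovs, evalue = line.rstrip("\n").split("\t")[:5]
        yield qseqid, sseqid, float(pident), float(qcovs), float(evalue)


def cds_partners(lines, min_pident=87.0, min_qcov=83.0):
    """Map each lncRNA to the set of CDS it matches above both thresholds."""
    partners = defaultdict(set)
    for query, subject, pident, qcov, _evalue in parse_hits(lines):
        if pident > min_pident and qcov > min_qcov:
            partners[query].add(subject)
    return dict(partners)
```

An lncRNA whose set contains several CDS, like the one matching eight CDS in Table 3, is a candidate regulator of all of them. The same parser with an e-value filter could serve for database similarity searches such as the one against CANTATAdb.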
A similarity search of the predicted lncRNAs against the known plant lncRNA database CANTATAdb (http://cantata.amu.edu.pl/index.html) (accessed on 1 June 2021), with an e-value below 1e-10, showed that only 8 of the 879 identified lncRNAs (~0.9%) matched database entries, indicating that lncRNAs are poorly conserved among species. Oryza nivara lncRNAs were the most closely related to pearl millet lncRNAs (Table 4).
3.4. Identification of miRNAs Targeting the lncRNAs
BLASTn searches of the lncRNA transcripts against the miRBase database were performed to find miRNAs that could target the lncRNAs. Three miRNAs targeting the lncRNAs were found, two of which were unique. Among the differentially expressed lncRNAs, only two (TCONS_00024337 and TCONS_00026046) were targeted, by two miRNAs (dps-mir-2526 and oni-mir-10840). The total number of miRNA target sites was eight (Table 5).
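In this study, miRNA–lncRNA pairs were found with BLASTn against miRBase. To illustrate what a candidate target site means, here is a much simpler, hypothetical check. It slides the reverse complement of a miRNA along a transcript and counts mismatches. This is not the method used here, and it ignores G:U wobble pairs, gaps and the position-dependent scoring used by dedicated plant target-prediction tools. The 3-mismatch limit and the sequences are illustrative assumptions:

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def best_site(mirna: str, target: str, max_mismatches: int = 3):
    """Ungapped scan for the best complementary site of `mirna` in `target`.

    Returns (start position, mismatches) of the best window, or None when
    no window has at most `max_mismatches` mismatches.
    """
    probe = reverse_complement(mirna.upper().replace("T", "U"))
    target = target.upper().replace("T", "U")
    best = None
    for i in range(len(target) - len(probe) + 1):
        window = target[i:i + len(probe)]
        mismatches = sum(1 for a, b in zip(probe, window) if a != b)
        if best is None or mismatches < best[1]:
            best = (i, mismatches)
    if best is None or best[1] > max_mismatches:
        return None
    return best


if __name__ == "__main__":
    mir = "UGACAGAAGAGAGUGAGCAC"  # arbitrary 20-nt example sequence
    print(best_site(mir, "AAAA" + reverse_complement(mir) + "GGGG"))
```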
Target mRNAs of the predicted miRNAs were also identified. Two of the three predicted miRNAs targeted pearl millet mRNAs, giving 14 unique target mRNAs and 16 miRNA–mRNA interactions, all of which were predicted to act through cleavage (Table 6).
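The lncRNA–miRNA and miRNA–mRNA predictions can be joined on the shared miRNA into a three-layer network. The same join underlies the target-mimicry argument for how lncRNAs may release mRNAs from miRNA repression. A minimal sketch with hypothetical identifiers; interactions are counted per predicted pair or site, so the number of interactions can exceed the number of unique mRNAs (as with the 16 interactions and 14 mRNAs reported here), because one mRNA may be targeted by several miRNAs or at several sites:

```python
from collections import defaultdict


def build_network(lnc_mirna_edges, mirna_mrna_edges):
    """Join lncRNA->miRNA and miRNA->mRNA edges on the shared miRNA.

    Returns {lncRNA: {miRNA: set of target mRNAs}}.
    """
    targets = defaultdict(set)
    for mirna, mrna in mirna_mrna_edges:
        targets[mirna].add(mrna)
    network = defaultdict(dict)
    for lnc, mirna in lnc_mirna_edges:
        network[lnc][mirna] = set(targets.get(mirna, set()))
    return dict(network)


def competing_targets(network, lnc):
    """mRNAs that could be de-repressed if `lnc` sequesters its miRNAs."""
    return set().union(*network.get(lnc, {}).values())


def interaction_summary(mirna_mrna_edges):
    """(number of interactions, number of unique target mRNAs)."""
    return len(mirna_mrna_edges), len({mrna for _, mrna in mirna_mrna_edges})
```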
To understand the functional roles of the mRNAs in the network, GO analysis was performed using BLAST2GO. GO categorization generated 10 annotations for the 14 predicted miRNA-targeted mRNAs. At the first classification level, three, four and three mRNAs were assigned to biological process, molecular function and cellular component, respectively. Within biological process, three mRNAs each fell into “metabolic process (GO:0008152)” and “cellular process (GO:0009987)”. Within molecular function, the two main classes were “binding (GO:0005488)” and “catalytic activity (GO:0003824)”, with three and two predicted mRNAs, respectively. Within cellular component, “cell (GO:0005623)” and “cell part (GO:0044464)” each contained three predicted mRNAs, the largest share, followed by “organelle (GO:0043226)” with two. The GO analysis thus showed that the mRNAs in the lncRNA–miRNA–mRNA network under drought stress are associated with diverse cellular components, biological processes and molecular functions.
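Counts such as "three mRNAs in metabolic process" are tallies of distinct mRNAs per GO term. A minimal sketch, assuming the BLAST2GO annotations have been exported as a mapping from mRNA ID to GO IDs (the export format and the mRNA IDs are hypothetical; the GO IDs and names are those given above):

```python
from collections import Counter

# GO IDs and names as given in the text.
GO_NAMES = {
    "GO:0008152": "metabolic process",
    "GO:0009987": "cellular process",
    "GO:0005488": "binding",
    "GO:0003824": "catalytic activity",
}


def tally_go(annotations: dict[str, list[str]]) -> Counter:
    """Count how many mRNAs carry each GO term (each mRNA counted once per term)."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(set(terms))  # set() drops duplicate terms within one mRNA
    return counts


if __name__ == "__main__":
    annotations = {
        "mRNA_1": ["GO:0008152", "GO:0009987", "GO:0005488"],
        "mRNA_2": ["GO:0008152", "GO:0003824"],
        "mRNA_3": ["GO:0009987", "GO:0005488", "GO:0005488"],
    }
    for go_id, n in tally_go(annotations).items():
        print(go_id, GO_NAMES.get(go_id, "?"), n)
```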
The potential functions of the mRNAs in these GO categories were also determined. GO-based functions of the mRNAs in drought-stressed pearl millet included orcinol O-methyltransferase activity, lignin biosynthetic process, melatonin biosynthetic process, flavone and flavonoid biosynthetic process, the phenylpropanoid biosynthesis pathway, integral component of membrane and structural constituent of ribosome. These functions have been implicated in the regulation of drought stress response mechanisms in soybean. Thus, when the miRNAs that silence these mRNAs bind to the lncRNAs, they are no longer available to regulate the mRNAs. The genes are then free to carry out their functions and help the plants tolerate drought stress.
3.5. Development of Pearl Millet lncRNAs Database (PMDlncRDB)
PMDlncRDB (URL: http://webtom.cabgrid.res.in/pmdlncrdb/ (accessed on 10 January 2021)) is a three-tier architecture database containing information about the drought-responsive lncRNAs of pearl millet. It holds information on 879 lncRNAs from four samples (leaf control, leaf treated, root control and root treated), the miRNAs targeting them, the target mRNAs of those miRNAs, etc. Users can browse the lncRNAs by criteria such as length, number of exons, FPKM values and position relative to coding genes. The lncRNAs range in length from 200 to 1013 nucleotides. Figure 6 shows the PMDlncRDB web interface with its various search options.
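In a three-tier design, browsing by length, exon number, FPKM and position relative to coding genes maps naturally onto a parameterized query in the data tier. A minimal sketch using Python's built-in sqlite3 module with a hypothetical table layout (the actual PMDlncRDB schema and database engine are not described here); the default length limits follow the 200–1013 nt range stated above:

```python
import sqlite3

# Hypothetical table layout, for illustration only.
SCHEMA = """
CREATE TABLE lncrna (
    lnc_id    TEXT NOT NULL,
    sample    TEXT NOT NULL,     -- LC, LT, RC or RT
    location  TEXT NOT NULL,     -- intergenic, intronic or antisense_exonic
    length_nt INTEGER NOT NULL,
    n_exons   INTEGER NOT NULL,
    fpkm      REAL NOT NULL,
    PRIMARY KEY (lnc_id, sample)
)
"""


def browse(conn, min_len=200, max_len=1013, max_exons=None, min_fpkm=0.0, location=None):
    """Return (lnc_id, sample) rows matching the browse criteria."""
    sql = "SELECT lnc_id, sample FROM lncrna WHERE length_nt BETWEEN ? AND ? AND fpkm >= ?"
    params = [min_len, max_len, min_fpkm]
    if max_exons is not None:
        sql += " AND n_exons <= ?"
        params.append(max_exons)
    if location is not None:
        sql += " AND location = ?"
        params.append(location)
    sql += " ORDER BY lnc_id, sample"
    return conn.execute(sql, params).fetchall()
```

Binding user input through `?` placeholders, rather than formatting it into the SQL string, keeps a web front end from passing raw query text to the database.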
4. Discussion
4.1. Data Pre-Processing
Pre-processing next-generation sequencing data before the downstream steps of the RNA-seq pipeline is essential, as raw data may contain low-quality reads, noise, adapter sequences, etc., which degrade assembly quality. RNA-seq data for the four conditions of pearl millet were downloaded from the NCBI SRA database and assessed with FastQC [38], which produces graphs and summary statistics for evaluating sequence reads. Trimmomatic [39], run with single-end parameters, was used to filter out low-quality reads, and the resulting clean, trimmed reads were used in the downstream analysis.
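Trimmomatic's quality filtering is typically driven by steps such as `SLIDINGWINDOW:<windowSize>:<requiredQuality>` and `MINLEN:<length>`. Below is a simplified Python sketch of the sliding-window idea, assuming Phred+33 quality encoding. It cuts at the start of the first low-quality window, which only approximates Trimmomatic's own implementation. The window size, quality threshold and minimum length are illustrative values, not the study's settings:

```python
def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 4, min_avg: float = 20):
    """Scan from the 5' end; cut at the first window whose mean quality < min_avg."""
    scores = phred33(qual)
    for start in range(0, len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return seq[:start], qual[:start]
    return seq, qual


def trim_read(seq: str, qual: str, window: int = 4, min_avg: float = 20, min_len: int = 36):
    """Sliding-window trim, then drop reads shorter than min_len (returns None)."""
    seq, qual = sliding_window_trim(seq, qual, window, min_avg)
    return (seq, qual) if len(seq) >= min_len else None


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIII####"))
```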
4.2. Candidate lncRNA Prediction
Although the role of lncRNAs in drought stress has been reported in plants such as wheat [46] and cassava [17], their role in pearl millet drought tolerance remains unexplored. Proteome [58] and transcriptome-signature [35] studies of pearl millet under drought stress have been conducted. However, a comprehensive study of the lncRNAs contributing to pearl millet drought tolerance is still needed, given the roles of lncRNAs in stress conditions already identified in Brassica napus [59], rice [13], mulberry [10], tomato [60], Populus [61], wheat [62], etc. In our study, programs from the “new tuxedo” package [42] were used for read mapping, transcript assembly and abundance estimation: HISAT2 for read alignment and StringTie for transcript assembly and abundance estimation. We identified only 879 drought-responsive lncRNAs. This relatively small number may reflect the very low expression levels of lncRNAs, as in tomato [63]. Most identified lncRNAs have only one exon, which may be due to their small size; a few have 3–4 exons. The identified lncRNAs were divided into three groups based on their position relative to protein-coding genes in the pearl millet reference genome: intergenic lncRNAs, located between coding genes; intronic lncRNAs, located entirely within introns; and lncRNAs with exonic overlap on the opposite strand. Most lncRNAs were intergenic. Our study identified 57, 285, 494 and 210 lncRNAs in the leaf control, leaf treated, root control and root treated conditions, respectively. The highest number, in root control, potentially reflects the major role of root-related mechanisms in plant growth and development. A higher number of lncRNAs might be expected in roots, especially under stress. In our study, however, fewer lncRNAs were identified in treated roots, possibly because certain growth- and development-related pathways are slowed down or switched off. Under water stress, the first response of plants is stomatal closure, which in turn limits CO₂ uptake, photosynthesis and water transpiration [58]. The higher number of lncRNAs in the leaf treated condition in our study may reflect a self-defense response against drought.
4.3. Identification of Differentially Expressed lncRNAs and Their Annotation
To determine whether lncRNAs are expressed differently in roots and leaves under control and treated (drought) conditions, four comparisons were made: leaf versus root within the control and within the treated condition, and control versus treated within leaves and within roots. We found 209 (67 upregulated, 142 downregulated), 198 (123 upregulated, 75 downregulated), 115 (46 upregulated, 69 downregulated) and 194 (116 upregulated, 78 downregulated) differentially expressed lncRNAs in LC:RC (leaf control vs. root control), LT:RT (leaf treated vs. root treated), LC:LT (leaf control vs. leaf treated) and RC:RT (root control vs. root treated), respectively. This differential expression points to regulatory roles of lncRNAs in the drought response of pearl millet.
A similarity search against the known plant lncRNA database CANTATAdb found only 9 instances of our identified lncRNAs matching previously reported lncRNAs. This agrees with Ma et al. (2013) [64] and Sahu et al. (2018) [65], who reported that lncRNAs are poorly conserved compared with other RNA transcripts such as mRNAs, miRNAs and snoRNAs. Ma et al. (2013) [64] attribute this to lncRNAs functioning in a species-specific manner. Most matches were with Oryza nivara, also a member of the Poaceae family, which suggests that lncRNA conservation is largely restricted to species within the same family. Our analysis shows that most pearl millet lncRNAs are specific to pearl millet itself. LncRNAs with well-defined functions have been reported, for example, LDMAR, cis-NAT, PHO1;2, TL and LAIR in rice [66,67,68,69], Enod4 in rice, maize, legumes and soybean [70,71,72], HvCesA6 in barley [73] and WSGAR in wheat [74]. LncRNAs have also been shown to regulate the expression of protein-coding genes as long molecules [75]. When our lncRNAs were searched by BLAST against pearl millet CDS sequences, only one lncRNA matched, corresponding to eight protein-coding genes. This suggests that the lncRNA may regulate those proteins through transcriptional or post-transcriptional silencing [76]. Studies also show that lncRNAs are important modulators of drought tolerance in crops such as rice, wheat, maize, sorghum, tomato, coffee, cassava and peanut [77]. The lncRNAs identified in this study can be of considerable use in future studies of their association with gene expression.
4.4. Identification of miRNAs Targeting the lncRNAs
To evaluate whether pearl millet lncRNAs could affect post-transcriptional regulation of functional genes by binding miRNAs, bioinformatics methods were used to identify lncRNA-derived target mimics among the differentially expressed lncRNAs. We found three miRNAs targeting the lncRNAs [78], with a total of eight lncRNA–miRNA interactions.
4.5. lncRNA-miRNA-mRNA Interaction
A total of 14 mRNAs were found to be potential targets of the identified miRNAs. BLAST2GO annotation of these mRNAs in drought-stressed pearl millet showed involvement in plant hormone signal transduction, response to stress, defense response, transcription factor activity, orcinol O-methyltransferase activity, lignin biosynthetic process, melatonin biosynthetic process, flavanol biosynthetic process, integral component of membrane, structural constituent of ribosome, etc. [78]. The flavone and flavonoid biosynthesis pathway (Pathway ID: ko00944; EC:2.1.1.42) found in our study is reported to be of key significance in plants under drought stress [79], specifically in Ligustrum vulgare [80] and peanut [81]. Similarly, the phenylpropanoid biosynthesis pathway (Pathway ID: ko00940; EC:2.1.1.68) identified in our study is consistent with previous drought stress studies in crops such as apple [82], foxtail millet [83] and tomato [84].
4.6. Development of Pearl Millet lncRNAs Database (PMDlncRDB)
A drought-associated web genomic resource for pearl millet, PMDTDb, already exists, but it contains no information on lncRNAs (Jaiswal et al., 2018). The web genomic resource developed here, PMDlncRDB, catalogs the identified drought-responsive lncRNAs and is the first such resource for pearl millet lncRNAs. It contains information such as lncRNA sequence, ID, length, peptide length, number of exons and the miRNAs targeting each lncRNA. It will give researchers a foundation for understanding the role of lncRNAs in pearl millet and for further studies on improving crop performance under drought stress through selective breeding approaches. The resource can also support future studies of the association of these lncRNAs with gene expression.