3.1. Overview of Transcriptome Sequencing and Assembly
In this study, to explore the physiological changes occurring during
R. pedestris development, seven transcriptome libraries were constructed and sequenced using the Illumina HiSeq6000 platform. As shown in
Table 1, the transcriptome libraries of eggs (E), 1st-instar nymphs (N1), 2nd-instar nymphs (N2), 3rd-instar nymphs (N3), 4th-instar nymphs (N4), 5th-instar nymphs (N5) and adults (A) generated 44,548,968, 45,220,988, 44,478,730, 57,621,498, 40,174,406, 47,482,056 and 50,791,284 raw reads, respectively. After quality control, data filtering, and removal of low-quality or redundant reads, these libraries yielded 43,520,034, 44,664,702, 43,384,364, 56,296,668, 39,511,894, 46,750,128, and 49,596,154 clean reads, respectively, with a higher Q20 percentage and GC content. Furthermore, de novo assembly produced 141,789 high-quality transcripts (average length 1469 bp, N50 length 2642 bp), from which a total of 60,058 unigenes were obtained (average length 1199 bp, N50 length 2126 bp, ranging from 301 to 32,322 bp). BUSCO assessment determined that only 5.5% of unigenes were fragmented and 3.8% were missing (
Table 2,
Figure 1,
Figure S1). Finally, the raw (untrimmed) data have been deposited in the NCBI SRA database under the accession number PRJNA668857. Overall, we obtained abundant raw reads and assembled more unigenes than the total number of genes expected from other true bugs, which may establish a foundation for further functional annotation and classification analyses [
29].
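The assembly statistics above (average length, N50) and the read retention after quality control can be illustrated with a short sketch. The read counts below are those reported in the text; the contig lengths are hypothetical toy values, since the individual unigene lengths are not given in this section.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, not taken from the R. pedestris assembly).
toy_lengths = [301, 450, 800, 1200, 2100, 3500]
toy_n50 = n50(toy_lengths)
toy_mean = sum(toy_lengths) / len(toy_lengths)

# Raw and clean read counts per library, as reported in the text.
raw_reads = {
    "E": 44_548_968, "N1": 45_220_988, "N2": 44_478_730, "N3": 57_621_498,
    "N4": 40_174_406, "N5": 47_482_056, "A": 50_791_284,
}
clean_reads = {
    "E": 43_520_034, "N1": 44_664_702, "N2": 43_384_364, "N3": 56_296_668,
    "N4": 39_511_894, "N5": 46_750_128, "A": 49_596_154,
}

# Fraction of raw reads retained after quality control, per library.
retention = {lib: clean_reads[lib] / raw_reads[lib] for lib in raw_reads}

if __name__ == "__main__":
    print("toy N50:", toy_n50, "toy mean length:", round(toy_mean, 1))
    for lib, frac in retention.items():
        print(f"{lib}: {100 * frac:.2f}% of raw reads retained")
```

Because N50 is weighted by sequence length, it is typically larger than the average length, which is the pattern seen for both the transcripts and the unigenes reported here.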
3.2. Functional Annotation and Classification of Unigenes
To analyze the putative function of unigenes, BLASTx tools (E-value < 1.0 × 10^−5) were employed to search against the Nr, Nt, KO, SwissProt, Pfam, GO, and KOG databases. The results suggested that a total of 2,029 (3.48%) unigenes were annotated in all databases and that 25,881 (43.09%) were annotated in at least one database (
Table 3). In addition, 19,449 unigenes (32.38%) had significant matches in the Nr database. E-value distribution analysis showed that 68.8% of unigenes had significant homology (E-value < 1.0 × 10^−30) to previously reported sequences (
Figure 2A). The similarity distribution indicated that 82.7% of unigenes had more than 60% similarity to sequences in the Nr database (
Figure 2B). Finally, the species distribution of the Nr matches showed that 9,527 unigenes (49.5%) were most similar to
Halyomorpha halys followed by
R. pedestris (1611 unigenes, 31.2%),
Cimex lectularius (972 unigenes, 8.4%), and
Trichuris trichiura (584 unigenes, 5%) (
Figure 2C).
Furthermore, we characterized all assembled unigenes using the GO, KOG, and KEGG databases to annotate their potential functions. First, annotation against the GO database mapped a total of 17,364 (28.91%) unigenes into three categories: molecular function, cellular component, and biological process (
Figure 3A). In brief, the top 3 biological processes were cellular processes (9700 unigenes), metabolic processes (8742 unigenes), and single-organism processes (7926 unigenes); the top 3 molecular functions were binding (9095 unigenes), catalytic activity (7285 unigenes), and transporter activity (1728 unigenes); and the top 3 cellular components were cell parts (5634 unigenes), membranes (3692 unigenes), and macromolecular complexes (3286 unigenes) (
Table S1). Second, in total, the KOG classification divided 7594 (12.64%) unigenes into 26 functional categories (
Figure 3B,
Table S2). Among the functional classifications, the largest group was general function prediction (1137, 14.97%) followed by signal transduction mechanisms (865, 11.39%), posttranslational modification, protein turnover and chaperones (786, 10.35%), and translation, ribosomal structure, and biogenesis (612, 8.06%). Third, a total of 4973 unigenes (8.28%) were classified into five KEGG pathway functional categories, including cellular process (764 unigenes), environmental information processing (702 unigenes), genetic information processing (911 unigenes), metabolism (1826 unigenes), and organismal system (1127 unigenes), within 228 known KEGG pathways (
Figure 3C,
Table S3). In addition, signal transduction was the predominant group, including 216 unigenes, followed by translation (428 unigenes) and transport and catabolism (356 unigenes). Taken together, these data may elucidate the functions of the unigenes and provide valuable resources for further analysis of the
R. pedestris transcriptome.
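The annotation rates quoted in this section are simple proportions of the 60,058 assembled unigenes (or, for the KOG categories, of the 7594 KOG-classified unigenes). A minimal sketch of that arithmetic, using only counts stated in the text, is given below; the variable names are illustrative.

```python
TOTAL_UNIGENES = 60_058

# Number of unigenes annotated per database, as reported in the text.
annotated = {
    "at least one database": 25_881,
    "Nr": 19_449,
    "GO": 17_364,
    "KOG": 7_594,
    "KEGG": 4_973,
}

# Percentage of all unigenes annotated in each database.
percent_of_unigenes = {
    name: round(100 * count / TOTAL_UNIGENES, 2)
    for name, count in annotated.items()
}

# Largest KOG functional categories, as reported in the text.
kog_categories = {
    "general function prediction": 1137,
    "signal transduction mechanisms": 865,
    "posttranslational modification, protein turnover and chaperones": 786,
    "translation, ribosomal structure, and biogenesis": 612,
}

# Percentage of KOG-classified unigenes falling into each category.
percent_of_kog = {
    name: round(100 * count / annotated["KOG"], 2)
    for name, count in kog_categories.items()
}

if __name__ == "__main__":
    for name, pct in percent_of_unigenes.items():
        print(f"{name}: {pct}% of unigenes")
    for name, pct in percent_of_kog.items():
        print(f"{name}: {pct}% of KOG-classified unigenes")
```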
3.3. Detection of SSRs and SNPs
Simple sequence repeats (SSRs) are repeated sequences of 1 to 6 bp of DNA that have conserved flanking sequences and could be used for genomic mapping, evolutionary genetics, marker-assisted breeding, and marker-assisted selection in various species [
25]. Therefore, we screened all unigenes of the
R. pedestris transcriptome dataset to determine the nature and frequency of SSRs. In this research, a total of 35,158 SSRs were identified from the 60,058 unigenes, and 7738 sequences contained more than one potential SSR, as determined by searching for di-, tri-, tetra-, penta-, and hexanucleotide repeats (
Table 4,
Figure S2). Among all identified SSRs, dinucleotide repeats were the most numerous (6940), followed by trinucleotide (1193), tetranucleotide (105), and hexanucleotide (5) repeat motifs, which is similar to the results of previous reports on
Aphis aurantii and
Adelphocoris suturalis [
29,
30]. In addition, among the dinucleotide repeats, AT/AT (4313) was the most frequent type and CG/CG (8) the least frequent. Among the trinucleotide repeats, AAT/ATT (751) and CCG/CGG (3) were the most and least frequent, respectively. Moreover, to identify candidate SNP/INDEL positions in the
R. pedestris transcriptome, we used SAMtools and VarScan software to call variants from the original reads mapped to all unigenes. As
Table 5 shows, a total of 715,604 potential SNPs were identified from the seven transcriptome libraries, including 412,541 transition SNPs and 303,063 transversion SNPs. Furthermore, SNPs were typically located in the first (75,067), second (19,199), and third (92,709) codon positions.
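The SSR screen described above (perfect di- to hexanucleotide repeats) can be sketched with regular expressions. This is only an illustration of the general idea: the minimum repeat counts below are assumed values in the style of common SSR-mining defaults, and neither the thresholds nor the software used in the study are stated in this section.

```python
import re

# Minimum number of repeat units per motif length (ASSUMED thresholds,
# not taken from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # One motif of motif_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # because they belong to the shorter motif class.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            repeat_count = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, repeat_count))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence: (AT)7 and (AAT)5 embedded in flanking bases.
    toy = "GGC" + "AT" * 7 + "CCG" + "AAT" * 5 + "G"
    for start, motif, count in find_ssrs(toy):
        print(f"position {start}: ({motif}){count}")
```

A real screen would also merge complementary motifs (for example, counting AT and TA, or AAT and ATT, as one class), as is done in the AT/AT and AAT/ATT tallies reported above.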
These results provide large-scale genetic and genomic resources for research on the prevention of
R. pedestris outbreaks. In keeping with other biological systems, we obtained 715,604 SNPs and 35,158 SSR variants in this study, which will be applicable in further pest control strategies in Chinese agriculture [
29,
31]. In summary, all candidate molecular markers selected in the
R. pedestris transcriptome may provide more useful evidence for investigating genetic conservation, constructing genetic maps, and identifying genetic signatures of selection.
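The transition/transversion split reported for the SNPs follows from the chemical class of the two alleles: a substitution within purines (A, G) or within pyrimidines (C, T) is a transition, and any other substitution is a transversion. The sketch below shows this classification and derives the ratio from the counts given in the text; the function name is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("ref and alt must be two different bases from ACGT")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# SNP counts as reported in the text.
transitions = 412_541
transversions = 303_063
total_snps = transitions + transversions
ts_tv_ratio = transitions / transversions

if __name__ == "__main__":
    print(classify_snp("A", "G"), classify_snp("A", "C"))
    print("total SNPs:", total_snps)
    print("Ts/Tv ratio:", round(ts_tv_ratio, 3))
```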
3.4. Identification and Annotation of Differentially Expressed Genes between Developmental Stages of R. pedestris
To identify differentially expressed genes (DEGs) within six comparison groups (E vs. N1, N1 vs. N2, N2 vs. N3, N3 vs. N4, N4 vs. N5, N5 vs. A), we applied thresholds of a false discovery rate (FDR) < 0.001, an absolute fold-change > 2 and a
p-value < 0.05. Read counts for each gene were obtained using RSEM software, and FPKM analysis was conducted accordingly. As shown in
Figure 4, the FPKM distribution, interval, and density levels demonstrated that gene expression levels changed across the developmental stages of
R. pedestris.
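The FPKM normalization and the DEG thresholds stated above can be expressed compactly. The sketch below uses the standard FPKM definition and reads "absolute fold-change > 2" as a more than two-fold change in either direction; that reading, and the function and parameter names, are assumptions rather than details confirmed by the text.

```python
def fpkm(read_count, gene_length_bp, total_mapped_reads):
    """Fragments per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_deg(fold_change, fdr, p_value,
           min_abs_fc=2.0, max_fdr=0.001, max_p=0.05):
    """Apply the three thresholds quoted in the text.

    fold_change is an expression ratio (later stage / earlier stage) and
    must be positive; a ratio of 0.25 counts as a 4-fold downregulation.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    abs_fc = max(fold_change, 1 / fold_change)
    return abs_fc > min_abs_fc and fdr < max_fdr and p_value < max_p


if __name__ == "__main__":
    # Hypothetical gene: 500 reads, 2000 bp long, 25 million mapped reads.
    print("FPKM:", fpkm(500, 2000, 25_000_000))
    print("upregulated DEG:", is_deg(4.0, 1e-4, 0.01))
    print("downregulated DEG:", is_deg(0.2, 1e-4, 0.01))
    print("below fold-change cutoff:", is_deg(1.5, 1e-4, 0.01))
```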
Overall, there were 2615 DEGs, with 1469 upregulated and 1146 downregulated unigenes, in the comparison between eggs and 1st-instar nymphs (
Figure 5A). Similarly, between 1st- and 2nd-instar nymphs, 1461 unigenes showed significant expression changes, with 585 unigenes upregulated and 876 unigenes downregulated (
Figure 5B). Next, the results of comparative analysis between 2nd- and 3rd-instar nymphs illustrated 7513 unigenes with significant expression changes, including 3311 upregulated unigenes and 4202 downregulated unigenes (
Figure 5C). The maximum number of DEGs (9606 unigenes) was in the comparison group between 3rd- and 4th-instar nymphs, with 4869 being upregulated and 4737 downregulated (
Figure 5D). In contrast, the fewest DEGs (1409 unigenes, with 1039 upregulated and 370 downregulated) were identified between 4th- and 5th-instar nymphs (
Figure 5E). In the comparison between 5th-instar nymphs and adults, the expression profiles demonstrated that a total of 4189 unigenes had significant changes. Among these unigenes, 1918 unigenes were upregulated, and 2271 unigenes were downregulated (
Figure 5F). In addition, we generated a heatmap using the Hcluster algorithm to visualize the expression patterns of all the unigenes in all developmental libraries (the color changes from red to green with decreasing expression) (
Figure 6A). As shown in
Figure 6B,C, different numbers of DEGs existed in different comparative groups, and only 95 and 70 DEGs existed in all five comparisons, respectively.
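The per-comparison DEG counts given above can be tabulated to confirm the totals and to locate the comparisons with the most and fewest DEGs. The counts are those reported in the text; the dictionary layout is only illustrative.

```python
# (upregulated, downregulated) unigenes per comparison, as reported in the text.
deg_counts = {
    "E vs. N1": (1469, 1146),
    "N1 vs. N2": (585, 876),
    "N2 vs. N3": (3311, 4202),
    "N3 vs. N4": (4869, 4737),
    "N4 vs. N5": (1039, 370),
    "N5 vs. A": (1918, 2271),
}

# Total DEGs per comparison.
totals = {name: up + down for name, (up, down) in deg_counts.items()}

# Comparisons with the largest and smallest numbers of DEGs.
largest = max(totals, key=totals.get)
smallest = min(totals, key=totals.get)

if __name__ == "__main__":
    for name, (up, down) in deg_counts.items():
        print(f"{name}: {totals[name]} DEGs ({up} up, {down} down)")
    print("most DEGs:", largest)
    print("fewest DEGs:", smallest)
```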
Finally, to better determine the biological function of DEGs in the six comparisons, we performed GO annotation and KEGG enrichment analysis. Different numbers of up- and downregulated unigenes were significantly enriched in GO terms across the biological process, cellular component, and molecular function categories, with the maximum and minimum numbers of DEGs observed in the N2 vs. N3 and E vs. N1 comparisons, respectively (
Figure S3,
Table S5). As shown in
Figure 7, we summarized the top 20 pathways in six comparison groups by KEGG enrichment. For example, the transcriptional changes were annotated in resistance and immune-related pathways, such as “Lysosome”, “Drug metabolism-cytochrome P450”, “Antigen processing and presentation” and “Apoptosis”, and metabolic-related pathways, including “Starch and sucrose metabolism”, “Ascorbate and aldarate metabolism”, “Porphyrin and chlorophyll metabolism”, “Tyrosine metabolism”, and “Amino sugar and nucleotide sugar metabolism”, in specific developmental stages, which might be closely associated with developmental processes and survival activities. Taken together, the present study is the first report covering all developmental stages of
R. pedestris and suggests differential expression of genes involved in various physiological and biochemical pathways, similar to other DEG datasets, which could provide more evidence for interpreting the changes in wing development genes in hemipteran or other related species [
5,
29].
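Pathway enrichment of this kind is commonly assessed with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of DEGs in a pathway by chance. The sketch below illustrates that test under the assumption that a hypergeometric model is appropriate; the statistical method actually used for the KEGG enrichment is not stated in this section, and all numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, deg_total, deg_in_pathway):
    """One-sided hypergeometric p-value: P(X >= deg_in_pathway).

    total_genes    -- annotated background genes
    pathway_genes  -- background genes belonging to the pathway
    deg_total      -- DEGs among the background genes
    deg_in_pathway -- DEGs belonging to the pathway
    """
    upper = min(pathway_genes, deg_total)
    numerator = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, deg_total - k)
        for k in range(deg_in_pathway, upper + 1)
    )
    return numerator / comb(total_genes, deg_total)


if __name__ == "__main__":
    # Hypothetical example: 100 background genes, 10 in the pathway,
    # 20 DEGs overall, 6 of which fall in the pathway.
    print("p-value:", enrichment_p_value(100, 10, 20, 6))
```

In practice the resulting p-values would be corrected for multiple testing across all pathways before ranking the top 20.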
3.5. Identification and Analysis of Wing Formation-Related Signaling Pathways
To obtain further details on wing formation across all developmental stages of
R. pedestris, we selected, based on the GO and KEGG databases, a total of 426 unigenes in ten wing development-related signaling pathways, including the insulin signaling pathway, PI3K-Akt signaling pathway, mTOR signaling pathway, MAPK signaling pathway, JAK-STAT signaling pathway, Notch signaling pathway, Hedgehog signaling pathway, TGF-β signaling pathway, Hippo signaling pathway, and insect hormone biosynthesis, with different numbers of DEGs in each library (
Table 6). In brief, the N3 vs. N4 comparison had the most DEGs (125 unigenes), followed by N2 vs. N3 (88 unigenes), N5 vs. A (65 unigenes), E vs. N1 (27 unigenes), N4 vs. N5 (8 unigenes), and N1 vs. N2 (3 unigenes) across the ten signaling pathways. The differing numbers of DEGs across the six comparisons within the ten signaling pathways might be associated with differences arising during tissue maturation and development in the bean bug, including such processes as wing growth.
The insulin signaling pathway is an evolutionarily conserved nutrient-sensing pathway that participates in growth and development in metazoans and primarily activates the downstream PI3K-Akt signaling cascade by phosphorylated adapters [
32,
33,
34,
35]. For example, biological studies have elucidated the regulatory mechanism governing the insulin signaling pathway, which plays an important role in autonomously controlling body, organ, and cell size in
Drosophila by encoding an insulin-like peptide to increase body size [
36]. In a previous report, the migratory brown planthopper
Nilaparvata lugens (Insecta, Hemiptera) was observed to possess
insulin receptor 1 (
InR1), which leads to the long-winged morph if it activates the PI3K-Akt signaling cascade, and
insulin receptor 2 (
InR2), which could negatively regulate the InR1–PI3K–Akt pathway to develop short-winged morphs in response to environmental or resource changes [
37]. In addition, the three
insulin receptors of the linden bug
Pyrrhocoris apterus (Insecta, Hemiptera) were differentially silenced, and their participation in wing polymorphism control was confirmed [
38]. Additionally,
mammalian target of rapamycin (
mTOR) activity is cell-autonomously stimulated by a series of extracellular stimulators, such as amino acids, glucose, and oxygen, to control growth and proliferation in species extending from invertebrates to vertebrates. In the mTOR signaling pathway in metazoans,
mTOR activity was increased after an endocrine insulin signaling pathway triggered a conserved intracellular signaling cascade involving
PI3K and
Akt [
32]. Based on these observations, we annotated and identified a total of 50, 61, and 13 DEGs in the insulin, PI3K-Akt, and mTOR signaling pathways, respectively. Notably, the N3 vs. N4 and N2 vs. N3 comparison groups had more DEGs than the other groups, suggesting that insulin receptors and downstream growth factors could play significant roles in wing formation during specific developmental periods in bean bug, similar to
N. lugens and
Drosophila melanogaster.
The Hippo pathway was first identified in
Drosophila through the notable tissue overgrowth phenotypes resulting from mutations of Hippo or downstream factors, including the transcriptional coactivator
Yorkie, as the nuclear effector, which combines with the TEAD family of DNA binding factors to activate transcription of cell growth and survival genes [
39]. In fruit flies, two members of the Hippo signaling pathway,
Atg1 and
Acinus, were subjected to targeted deletion in otherwise wild-type samples, which caused an increase in wing size and expression of target genes, and their overexpression inhibited growth [
40]. In this study, we obtained 45 DEGs in the Hippo signaling pathway by KEGG enrichment. The results showed that the notable DEGs were primarily observed in three comparison groups, N2 vs. N3, N3 vs. N4 and N5 vs. A, implying that N3, N4, and adults were crucial stages in the transition from immature to mature wings in
R. pedestris.
The Notch signaling pathway plays crucial roles in tissue development and homeostasis by regulating multiple biological processes, including cell fate determination, proliferation, and cycle progression. Upon binding to ligands, the
Notch receptor generates the Notch intracellular domain (NICD) by a series of proteolytic cleavages, and the NICD subsequently translocates into the nucleus to regulate the expression of downstream target genes [
41,
42]. In a previous study, an ATPase, the
TER94 gene, played a novel role in development and could be involved in positively regulating the Notch signaling pathway by influencing the Notch target genes
wingless and
cut during wing margin formation in
D. melanogaster [
43]. Based on this result, a total of 27 DEGs were obtained in the Notch signaling pathway by GO and KEGG annotation. The results of the comparison between N3 vs. N4 and N5 vs. A groups indicated that the Notch signaling pathway could play an important role in these stages because the DEG numbers were greater than those of the other groups during
R. pedestris wing development.
In addition, a total of 21 DEGs involved in insect hormone biosynthesis were identified by GO and KEGG enrichments in
R. pedestris transcriptome libraries, indicating that N3, N4, and adults were significant developmental stages with maximal DEGs, which is in keeping with the results described above. Consistent with our data, a number of DEGs annotated in the biosynthesis of sesquiterpenoid juvenile hormone (JH) and ecdysteroid pathways were identified by BLAST search in
Phenacoccus solenopsis to control the invasiveness of sap-sucking pests [
5]. The Hedgehog, MAPK, and JAK-STAT signaling pathways are also key members of conserved signaling pathways for evaluating such developmental defects in the
Drosophila wing [
44,
45]. Hence, in this study, we also obtained different numbers of DEGs in these pathways, including maximal levels of DEG numbers in the N3 vs. N4 group, demonstrating that the three signaling pathways mentioned above play important roles in wing development for 4th-instar nymphs in
R. pedestris. Moreover, the TGF-β signaling pathway is a hallmark of metazoan cell communication, exhibiting a suite of core TGF-β pathway proteins, including multiple ligands, at least three Type I receptors, one Type II receptor, and four or more Smad effector proteins that have transcription factor activity and function in multiple larval tissues and processes, including wing vein formation [
46]. Our results showed that half of the DEGs (10 unigenes) annotated by the KEGG database were found in the N3 vs. N4 comparison group, similar to the above-mentioned pathways, indicating that these genes in the TGF-β signaling pathway could be involved in the wing formation of the bean bug within a sophisticated regulatory network that interacts with other pathways.
3.6. Identification and Expression Analysis of Wing Formation-Related Genes at Different Developmental Stages
To investigate and validate the molecular and expression characteristics of wing development-related genes across all life stages, a total of five unigenes were identified from the seven transcriptome libraries of
R. pedestris, including the
insulin-like receptor (
InR),
rictor, and
wingless 1-3 (
wg 1,
wg 2, and
wg 3) genes, which showed similar trends of up- or downregulation in the qRT-PCR experiments and the transcriptome analysis (
Figure S4). First, an
InR gene was found with a full-length open reading frame (ORF) of 4098 bp encoding 1365 amino acids (aa) (
Table 7). Moreover, the qRT-PCR results showed that the
InR gene was abundantly expressed in 1st-instar nymphs of
R. pedestris (
Figure 8). Previous data indicated that
insulin receptors were divided into two families: cluster I, conserved in apical Holometabola for approximately 300 million years, and cluster II, present in the secreted decoy of
insulin receptor in Muscomorpha due to ancestral duplication in a late ancestor of winged insects [
38]. In this study, we selected a typical
insulin-like receptor with two ligand-binding loops, furin-like cysteine-rich (Fu), three fibronectin type 3 (FN3), a single transmembrane (TM), and conserved tyrosine kinase (TyrKc) domains, and the expression patterns of this gene suggested that it could be involved in wing morph development and play significant roles in the development of 1st-instar nymphs of
R. pedestris. Second, the cDNA sequence of
rictor was identified from the
R. pedestris transcriptome libraries. The ORF of
rictor cDNA was determined to be 4320 bp and to encode a 1,439 aa polypeptide with a theoretical molecular mass of 159.97 kDa and an isoelectric point of 6.42 (
Table 7). Furthermore, as shown in
Figure 8, quantitative real-time PCR was performed to determine its expression pattern during all developmental stages of
R. pedestris, showing that its expression level tended to increase from eggs to 4th-instar nymphs, and the
rictor gene was maximally expressed in 3rd-instar nymphs.
Rictor (
rapamycin-insensitive companion of mTOR) is a key member of the mTOR signaling pathway that is beneficial to assembly and promotes the activity of
mechanistic target of rapamycin complex 2 (
mTORC2), which primarily participates in cytoskeletal organization, cell migration, modulation of cell cycle progression, and control of cell survival [
47]. In addition,
rictor mutants result in reduced body weight and shrinking eyes and wings in a
Drosophila model system [
48]. In this study, the
R. pedestris rictor gene was ubiquitously expressed in all life stages but was primarily expressed in 1st-, 3rd- and 4th-instar nymphs, indicating that this gene could play a crucial role in wing development at specific life stages by activating and phosphorylating key factors in the mTOR or PI3K–Akt signaling pathways, such as
AKT and
protein kinase C.
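The ORF and protein lengths reported above for InR and rictor can be cross-checked with a small calculation. The sketch assumes that each reported ORF length includes the terminal stop codon, so the protein has one residue fewer than the number of codons; this convention is inferred from the numbers and is not stated in the text.

```python
def protein_length_from_orf(orf_bp):
    """Number of amino acids encoded by an ORF that includes its stop codon."""
    if orf_bp < 6 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 and at least 6 bp")
    return orf_bp // 3 - 1


# (ORF length in bp, reported protein length in aa), as given in the text.
reported = {
    "InR": (4098, 1365),
    "rictor": (4320, 1439),
}

# True where the reported protein length matches the ORF length.
consistent = {
    gene: protein_length_from_orf(orf_bp) == aa
    for gene, (orf_bp, aa) in reported.items()
}

if __name__ == "__main__":
    for gene, (orf_bp, aa) in reported.items():
        print(f"{gene}: {orf_bp} bp -> {protein_length_from_orf(orf_bp)} aa "
              f"(reported {aa} aa)")
```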
Finally, we further selected three full-length
wingless genes from the
R. pedestris transcriptome based on the annotation information. All cDNA and protein sequence information regarding the
wingless genes is listed in
Table 7, showing that the ORF lengths of the three
wingless genes are 1059 bp, 1038 bp, and 1173 bp, encoding 352 aa, 335 aa and 390 aa proteins, respectively. Additionally, the molecular weights (MWs) of the proteins ranged from 38.84 kDa to 43.40 kDa, and the theoretical pI values varied from 8.91 to 9.54. As shown in
Figure 8, the expression of three
wingless genes appeared to change across different developmental stages. In brief, the temporal expression of
wg 1 was upregulated and reached a maximum in eggs of
R. pedestris and was subsequently downregulated from 1st-instar nymphs to adults of
R. pedestris. The
wg 2 mRNA transcripts were significantly increased and reached the maximum expression level in 1st-instar nymphs, and it was also primarily expressed in 4th-instar nymphs of
R. pedestris. Moreover, in keeping with the
wg 2 expression,
wg 3 was also primarily distributed in 1st- and 4th-instar nymphs of
R. pedestris. From nematodes to mammals, the Wnt signaling pathway includes a large family of cysteine-rich secreted glycoproteins that are involved in controlling animal development. The
wnt genes exhibit sequence homology from
wnt in the mouse to
wingless in
Drosophila. In holometabolous insects, such as flies and butterflies, accurate patterning and development are regulated by a series of gene expression patterns in tissues at specific developmental stages [
49]. For example,
wingless is a morphogen that acts as a short-range inducer and a long-range organizer across development in
Drosophila and participates in patterning of wing discs followed by specification of wing margin-specific patterns [
50]. Taken together, the various expression profiles of
wingless genes demonstrated that the specific regulatory mechanisms of
wingless members involved in the Wnt signaling pathway were active in hemimetabolous insects, especially the bean bug.
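Relative expression from qRT-PCR data such as those summarized above is often computed with the 2^(−ΔΔCt) method, which normalizes the target gene to a reference gene and then to a calibrator sample. The sketch below illustrates that calculation only as an assumption: the quantification method, the reference gene, and the Ct values used in the study are not given in this section, and the numbers in the example are hypothetical.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene by the 2^(-ddCt) method.

    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: a developmental stage compared with a
    # calibrator stage, each normalized to the same reference gene.
    fold = relative_expression(22.0, 18.0, 24.0, 18.0)
    print("relative expression vs. calibrator:", fold)
```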