Article

The Set of Serine Peptidases of the Tenebrio molitor Beetle: Transcriptomic Analysis on Different Developmental Stages

by Nikita I. Zhiganov 1, Konstantin S. Vinokurov 2, Ruslan S. Salimgareev 3, Valeriia F. Tereshchenkova 4, Yakov E. Dunaevsky 1, Mikhail A. Belozersky 1 and Elena N. Elpidina 1,*

1 A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow 119991, Russia
2 Institute of Plant Molecular Biology, Biology Centre of the Czech Academy of Sciences, Branišovská 1160/31, 370 05 České Budejovice, Czech Republic
3 Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow 119991, Russia
4 Faculty of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2024, 25(11), 5743; https://doi.org/10.3390/ijms25115743
Submission received: 16 April 2024 / Revised: 17 May 2024 / Accepted: 20 May 2024 / Published: 25 May 2024
(This article belongs to the Special Issue Transcriptomics in the Study of Insect Biology)

Abstract

Serine peptidases (SPs) of the chymotrypsin S1A subfamily are an extensive group of enzymes found in all animal organisms, including insects. Here, we analyze SPs in the transcriptome and genome datasets of the yellow mealworm Tenebrio molitor and profile their expression patterns at various stages of ontogeny. A total of 269 SP-related sequences were identified, including 137 with conserved catalytic triad residues, while 125 others lacking conservation were proposed as non-active serine peptidase homologs (SPHs). Seven deduced sequences exhibit a complex domain organization with two or three peptidase units (domains), predicted as either active or non-active. The largest group of 84 SPs and 102 SPHs had no regulatory domains in the propeptide, and the majority of them were expressed only in the feeding life stages, larvae and adults, presumably playing an important role in digestion. The remaining 53 SPs and 23 SPHs had different regulatory domains and showed constitutive or upregulated expression at the egg and/or pupal stages, participating in the regulation of various physiological processes. The majority of polypeptidases were mainly expressed at the pupal and adult stages. The data obtained expand our knowledge of SPs/SPHs and provide the basis for further studies of the functions of proteins from the S1A subfamily in T. molitor.

1. Introduction

Serine endopeptidases of the chymotrypsin S1A subfamily are a large group of enzymes widely distributed in nature. In insects, they play an important role in various physiological processes such as digestion, development, and innate immunity regulation [1,2,3,4,5,6,7,8]; therefore, these peptidases are of significant interest for further study. Activity of SPs depends on a catalytic triad of amino acid residues: histidine H57, aspartic acid D102, and serine S195 (hereinafter bovine chymotrypsinogen A numbering, XP_003587247). The substrate specificity of SPs is largely determined by the structure of the S1 substrate-binding subsite, where residues 189, 216, and 226 play the major roles [9]. According to the S1 pocket organization, various types of SPs are distinguished, including trypsins (D189, G216, G226), chymotrypsins (S189, G216, G226; S189, G216, A226, and others), and elastases (S189, V216, T226; S189, V216, D226, and others).
The development of high-throughput sequencing technologies led to the appearance of high-quality genome assemblies, enabling whole-genome investigation of SP/SPH genes in model insects as well as in species of great agricultural and medical importance. Among Hemiptera, 90 SP/SPH genes were found in Nilaparvata lugens (Delphacidae) [10] (Figure 1). In Diptera, 257 genes were identified in Drosophila melanogaster (family Drosophilidae) [11,12] and even more in the mosquitoes (family Culicidae) Anopheles gambiae (337) [13,14] and Aedes aegypti (369) [15,16]. For the order Lepidoptera, data on several representatives are known: 242 genes were found in Manduca sexta (Sphingidae) [17,18], 169 genes in Bombyx mori (Bombycidae) [19,20], 221 genes in Plutella xylostella (Plutellidae) [21], and 109 genes in Spodoptera frugiperda (Noctuidae) [22]. A reduced set of only 57 SP/SPH genes was found in Apis mellifera (Hymenoptera: Apidae) [12,23]. The gene repertoire was larger in parasitic Hymenoptera, with 74 genes described in Microplitis demolitor (Braconidae), 143 genes in the parasitic wasp Nasonia vitripennis, and 183 genes in Pteromalus puparum (Pteromalidae) [24].
Genome-wide analyses in beetles (Coleoptera) identified 125 SP/SPH genes in Rhyzopertha dominica (Bostrichidae) [25]. From the first sequenced coleopteran genome, that of the red flour beetle Tribolium castaneum (Tenebrionidae), 177 genes coding for SPs/SPHs were identified [12,26]. For another tenebrionid, the yellow mealworm Tenebrio molitor, 38 SP/SPH transcripts were previously identified in the larval gut [27], two of which corresponded to the major digestive trypsin and chymotrypsin studied using biochemical approaches [28,29]. Later, 48 SP/SPH transcripts were identified in the larval gut during the study of Cry3A intoxication in T. molitor [30]. Analyzing trypsin-like SPs/SPHs in transcriptome datasets from different stages of the T. molitor life cycle, we have previously de novo assembled 54 trypsins and five trypsin-like SPHs [31]. We also characterized recombinant preparations of an SP, SerP38, and an SPH, SerPH122, expressed in the Komagataella kurtzmanii system [32,33]. Recent work by Wu and coauthors [34] provided information on 200 T. molitor genes, including 112 SPs and 88 SPHs, and used transcriptome datasets together with RT-PCR analysis for expression profiling of SP-related genes at various developmental stages and in various tissues.
Here, we present an extended and corrected dataset of putative T. molitor SP/SPH cDNAs obtained from genome and transcriptome datasets. We have identified several groups of deduced proteins based on the composition of their active site and predicted specificity, analyzed their evolutionary relationships, and evaluated differential expression along the life cycle. Finally, sets of SP-related genes involved in digestion, embryonic development, metamorphosis, and innate immunity were predicted, providing valuable information for further physiological, biochemical, and phylogenetic studies of tenebrionid pests. These data are of particular interest because T. molitor is the first insect approved by the European Food Safety Authority as a novel food in specific conditions and uses, testifying to its growing relevance and potential [35].

2. Results

2.1. General Characteristics of T. molitor Predicted SPs/SPHs of the S1A Subfamily

2.1.1. Identified Set of Peptidase-like Sequences

Analysis of the total T. molitor transcriptome assembly and of transcriptomes from different developmental stages, coupled with verification of sequences in three new whole genome assemblies (GCA_027725215.1; GCA_014282415.3; GCA_907166875.3), revealed a total of 269 mRNA sequences encoding putative SPs and SPHs. Of these, 137 were transcripts of active SPs with a conserved catalytic triad of amino acid residues in the active center (H57, D102, S195), whereas 125 sequences with one or more substitutions in the catalytic triad were SPHs. In addition, there were seven sequences of polypeptidases (polyserases in humans according to [36]) containing two or three tandem peptidase domains, SP and/or SPH, translated from a single ORF as an integral part of the same polypeptide chain.
Bioinformatics analysis allowed us to discover 69 new sequences, and the structure of another 23 sequences previously available [27,31,34] was revised and reannotated.
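The SP/SPH distinction used in this section reduces to a check of three residues. The following is a minimal sketch of that rule, assuming the residues at positions 57, 102, and 195 (bovine chymotrypsinogen A numbering) have already been extracted from an alignment; the function names are hypothetical and this is not the authors' pipeline.

```python
# Illustrative sketch only: function names are hypothetical, and residue
# extraction from an alignment is assumed to have been done elsewhere.
CANONICAL_TRIAD = ("H", "D", "S")   # expected residues at positions 57, 102, 195
TRIAD_POSITIONS = (57, 102, 195)    # bovine chymotrypsinogen A numbering


def classify_by_triad(res57: str, res102: str, res195: str) -> str:
    """Return 'SP' for a conserved H57/D102/S195 triad, otherwise 'SPH'."""
    observed = (res57.upper(), res102.upper(), res195.upper())
    return "SP" if observed == CANONICAL_TRIAD else "SPH"


def triad_substitutions(res57: str, res102: str, res195: str) -> list[str]:
    """List substitutions relative to the canonical triad, e.g. 'S195G'."""
    observed = (res57.upper(), res102.upper(), res195.upper())
    return [
        f"{ref}{pos}{obs}"
        for ref, pos, obs in zip(CANONICAL_TRIAD, TRIAD_POSITIONS, observed)
        if ref != obs
    ]
```

A sequence with Q at position 57 and G at position 195 would be reported as an SPH with the substitutions H57Q and S195G.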

2.1.2. Annotation of Predicted Protein Sequences of T. molitor SPs

The sequences of active SPs were analyzed by the composition of the S1 substrate-binding subsite, where three amino acid residues in positions 189, 216, and 226 reflect to a large extent the specificity of the peptidase [9]. We identified trypsins as SPs with a conserved set of amino acid residues in the S1 subsite: D189, G216, G226 (DGG), which places a negative charge at the base of the S1 pocket and ensures specificity for basic residues (R/K) at the P1 position of the substrate [37]. Those with A, T, or S at positions 216 or 226 instead of G, while keeping the negatively charged D at the bottom (DGA; DGT; DSG; DAT), were tentatively named trypsin-like, although their specificity is questionable due to the larger side chains located at the pocket walls. Predicted peptidases lacking the negative charge at the base of the S1 pocket were defined as chymotrypsin- or elastase-like according to the residues that occupy the wall positions 216 and 226. Those with small amino acid residues (SGS; SGA; GGS; GAS; GSG; SSG), including sequences with a negative charge in the pocket wall (GGD) characteristic of insects [38], were predicted as chymotrypsin-like, for which specificity towards large aromatic (F, Y, W) or mid-size aliphatic (L) side chains in the P1 position is generally accepted. In putative elastase-like SPs, by contrast, wall position 216 is occupied by bulky hydrophobic residues (SVS; GVS; GVN; GIS; GFS; GYS), which generally provide a platform for interaction with small hydrophobic residues at P1. A group of non-annotated peptidases with an unusual S1 subsite was also established, for which specificity could not be reasonably predicted from sequence analysis. The most numerous SPs were trypsins, with 64 sequences. Other groups included 10 trypsin-like peptidases, 30 chymotrypsin-like peptidases, 18 elastase-like peptidases, and 15 non-annotated peptidases.
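The annotation rule above can be pictured as a lookup on the three-letter S1 subsite (residues 189, 216, 226). The sketch below is an illustration only: it covers just the residue combinations quoted in this paragraph, the function and table names are hypothetical, and combinations that appear elsewhere in the article but are not listed here fall through to "not listed" rather than being guessed.

```python
# Illustrative lookup built only from the S1 subsite triads quoted in the
# paragraph above; names are hypothetical and the lists are not exhaustive.
S1_CLASSES = {
    "trypsin": {"DGG"},
    "trypsin-like": {"DGA", "DGT", "DSG", "DAT"},
    "chymotrypsin-like": {"SGS", "SGA", "GGS", "GAS", "GSG", "SSG", "GGD"},
    "elastase-like": {"SVS", "GVS", "GVN", "GIS", "GFS", "GYS"},
}


def classify_s1(res189: str, res216: str, res226: str) -> str:
    """Map residues 189/216/226 to a predicted specificity class."""
    triad = f"{res189}{res216}{res226}".upper()
    for label, triads in S1_CLASSES.items():
        if triad in triads:
            return label
    # Triads outside the quoted lists are left unassigned in this sketch.
    return "not listed"
```

For example, `classify_s1("D", "G", "G")` returns "trypsin", while an unusual triad such as RGV returns "not listed".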

2.1.3. Domain Organization

To propose the functional role of T. molitor SPs/SPHs, their domain organization was studied. The vast majority of the sequences were represented as preproenzymes. The predomain, or N-terminal signal peptide responsible for the secretory pathway, was found in 262 sequences out of the 269 studied. Eighty-three sequences contained one or more regulatory domains in the propeptide structure responsible for various physiological functions in the insect: 53 of the 137 SPs with the classical catalytic triad, 23 of the 125 SPHs, and all seven polypeptidases. Thirteen peptidases had a transmembrane domain (TM): seven at the N-terminus and six at the C-terminus. Most sequences of mature enzymes without the prodomain contained 225–260 amino acid residues.
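The counts in this section can be cross-checked with simple arithmetic. The snippet below is a sketch that only restates numbers quoted in the text; the dictionary layout and variable names are illustrative assumptions.

```python
# Counts quoted in the text; the data layout and names are illustrative.
groups = {
    "SP": {"total": 137, "with_regulatory_domains": 53},
    "SPH": {"total": 125, "with_regulatory_domains": 23},
    "polypeptidase": {"total": 7, "with_regulatory_domains": 7},
}

total_sequences = sum(g["total"] for g in groups.values())
with_domains = sum(g["with_regulatory_domains"] for g in groups.values())
without_domains = {
    name: g["total"] - g["with_regulatory_domains"] for name, g in groups.items()
}
tm_total = 7 + 6  # N-terminal plus C-terminal transmembrane domains

print(total_sequences)   # 269
print(with_domains)      # 83
print(without_domains)   # {'SP': 84, 'SPH': 102, 'polypeptidase': 0}
print(tm_total)          # 13
```

The derived values of 84 SPs and 102 SPHs without regulatory domains agree with the figures given in the Abstract.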

2.2. Trypsins and Trypsin-like Peptidases

In the T. molitor transcriptome dataset, transcripts coding for putative trypsin-related proteins constituted the most numerous group: 64 trypsin and 10 trypsin-like sequences. Sequence analysis revealed that 39 trypsins and 6 trypsin-like sequences were mosaic, containing a variety of non-catalytic regulatory domains in the propeptide. Only 25 trypsins and 4 trypsin-like peptidases had no regulatory regions in the propeptide, although 4 of these trypsins had a transmembrane region at the C-terminal end of the sequence (Table 1, Figure 2).
Most of the SPs without regulatory domains are probably activated by trypsins, since 24 out of 25 sequences show a conserved cleavage (activation) site with an R or K residue at the carboxyl side of the scissile bond (P1) and a hydrophobic branched V or I at P1′, which is indispensable for stabilization of the new active conformation by hydrogen bonding to D194, the residue preceding the catalytic S195 [39]. Non-tryptic activation (processing) of the proenzyme is proposed for only a single trypsin, SerP135, with a G residue at P1 of the scissile bond, and a single trypsin-like peptidase, SerP105, with an L residue at P1, both from the group of SPs without regulatory domains. In the group of T. molitor trypsins and trypsin-like SPs with regulatory domains, 16 sequences have mainly hydrophobic residues at the C-terminus of the propeptide; these do not match the specificity of trypsin, so the proenzymes are presumably activated by other peptidases. It should be noted that, unlike their mammalian counterparts, none of the T. molitor trypsins contains a consensus motif for recognition and cleavage by enteropeptidase (DDDDK#) [40], suggesting an alternative regulation of zymogen conversion into active enzymes in the insect midgut lumen.
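The reasoning above, tryptic activation when P1 is R or K and a canonical P1′ of V or I, can be sketched as a small helper that reads a cleavage site written as in Table 1 (for example "R|IVGG") or with the "#" notation used in the text. The function name and the returned dictionary are hypothetical; this is an illustration of the rule, not the authors' annotation procedure.

```python
# Illustrative helper; the name and return structure are assumptions.
def describe_activation_site(site: str) -> dict:
    """Split a cleavage site such as 'R|IVGG' or 'L#I' into P1 and P1'.

    'tryptic' is True when P1 is R or K; 'canonical_p1_prime' is True
    when the first residue after the scissile bond is V or I.
    """
    for separator in ("|", "#"):
        if separator in site:
            left, right = site.split(separator, 1)
            break
    else:
        raise ValueError(f"no cleavage mark ('|' or '#') in {site!r}")
    if not left:
        raise ValueError(f"no residue before the cleavage mark in {site!r}")

    p1 = left[-1].upper()          # last residue of the propeptide
    p1_prime = right[:1].upper()   # first residue of the mature enzyme, if given
    return {
        "P1": p1,
        "P1_prime": p1_prime,
        "tryptic": p1 in {"R", "K"},
        "canonical_p1_prime": p1_prime in {"V", "I"},
    }
```

Applied to Table 1, "R|IVGG" (SerP1) is flagged as tryptic, whereas "G|IIGG" (SerP135) and "L|IIGG" (SerP105) are not.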
Among the 45 mosaic sequences with one or more regulatory regions in the propeptide, clip domains of several different types represent the most abundant non-catalytic structural unit of these trypsin-related proteins. A total of 35 clip domain trypsins were identified, including 12 with clip-B, 12 with clip-C, and 11 with clip-D type domains, revealed according to the classification provided earlier [41]. Among 10 sequences of trypsin-like peptidases, which had substitutions in the structure of the S1 subsite (7 with DGA, and single DAT, DGT, and DSG) (Table 1), 4 of 6 sequences with regulatory regions had clip domains (1 with clip-B and 3 with clip-C) and 2 had the CUB domain (CUB, IPR000859) (Figure 2). The remaining four mosaic sequences of true trypsins contained chitin-binding modules (CBM, IPR002557), low-density lipoprotein receptor type A repeats (LDL, IPR002172), scavenger receptor cysteine-rich domain (SRCR, IPR017448), thrombospondin type 1 repeats (TSP, IPR000884), Frizzled domain (Fz, IPR020067), Pan/Apple domain (PAN, IPR003609), and a domain in Complement 1r/s, Uegf and Bmp1 (CUB, IPR000859).
The isoelectric point (pI) of true trypsins and trypsin-like SPs varied over a wide pH range from 4.3 to 9.5 pH units, suggesting possible involvement of these SPs in different physiological processes.
Table 1. Domain organization and key structure features of 64 trypsins and 10 trypsin-like SPs of T. molitor.
No.	Name	NCBI ID (Protein)	Preproenzyme/Mature Enzyme (aa)	SignalP (aa)	Regulatory Domain	Propeptide Cleavage Site	Active Site	S1 Subsite	Enzyme Specificity	Mm Mature, Da	pI	TM (Position)
1	SerP1	ABC88729	258/227	16	-	R|IVGG	HDS	DGG	Trypsin	22,742	6.9	-
2	SerP2	QWS65012	252/227	16	-	R|IVGG	HDS	DGG	Trypsin	23,618	4.3	-
3	SerP3	QWS65044	259/228	16	-	K|IVGG	HDS	DGG	Trypsin	24,386	5.0	-
4	SerP4	QWS65013	250/225	15	-	R|IVGG	HDS	DGG	Trypsin	24,140	5.2	-
5	SerP5	QWS65045	333/236	24	-	R|IVGG	HDS	DGG	Trypsin	26,035	9.2	-
6	SerP6	QWS65014	258/226	17	-	R|IVGG	HDS	DGG	Trypsin	23,414	3.8	-
7	SerP20	QWS65048	361/238	17	-	R|IVGG	HDS	DGG	Trypsin	26,395	9.0	-
8	SerP21	QWS65049	276/228	22	-	R|IVGG	HDS	DGG	Trypsin	24,732	4.5	-
9	SerP22	QWS65050	290/242	17	-	R|VVGG	HDS	DGG	Trypsin	25,975	6.2	-
10	SerP26	QWS65055	254/227	23	-	R|IVGG	HDS	DGG	Trypsin	24,214	5.8	-
11	SerP28	QWS65056	310/241	26	-	R|IVGG	HDS	DGG	Trypsin	27,033	7.6	-
12	SerP30	QWS65015	249/226	16	-	K|IIGG	HDS	DGG	Trypsin	24,862	8.9	-
13	SerP35	QWS65057	260/231	21	-	R|IVGG	HDS	DGG	Trypsin	24,884	5.6	-
14	SerP37	QWS65058	298/251	19	-	R|VVGG	HDS	DGG	Trypsin	27,327	6.2	-
15	SerP48	QWS65017	321/295	22	-	R|IVGG	HDS	DGG	Trypsin	32,018	6.7	-
16	SerP76	QWS65019	387/362	18	-	K|IIGG	HDS	DGG	Trypsin	39,417	5.7	367–386
17	SerP77	QWS65060	288/252	17	-	K|IVGG	HDS	DGG	Trypsin	27,164	8.3	-
18	SerP84	QWS65020	332/308	20	-	K|VVGG	HDS	DGG	Trypsin	33,286	5.0	313–330
19	SerP104	QWS65061	323/300	18	-	K|IVGG	HDS	DGG	Trypsin	32,646	4.2	300–323
20	SerP125	QWS65024	278/254	19	-	R|IVGG	HDS	DGG	Trypsin	27,535	4.8	257–275
21	SerP135	QWS65027	292/246	22	-	G|IIGG	HDS	DGG	Trypsin	26,850	9.5	-
22	SerP209	QWS65033	258/227	16	-	R|IIGG	HDS	DGG	Trypsin	22,943	4.8	-
23	SerP266	QWS65037	281/256	18	-	K|IVGG	HDS	DGG	Trypsin	27,895	8.8	-
24	SerP360	CAH1374004	286/249	19	-	K|IVGG	HDS	DGG	Trypsin	27,480	4.7	-
25	SerP635	WJL97986	249/224	19	-	R|IVGG	HDS	DGG	Trypsin	24,044	4.1	-
26	SerP100	WJL97987	293/263	23	-	R|IIGG	HDS	DGA	Trypsin-like	28,605	8.8	-
27	SerP105	CAH1374591	305/243	23	-	L|IIGG	HDS	DGA	Trypsin-like	26,155	5.9	-
28	SerP188	KAJ3637256	303/271	20	-	R|IVGG	HDS	DGA	Trypsin-like	29,751	8.3	-
29	SerP278	CAH1363947	298/256	18	-	R|IIGG	HDS	DGA	Trypsin-like	27,716	6.8	-
30	SerP86	QWS65021	458/258	22	Clip-B	R|ILDG	HDS	DGG	Trypsin	28,226	8.4	-
31	SerP113	QWS65022	386/255	23	Clip-B	R|IING	HDS	DGG	Trypsin	28,255	7.7	-
32	SerP116	QWS65063	381/257	16	Clip-B	K|IVNG	HDS	DGG	Trypsin	28,382	6.4	-
33	SerP141	QWS65028	435/259	21	Clip-B	R|IFGG	HDS	DGG	Trypsin	28,844	9.2	-
34	SerP161	WJL97988	278/254	20	Clip-B	R|ITSG	HDS	DGG	Trypsin	27,807	7.7	-
35	SerP166	QWS65064	376/259	15	Clip-B	K|LVND	HDS	DGG	Trypsin	28,449	4.8	-
36	SerP183 SPE	BAG14262	383/265	18	Clip-B	R|IYGG	HDS	DGG	Trypsin	29,203	7.6	-
37	SerP193	QWS65032	375/247	22	Clip-B	R|ILGG	HDS	DGG	Trypsin	27,564	6.2	-
38	SerP272	QWS65038	404/297	17	Clip-B	K|IYGG	HDS	DGG	Trypsin	32,710	8.0	-
39	SerP275	QWS65065	430/257	23	Clip-B (2)	K|IVGG	HDS	DGG	Trypsin	28,969	8.5	-
40	SerP370	QWS65041	407/257	21	Clip-B	K|ISNG	HDS	DGG	Trypsin	28,048	6.4	-
41	SerP409	QWS65042	447/234	22	Clip-B	K|IGKG	HDS	DGG	Trypsin	26,142	8.8	-
42	SerP218	CAH1363991	356/263	22	Clip-B	K|VSGG	HDS	DAT	Trypsin-like	29,129	6.3	-
43	SerP119	QWS65023	387/253	19	Clip-C	L|IVGG	HDS	DGG	Trypsin	28,333	8.1	-
44	SerP145	QWS65029	370/241	22	Clip-C	H|IVGG	HDS	DGG	Trypsin	26,781	7.7	-
45	SerP163	QWS65030	354/254	21	Clip-C	V|IAFG	HDS	DGG	Trypsin	28,041	5.7	-
46	SerP173	QWS65031	362/249	21	Clip-C	F|VFGG	HDS	DGG	Trypsin	27,495	4.9	-
47	SerP227	QWS65034	376/251	23	Clip-C	L|IVGG	HDS	DGG	Trypsin	27,969	5.8	-
48	SerP228 SAE	QWS65035	374/250	20	Clip-C	L|IVGG	HDS	DGG	Trypsin	27,849	6.2	-
49	SerP247	QWS65036	379/257	18	Clip-C	T|IISM	HDS	DGG	Trypsin	28,343	6.1	-
50	SerP282	QWS65039	349/270	17	Clip-C	G|ITGG	HDS	DGG	Trypsin	29,212	6.0	-
51	SerP297	QWS65066	350/255	18	Clip-C	V|EYEE	HDS	DGG	Trypsin	28,238	5.7	-
52	SerP345	QWS65040	359/234	22	Clip-C	L|IVGG	HDS	DGG	Trypsin	26,360	6.5	-
53	SerP347	QWS65067	367/256	25	Clip-C	G|IAIG	HDS	DGG	Trypsin	28,001	5.8	-
54	SerP398	CAH1365893	385/253	19	Clip-C	L|IIGG	HDS	DGG	Trypsin	28,360	8.9	-
55	SerP61	CAH1377522	422/246	26	Clip-C	L|IVGG	HDS	DGA	Trypsin-like	27,368	8.7	-
56	SerP124	CAH1383174	371/250	20	Clip-C	L|IVGG	HDS	DSG	Trypsin-like	27,690	6.0	-
57	SerP291	WJL97989	357/251	20	Clip-C	Q|IWGG	HDS	DGT	Trypsin-like	28,108	7.1	-
58	SerP15	QWS65047	516/235	23	Clip-D	R|IVGG	HDS	DGG	Trypsin	25,699	9.2	-
59	SerP24	QWS65051	810/243	19	Clip-D	R|IVGG	HDS	DGG	Trypsin	27,555	5.4	-
60	SerP27	QWS65052	369/242	19	Clip-D	R|IVGG	HDS	DGG	Trypsin	26,704	9.0	-
61	SerP31	CAH1379474	557/244	15	Clip-D	K|IVGG	HDS	DGG	Trypsin	26,888	6.5	-
62	SerP40	QWS65016	392/241	21	Clip-D	G|NPGG	HDS	DGG	Trypsin	26,535	5.5	-
63	SerP65	QWS65053	619/240	20	Clip-D	R|IVGG	HDS	DGG	Trypsin	26,001	9.2	-
64	SerP66	QWS65059	523/245	29	Clip-D	R|VVGG	HDS	DGG	Trypsin	27,560	9.1	-
65	SerP109	QWS65062	964/247	17	Clip-D	R|IVGG	HDS	DGG	Trypsin	26,987	7.8	-
66	SerP127	QWS65025	376/247	22	Clip-D	R|IVNG	HDS	DGG	Trypsin	27,075	7.0	-
67	SerP131	QWS65026	375/247	22	Clip-D	R|VVNG	HDS	DGG	Trypsin	26,799	8.4	-
68	SerP317	QWS65054	389/246	16	Clip-D	R|IIGG	HDS	DGG	Trypsin	27,195	6.2	-
69	SerP178	KAJ3638924	409/242	27	CUB	R|IVGG	HDS	DGA	Trypsin-like	26,019	5.0	-
70	SerP725	KAJ3638922	405/246	23	CUB	K|IVGG	HDS	DGA	Trypsin-like	26,660	4.9	-
71	SerP285 Corin	CAH1378270	965/247	-	Fz, LDL (2), SRCR	R|IVGG	HDS	DGG	Trypsin	27,268	5.9	338–359
72	SerP14	QWS65046	1289/286	-	LDL (3)	R|IVGG	HDS	DGG	Trypsin	31,448	5.9	68–94
73	SerP11 TSP	QWS65043	447/231	19	TSP (2)	K|IIGG	HDS	DGG	Trypsin	26,306	9.5	-
74	SerP55 Tequila	QWS65018	1672/245	23	CBM (3), LDL (3), SRCR (2), PAN	R|VVRG	HDS	DGG	Trypsin	26,947	5.9	-
SignalP: signal peptide; Mm mature: molecular mass of the mature peptidase; pI: isoelectric point of the mature peptidase; TM: transmembrane domain; SerP: serine peptidase. Regulatory domains: Clip, clip domain (IPR022700), classification by [41]; CUB, a domain in Complement 1r/s, Uegf and Bmp1 (IPR000859); Fz, Frizzled domain (IPR020067); LDL, Low-Density Lipoprotein receptor type A repeats (IPR002172); SRCR, Scavenger Receptor Cysteine-Rich domain (IPR017448); TSP, thrombospondin domain (IPR000884); CBM, Chitin-Binding Module (IPR002557); PAN, Plasminogen-Apple-Nematode domain (IPR003609). The propeptide is cleaved after the residue that precedes the "|" mark in the cleavage site.

2.3. Chymotrypsin-like Peptidases

The thirty insect chymotrypsin-like peptidases are quite diverse in the configuration of amino acid residues at positions 189, 216, and 226, which are essential for primary substrate specificity. None of them had the residue configuration found in the classical vertebrate A-type chymotrypsin P00766 (S189, G216, G226) (Table 2). The bottom of the S1 specificity pocket (sequence position 189) was mostly occupied by G residues, as well as by five classical S, three A, and a single T. In 20 peptidases with G at position 189, an S residue was detected at wall position 216 or 226, and in two sequences (SerP71 and SerP303), an A residue was detected, as in bovine chymotrypsin B P00767 (S189, G216, A226). Two sequences, SerP16 and SerP69, resembled bovine chymotrypsin-like elastase 2a Q29461 (S189, G216, S226). SerP69 was previously purified and was similar in substrate specificity to chymotrypsins, but did not hydrolyze short substrates containing up to two amino acid residues [27,29], which is typical for insect chymotrypsins [42].
Ten peptidases with a charged residue in the wall of the S1 specificity pocket (GGD, GSD, AGD, GAD) represent another insect-specific group of chymotrypsins and, according to the available biochemical data, display preferential hydrolysis of chymotrypsin substrates [38,43,44]. However, the presence of a negatively charged residue at position 226 of the S1 pocket may provide additional specificity for basic side chains at P1 of the substrate due to differences in the overall structure of the S1 pocket, as described for the crab collagenases brachyurins [45,46].
Most of the 30 chymotrypsin-like sequences identified in T. molitor represented SPs without regulatory domains, with the exception of a single mosaic peptidase (SerP449) with four LDL domains and one Sushi (IPR000436) domain in the propeptide (Figure 3), which was proposed as a putative ortholog of M. sexta HP14 (modular SP, MSP) [17]. For most of these chymotrypsin-like SPs, a conserved propeptide cleavage site was predicted (R#I), suggesting the involvement of trypsins in activation. Alternatively, cleavage at the proposed unique site (H#I) may provide a strictly specific activation (SerP16), or other chymotrypsin- or elastase-like SPs may perform cleavage at the L#I site, as in the case of SerP449. Most remarkable was the absence of a canonical activation cleavage site in SerP586, which suggests an alternative mechanism of activation at the L#K site. Most of the chymotrypsin-like SPs had a pI in the acidic region, from 3.8 to 5.3 pH units. Two SPs (SerP101 and SerP276) had a neutral pI and only SerP69 had an alkaline pI of 8.8.

2.4. Elastase-like Peptidases

A group of 18 predicted T. molitor SP sequences with bulky hydrophobic residues (mostly V or I) at wall position 216 of the S1 binding subsite were annotated as elastase-like enzymes (Table 3). This position is considered a key determinant of the specificity of vertebrate elastases and ensures hydrolysis after small amino acid residues at position P1 (A, V, and, less commonly, L) [47]. The other wall position, 226, of the S1 specificity pocket was occupied by an S residue, with the exception of two proteins with residues N (SerP94) and A (SerP472), and the bottom position 189 was also occupied by small residues: G, S, and one A (SerP156). Larger residues were found only at position 216 in three predicted enzymes: T in SerP185, F in SerP155, and Y in SerP85. The latter two enzymes are of special interest because their substrate-binding pockets should theoretically be shallower than those of other T. molitor elastases. Unfortunately, no vertebrate peptidases with a similar residue configuration of the S1 pocket have been described, which prevents further speculation about their specificity. Elastases with two bulky residues in key positions of the specificity pocket, like bovine pancreatic elastase 1 (A189/V216/T226, Q28153), were absent in T. molitor, so it can be assumed that in the majority of insect elastases, the substrate-binding subsite is less occluded than that of the pancreatic elastases 1 of vertebrates. Another interesting feature of the studied elastases was the presence of I at position 216: five SPs had the triad GIS in the S1 subsite, which is typical only for representatives of the Tenebrionidae family.
None of the elastase-like enzymes had regulatory regions in the propeptide (Figure 3). For most of the sequences (16 out of 18), a conserved propeptide cleavage site (R#I) suggests the involvement of trypsins in activation (Table 3). For only two SPs (SerP94 and SerP120), cleavage at a unique site (H#I) suggests a specific processing pathway. The majority of elastase-like SPs had a pI in the acidic region, from 4.0 to 4.9 pH units. A single SP, SerP74, had an alkaline pI of 8.6, while vertebrate elastases 1 and 2 are mostly cationic or neutral [48].

2.5. Non-Annotated Serine Peptidases

A heterogeneous array of sequences whose specificity remains obscure due to a non-typical combination of primary specificity determinant residues was tentatively grouped as non-annotated SPs until biochemical data become available or closely related orthologs are found and characterized. A total of 15 sequences were attributed to this group, showing the most diverse configuration of residues 189, 216, and 226 (AAT, GAT, GGK, QGS, RGV, VAD) (Table 4). The propeptide cleavage site in this group of sequences is variable, with R, K, L, or I at the C-terminus of the propeptide. Most non-annotated peptidases had a neutral or alkaline pI.
For seven sequences, regulatory regions were identified in the propeptide (Figure 3), including the GD (gastrulation defective, IPR031986) domain confirmed in five related peptidases, which are putative orthologs of D. melanogaster gastrulation defective involved in establishment of dorsoventral embryonic polarity [49]. SerP1040 had a Sushi domain, and SerP355 had four LDL and one Sushi. SerP416 had a C-terminal TM domain.

2.6. Serine Peptidase Homologs

Serine peptidase homologs are SP-related proteins whose functional role is still poorly understood. Although they share an SP-like domain and fold, they contain one or more substitutions in the catalytic triad residues, suggesting partial or complete loss of catalytic activity, and new functions of SPHs (such as regulation, inhibition, and immune modulation) may be compensated through an alternative exosite [50]. In total, 125 SPH sequences with various substitutions of the catalytic triad H57, D102, S195 were identified in T. molitor (Table S1). At the catalytic position 57, only 42 proteins had H, and the most common substitution was H57Q in 55 SPHs. At position D102, only 13 substitutions were observed, while S195 was retained in 24 SPHs. In the remaining proteins, S at position 195 was replaced by G (26), T (21), N (11), L (10), V (9), or I (7), while A, M, D, E, K, R, Y, and F each occurred in 1–4 sequences.
Most SPHs had a signal peptide (that is, they are secreted proteins) and are presumably processed by trypsin. In addition, a significant group of proproteins with an unconventional type of processing was also identified, and in some cases, it was difficult even to identify the sequence of the processing site, which is highly conserved in SPs. Most SPHs were anionic proteins with a pI of 4–5 pH units. However, a significant proportion of homologs, mainly SPHs with regulatory domains in the propeptide, had a neutral or alkaline pI. Most of the SPHs (102 sequences) had no regulatory regions in the propeptide, while 21 of the remaining 23 sequences possessed an array of clip domains of A, B, and C types (Figure 4). Two homologs (SerPH570 and SerPH364) were proposed to be associated with the plasma membrane via a type-II transmembrane motif. Their prolonged extracellular region included an array of domains such as the characteristic juxtamembrane SEA (Sperm protein, Enterokinase, and Agrin domain, IPR000082) or Frizzled domains, as well as modules for protein–protein interaction including LDL, EGF (laminin/Epidermal Growth Factor-like domain, IPR002049), and SRCR.

2.7. Polypeptidases

We identified seven T. molitor polypeptidase transcripts that encoded putative proteins comprising two to three tandemly arranged peptidase domains, with regulatory regions located upstream of each peptidase unit, most often represented by two Sushi domains (Figure 4, Table 5). Four of these proteins contained two peptidase-like domains, of which the first (N-terminal) was a chymotrypsin-like SP, while the second (C-terminal) was an SPH. Another related polypeptidase (pSerPH608) contained two SPH domains, and pSerP614 comprised one chymotrypsin-like and two SPH domains. For all six of these secreted proteins, a conserved activation site (L#I) was predicted upstream of each SP/SPH domain. A single transcript encoded a membrane-anchored protein (pSerP1050) containing a trypsin domain and an unusual SPH domain, with seven LDL regulatory regions in total.
Based on data on “polyserases”, human polypeptidases, it can be assumed that upon activation, peptidase domains may be linked to each other by interdomain disulfide bonds [51]. It was also proposed that SPH domains of secreted polyserases would act as dominant negative binding proteins, modulating the function of the first active SP domain. The same proteolytic mechanism can be proposed for T. molitor polypeptidases that resemble human polyserases.

2.8. Phylogenetic Analysis of SPs and SPHs in T. molitor

Phylogenetic analysis of 269 SP-related sequences identified in T. molitor showed that they clustered into two major groups, A and B (Figure 5). Group A (164 sequences), with nine major branches identified (A1–A9), included both SPs and SPHs without regulatory domains in the propeptide. The A1 clade mainly consisted of trypsins, including the major digestive trypsin SerP1 (see Section 2.9.3), with only a few sequences proposed as chymotrypsin-like and non-annotated peptidases. Clade A2 included putative trypsins and a single homolog (SerPH43) with a carboxy-terminal hydrophobic extension that resembles the corresponding region of the vertebrate peptidases prostasin and testisin, which are post-translationally modified via a glycosylphosphatidylinositol (GPI) linkage responsible for cell–surface association of these SPs [52,53]. Additionally, two SPs with an extended hydrophobic C-terminus from clades A9 (SerP423) and B4 (SerP416) also likely represent distinct GPI-anchored enzymes with unknown specificity. Here, for the first time, we present a group of putative insect analogs of vertebrate regulatory GPI-anchored SPs, of which prostasin also has trypsin-like specificity [54]. In some SP sequences, the hydrophobic regions were longer and were confidently predicted as TM by programs such as Phobius (SerP125, SerP84, SerP76, SerP416), while in other sequences, these regions were predicted only by the TM DOCK program (SerP48, SerP104, SerPH43) or with rather low probability. The A3 and A4 clades included predominantly chymotrypsin-like peptidases and related homologs. Chymotrypsin-related sequences from clade A4 represent another insect-specific group containing the acidic residue D226 located on the wall of the S1 pocket (see Section 2.3), but displaying chymotrypsin specificity [38,43,44], in contrast to crab homologs with the same S1 subsite triad [45,46], which efficiently hydrolyze both trypsin and chymotrypsin substrates.
It is interesting to note that most of the related homologs from clades A3 and A4 also shared acidic (D or E) residues at position 226 of their primary specificity pocket (Table S1). Clades A5, A6, and A7 included numerous SPHs, which likely evolved by multiple duplication events. All 18 predicted elastase-like SPs were scattered among the 87 SPHs, which, similar to the elastases, mostly shared large aliphatic residues (V/I) at position 216 of their S1 binding pocket. In clade A7, there were also four chymotrypsin-like SPs. One of them, SerP69, the major digestive chymotrypsin, had an S1 binding subsite (SGS) similar to that of bovine chymotrypsin-like elastase 2a Q29461 and was biochemically shown to lack the ability to cleave short peptide substrates [27,29], in contrast to another digestive chymotrypsin-like enzyme, SerP38, from clade A4 [44]. Clade A8 contained putative chymotrypsin-like SPs mainly with the GGS primary specificity determinant. The A9 clade also included chymotrypsin-like SPs, but with the GSG structure of the S1 binding subsite, as well as the trypsin SerP6 and the unusual non-annotated peptidase SerP423; all these SPs were characterized by an acidic pI.
Group B contained 105 sequences, most of which possessed one or more regulatory domains in the extended propeptide. Clip domains represent the most abundant non-catalytic structural units and were predicted for 60 of these sequences, divided into four major groups (clip-A, -B, -C and -D) based on clip sequence similarity [41]. Fifteen clip-A proteins, exclusively represented by non-active SPHs, were clustered together into a single clade B5 including prophenoloxidase (pPO)-activating factor II PPAF II (SerPH415) [55]. The clip-A domain folds as an irregular β-sheet [56], which is likely characteristic of all of these related SPHs. Clip-B and clip-C proteins from clades B3 and B2, respectively, represented mainly by trypsins and a few SPHs, likely shared a more typical clip domain fold composed of an antiparallel distorted β-sheet flanked by two α-helices [57]. It is established that clip-C SPs activate terminal clip-B peptidases of the extracellular immune signaling pathway, which cleave the effector molecules pPO or procytokine proSpätzle [41]. In T. molitor, these peptidases were identified [58], and the clip-C trypsin (SerP228) named Tm-SAE is in clade B2. Clip-C SerP228 activates the terminal clip-B trypsin Tm-SPE (SerP183) from clade B3, which in turn activates pPO and its inactive cofactor SerPH415 (clade B5), or proSpätzle in the Toll signaling pathway [4]. It must be noted that one clip-B SP from clade B3 (SerP275) contained two clip-B domains. Clip-D trypsins, mainly located in clade B9, possessed a propeptide highly variable in length and sequence (108–548 aa), often including prolonged disordered regions downstream of the N-terminal clip domain. A clip-D peptidase HP1 of M. sexta is proposed as an unusual component of immunity associated with the signaling pathway [59].
The B1 clade included two trypsin-like peptidases with the CUB domain in the propeptide. CUB domains, which are involved in protein–protein interaction, are characteristic of an array of chymotrypsin family SPs such as mammalian complement subcomponents (C1r/C1s), enterokinase, and matriptase. CUB domain SPs are essential for a diverse range of functions from immune regulation to digestion, development, and morphogenesis in vertebrates [60,61], but their role in insects still needs further research. A highly supported clade B6 contained peptidases with Sushi domains including the majority of polypeptidases and chymotrypsin-like modular SP Tm-MSP (SerP448) that initiates proteolytic signaling cascades activating clip-C trypsin Tm-SAE (SerP228) from clade B2 [4,62]. Clade B7 contained five peptidases with the gastrulation defective (GD) domain. In the D. melanogaster embryo, GD SP participates in the developmental Toll signaling pathway [63]. Clade B8 included sequences of long SP-related proteins with a highly variable set of regulatory domains in the propeptides such as Tequila (SerP55), Corin (SerP285), Nudel (pSerP1050), TSP (SerP11), and membrane-associated homologs SerPH364 and SEA (SerPH570). Clade B10 contained predominantly trypsins expressed at low levels at the stages of embryogenesis and metamorphosis (see Section 2.9.1 and Section 2.9.2). Interestingly, in a tree constructed using only peptidase domain sequences without prepropeptides (Figure S1), major branches with minor variations are retained, including a clade containing the peptidases with the longest propeptides (B8).

2.9. Expression Profiling of SP and SPH Genes in Different Life Stages of T. molitor

To infer the functional role of the described diversity of SPs/SPHs in various physiological processes, we analyzed expression patterns of their transcripts at different stages of the T. molitor life cycle, including eggs, larvae of the II instar, larvae of the IV instar, early and late pupae, and male/female adults. Data for the most highly expressed transcripts at the egg, pupal and feeding larval and adult stages are presented in Table 6, Table 7 and Table 8, respectively, while the expression data for all transcripts are shown as heatmaps in Figure 6, where they are combined into six groups. Group 1—SPs without regulatory domains, expressed at the feeding stages of larvae and adults; group 2—SPs without regulatory domains, expressed in eggs and pupae; group 3—SPHs without regulatory domains, expressed at the feeding stages; group 4—SPHs without regulatory domains, expressed in eggs and pupae; group 5—SPs/SPHs with clip domains; group 6—SPs/SPHs with other regulatory domains.

2.9.1. Embryonic Stage: Eggs

Most of the SPs/SPHs with relatively high mRNA expression levels in the embryonic stage belonged to regulatory proteins, as they contained regulatory clip and GD domains (Table 6, Figure 6(5a,g,6b)). The maximum level of expression in eggs was observed for the clip-A SPHs SerPH236 and SerPH235, with lower levels at other stages. Transcripts with egg-specific expression showed slightly lower expression levels. Those included clip-B trypsins (SerP166 and SerP116) and SPH (SerPH203), as well as a clip-A SPH SerPH165. Clip-C SPs with moderate expression (SerP145 and SerP61) as well as rather low-expressed SPs with a GD domain (SerP550 and SerP442) demonstrated constitutive expression across most of the stages, with a predominance in eggs.
Transcripts without identified regulatory domains in the propeptide with rather low expression levels (Table 6, Figure 6(2a,c)), as well as two SPs with GD domains (SerP466 and SerP454) and clip-A SerPH389, also demonstrated constitutive expression including the egg stage, but with increased levels in the late pupae and IV instar larvae. It should be noted that within this group, three trypsins (SerP28, SerP22, and SerP5) had extended propeptides without known regulatory regions, which may indicate the presence of as yet unidentified regulatory domains and, accordingly, of specific functions that remain to be defined. The only SP in this group with a short propeptide without regulatory domains was a putative elastase, SerP156, which could be involved in hydrolytic functions in the egg, such as vitellin hydrolysis.

2.9.2. Metamorphosis: Early and Late Pupae

Most of the highly expressed SP/SPH transcripts at the pupal stages, as at the egg stage, contained regulatory domains, and among them, the majority were SPHs with a clip-A domain (Table 7, Figure 6(5d)). In general, SP/SPH transcripts were expressed at both pupal stages, but the levels of expression were higher at the late pupae, and most of the transcripts were also expressed at varying levels across the entire life cycle. The exceptions were the transcript of the anionic trypsin SerP35 (Table 7, Figure 6(2a)) with a short propeptide, which was specific only for the early pupal stage, and clip-A SerPH78 (Table 7, Figure 6(5g)), which was expressed predominantly at the early pupae. However, the highest level of expression at the early pupae was observed for the transcripts of the homologs SerPH164 with a clip-A domain and SerPH1034 (Table 7, Figure 6(4a)) without regulatory domains, which were upregulated at the late pupae. Noticeable levels of expression were also observed here for transcripts of the SerPH364 homolog and the SerP55 Tequila peptidase (Table 7, Figure 6(6b)), both with a large number of regulatory domains in the propeptide.
At the late pupae, in contrast to the early pupae, trypsin SerP28 (Table 7, Figure 6(2a)) had the highest level of expression together with two peptidases with regulatory domains, SerP247 (Table 7, Figure 6(5d)) and SerP466 (Table 7, Figure 6(6b)). The latter belonged to unannotated peptidases, had a GD regulatory domain, and was also actively expressed at the egg stage. The only transcript that was actively expressed at the late pupal stage and was not expressed in the early pupae belonged to the single elastase-like SerP156 (Table 7, Figure 6(2a)) without a regulatory domain. Non-regulated peptidases of this type, SerP156 and the early pupa-specific SerP35, may be involved in tissue remodeling at particular pupal stages. Interestingly, SerP156, as well as trypsin SerP28, were among the highly expressed peptidases at the egg stage, and their transcripts were also upregulated at larval stages IV and II, respectively.

2.9.3. Feeding Stages: Larvae and Imago (Adults)

The largest part of the SP/SPH transcripts was expressed at the feeding stages, larvae (II and IV instars) and adults (females and males) (Table 8, Figure 6(1,3)), whereas at the developmental stages, eggs and pupae, their genes were practically silent, which most likely indicates the involvement of these SPs/SPHs in the digestive process. This involvement is also confirmed by the data on the high level of expression of these transcripts in the larval gut transcriptome (Table 8). Almost all these transcripts coded for preproenzymes with a small propeptide without regulatory regions. In most cases, they were processed by trypsin after C-terminal R of the propeptide. The highest levels of expression were from active SPs (Table 8, Figure 6(1)), although highly expressed transcripts at feeding stages were also present in the large group of SPHs (Table 8, Figure 6(3)).
Among the 61 transcripts of SPs with the classical catalytic triad HDS expressed at one or more feeding stages (Figure 6(1)), several subgroups with similar expression profiles could be distinguished. Subgroup 1a: SPs with a high level of transcript expression at all feeding stages; 1b: SPs with a high level of expression at IV instar larvae and imago stages; 1c: SPs expressed only at adult stages; 1d: SPs with a high level of transcript expression mainly at the IV instar larvae.
Subgroup 1a contained the most highly expressed transcripts of digestive SPs (Figure 6(1a)). The majority of them (10) encoded chymotrypsin-like SPs including the earlier characterized major digestive chymotrypsin SerP69 with an extended binding site [29], two transcripts encoded trypsins including the major digestive trypsin SerP1 [28], and two were elastase-encoding transcripts (SerP85, SerP288). The transcript of chymotrypsin-like SerP108 was characterized by an extremely high level of expression at the early larval stage (Table 8). A similar expression profile was demonstrated by chymotrypsin-like SerP314 and trypsin SerP16. All SPs from subgroup 1a had a pI in the acidic region, with the exception of the major trypsin SerP1 and chymotrypsin SerP69 (Section 2.2 and Section 2.3).
Transcripts from SPs of subgroup 1b expressed at IV instar larvae and adults (Figure 6(1b)) encoded five putative elastase-like SPs, three chymotrypsin-like and three trypsins. The most highly expressed were two elastase-like peptidases, SerP41 and SerP185, and chymotrypsin-like SerP246. The majority of SPs from subgroup 1b also had pI in the acidic region, with the exception of elastase-like SerP74 and trypsin SerP30 (Section 2.2 and Section 2.4). Another trypsin, SerP125, had a C-terminal TM domain.
Transcripts from subgroup 1c encoded SPs expressed predominantly at adult stages. Almost half of the group (five) were non-annotated SPs due to an atypical set of amino acid residues in the S1 subsite (Figure 6(1c), Table 4). The subgroup also included two chymotrypsin-like SPs, three elastase-like, three trypsins, and one trypsin-like SP. All these transcripts had a moderate level of expression with maximum values in non-annotated SerP462 (S1 binding subsite TSF). Interestingly, all non-annotated SPs had alkaline or neutral pI (Section 2.5), while all the other SPs were anionic.
Most transcripts from subgroup 1d coded for SPs expressed predominantly at the IV instar larvae (Figure 6(1d)). The subgroup included 10 chymotrypsin-like SPs, 6 elastase-like SPs, 5 trypsins, and one non-annotated SP. The maximum level was observed for chymotrypsin-like SerP38 with an unusual S1 binding subsite GGD, but exhibiting substrate specificity typical of chymotrypsins (Table 8) [44]. Another transcript with a high level of expression encoded trypsin SerP209. The remaining transcripts had a moderate or low level of expression. All SPs including the non-annotated one had a pI in the acidic region. Three trypsins (SerP76, SerP84, SerP104) with low levels of transcript expression had a C-terminal TM domain (Section 2.2).
It must be noted that we found two peptidases with regulatory domains expressed only at the feeding stages: trypsin SerP282 with clip-C domain and trypsin-like SerP178 with a CUB domain (Figure 6(5e,6c)).
Thus, group 1 of 61 SPs (Figure 6(1)) was related to digestion since their transcripts were expressed predominantly at feeding stages, and included the majority of identified chymotrypsin-like, elastase-like, and non-annotated SPs without regulatory domains. At the same time, only about a half of non-regulatory trypsins had a similar connection with digestion. The general trend of an increase in the expression level of digestive SPs from the early to the late larval instars, documented previously [65,66], was confirmed here for the transcripts encoding the major digestive SPs of T. molitor larvae. Only a few SP transcripts were predominantly expressed at the early larval stage, including the major chymotrypsin-like SerP108, chymotrypsin-like SerP314, and trypsin SerP16 (Table 8, Figure 6(1a)).
In addition to transcripts of active SPs with the classical catalytic active center, 95 transcripts coding for SPHs were predominantly expressed at feeding stages (Table 8, Figure 6(3)), and most of them can be associated with digestive function. The majority of these SPHs, like the SPs expressed at feeding stages, had a small propeptide without regulatory domains and were processed to the mature form by trypsin. The majority of the SPH transcripts were significantly upregulated at the IV larval instar, and the most highly expressed are summarized in Table 8. Almost all SPH transcripts were also detected in adults, although at lower levels, and only about a quarter of the transcripts were also expressed at the II instar larvae. Two SPH transcripts (SerPH393 and SerPH485) had a significant level of expression only at the adult stage (Figure 6(3a)), while no transcripts specific to the II instar larvae were identified. Note that among the highly expressed SPHs (Table 8), there are SerPH122 and SerPH245 with a conservative Ser/Thr substitution in the active center, in contrast to the radical replacements in the other SPHs. Characterization of recombinant SerPH122 showed that this homolog had low but reliably detectable proteolytic activity towards chymotrypsin and trypsin chromogenic peptide substrates [33].
The exact role of SPHs is still poorly understood. Nevertheless, whole genome microarray analysis of T. castaneum larvae revealed that the transcripts of ten SPH genes were upregulated more than 5-fold as compensation for the effects of dietary inhibitors of cysteine and serine peptidases [67]. Also, given the role of clip-A SerPH415 (PPAF-II) in the activation of pPO [55] mentioned in Section 2.8, it may be speculated that the above-described major SPHs induced in the feeding stages are somehow involved in the activation of luminal digestive SPs.

2.9.4. Constitutively Expressed SP-Related Proteins of T. molitor

Another important group of transcripts included SPs/SPHs expressed at several or all stages of the beetle life cycle and presumably participated in important physiological processes such as immune defense, adhesion, regulation of development, and metabolism. Most of the SPs/SPHs with a sufficiently high level of expression at all or most of the life cycle stages had regulatory regions in the sequence structure (Figure 6(5d–g,6b)), and only about one third of the transcripts lacked regulatory domains (Figure 6(2,4)), the majority of which were active SPs.
SPs/SPHs expressed at all stages included the ones with a clip domain of different subtypes (clip-A, clip-B, clip-C, and clip-D) (Figure 6(5d–g)). The majority of peptidases with the Sushi domain were also expressed at all or most stages of the life cycle. They included polypeptidases and MSP-like SPs (chymotrypsin-like SerP449 and non-annotated SerP355) containing a Sushi domain and four LDL domains (Figure 6(6b,c)). Peptidases containing the GD domain had similar expression profiles. Four out of five of them (SerP442, SerP454, SerP466, SerP550) were expressed at all stages of development with their higher level at the egg and pupa stages, and SerP466 was also found to be highly expressed in adult females. Trypsins, which had a complex multidomain structure, were also expressed at most stages of the life cycle. These peptidases included SerP55 (Tequila), SerP285 (Corin), and SerP11 (TSP).

3. Discussion

SP-related proteins of S1A family identified in the T. molitor transcriptome include 269 sequences of which 137 were identified as active SPs with classical catalytic residues, and 125 were annotated as putative non-active SPHs that possess one or more substitutions in the catalytic triad. Seven deduced sequences containing several SP/SPH domains were putative polypeptidases, for which the physiological role remains generally unknown. T. molitor SPs/SPHs of the S1A chymotrypsin family occupy an intermediate position among insects in terms of the number of identified sequences. A comparable number of SP-related sequences (257) was described for D. melanogaster (Diptera: Brachycera) [12], whereas in mosquitoes A. aegypti and A. gambiae (Diptera: Nematocera), genome-wide analysis identified 369 and 337 SP-related sequences, respectively [12,15]. A significantly lower number with only 44 identified sequences of putative SPs/SPHs was described in A. mellifera (Hymenoptera) [23].
In T. molitor, 84 SPs and 102 SPHs without regulatory domains constitute the largest group of SP-related proteins. Transcripts of 61 SPs were expressed only in the feeding life stages; 24 of them were highly expressed in the larval gut and presumably play an important role in digestion. Similar quantitative data were previously obtained for other insects including larvae of D. melanogaster (53 gut peptidases of which 35 were highly expressed) [12], A. gambiae (63 and 27, respectively) [14], and M. sexta (61 and 35, respectively) [18]. But even closely related insects have functional differences in the general set of digestive SPs; for example, the most highly expressed SP in T. molitor is trypsin SerP1, and in T. castaneum it is chymotrypsin XP_970603.1, although their major digestive cysteine peptidases are orthologs with 74% identity [68]. At the same time, there is a close link between the primary structure of the certain digestive SPs and their functions. Accordingly, a comparison of two orthologous pairs of T. molitor and T. castaneum chymotrypsin-like digestive SPs, SerP38 and CBC01177 (pair I, respectively), and SerP88 and CBC01166 (pair II, respectively), shows that pair I was expressed at the larval and adult stages, while pair II was expressed only in the larval gut [69].
The remaining 23 transcripts of SPs without regulatory domains showed constitutive or specific expression at certain stages of T. molitor development. The physiological role of most of these SPs requires further study, but it can be assumed that SPs showing high expression at the egg stage participate in the hydrolysis of storage proteins, as was previously shown for B. mori [1], while the SPs expressed at the pupal stages of T. molitor can be involved in the breakdown of the larval structures during metamorphosis.
In addition to SPs, the largest group of 95 T. molitor SPH sequences lacking regulatory regions were also expressed predominantly during feeding stages. The physiological role of SPHs is still poorly understood; however, those highly expressed during the feeding stages (9 out of 95) may play a regulatory role related to the activation of digestive peptidases or to their interaction with substrates or inhibitors in the midgut lumen. It was shown that some of the homologs are able to bind substrates and even hydrolyze them at a low rate [33,50].
Another group of SP-related proteins identified included 53 sequences of SPs and 23 SPHs with regulatory domains, such as different clips, LDL, SRCR, TSP, and others. While having significantly lower expression levels than those of the gut digestive peptidases, most of them demonstrated constitutive expression throughout the entire life cycle, while specific SPs and SPHs with various regulatory domains demonstrated increased expression at the egg or pupal stages.
Among these sequences, clip SPs/SPHs were the most numerous. Of the 60 SPs/SPHs with a clip domain that we identified in T. molitor, 16 belonged to the clip-A type (all SPHs), 16 to clip-B (13 SPs and 3 SPHs), 17 to clip-C (15 SPs and 2 SPHs), and 11 to clip-D (all SPs). The total number of 60 clip SPs/SPHs is close to the 54 sequences identified in the closely related T. castaneum [12,34], while about twice as many clip SPs/SPHs, including a distinct subtype of clip-E SPs, were identified in the mosquitoes A. aegypti and A. gambiae [3,14]. According to the available data, SPs/SPHs with clip domains are non-digestive and are present in the hemolymph of insects and other arthropods. They play an important role in the regulation of various physiological processes in insects, such as innate immune responses leading to activation of pPO necessary for melanization, activation of the Toll-dependent signaling pathway leading to synthesis of antimicrobial peptides [41], and regulation of the dorsal–ventral pattern in D. melanogaster embryos [70]; they also regulate the coagulation cascade during hemolymph clotting in crabs [71].
The majority of T. molitor clip-containing transcripts were expressed at all or most stages of the ontogeny, but three of them were specific to the egg stage (SerP116 and SerP166 and SerPH203), while at the pupae stage, only increased expression of constitutively expressed clip transcripts was observed. The egg stage specificity of SerP116 and SerP166 was also described by Wu et al. [34] using RT-PCR analysis. The only experimental data on the specific roles of clip SPs/SPHs in T. molitor came from B.L. Lee’s laboratory, where the extracellular larval activation cascade of the Toll receptor and pPO was characterized in detail [4,55,58,62]. The proteolytic part of the cascade starts with SerP449 with multiple regulatory domains (MSP), which activates the downstream proSerP228 with clip-C domain (proSAE), which in turn activates proSerP183 (proSPE) with clip-B involved in proSpätzle or pPO activation, but processing of pPO requires additional activation of clip-A homolog proSerPH415.
The remaining smaller part of T. molitor SPs/SPHs had different regulatory domains. Transcripts of SPs with a GD domain were expressed constitutively throughout the entire T. molitor life cycle including eggs and pupae, and all of them were from the non-annotated group of SPs. Similar peptidases with the GD domain were well studied in D. melanogaster, but for the egg stage only [49,63,70]. The stable constitutive mRNA expression of these peptidases in T. molitor transcriptomes indicates their possible participation in a wide range of physiological processes in addition to the expected involvement in the cascades forming embryonic polarity during egg development. Another transcript of a large SP Tequila (SerP55) with a variety of regulatory domains was upregulated during the T. molitor pupal and adult stages, and in D. melanogaster this SP was found throughout development participating in immunity response [72].
One of the most interesting groups in T. molitor was the polypeptidases, mainly expressed at the pupal and adult stages. Six of them comprise two or three SP/SPH domains and several Sushi domains (Sushi(2)-SP(H)-Sushi(2)-SPH(-Sushi-SPH)). A similar domain architecture, including several peptidase domains and several Sushi domains, is found in the peptidase SP14 of T. castaneum [12]. In A. gambiae, several polypeptidases with a slightly different structure were identified (SP(H)-SPH-clipE-SPH) [14]. In addition, a polypeptidase Nudel (pSerP1050) was also found in T. molitor; it contained two peptidase domains, a trypsin domain and an SPH domain, together with LDL domains. Similar Nudel (LDL(2)-SP-LDL(2)-SPH-LDL(3)) peptidases were identified in many insects [12,14,21]. In the D. melanogaster embryo, Nudel initiates the peptidase cascade related to dorsal–ventral patterning [70]. Thus, complex polypeptidases were found in insects, but further study is required to accurately identify the structure and functions of such proteins.
The great diversity and abundance of serine peptidases of the chymotrypsin S1A family in various insects provide great opportunities for a more detailed study of insects important for agriculture and/or medicine, and for a fundamental understanding of their physiology. We hope that our study will allow scientists to move in this direction.

4. Materials and Methods

4.1. Preparation of Biological Material, RNA Isolation and cDNA Sequencing

Whole-body transcriptomes from different stages of the life cycle of T. molitor were obtained from the laboratory colony at the Lomonosov Moscow State University (Moscow, Russia), maintained on milled oat flakes at 26 ± 0.5 °C and 75% relative humidity, 0L:24D. Insects were subcultured from the stock colony to obtain specific life stages. Larvae of the II and IV instars were collected one and five weeks post hatch. Early, not yet pigmented pupae were sampled immediately after the moult, and late pupae at the midpoint of the pupal instar (10 days post moult). Adults used for the analysis were sampled two weeks after eclosion (males and females separately). Eggs were sifted out of the diet 24–48 h after oviposition. Eggs, larvae II and larvae IV, early and late pupae, and adult males were collected as two independent biological samples, and adult females were taken in three replicates. RNA was extracted using the RNeasy Mini kit (Qiagen, Hilden, Germany). Immediately prior to isolation, the samples were homogenized by trituration in liquid nitrogen. The concentration of isolated RNA was measured on a Qubit (Thermofisher, Waltham, MA, USA) fluorimeter using a set of reagents for high-sensitivity RNA analysis. The integrity of the RNA was checked by capillary electrophoresis on a Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA). The NEBNext RNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) was used to prepare the libraries according to the recommended protocol with a fragmentation time of 5 min. Sequencing of the T. molitor developmental stage libraries was performed on an Illumina HiSeq 2000 (Lomonosov Moscow State University, Moscow, Russia) using the TruSeq SBS Kit v3 reagent kit (200 cycles) with the following settings: read length 101, index read length 7, reverse reading length 101. The preprocessed samples contained from 7 million to 24 million reads.
Preparation of biological material, RNA isolation, and cDNA sequencing for gut transcriptome data from T. molitor larvae were performed as described earlier [68]. Approximately 240 million sequence reads were obtained, with an approximate 250 bp insert.

4.2. Transcriptomes Assembly

Three different types of T. molitor transcriptome assemblies were used in the research.

4.2.1. Assembly of Larval Gut Sequences

Assembly of T. molitor larval gut sequences was performed de novo with SeqManNGen (v. 4.0.1.4, DNAStar, Madison, WI, USA) as described in [68]. It included NCGR assembly from all replicates, resulting in 197,800 contigs (N50 = 2232) combined with previous databases of Sanger sequencing [27] and pyrosequencing [30] of mRNA from the larval gut.

4.2.2. Assemblies of Different Developmental Stages

In the transcriptomes of different developmental stages, the quality of the reads was assessed by the MultiQC program (https://multiqc.info) (accessed on 17 November 2023) [73] and preprocessed in Trimmomatic to remove adapters and filter short and low-quality reads (ILLUMINACLIP:TruSeq3-SE:2:30:10, MINLEN:30, SLIDINGWINDOW:5:20) [74]. The reads were mapped to the total transcriptome of T. molitor using HISAT2 [75] with mapped reads rate ranging from 84% to 93%. Assembly of transcripts was performed by the Cufflinks program [76] and abundance estimation was assessed with StringTie (-B option) [77].

4.2.3. The Total T. molitor Transcriptome Assembly

The total T. molitor transcriptome assembly was performed with SeqManNGen (v 15.0.0.160, default parameters) and included the gut assembly (240 million reads) (Section 4.2.1) combined with the Illumina sequencing data obtained for T. molitor developmental stages (Section 4.2.2) (628 million reads). In total, 342,592,161 reads were assembled, 143,807,206 reads were not assembled, and 382,435,025 were removed during sampling due to read depth. Reads were assembled into 130,559 contigs, of which 36,463 were longer than 1 kb.
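As a quick consistency check, the three read counts quoted above can be summed and compared with the approximate input sizes (240 and 628 million reads). The sketch below uses only the numbers given in the text; the variable names are illustrative and are not part of the original pipeline.

```python
# Read counts as reported in the text.
assembled = 342_592_161
not_assembled = 143_807_206
removed_for_depth = 382_435_025

# Approximate input sizes quoted in the text, in millions of reads.
gut_reads_millions = 240
stage_reads_millions = 628

total_reads = assembled + not_assembled + removed_for_depth
assembled_fraction = assembled / total_reads

if __name__ == "__main__":
    print(f"total reads accounted for: {total_reads:,}")
    print(f"quoted input (millions):   {gut_reads_millions + stage_reads_millions}")
    print(f"fraction assembled:        {assembled_fraction:.1%}")
```

The three categories add up to roughly 869 million reads, which agrees with the quoted 240 + 628 million within rounding.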

4.3. SP/SPH Identification in the Transcriptomes

BLAST [78] was used to identify ORFs homologous to those encoding SP/SPH. The sequence of human trypsin 2 (UniProt AC P07478) was used as the initial query, and the T. molitor SPs/SPHs identified from different groups were then used as queries to search for further sequences. Multiple sequence alignment with BioEdit (v. 7.0.5) [79] was used to refine and build consensus sequences; in the case of SNPs, the amino acid chosen was the most frequent one, provided that it accounted for more than 50% of the total. ORFs that were grouped into blocks with identity of at least 95% and that overlapped with another block by at least 10 amino acid residues were considered as referring to a unique peptidase. The resulting sequences were compared with those available in three newly sequenced T. molitor genome versions (PRJNA820846: GCA_027725215.1; PRJNA579236: GCA_014282415.3; PRJEB44755: GCA_907166875.3) [80,81].
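The majority rule used for variable positions can be illustrated with a short sketch. It assumes pre-aligned sequences of equal length; the function names, the handling of gaps (ignored when counting), and the placeholder "X" for positions without a majority above 50% are assumptions made for illustration, not details taken from the original workflow, which relied on BioEdit.

```python
from collections import Counter


def consensus_column(residues, gap="-", unknown="X"):
    """Return the residue found in more than 50% of non-gap entries.

    Gaps are ignored (an assumption); if no residue exceeds 50%,
    the hypothetical placeholder `unknown` is returned.
    """
    counts = Counter(r for r in residues if r != gap)
    if not counts:
        return gap
    residue, n = counts.most_common(1)[0]
    total = sum(counts.values())
    return residue if n > total / 2 else unknown


def consensus(aligned_sequences):
    """Build a consensus from aligned sequences of equal length."""
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(consensus_column(col) for col in zip(*aligned_sequences))


if __name__ == "__main__":
    toy_alignment = ["IVGGY", "IVGGY", "IVGAY", "IV-GY"]
    print(consensus(toy_alignment))
```

In the toy alignment, the fourth column contains G three times and A once, so G is kept because it exceeds the 50% threshold.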

4.4. Analysis of Protein Sequences

Positions of propeptide cleavage site, active site, and S1 substrate-binding subsite residues were predicted by sequence homology through alignment with mature human trypsin 2 (UniProt AC P07478) using BioEdit and the Clustal Omega multiple sequence alignment tool (https://www.ebi.ac.uk/Tools/msa/clustalo/) (accessed on 1 April 2024) [82]. Signal peptide was predicted with the SignalP 5.0 server (https://services.healthtech.dtu.dk/services/SignalP-5.0/) (accessed on 1 April 2024) [83]. Transmembrane region was predicted with the TMHMM Server (v.2.0) (https://services.healthtech.dtu.dk/services/TMHMM-2.0/) (accessed on 1 April 2024) [84], the Phobius webserver [85], and the TMDOCK server (https://membranome.org/tmdock) (accessed on 5 April 2024) [86]. Domain structure was analyzed using InterProScan (http://www.ebi.ac.uk/interpro/) (accessed on 1 April 2024) [87] and NCBI CDD databases (http://www.ncbi.nlm.nih.gov/Structure/cdd/docs/cdd_search.html) (accessed on 1 April 2024) [88]. Clip domains were identified with InterProScan; however, some clip domains were identified manually by checking the amino acid sequence of the protein for the presence of a Cys doublet in the region close to the peptidase or peptidase-like domain, with four additional Cys residues upstream of the doublet. This combination was designated as clip [41]. The molecular mass and isoelectric point of the mature enzyme of each predicted protein were computed using the ExPASy server (https://web.expasy.org/compute_pi/) (accessed on 1 April 2024) [89]. To annotate the substrate specificity of SPs, the sequences were aligned and divided into several types (trypsins, trypsin-like, chymotrypsin-like, elastase-like, and non-annotated) according to the residues in the S1 substrate-binding subsite at positions 190, 216, and 226 (chymotrypsin numbering) [9].
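The manual clip check described above (a Cys doublet near the peptidase domain with four more Cys residues upstream) can be approximated by a simple pattern test. This is only a rough sketch: the function name, the 60-residue window used to express "close to the peptidase domain", and the absence of any constraint on spacing between cysteines are assumptions for illustration, and such a test would serve at most as a prefilter before inspection of the alignment.

```python
def has_clip_like_cysteines(propeptide, window=60, min_upstream_cys=4):
    """Heuristic test for a clip-like cysteine pattern in a propeptide.

    The propeptide is assumed to end where the peptidase domain starts.
    A "CC" doublet is searched for within the last `window` residues
    (hypothetical cutoff), and at least `min_upstream_cys` cysteines
    must occur anywhere upstream of that doublet.
    """
    seq = propeptide.upper()
    start = max(0, len(seq) - window)
    pos = seq.find("CC", start)
    while pos != -1:
        if seq[:pos].count("C") >= min_upstream_cys:
            return True
        pos = seq.find("CC", pos + 1)
    return False


if __name__ == "__main__":
    # Toy sequences, not real T. molitor propeptides.
    print(has_clip_like_cysteines("ACAACAAACAAACAAAACCAAA"))  # four Cys before the doublet
    print(has_clip_like_cysteines("ACAACAAACAAAACCAAA"))      # only three Cys before the doublet
```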

4.5. Phylogenetic Analysis

Multiple SP/SPH sequence alignments were performed using MAFFT version 7 (https://mafft.cbrc.jp/alignment/server/) (accessed on 5 April 2024) [90] with default parameters. The phylogenetic tree was constructed by the maximum likelihood method on the IQ-TREE server in ultrafast bootstrap mode with 1000 replicates [91]. FigTree 1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/) (accessed on 5 April 2024) was used to visualize the phylogenetic trees.

4.6. Expression Profiling of SP and SPH at Different Developmental Stages

The expression values were calculated for assembled and refined sequences of complete peptidase mRNAs obtained from T. molitor transcriptomes and genomes (Section 4.3). To obtain expression values for peptidase mRNAs as reads per kilobase per million mapped reads (RPKM) [92], a custom script based on tBLASTn was used, counting each multiread as one unit. RPKM values in biological repeats were averaged for each stage of the life cycle. The transcript of eukaryotic translation initiation factor 3 subunit B (NCBI ID: CAH1377306) was used as a housekeeping reference. Hierarchically clustered gradient heat maps of log2(RPKM+1) values were plotted using TBtools [93]. A Kruskal–Wallis test [94] was conducted among the life stages (df = 5) using total RPKM values on the Statistics Kingdom web server (https://www.statskingdom.com/index.html) (accessed on 25 March 2024) [95]. The resulting p-values were adjusted using the Benjamini and Hochberg approach [64].
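The quantities named in this section can be illustrated with a short Python sketch: the RPKM normalization [92], averaging of biological repeats followed by the log2(RPKM + 1) transform used for the heat maps, and the Benjamini and Hochberg adjustment [64]. This is not the custom script used in the study; the function names and the example numbers are hypothetical, and the Kruskal–Wallis test itself (run on a web server in this work) is not reimplemented here.

```python
import math
from typing import List


def rpkm(read_count: float, transcript_length_bp: float,
         total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def mean_log2_rpkm(replicate_rpkms: List[float]) -> float:
    """Average RPKM over biological repeats, then apply log2(RPKM + 1)."""
    mean_value = sum(replicate_rpkms) / len(replicate_rpkms)
    return math.log2(mean_value + 1)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical numbers: 500 reads on a 1000 bp transcript,
    # 10 million mapped reads in the library.
    print(rpkm(500, 1000, 10_000_000))
    print(mean_log2_rpkm([1.0, 5.0]))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Adding 1 before taking the logarithm keeps transcripts with zero reads at a defined value of 0 on the heat map scale.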

5. Conclusions

Serine peptidases (SPs) and their homologs (SPHs) of the S1A subfamily constitute a very diverse group of mostly secreted proteins involved in a variety of processes, including digestion as well as the regulation of development and innate immunity. A thorough analysis of several transcriptomes and two newly sequenced genomes of T. molitor allowed us to update the available information and identify 269 SPs and SPHs in this insect, performing sequence analysis and annotation, reconstructing phylogenetic relationships, and evaluating expression patterns across the entire life cycle. For 122 SPs, putative trypsin-, chymotrypsin- and elastase-like specificities were predicted from the S1 binding subsite sequence analysis, while for 15 non-annotated SPs, specificity remains obscure due to peculiarities of their S1 subsite structure. All studied SP-related sequences of T. molitor were grouped according to the organization of their propeptide region. The largest group of 84 SPs and 102 SPHs had no regulatory domains, while the remaining 53 SPs and 23 SPHs had different regulatory domains in the propeptide. Transcripts of 61 SPs without regulatory domains were expressed only in the feeding life stages and are likely involved in digestion. The remaining 23 transcripts of SPs without regulatory domains showed mostly constitutive expression, while those upregulated at the egg and pupa stages may be involved in the hydrolysis of storage proteins and in the breakdown of the larval structures during metamorphosis, respectively. In addition to SPs, the largest group of 95 T. molitor SPH sequences lacking regulatory regions was also expressed predominantly during the feeding stages, and their physiological role is presumably related to the digestive process; in particular, they may interact with substrates or inhibitors in the midgut lumen.
The group of SPs and SPHs with regulatory domains contained four types of clip domains (A–D) as well as GD, Sushi, LDL, SEA, PAN, FZ, TSP, EGF, CUB, SRCR, and CBM domains in the propeptide. Transcripts of the majority of these proteins were expressed constitutively throughout the entire life cycle of T. molitor, while some of them were specific to the egg stage and/or upregulated at the pupal stage. For most of these regulatory SP/SPH transcripts, a significantly lower expression level was documented than for the above-described transcripts associated with digestive functions. One of the most interesting groups in T. molitor comprised seven polypeptidases, mainly expressed at the pupal and adult stages. Most of them contain two or three SP/SPH domains and several Sushi domains. Similar complex polypeptidases have been identified in a few insect species, but this group of proteins requires further study in order to accurately identify their structure and functions. The data obtained provide valuable information for further studies on the biological functions of the diverse S1A peptidase subfamily in insects.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms25115743/s1.

Author Contributions

Conceptualization, E.N.E.; validation, V.F.T.; investigation, N.I.Z. and R.S.S.; data curation, K.S.V.; writing—original draft preparation, N.I.Z.; writing—review and editing, K.S.V., E.N.E. and Y.E.D.; visualization, N.I.Z.; supervision, M.A.B.; funding acquisition, E.N.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Russian Foundation for Basic Research, grant number 20-54-56044 Iran T (issued to E.N.E.).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Raw sequencing data can be accessed through the SRA database. SRA site: https://dataview.ncbi.nlm.nih.gov/object/PRJNA1099774?reviewer=erel6q9bo7vvv7n3ii6hl8c0na (accessed on 1 April 2024).

Acknowledgments

We are grateful to Anastasia A. Zharikova and M. Kosimov for valuable advice on editing individual sections of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ikeda, M.; Yaginuma, T.; Kobayashi, M.; Yamashita, O. cDNA cloning, sequencing and temporal expression of the protease responsible for vitellin degradation in the silkworm, Bombyx mori. Comp. Biochem. Physiol. B 1991, 99, 405–411.
  2. Choo, Y.M.; Lee, K.S.; Yoon, H.J.; Lee, S.B.; Kim, J.H.; Sohn, H.D.; Jin, B.R. A serine protease from the midgut of the bumblebee, Bombus ignitus (Hymenoptera: Apidae): cDNA cloning, gene structure, expression and enzyme activity. Eur. J. Entomol. 2007, 104, 1–7.
  3. Waterhouse, R.M.; Kriventseva, E.V.; Meister, S.; Xi, Z.; Alvarez, K.S.; Bartholomay, L.C.; Barillas-Mury, C.; Bian, G.; Blandin, S.; Christensen, B.M.; et al. Evolutionary dynamics of immune-related genes and pathways in disease-vector mosquitoes. Science 2007, 316, 1738–1743.
  4. Kan, H.; Kim, C.H.; Kwon, H.M.; Park, J.W.; Roh, K.B.; Lee, H.; Park, B.J.; Zhang, R.; Zhang, J.; Söderhäll, K.; et al. Molecular control of phenoloxidase-induced melanin synthesis in an insect. J. Biol. Chem. 2008, 283, 25316–25323.
  5. Jiang, H.; Vilcinskas, A.; Kanost, M.R. Immunity in lepidopteran insects. Adv. Exp. Med. Biol. 2010, 708, 181–204.
  6. Veillard, F.; Troxler, L.; Reichhart, J.M. Drosophila melanogaster clip-domain serine proteases: Structure, function and regulation. Biochimie 2016, 122, 255–269.
  7. Clark, K.D. Insect Hemolymph Immune Complexes. Subcell Biochem. 2020, 94, 123–161.
  8. Contreras, E.G.; Glavic, Á.; Brand, A.H.; Sierralta, J.A. The serine protease homolog, scarface, is sensitive to nutrient availability and modulates the development of the Drosophila blood-brain barrier. J. Neurosci. 2021, 41, 6430–6448.
  9. Perona, J.J.; Craik, C.S. Structural basis of substrate specificity in the serine proteases. Protein Sci. 1995, 4, 337–360.
  10. Bao, Y.Y.; Qin, X.; Yu, B.; Chen, L.B.; Wang, Z.C.; Zhang, C.X. Genomic insights into the serine protease gene family and expression profile analysis in the planthopper, Nilaparvata lugens. BMC Genom. 2014, 15, 507.
  11. Ross, J.; Jiang, H.; Kanost, M.; Wang, Y. Serine proteases and their homologs in the Drosophila melanogaster genome: An initial analysis of sequence conservation and phylogenetic relationships. Gene 2003, 304, 117–131.
  12. Cao, X.; Jiang, H. Building a platform for predicting functions of serine protease-related proteins in Drosophila melanogaster and other insects. Insect Biochem. Mol. Biol. 2018, 103, 53–69.
  13. Christophides, G.K.; Zdobnov, E.; Barillas-Mury, C.; Birney, E.; Blandin, S.; Blass, C.; Brey, P.T.; Collins, F.H.; Danielli, A.; Dimopoulos, G.; et al. Immunity-related genes and gene families in Anopheles gambiae. Science 2002, 298, 159–165.
  14. Cao, X.; Gulati, M.; Jiang, H. Serine protease-related proteins in the malaria mosquito, Anopheles gambiae. Insect Biochem. Mol. Biol. 2017, 88, 48–62.
  15. Brackney, D.E.; Isoe, J.; Black, W.C.; Zamora, J.; Foy, B.D.; Miesfeld, R.L.; Olson, K.E. Expression profiling and comparative analyses of seven midgut serine proteases from the yellow fever mosquito, Aedes aegypti. J. Insect Physiol. 2010, 56, 736–744.
  16. Soares, T.S.; Watanabe, R.M.O.; Lemos, F.J.A.; Tanaka, A.S. Molecular characterization of genes encoding trypsin-like enzymes from Aedes aegypti larvae and identification of digestive enzymes. Gene 2011, 489, 70–75.
  17. Cao, X.; He, Y.; Hu, Y.; Zhang, X.; Wang, Y.; Zou, Z.; Chen, Y.; Blissard, G.W.; Kanost, M.R.; Jiang, H. Sequence conservation, phylogenetic relationships, and expression profiles of nondigestive serine proteases and serine protease homologs in Manduca sexta. Insect Biochem. Mol. Biol. 2015, 62, 51–63.
  18. Miao, Z.; Cao, X.; Jiang, H. Digestion-related proteins in the tobacco hornworm, Manduca sexta. Insect Biochem. Mol. Biol. 2020, 126, 103457.
  19. Zhao, P.; Wang, G.H.; Dong, Z.M.; Duan, J.; Xu, P.Z.; Cheng, T.C.; Xiang, Z.H.; Xia, Q.Y. Genome-wide identification and expression analysis of serine proteases and homologs in the silkworm Bombyx mori. BMC Genom. 2010, 11, 405.
  20. Liu, H.; Heng, J.; Wang, L.; Tang, X.; Guo, P.; Li, Y.; Xia, Q.; Zhao, P. Identification, characterization, and expression analysis of clip-domain serine protease genes in the silkworm, Bombyx mori. Dev. Comp. Immunol. 2020, 105, 103584.
  21. Lin, H.; Xia, X.; Yu, L.; Vasseur, L.; Gurr, G.M.; Yao, F.; Yang, G.; You, M. Genome-wide identification and expression profiling of serine proteases and homologs in the diamondback moth, Plutella xylostella (L.). BMC Genom. 2015, 16, 1054.
  22. Yang, L.; Xing, B.Q.; Wang, L.K.; Yuan, L.L.; Manzoor, M.; Li, F.; Wu, S. Identification of serine protease, serine protease homolog and prophenoloxidase genes in Spodoptera frugiperda (Lepidoptera: Noctuidae). J. Asia-Pac. Entomol. 2021, 24, 1144–1152.
  23. Zou, Z.; Lopez, D.L.; Kanost, M.R.; Evans, J.D.; Jiang, H. Comparative analysis of serine protease-related genes in the honey bee genome: Possible involvement in embryonic development and innate immunity. Insect Mol. Biol. 2006, 15, 603–614.
  24. Yang, L.; Lin, Z.; Fang, Q.; Wang, J.; Yan, Z.; Zou, Z.; Song, Q.; Ye, G. The genomic and transcriptomic analyses of serine proteases and their homologs in an endoparasitoid, Pteromalus puparum. Dev. Comp. Immunol. 2017, 77, 56–68.
  25. Oppert, B.; Muszewska, A.; Steczkiewicz, K.; Šatović-Vukšić, E.; Plohl, M.; Fabrick, J.A.; Vinokurov, K.S.; Koloniuk, I.; Johnston, J.S.; Smith, T.P.L.; et al. The Genome of Rhyzopertha dominica (Fab.) (Coleoptera: Bostrichidae): Adaptation for Success. Genes 2022, 13, 446.
  26. Tribolium Sequencing Consortium. The genome of the model beetle and pest Tribolium castaneum. Nature 2008, 452, 949–955.
  27. Prabhakar, S.; Chen, M.-S.; Elpidina, E.N.; Vinokurov, K.S.; Smith, C.M.; Marshall, J.; Oppert, B. Sequence analysis and molecular characterization of larval midgut cDNA transcripts encoding peptidases from the yellow mealworm, Tenebrio molitor L. Insect Mol. Biol. 2007, 16, 455–468.
  28. Tsybina, T.A.; Dunaevsky, Y.E.; Belozersky, M.A.; Zhuzhikov, D.P.; Oppert, B.; Elpidina, E.N. Digestive proteinases of yellow mealworm (Tenebrio molitor) larvae: Purification and characterization of a trypsin-like proteinase. Biochemistry 2005, 70, 300–305.
  29. Elpidina, E.N.; Tsybina, T.A.; Dunaevsky, Y.E.; Belozersky, M.A.; Zhuzhikov, D.P.; Oppert, B. A chymotrypsin-like proteinase from the midgut of Tenebrio molitor larvae. Biochimie 2005, 87, 771–779.
  30. Oppert, B.; Dowd, S.E.; Bouffard, P.; Li, L.; Conesa, A.; Lorenzen, M.D.; Toutges, M.; Marshall, J.; Huestis, D.L.; Fabrick, J.; et al. Transcriptome profiling of the intoxication response of Tenebrio molitor larvae to Bacillus thuringiensis Cry3Aa protoxin. PLoS ONE 2012, 7, e34624.
  31. Zhiganov, N.I.; Tereshchenkova, V.F.; Oppert, B.; Filippova, I.Y.; Belyaeva, N.V.; Dunaevsky, Y.E.; Belozersky, M.A.; Elpidina, E.N. The dataset of predicted trypsin serine peptidases and their inactive homologs in Tenebrio molitor transcriptomes. Data Brief 2021, 38, 107301.
  32. Gorbunov, A.A.; Akentyev, F.I.; Gubaidullin, I.I.; Zhiganov, N.I.; Tereshchenkova, V.F.; Elpidina, E.N.; Kozlov, D.G. Biosynthesis and Secretion of Serine Peptidase SerP38 from Tenebrio molitor in the Yeast Komagataella kurtzmanii. Appl. Biochem. Microbiol. 2021, 57, 917–924.
  33. Tereshchenkova, V.F.; Zhiganov, N.I.; Akentyev, P.I.; Gubaidullin, I.I.; Kozlov, D.G.; Belyaeva, N.V.; Filippova, I.Y.; Elpidina, E.N. Preparation and properties of the recombinant Tenebrio molitor SerPH122—Proteolytically active homolog of serine peptidase. Appl. Biochem. Microbiol. 2021, 57, 579–585.
  34. Wu, C.Y.; Xiao, K.R.; Wang, L.Z.; Wang, J.; Song, Q.S.; Stanley, D.; Wei, S.J.; Zhu, J.Y. Identification and expression profiling of serine protease-related genes in Tenebrio molitor. Arch. Insect Biochem. Physiol. 2022, 111, e21963.
  35. Errico, S.; Spagnoletta, A.; Verardi, A.; Moliterni, S.; Dimatteo, S.; Sangiorgio, P. Tenebrio molitor as a source of interesting natural compounds, their recovery processes, biological effects, and safety aspects. Compr. Rev. Food Sci. Food Saf. 2022, 21, 148–197.
  36. Moncada-Pazos, A.; Cal, S.; Lopez-Otín, C. Polyserases. In Handbook of Proteolytic Enzymes, 3rd ed.; Rawlings, N.D., Salvesen, G., Eds.; Academic Press: London, UK, 2013; pp. 2990–2994.
  37. Schechter, I.; Berger, A. On the size of the active site in proteases. I. Papain. Biochem. Biophys. Res. Commun. 1967, 27, 157–162.
  38. Botos, I.; Meyer, E.; Nguyen, M.; Swanson, S.M.; Koomen, J.M.; Russell, D.H.; Meyer, E.F. The structure of an insect chymotrypsin. J. Mol. Biol. 2000, 298, 895–901.
  39. Rawlings, N.D.; Barrett, A.J. Introduction: Serine peptidases and their clans. In Handbook of Proteolytic Enzymes, 3rd ed.; Rawlings, N.D., Salvesen, G., Eds.; Academic Press: London, UK, 2013; pp. 2491–2523.
  40. Baird, T.T., Jr.; Craik, C.S. Trypsin. In Handbook of Proteolytic Enzymes, 3rd ed.; Rawlings, N.D., Salvesen, G., Eds.; Academic Press: London, UK, 2013; pp. 2594–2600.
  41. Kanost, M.R.; Jiang, H. Clip-domain serine proteases as immune factors in insect hemolymph. Curr. Opin. Insect Sci. 2015, 11, 47–55.
  42. Lopes, A.R.; Sato, P.M.; Terra, W.R. Insect chymotrypsins: Chloromethyl ketone inactivation and substrate specificity relative to possible coevolutional adaptation of insects and plants. Arch. Insect Biochem. Physiol. 2009, 70, 188–203.
  43. Whitworth, S.T.; Blum, M.S.; Travis, J. Proteolytic enzymes from larvae of the fire ant, Solenopsis invicta. Isolation and characterization of four serine endopeptidases. J. Biol. Chem. 1998, 273, 14430–14434.
  44. Tereshchenkova, V.F.; Zhiganov, N.I.; Gubaeva, A.S.; Akentyev, F.I.; Dunaevsky, Y.E.; Kozlov, D.G.; Belozersky, M.A.; Elpidina, E.N. Characteristics of recombinant chymotrypsin-like peptidase from the midgut of Tenebrio molitor larvae. Appl. Biochem. Microbiol. 2024, 60, 420–430.
  45. Tsu, C.A.; Perona, J.J.; Schellenberger, V.; Turck, C.W.; Craik, C.S. The substrate specificity of Uca pugilator collagenolytic serine protease 1 correlates with the bovine type I collagen cleavage sites. J. Biol. Chem. 1994, 269, 19565–19572.
  46. Tsu, C.A.; Craik, C.S. Substrate recognition by recombinant serine collagenase 1 from Uca pugilator. J. Biol. Chem. 1996, 271, 11563–11570.
  47. Bode, W.; Meyer, E., Jr.; Powers, J.C. Human leukocyte and porcine pancreatic elastase: X-ray crystal structures, mechanism, substrate specificity, and mechanism-based inhibitors. Biochemistry 1989, 28, 1951–1963.
  48. Oliveira, E.B.; Salgado, M.C.O. Pancreatic elastases. In Handbook of Proteolytic Enzymes, 3rd ed.; Rawlings, N.D., Salvesen, G., Eds.; Academic Press: London, UK, 2013; pp. 2639–2645.
  49. DeLotto, R. Gastrulation defective, a complement factor C2/B-like protease, interprets a ventral prepattern in Drosophila. EMBO Rep. 2001, 2, 721–726.
  50. Reynolds, S.L.; Fischer, K. Pseudoproteases: Mechanisms and function. Biochem. J. 2015, 468, 17–24.
  51. Cal, S.; Moncada-Pazos, A.; Lopez-Otin, C. Expanding the complexity of the human degradome: Polyserases and their tandem serine protease domains. Front. Biosci. 2007, 12, 4661–4669.
  52. Chen, L.M.; Skinner, M.L.; Kauffman, S.W.; Chao, J.; Chao, L.; Thaler, C.D.; Chai, K.X. Prostasin is a glycosylphosphatidylinositol-anchored active serine protease. J. Biol. Chem. 2001, 276, 21434–21442.
  53. Scarman, A.L.; Hooper, J.D.; Boucaut, K.J.; Sit, M.; Webb, G.C.; Normyle, J.F.; Antalis, T.M. Organization and chromosomal localization of the murine Testisin gene encoding a serine protease temporally expressed during spermatogenesis. Eur. J. Biochem. 2001, 268, 1250–1258.
  54. Rickert, K.W.; Kelley, P.; Byrne, N.J.; Diehl, R.E.; Hall, D.L.; Montalvo, A.M.; Reid, J.C.; Shipman, J.M.; Thomas, B.W.; Munshi, S.K.; et al. Structure of human prostasin, a target for the regulation of hypertension. J. Biol. Chem. 2008, 283, 34864–34872.
  55. Lee, K.Y.; Zhang, R.; Kim, M.S.; Park, J.W.; Park, H.Y.; Kawabata, S.; Lee, B.L. A zymogen form of masquerade-like serine proteinase homologue is cleaved during pro-phenoloxidase activation by Ca2+ in coleopteran and Tenebrio molitor larvae. Eur. J. Biochem. 2002, 269, 4375–4383.
  56. Piao, S.; Song, Y.-L.; Kim, J.H.; Park, S.Y.; Park, J.W.; Lee, B.L.; Oh, B.-H.; Ha, N.-C. Crystal structure of a clip-domain serine protease and functional roles of the clip domains. EMBO J. 2005, 24, 4404–4414.
  57. Huang, R.; Lu, Z.; Dai, H.; Velde, D.V.; Prakash, O.; Jiang, H. The solution structure of clip domains from Manduca sexta prophenoloxidase activating proteinase-2. Biochemistry 2007, 46, 11431–11439.
  58. Kim, C.H.; Kim, S.J.; Kan, H.; Kwon, H.M.; Roh, K.B.; Jiang, R.; Yang, Y.; Park, J.W.; Lee, H.H.; Ha, N.C.; et al. A three-step proteolytic cascade mediates the activation of the peptidoglycan-induced toll pathway in an insect. J. Biol. Chem. 2008, 283, 7599–7607.
  59. He, Y.; Wang, Y.; Yang, F.; Jiang, H. Manduca sexta hemolymph protease-1, activated by an unconventional non-proteolytic mechanism, mediates immune responses. Insect Biochem. Mol. Biol. 2017, 84, 23–31.
  60. Bork, P.; Beckmann, G. The CUB domain: A widespread module in developmentally regulated proteins. J. Mol. Biol. 1993, 231, 539–545.
  61. Blanc, G.; Font, B.; Eichenberger, D.; Moreau, C.; Ricard-Blum, S.; Hulmes, D.J.; Moali, C. Insights into how CUB domains can exert specific functions while sharing a common fold: Conserved and specific features of the CUB1 domain contribute to the molecular basis of procollagen C-proteinase enhancer-1 activity. J. Biol. Chem. 2007, 282, 16924–16933.
  62. Park, J.W.; Kim, C.H.; Kim, J.H.; Je, B.R.; Roh, K.B.; Kim, S.J.; Lee, H.H.; Ryu, J.H.; Lim, J.H.; Oh, B.H.; et al. Clustering of peptidoglycan recognition protein-SA is required for sensing lysine-type peptidoglycan in insects. Proc. Natl. Acad. Sci. USA 2007, 104, 6602–6607.
  63. Cho, Y.S.; Stevens, L.M.; Sieverman, K.J.; Nguyen, J.; Stein, D. A ventrally localized protease in the Drosophila egg controls embryo dorsoventral polarity. Curr. Biol. 2012, 22, 1013–1018.
  64. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Statist. Soc. B 1995, 57, 289–300.
  65. Keller, M.; Sneh, B.; Strizhov, N.; Prudovsky, E.; Regev, A.; Koncz, C.; Schell, J.; Zilberstein, A. Digestion of δ-endotoxin by gut proteases may explain reduced sensitivity of advanced instar larvae of Spodoptera littoralis to CryIC. Insect Biochem. Mol. Biol. 1996, 26, 365–373.
  66. Zalunin, I.A.; Elpidina, E.N.; Oppert, B. The role of proteolysis in the biological activity of Bt insecticidal crystal proteins. In Bt Resistance—Characterization and Strategies for GM Crops Producing Bacillus thuringiensis Toxins; Soberón, M., Gao, Y., Bravo, A., Eds.; CAB International Publishers: Wallingford, UK, 2015; pp. 107–118.
  67. Oppert, B.; Elpidina, E.N.; Toutges, M.; Mazumdar-Leighton, S. Microarray analysis reveals strategies of Tribolium castaneum larvae to compensate for cysteine and serine protease inhibitors. Comp. Biochem. Physiol. D Genom. Proteom. 2010, 5, 280–287.
  68. Martynov, A.G.; Elpidina, E.N.; Perkin, L.; Oppert, B. Functional analysis of C1 family cysteine peptidases in the larval gut of Tenebrio molitor and Tribolium castaneum. BMC Genom. 2015, 16, 75.
  69. Broehan, G.; Arakane, Y.; Beeman, R.W.; Kramer, K.J.; Muthukrishnan, S.; Merzendorfer, H. Chymotrypsin-like peptidases from Tribolium castaneum: A role in molting revealed by RNA interference. Insect Biochem. Mol. Biol. 2010, 40, 274–283.
  70. LeMosy, E.K.; Tan, Y.Q.; Hashimoto, C. Activation of a protease cascade involved in patterning the Drosophila embryo. Proc. Natl. Acad. Sci. USA 2001, 98, 5055–5060.
  71. Muta, T.; Hashimoto, R.; Miyata, T.; Nishimura, H.; Toh, Y.; Iwanaga, S. Proclotting enzyme from horseshoe crab hemocytes. cDNA cloning, disulfide locations, and subcellular localization. J. Biol. Chem. 1990, 265, 22426–22433.
  72. Munier, A.I.; Medzhitov, R.; Janeway, C.A.; Doucet, D.; Capovilla, M.; Lagueux, M. Graal: A Drosophila gene coding for several mosaic serine proteases. Insect Biochem. Mol. Biol. 2004, 34, 1025–1035.
  73. Ewels, P.; Magnusson, M.; Lundin, S.; Käller, M. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 2016, 32, 3047–3048.
  74. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120.
  75. Pertea, M.; Kim, D.; Pertea, G.M.; Leek, J.T.; Salzberg, S.L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 2016, 11, 1650–1667.
  76. Trapnell, C.; Williams, B.; Pertea, G.; Mortazavi, A.; Kwan, G.; van Baren, M.J.; Salzberg, S.L.; Wold, B.J.; Pachter, L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 2010, 28, 511–515.
  77. Pertea, M.; Pertea, G.M.; Antonescu, C.M.; Chang, T.C.; Mendell, J.T.; Salzberg, S.L. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 2015, 33, 290–295.
  78. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
  79. Hall, T.A. BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids. Symp. 1999, 41, 95–98.
  80. Kaur, S.; Stinson, S.A.; di Cenzo, G.C. Whole genome assemblies of Zophobas morio and Tenebrio molitor. G3 2023, 13, jkad079.
  81. Eriksson, T.; Andere, A.A.; Kelstrup, H.; Emery, V.J.; Picard, C.J. The yellow mealworm (Tenebrio molitor) genome: A resource for the emerging insects as food and feed industry. J. Insects Food Feed 2020, 6, 445–455.
  82. McWilliam, H.; Li, W.; Uludag, M.; Squizzato, S.; Park, Y.M.; Buso, N.; Cowley, A.P.; Lopez, R. Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 2013, 41, W597–W600.
  83. Almagro Armenteros, J.J.; Tsirigos, K.D.; Sønderby, C.K.; Petersen, T.N.; Winther, O.; Brunak, S.; von Heijne, G.; Nielsen, H. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 2019, 37, 420–423.
  84. Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E.L. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J. Mol. Biol. 2001, 305, 567–580.
  85. Käll, L.; Krogh, A.; Sonnhammer, E.L. Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Nucleic Acids Res. 2007, 35, W429–W432.
  86. Lomize, A.L.; Hage, J.M.; Pogozheva, I.D. Membranome 2.0: Database for proteome-wide profiling of bitopic proteins and their dimers. Bioinformatics 2018, 34, 1061–1062.
  87. Paysan-Lafosse, T.; Blum, M.; Chuguransky, S.; Grego, T.; Pinto, B.L.; Salazar, G.A.; Bileschi, M.L.; Bork, P.; Bridge, A.; Colwell, L.; et al. InterPro in 2022. Nucleic Acids Res. 2023, 51, D418–D427.
  88. Wang, J.; Chitsaz, F.; Derbyshire, M.K.; Gonzales, N.R.; Gwadz, M.; Lu, S.; Marchler, G.H.; Song, J.S.; Thanki, N.; Yamashita, R.A.; et al. The conserved domain database in 2023. Nucleic Acids Res. 2023, 51, D384–D388.
  89. Gasteiger, E.; Gattiker, A.; Hoogland, C.; Ivanyi, I.; Appel, R.D.; Bairoch, A. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003, 31, 3784–3788.
  90. Katoh, K.; Rozewicki, J.; Yamada, K.D. MAFFT online service: Multiple sequence alignment, interactive sequence choice and visualization. Brief. Bioinform. 2019, 20, 1160–1166.
  91. Trifinopoulos, J.; Nguyen, L.T.; von Haeseler, A.; Minh, B.Q. W-IQ-TREE: A fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 2016, 44, W232–W235.
  92. Mortazavi, A.; Williams, B.A.; McCue, K.; Schaeffer, L.; Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 2008, 5, 621–628.
  93. Chen, C.; Chen, H.; Zhang, Y.; Thomas, H.R.; Frank, M.H.; He, Y.; Xia, R. TBtools: An Integrative Tool kit Developed for Interactive Analyses of Big Biological Data. Mol. Plant 2020, 13, 1194–1202.
  94. Kruskal, W.H.; Wallis, W.A. Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 1952, 47, 583–621.
  95. Statistics Kingdom. Available online: https://www.statskingdom.com/index.html (accessed on 1 April 2024).
Figure 1. Total number of SP and SPH genes found in sequenced genomes of insects from different orders. Data on SP are shaded in blue, data on SPH are in yellow, and undifferentiated data on the sum of SP/SPH genes are shaded in green.
Figure 2. Domain organization of 64 trypsins and 10 trypsin-like SPs of T. molitor. Regulatory domains are marked with different shapes and colors. Description for domains: SignalP—signal peptide; TM—transmembrane domain; Clip (B/C/D)—Clip domain; CUB—C1r/C1s, Uegf, Bmp1 domain; TSP—thrombospondin domain; LDL—Low-Density Lipoprotein receptor class A repeat; CBM—Chitin-Binding Domain; SRCR—Scavenger Receptor Cysteine-Rich domain; Fz—Frizzled domain; PAN—Plasminogen-Apple-Nematode domain.
Figure 3. Domain organization of 30 chymotrypsin-like peptidases, 18 elastase-like peptidases, and 15 non-annotated peptidases of T. molitor. Regulatory domains are marked with different shapes and colors. Description for domains: SignalP—signal peptide; TM—transmembrane domain; LDL—Low-Density Lipoprotein receptor class A repeat; Sushi—Sushi domain; GD—Gastrulation Defective domain.
Figure 4. Domain organization of 125 SPHs and 7 polypeptidases of T. molitor. Regulatory domains are marked with different shapes and colors. Description for domains: SignalP—signal peptide; TM—transmembrane domain; Clip—Clip domain; SEA—Sperm protein, Enterokinase, and Agrin domain; LDL—Low-Density Lipoprotein receptor class A repeat; SRCR—Scavenger Receptor Cysteine-Rich domain; Fz—Frizzled domain; Sushi—Sushi domain; EGF—laminin/Epidermal Growth Factor-like domain.
Figure 5. Phylogenetic analysis of 269 SPs and SPHs of T. molitor. Complete protein sequences were aligned using MAFFT. The phylogenetic tree was built in the IQTREE service. Peptidases in the tree are divided into two groups: (A) (red)—SP and SPH without regulatory domains; (B) (blue)—SP and SPH with regulatory domains (including polypeptidases). For the interpretation of the colors of the identifiers, see the legend above.
Figure 6. Heatmaps of the stage-specific expression patterns of 269 SP/SPH transcripts of T. molitor. Hierarchical clustering of RPKM values was used to compare the relative expression levels of transcripts across the transcriptomes of different T. molitor life stages; the transcripts were divided into 6 distinct groups. Groups 1–4 contain SPs/SPHs without regulatory domains in the propeptide; groups 5–6 contain sequences with regulatory domains. Group 1 (1a–d) (red): SPs expressed at feeding stages; group 2 (2a–c) (purple): SPs expressed at the stages of development, metamorphosis, or also at other stages of the life cycle; group 3 (3a–d) (orange): SPHs expressed at feeding stages; group 4 (4a,b) (yellow): SPHs expressed at the stages of development, metamorphosis, or also at other stages of the life cycle; group 5 (5a–g) (blue): SPs and SPHs containing clip domains; group 6 (6a–c) (green): SPs, SPHs, and polypeptidases containing regulatory domains other than clip. The level of mRNA expression is presented as a heatmap from blue to red (log2(RPKM + 1)). The resulting p-values were adjusted using the Benjamini and Hochberg approach [64]. Values of p < 0.05 are colored green, indicating significant differences in expression between stages of T. molitor development; values from 0.05 to 0.1 are colored yellow; values greater than 0.1 are colored red, indicating that the differences in expression between stages are not reliable. The colors of SP/SPH names indicate the types of SPs: trypsins (TRY), blue; trypsin-like (TRY-like), light blue; chymotrypsin-like (CHYM), purple; elastase-like (ELA), orange; non-annotated (NA), grey. pSerP: polypeptidases; TM: transmembrane domain. Designations for regulatory domains: Clip-A, brown; Clip-B, blue; Clip-C, light blue; Clip-D, grey-blue; Sushi, green; GD, red; MSP, blue-green; peptidases with several regulatory domains, dark blue. Life cycle stages: E, egg; LII, second instar larvae; LIV, fourth instar larvae; EP, early pupa; LP, late pupa; M, male; F, female.
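The two numerical steps named in the Figure 6 caption, the log2(RPKM + 1) heatmap scale and the Benjamini and Hochberg adjustment, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the example p-values are assumptions made for illustration only, and the statistical test that produced the raw p-values is not specified in the caption.

```python
import math


def log2_rpkm(rpkm):
    """Heatmap scale named in the caption: log2(RPKM + 1)."""
    return math.log2(rpkm + 1.0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_color(p_adj):
    """Color coding described in the caption for adjusted p-values."""
    if p_adj < 0.05:
        return "green"
    if p_adj <= 0.1:
        return "yellow"
    return "red"


if __name__ == "__main__":
    # Hypothetical raw p-values, for illustration only.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(p, round(q, 4), significance_color(q))
```

The adjustment multiplies each p-value by the number of tests divided by its rank, then enforces monotonicity from the largest p-value downward, which is the standard step-up form of the procedure.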
Table 2. Domain organization and key structure features of 30 chymotrypsin-like SPs of T. molitor.
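| No. | Name | NCBI ID (Protein) | Preproenzyme/Mature Enzyme (aa) | SignalP (aa) | Regulatory Domain | Propeptide Cleavage Site | Active Site | S1 Subsite | Enzyme Specificity | Mm Mature, Da | pI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SerP16 | CAH1383061 | 275/246 | 16 | - | H\|ITNG | HDS | SGS | Chymotrypsin-like | 25,749 | 3.9 |
| 2 | SerP69 | ABC88746 | 275/230 | 16 | - | R\|IISG | HDS | SGS | Chymotrypsin-like | 22,899 | 8.8 |
| 3 | SerP71 | CAG9035017 | 271/235 | 21 | - | R\|IING | HDS | SGA | Chymotrypsin-like | 24,308 | 4.1 |
| 4 | SerP303 | CAG9018553 | 281/237 | 18 | - | R\|ITGG | HDS | SGA | Chymotrypsin-like | 25,047 | 4.2 |
| 5 | SerP129 | CAH1365737 | 265/230 | 18 | - | R\|IISG | HDS | GAS | Chymotrypsin-like | 24,439 | 4.0 |
| 6 | SerP314 | ABC88747 | 266/232 | 16 | - | R\|IVGG | HDS | GAS | Chymotrypsin-like | 24,475 | 4.2 |
| 7 | SerP7 | CAG9037665 | 279/246 | 16 | - | R\|IING | HDS | GGS | Chymotrypsin-like | 25,707 | 3.9 |
| 8 | SerP39 | CAG9029806 | 267/225 | 16 | - | R\|IIGG | HDS | GGS | Chymotrypsin-like | 23,838 | 4.3 |
| 9 | SerP54 | CAH1375188 | 276/233 | 16 | - | R\|IIGG | HDS | GGS | Chymotrypsin-like | 24,736 | 4.0 |
| 10 | SerP107 | WJL97990 | 277/234 | 16 | - | R\|IIGG | HDS | GGS | Chymotrypsin-like | 25,428 | 4.3 |
| 11 | SerP108 | CAH1375189 | 276/233 | 16 | - | R\|IIGG | HDS | GGS | Chymotrypsin-like | 24,987 | 3.8 |
| 12 | SerP246 | CAH1375190 | 275/233 | 16 | - | R\|IIGG | HDS | GGS | Chymotrypsin-like | 24,822 | 3.9 |
| 13 | SerP253 | CAH1367742 | 277/241 | 21 | - | R\|IIGG | HDS | GGS | Chymotrypsin-like | 25,998 | 4.1 |
| 14 | SerP479 | WJL97991 | 276/233 | 16 | - | R\|IIGG | HDS | GGS | Chymotrypsin-like | 25,012 | 4.2 |
| 15 | SerP33 | WJL97992 | 256/217 | 24 | - | R\|IVGG | HDS | GSG | Chymotrypsin-like | 22,618 | 4.2 |
| 16 | SerP251 | CAH1372320 | 255/232 | 17 | - | R\|IIVG | HDS | GSG | Chymotrypsin-like | 24,576 | 5.2 |
| 17 | SerP101 | ABC88734 | 258/235 | 17 | - | R\|IVNG | HDS | GSG | Chymotrypsin-like | 25,014 | 6.6 |
| 18 | SerP19 | WJL97993 | 252/227 | 16 | - | R\|IVGG | HDS | SSG | Chymotrypsin-like | 23,900 | 4.5 |
| 19 | SerP38 | QRE01764 | 258/229 | 16 | - | R\|VVGG | HDS | GGD | Chymotrypsin-like | 24,410 | 5.3 |
| 20 | SerP88 | ABC88737 | 258/229 | 18 | - | R\|VVGG | HDS | GGD | Chymotrypsin-like | 24,896 | 5.3 |
| 21 | SerP226 | WJL97994 | 258/221 | 22 | - | R\|LIGG | HDS | GGD | Chymotrypsin-like | 23,606 | 4.2 |
| 22 | SerP146 | CAH1383003 | 262/222 | 18 | - | R\|IVGG | HDS | GGD | Chymotrypsin-like | 23,993 | 4.5 |
| 23 | SerP276 | KAJ3628034 | 284/247 | 15 | - | R\|IIHG | HDS | GGD | Chymotrypsin-like | 27,432 | 6.9 |
| 24 | SerP301 | CAH1380401 | 244/221 | 17 | - | R\|IFGG | HDS | GSD | Chymotrypsin-like | 23,620 | 4.1 |
| 25 | SerP368 | WJL97995 | 247/233 | - | - | R\|IFGG | HDS | AGD | Chymotrypsin-like | 24,560 | 4.2 |
| 26 | SerP137 | CAH1379909 | 248/218 | 19 | - | K\|IVGG | HDS | AGD | Chymotrypsin-like | 23,683 | 5.4 |
| 27 | SerP484 | CAH1368908 | 247/226 | 16 | - | R\|IVGG | HDS | AGD | Chymotrypsin-like | 24,734 | 5.0 |
| 28 | SerP215 | CAH1380399 | 254/231 | 17 | - | R\|IFGG | HDS | GAD | Chymotrypsin-like | 24,822 | 4.4 |
| 29 | SerP586 | KAJ3636193 | 270/224 | - | - | L\|KDNG | HDS | TGS | Chymotrypsin-like | 24,961 | 5.0 |
| 30 | SerP449 MSP | BAG14264 | 632/258 | 23 | LDL (4), Sushi | L\|IVNG | HDS | SSG | Chymotrypsin-like | 28,757 | 6.4 |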
SignalP—Signal peptide; Mm mature—molecular mass of the mature protein; pI—isoelectric point of the mature protein; SerP—serine peptidase. Regulatory domains: LDL—Low-Density Lipoprotein receptor (IPR002172); Sushi—Sushi-domain (IPR000436). The amino acid residues after which the propeptide is cleaved are highlighted in bold.
Table 3. Domain organization and key structure features of 18 elastase-like SPs of T. molitor.
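| No. | Name | NCBI ID (Protein) | Preproenzyme/Mature Enzyme (aa) | SignalP (aa) | Regulatory Domain | Propeptide Cleavage Site | Active Site | S1 Subsite | Enzyme Specificity | Mm Mature, Da | pI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SerP41 | ABC88760 | 266/233 | 19 | - | R\|IVGG | HDS | GIS | Elastase-like | 25,006 | 4.4 |
| 2 | SerP121 | CAH1368236 | 274/236 | 16 | - | R\|IIGG | HDS | GIS | Elastase-like | 26,285 | 4.5 |
| 3 | SerP144 | WJL97996 | 268/234 | 19 | - | R\|IIGG | HDS | GIS | Elastase-like | 25,448 | 4.4 |
| 4 | SerP238 | KAJ3632560 | 264/234 | 21 | - | R\|IVGG | HDS | GIS | Elastase-like | 25,330 | 4.2 |
| 5 | SerP441 | KAJ3632561 | 267/234 | 22 | - | R\|IIGG | HDS | GIS | Elastase-like | 25,072 | 4.3 |
| 6 | SerP156 | CAH1380384 | 267/236 | 19 | - | R\|IING | HDS | AVS | Elastase-like | 25,326 | 4.6 |
| 7 | SerP94 | WJL97997 | 266/232 | 21 | - | H\|IVAG | HDS | GVN | Elastase-like | 24,874 | 4.8 |
| 8 | SerP120 | WJL97998 | 268/232 | 19 | - | H\|IILG | HDS | GVS | Elastase-like | 24,988 | 4.7 |
| 9 | SerP288 | CAH1375483 | 266/232 | 16 | - | R\|IVGG | HDS | GVS | Elastase-like | 24,259 | 4.0 |
| 10 | SerP472 | CAH1380701 | 272/235 | 17 | - | R\|IVNG | HDS | SVA | Elastase-like | 25,265 | 4.4 |
| 11 | SerP73 | KAJ3638657 | 267/232 | 16 | - | R\|IING | HDS | SVS | Elastase-like | 24,485 | 4.1 |
| 12 | SerP74 | KAH0820461 | 261/229 | 16 | - | R\|IING | HDS | SVS | Elastase-like | 23,423 | 8.6 |
| 13 | SerP110 | KAH0813654 | 266/231 | 16 | - | R\|IING | HDS | SVS | Elastase-like | 24,831 | 4.2 |
| 14 | SerP98 | CAH1365740 | 267/232 | 16 | - | R\|IING | HDS | SVS | Elastase-like | 24,969 | 4.2 |
| 15 | SerP751 | CAH1365741 | 267/232 | 16 | - | R\|IING | HDS | SVS | Elastase-like | 24,360 | 4.0 |
| 16 | SerP185 | KAJ3620429 | 266/233 | 17 | - | R\|IING | HDS | STS | Elastase-like | 24,779 | 4.9 |
| 17 | SerP155 | KAJ3632649 | 265/235 | 16 | - | R\|IIGG | HDS | GFS | Elastase-like | 24,905 | 4.4 |
| 18 | SerP85 | ABC88761 | 267/237 | 16 | - | R\|IIGG | HDS | GYS | Elastase-like | 25,364 | 4.3 |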
SignalP—Signal peptide; Mm mature—molecular mass of the mature protein; pI—isoelectric point of the mature protein; SerP—serine peptidase. The amino acid residues after which the propeptide is cleaved are highlighted in bold.
Table 4. Domain organization and key structure features of 15 non-annotated SPs of T. molitor.
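| No. | Name | NCBI ID (Protein) | Preproenzyme/Mature Enzyme (aa) | SignalP (aa) | Regulatory Domain | Propeptide Cleavage Site | Active Site | S1 Subsite | Enzyme Specificity | Mm Mature, Da | pI | TM (Position) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SerP18 | WJL97999 | 258/228 | 16 | - | K\|IVWG | HDS | AAT | NA | 24,375 | 8.4 | - |
| 2 | SerP169 | CAH1378761 | 257/226 | 16 | - | K\|IVGG | HDS | GAT | NA | 24,300 | 9.9 | - |
| 3 | SerP423 | CAH1372319 | 279/257 | 17 | - | R\|IVNG | HDS | GGK | NA | 28,256 | 4.5 | - |
| 4 | SerP416 | KAH0820967 | 300/- | 23 | - | - | HDS | QGS | NA | - | - | 277–299 |
| 5 | SerP378 | WJL98000 | 357/253 | 22 | - | K\|ISGG | HDS | RGI | NA | 28,551 | 8.1 | - |
| 6 | SerP424 | KAJ3633461 | 250/227 | 18 | - | R\|IIGG | HDS | RGV | NA | 25,210 | 7.0 | - |
| 7 | SerP462 | KAH0817404 | 257/- | 23 | - | - | HDS | TSF | NA | - | - | - |
| 8 | SerP653 | CAH1380361 | 252/- | 16 | - | - | HDS | VAD | NA | - | - | - |
| 9 | SerP355 | WJL98001 | 551/267 | 19 | LDL (4), Sushi | L\|IVNG | HDS | GST | NA | 29,980 | 5.1 | - |
| 10 | SerP1040 | WJL98002 | 432/263 | 22 | Sushi | L\|IING | HDS | SSS | NA | 27,132 | 7.7 | - |
| 11 | SerP454 | CAH1384889 | 476/257 | 15 | GD | L\|ITHG | HDS | SSV | NA | 28,642 | 7.8 | - |
| 12 | SerP442 | CAH1384890 | 561/257 | 17 | GD | L\|ISYG | HDS | TGI | NA | 28,750 | 7.7 | - |
| 13 | SerP466 | KAJ3628554 | 427/247 | 23 | GD | K\|PANE | HDS | SGV | NA | 27,618 | 7.3 | - |
| 14 | SerP550 | CAH1380129 | 447/249 | 18 | GD | L\|VLKG | HDS | GAI | NA | 27,949 | 8.9 | - |
| 15 | SerP1035 | CAH1380127 | 568/249 | 25 | GD | L\|VVNG | HDS | GSV | NA | 27,582 | 9.7 | - |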
SignalP—Signal peptide; Mm mature—molecular mass of the mature protein; pI—isoelectric point of the mature protein; TM—transmembrane domain; SerP—serine peptidase. Regulatory domains: LDL—Low-Density Lipoprotein receptor (IPR002172); Sushi—Sushi-domain (IPR000436), GD—Gastrulation Defective domain (IPR031986). The amino acid residues after which the propeptide is cleaved are highlighted in bold.
Table 5. Domain organization and key structure features of seven polypeptidases of T. molitor.
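| No. | Name | NCBI ID (Protein) | Preproenzyme (aa) | SignalP (aa) | Regulatory Domain | Propeptide Cleavage Site | Active Site | S1 Subsite | Enzyme Specificity |
|---|---|---|---|---|---|---|---|---|---|
| 1 | pSerP448 | WKK29891 | 892 | 20 | Sushi (2) | L\|IVGG | HDS | SSG | Chymotrypsin-like |
| | | | | | Sushi (2) | L\|IVKG | HDA | SSA | SPH |
| 2 | pSerP900 | CAH1380589 | 891 | 22 | Sushi (2) | L\|IVGG | HDS | SSG | Chymotrypsin-like |
| | | | | | Sushi (2) | L\|IVKG | HDA | SSA | SPH |
| 3 | pSerP333 | CAH1382424 | 891 | 24 | Sushi (2) | L\|IVSG | HDS | SSG | Chymotrypsin-like |
| | | | | | Sushi (2) | L\|IVNG | RNV | FQV | SPH |
| 4 | pSerP382 | WKK29892 | 837 | 23 | Sushi (2) | L\|IVGG | HDS | SAG | Chymotrypsin-like |
| | | | | | Sushi (2) | L\|IIGG | QDR | ISG | SPH |
| 5 | pSerPH608 | WKK29893 | 895 | 23 | Sushi (2) | L\|IVGG | HDG | SSG | SPH |
| | | | | | Sushi (2) | L\|IIGG | YDG | SFT | SPH |
| 6 | pSerP614 | WKK29894 | 1347 | 24 | Sushi (2) | L\|IVNG | HDS | SSA | Chymotrypsin-like |
| | | | | | Sushi (2) | L\|IING | HDG | SSS | SPH |
| | | | | | Sushi | L\|IVNG | QDS | ASA | SPH |
| 7 | pSerP1050 Nudel | CAH1374346 | 1830 | TM (58–80) | LDL (7) | R\|VVGG | HDS | DGG | Trypsin |
| | | | | | | N\|ITSQ | TED | DSA | SPH |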
SignalP—Signal peptide; pSerP(SerPH)—serine (serine peptidase homolog) polypeptidase. Regulatory domains: LDL—Low-Density Lipoprotein receptor (IPR002172); Sushi—Sushi domain (IPR000436). Replacements in the active center are marked in grey. The amino acid residues after which the propeptide is cleaved are highlighted in bold.
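The distinction between active peptidase units and SPH units in Table 5 rests on whether the catalytic triad is the conserved H, D, S. A minimal Python sketch of that check is given below; the function names are hypothetical, and the triads are copied from Table 5. The authors' actual annotation procedure may involve additional criteria not captured here.

```python
CANONICAL_TRIAD = "HDS"  # His, Asp, Ser of S1A serine peptidases


def classify_unit(active_site):
    """Return 'SP' for a conserved H-D-S triad, otherwise 'SPH'."""
    return "SP" if active_site.upper() == CANONICAL_TRIAD else "SPH"


def replaced_positions(active_site):
    """List (index, residue) pairs where the triad differs from H-D-S."""
    return [
        (i, residue)
        for i, (residue, expected) in enumerate(
            zip(active_site.upper(), CANONICAL_TRIAD)
        )
        if residue != expected
    ]


if __name__ == "__main__":
    # Active-site triads of the peptidase units, as listed in Table 5.
    polypeptidases = {
        "pSerP448": ["HDS", "HDA"],
        "pSerP333": ["HDS", "RNV"],
        "pSerPH608": ["HDG", "YDG"],
        "pSerP614": ["HDS", "HDG", "QDS"],
    }
    for name, triads in polypeptidases.items():
        print(name, [classify_unit(t) for t in triads])
```

Under this rule pSerPH608 is the only entry shown whose units are all classified as SPH, which matches its "SPH" annotation in both rows of the table.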
Table 6. T. molitor SP/SPH transcripts with the highest expression levels at the egg stage compared to other stages.
Sequence Name; Regulatory Domains; Active Site; S1 Subsite; Annotation of Sequence; Expression, RPKM: Eggs; Larvae II; Larvae IV; Early Pupae; Late Pupae; Males; Females
SerPH236Clip_AHDGDGGSPH10012726165789
SerPH235Clip_AHDGDGGSPH518163603344471
SerPH203Clip_BHDGDGASPH357000000
SerP166Clip_BHDSDGGTrypsin344000000
SerPH165Clip_AHDGDGGSPH331413422
SerP116Clip_BHDSDGGTrypsin3292363139
SerP145Clip_CHDSDGGTrypsin150533761895357
SerP28-HDSDGGTrypsin147114133218601
SerP466GDHDSSGVN/A122102637139662
SerP61Clip_CHDSDGATrypsin-like119414441602332
SerP156-HDSAVSElastase-like6568321010611391
SerP454GDHDSSSVN/A61652522932423
SerP550GDHDSGAIN/A61563234541325
SerP442GDHDSTGIN/A53191612452421
SerPH389 ScarfaceClip_A_HDYDDGSPH512002942272743
SerP5-HDSDGGTrypsin471700900
SerP22-HDSDGGTrypsin34607000
Bold indicates RPKM values for the egg stage.
Table 7. T. molitor SP/SPH transcripts with the highest expression levels at the early and late pupal stages compared to other stages.
Sequence Name; Regulatory Domains; Active Site; S1 Subsite; Annotation of Sequence; Expression, RPKM: Eggs; Larvae II; Larvae IV; Early Pupae; Late Pupae; Males; Females
SerPH164Clip_AHDGDGGSPH61085881237278
SerPH1034-QNTEEKSPH973570923153
SerP247Clip_CHDSDGGTrypsin97348651611934
SerP145Clip_CHDSDGGTrypsin150533761895357
SerPH159Clip_AHDGDGASPH299105601213770
SerPH78Clip_AHDGDGGSPH27241158701
SerP35-HDSDGGTrypsin40058010
SerP228Clip_CHDSDGGTrypsin1703552801514
SerPH364 SEA; EGF; LDL; SRCRSDEDRRSPH1552752271629
SerPH243Clip_AHDGDGASPH42226751434451
SerP55 TequilaCBM (3), LDL (3), SRCR (2) PAN HDSDGGTrypsin5102551584640
SerP113Clip_BHDSDGGTrypsin27358249754534
SerPH223Clip_AHDGDGGSPH44157346861933
SerPH216Clip_AHDGDGGSPH255046736431
SerP11 TSPTSP (2)HDSDGGTrypsin5231444722510
SerP28-HDSDGGTrypsin147114133218601
SerP247Clip_CHDSDGGTrypsin97348651611934
SerP466GDHDSSGVN/A122102637139662
SerPH164Clip_AHDGDGGSPH61085881237278
SerPH159Clip_AHDGDGASPH299105601213770
SerPH415Clip_AHDGDGGSPH470341411800
SerP15Clip_DHDSDGGTrypsin3514293511310
SerP156-HDSAVSElastase-like6568321010611391
SerPH680-HNISGTSPH177115309861105
SerPH589Clip_CRDSDGASPH014741944339
SerP454GDHDSSSVN/A61652522932423
SerPH1034-QNTEEKSPH973570923153
SerP145Clip_CHDSDGGTrypsin150533761895357
SerPH223Clip_AHDGDGGSPH44157346861933
SerPH618-SDGVQGSPH2627028500
Bold indicates RPKM values for the pupae stages.
Table 8. T. molitor SP/SPH transcripts with the highest expression levels at the feeding stages compared to other stages and IV instar larvae gut.
Name; Active Site; Annotation of Sequence; S1 Subsite; Expression, RPKM: Eggs; Larvae II; Larvae IV; Early Pupae; Late Pupae; Males; Females; Larval IV Gut
SerP1HDSTrypsinDGG767516,880602563264610,480
SerP69HDSChymotrypsin-likeSGS0372574007826954107
SerP108HDSChymotrypsin-likeGGS061512092001911526195
SerP54HDSChymotrypsin-likeGGS02551934002072992102
SerP314HDSChymotrypsin-likeGAS012071087004133125564
SerP38HDSChymotrypsin-likeGGD04976000261517
SerP209HDSTrypsinDGG007650000494
SerP303HDSChymotrypsin-likeSGA045073600718770419
SerP41HDSElastase-likeGIS00678004032948062
SerP16HDSChymotrypsin-likeSGS0119351200114270151
SerP246HDSChymotrypsin-likeGGS011481002041511
SerP85HDSElastase-likeGYS01584650093991394
SerP185HDSElastase-likeSTS0042500117722417
SerP71HDSChymotrypsin-likeSGA03037800147100991
SerP253HDSChymotrypsin-likeGGS0160331627256271517
SerP156HDSElastase-likeAVS6568321010611391220
SerP39HDSChymotrypsin-likeGGS0522200018771324718
SerP251HDSChymotrypsin-likeGSG00217001061164
SerP74HDSElastase-likeSVS04146006468615
SerP288HDSElastase-likeGVS0371400078811011681
SerPH219SDVSPHGIS0641243002822953687
SerPH384QDSSPHGIS009850052981813
SerPH237QDGSPHSIS009651010355
SerPH239QDGSPHSIS008000002733
SerPH122HDTSPHGLS00411501031251223
SerPH493QDISPHGVS09347008935991
SerPH562QDSSPHGIS08264005167576
SerPH245HDTSPHGMT00225003741510
SerPH290QDMSPHGRS06207004135106
SerPH136QDTSPHGLS07185001718490
Bold font indicates the RPKM values for larvae IV and larval IV gut. Shaded are replaced amino acids in the catalytic triad of the SPH.

Zhiganov, N.I.; Vinokurov, K.S.; Salimgareev, R.S.; Tereshchenkova, V.F.; Dunaevsky, Y.E.; Belozersky, M.A.; Elpidina, E.N. The Set of Serine Peptidases of the Tenebrio molitor Beetle: Transcriptomic Analysis on Different Developmental Stages. Int. J. Mol. Sci. 2024, 25, 5743. https://doi.org/10.3390/ijms25115743
