#### *2.3. Cloning and Expression of Cryptoxin-1*

After the venom proteotranscriptomic analysis, three putative unknown toxins that showed the best proteome coverage (Ciheringi14246, Ciheringi38643, and Cryptoxin-1) were selected to be cloned and expressed in recombinant form to study their biological activities. However, after initial evaluations, only Cryptoxin-1 was obtained in a soluble form and with a satisfactory yield (8.5 mg/L of culture) after its expression in *E. coli*. This putative toxin was characterized as described below.

Cryptoxin-1 is composed of 119 amino acids (Figure 4), with a predicted molecular mass of 12,769.33 Da and a theoretical pI of 5.76. It also has a predicted signal peptide, indicating that this toxin is secreted. In addition, it has a GRAVY (grand average of hydropathicity) index of −0.392, indicating its hydrophilic character, and no glycosylation site was predicted.
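The GRAVY index mentioned above is conventionally the mean Kyte and Doolittle hydropathy value over all residues of a sequence, with negative values indicating a hydrophilic protein. The following is a minimal sketch of that calculation, not the tool used in this study (which is not named in the text). The function name `gravy` and the toy peptide are illustrative assumptions; the peptide is not the Cryptoxin-1 sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathicity of a protein sequence.

    The value is the sum of per-residue hydropathy values divided by the
    sequence length. Non-standard residues raise a KeyError.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)


# Hypothetical toy peptide for illustration only (NOT Cryptoxin-1).
toy_peptide = "MKDEAL"
print(f"GRAVY of toy peptide: {gravy(toy_peptide):.3f}")
```

Whether the reported value of −0.392 was computed with or without the signal peptide is not stated, so a recalculation from Figure 4 may differ depending on that choice.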


**Figure 4.** Nucleotide sequence of Cryptoxin-1 found in the transcriptome and its amino acid translation. The nucleotides in bold were changed to optimize expression in *E. coli*. The coverage of peptides found in the proteome is highlighted in grey. Underlined amino acids indicate the predicted signal peptide (SignalP-5.0), (\*) indicates the stop codon.

The cDNA of Cryptoxin-1 was codon-optimized, cloned into the pET-24b (+) expression vector, and then transformed into *E. coli* BL21 (DE3). The SDS-PAGE protein expression analysis revealed a single major band at around 16 kDa (Figure 5a, lane 3). The mass spectrometry analysis (MALDI-TOF-MS) of purified Cryptoxin-1 showed a molecular mass of 14,138.5 Da (Figure 5c), which corresponds to the combination of Cryptoxin-1 (12,769.33 Da), a C-terminal tail of six histidines for IMAC purification, and additional residues encoded by the cloning vector (1360.67 Da). Its expression was also confirmed by polyclonal anti-histidine antibody immunoblotting, as shown in Figure 5b.
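The mass assignment above can be cross-checked with simple arithmetic. The sketch below only sums the two predicted components given in the text and compares the result with the MALDI-TOF value; the constant and function names are illustrative assumptions. The predicted components add up to 14,130.00 Da, about 8.5 Da (roughly 0.06%) below the observed mass. The text does not discuss the source of this small difference.

```python
# Values taken from the text; names are illustrative.
CRYPTOXIN_PREDICTED_DA = 12769.33   # predicted mass of Cryptoxin-1
TAG_AND_VECTOR_DA = 1360.67         # His tag plus vector-encoded residues
OBSERVED_MALDI_DA = 14138.5         # mass observed by MALDI-TOF-MS


def expected_mass(native_da: float, extra_da: float) -> float:
    """Sum the predicted native mass and the tag/vector contribution."""
    return native_da + extra_da


def mass_difference(observed_da: float, expected_da: float):
    """Return (absolute difference in Da, relative difference in percent)."""
    delta = observed_da - expected_da
    return delta, 100.0 * delta / expected_da


expected = expected_mass(CRYPTOXIN_PREDICTED_DA, TAG_AND_VECTOR_DA)
delta, percent = mass_difference(OBSERVED_MALDI_DA, expected)
print(f"expected {expected:.2f} Da, observed {OBSERVED_MALDI_DA:.2f} Da, "
      f"difference {delta:+.2f} Da ({percent:+.3f}%)")
```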

**Figure 5.** (**a**) 12% SDS-PAGE gel stained with Coomassie Brilliant Blue. MW: molecular mass marker in kDa; 1: *E. coli* sediment before IPTG induction; 2: *E. coli* sediment after IPTG induction; 3: Cryptoxin-1 purified by nickel-Sepharose affinity chromatography. (**b**) Recognition of the recombinant protein by immunoblotting using the polyclonal anti-histidine antibody (Sigma–Aldrich, St. Louis, MO, USA). The arrow indicates the position of the protein. (**c**) MALDI-TOF mass spectrometry analysis of purified Cryptoxin-1 showing its molecular mass of 14,138.51 Da. (**d**) ELISA of anti-*C. iheringi* venom IgG against crude venom, Cryptoxin-1, and GST (unrelated recombinant protein). The fixed antibody dilution used was 1:200 (7.5 μg/mL) versus serial protein dilution starting at 1 μg/mL.

To further confirm the presence of Cryptoxin-1 in the venom, its recombinant form, the whole venom, and a recombinant non-related protein used as a negative control, glutathione S-transferase (GST) from *Schistosoma mansoni*, were tested by ELISA using a purified polyclonal anti-*C. iheringi* venom IgG. As can be seen in Figure 5d, the anti-venom IgG recognized both the venom and Cryptoxin-1, while the negative control (GST) was not recognized.

#### *2.4. Cryptoxin-1 Induces Edema in Mice Footpad*

As previously demonstrated, local tissue inflammation is one of the deleterious effects in *C. iheringi* envenomation [25]. Thus, we evaluated the local injury induced by Cryptoxin-1 injection in the footpad of BALB/c mice.

Mice were injected in the right footpad with PBS (negative control) or with 45 μM of the recombinant proteins Cryptoxin-1 or GST (negative protein control). The edema was measured as the thickness of the footpad at 1, 24, 48, and 72 h. The group injected with Cryptoxin-1 showed marked edema at all measurement times, with a statistically significant difference compared to the control groups (Figure 6a).

**Figure 6.** (**a**) Cryptoxin-1 induced footpad edema in mice. Groups of BALB/c mice were injected with 30 μL (45 μM) of Cryptoxin-1, GST (negative protein control), or 30 μL PBS (negative control). Edema was determined by thickness difference at 1, 24, 48, and 72 h. The results represent the mean ± S.E.M. compared with the negative control groups (Cryptoxin-1 vs. PBS and Cryptoxin-1 vs. GST) (*n* = 5). Statistical analysis was performed by ANOVA, followed by the Bonferroni test, \*\*\* *p* < 0.0001. (**b**) Histological analysis of the footpad of mice at 24 h after protein or PBS injection. All samples were analyzed with hematoxylin and eosin staining. 1. and 2.: Cryptoxin-1, bar 20 μm (40×) and 10 μm (100×), respectively; 3. and 4.: GST, bar 20 and 10 μm, respectively; 5. and 6.: PBS, bar 20 and 10 μm, respectively. Neutrophilic inflammatory infiltrates (arrow). The images are representative of five mice/group.

The cellular infiltration was then analyzed using the histological sections. Twenty-four hours after the injection, the GST and PBS groups presented normal tissue without an excess of inflammatory infiltration (Figure 6b, images 3, 4, 5, and 6). In contrast, 24 h after the Cryptoxin-1 injection, we observed the predominance of neutrophilic inflammatory infiltration (Figure 6b, images 1. and 2.).

#### *2.5. Cryptoxin-1 Induces Potent Neutrophil Migration in Mice Footpad*

Since the histological analysis showed neutrophil infiltration at 24 h, the peak of the edema induced by Cryptoxin-1 injection, we sought to confirm this cellular profile by flow cytometry. Thus, at the peak of the edema (24 h), cellular suspensions were prepared from the footpads of the different mouse groups and stained with anti-CD45, anti-CD11b, and anti-Ly6G mAbs conjugated to fluorochromes, followed by flow cytometry. As shown in Figure 7, Cryptoxin-1 induced significantly higher neutrophil infiltration than that observed in the other groups.

**Figure 7.** Neutrophil migration in the footpad of BALB/c mice injected with Cryptoxin-1, GST (45 μM), or PBS. (**a**) Flow cytometry gating strategy. Cell suspensions were prepared from footpad macerates 24 h after the injection. Samples of cells (1 × 10<sup>6</sup> cells) were incubated with anti-CD45-APC, anti-CD11b-PE-Cy7, and anti-Ly6G-PE antibodies, followed by flow cytometry analysis. (**b**) The mean percentage of CD45+CD11b+Ly6G+ cells of individual mice/group (*n* = 5) ± S.E.M. Statistical analysis was performed by ANOVA, followed by the Bonferroni test, \*\*\* *p* < 0.05 for the Cryptoxin-1 group compared with the PBS or GST groups.

#### **3. Discussion**

Centipedes are well adapted to urban areas and are very commonly found in gardens and other residential areas. As a consequence, there is a great risk of accidents occurring for humans [2,13,26]. Although their venom may cause undesirable effects, centipedes have been used in traditional eastern medicine for centuries [11]. However, individual substances have rarely been refined [27,28].

The vast majority of studies of centipede venom are restricted to the *Scolopendra* genus [8–10,27,29–33]. In addition, some studies use the whole centipede instead of the whole venom for their proteomic analyses, making a more specific comparison unfeasible [34,35]. Five comparative studies have demonstrated that centipede venoms are complex cocktails, encompassing more than 60 phylogenetically distinct protein families [10,32,36–38]. Among them are enzymes, protease inhibitors, a great diversity of cysteine-rich proteins, and unknown proteins that are yet to be functionally characterized. Therefore, in this study, we aimed to contribute to the understanding of the toxin genes present in centipedes by generating a gene expression profile of the venom gland of the species *Cryptops iheringi*.

Since the literature for this species is scarce, we followed the transcriptomic and proteomic approaches that were effective in identifying toxins in other related species. In this regard, Ward et al. (2018) [33], using these techniques, were able to identify 39 new toxins in the venom gland of *Scolopendra viridis*, while Liu et al. (2020) [8] found more than 400 toxin-like unknown sequences in the venom gland of *Scolopendra mojiangica*. Similarly, we found that as many as 57.9% of the proteins from the centipede *C. iheringi* were uncharacterized, and 454 protein sequences could only be characterized as putative unknown toxins or known toxins because of the proteomic approach. Among them, 263 proteins showed no similarity with the available sequences in public databases, indicating a great diversity of components with an unknown structure and function.

The putative venom toxins of *C. iheringi* revealed diversely distributed proteins with novel structures and biological activities that need to be further investigated. The majority of the venom proteins are putatively functional enzymes. Most notably, lipases and other hydrolases (8.8%), which include a large group of different proteins, such as phospholipases, are frequently reported as venom components of several other arthropods, such as centipedes, spiders, and scorpions [25,34,39–43], contributing to prey digestion and venom toxicity [42].

Trypsin domain proteins were also found in this venom (5.8%). Food protein degradation is crucial for digestion and is catalyzed by trypsin enzymes. Trypsin appeared early in evolution and became the most abundant proteinase in the digestive systems of invertebrates [44]. Trypsin performs two main functions, namely, the hydrolysis of protein and the activation of other digestive proteases, although it also plays a role in the innate immunity of these animals [45]. Some trypsin domain proteins have also been found in the venom gland transcriptome of the centipede *S. subspinipes dehaani* [29].

Peptidases (4.6%) were found to be another relevant group, comprising endopeptidases, carboxypeptidases, and esterases that are among the reported protein components of some centipedes [30]. These kinds of proteins contribute to amino acid production for digestive purposes and may be responsible for the deleterious tissue effects of the envenomation [46]. Several classes of peptidases, whose activities have not yet been clarified, have also been found in the venom proteome and transcriptome of the scorpion *Hadrurus spadix* [47].

Putative neuron cell adhesion toxins were also present in this venom (4%). Findings on black widow spider venom indicate that such toxins can modulate a neuronal adhesion receptor, which stimulates strong neuronal exocytosis in vertebrates, and, interestingly, may perform functions in synapse development [48,49]. In the context of venom activity, interesting studies exist showing that a toxin from the snakes *Bothrops atrox* and *Bothrops moojeni* is capable of improving spatial memory disorder in temporal ischemic rats through its effects on the neural cell adhesion molecule [50]. Regarding the other putative venom toxin found here, more studies are necessary to propose and define its function in the venom of arthropods, especially from *C. iheringi*.

In addition to the whole venom proteome, *C. iheringi* crude venom was subjected to protein separation by SDS-PAGE to better visualize the main bands and their relative expression. In this gel, the venom showed an electrophoretic profile with a wide range of proteins between 15 and 200 kDa, with a large amount located above 70 kDa. The main bands of the venom were excised from the gel and subjected to LC-MS/MS mass spectrometry, a strategy successfully utilized for the proteome decomplexation of other venoms [51]. The proteomic analysis returned several peptides with good spectra quality, which allowed us to classify 11 toxins, five of which ranged in size between 17 and 37 kDa and whose sequences did not show any similarity to those in the public databases and therefore represent new *C. iheringi*-specific toxins. In order to unravel the function of unknown putative toxins, one, named Cryptoxin-1, was cloned and expressed in *E. coli*. The presence of this toxin in the venom was further confirmed by ELISA, where it was strongly recognized by anti-*C. iheringi* venom IgG.

As is known from the literature, envenomation by centipedes usually causes pain, erythema, and edema formation in humans and mice [13]. However, the inflammatory activities induced by the venom are poorly characterized in the literature. For the centipede *C. iheringi*, there is only one published article, which shows that the venom has strong pro-inflammatory activity, inducing edema and nociception, in addition to being myotoxic for mice [25]. Similarly, previous studies showed that the crude venom of the centipede species *S. viridicornis* and *O. pradoi* induced edema in mouse footpads, which progressively diminished by 72 h [25,52]. In this respect, the injection of Cryptoxin-1 into mouse footpads caused an edema of rapid onset with progressive decay after 72 h. In addition, the injected animals were prostrate and bristly, had a low temperature, and showed erythema at the injection site (data not shown).

After considering the results obtained for the edematogenic activity, we performed a histological analysis of the mouse footpads injected with Cryptoxin-1 to characterize the cellular influx at the peak of the edema. The histological sections demonstrated the predominance of neutrophil infiltration, and flow cytometry analysis confirmed this result. In line with these findings, Fung et al. (2011) [26] reported that 40% of patients admitted to Hong Kong Emergency Hospital with centipede bites (species not specified) showed neutrophil-predominant leukocytosis in their blood tests, with edema and erythema at the bite site and strong pain. We also observed that the neutrophil infiltration lasted up to 72 h, which was also reported for the crude venom of *S. viridicornis* [53]. Taken together, the results indicate that Cryptoxin-1 may contribute to the symptoms observed in envenomation. It is important to point out that in all the experiments, the recombinant GST, which was subjected to the same expression and purification procedures as Cryptoxin-1, was used as a non-related protein control to exclude any effect related to the protein purification steps.

Kinetic studies of cellular infiltrates show that neutrophils are the first inflammatory cells to reach the lesion site and that edematogenic activity may occur due to the release by neutrophils of cytokines, prostaglandin, myeloperoxidase, bradykinin, and histamine, causing increased vasodilation and permeability of small vessels and resulting in the migration of other cells to the local tissue [54,55]. Although it is already known that innate immune cells participate in the local inflammatory response [27], the correlation between the local edema induced by *C. iheringi* venom and cellular infiltration is not completely understood. Therefore, further investigation is necessary to elucidate the complex interplay of the toxins present in its venom.

In this work, we described the profile of toxins present in the *C. iheringi* venom gland using transcriptome and proteome approaches that may contribute to understanding the venom composition and its effects in envenomation. In addition, new toxin genes were identified that may allow for the characterization of their role in this venom, and possibly for other toxins in related species. Furthermore, a new recombinant toxin named Cryptoxin-1 was also characterized as showing a proinflammatory activity, suggesting that it is likely to be one of the components responsible for the envenomation symptoms observed in accidents with humans. Additional studies are being conducted with this toxin as well as the other unknown toxins to understand their role in envenomation. Keeping this in mind, we understand the potential of novel developments for further studies concerning this centipede species and its venom.

#### **4. Materials and Methods**

#### *4.1. Specimen Collection and Venom Extraction*

Seven *C. iheringi* adult specimens were collected in the metropolitan area of the city of São Paulo, Brazil, with the permission of SISBIO (15222-2) and kept in the Arthropod Laboratory of the Butantan Institute. To obtain the venom, the animals were anesthetized by anoxia, and the venom was extracted through electrical discharges (12 V) in the ventral region of the head (coxosternum) with an electroshock device. The venom obtained through the forcipules was aspirated with an automatic micropipette and deposited in a microcentrifuge tube in an ice bath. The venom obtained was stored at −80 °C for a subsequent proteome analysis. The extraction was performed every 30 days.

#### *4.2. RNA Isolation, Library Preparation, and Illumina Sequencing*

The heads of seven specimens of *Cryptops iheringi* were submitted to dissection of the venom glands for transcriptomics. The total RNA was extracted with TRIZOL Reagent (Invitrogen, Life Technologies Corp., Carlsbad, CA, USA), a method based on the procedure described by Chomczynski et al. (1987) [56]. The total RNA was quantified by its absorbance at a wavelength of 260 nm in a NanoDrop 2000 device (Thermo Fisher Scientific, Waltham, MA, USA). Beginning with an amount of total RNA ranging from 75 to 77 μg for each sample, the purification of mRNA was performed through affinity to magnetic microspheres containing oligo (dT), using the protocol of the Dynabeads® mRNA DIRECT kit (Invitrogen, Life Technologies Corp.), with reagents to reduce the amount of ribosomal RNA (rRNA). The quantification of mRNA was performed using the Quant-iT RiboGreen® reagent (Invitrogen, Life Technologies Corp.), according to the manufacturer's specifications. All RNA procedures were performed using RNAse-free tubes and filter tips, and water treated with diethylpyrocarbonate (DEPC, Sigma–Aldrich, St. Louis, MO, USA). After the extraction of mRNA, its integrity was assessed using the 2100 Bioanalyzer, pico chip series (Agilent Technologies Inc., Santa Clara, CA, USA). The mRNA was then subjected to a purification and concentration step using the MinElute® PCR Purification Kit (Qiagen) protocol. To confirm that mRNA was not lost during this purification and concentration step, a further quantification of the mRNA was performed through its absorbance at a wavelength of 260 nm in a NanoDrop 2000 device (Thermo Fisher Scientific, Waltham, MA, USA).

A cDNA library was generated using the TruSeq RNA Sample Prep Kit protocol (Illumina, San Diego, CA, USA). The cDNA was synthesized from fragmented mRNA using random hexamer primers, followed by ligation with appropriate sequencing adaptors. The size distribution of the cDNA libraries was measured with a 2100 Bioanalyzer using the DNA1000 assay (Agilent Technologies Inc., Santa Clara, CA, USA). An ABI StepOnePlus Real-Time PCR System with the KAPA Library Quantification kit was used for library sample quantification before sequencing. The cDNA library was then sequenced on an Illumina HiSeq 1500 System, in Rapid Run mode in a 2-lane paired-end flowcell, run for 300 cycles, generating 2 × 151 bp paired-end reads for each fragment, according to the manufacturer's protocol (Illumina).

#### *4.3. RNA-Seq Raw Data Pre-Processing, De Novo Assembly, and Functional Annotation*

After large-scale sequencing of the cDNA using Illumina HiSeq 1500 equipment, bioinformatics analyses were performed. The sequencing platform generated sequencing images, which were converted to BCL format, after which the CASAVA software was used to demultiplex the samples through the identification of the indexes (barcodes). The demultiplexing step generated files in FASTQ format, with a quality control of Q30.

For the pre-processing of the reads, an in-house pipeline was used to filter the raw reads by quality, eliminating reads with homopolymer and low-complexity regions and removing poly-A/T/N tails, adapters, indexes, and low-quality edges using the software FASTQ-mcf [57] and bowtie2 [58]. The criteria used for filtering were as follows: removal of reads with homopolymer or low-complexity regions covering more than 90% of the sequence, and trimming of end regions with an average quality lower than 25. Only reads with a minimum size of 40 bp were kept. The raw reads were also filtered for PhiX contaminants using the software Bowtie2 [58] with standard parameters.
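The filtering thresholds above can be illustrated with a simplified sketch. This is not the in-house pipeline, whose code is not given in the text. As stated assumptions, it trims individual bases below the quality threshold from each read end (rather than using a windowed average), approximates the homopolymer and low-complexity criterion by the fraction of the single most frequent base, and assumes Phred+33 quality encoding. All names are hypothetical.

```python
from typing import List, Optional, Tuple

MIN_QUALITY = 25                  # ends below this Phred score are trimmed
MIN_LENGTH = 40                   # reads shorter than this are discarded
MAX_SINGLE_BASE_FRACTION = 0.90   # proxy for homopolymer/low complexity


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into scores."""
    return [ord(char) - offset for char in quality]


def trim_low_quality_ends(seq: str, qual: str) -> Tuple[str, str]:
    """Remove bases with quality below MIN_QUALITY from both read ends."""
    scores = phred_scores(qual)
    start, end = 0, len(seq)
    while start < end and scores[start] < MIN_QUALITY:
        start += 1
    while end > start and scores[end - 1] < MIN_QUALITY:
        end -= 1
    return seq[start:end], qual[start:end]


def is_low_complexity(seq: str) -> bool:
    """Flag reads in which one base makes up more than 90% of the sequence."""
    if not seq:
        return True
    most_common = max(seq.count(base) for base in "ACGTN")
    return most_common / len(seq) > MAX_SINGLE_BASE_FRACTION


def filter_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Return the trimmed read, or None if it fails the filters."""
    seq, qual = trim_low_quality_ends(seq.upper(), qual)
    if len(seq) < MIN_LENGTH or is_low_complexity(seq):
        return None
    return seq, qual
```

Adapter and PhiX removal are left out of this sketch because they depend on external reference sequences and on the tools cited above.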

The transcriptome was assembled using the rnaSPAdes [59] with a K-mer size of 55.

The TransDecoder software version 3.0.1 (http://transdecoder.sourceforge.net/; accessed on 15 January 2018) was used to identify Open Reading Frames (ORFs) from the assembled transcripts with protein lengths higher than 60 amino acids. The program SignalP version 5.0 [60] was used for signal peptide predictions.

The completeness of the transcriptome was also estimated by the presence of sequences belonging to the set of ultraconserved eukaryotic proteins, tested using the BUSCO approach based on metazoa database [61].

Using TSA/NCBI, we downloaded the transcriptome assemblies of 10 species from the order Scolopendromorpha (Table 2) (*Cryptops anomalans* (GERT01.1), *Hemiscolopendra marginata* (GHBY01.1), *Scolopendra alternans* (GASK01.1), *Scolopendra cingulata* (GCAP01.1), *Scolopendra dehaani* (GBIM01.1), *Scolopendra morsitans* (GHKQ01.1), *Scolopendra subspinipes* (GGDW01.1), *Scolopendra viridis* (GGNE01.1), *Scolopocryptops rubiginosus* (GCIY01.1), *Scolopocryptops sexspinosus* (GHBZ01.1)), totaling 106,197 transcripts used to create the database for BLAST alignment. The *C. iheringi* transcripts were aligned against the Scolopendromorpha database using the BlastN alignment tool with a cutoff of 1 × 10<sup>−15</sup>.

The predicted amino acid sequences were aligned using the BLASTx and BLASTp programs [62] against the UniProt/Swissprot protein database and the NCBI Transcriptome Shotgun Assembly (TSA) database to assess sequence similarity with proteins in other species, with a cutoff e-value of 1 × 10<sup>−5</sup>. The hmmsearch tool [63] allowed us to identify the conserved PFAM domains [64], with a cut-off e-value < 1 × 10<sup>−3</sup>. The priority order of the UniProt/Swissprot, PFAM, and TSA-NCBI protein hits was used to select the best candidate for each transcript.
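The priority rule described above can be sketched as a small selection function. This is an illustrative reading of the rule rather than the authors' code: the data layout (one best hit per source), the names, and the strict handling of hits exactly at the cutoff are all assumptions.

```python
from typing import Dict, Optional, Tuple

# Sources in the priority order given in the text.
PRIORITY = ("UniProt/Swissprot", "PFAM", "TSA-NCBI")

# E-value cutoffs given in the text for each source.
EVALUE_CUTOFF = {
    "UniProt/Swissprot": 1e-5,
    "PFAM": 1e-3,
    "TSA-NCBI": 1e-5,
}

Hit = Tuple[str, float]  # (hit description, e-value)


def best_annotation(hits: Dict[str, Hit]) -> Optional[Tuple[str, str]]:
    """Pick the annotation from the highest-priority source that passes.

    `hits` maps a source name to the best hit found in that source for a
    single transcript. Returns (source, description), or None when no
    source has a hit below its e-value cutoff.
    """
    for source in PRIORITY:
        hit = hits.get(source)
        if hit is None:
            continue
        description, evalue = hit
        # Assumption: a hit must be strictly below the cutoff to count.
        if evalue < EVALUE_CUTOFF[source]:
            return source, description
    return None


# Hypothetical hits for one transcript (values invented for illustration).
example = {
    "PFAM": ("Trypsin domain", 1e-12),
    "TSA-NCBI": ("unnamed transcript", 1e-40),
}
print(best_annotation(example))
```

With this ordering, a weaker hit from a curated source takes precedence over a stronger hit from a lower-priority one, which matches the stated intent of ranking sources rather than e-values.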

The sequencing reads were aligned against the *C. iheringi* transcriptome with the bowtie2 program [58], and this alignment was used to estimate transcript abundance. The abundance of each transcript was then computed by RSEM [65], which produces maximum likelihood abundance estimates using the Expectation-Maximization algorithm for its statistical model. Final abundance estimates were calculated as expected counts, Fragments Per Kilobase of exon per Million fragments mapped (FPKM), and Transcripts Per Million (TPM) values. Functional annotation was performed using the Blast2GO program [66], a tool for analyzing a set of sequencing tags that makes it possible to understand the physiological meaning of a large number of genes. Transcript sequences were used as input sequences for the Blast2GO program. BLASTx was used to find counterparts in the NCBI NR database with a cut-off value of 1 × 10<sup>−5</sup>. Furthermore, the analysis was performed using the first 20 hits, a minimum alignment length of 33 amino acids, and the low-complexity filter activated. The program then extracted the Gene Ontology (GO) terms for each hit obtained by mapping the existing annotation associations, after which an annotation rule assigned the GO terms to the sequence in question. After the BLAST, mapping, and annotation steps, the graphs, tables, and organization charts provided by the program were analyzed. For the distribution data of the GO terms provided by the program, tables with raw data were used instead of the graphs provided, since this allowed for greater formatting flexibility in the presentation of the data.
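The relationship between expected counts, FPKM, and TPM can be made concrete with a short sketch. It uses the standard definitions and, as a simplifying assumption, plain transcript lengths rather than the effective lengths that RSEM estimates. The function name and the example values are hypothetical.

```python
from typing import Dict, Tuple


def fpkm_and_tpm(
    counts: Dict[str, float], lengths: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Compute FPKM and TPM from fragment counts and transcript lengths.

    FPKM_i = count_i * 1e9 / (length_i * total_count)
    TPM_i  = (count_i / length_i) / sum_j(count_j / length_j) * 1e6
    Lengths are in base pairs.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total fragment count must be positive")
    # Fragments per base for each transcript.
    rate = {name: counts[name] / lengths[name] for name in counts}
    rate_sum = sum(rate.values())
    fpkm = {name: rate[name] * 1e9 / total for name in counts}
    tpm = {name: rate[name] / rate_sum * 1e6 for name in counts}
    return fpkm, tpm


# Hypothetical counts and lengths, for illustration only.
example_counts = {"transcript_a": 100.0, "transcript_b": 300.0}
example_lengths = {"transcript_a": 1000.0, "transcript_b": 3000.0}
fpkm, tpm = fpkm_and_tpm(example_counts, example_lengths)
print(fpkm)
print(tpm)
```

In this example both transcripts have the same coverage per base, so they receive equal TPM values even though their raw counts differ threefold. TPM values always sum to one million within a sample, which is why they are easier to compare across samples than FPKM.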

Bioinformatics analyses were performed using the computational infrastructure of the Center of Toxins, Immune-Response and Cell Signaling (CeTICS) and the Bioinformatics and Computational Biology Core at the Butantan Institute. The raw data generated in this project were deposited in the NCBI BioProject section under the accession code PRJNA763193, BioSample SAMN21432369, and SRA SRR1608688. This Transcriptome Shotgun Assembly was deposited in NCBI TSA under the accession GJOG00000000.
