**2. Results**

#### *2.1. Bioinformatic and Phylogenetic Analysis on Dromaserpin Sequence*

The composite full-length Dromaserpin cDNA sequence was 1371 nucleotides long, with a single open reading frame (ORF) of 1209 bases. The 5′ non-coding region was 27 bp and the 3′ non-coding region was 135 bp (data not shown). The ORF coded for a protein of 403 amino acid residues, including a 21-residue signal peptide, which suggests that the protein is secreted and acts extracellularly (Figure 1).
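The lengths quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch using only the numbers reported in the text; the variable and function names are illustrative, and whether the 1209-base ORF includes a stop codon is not stated in the text, so the sketch only divides the reported length by three.

```python
# Lengths reported in the text (nucleotides unless noted otherwise).
UTR5_LEN = 27             # 5' non-coding region
ORF_LEN = 1209            # open reading frame
UTR3_LEN = 135            # 3' non-coding region
CDNA_LEN = 1371           # composite full-length cDNA
SIGNAL_PEPTIDE_LEN = 21   # residues


def codon_count(orf_len: int) -> int:
    """Return the number of codons in an ORF, checking it is in frame."""
    if orf_len % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_len // 3


# The three segments should add up to the full cDNA length.
total = UTR5_LEN + ORF_LEN + UTR3_LEN

# 1209 / 3 matches the 403 residues reported for the precursor.
codons = codon_count(ORF_LEN)

# Residues remaining once the predicted signal peptide is removed.
mature_len = codons - SIGNAL_PEPTIDE_LEN

print(total, codons, mature_len)
```

The segment lengths sum to the reported cDNA length, and the ORF length corresponds to the 403 residues given for the precursor protein.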

**Figure 1.** Analysis of the predicted amino acid sequence of Dromaserpin. The predicted signal peptide is underlined. N-glycosylation and O-glycosylation sites are marked with a hash sign and an asterisk, respectively. The conserved s3a domain is boxed and the conserved reactive center loop (RCL) domain is enclosed in square brackets. A poly-histidine tag and two additional residues, indicated in bold, were added to the C-terminus of the sequence. Numbers indicate amino acid residue positions in the sequence. The rDromaserpin sequence extends from residue 22 to residue 409. In rDromaserpin, the signal peptide was removed and replaced by a methionine.

The predicted rDromaserpin has 391 amino acid residues, including the histidine tag sequence, with a theoretical molecular weight of 43,159.27 Da. The predicted protein has a single potential N-glycosylation site at position N109 and five O-glycosylation sites at positions T205, S314, S360, S363 and S366 (Figure 1). It contains the serine protease inhibitor-associated domains s3a and RCL, and its membership in the serpin superfamily was confirmed with the SMART software (Figure 1). Since the RCL consensus residues critical for serpin inhibitory activity have been described previously [29], we aligned the predicted RCL of the newly identified serpin with the corresponding amino acid sequences of inhibitory and non-inhibitory serpins (Figure 2).

**Figure 2.** Comparison of the RCL sequence of Dromaserpin with those of inhibitory and non-inhibitory serpins. The RCL amino acid sequences of human α-1 Antitrypsin (Uniprot: P01009) and Antithrombin-III (Uniprot: P01008), chicken Ovalbumin (Uniprot: P01012), and Dromaserpin were aligned using ClustalW. The consensus critical residues for inhibitory activity have been described elsewhere [29]. P notation was applied according to a previous study [30].

The RCL sequence extends from P17 to P4′. P residues are numbered from the cleavage site toward the N-terminus, and P′ residues from the cleavage site toward the C-terminus. Dromaserpin's RCL was similar to those of the inhibitory serpins α-1 Antitrypsin and Antithrombin-III, and unlike that of the non-inhibitory serpin ovalbumin (Figure 2). In particular, the consensus residues described as crucial for inhibitory activity (Glu347, Gly348, Thr349, Ala351, and Val354) were present in Dromaserpin and perfectly conserved, except for Val354. The comparison of the aligned RCLs in Figure 2 therefore suggests that Dromaserpin may be an inhibitory serpin.
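The position labels used for the RCL can be generated programmatically. The sketch below is illustrative only: it assumes the standard protease-substrate convention in which non-primed positions (P) count away from the scissile bond toward the N-terminus and primed positions (P′) count toward the C-terminus. The function names are hypothetical, and the 21-letter string is a placeholder, not the Dromaserpin RCL sequence.

```python
def position_labels(n_nonprimed: int, n_primed: int) -> list[str]:
    """Labels in N- to C-terminal order, e.g. P17 ... P1, P1' ... P4'."""
    nonprimed = [f"P{i}" for i in range(n_nonprimed, 0, -1)]
    primed = [f"P{i}'" for i in range(1, n_primed + 1)]
    return nonprimed + primed


def label_segment(segment: str, n_nonprimed: int = 17, n_primed: int = 4):
    """Pair each residue of an RCL segment with its position label."""
    labels = position_labels(n_nonprimed, n_primed)
    if len(segment) != len(labels):
        raise ValueError("segment length does not match the label count")
    return list(zip(labels, segment))


# Placeholder letters only (NOT a real sequence): 17 + 4 = 21 positions.
placeholder = "ABCDEFGHIJKLMNOPQRSTU"
labelled = label_segment(placeholder)

# The scissile bond lies between P1 and P1'.
print(labelled[16], labelled[17])
```

With the range P17 to P4′ stated in the text, the segment spans 21 positions, and the cleavage site sits between the 17th and 18th residues of that segment.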

As proteins sharing sequence similarity are likely to have similar or related activities [31], we aligned the Dromaserpin amino acid sequence with those of 25 tick serpins (Table S1). These serpins have >30% homology with Dromaserpin, and most have been experimentally shown to act as anti-hemostatic proteins [32]. Sequence comparison showed that amino acid patterns related to the inhibitory property (mainly the RCL and s3a domains) were present in Dromaserpin and conserved among all the compared tick serpins (Figure 3b). In the evolutionary analysis, the resulting phylogenetic tree revealed two separate groups of serpins (Figure 3a). Dromaserpin clustered within the larger group, composed of 18 serpins (Figure 3a), and was closest to the serpins of the genus *Rhipicephalus*. Indeed, the BLASTP search identified two tick serpins from the genus *Rhipicephalus* whose sequences were very similar to the Dromaserpin sequence.

Dromaserpin shares the highest sequence identity (83.33%, with 99% coverage) with RHS-1 (accession: AFX65224.1), an anticoagulant serpin from *Rh. haemaphysaloides* with anti-chymotrypsin activity [32]. It also shares 81.94% identity (97% coverage) with RmS5 (accession: AHC98656.1) from *Rh. microplus*, which has not been functionally characterized [34]. In the tree, Dromaserpin formed a small branch with RHS-1 and RmS5, both of which, like Dromaserpin, have a basic amino acid (lysine) at position P1 (Figure 3a). The aligned RCLs of the serpins used in the phylogenetic analysis are shown in Figure 3b. The serpins grouped in the same cluster had highly conserved RCLs, whereas the other tick serpins showed fewer similarities.

**Figure 3.** Molecular phylogenetic analysis of 25 tick serpins and Dromaserpin. (**a**) The evolutionary phylogenetic tree of Dromaserpin and the chosen tick serpins. Antithrombin III was utilized as an outgroup. The tree with the highest log likelihood (−12,179.3074) is shown, drawn to scale, with branch lengths measured in the number of substitutions per site (next to the branches). Evolutionary analyses were conducted in MEGA7 [33]. (**b**) Alignment of the RCL regions of the selected serpins was carried out using MEGA7.

#### *2.2. Expression and Purification of the Recombinant Dromaserpin (rDromaserpin)*

rDromaserpin has 391 amino acid residues, including the histidine tag sequence, a predicted molecular weight of 43,159.27 Da, and a predicted isoelectric point (pI) of 8.03. These features dictated the purification strategy adopted. rDromaserpin was successfully expressed in the *E. coli* BL21 (DE3) system under the optimal expression conditions (1 mM IPTG, 30 °C, 3 h incubation). It was obtained in both soluble and insoluble forms (Figure 4a).

Although rDromaserpin mostly aggregated as inclusion bodies, the soluble fraction could be used for purification and still gave an acceptable yield (1.73 mg/L). As expected, purified rDromaserpin migrated at an apparent molecular weight of ~43 kDa on 12.5% SDS-PAGE (Figure 4b). After the purification steps, rDromaserpin was obtained at a high level of purity, as judged by Coomassie Blue staining. The purified protein was used for subsequent experiments.

**Figure 4.** Expression and purification of rDromaserpin. cDNA coding for rDromaserpin was cloned into the pET28a expression vector and the recombinant protein was expressed in *E. coli* BL21 (DE3) in 2YT medium. Whole cell lysates of non-induced or induced (IPTG 1 mM, 3 h, 30 °C) cultures were analyzed by SDS-PAGE (12.5%). (**a**) Both soluble and insoluble fractions were analyzed. Lanes M, −IN, +IN, IS, and S represent protein marker, not induced, induced, insoluble and soluble fractions, respectively; (**b**) SDS-PAGE of fractions from three chromatography steps. Lanes M, TP, and SP represent protein marker, total protein and soluble protein, respectively. Lanes P1, P2, and P3 represent eluted purified proteins pooled from IMAC, Q Sepharose ion-exchange chromatography, and size exclusion chromatography, respectively.

#### *2.3. Structural Characterization of the rDromaserpin*

#### 2.3.1. Analysis of the Secondary Structure of rDromaserpin

According to analytical gel filtration, the purified rDromaserpin was obtained as a monomer (Figure 5). Based on the calibration curve obtained with standards (Figure 5), the calculated molecular weight of rDromaserpin was 37.025 kDa.

**Figure 5.** Separation of rDromaserpin and five standard proteins by size exclusion chromatography. Loading of the 100 μL of rDromaserpin (labeled in red) or the following standards (labeled in black): Conalbumin (75 kDa), Ovalbumin (44 kDa), Carbonic anhydrase (29 kDa), Ribonuclease A (13.7 kDa), Aprotinin (6.5 kDa), as well as blue dextran (for the void volume V0). A calibration curve was obtained with standards to calculate the molecular weight of rDromaserpin using the equation (log (MW) = −0.1615 Ve + 3.4268).
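The calibration equation in the caption above can be applied as in the following minimal sketch. It assumes that the logarithm is base 10, that MW is expressed in kDa, and that Ve is the elution volume in mL; none of these is stated explicitly, although the reported value of 37.025 kDa is only reachable with MW in kDa. The elution volume of rDromaserpin is not given in the text, so the sketch back-calculates the volume implied by the reported molecular weight instead of using a measured one. Function names are illustrative.

```python
import math

# Coefficients of the calibration line from the Figure 5 caption:
# log(MW) = SLOPE * Ve + INTERCEPT
SLOPE = -0.1615
INTERCEPT = 3.4268


def mw_from_elution_volume(ve_ml: float) -> float:
    """Molecular weight (assumed kDa) predicted for an elution volume."""
    return 10 ** (SLOPE * ve_ml + INTERCEPT)


def elution_volume_from_mw(mw_kda: float) -> float:
    """Invert the calibration line: elution volume implied by a MW."""
    return (math.log10(mw_kda) - INTERCEPT) / SLOPE


# Elution volume implied by the reported 37.025 kDa (derived, not measured).
implied_ve = elution_volume_from_mw(37.025)

# Round trip back to the molecular weight as a consistency check.
mw_check = mw_from_elution_volume(implied_ve)

print(round(implied_ve, 2), round(mw_check, 3))
```

Under these assumptions, the reported molecular weight corresponds to an elution volume of about 11.5 mL.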

The secondary structure content of rDromaserpin was investigated by circular dichroism (CD) analysis in the Far-UV region, which allows the α-helical, β-sheet and random coil content of a protein to be estimated (Figure 6).

**Figure 6.** Far-UV CD spectra obtained for rDromaserpin. The ellipticity was expressed as the mean-residue molar ellipticity (θ) in degree cm<sup>2</sup> dmol<sup>−1</sup>.
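For readers unfamiliar with the unit in the caption above, the sketch below shows the commonly used conversion from observed ellipticity to mean-residue ellipticity, [θ] = θ × MRW / (10 × l × c), with MRW = M / (N − 1). The molecular weight and residue count are those reported for rDromaserpin. The observed ellipticity, path length and protein concentration are hypothetical values chosen for illustration, and the text does not state how the authors performed the conversion.

```python
def mean_residue_weight(mw_da: float, n_residues: int) -> float:
    """Mean residue weight: molecular weight divided by peptide bonds."""
    return mw_da / (n_residues - 1)


def mean_residue_ellipticity(theta_mdeg: float, mrw: float,
                             path_cm: float, conc_mg_per_ml: float) -> float:
    """Mean-residue ellipticity in deg cm^2 dmol^-1.

    theta_mdeg: observed ellipticity in millidegrees
    path_cm: cuvette path length in cm
    conc_mg_per_ml: protein concentration in mg/mL
    """
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_per_ml)


# Values reported in the text for rDromaserpin.
mrw = mean_residue_weight(43159.27, 391)

# Hypothetical measurement (illustration only, not from the study):
# -10 mdeg observed in a 0.1 cm cuvette at 0.1 mg/mL.
example = mean_residue_ellipticity(-10.0, mrw, 0.1, 0.1)

print(round(mrw, 2), round(example, 1))
```

Normalizing by the mean residue weight makes spectra comparable between proteins of different size, which is why deconvolution tools typically expect this unit.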

The Far-UV CD spectrum was characteristic of a structured protein, with one positive peak at 194 nm and two minima at 211 nm and 220 nm (Figure 6), a pattern associated with α/β-rich proteins [35]. Deconvolution of the CD spectrum using the BestSel program indicated a secondary structure content for rDromaserpin of approximately 54% α-helices, 21% β-strands and 25% random coils (Figure S1).

#### 2.3.2. Comparative Modeling

The predicted amino acid sequence of Dromaserpin was used to search, in the Swiss-Model workspace, for homologous proteins with experimentally solved 3D structures that could serve as templates for homology modeling. Conserpin, which shares 40.62% identity (92% coverage) with Dromaserpin, was selected as the template; its 3D structure has been solved in both the RCL open/uncleaved (PDB code: 5cdx) and closed/cleaved (PDB code: 5cdz) conformations. Two three-dimensional models of Dromaserpin were thus constructed, one from each conserpin conformation (Figure 7).

**Figure 7.** Cartoon representations of the three-dimensional models of Dromaserpin. (**a**) A cartoon representation of Dromaserpin 3D model 1 with an exposed RCL, based on the structure of Conserpin (PDB: 5cdx). (**b**) A cartoon representation of Dromaserpin 3D model 2 with an inserted RCL, based on the structure of Conserpin (PDB: 5cdz). The RCL (in yellow) is inserted into the five-stranded β-sheet (in red). Loops are colored grey.

According to the obtained models, the overall structure of Dromaserpin adopts a typical serpin fold composed of eight α-helices (Figure S2a) and three large β-sheets (A, B, and C) (Figure S2b). The first model, based on the structure of Conserpin with an uncleaved RCL (PDB code: 5cdx), shows an intact RCL (Figure 7a, yellow). As observed in other serpins [36], the RCL in this configuration forms a flexible loop outside the core of the Dromaserpin structure and can act as bait for target proteases. A second model was built based on the structure of Conserpin in the latent state (PDB code: 5cdz) (Figure 7b). In this model, the RCL is cleaved and inserted as a strand, S4 (yellow in Figure 7b), into the central β-sheet A (strands S1, S2, S3, S5, and S6, red in Figure 7b). According to the Ramachandran plots, 97.6% (5cdz) (Figure S3) and 98.5% (5cdx) (Figure S4) of the residues in the Dromaserpin 3D models are in favored and/or allowed regions. Neither model had residues located in the disallowed regions of the Φ, Ψ angle pairs of the Ramachandran plot, indicating correct stereochemistry. The percentages of secondary structure elements in the Dromaserpin models were predicted with STRIDE and compared with those obtained experimentally from the CD spectra using the BestSel tool (Table 1).

**Table 1.** Comparison of the secondary structure content obtained theoretically (exposed RCL and inserted RCL) and experimentally (CD data).

