*3.1. Protein Identification from the LC-MS/MS Peptide Spectra*

The main objective of this study was to obtain the experimental proteome of the *L. infantum* promastigote stage, with the additional aim of improving current genomic annotations. For this purpose, a recently published re-sequenced genome [13] was used. However, we did not restrict the search of peptide spectra to currently annotated protein-coding genes; instead, a database consisting of all possible polypeptides (20 amino acids or longer) was used (see Materials and Methods for further details). The workflow used for sample preparation and proteomics is shown in Figure 1. From the MS/MS data, only spectra associated with peptides longer than seven amino acids were considered for protein identification. Among the identified proteins, 2344 matched previously annotated proteins [13]. Moreover, eight novel proteins were uncovered, thus validating the search strategy. In addition, some ORFs had to be extended to accommodate the MS-identified peptides (see below for further details about these findings). Most of the proteins, 70.5% (1659 out of 2352), were identified by three or more unique peptides per protein; 14.5% (341) of the protein identifications were supported by two unique peptides, and only 15% (352) were based on a single unique peptide. Currently, 3482 out of 8590 annotated proteins (around 40%) in the *L. infantum* genome have the status of hypothetical proteins; the MS spectra obtained in this work provided experimental evidence for the existence of 456 of those hypothetical proteins.
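To illustrate what a database of "all possible polypeptides" of at least 20 amino acids might look like, the following is a minimal Python sketch of a six-frame translation. It is not the pipeline used in this study (see Materials and Methods for that). It assumes the standard genetic code and assumes that a polypeptide is any stop-to-stop segment of a translated frame; the function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over "TCAG".
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_polypeptides(dna, min_len=20):
    """List (strand, frame, polypeptide) for every stop-to-stop segment
    of at least min_len residues in the six reading frames.

    Assumption: segments are delimited by stop codons only; no start
    codon is required. The real database may have been built differently.
    """
    dna = dna.upper()
    found = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_len:
                    found.append((strand, frame, segment))
    return found


# Toy input: a start codon, 20 alanine codons, and a stop codon.
toy_dna = "ATG" + "GCT" * 20 + "TAA"
toy_hits = six_frame_polypeptides(toy_dna)
```

Searching spectra against such a database, rather than against annotated genes only, is what makes it possible to detect novel proteins and N-terminal extensions of existing ORFs.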

**Figure 1.** Workflow for protein extraction and proteomic analyses of *Leishmania infantum* promastigotes. The experimental MS data were searched against the UniProt protein database and a database consisting of all possible polypeptides encoded in the six frames of the *L. infantum* genome (based on v2/2018; www.leish-esp.cbm.uam.es; [13]).

The first comprehensive study aimed at characterizing the *L. infantum* proteome was carried out by Papadopoulou's group [30]. Using two-dimensional (2D) gel electrophoresis, these authors visualized 2261 protein spots in promastigote samples and 2273 spots in amastigote samples. However, after MS analysis, only 168 protein spots, derived from 71 different genes, could be identified [30]. A better proteome resolution was attained after a fractionation step including digitonin extraction; in this way, 153 *L. infantum* proteins were identified by MS analysis of selected spots [31]. The combination of two-dimensional liquid chromatography (2DLC) with electrospray ionization mass spectrometry (2DLC-ESI-MS) and with matrix-assisted laser desorption/ionization mass spectrometry (2DLC-MALDI-MS) allowed Leifso and co-workers to identify 91 *L. infantum* proteins [19]. An enrichment for basic proteins by free-flow electrophoresis prior to separation by 2D gel electrophoresis led to the identification of around 200 *L. infantum* proteins [32]. Alcolea and co-workers [33] identified 28 proteins in a proteomic study aimed at uncovering proteins differentially expressed between the early-logarithmic and the stationary phases of *L. infantum* promastigote cultures. In two different studies using MS analysis of the exoproteome derived from *L. infantum* promastigote cultures, 102 [34] and 494 [35] proteins were identified, respectively. Therefore, our work provides, to date, the most complete experimentally supported proteome for *L. infantum*.

Outstanding studies on proteome identification have been performed in both *L. donovani* and *L. major*. In 2008, Rosenzweig and collaborators reported the identification of 1713 proteins in *L. donovani* [20]. A comparison between the proteins identified in our work (*L. infantum* JPCM5) and those identified in *L. donovani* showed that 1218 proteins were common (as orthologs) to both studies (Figure 2). We failed to identify 207 of the proteins reported in *L. donovani*, whereas we found 1130 proteins that are absent from the *L. donovani* proteome reported by Rosenzweig et al. [20].
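An orthology-based comparison of this kind reduces to set operations once the identifiers of one species are mapped onto those of the other. The sketch below shows the idea with toy data; the identifiers, the ortholog map, and the function name are hypothetical, and proteins without a mapped ortholog are simply dropped, which is an assumption rather than a description of the procedure used here.

```python
def compare_proteomes(reference_ids, other_ids, ortholog_map):
    """Compare two protein sets after mapping other_ids into the
    reference ID space through ortholog_map (other ID -> reference ID).

    Assumption: identifiers with no entry in ortholog_map are ignored.
    """
    reference = set(reference_ids)
    mapped = {ortholog_map[i] for i in other_ids if i in ortholog_map}
    return {
        "shared": reference & mapped,
        "only_reference": reference - mapped,
        "only_other": mapped - reference,
    }


# Toy example with made-up identifiers (not real gene IDs).
infantum_ids = {"Li_A", "Li_B", "Li_C"}
donovani_ids = {"Ld_1", "Ld_2", "Ld_3", "Ld_4"}
toy_orthologs = {"Ld_1": "Li_A", "Ld_2": "Li_B", "Ld_3": "Li_D"}  # Ld_4 unmapped

comparison = compare_proteomes(infantum_ids, donovani_ids, toy_orthologs)
```

The three returned sets correspond to the overlapping and non-overlapping regions of a two-set Venn diagram; a three-species comparison would apply the same operations pairwise.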

**Figure 2.** Comparison (Venn diagram) between the proteins identified in this work (*L. infantum*, in green) and those identified in two previous studies [20,36] that performed proteomic analyses in *Leishmania donovani* (in red) and *Leishmania major* (in blue). The Venn diagram was created with the tool available at bioinformatics.psb.ugent.be/webtools/. Note: the discrepancy between the number of proteins identified by Rosenzweig et al. [20] (see text) and the number represented in the Venn diagram (1713 vs. 1708) is due to five gene duplications that were corrected after re-assembly of the *L. infantum* genome [13].

More recently, Pandey and co-workers reported the identification of 3386 different proteins in *L. donovani* promastigote and amastigote stages [37,38]. After comparing their data with the proteins identified in this study, 1650 of the proteins observed in *L. infantum* promastigotes were found to be present (as orthologs) in the *L. donovani* promastigote proteome. Notably, among the 613 proteins that Nirujogi et al. [37] reported to be exclusively expressed in *L. donovani* amastigotes, 126 were also identified in our proteomics study, indicating that these proteins are also expressed in the promastigote stage, at least in *L. infantum* (see Supplementary File, Table S1). Most of them were annotated as hypothetical proteins or proteins of unknown function, but there are also metabolic enzymes, translation machinery components (ribosomal proteins and eukaryotic initiation factors), and RNA-binding proteins.

In 2014, Pawar et al. [36] reported an extensive proteome of the *L. major* promastigote stage, in which 3613 proteins were identified. These authors followed a proteogenomic approach, as we did in this study, consisting of searching the mass spectra against a six-frame translated database generated from a complete genome sequence. An orthology-based comparison indicated that the *L. major* promastigote proteome and the *L. infantum* proteome of this study shared 1733 proteins (Figure 2). Moreover, adding to these shared proteins the 1792 proteins identified in the *L. major* proteome but not in our study, and the 615 proteins identified exclusively by us in the *L. infantum* proteome, the total number of identified proteins presumably expressed in the promastigote stage is 4140 (roughly half of the proteins predicted to be encoded in the *Leishmania* genome).
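The total of 4140 can be reproduced from the three counts given in this paragraph. The short Python check below does so and also computes the fraction of the annotated proteome it represents. It assumes that the 8590 annotated *L. infantum* proteins mentioned earlier in this section are a reasonable stand-in for "the predicted proteins" of the *Leishmania* genome; the variable names are illustrative.

```python
# Counts taken from the text (L. major vs. L. infantum promastigote proteomes).
shared = 1733         # identified in both proteomes
only_major = 1792     # identified in L. major but not in this study
only_infantum = 615   # identified exclusively in this study

# Union of the two proteomes: the three groups do not overlap.
total_promastigote = shared + only_major + only_infantum

# Assumption: 8590 annotated L. infantum proteins used as the denominator.
annotated_proteins = 8590
fraction_identified = total_promastigote / annotated_proteins

print(total_promastigote)
print(f"{fraction_identified:.1%}")
```

Under this assumption the fraction is just below one half, consistent with the "roughly half" stated above.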
