*3.1. Expression of SUMO and SENP Proteins in Leishmania spp.*

When screening the *L. donovani* genome database using BLAST, we identified genes coding for SUMO (LdBPK\_080480) and SENP (LdBPK\_262070). The SUMO amino acid sequences from five *Leishmania* species and two *Trypanosoma* species were aligned using ClustalW with the four human paralogs and with the *Drosophila* and yeast orthologs. The alignment was then used to build a phylogenetic tree (Figure 1A). The SUMO orthologs of the lower eukaryotic clade are distinct from the metazoan SUMOs but reasonably well conserved (Figure 1C). Notably, the di-glycine motif near the C terminus is present in all SUMO orthologs. The SENP/Ulp2 peptidases were likewise highly conserved among the *Leishmania* spp. and clearly related to their *Trypanosoma* orthologs (Figure 1B).

Both SUMO and SENP are constitutively expressed in *L. donovani*. Previously generated RNA-seq and ribosome profiling data [5] show only minor variation in SUMO protein synthesis and RNA abundance in *L. donovani* before and after radicicol-induced promastigote-to-amastigote differentiation (Figure 1D). SENP likewise shows constitutive, stage-independent protein synthesis and RNA levels. The normalized [5] ribosome footprinting read densities for SUMO and SENP were slightly above those for ubiquitin fold modifier (UFM, LdBPK\_161100), another PTM polypeptide [19,20], and below those recorded for polyubiquitin (LdBPK\_090950). This indicates a gene expression rate slightly above the median (1.0) for *L. donovani* genes. With expression of SUMO and SENP established, we decided to target both genes for replacement using a CRISPR/Cas9 approach.
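The median normalization behind these relative values, in which a gene expressed exactly at the median scores 1.0, can be illustrated with a minimal sketch. It assumes that per-gene read densities have already been computed. The function name `normalize_to_median` and the gene values in the usage example are hypothetical and are not data from [5], whose actual pipeline may differ.

```python
from statistics import median


def normalize_to_median(densities: dict[str, float]) -> dict[str, float]:
    """Express each gene's read density relative to the median across all genes.

    A gene at the median gets 1.0; values above 1.0 indicate higher-than-median
    expression. Assumes densities are already per-gene (e.g. length-normalized).
    """
    med = median(densities.values())
    if med == 0:
        raise ValueError("median density is zero; cannot normalize")
    return {gene: d / med for gene, d in densities.items()}


# Hypothetical toy densities, for illustration only.
toy = {"geneA": 2.0, "geneB": 4.0, "geneC": 6.0}
print(normalize_to_median(toy))  # geneB sits at the median, so it maps to 1.0
```

Dividing by the median rather than the mean keeps the reference point robust to the few very highly expressed genes, such as polyubiquitin, that would otherwise inflate the scale.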

**Figure 1.** Conservation and expression of *SUMO* and *SENP* in *Leishmania*. (**A**) Phylogenetic analysis of SUMO proteins. Sequences were aligned with ClustalW, and the tree was built using the neighbor-joining algorithm and best-fit analysis with Poisson correction. Numbers indicate amino acid sequence deviation. (**B**) Phylogenetic analysis of SENP proteins, performed as in (**A**). (**C**) Alignment of SUMO amino acid sequences, with the C-terminal di-glycine highlighted. (**D**) Gene expression analysis by ribosome profiling and RNA-seq for *L. donovani* before (−RAD) and after (+RAD) radicicol-induced differentiation. Shown are relative read densities, normalized to the median read densities, for protein synthesis (RFP) and RNA abundance (RNA). Data from [5].
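The Poisson correction mentioned for panels A and B converts the observed proportion p of differing amino acids between two aligned sequences into an estimated distance d = −ln(1 − p). This accounts for multiple substitutions at the same site, and neighbor joining then builds the tree from the resulting pairwise distance matrix. The sketch below covers only the distance step. It assumes gaps are written as `-` and that gapped positions are skipped pairwise. The function names and toy sequences are hypothetical, and the sketch is not the software used for Figure 1.

```python
import math


def poisson_distance(seq1: str, seq2: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) for two aligned protein sequences.

    p is the fraction of differing residues among positions where neither
    sequence has a gap ('-'); gapped positions are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue  # skip positions with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    p = differences / compared
    if p >= 1.0:
        return math.inf  # correction is undefined when every site differs
    return -math.log(1.0 - p)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """All pairwise Poisson distances, keyed by (name1, name2)."""
    names = list(aligned)
    return {(x, y): poisson_distance(aligned[x], aligned[y]) for x in names for y in names}


# Hypothetical toy alignment, for illustration only.
toy = {"seq1": "MKGG", "seq2": "MRGG"}
print(distance_matrix(toy)[("seq1", "seq2")])  # one difference in four sites: -ln(0.75)
```

For closely related sequences d is nearly equal to p. The correction matters more for the divergent metazoan versus trypanosomatid comparisons, where unobserved repeat substitutions become more likely.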
