#### *2.1. Sequence Retrieval*

The amino acid sequence search showed that 14 protein sequences from nine *C. vulgaris* strains of the UTEX259 culture collection (taxid 3077) scaffolds met the defined criteria. The accession numbers of the Transcriptome Shotgun Assembly (TSA) and Whole Genome Shotgun (WGS) sequences are given in Table 1 and Table 2, respectively. As shown in Table 1, all the lipase sequences found belong to the AB\_hydrolase family (InterPro number IPR029058) and display the acyl hydrolase motif GXSXG. Nine of them show high sequence identity to a Lipase\_3 domain-containing protein (*Chlorella variabilis*) from the ESTHER database. Two sequences, namely Lip\_5800 and Lip\_5999, show high identity with sn1-specific diacylglycerol lipases alpha from *Auxenochlorella protothecoides* and *Micractinium conductrix*, respectively. In addition, Lip\_3448 showed 46.6% sequence identity with the chloroplastic phospholipase A1 from *M. conductrix*. Sequence homology analysis with multiple alignments revealed that these 14 sequences could be broadly clustered into two groups: three probable sn1-diacylglycerol lipases and 11 other lipase\_3 family members.

Subsequently, gene prediction was carried out with ab initio gene models (Table 2). These predictions showed that the predicted lipase sequences are located on different scaffolds, with exon numbers varying from 8 (Lip\_5800 and Lip\_5462) to 23 (Lip\_2999). The Lip\_4551 and Lip\_6297 lipase genes were found to be tandemly arrayed in the genome. These two genes differ in sequence and size, and their adjacent organization could allow faster transcription [15].
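As an illustration of how the GXSXG acyl hydrolase motif mentioned above can be located in a protein sequence, the following minimal sketch scans a sequence with a regular expression. The function name `find_gxsxg` and the toy sequence are hypothetical and are not taken from the study; the sketch does not reproduce the InterPro-based annotation used here.

```python
import re

# GXSXG: Gly, any residue, Ser, any residue, Gly.
# A lookahead is used so that overlapping occurrences are all reported.
GXSXG = re.compile(r"(?=(G[A-Z]S[A-Z]G))")


def find_gxsxg(seq):
    """Return a list of (1-based position, pentapeptide) for each GXSXG hit."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in GXSXG.finditer(seq)]


# Toy sequence for illustration only (not one of the C. vulgaris lipases).
toy = "MKTLLAGHSLGGALAT"
print(find_gxsxg(toy))
```

A simple pattern match of this kind only flags candidate motifs; confirming a catalytic serine still requires domain-level evidence such as the family assignment reported in Table 1.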




**Table 2.** Gene annotation of the predicted putative TAG lipases.

#### *2.2. Physicochemical Characterization of Protein Sequences*

The ProtParam parameters shown in Table 3 reveal protein lengths varying from 421 to 1145 amino acids, corresponding to molecular masses from 44.8 to 124.3 kDa. The theoretical isoelectric points (Ip) ranged from 4.09 to 9.34, and all proteins were predicted to have high molar extinction coefficients (46,300 to 193,210 M⁻¹ cm⁻¹). Predicted repeats, motifs and localizations are given in Table 4. Among all predicted lipases, seven proteins have transmembrane motifs, including four predicted to be localized in the plasma membrane and three in the chloroplastic membrane. The seven other lipases have different cellular localizations (cytoplasmic, mitochondrial, chloroplastic or extracellular space), and five of them possess a predicted signal peptide sequence. This supports the prediction of an extracellular localization; however, the targeting peptides of chloroplasts and mitochondria are also N-terminal cleavable peptides [16]. They are less well characterized than the secretory ones, but both are poor in negatively charged amino acids and able to fold into amphiphilic α-helices [17].
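For readers unfamiliar with how a molar extinction coefficient is estimated from sequence alone, the sketch below applies the composition-based formula documented for ProtParam (5500 per Trp, 1490 per Tyr and 125 per cystine, in M⁻¹ cm⁻¹ at 280 nm). The function name and the toy sequence are hypothetical; the values in Table 3 come from ProtParam itself, not from this code.

```python
# Molar extinction contributions at 280 nm (M^-1 cm^-1),
# as used in the composition-based estimate.
EXT_TRP = 5500
EXT_TYR = 1490
EXT_CYSTINE = 125


def extinction_coefficient(seq, cystines_formed=True):
    """Estimate the molar extinction coefficient at 280 nm from composition.

    If cystines_formed is True, every pair of Cys residues is assumed to
    form one cystine; otherwise all Cys are assumed to be reduced.
    """
    seq = seq.upper()
    n_trp = seq.count("W")
    n_tyr = seq.count("Y")
    n_cystine = seq.count("C") // 2 if cystines_formed else 0
    return EXT_TRP * n_trp + EXT_TYR * n_tyr + EXT_CYSTINE * n_cystine


# Toy sequence for illustration only (not one of the predicted lipases).
toy = "MWYCCAG"
print(extinction_coefficient(toy))                         # cystines assumed
print(extinction_coefficient(toy, cystines_formed=False))  # Cys reduced
```

Because the estimate depends only on the counts of Trp, Tyr and Cys, long sequences rich in aromatic residues yield high coefficients, which is consistent with the wide range reported for proteins of 421 to 1145 residues.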

The half-life is the predicted time required for half of the amount of a protein in a cell to disappear after its synthesis. For all predicted lipases, it was estimated at 30 h in mammalian reticulocytes (in vitro), more than 20 h in yeast (in vivo) and more than 10 h in *Escherichia coli* (in vivo). ProtParam also classifies all studied proteins as stable (instability index < 40).
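The identical half-life obtained for all predicted lipases follows from the way ProtParam estimates it: the value depends only on the N-terminal residue (the N-end rule), so every sequence starting with methionine receives the same estimate. The sketch below illustrates this lookup together with the stability threshold of 40. The table is a deliberately partial excerpt, and the function names are hypothetical.

```python
# Partial excerpt of N-end rule half-lives: residue -> (mammalian
# reticulocytes in vitro, yeast in vivo, E. coli in vivo).
# Only three residues are listed here for illustration.
N_END_HALF_LIFE = {
    "M": ("30 h", ">20 h", ">10 h"),
    "R": ("1 h", "2 min", "2 min"),
    "V": ("100 h", ">20 h", ">10 h"),
}

STABILITY_THRESHOLD = 40.0  # instability index below this value: stable


def predicted_half_life(seq):
    """Return the half-life tuple for the N-terminal residue, or None
    if the residue is not in the partial table above."""
    if not seq:
        return None
    return N_END_HALF_LIFE.get(seq[0].upper())


def is_stable(instability_index):
    """Classify a protein as stable when its instability index is < 40."""
    return instability_index < STABILITY_THRESHOLD


# Toy sequence and index value for illustration only.
print(predicted_half_life("MKTLLAGHSLGG"))
print(is_stable(35.2))
```

The half-life estimate therefore carries no information that distinguishes the 14 lipases from one another, whereas the instability index is computed from the full sequence.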

The predicted soluble lipases have molecular weights between 44.8 and 102.5 kDa and Ip values between 4.09 and 8.5. Concordant results were reported by Ursu et al. [18]. Using the 2-DE profile of *C. vulgaris* soluble proteins, the authors demonstrated the presence of two protein groups distinguished by their isoelectric points: a main group with an Ip range of 4.0–5.5, and a minor group with an Ip range of 6.0–8.0. However, the majority of the separated proteins had apparent molecular weights between 12 and 75 kDa. The difference observed herein could be explained by the fact that some proteins are not expressed under the culture conditions used by the authors.




**Table 4.** Predicted repeats, motifs and localization of putative lipases.
