#### *4.1. Sequence Retrieval*

A BlastP search was performed using amino acid sequences of functionally characterized lipases from terrestrial plants (*Trifolium pratense* and *Diplocarpon rosae*), fungi (*Colletotrichum chlorophyti*), microalgae (*Scenedesmus* sp. and *Symbiodinium microadriaticum*) and bacteria (*Pseudomonas fluorescens* and *B. subtilis*) available in the NCBI database (http://ncbi.nlm.gov/protein/). The FASTA sequences were used as tblastn queries against the Transcriptome Shotgun Assembly (TSA) database of *C. vulgaris* strain UTEX259 (taxid 3077), and every hit with an E-value < 10<sup>−5</sup> was considered a putative lipase transcript. Open reading frames (ORFs) were identified with the ORF finder program [42], and the longer ones were blasted a second time against the non-redundant protein database to confirm that the respective TSA sequence corresponds to a putative lipase ORF. The selected TSA sequences were then submitted to a blastn search against the Whole Genome Shotgun (WGS) contigs database of the same *C. vulgaris* strain (taxid 3077), and single hits with an E-value < 10<sup>−100</sup> were identified as scaffolds carrying putative lipase genes. Genes were predicted from the selected WGS scaffolds with ab initio gene models in Augustus [43]. The application was trained on the gene structures of *Chlamydomonas reinhardtii*, and the TSA sequences were supplied through the cDNA upload option. The final output ORF and protein sequences were saved for further in silico analysis.
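
For illustration only, the two E-value cut-offs described above could be applied to BLAST results as in the following Python sketch. It assumes the results were exported in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); the study does not state how the filtering was carried out, and all identifiers in the example (`lipA`, `TSA_0001`, and so on) are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float


# Cut-offs taken from the text: tblastn vs TSA, then blastn vs WGS.
TSA_CUTOFF = 1e-5
WGS_CUTOFF = 1e-100


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse 12-column tabular BLAST lines; skip blanks and '#' comments."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Column 1 = query id, column 2 = subject id, column 11 = E-value.
        hits.append(Hit(fields[0], fields[1], float(fields[10])))
    return hits


def filter_hits(hits: Iterable[Hit], max_evalue: float) -> List[Hit]:
    """Keep only hits whose E-value is strictly below the cut-off."""
    return [h for h in hits if h.evalue < max_evalue]


# Hypothetical example lines, not data from the study.
example = [
    "lipA\tTSA_0001\t45.2\t300\t150\t4\t1\t300\t10\t909\t3e-60\t210",
    "lipA\tTSA_0002\t28.0\t90\t60\t2\t5\t94\t40\t309\t0.002\t38.1",
    "lipB\tTSA_0003\t99.1\t900\t8\t0\t1\t900\t1\t900\t1e-150\t1600",
]

hits = parse_tabular(example)
tsa_candidates = filter_hits(hits, TSA_CUTOFF)
wgs_candidates = filter_hits(hits, WGS_CUTOFF)
```

The strict inequality mirrors the "E-value < cut-off" wording of the text; the additional "single hit" criterion used for the WGS scaffolds is not modelled here.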

#### *4.2. Multiple Sequence Alignment*

Multiple sequence alignment of the 14 putative lipase sequences, and calculation of a cladogram illustrating their sequence similarity relationships, were performed with MAFFT (v7.310) using the G-INS-1 strategy, an unalign level of 0.8, the leave gappy regions option for the alignment and UPGMA as the average linkage method for clustering. The alignment was rendered with ESPript [44].
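
To make the clustering step concrete, the following Python sketch shows the general UPGMA (average linkage) procedure on a small distance matrix. It is not MAFFT's implementation, and the labels and distances are invented for the example; it only illustrates how a cladogram topology arises from repeatedly merging the two closest clusters and averaging their distances to the rest, weighted by cluster size.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a Newick topology (no branch lengths) built by UPGMA.

    names: list of sequence labels.
    dist:  dict mapping (label_a, label_b) tuples to pairwise distances.
    """
    sizes = {name: 1 for name in names}  # cluster label -> member count
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        # Size-weighted average distance from the new cluster to the others.
        for other in sizes:
            if other in (a, b):
                continue
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes)) + ";"


# Hypothetical distances between four sequences.
toy_distances = {
    ("A", "B"): 2.0, ("A", "C"): 6.0, ("A", "D"): 10.0,
    ("B", "C"): 6.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
}
tree = upgma(["A", "B", "C", "D"], toy_distances)
```

In this toy case A and B are joined first, C joins them next, and D is attached last.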

#### *4.3. Physicochemical Characterization of Protein Sequences*

Basic physicochemical properties such as molecular weight, extinction coefficient, isoelectric point, aliphatic index, grand average of hydropathicity and instability index were estimated with the ProtParam tool (http://web.expasy.org/protparam/) [35]. Extinction coefficients were calculated under two assumptions: that all pairs of Cys residues form cystines, or that all Cys residues are reduced. Sequence analysis and the search for lipase motifs were performed with InterPro [45] and the Expasy my hits search tool (https://myhits.isb-sib.ch/cgi-bin/motif\_scan), respectively. The sequences were also compared against the ESTHER database to identify entries with higher sequence identity [19]. Subcellular localization was predicted with Deepmito [46], Mitoprot v1.101 [47], HECTAR v1.3 [48] and TMHMM v2.0 [49]. Putative signal peptides in each sequence were predicted using the SignalP 4.0 server [50]. Since N-glycosylation has been widely described for lipases, N-glycosylation sites were predicted using the NetOGlyc 4.0 Server [51].
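
Three of the ProtParam quantities listed above follow simple published formulas, sketched below in Python for illustration. This is an independent reimplementation, not the ProtParam code: the function names are hypothetical, the hydropathy values are the Kyte-Doolittle scale, the aliphatic index uses the coefficients 2.9 (Val) and 3.9 (Ile, Leu) on mole percentages, and the extinction coefficient at 280 nm uses 1490 (Tyr), 5500 (Trp) and 125 (cystine) M<sup>−1</sup> cm<sup>−1</sup>, as documented for ProtParam.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    counts = Counter(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * counts[aa] / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def extinction_coefficient(seq: str, cystines: bool = True) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1).

    cystines=True  assumes all pairs of Cys residues form cystines;
    cystines=False assumes all Cys residues are reduced.
    """
    counts = Counter(seq.upper())
    value = 1490 * counts["Y"] + 5500 * counts["W"]
    if cystines:
        value += 125 * (counts["C"] // 2)
    return value
```

Calling `extinction_coefficient(seq)` and `extinction_coefficient(seq, cystines=False)` on the same sequence gives the two values corresponding to the two Cys assumptions described in the text.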

#### *4.4. Tertiary Structure Prediction, Structure Validation and Quality Prediction*

Three-dimensional models of the selected putative enzymes were generated using different approaches. For sequences with acceptable homology to the templates available to the programs, UCSF Chimera (https://www.rbvi.ucsf.edu/chimera/) and the automated protein homology modeling server SWISS-MODEL (http://swissmodel.expasy.org/) were used. For sequences with low homology to the structures in the database, multiple-threading alignment with the I-TASSER approach (zhanglab.ccmb.med.umich.edu/I-TASSER/) was used. I-TASSER is an automated bioinformatics tool that predicts protein structures from an amino acid sequence using multiple-threading alignments followed by iterative structural assembly simulations and atomic-level structure refinement.

The predicted structures were evaluated to ensure the correctness of the model stereochemistry, as checked with a Ramachandran plot (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php) (Lovell et al., 2003) and Verify 3D [52]. The Ramachandran plot scores of the predicted structures showed that more than 90% of the amino acids were in favorable regions. The ProSA-web Z-score plot (https://prosa.services.came.sbg.ac.at/prosa.php) [53] was used to check whether the Z-score of each input structure is within the range typically found for native proteins of a similar size. In these plots, the Z-score of every protein structure checked in this study appeared as a black dot within the range of native conformations. The final modeled structures were further energetically minimized, and molecular dynamics simulation was performed with CABS-flex 2.0 (http://212.87.3.12/CABSflex2). The latter program is an efficient simulation engine that allows modeling of the large-scale conformational changes related to protein flexibility [54]. The models were comprehensively analyzed using PyMol (http://pymol.org/) to check for the presence of a lid and for the existence and orientation of the catalytic triad. The depth of the putative intramolecular tunnels was calculated with DEPTH [55], taking residues from the oxyanion hole in each candidate as the cavity end point.
