*2.2. Sequence Alignment and Phylogenetic Reconstruction*

With the DNA and cDNA sequences of the *Cppsy* gene determined, it was possible to investigate its evolutionary position among the various *psy* genes. A phylogenetic tree of PSYs from different organisms was constructed in MEGA 4.0 from a Clustal W 1.6 alignment of their deduced amino acid sequences. The tree indicates that *psy* derives from a common ancestral gene and later diverged into four subgroups: higher plants, cyanobacteria, algae, and bacteria (Figure 2). In the neighbor-joining (NJ) tree, *Cppsy* falls within the algal group and appears to be more ancient than the *psy* genes of higher plants (Figure 2).
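Neighbor-joining works from a matrix of pairwise distances between aligned sequences. As a rough illustration only, the sketch below computes the uncorrected p-distance (the fraction of differing residues over ungapped columns) for a toy alignment. The distance model used in MEGA for Figure 2 is not stated here, so p-distance is an assumption, and the sequence names and residues are hypothetical.

```python
from itertools import combinations


def p_distance(aligned_a, aligned_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment):
    """Map each unordered pair of sequence names to its p-distance."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


# Hypothetical toy alignment, not real PSY sequences.
toy_alignment = {
    "algaA": "MDELVDK-LA",
    "algaB": "MDELVDKQLA",
    "plantC": "MDELVDRQIG",
}
matrix = distance_matrix(toy_alignment)
```

A tree-building program would then cluster the taxa from such a matrix; in practice a corrected distance model is often preferred over the raw p-distance.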

The deduced amino acid sequence of *Cppsy* was submitted to NCBI for PSI-BLAST searches. The results showed that *Cppsy* shares high sequence similarity with *psy* genes from other algal species, with 83% identity and 88% positives with *psy* from *Chlorella* NC\_64A. *Cppsy* was also highly similar to *psy* from *C. reinhardtii* (67% identity, 79% positives), *H. pluvialis* (63% identity, 77% positives), *D. bardawil* (68% identity, 80% positives), and *D. salina* (68% identity, 79% positives), suggesting that *Cppsy* belongs to the algal *psy* family. Within the algal family, CpPSY belongs to class I of PSY according to Tran's data [29]. BlastP analysis suggested that this protein has the essential characteristics of PSY. It belongs to the Isoprenoid\_Biosyn\_C1 superfamily and contains the consensus sequence, including three predicted substrate-Mg²⁺ binding sites (aspartate-rich regions, DXXXD): 130-DELVD-134, 203-DELYD-207, and 256-DEGED-260 (Figure 3A). Other algae and higher plants have only two such sites (DELVD and DVGED) (Figure 3A); hence, CpPSY has one more DXXXD motif than other PSYs. The additional 203-DELYD-207 site possibly plays an important role in the function of CpPSY and should be studied further.
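The DXXXD motif positions reported above can be located in any protein sequence with a simple pattern scan. The following is a minimal sketch, assuming a plain one-letter amino acid string as input; the function name and the toy sequence are hypothetical, and the toy sequence is not the CpPSY sequence.

```python
import re


def find_dxxxd_motifs(protein_seq):
    """Return (start, end, motif) tuples with 1-based inclusive positions.

    A lookahead is used so that overlapping motifs are also reported.
    """
    seq = protein_seq.upper()
    pattern = re.compile(r"(?=(D[A-Z]{3}D))")
    return [(m.start() + 1, m.start() + 5, m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical toy sequence containing the three motif variants named in the text.
toy_seq = "MKADELVDGGSADELYDPLLGDEGEDK"
motifs = find_dxxxd_motifs(toy_seq)
for start, end, motif in motifs:
    print(f"{start}-{motif}-{end}")
```

The output format mirrors the position-motif-position notation used in the text (for example, 130-DELVD-134). A pattern match alone does not establish that a site binds substrate-Mg²⁺; it only flags candidate aspartate-rich regions.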

**Figure 2.** Phylogenetic tree of PSY sequences from various species. The phylogeny was derived using neighbor-joining analysis. The accession numbers of the amino acid sequences follow the taxon names. Horizontal branch lengths represent relative evolutionary distances, with the scale bar corresponding to 0.05 amino acid substitutions per site. 

*Mar. Drugs* **2015,** *13*, 6620–6635 

**Figure 3.** (**A**) Alignment of selected deduced PSY amino acid sequences from different algae, produced with the GeneDoc program using Clustal W. The alignment indicates aspartate-rich regions/substrate-Mg²⁺ binding sites (DXXXD). The three DXXXD motifs are shown by the red boxes. Cppsy, Cvpsy, Mspsy, Olpsy, Dbpsy, Dspsy, Hppsy, Cspsy, Czpsy, Vcpsy, Ntpsy, Atpsy, and Zmpsy are the PSYs of *Chlorella protothecoides* CS-41, *Chlorella variabilis*, *Micromonas* sp. RCC299, *Ostreococcus lucimarinus*, *Dunaliella bardawil*, *Dunaliella salina*, *Haematococcus pluvialis*, *Coccomyxa subellipsoidea* C-169, *Chromochloris zofingiensis*, *Volvox carteri* f. *nagariensis*, *Nicotiana tabacum*, *Arabidopsis thaliana*, and *Zea mays*, respectively; (**B**) Three-dimensional model structure of CpPSY. Comparative modeling was performed using homology-based three-dimensional structural modeling. The three aspartate-rich motifs (DXXXD) are colored in orange (DELVD), yellow (DELYD), and magenta (DEGED); others are shown in green. The *N*-terminus and *C*-terminus are also shown; (**C**) High-performance liquid chromatography trace and UV spectrum of carotenoid pigments in the *E. coli* heterologous complementation system. Pigments were extracted from *E. coli* cells transformed with pACCRT-E and pUC-psy together (1), pUC-psy only (2), and pACCRT-EB only (3). Absorbance was recorded at 285 nm. The peak indicated by the arrow is phytoene.

Secondary structure prediction carried out at NPS@ (https://npsaprabi.ibcp.fr/) showed that CpPSY consists of 58.68% alpha helix, 26.58% random coil, 10.79% extended strand, and 3.95% beta turn. The tertiary structure of CpPSY was constructed by homology-based modeling with Swiss-Model (Figure 3B). A total of 50 models were found. Squalene synthase (HpnC) was used as the template for molecular modeling because it had the highest sequence identity (30.42%). The modeled structure also showed that CpPSY consists mostly of alpha helices. The three conserved DXXXD motifs (orange DELVD, yellow DELYD, and magenta DEGED) are marked in the three-dimensional model structure (Figure 3B). The three DXXXD motifs appear to form a ring-like arrangement, which could be important for enzyme activity.
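Composition percentages like those above are typically derived from a per-residue state string returned by the prediction server. The sketch below shows that calculation under the assumption that each residue is labeled with one letter (h = alpha helix, e = extended strand, t = beta turn, c = random coil); this letter coding and the toy string are assumptions for illustration, not the actual CpPSY prediction.

```python
from collections import Counter

# Assumed one-letter state codes; adjust to the output format actually used.
STATE_NAMES = {
    "h": "alpha helix",
    "e": "extended strand",
    "t": "beta turn",
    "c": "random coil",
}


def secondary_structure_composition(states):
    """Return the percentage of residues in each state, rounded to two decimals."""
    s = states.lower()
    if not s:
        raise ValueError("empty state string")
    unknown = set(s) - set(STATE_NAMES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    counts = Counter(s)
    return {
        name: round(100.0 * counts[code] / len(s), 2)
        for code, name in STATE_NAMES.items()
    }


# Hypothetical 10-residue prediction string.
toy_states = "hhhhhhccet"
composition = secondary_structure_composition(toy_states)
```

The four percentages should sum to approximately 100, as the values reported for CpPSY do (58.68 + 26.58 + 10.79 + 3.95 = 100.00).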

Taken together, these analyses strongly suggest that PSY from *C. protothecoides* CS-41 is an algal phytoene synthase involved in the carotenoid biosynthesis pathway. A bacterial complementation assay further confirmed that this gene is functional. The protein expressed from pUC-psy catalyzed the conversion of GGPP, produced by pACCRT-E, to phytoene (Figure 3C,1), similar to the function of the *crtB* gene in pACCRT-EB (Figure 3C,3).
