*2.1. Cloning and Characterization of the psy Gene from C. protothecoides*

Touchdown PCR with primers YF and YR (Table 1) generated a fragment of the predicted size, 373 bp (Supplementary Figure S1, lane 1). BLAST analysis showed that the nucleotide sequence of this fragment shared about 74% and 73% identity with the corresponding sequences of *C. reinhardtii* and *D. salina*, respectively, indicating that the fragment is derived from a putative phytoene synthase gene.


**Table 1.** PCR primers and target fragments for *Cppsy.* F: forward; R: reverse; O: outer primer; I: inner primer. 

Based on this sequence information, specific primers were designed for 5′- and 3′-rapid amplification of cDNA ends (RACE) of the related gene. 5′-RACE generated a 598 bp fragment (Supplementary Figure S1, lane 2), and 3′-RACE produced an 816 bp fragment (Supplementary Figure S1, lane 3). Sequencing confirmed these fragments as the 5′ and 3′ regions of the phytoene synthase gene of *C. protothecoides* (*Cppsy*). Reverse transcription PCR (RT-PCR) with primers YF1 and YR1 generated a 1143 bp fragment (Supplementary Figure S1, lane 4), which was identified as the full-length *Cppsy* cDNA (GenBank accession No. FJ968161).

The open reading frame of *Cppsy* cDNA encoded a protein of 380 amino acid residues with a calculated molecular mass of 43.035 kDa and an isoelectric point of 6.40 (http://cn.expasy.org/tools/protparam.html), and shared 81.7% sequence identity with its counterpart in *Chlorella* NC\_64A.
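The reported protein length and cDNA size can be related with a short calculation. The sketch below assumes that the 1143 bp cDNA corresponds to the open reading frame plus one stop codon, which the text does not state explicitly; the function and variable names are illustrative only.

```python
def orf_length_bp(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues at 3 nt per codon."""
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons


# Values reported in the text.
N_RESIDUES = 380   # amino acid residues
MASS_KDA = 43.035  # calculated molecular mass
CDNA_BP = 1143     # full-length cDNA fragment

orf_bp = orf_length_bp(N_RESIDUES)
# Average mass per residue in Da, a rough sanity check on the reported mass.
mean_residue_mass_da = MASS_KDA * 1000 / N_RESIDUES

print(orf_bp, orf_bp == CDNA_BP)  # 1143 True
print(f"{mean_residue_mass_da:.2f} Da per residue")
```

Under this assumption, 380 codons plus a stop codon account for the full 1143 bp, and the average residue mass comes out at roughly 113 Da.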

To characterize the gene corresponding to the *Cppsy* cDNA, genomic PCR was performed. A 2488 bp fragment (Supplementary Figure S1, lane 5; GenBank accession No. GU351883) was generated and sequenced. Analysis of the nucleotide sequence confirmed that the product was the corresponding *Cppsy* gene.

Southern blot analysis indicated that *C. protothecoides* CS-41 carries only one copy of the *Cppsy* gene (Supplementary Figure S2), unlike higher plants. *Psy* gene duplication is common in dicot plants, such as tomato (SlPSY1 and SlPSY2), and in monocot plants, such as maize (ZmPSY1-3), rice (OsPSY1-3), and sorghum (SbPSY1-3) [16,26–28].

Analysis of the *Cppsy* gene structure (Figure 1) revealed that it is more complicated than those of dicot and monocot plants. It consists of ten exons and nine introns. *Chlorella* has a higher intron density than other algae and higher plants; in most higher plants, *psy* genes have four or five introns, whereas this alga has nine. Compared with the structure of the *psy* gene from *C. reinhardtii* (*Crpsy*), two introns appear to have been inserted into each of the first and second exons, and one intron into the fourth exon, which makes the gene structure more complicated (Figure 1C).

**Figure 1.** Exons and introns of the *Cppsy* gene in *C. protothecoides* CS-41. The ten exons are: (1) 1 bp to 280 bp; (2) 477 bp to 576 bp; (3) 691 bp to 743 bp; (4) 913 bp to 1030 bp; (5) 1139 bp to 1180 bp; (6) 1327 bp to 1427 bp; (7) 1635 bp to 1793 bp; (8) 1957 bp to 2045 bp; (9) 2171 bp to 2233 bp; (10) 2351 bp to 2488 bp. (**A**) Intron density; (**B**) DNA structure; (**C**) The relationship between introns and exons of *Cppsy*, *Crpsy*, and *Cnpsy* genes. *Dbpsy*, *Dspsy*, *Hppsy*, *Crpsy*, *Cnpsy*, *Cppsy*, *Mzpsy*, and *Atpsy* are the *psy* genes of *Dunaliella bardawil*, *Dunaliella salina*, *Haematococcus pluvialis*, *Chlamydomonas reinhardtii*, *Chlorella* NC\_64A, *Chlorella protothecoides* CS-41, *Zea mays*, and *Arabidopsis thaliana*, respectively.
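The exon coordinates in the Figure 1 legend can be cross-checked against the fragment sizes reported in the text. The following sketch derives exon and intron lengths from those coordinates. It assumes the coordinates are 1-based and inclusive, and it expresses intron density as introns per kb of coding sequence, which may differ from the definition used in Figure 1A; all variable names are illustrative.

```python
# Exon coordinates (start, end) in bp, taken from the Figure 1 legend.
# Assumption: positions are 1-based and inclusive.
EXONS = [
    (1, 280), (477, 576), (691, 743), (913, 1030), (1139, 1180),
    (1327, 1427), (1635, 1793), (1957, 2045), (2171, 2233), (2351, 2488),
]

exon_lengths = [end - start + 1 for start, end in EXONS]

# Each intron fills the gap between two consecutive exons.
intron_lengths = [
    next_start - prev_end - 1
    for (_, prev_end), (next_start, _) in zip(EXONS, EXONS[1:])
]

total_exon_bp = sum(exon_lengths)
total_intron_bp = sum(intron_lengths)

# Assumed density metric: introns per kb of coding sequence.
introns_per_kb = len(intron_lengths) / (total_exon_bp / 1000)

print(len(exon_lengths), len(intron_lengths))  # 10 9
print(total_exon_bp, total_intron_bp)          # 1143 1345
print(total_exon_bp + total_intron_bp)         # 2488
print(f"{introns_per_kb:.2f} introns per kb")  # 7.87 introns per kb
```

Under these assumptions the exons sum to 1143 bp, matching the full-length cDNA, and exons plus introns together span the 2488 bp genomic fragment, consistent with the ten-exon, nine-intron structure described above.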
