*4.3. Chloroplast Genome Assembly and Annotation*

A paired-end Illumina sequencing library was prepared from total DNA using the NEBNext Ultra DNA Library Prep Kit (E7370S). The library was sequenced on an Illumina NovaSeq 6000 with 150 bp insertion fragments (Illumina, San Diego, CA, USA). The high-throughput sequencing data were processed sequentially with SOAPnuke v1.3.0 and SPAdes v3.10.0 [31,32]. A long-read library was constructed with the 1D Genomic DNA by Ligation kit (SQK-LSK108) according to the manufacturer's instructions. The prepared library was loaded onto an Oxford Nanopore GridION X5 platform and sequenced. Only reads with mean quality scores >7 were retained. Long-read sequencing data were assembled using Canu v1.6.0 [33]. The contigs were screened for chloroplast genome sequences using BLAST [34]. The selected chloroplast genome contig was assembled using Sequencher 4.10. We then used Geneious 8.1 to map all reads to the spliced genome sequence to verify that the contig had been joined correctly [35]. Finally, we obtained 21 chloroplast contigs from the short-read sequencing data and one circular molecule from the long-read sequencing data. A collinearity dot plot was generated using Genome Pair Rapid Dotter v1.40 to detect homology between the 21 contigs and the circular molecule [36].
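The long-read retention rule (mean score above 7) can be illustrated with a minimal sketch. It assumes the score is a Phred quality score in Phred+33 FASTQ encoding and that the mean is a plain arithmetic mean over bases; the text does not say how the mean was computed, and some nanopore tools average error probabilities instead. The function names are hypothetical and this is not the software used in the study.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality string) for one FASTQ record
FastqRecord = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (assumption: Phred+33)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from well-formed four-line FASTQ input."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_reads(records: Iterable[FastqRecord],
                 min_mean_q: float = 7.0) -> Iterator[FastqRecord]:
    """Keep reads whose mean quality is strictly greater than the cutoff."""
    for rec in records:
        if mean_phred(rec[2]) > min_mean_q:
            yield rec
```

With a file on disk, `filter_reads(parse_fastq(open(path)))` would stream the retained reads; the strict `>` mirrors the ">7" wording, so a read with a mean of exactly 7 is dropped.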

The chloroplast genome was initially annotated using the online program DOGMA (Wyman et al., 2004) [37] and MAKER [38]. All open reading frames (ORFs) longer than 300 bp were extracted with ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/), and the free-standing ORFs were then annotated using BLASTn and BLASTp (http://blast.ncbi.nlm.nih.gov/) with the E-value threshold set to 1e-10. Transfer RNA and ribosomal RNA genes were identified using tRNAscan-SE v1.23 and RNAmmer, respectively [39,40]. Intron boundaries were determined by modeling intron secondary structures and by comparing intron-containing genes with intronless homologs [41,42]. The graphical gene map was drawn with the OrganellarGenomeDRAW program (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) [43]. The annotated chloroplast genome was submitted to GenBank under the accession number MK580484.
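The ORF length filter can be sketched as a six-frame scan. This is only an illustration of the idea, not ORFfinder's implementation: it assumes ORFs start at ATG, end at one of the standard stop codons (TAA, TAG, TGA), and that the 300 bp threshold is applied to the nucleotide length including the stop codon. All names are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq: str, min_len: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of ATG..stop ORFs longer than min_len."""
    found = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start > min_len:
                    found.append((start, end))
                start = None
    return found


def find_orfs(seq: str, min_len: int = 300) -> List[Tuple[str, int, int]]:
    """Scan both strands; coordinates are reported on the forward strand."""
    seq = seq.upper()
    n = len(seq)
    hits = [("+", s, e) for s, e in orfs_in_strand(seq, min_len)]
    rc = reverse_complement(seq)
    hits += [("-", n - e, n - s) for s, e in orfs_in_strand(rc, min_len)]
    return hits
```

Each hit is a `(strand, start, end)` tuple with 0-based, half-open coordinates; the extracted sequences could then be passed to a BLAST search with the E-value cutoff described above.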

We obtained a nucleotide dataset of 16,359 unambiguously aligned positions, consisting of 31 common cpDNA-encoded genes from 43 chlorophytes, from GenBank (https://www.ncbi.nlm.nih.gov/genbank/); *Picocystis salinarum* and *Nephroselmis astigmatica* were used as the outgroup taxa. Most genome accession numbers are given before the species names in Figure 8. The accession numbers of the *Cephaleuros* sp., *Acetabularia peniculus*, and *Scotinosphaera* sp. chloroplast genome sequences are MG721699-MG721754, MH545187-MH545222, and MG721898-MG721961, respectively. The data partitions and best-fit models were selected using ModelFinder according to the Bayesian information criterion [28]. We used IQ-TREE v1.7 and MrBayes 3.2 to perform maximum-likelihood analysis and Bayesian inference, respectively [29,30]. Because the genes in the Cladophorales plastome are unique and fewer in number than those in the chloroplast genomes of other green algae, we did not include it in our analysis.
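Criterion-based model selection can be illustrated with a short sketch. It assumes the criterion is the Bayesian information criterion, BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of alignment sites, and L the maximized likelihood; the model with the lowest BIC is preferred. The function names and any candidate values passed in are hypothetical, and ModelFinder's own search over partition schemes is considerably more involved.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """Pick the candidate with the lowest BIC.

    candidates maps a model name to (log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))
```

For the dataset described here, `n_sites` would be 16,359; a more complex model is favoured only if its gain in log-likelihood outweighs the `ln(n)` penalty per extra parameter.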
