*2.3. Chloroplast Genome Analysis*

We obtained 21 contigs totaling 355,893 bp from the short-read sequencing data and one circular molecule of 399,372 bp from the long-read sequencing data. The dot plot showed a high level of congruence between the 21 contigs and the circular molecule (Figure S3), indicating that the plastome of *T. odorata* is circular, with a length of 399,372 bp. Our annotation showed that the plastome possesses the typical quadripartite structure. The two inverted repeat (IR) regions were 26,700 bp and 26,778 bp long. The large single copy (LSC) region was 178,629 bp, and the small single copy (SSC) region was 167,265 bp (Figure 5). The overall G + C content of the circular cpDNA was 29.75%. The cpDNA encodes 97 genes (Table 1).
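As a quick consistency check on the figures above, the four region lengths should add up to the length of the circular molecule. The following is a minimal sketch in Python. The labels `IR_1` and `IR_2` are placeholders, since the text does not say which IR copy has which length, and `gc_content` is a hypothetical helper showing one common way to compute G + C content, not necessarily the procedure used in this study.

```python
# Region lengths (bp) as reported in the text. "IR_1"/"IR_2" are placeholder
# labels: the text does not state which IR copy has which length.
regions = {
    "LSC": 178_629,
    "SSC": 167_265,
    "IR_1": 26_700,
    "IR_2": 26_778,
}
REPORTED_TOTAL = 399_372  # length of the circular molecule (bp)

total = sum(regions.values())
# Share of the plastome occupied by each region, in percent.
shares = {name: round(100 * length / total, 2) for name, length in regions.items()}


def gc_content(seq: str) -> float:
    """Return G + C as a percentage of unambiguous bases (A, C, G, T).

    Hypothetical helper: ambiguous symbols such as N are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100 * gc / acgt


if __name__ == "__main__":
    print(total == REPORTED_TOTAL)  # True: the four regions account for the whole molecule
    print(shares)
```

Applying `gc_content` to the deposited sequence (MK580484) would be one way to compare against the reported 29.75%, assuming the same treatment of ambiguous bases.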

**Figure 5.** Circular map of the chloroplast genome of *Trentepohlia odorata* (MK580484). Genes are color coded according to the functional categories listed in the index below the map. The GC content and the inverted repeats (IRA and IRB), which separate the genome into two single copy regions, are indicated on the inner circle. Genes drawn on the inside of the outer circle are transcribed clockwise; those on the outside are transcribed counterclockwise.

The genes were grouped into two major categories, coding genes and non-coding genes. The coding genes consisted of 63 predicted protein-coding genes, including five *atp* genes, four *chl* genes, four *pet* genes, five *psa* genes, 15 *psb* genes, seven *rpl* genes, four *rpo* genes, 11 *rps* genes, two *ycf* genes, and six other genes (*ccsA*, two *clpP*, *infA*, *rbcL*, and *tufA*). The non-coding gene category included 31 tRNAs and three rRNAs (Table 1). We annotated a total of 49 introns, of which 39 group I introns were present in eight genes (*rrl*-IRa (8), *rrl*-IRb (8), *rrs* (2), *psaA* (2), *psbA* (3), *psbC* (5), *psbD* (3), *petB* (4), and *rbcL* (2)), nine group II introns in eight genes (*rpl2*, *rps12*, *psaA* (3), *psbA*, *psbD*, *psaC*, *petB*, and *tufA*), and one intron of unidentified type (*rpoB*). Each IR region contained one copy of the rRNA gene *rrl* and no other genes. The *rrs* gene was not located in the IR regions and was present as a single copy in the SSC region. The *ycf3* gene spanned the boundary between the LSC and IRb regions; only a partial sequence of this gene was detected in the IRa region. We annotated two copies of *clpP*, one in the LSC region and one in the SSC region. Additionally, 95 free-standing ORFs (length >100 aa) were annotated in the intergenic regions, with a total length of 72,720 bp (18.21% of the plastome) (Figure 6). Among the free-standing ORFs, seven were annotated with plastid origins (POP), 16 with eukaryotic genome origins (EOP), and 33 with bacterial genome origins (BOP) (Figure 7). Four genes (*rpoA*, *rpoB*, *rpoC1*, and *rpoC2*) were annotated within four ORF clusters (including partial ORFs). All four genes were fragmented into several ORFs by in-frame stop codons (Figure 8, red asterisks). Three fragments were annotated in *rpoC1*, three in *rpoC2*, eight in *rpoB*, and two in *rpoA*. In the *rpoB* ORF cluster, we also detected a frameshift mutation (Figure 8, arrow).
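The gene counts and the ORF fraction reported above can be cross-checked with a few lines of arithmetic. This is only an illustrative tally of numbers taken from the text; the variable names are our own, and the sum of ORFs with an assigned origin assumes that the POP, EOP, and BOP labels do not overlap, which the text does not state explicitly.

```python
# Protein-coding genes per family, as listed in the text.
protein_coding = {
    "atp": 5, "chl": 4, "pet": 4, "psa": 5, "psb": 15,
    "rpl": 7, "rpo": 4, "rps": 11, "ycf": 2,
    "other": 6,  # ccsA, two clpP, infA, rbcL, tufA
}
n_trna = 31
n_rrna = 3

n_protein = sum(protein_coding.values())  # expected 63
n_genes = n_protein + n_trna + n_rrna     # expected 97

# Free-standing ORFs (>100 aa) in intergenic regions.
orf_bp = 72_720
plastome_bp = 399_372
orf_percent = round(100 * orf_bp / plastome_bp, 2)  # expected 18.21

# ORFs with an assigned origin (assumes the three labels are mutually exclusive).
orf_origin = {"POP": 7, "EOP": 16, "BOP": 33}
n_orfs = 95
n_labelled = sum(orf_origin.values())

if __name__ == "__main__":
    print(n_protein, n_genes, orf_percent)  # 63 97 18.21
    print(n_labelled, "of", n_orfs, "free-standing ORFs carry an origin label")
```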


**Table 1.** Genes encoded by the *Trentepohlia odorata* chloroplast genome.

**Figure 6.** Sizes of the CDS (protein-coding regions), ORFs (open reading frames, >100 aa), tRNAs, rRNAs, and other regions in the *Trentepohlia odorata* chloroplast genome. Numbers indicate the size of each category (bp).

**Figure 7.** BLAST results for the free-standing ORFs in the *Trentepohlia odorata* chloroplast genome.

**Figure 8.** Fragmentation of the *rpo* gene cluster. Red asterisks mark detected in-frame stop codons; the frameshift mutation is marked with a blue arrow. BOP, EOP, and POP indicate ORFs that may have a bacterial, eukaryotic nuclear, or chloroplast origin, respectively. ORFs without a BOP, EOP, or POP label had no BLAST hit.
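To illustrate what fragmentation by in-frame stop codons means in practice, the sketch below scans a coding sequence codon by codon and reports the stop-free stretches between internal stops. It is a toy illustration, not the annotation pipeline used in this study. The function names and the example sequence are invented, and the stop codon set (TAA, TAG, TGA) is an assumption that should be replaced if a different genetic code applies.

```python
# Assumed stop codons (standard / bacterial-plastid code). This is an
# assumption for illustration; pass a different set if another code applies.
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def internal_stops(cds: str, stops=STOP_CODONS) -> list[int]:
    """Return 0-based codon indices of stop codons located before the last codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [i for i in range(n_codons - 1) if cds[3 * i:3 * i + 3] in stops]


def stop_free_fragments(cds: str, stops=STOP_CODONS) -> list[tuple[int, int]]:
    """Return (start, end) nucleotide intervals of stop-free stretches in frame 0.

    Intervals are 0-based and end-exclusive; stop codons themselves are excluded.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    fragments = []
    start = 0  # index of the first codon of the current stretch
    for i in range(n_codons):
        if cds[3 * i:3 * i + 3] in stops:
            if i > start:
                fragments.append((3 * start, 3 * i))
            start = i + 1
    if start < n_codons:
        fragments.append((3 * start, 3 * n_codons))
    return fragments


if __name__ == "__main__":
    # Invented toy sequence: ATG AAA TAA GGG TGA CCC TAA
    toy = "ATGAAATAAGGGTGACCCTAA"
    print(internal_stops(toy))       # [2, 4]
    print(stop_free_fragments(toy))  # [(0, 6), (9, 12), (15, 18)]
```

A frameshift, such as the one detected in the *rpoB* cluster, would not be caught by this frame-0 scan; detecting it requires comparing reading frames or aligning against an intact homolog.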

#### **3. Discussion**

The taxonomic controversy surrounding *T. odorata* has focused primarily on whether the species is synonymous with *T. umbrina* or *T. iolithus* [7,8]. The main difference between *T. iolithus* and *T. odorata* is their substratum: *Trentepohlia iolithus* has been found only on exposed stones or concrete, whereas *T. odorata* has been found on tree bark [7,9,20]. A previous study reported *T. odorata* on other substrata, such as concrete, which is not consistent with the original description [21]. Although the vegetative morphology of *T. odorata* is very similar to that of *T. iolithus* var. *yajiagengensis*, the morphology of the sporangia and the phylogenetic position suggest that they are different species [22]. *Trentepohlia umbrina* is a paraphyletic species, as sequences from *T. umbrina* clustered into several small clades in many studies. There is also an obvious morphological difference between *T. odorata* and *T. umbrina*. According to the original description, the thallus of *T. odorata* is heterotrichous, whereas the thallus of *T. umbrina* is prostrate. The vegetative cells of *T. odorata* have a greater length/width ratio than those of *T. umbrina*. Our observations in the present study are consistent with the original description and with that of Printz; we therefore support Printz, rather than Hariot, in regarding *T. odorata* as a morphologically distinct species [7,10]. The phylogenetic result shows that *Trentepohlia odorata* is most closely related to *Trentepohlia annulata*. One possible explanation for this close relationship is that neither alga appears to possess sporangiate laterals. *Trentepohlia* species with lateral or intercalary sporangia and dorsal pore sporangia may represent several deep lineages in Trentepohliales. Additionally, few images of *Trentepohlia odorata* are available from previous studies, and our study provides new morphological evidence for comparing these *Trentepohlia* species.

Although a considerable number of published plastomes are available, many gaps remain among chlorophyte plastomes, especially in several orders of Ulvophyceae [12]. A recent study reported that chloroplast genomes in Cladophorales are fragmented into many small hairpin chromosomes [17]. The chloroplast genomes of Bryopsidales are circular but lack a large inverted repeat [18]. Our study reports the first whole plastome of Trentepohliales, which at 399,372 bp is the largest currently identified within Ulvophyceae. The plastome of *Trentepohlia odorata* has a quadripartite structure, which differs from those of its close relatives, Cladophorales and Bryopsidales. We found several free-standing ORFs of bacterial origin and fragmentation by in-frame stop codons in the *rpoA*, *rpoB*, *rpoC1*, and *rpoC2* genes, which is similar to the situation in Bryopsidales [18]. The *rrf* gene was not detected in this study. The *rrl* gene was located in the IR regions, and the *rrs* gene was located in the SSC region. The *rrl* gene was fragmented by eight group I introns. Similar cases were also found in *Caulerpa manorensis*, *Jenufa perforata*, *Schizomeris leibleinii*, and *Floydiella terrestris*, with seven, five, seven, and eight introns, respectively [18,19,23,24]. Two copies of the *clpP* gene were annotated. Gene duplication of this kind is very common in nuclear genomes; in organellar genomes it is usually caused by repeats flanking the gene. However, we did not detect repeat sequences flanking either of the two *clpP* genes. Our phylogenetic analysis using chloroplast genomes indicated that both Ulvophyceae and Trebouxiophyceae are paraphyletic, which is consistent with previous studies [11,13]. Trentepohliales clustered with Dasycladales in the present study, as also reported in previous studies [11,13]. However, we cannot rule out that Trentepohliales are most closely related to Cladophorales, since no species of Cladophorales were included in our phylogenomic analysis.
