*2.2. Chara vulgaris*

The 5′ UTRs were much longer than those observed in *P. duplex* (Table 3). Based on the termini we detected, 5′ UTRs ranged from 6 to 273 nucleotides (nt), with an average length of 80 nt (S.E. = 15 nt). They were also more variable, with two or more termini detected for seven genes: *atp6*, *cob*, *cox2*, *cox3*, *nad1*, *nad4*, and *nad4L*. For three genes, *cox1*, *nad1*, and *nad2*, the mapped 5′ termini occurred downstream of the start codons annotated in GenBank record NC\_005255 (Figure 5). For *cox1*, a single terminus was detected 125–129 nt downstream of the predicted start codon. The next AUG start codon occurs a further 75–79 nt downstream (Figure 5A). Three 5′ UTRs were detected for *nad1*, all of which remove the predicted start codon but, depending upon the cleavage site, leave two possible alternative AUG start codons (Figure 5B). In *nad2*, two 5′ UTRs were detected, both of which leave a single alternative AUG start codon (Figure 5C). The length of the 5′ UTRs in *C. vulgaris* raised the possibility that they may fold to form RNA secondary structures, but the probability of secondary structures in the 5′ UTRs was found to be extremely low. The coverage of chondriome-derived transcripts for *C. vulgaris* using PacBio sequencing was very low, so the 5′ termini of only two genes, *atp9* and *cox2*, were recovered. Neither matched the length detected by circRT-PCR (Table 3).
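When a mapped 5′ terminus falls downstream of the annotated start codon, the UTR length has to be measured from the next available AUG instead. The following Python function is a minimal sketch of that bookkeeping, not the procedure used in this study. The function names, the 0-based coordinates, and the optional requirement that the alternative AUG lie in the annotated reading frame are all assumptions made for illustration, and the sequence is a toy example rather than *C. vulgaris* data.

```python
from typing import Optional


def next_start_codon(transcript: str, annotated_start: int, terminus: int,
                     in_frame: bool = True) -> Optional[int]:
    """Return the 0-based index of the first AUG at or after `terminus`.

    `annotated_start` is the index of the annotated start codon.
    If `in_frame` is True (an assumption of this sketch), only AUGs in the
    annotated reading frame are accepted.
    """
    seq = transcript.upper().replace("T", "U")  # accept DNA or RNA letters
    for i in range(terminus, len(seq) - 2):
        if seq[i:i + 3] != "AUG":
            continue
        if not in_frame or (i - annotated_start) % 3 == 0:
            return i
    return None  # no usable AUG downstream of the terminus


def utr_length(transcript: str, annotated_start: int, terminus: int,
               in_frame: bool = True) -> Optional[int]:
    """Distance in nt from the mapped 5' terminus to the next usable AUG."""
    start = next_start_codon(transcript, annotated_start, terminus, in_frame)
    return None if start is None else start - terminus


# Toy transcript (hypothetical): annotated AUG at index 0,
# an out-of-frame AUG at index 7 and an in-frame AUG at index 12.
toy = "AUG" "GCC" "CAU" "GGC" "AUG" "CCC" "UAA"
print(next_start_codon(toy, annotated_start=0, terminus=4))  # 12
print(utr_length(toy, annotated_start=0, terminus=4))        # 8
```

Setting `in_frame=False` returns the first AUG in any frame, which may be closer to a literal reading of "next available AUG"; which convention applies to a given gene would need to be checked against the annotation.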

**Table 3.** Site of 5′ UTR terminus upstream from start codon (in nucleotides) in *Chara vulgaris* mitochondria; nd = no data. \* 5′ UTR termini detected in this study occur downstream of the start codons found in archived chondriomes. The distances presented in this table are measured from the next available AUG start codon.


*C. vulgaris* 3′ UTRs ranged from 0 to 162 nt, with an average of 61.3 nt (S.E. = 7.5; Table 4, Figure 6). Multiple 3′ termini were detected for each gene except *cox1* and *nad2*, which had single termini. Polyadenylation was detected on eight genes (*atp6*, *atp9*, *cob*, *cox3*, *nad1*, *nad2*, *nad4*, and *nad4L*). For genes where multiple termini were detected, the poly(A) tail occurred on only one of those termini (Figure 6B,C,F,G,I). The exception was *atp6*, where all three termini were polyadenylated (Figure 6A). The proportion of those specific transcripts with poly(A) additions varied considerably. For example, 0.7% of *atp9* transcripts with 52–57 nt 3′ UTRs had a poly(A) tail, whereas the majority of specific *nad2*, *nad4*, and *nad4L* transcripts were tailed. PacBio sequencing produced data for four genes, and the 3′ termini agreed with the circRT-PCR results for three of them (Table 4). The exception was *cob*, where PacBio sequencing revealed a longer 3′ UTR than those found with circRT-PCR. PacBio sequencing could not be used to detect non-template poly(A) tails, since the mitochondrial transcripts were artificially polyadenylated to accommodate the Iso-Seq technique. The forty nucleotides upstream of the 3′ termini were analyzed for conserved sequences using a logo plot, and none were found (Supplemental Figure S1B). The length of the 3′ UTRs in *C. vulgaris* raised the possibility that these regions could fold into secondary structures. Secondary structure prediction suggested a high probability that stable stem-loop structures occur adjacent to the 3′ terminus in all but one (*cox2*) of these 3′ UTRs (Supplemental Figure S2).
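Distinguishing a non-template poly(A) addition from genome-encoded sequence amounts to comparing the read downstream of the stop codon with the genomic sequence at the same position. The Python sketch below illustrates one simple way to do this and is not the pipeline used in this study. The function name, the `min_tail` threshold, and the example sequences are hypothetical, and exact matching without mismatch tolerance is a simplifying assumption.

```python
from typing import Tuple


def split_3prime_end(read_after_stop: str, genome_after_stop: str,
                     min_tail: int = 3) -> Tuple[int, str, bool]:
    """Split a read's 3' region into templated UTR and non-template addition.

    Both inputs start at the first nucleotide after the stop codon.
    Returns (utr_length, extra_sequence, is_poly_a). `min_tail` is an
    arbitrary illustrative threshold, not a value from the study.
    """
    read = read_after_stop.upper().replace("U", "T")
    ref = genome_after_stop.upper().replace("U", "T")

    # Extend the templated UTR while the read agrees with the genome.
    n = 0
    while n < len(read) and n < len(ref) and read[n] == ref[n]:
        n += 1

    extra = read[n:]  # whatever the genome does not account for
    is_poly_a = len(extra) >= min_tail and set(extra) == {"A"}
    return n, extra, is_poly_a


# Hypothetical sequences, not C. vulgaris data.
genome = "GATTACAGGCTTAACCGG"
print(split_3prime_end("GATTACAGGC" + "AAAAAA", genome))  # (10, 'AAAAAA', True)
print(split_3prime_end("GATTACAGGCTT", genome))           # (12, '', False)
```

Because adenines that match the genome are counted as templated, this approach is conservative: a tail added at a site where the genome itself encodes A residues is shortened or missed, so termini in A-rich regions would need separate inspection.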

**Figure 5.** The 5′ termini and UTRs of three *C. vulgaris* mitochondrial mRNAs where the 5′ terminus occurred downstream of predicted start codons. Start codons are red and truncated portions of coding regions grey. (**A**). For *cox1*, several 5′ termini were detected within a 5-nucleotide region. (**B**). Three 5′ termini were detected for *nad1*. (**C**). Two different termini were detected for *nad2*.

**Table 4.** Length of the most abundant 3′ UTRs found in *Chara vulgaris* and those with a non-template polynucleotide addition.



nd: no data.


**Figure 6.** The 3′ termini and UTRs of ten *C. vulgaris* mitochondrial mRNAs. Stop codons are blue. (**A**). Three termini were detected for *atp6*; a portion of each had a poly(A) addition. (**B**). Several termini were detected for *atp9*. The longer one (upper) had no oligonucleotide additions, while those lacking the terminal uracil had poly(A) additions within a 6 nt region. (**C**). Two *cob* termini were detected; the shorter one had no detectable oligonucleotide additions, while the longer (lower) one did. (**D**). A single terminus was detected for *cox1*, with no oligonucleotide additions. (**E**). Four termini were detected for *cox2*. All included a portion of the adjacent *cox3* gene and had no oligonucleotide additions. (**F**). Three *cox3* termini were detected. A portion of the transcripts with the shortest UTR had poly(A) tails, while the two longer ones did not. (**G**). Four termini were detected for *nad1*, but only one had a poly(A) tail. (**H**). A single terminus was detected for *nad2*, and a portion of these transcripts had a poly(A) tail. (**I**). Three termini were detected for *nad4*, and a portion of the shortest had a poly(A) tail. (**J**). Two termini were detected for *nad4L*. One occurred directly adjacent to the stop codon. A portion of the longer UTR had a poly(A) addition.

Since tRNA placement was found to be important for 3′ maturation of some *P. duplex* genes, the distances of mature 3′ termini from the stop codons of genes with an adjacent tRNA were compared for *C. vulgaris* (Table 4). For the twelve genes used in this study, none had 3′ termini formed from the removal of an adjacent tRNA. The presence of RNA secondary structure (t-elements) immediately downstream of 3′ termini was also checked, and three genes (*atp6*, *nad1*, and *nad2*) had potential stem-loop structures adjacent to their mature 3′ termini.

In *C. vulgaris*, circularized full-length coding regions were detected for five genes. Circularized *nad2* transcripts matched the 5′ and 3′ termini detected in earlier experiments (Figures 5C and 6H), suggesting that only naturally circularized transcripts were detected for this gene. For the genes *atp9*, *cox1*, *cox2*, and *nad4L*, circularized variants differed from the artificially circularized transcripts analyzed in previous experiments and are represented in Figure 7. For genes *cox2*, *cox3*, *nad1*, and *nad4*, only fragments of coding regions were found circularized. In *C. vulgaris*, no repeat motifs were associated with the ligation sites, and no poly(A) additions were detected in the circularized transcripts.

**Figure 7.** Naturally circularized mRNAs found in *C. vulgaris* mitochondria. Start codons are colored red, stop codons blue, and truncated portions of coding regions grey. (**A**). Two circularized versions of full-length transcripts were detected for *atp9*. (**B**). Two circularized *cox1* transcripts were detected. Both had the same 5′ end but different termini on the 3′ ends. (**C**). A single circularized version of the *cox2a* transcript was detected. (**D**). Two circularized full-length *nad4L* transcripts of different lengths were detected.
