*2.1. Pediastrum duplex*

In *P. duplex*, the 5′ termini of *cox2a*, *cox3*, *nad2*, *nad4*, and *nad4L* occurred directly upstream of the AUG start codon, leaving essentially no UTR (Table 1). For *atp6*, 5′ and 3′ end processing occurred within a 9 nt stretch of adenines, encoded in the genomic DNA (gDNA), that flanks the gene (Figure 1A). Because the same oligonucleotide sequence occurred at both ends, it was not possible to distinguish the exact location of the 5′ or 3′ exonucleolytic cleavage using cRT-PCR. Two of the *P. duplex* mRNAs (*atp9* and *nad1*) had cleavage sites producing 5′ UTR termini downstream of the start codons predicted in archived chondriome maps (KR026340, KR026340, MK895949). For *atp9* (Figure 1B) and *nad1* (Figure 1D), there is an in-frame AUG start codon adjacent to the cut site, leaving a short 5′ UTR consistent with the other genes. For *cob*, the circRT-PCR and PacBio techniques disagreed on the 5′ terminus. Circular RT-PCR detected a single 5′ terminus 22 nt downstream of the predicted start codon (Figure 1C), whereas PacBio Iso-Seq reads revealed a single terminus adjacent to the originally predicted start codon.
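The ambiguity at *atp6* can be illustrated with a minimal sketch. It assumes a circRT-PCR junction read can be modelled as a templated 3′ piece followed directly by a templated 5′ piece, and it lists every split of the junction that the flanking genomic sequence permits. The function name and the toy sequences are hypothetical and are not taken from the *P. duplex* chondriome.

```python
def consistent_splits(junction, three_prime_template, five_prime_template):
    """Return every (3' length, 5' length) split that explains `junction`.

    junction: sequence read across the ligation site, written as the
        3' end of the mRNA followed by its 5' end.
    three_prime_template: genomic sequence beginning immediately after
        the stop codon and running downstream.
    five_prime_template: genomic sequence ending immediately before
        the start codon.
    All names and inputs here are illustrative assumptions.
    """
    splits = []
    for k in range(len(junction) + 1):
        left, right = junction[:k], junction[k:]
        # The 3' piece must match the start of the downstream template.
        if not three_prime_template.startswith(left):
            continue
        # The 5' piece must match the end of the upstream template.
        if five_prime_template.endswith(right):
            splits.append((len(left), len(right)))
    return splits


# Toy case 1: the same adenine stretch flanks both ends, so many splits fit.
ambiguous = consistent_splits("AAAAAAAAA", "AAAAAAAAAGCU", "GGUAAAAAAAAA")
print(len(ambiguous))  # 10 equally valid ligation positions

# Toy case 2: distinct flanks give a single, unambiguous split.
print(consistent_splits("UUGAC", "UUGCAU", "GGCCAC"))  # [(3, 2)]
```

When more than one split is returned, the junction sequence alone cannot place the two termini, which is the situation described for the adenine stretch flanking *atp6*.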

**Table 1.** Position of the 5′ UTR terminus upstream of the start codon (in nucleotides) in *Pediastrum duplex* mitochondria; nd = no data. \* Some 5′ UTR termini detected in this study occur downstream of the start codons in archived chondriomes. The distances presented in this table are measured from the next available AUG start codon.
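The "next available AUG" rule can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes that coordinates are 0-based, that the search stays in the annotated reading frame, and that it stops at the annotated end of the coding region. The function name and the toy sequence are hypothetical.

```python
def next_inframe_start(seq, annotated_start, cds_end, five_prime_end):
    """Find the first in-frame AUG at or downstream of a mapped 5' terminus.

    seq: transcript-strand sequence (DNA or RNA letters).
    annotated_start: 0-based index of the annotated start codon.
    cds_end: 0-based index one past the annotated stop codon.
    five_prime_end: 0-based index of the first nucleotide of the mRNA.
    Returns (start_index, utr_length, codons_lost), or None when no
    in-frame AUG lies between the terminus and the end of the CDS.
    """
    rna = seq.upper().replace("T", "U")
    # First codon boundary in the annotated frame at or after the terminus.
    offset = max(0, five_prime_end - annotated_start)
    first = annotated_start + ((offset + 2) // 3) * 3
    for i in range(first, cds_end - 2, 3):
        if rna[i:i + 3] == "AUG":
            utr_length = i - five_prime_end
            codons_lost = (i - annotated_start) // 3
            return i, utr_length, codons_lost
    return None


# Toy transcript: annotated AUG at index 2, a second in-frame AUG at index 14.
toy = "GGATGAAACCCGGGATGTTTTAA"
print(next_inframe_start(toy, 2, 23, 12))  # (14, 2, 4)
```

In the toy case the terminus lies 2 nt upstream of the second AUG, so the sketch reports a 2 nt 5′ UTR and a product four codons shorter than annotated. This is the same kind of reasoning that is applied to *atp9* and *nad1*.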


**Figure 1.** The 5′ termini and UTRs of four *P. duplex* mitochondrial mRNAs. Start codons are red, stop codons blue, and truncated portions of coding regions grey. (**A**). The 5′ UTR of the *atp6* mRNA occurred within a 9 nt templated stretch of adenines that also appears downstream of the stop codon. (**B**). The *atp9* 5′ terminus occurred downstream of the AUG start codon in archived *P. duplex* chondriomes, suggesting the protein may be six amino acids shorter than predicted. (**C**). Two *cob* 5′ termini were detected. Circular RT-PCR revealed one downstream of the AUG start codon in archived *P. duplex* chondriomes, while the PacBio Iso-Seq technique revealed a terminus directly adjacent to the predicted start codon. (**D**). The *nad1* 5′ terminus occurred downstream of the AUG start codon in archived *P. duplex* chondriomes, suggesting the protein may be thirty amino acids shorter than predicted.

*P. duplex* 3′ UTR lengths were gene specific, with several genes having two or more termini (Figure 2A–J and Table 2). Most 3′ UTRs were relatively short (fewer than 25 nt), the exception being those of *cob*, which were 100 and 110 nt long (Figure 2C). Eight genes (*atp6*, *atp9*, *cob*, *cox1*, *cox2a*, *cox3*, *nad2*, and *nad5*) were polycytidylated. For some, this occurred at specific termini, e.g., *atp6*, *atp9*, *cob*, *cox1*, *cox2a*, and *cox3*. For two genes, *nad2* and *nad5*, the poly(C) additions occurred at variable locations within templated repetitive AU regions beginning 9 nt downstream of the stop codons (Figure 2G,J and Table 2). No poly(C) additions were detected on *nad1*, *nad4*, or *nad4L*, even though *nad4L* has an AU repeat region adjacent to its stop codon. The 3′ termini of fully processed mRNAs generally agreed between cRT-PCR and PacBio reads for all but one transcript, *nad4L*. For this gene, cRT-PCR provided one 3′ terminus with a truncated stop codon (Figure 2J upper sequence), whereas PacBio data provided a 3′ terminus 37 nt downstream (Figure 2J lower sequence). The nucleotide sequences between the stop codon and the 3′ terminus for each gene were aligned and analyzed using a logo plot, and the 15 nucleotides upstream of the terminus were composed nearly exclusively of adenines and uracils (Supplemental Figure S1A). These regions were analyzed for RNA secondary structure and none was detected.
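The distinction between templated 3′ sequence and a non-templated poly(C) addition can be sketched as a comparison of each read against the genomic sequence downstream of the stop codon. This is a minimal illustration under stated assumptions, not the analysis pipeline used here: the function names, the toy sequences, and the minimum tail length of three cytidines are all hypothetical choices.

```python
def split_nontemplated_tail(read_3p, template_3p):
    """Split a read's 3' region into templated and non-templated parts.

    read_3p: read sequence starting immediately after the stop codon.
    template_3p: genomic sequence starting at the same position.
    The templated part is the longest prefix of the read that matches
    the genome; whatever follows is treated as a non-templated tail.
    """
    n = 0
    for read_base, genome_base in zip(read_3p, template_3p):
        if read_base != genome_base:
            break
        n += 1
    return read_3p[:n], read_3p[n:]


def is_poly_c(tail, min_len=3):
    """True if the tail consists only of C and has at least `min_len`
    bases (the threshold is an arbitrary assumption)."""
    return len(tail) >= min_len and set(tail) == {"C"}


# Toy example: an AU repeat region followed by a cytidine tail.
templated, tail = split_nontemplated_tail("AUAUAUCCCCCC", "AUAUAUAUAGGCU")
print(len(templated), tail, is_poly_c(tail))  # 6 CCCCCC True
```

The greedy match assigns a genomic C at the boundary to the template. The exact start of a poly(C) tract is therefore ambiguous by that many bases whenever the template itself carries cytidines at the terminus.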

Naturally circularized mRNAs were also detected using RT-PCR. In *P. duplex*, circularized variants carrying full-length coding regions were detected for seven mRNAs (*atp6*, *atp9*, *cob*, *cox2a*, *cox3*, *nad4L*, and *nad5*) (Figure 3). For five of these, the circularization coincided with a tandemly repeated nucleotide motif. The ligation site for naturally circularized *atp6* transcripts occurred within a stretch of template-encoded adenines (Figure 3A). Several circularized variants of *cob* were found (Figure 3C), one in which the circularization occurred within a template-encoded poly(U) motif and a second within two AU-rich motifs. Two *cox3* circular transcript variants were detected, one in which the ligation occurred within a GAACGAA motif and a second ligated at a GCGTCTT motif that removed the final 45 nt of the coding region. Two *nad4L* circular variants were detected, each ligated within AU-rich motifs and including a full-length coding region. Three naturally circularized transcripts were detected with poly(C) additions. The two circular variants of *atp6* and *cob* with a poly(C) addition had severely truncated coding regions (Figure 3A,C). Two *atp9* circular variants were detected, and both had poly(C) additions and full-length coding regions (Figure 3B). A single circularized variant of the *cox2a* transcript was detected, with no obvious repeat motif and no poly(C) additions (Figure 3D).

The long-read PacBio data did not cover the entire chondriome or contain reads longer than ~2200 nt but, when combined with the circRT-PCR data, they did allow the detection of some broader transcript processing events from three portions of the *P. duplex* chondriome (Figure 4). For *cox1*, reads spanning the two exons and the intron were detected (Figure 4A). All of these reads had 5′ termini directly adjacent to the start AUG, while some were polycytidylated at the 3′ terminus. Transcripts with the intron removed were also detected, and these had the same 5′ and 3′ termini as the unspliced transcript. PacBio reads were also produced for another section with three genes, *nad2*-*nad6*-*cob*, flanked by tRNAs (Figure 4B). For *nad2*, transcripts appeared to have been endonucleolytically cleaved adjacent to *trnN*, forming a 5′ terminus for *nad2*, and adjacent to *nad6*, forming a 3′ terminus. The 5′ end was further processed, leaving multiple termini and eventually resulting in an mRNA with no UTR that was also polycytidylated at the 3′ terminus. A transcript with both *nad6* and *cob* was detected. Its 5′ terminus occurred adjacent to the *nad6* start codon, while the 3′ terminus appeared to have been created by the cleavage of *trnV* from the primary transcript followed by polycytidylation. The *cob* coding region was cleaved away from *nad6*, leaving two different 5′ termini. The linear transcripts were polycytidylated, whereas the circular version of *cob* was ligated with no poly(C) tract. No individual *nad6* transcript was detected in either the circRT-PCR or the PacBio results. A third region with the two genes *nad4L*-*atp9* flanked by tRNAs was also recovered by the PacBio Iso-Seq methodology (Figure 4C). A single transcript that appeared to have been produced by the removal of the two tRNAs from the primary transcript was detected. The 3′ terminus of this polycistronic mRNA was polycytidylated after *trnE* was removed.
The removal of *trnG*, located 5′ of *nad4L*, left 170 nt upstream of the start codon, but this stretch was subsequently removed, leaving a 5′ terminus adjacent to the start AUG. The 3′ terminus was produced by endonucleolytic cleavage, leaving a 37 nt 3′ UTR that was polycytidylated. A second *nad4L* transcript was detected with a longer 5′ UTR (−80–94 nt within an AU repeat region) and a 3′ terminus consisting of a truncated stop codon that had been polycytidylated. Both versions of these shortest *nad4L* transcripts were detected as circular RNAs and neither contained a poly(C) tract. The *atp9* coding region was cleaved from the primary transcript, leaving a 5′ terminus 2 nt upstream of its start AUG, but its 3′ terminus was the one formed by the removal of *trnE*. A non-polycytidylated version of this mRNA was circularized.



**Figure 2.** The 3′ termini and UTRs of ten *P. duplex* mitochondrial mRNAs. Stop codons are blue. (**A**). Two 3′ termini were detected for *atp6*. The upper (shorter) sequence had no oligonucleotide addition, while a portion of transcripts represented by the lower (longer) sequence had poly(C) additions. (**B**). For *atp9*, a portion of the transcripts with the upper terminus had a poly(C) addition, while the slightly longer one represented by the lower sequence did not. (**C**). For *cob*, a portion of the transcripts with the upper terminus had a poly(C) addition, while the one represented by the longer sequence did not. (**D**). For *cox2a*, a single terminus was detected and a portion of the transcripts had a poly(C) tail. (**E**). A single terminus, some copies of which had a poly(C) addition, was detected for *cox3*. (**F**). A single terminus was detected for *nad1* and no oligonucleotide additions were detected. (**G**). For *nad2*, poly(C) additions were detected at several different termini within an AU repeat region. (**H**). A single terminus was detected for *nad4* and no oligonucleotide additions were detected. (**I**). A single terminus was detected for *nad4L* and no oligonucleotide additions were detected. (**J**). For *nad5*, poly(C) additions were detected at several different termini within an AU repeat region.


**Table 2.** Lengths of the most abundant 3′ UTRs found in *Pediastrum duplex* and those found with a non-templated addition.

nd: no data.

Since tRNA removal was found to be integral to the maturation of the *cob* and *atp9* 3′ termini, the placement of tRNAs was compared to the 3′ termini of other genes (Table 2). Five of the genes we analyzed had a tRNA adjacent to their 3′ end, but only two, *cob* and *atp9*, had mature 3′ ends matching the placement of those tRNAs. The possibility of t-elements was considered for the other genes, but no evidence of secondary structures immediately downstream of the mature 3′ ends was detected for *atp6*, *cox1*, *cox2a*, *cox3*, *nad1*, *nad2*, *nad4*, *nad4L*, *nad5*, or *nad6*.


**Figure 3.** Naturally circularized mRNAs found in *P. duplex* mitochondria. Start codons are colored red, stop codons blue, repeat sequences orange, and truncated portions of coding regions grey. (**A**). Two circularized mRNAs were detected for the gene *atp6*. The upper sequence represents a full-length coding region whose 3′ and 5′ termini were ligated within templated adenine stretches that flank the coding region. The lower sequence represents a circularized portion of the mRNA. (**B**). Two versions of a circularized *atp9* transcript were detected. Both would be full length if a start codon downstream of the previously predicted one (grey) is the actual start codon. (**C**). Two full-length circularized versions of the *cob* transcript were detected (upper two sequences). The ligation termini coincided with two different repeat sequences (orange). A third circularized transcript with a truncated coding region was also detected for *cob*. (**D**). Two circularized full-length coding regions were detected for *cox2a*. (**E**). Two circularized *cox3* transcripts were detected, both ligations having occurred at repeat sequences. One carried a full-length coding region (upper) and the other a truncated one (lower). (**F**). Two full-length circularized versions of the *nad4L* transcript were detected in which ligation occurred at AU-rich repeat sequences. (**G**). One circularized *nad5* transcript was detected, with the ligation occurring within AU-rich repeat sequences.

**Figure 4.** Mitochondrial RNA processing as determined by PacBio Iso-Seq and circRT-PCR. (**A**). The mitochondrial genome of *P. duplex* (GenBank MK895949). The regions highlighted in panels B, C, and D of this figure are marked. The map was generated using OGDRAW [64]. (**B**). The *cox1* gene. The upper diagram represents a portion of the chondriome, and the lower two diagrams represent partially and fully processed transcripts. (**C**). The *trnG-nad2-nad6-cob-trnE* portion of the chondriome. (**D**). The *trnG-nad4L-atp9-trnE* portion of the chondriome.
