## *2.2. Monophyly of Eukaryotic and Cyanobacterial GOX-Like Proteins*

To reanalyze the phylogenetic origin of the plant and animal GOX-like proteins, we considered sequences from a broad spectrum of phyla; for example, we also included related proteins from the chromalveolate taxa (Table S1). In total, 111 GOX-like proteins from 11 groups, including Archaeplastida and Metazoa, were incorporated into the phylogenetic analysis. GOX-like proteins from fungi were excluded, as these proteins show accelerated evolution, which prevents their comparison [32]. We also restricted the analysis to one putative GOX isoform each from algae, plants, and animals, as the previous study [32] showed that the diversification of GOX into different biochemical subgroups occurred from one of the ancestral proteins within these groups. The alignment was constructed with ProbCons [34]. Additionally, we constructed alignments using MUSCLE [35], compared both results, and changed the alignment if necessary in order to obtain the best scores. Using the final alignment (Supplementary Material 2), we reconstructed a protein tree using Bayesian inference.

The midpoint-rooted Bayesian tree (Figure 2) is well supported by the Bayesian posterior probabilities (BPP). The GOX-like proteins from all of the eukaryotes form a monophyletic group (BPP = 0.96). The proteins from Chlorophyta cluster in that group; however, they form an outgroup to the other eukaryotic GOX-like proteins and are distinct from those of other Archaeplastida. The divergence of the GOX proteins of streptophytes is consistent with the fact that the chlorophyte GOX-like proteins act as LOX enzymes (see below). The GOX-like sequences from the Metazoa cluster together with sequences of chromalveolate taxa, including non-photosynthetic and photosynthetic groups; this clade is the sister clade to all Archaeplastida except Chlorophyta (BPP = 1). Interestingly, LOX proteins from cyanobacteria cluster as sisters to all eukaryotes, showing a close relationship between cyanobacterial and eukaryotic GOX-like proteins. It is also noteworthy that the GOX-like proteins of chromalveolates do not cluster within red algae, as would be expected from the secondary origin of their plastids, but rather form a clade with Metazoa, although the posterior probabilities for the critical branches placing red algae are relatively low. The cyanobacterial and eukaryotic clades of GOX-like proteins are separated from the proteins of Actinobacteria (also including non-sulfur bacteria), as well as from the clade Proteobacteria (also including Verrucomicrobia; BPP = 1). The outermost clade is formed by Firmicutes and Archaea. A similar clustering of GOX-like proteins was obtained using the maximum likelihood (ML) algorithm (Figure S4). In contrast to the Bayesian tree, some branches are not well supported in the ML tree; nevertheless, all cyanobacterial and eukaryotic GOX-like proteins again form one monophyletic, statistically well-supported clade.
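Midpoint rooting places the root halfway along the longest tip-to-tip path of an unrooted tree. The following is a minimal sketch of that idea, not the software used for Figure 2; the tree representation, the helper names (`add_edge`, `farthest`, `midpoint_root`), and the toy branch lengths are all assumptions made for illustration.

```python
def add_edge(tree, u, v, length):
    """Store an undirected branch of the given length."""
    tree.setdefault(u, {})[v] = length
    tree.setdefault(v, {})[u] = length


def farthest(tree, start):
    """Return (node, distance, path) for the node farthest from `start`."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nxt, length in tree[node].items():
            if nxt != parent:
                stack.append((nxt, node, dist + length, path + [nxt]))
    return best


def midpoint_root(tree):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (u, v, offset): the root lies on branch u-v, `offset` away from u.
    """
    # Two passes: the farthest node from any start is one end of the longest path.
    tip_a, _, _ = farthest(tree, next(iter(tree)))
    _, diameter, path = farthest(tree, tip_a)
    half, walked = diameter / 2.0, 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length
    raise ValueError("tree has no branches")


# Hypothetical toy tree: tips A-D, internal nodes n1 and n2.
tree = {}
add_edge(tree, "A", "n1", 1.0)
add_edge(tree, "B", "n1", 2.0)
add_edge(tree, "n1", "n2", 0.5)
add_edge(tree, "C", "n2", 4.0)
add_edge(tree, "D", "n2", 1.0)

root_position = midpoint_root(tree)
print(root_position)
```

In this toy tree the longest tip-to-tip path runs from C through n2 and n1 to B (6.5 units), so the root falls 3.25 units from C on the C to n2 branch. Midpoint rooting assumes roughly clock-like rates; strongly unequal rates between lineages can misplace the root.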

Similar to the previous study by Esser et al. [32], we also rooted the tree by assuming the monophyly of the eukaryotic clade (Figure S5B). This rooting resulted in a monophyletic group of all Eubacteria and Archaea, which cluster in a sister group relationship to Eukaryotes, including plants, algae, and animals. However, the clustering of the bacterial and archaeal GOX-like proteins in this rooted tree is not congruent with the clustering of taxonomic groups that is observed when analyzing, for example, signature sequences in proteins [36]. In both trees, Archaea and Firmicutes cluster as sister groups. Alternatively, we placed the root of the tree between Firmicutes or Archaea and the other clusters of GOX-like proteins (Figure S5C,D). Both of these trees again showed a sister group topology of sequences from eukaryotes and cyanobacteria, as found before with the midpoint-rooted tree.

Overall, our phylogenetic analyses show that eukaryotic GOX-like proteins are more closely related to those of cyanobacteria than to those of any other prokaryotic group, including Proteobacteria and Archaea. As phylogenetic analysis is known to be biased when homologous proteins have evolved into enzymes with varied metabolic functions, we set out to perform a comprehensive biochemical evaluation of GOX-like proteins among Archaeplastida.

**Figure 2.** Phylogenetic tree of glycolate oxidase (GOX)-like proteins. The tree is based on GOX-like proteins from all groups in the tree of life and was inferred using Bayesian inference. The monophyletic group of GOX-like proteins from Eukaryotes forms the sister clade to cyanobacteria, pointing to a cyanobacterial origin of all eukaryotic GOX proteins, including those of heterotrophic organisms like animals. GOX-like proteins that have been biochemically verified are printed in bold (https://www.brenda-enzymes.org/index.php; EC 1.1.3.15; [37]). Proteins with a star were analyzed in this study. The numbers at the nodes show the posterior probability. The full species names and accession numbers are listed in Table S1. GOX, glycolate oxidase; LOX, lactate oxidase.

## *2.3. Archaeplastida Except Chlorophyta Possess a GOX Protein*

It is known that GOX-like proteins show varying (multi-)substrate preferences. For example, many of these enzymes preferentially oxidize l-lactate; that is, they represent LOX rather than GOX enzymes. Therefore, we analyzed the substrate preference of GOX-like proteins from all groups of Archaeplastida. To this end, we overexpressed selected cDNAs in *E. coli* to obtain recombinant proteins for biochemical characterization. In addition to the previously obtained data for GOX and LOX proteins from Cyanobacteria, Rhodophyta, Chlorophyta, and land plants [30,38], the His- or Strep-tagged proteins from the early-branching glaucophyte alga *Cyanophora paradoxa* and the streptophyte (sister clade of land plants) alga *Spirogyra pratensis* were biochemically characterized using substrate concentrations of glycolate and l-lactate ranging from 0.1 to 200 mM. We found that both recombinant proteins can oxidize glycolate and, to a lesser extent, l-lactate (Figure 3 and Table 1), which is consistent with earlier reports of clear GOX activity in crude extracts from these algae [24,39,40]. Compared to At-GOX2 from *Arabidopsis*, both enzymes show a higher affinity (at least 10 times) for glycolate than for l-lactate as a substrate. Although the Vmax values are only slightly higher for glycolate than for l-lactate, the catalytic efficiency (kcat/Km) is 20 to 33 times higher with glycolate as substrate, which is similar to At-GOX2 (Table 1 and Table S2). Based on this preference for glycolate as substrate, the enzymes were named Cp-GOX and Sp-GOX, respectively. Compared with other GOX proteins, Cp-GOX shows the lowest Km and the second highest kcat/Km value for glycolate (Table 1 and Table S2). This finding could point to an early evolution of glycolate oxidation activity in the ancestor of cyanobacteria. To verify this hypothesis, we modeled, synthesized, and characterized an ancestral GOX protein.
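The relation between similar Vmax values and a much higher catalytic efficiency can be made concrete with a small calculation. The sketch below uses hypothetical kinetic constants (not the values of Table 1 or Table S2), and the function names are illustrative assumptions; it only shows that a large difference in Km dominates the kcat/Km ratio when turnover is similar.

```python
def catalytic_efficiency(kcat, km):
    """Return kcat/Km (here in s^-1 mM^-1)."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def preference_ratio(kcat_a, km_a, kcat_b, km_b):
    """Fold preference for substrate A over substrate B as a ratio of kcat/Km."""
    return catalytic_efficiency(kcat_a, km_a) / catalytic_efficiency(kcat_b, km_b)


# Hypothetical values for illustration only (not measured data):
# slightly higher turnover with glycolate, but a 20-fold lower Km.
ratio = preference_ratio(kcat_a=12.0, km_a=0.25,   # glycolate
                         kcat_b=10.0, km_b=5.0)    # l-lactate
print(f"glycolate / l-lactate efficiency ratio: {ratio:.1f}")
```

With these assumed numbers the efficiencies are 48 and 2 s⁻¹ mM⁻¹, giving a 24-fold preference for glycolate even though kcat differs by only 20%.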

**Figure 3.** Biochemical characterization of GOX proteins from algae. The data were fitted to Michaelis–Menten kinetics by nonlinear regression using SigmaPlot 13.0. The resulting parameters for the maximal enzymatic reaction rate (Vmax) (**A**) and the substrate affinity expressed as Km (**B**) were calculated for the substrates glycolate and l-lactate. The standard deviation was calculated from at least two independent biological replicates. Cp-GOXc, *Cyanophora paradoxa*; Sp-GOX, *Spirogyra pratensis*.
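For readers who want to reproduce this kind of fit without SigmaPlot, the following is a minimal, dependency-free sketch of a least-squares fit to the Michaelis–Menten equation v = Vmax·[S]/(Km + [S]). It does not reproduce SigmaPlot's own fitting routine: it exploits the fact that, for a fixed Km, the best Vmax has a closed form, and then runs a one-dimensional golden-section search over log10(Km). The function names, search bounds, and synthetic data are assumptions for illustration.

```python
import math


def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def best_vmax(km, conc, rates):
    """For a fixed Km the model is linear in Vmax, so least squares is closed-form."""
    x = [s / (km + s) for s in conc]
    return sum(xi * vi for xi, vi in zip(x, rates)) / sum(xi * xi for xi in x)


def sse(km, conc, rates):
    """Sum of squared residuals at this Km, using its optimal Vmax."""
    vmax = best_vmax(km, conc, rates)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(conc, rates))


def fit_mm(conc, rates, km_lo=1e-3, km_hi=1e3, iters=100):
    """Golden-section search over log10(Km); returns (Vmax, Km)."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log10(km_lo), math.log10(km_hi)
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(10 ** c, conc, rates) < sse(10 ** d, conc, rates):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = 10 ** ((a + b) / 2)
    return best_vmax(km, conc, rates), km


# Synthetic, noise-free data over the 0.1-200 mM range used in the assays.
# The "true" parameters below are hypothetical, not measured values.
conc = [0.1, 0.5, 1.0, 5.0, 10.0, 50.0, 100.0, 200.0]
rates = [mm_rate(s, vmax=10.0, km=2.0) for s in conc]

vmax_fit, km_fit = fit_mm(conc, rates)
print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.3f} mM")
```

The golden-section search assumes the residual curve has a single minimum within the Km bounds, which is a reasonable expectation for well-sampled saturation curves but should be checked for noisy or sparse data.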

## *2.4. Ancestral GOX-Like Protein Sequence with Active Site Identical to Plant GOX*

In contrast to the proteins of Archaeplastida, except Chlorophyta, cyanobacteria (Table 1) and all other analyzed prokaryotes possess LOX, a GOX-like protein that prefers l-lactate over glycolate [27,30,41]. To estimate the substrate preference of an ancestral protein, we reconstructed a protein that could correspond to the hypothetical common ancestor of GOX-like proteins from Archaeplastida. The primary structure of this "ancestral" protein was derived from a reduced phylogenetic tree based on Bayesian inference, including 37 sequences from five different taxonomic groups (Figure 4). For the reduced dataset as well, the monophyly of cyanobacterial and eukaryotic GOX-like proteins is supported by a high posterior probability (BPP = 1). The ancestral protein used for the biochemical analysis, named N3-GOX, corresponds to the most probable amino acid sequence calculated via the ML algorithm. An amino acid sequence comparison of N3-GOX with At-GOX2 from *Arabidopsis* revealed that the three amino acid residues in the active site (Table 2), which were previously proven to determine the specificity for the substrate glycolate [30], were identical in these enzymes, pointing to a higher GOX rather than LOX activity of the ancestral protein. Comparing the entire sequence, 68% of the amino acid residues are identical in these two proteins. In contrast, the cyanobacterial No-LOX (from *Nostoc* sp. PCC7120 [30]) shares only 56% identical positions with N3-GOX, and shows different active-site amino acid residues (Table 3).
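Percent identity values such as those quoted above depend on how the alignment columns are counted. The sketch below shows one common convention, counting identical residues over columns in which neither sequence has a gap; the text does not state which convention was used, so this choice, the function name, and the toy sequences are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, not real GOX sequences).
seq_a = "MEITNVTEYW-AKQ"
seq_b = "MEVTNVMEYWLAKQ"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

Other denominators (full alignment length, or the length of the shorter sequence) give different percentages for the same alignment, so the convention should be reported alongside the value.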

**Table 1.** Summary of the Km and Vmax values of plant, algal, cyanobacterial, and ancestral GOX-like proteins. The table displays mean values and standard deviation of the Vmax and Km values with the substrates glycolate or l-lactate from measurements with at least three biological replicates. The enzymes are ordered as inferred from the phylogenetic tree displayed in Figure 2, that is, starting with the hypothetical ancestral form N3-GOX, followed by the cyanobacterial (No-LOX) and chlorophytic LOX (Cr-LOX) enzymes, and then the GOX proteins from Archaeplastida, namely: glaucophytic (Cp-GOXc), rhodophytic (Cm-GOX), and streptophytic (Sp-GOX) toward plant (At-GOX2) enzymes.


x [30]; y [38].

**Figure 4.** Phylogenetic tree of GOX-like proteins used for "ancestral" protein sequence reconstruction based on Bayesian inference. The numbers at nodes show the posterior probability. The full species names of all of the summarized groups can be found in Table S1. The amino acid sequence derived for Node 3 is the ancestral sequence of eukaryotic GOX-like proteins.
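Taking the "most probable" ancestral sequence at a node typically means choosing, at each alignment site, the residue with the highest posterior probability. The following minimal sketch illustrates that selection step only; it assumes a marginal, site-by-site reconstruction, and the function name and probability values are hypothetical rather than taken from the reconstruction of Node 3.

```python
def most_probable_sequence(site_posteriors):
    """Pick the residue with the highest posterior probability at each site.

    Returns the sequence and the per-site probability of the chosen residue.
    """
    residues, support = [], []
    for probs in site_posteriors:
        aa = max(probs, key=probs.get)
        residues.append(aa)
        support.append(probs[aa])
    return "".join(residues), support


# Hypothetical posteriors for three sites of one internal node.
node_posteriors = [
    {"A": 0.91, "S": 0.06, "G": 0.03},
    {"K": 0.99, "R": 0.01},
    {"L": 0.55, "I": 0.40, "V": 0.05},
]
ancestral_seq, site_support = most_probable_sequence(node_posteriors)
print(ancestral_seq, site_support)
```

Sites with low support, such as the third site in this toy example, are the ones where alternative residues are plausible and where the inferred protein is least certain.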

**Table 2.** Amino acid positions of GOX-like proteins responsible for l-lactate or glycolate specificity. The proteins from the cyanobacterium *Nostoc* and the green alga *Chlamydomonas* exhibit amino acids shown to determine l-lactate specificity, whereas the corresponding three amino acids of the ancestral (N3-GOX), glaucophytic (Cp-GOXc), rhodophytic (Cm-GOX), streptophytic (Sp-GOX), and plant (At-GOX2) proteins determine glycolate specificity [30]. M, methionine; L, leucine; F, phenylalanine; T, threonine; W, tryptophan; V, valine.


**Table 3.** Sequence similarities (in %) between the ancestral oxidase (N3-GOX), the plant glycolate oxidase (At-GOX2), and the cyanobacterial l-lactate oxidase (No-LOX).

