**4. Materials and Methods**

## *4.1. Phylogenetic Analysis*

The selected amino acid sequences of all GOX-like proteins were aligned using ProbCons [34], which was reported to yield the best alignment scores [62]. An alternative alignment was constructed using MUSCLE [35]. Both alignments, especially their non-ambiguous parts, were compared using SuiteMSA [63]. To ensure that incorrectly aligned positions were excluded from the phylogenetic analyses, we curated the alignment with Gblocks [64]. Nearly identical sequences (after Gblocks curation) and sequences with a different evolutionary rate (determined via Tajima's relative rate test as implemented in MEGA 5.1 [65]) were excluded from further analyses. To minimize HGT influences, we also excluded all sequences from single species that did not cluster within the clade of their taxonomic group. ProtTest 3 [66] was used to find the best-fitting substitution model for the given data set. The model of Le and Gascuel [67], including a proportion of invariant sites and a gamma model of rate heterogeneity (LG+I+G), was found to be the most appropriate model of evolution for the alignment. For the tree reconstruction, we used the Bayesian approach implemented in MrBayes [68], running 2,000,000 generations and discarding the first 25% of samples as burn-in. The likelihoods and tree topologies of the independent runs were analyzed and compared. In addition to the Bayesian approach, we performed a maximum likelihood analysis using RAxML version 8 [69], with statistical support assessed by 1000 bootstrap replicates.
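As a minimal illustration of the burn-in step, the following Python sketch drops the leading 25% of a list of sampled generations. The sampling frequency of 1000 generations and the helper name `discard_burnin` are assumptions made for this example only; the text does not state the sampling frequency, and MrBayes performs this step internally.

```python
def discard_burnin(samples, burnin_fraction=0.25):
    """Return the samples that remain after dropping the leading burn-in fraction."""
    if not 0.0 <= burnin_fraction < 1.0:
        raise ValueError("burnin_fraction must be in [0, 1)")
    n_burnin = int(len(samples) * burnin_fraction)  # number of samples to drop
    return samples[n_burnin:]


# Hypothetical chain: 2,000,000 generations, sampled every 1,000 generations
# (the sampling frequency is an assumption, not a value from the text).
sample_frequency = 1000
generations = list(range(0, 2_000_000 + 1, sample_frequency))

kept = discard_burnin(generations, burnin_fraction=0.25)
print(len(generations), len(kept), kept[0])
```

Only the retained samples would then contribute to the posterior summary of likelihoods and tree topologies.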

## *4.2. Ancestral Sequence Reconstruction*

For the reconstruction of the ancestral eukaryotic GOX protein sequence, we first reduced the number of sequences used for the phylogeny (shown in Figure 2) to 36 species situated around the nodes of the eukaryotic GOX-like proteins. The phylogenetic analysis of the reduced dataset was performed as described above and is shown in Figure 4. Because of the long evolutionary distance between the analyzed proteins, ancestral reconstruction based on amino acid inference was performed using CODEML included in PAML [70]. In addition, we used the FASTML program with both the marginal and the joint method of ancestral sequence reconstruction [71]. We also performed Maximum Likelihood and Maximum Parsimony indel reconstructions, which provided the same results. We focused on the last common ancestor of all eukaryotes (Node 3). The deduced amino acid sequence of the Node 3 ancestral GOX-like protein was back-translated into a DNA sequence applying the codon usage of *E. coli* (sequences are displayed in Figure S6). The codon-optimized gene for the ancestral GOX protein was synthesized (Thermo Fisher, Darmstadt, Germany). For cloning into the expression vector pASG-IBA43+, the restriction sites *Nhe*I and *Bam*HI were added at the beginning and the end of the ancestral gene, respectively.
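The back-translation and the addition of cloning sites can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used for gene synthesis: the codon table assigns one commonly preferred *E. coli* codon per amino acid and is not necessarily the table applied here, the peptide `toy_protein` is hypothetical (the actual Node 3 sequence is given in Figure S6), and the function names are invented for the example. The recognition sequences of *Nhe*I (GCTAGC) and *Bam*HI (GGATCC) are the standard ones.

```python
# Illustrative table: one commonly preferred E. coli codon per amino acid.
# This is an assumption for the sketch, not the table used by the authors.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}

NHEI_SITE = "GCTAGC"   # NheI recognition sequence
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def back_translate(protein):
    """Map each amino acid (one-letter code) to its preferred codon."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]!r}") from err


def add_cloning_sites(cds, five_prime=NHEI_SITE, three_prime=BAMHI_SITE):
    """Flank a coding sequence with restriction sites.

    Raises if either site already occurs inside the coding sequence,
    because the insert would then be cut during cloning.
    """
    for site in (five_prime, three_prime):
        if site in cds:
            raise ValueError(f"internal restriction site found: {site}")
    return five_prime + cds + three_prime


# Hypothetical peptide, NOT the reconstructed Node 3 sequence.
toy_protein = "MKTAYIAK"
cds = back_translate(toy_protein + "*")
construct = add_cloning_sites(cds)
print(construct)
```

The sketch does not model the reading frame relative to vector-encoded tags or any linker bases between the sites and the gene; those details depend on the vector and would need to be checked separately.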
