#### *2.3. Synteny and Paralogy Analyses of the st8sia2, st8sia4, st8sia3, and st8sia9 Gene Loci*

To explain the gain or loss of ST8Sia subfamilies, we further analyzed the evolutionary relationships between these *st8sia* genes. The type of duplication event was characterized by analyzing the conserved synteny between ST8Sia paralogues. Genes created by a WGD are expected to lie far apart on different chromosomes within one genome, but to be surrounded by similar genes in each of the duplicated regions (i.e., paralogons). Significant Tetrapod paralogons containing *st8sia2* and *st8sia4* genes were found, and a well-conserved synteny could be established for the *st8sia2* and *st8sia4* gene loci in Tetrapod (i.e., human, mouse, chicken, and xenopus) genomes (Figure 3A), as previously described [14]. In the fish genomes, however, as the *st8sia2* gene was absent in Esociformes and Siluriformes, we used the neighboring *furin*, *fes*, *sv2b*, *fam147b*, *mctp2*, and *chd2* genes around *st8sia2* on medaka chromosome 6 to retrieve the synteny on *Esox lucius* LG19 and on *Ictalurus punctatus* chromosome 4. Similarly, the *ppip5k2*, *pam*, *chd1*, *erap1a*, and *syk* genes conserved around the *st8sia4* locus were used to retrieve the synteny on *O. latipes* chromosome 12, *Gasterosteus aculeatus* chromosome XIV, and *Xiphophorus maculatus* chromosome 8. Interestingly, paralogues of these genes could be identified on other chromosomes in the various fish genomes, indicative of an ancient Teleost-specific WGD (TGD) followed by intense gene rearrangements. This further suggests that the *st8sia* genes underwent the TGD and that the duplicated *st8sia* genes were rapidly lost during Teleost evolution. In the Salmoniformes, a highly conserved synteny was found around the two *st8sia2-r* gene loci, corresponding to one ohnologous region in the spotted gar (*L. oculatus*), likely resulting from the fourth round of WGD (SGD) that took place more recently in the Salmoniformes genomes [51].
The two *st8sia4-r* genes were localized on two distinct chromosomes in the *C. carpio* genome, supporting the hypothesis of a more recent species-specific genome duplication event in *C. carpio* [52], in spite of weak synteny conservation (Figure 3A).
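The neighbor-based search for a locus whose *st8sia* gene has been lost can be illustrated with a minimal sketch. It is not the pipeline used here (orthologues were identified from NCBI and ENSEMBL and visualized with Genomicus); it only shows the idea of scoring a candidate region by how many reference flanking genes it contains. The reference list is the set of *st8sia2* neighbors named above; the candidate region, the placeholder names `geneX` and `geneY`, and both function names are hypothetical.

```python
# Flanking genes of st8sia2 on medaka chromosome 6, as listed in the text.
ST8SIA2_NEIGHBORS = ["furin", "fes", "sv2b", "fam147b", "mctp2", "chd2"]


def shared_neighbors(reference, candidate_region):
    """Return the reference flanking genes found in the candidate region,
    in reference order."""
    present = set(candidate_region)
    return [gene for gene in reference if gene in present]


def synteny_fraction(reference, candidate_region):
    """Fraction of reference flanking genes recovered in the candidate region."""
    return len(shared_neighbors(reference, candidate_region)) / len(reference)


# HYPOTHETICAL gene content of a candidate region in a species lacking st8sia2;
# gene order differs from the reference to mimic rearrangements.
toy_region = ["chd2", "geneX", "mctp2", "sv2b", "geneY", "fes"]

if __name__ == "__main__":
    print(shared_neighbors(ST8SIA2_NEIGHBORS, toy_region))
    print(round(synteny_fraction(ST8SIA2_NEIGHBORS, toy_region), 2))
```

A region that recovers most of the flanking genes but no *st8sia* gene is compatible with a gene loss at a conserved locus; a real analysis would also have to consider gene order, orientation, and orthology assignment.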

**Figure 3.** Syntenic relationships of the oligo- and poly-α2,8-sialyltransferases gene loci in vertebrates. Chromosomal locations of the *st8sia* genes and neighboring gene loci were determined in the human (*Homo sapiens*, Hsa), the mouse (*Mus musculus*, Mmu), the chicken (*Gallus gallus*, Gga), the spotted gar (*L. oculatus*, Locu), the western clawed frog (*Xenopus tropicalis*, Xtro), the zebrafish (*D. rerio*, Dre), the Japanese medaka (*O. latipes*, Ola), the channel catfish (*I. punctatus*, Ipu), the northern pike (*E. lucius*, Elu), the rainbow trout (*O. mykiss*, Omy), the Atlantic salmon (*Salmo salar*, Ssa), the African weakly electric fish (*Paramormyrops kingsleyae*, Pki), the three-spined stickleback (*G. aculeatus*, Gac), the southern platyfish (*X. maculatus*, Xma), and the European carp (*C. carpio*, Cca). Information from the National Center for Biotechnology Information (NCBI) and ENSEMBL release 97 was used to identify putative orthologues, which were visualized using Genomicus 97.01 [53]. Paralogous genes are indicated in green in the fish genomes and in purple in the human genome. The *st8sia* genes are indicated in red, or in grey when lost. (**A**) Syntenic relationships of the *st8sia2* and *st8sia4* gene loci in vertebrates. (**B**) Syntenic relationships of the *st8sia3* and *st8sia9* gene loci in vertebrates.

The synteny around the *st8sia3* gene locus, including the *wdr7*, *onecut2*, and *fech* genes, is highly conserved in vertebrate lineages from fish to mammals (Figure 3B). Synteny around the *st8sia9* locus is less conserved and is limited to a smaller syntenic block with the *ccng2* and *ppef2* genes, which is reminiscent of an ancient WGD followed by intrachromosomal rearrangement in the ancestral fish genome.

Altogether, our phylogenetic analyses enabled us to refine the evolutionary history of the fish ST8Sia and to propose a model of their evolution, illustrated in Figure 4, which agrees with the fish phylogenetic tree of life [54]. It is interesting to note that, while Braasch and Postlethwait (2012) determined duplicated gene retention rates of 12–24% after the TGD 320 MYA [55], we observed no remaining *st8sia* gene copy from this event and no modification of the fish ST8Sia repertoire. However, more recent polyploidization events were recorded in several families (Salmonidae, 80 MYA), genera (Anguilla), or species (*C. carpio*, 8 MYA), which impacted the overall poly-α2,8-sialyltransferase repertoire. In Salmonidae, we described only two remaining *st8sia2* duplicates after the salmonid-specific WGD (Ss4R, referred to above as SGD) among the eight ancestral *st8sia* genes (12% duplicate retention), while Lien et al. (2016) revealed a global retention rate of around 55% [56]. In the carp *C. carpio*, two *st8sia4* genes were retained as duplicates among the seven *st8sia* genes (14% duplicate retention), while Li et al. (2015) calculated a global value of 92% [57]. Furthermore, these studies highlighted the fact that the genes retained after tetraploidization were specifically involved in signal transduction, protein complex formation, and the immune system, which prompted us to focus on the functional divergence of these duplicated poly-α2,8-sialyltransferase genes (neofunctionalization) and on their expression divergence (subfunctionalization).
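The two *st8sia* retention percentages quoted above can be reproduced with a short calculation. This is a sketch under one assumption: retention is taken as the number of ancestral genes still present as a duplicate pair divided by the number of ancestral genes. The function name is illustrative only.

```python
def duplicate_retention(retained_pairs, ancestral_genes):
    """Percentage of ancestral genes still present as a duplicate pair.

    Assumed definition: retained_pairs / ancestral_genes * 100.
    """
    if ancestral_genes <= 0:
        raise ValueError("ancestral_genes must be positive")
    return 100.0 * retained_pairs / ancestral_genes


# Salmonidae: one duplicate pair (st8sia2-r1/-r2) among eight ancestral st8sia genes.
salmonidae = duplicate_retention(1, 8)
# C. carpio: one duplicate pair (st8sia4) among seven st8sia genes.
carp = duplicate_retention(1, 7)

if __name__ == "__main__":
    print(f"Salmonidae: {salmonidae:.1f}%")
    print(f"C. carpio:  {carp:.1f}%")
```

Under this definition the values are 12.5% and about 14.3%, matching the 12% and 14% given in the text and well below the genome-wide rates cited for the same events.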

**Figure 4.** Schematic representation of the ST8Sia family evolution in the ray-finned fishes. This model for the evolution of *st8sia* genes is based on the evidence from protein sequence phylogeny, conserved synteny of genomic *st8sia* loci in vertebrate species and their paralogous relationships in fish genomes. The model takes into account the evolution of five ancestral groups of ST8Sia (*st8sia6*/*7*/*8*, *st8sia2*/*4*, *st8sia1*, *st8sia5*, and *st8sia3*/*9*) indicated in green and present in the ancestor of Chordates that predate the WGD R1 and WGD R2. Open red circles depict gene losses on the phylogenetic tree and yellow stars correspond to the WGDs R1, R2, R3 (teleost specific duplication, TGD), and R4 (salmonids specific duplication, SGD).

#### *2.4. Molecular Evolution of the Poly-*α*2,8-Sialyltransferases*

A remarkable difference between α2,8-linked polySia chains found in mammals and salmonid fish seems to be the structural diversity of polySia in fish [58–60]. Whereas in mammals, homopolymers of Neu5Ac residues are typically formed [61], in rainbow trout eggs, polymers can consist of Neu5Ac, Neu5Gc, and KDN in addition to their O-acetylated forms [62]. One explanation could be a better accessibility to different sialic acids in fish, because, in transgenic mice—showing a Neu5Gc overexpression in brain—besides Neu5Ac, Neu5Gc also seems to be utilized to build polySia [63].

Another explanation might be the occurrence of structural changes of the protein backbone during the evolution of the polysialyltransferases. We thus investigated the potential consequences of lineage-specific *st8sia* gene loss and duplication on the functional fate of duplicates, an issue that is still poorly understood [64,65]. Substitution rate analysis of the duplicated *st8sia2* genes maintained in Salmoniformes genomes after the SGD event indicated four amino acid substitutions in the ST8Sia II-r2 coding sequences compared with ST8Sia II-r1 and the rest of the Teleost ST8Sia II sequences, while there were only two substitutions in the ST8Sia II-r1 sequence. Of particular note, among the four substitutions found in ST8Sia II-r2, the H → Y substitution is located in sialylmotif L and the R → Q substitution between sialylmotifs S and III, whereas the two substitutions in the ST8Sia II-r1 sequence are located near the PSTD motif (Figure 5A). In addition, two convergent substitutions leading to the same amino acid in both duplicates were identified near the end of sialylmotif L (i.e., acquisition of a G from a Q) and beyond sialylmotif III (i.e., acquisition of an H from an S), respectively. These drastic modifications in amino acid properties at functionally important locations in the catalytic domain of these salmonid ST8Sia II lead us to suggest profound changes in the functions of both ST8Sia II (i.e., neofunctionalization). Likewise, we examined the impact of *st8sia4* loss on the remaining *st8sia2* gene in Neoteleostei using parsimony analysis. We found two substitutions, A → S and Q → S, located in sialylmotif L and between sialylmotifs III and VS of Neoteleostei ST8Sia II, respectively (Figure 5B). Interestingly, we also found a convergent T → K substitution located between sialylmotifs III and VS of Neoteleostei ST8Sia II that restores the K amino acid characteristic of all the ST8Sia IV sequences (Figure 5B), further suggesting changes in ST8Sia II functions in Neoteleostei.
No substitution could be detected in ST8Sia IV sequences after the loss of *st8sia2* gene in Esociformes and Osmeriformes. Finally, we recorded the substitutions on the ancestral sequence of ST8Sia III after ST8Sia IX loss in Otocephala. We observed three substitutions in ST8Sia III sequence: V → T near the sialylmotif L, A → T in the sialylmotif VS, and Y → F beyond (Figure 5C).
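The bookkeeping behind such substitution lists can be sketched as follows. This is a simplified illustration, not the parsimony procedure used for Figure 5: it assumes an already inferred ancestral sequence and pre-aligned derived sequences of equal length, and it treats a substitution as convergent when both duplicates show the same change at the same alignment column. The six-residue sequences are hypothetical toy data chosen to mimic the kinds of changes described (H → Y, R → Q, and shared Q → G and S → H); they are not taken from the real alignment.

```python
def call_substitutions(ancestral, derived):
    """List (position, ancestral_residue, derived_residue) for each differing
    column of two aligned sequences; positions are 1-based, gaps are skipped."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    subs = []
    for pos, (a, d) in enumerate(zip(ancestral, derived), start=1):
        if a != d and a != "-" and d != "-":
            subs.append((pos, a, d))
    return subs


def convergent(subs_a, subs_b):
    """Substitutions shared by both lineages (same column, same change)."""
    return sorted(set(subs_a) & set(subs_b))


def label(sub):
    """Format a substitution in the 'ancestral > derived' style of Figure 5."""
    pos, a, d = sub
    return f"{a}{pos} > {d}{pos}"


# HYPOTHETICAL toy alignment (not real ST8Sia II sequences).
ancestor = "MHQRSA"
copy_r1 = "MHGRHA"
copy_r2 = "MYGQHA"

if __name__ == "__main__":
    r1 = call_substitutions(ancestor, copy_r1)
    r2 = call_substitutions(ancestor, copy_r2)
    print("r1:", [label(s) for s in r1])
    print("r2:", [label(s) for s in r2])
    print("convergent:", [label(s) for s in convergent(r1, r2)])
```

In a real analysis the ancestral states would come from a reconstruction over the phylogeny, and each position would be mapped back to the sialylmotifs to judge its likely functional relevance.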

The most striking domain of both polysialyltransferases (ST8Sia II and ST8Sia IV) is PSTD, which is essential for the polysialylation of NCAM [31,66]. This motif contains a high number of basic amino acids and is important for substrate binding and catalytic activity. Troy and co-workers exchanged several of these amino acids to determine their individual impact on the enzymatic activity of human ST8Sia IV [31]. Double-substitution mutants in which the first basic residues (designated K2 and K4 in Figure 6) were replaced by neutral amino acids retained approximately 80% of the enzyme activity, and comparable values were determined when only K6 was replaced. Stronger effects were observed in single-substitution mutants in which R8, H18, K28, K32, or R33 was replaced by a neutral amino acid. All these changes reduced activity by more than 50%. Their experiments demonstrated that, in addition to the neutral amino acid I31 (mutants retained only 6% of their activity), the basic amino acids of PSTD in particular were key elements for polysialylation. Most of these important amino acids of human ST8Sia IV are also highly conserved in the fish enzymes. Changes occurred sporadically at K2, K4, K6, and R8 in individual fish species (Figure 6). On the basis of the work of Troy and co-workers [31], the R8 change may have the highest impact on the general enzyme activity, as a replacement of this amino acid reduced the activity to less than 25%. However, we observed an exchange of R8 in only three fish species, including *I. punctatus*. Nevertheless, as mentioned above, other substituted amino acids may also influence the interaction with the nascent sialic acid chain, depending on the composition (Neu5Ac, Neu5Gc, KDN, and O-acetylated variations) of the polySia chain.

**Figure 5.** Substitution rate analysis of the impact of *st8sia* gene duplications and losses. The sialylmotifs are indicated by red boxes and the transmembrane domain by a grey box. (**A**) Duplication of *st8sia2* genes in Salmoniformes. The substitutions observed in ST8Sia II-r1 and ST8Sia II-r2 are indicated by an arrow above and below, respectively. The position of the substitutions corresponds to the alignment in Supplemental Data 2. The black rectangles correspond to convergent mutations retrieved in both sequences. In T > K, for example, T is the ancestral state and K is the derived one. (**B**) Impact of *st8sia4* gene loss in Neoteleostei on the remaining fish ST8Sia II sequences. The code for substitution is the same as in A. The corresponding amino acid present in the paralogue ST8Sia IV sequence is given below. (**C**) Impact of *st8sia9* gene loss in Otocephala on the remaining fish ST8Sia III sequences (same abbreviations as in B).

More consistent variations were observed when ST8Sia II sequences were compared. In addition to the mentioned K2 and H4 (K instead of H in ST8Sia IV), an exchange of a basic amino acid occurs more frequently and is often highly conserved within one family. For instance, in Salmoniformes, the lysine residues at positions 2 and 28 are replaced by apolar amino acids, and the strongly basic R8 residue is replaced by histidine, which is only partly positively charged at neutral pH. On the basis of the studies of Nakata et al. using human ST8Sia IV, we can also assume remarkable changes in the enzymatic activity of ST8Sia II [31]. For instance, ST8Sia IV mutants with a neutral amino acid at position K28 retained less than 25% of their enzymatic activity. This is in line with studies by Kitajima and co-workers demonstrating that rainbow trout ST8Sia II isoforms show only low enzymatic activity in vitro [33]. Intriguingly, in Neoteleostei, the very important lysine at position 28 was also replaced by a neutral amino acid. Notably, in contrast to Salmoniformes, ST8Sia II is the only polysialyltransferase in Neoteleostei because ST8Sia IV is absent. The presence of only one polysialyltransferase in Neoteleostei, which additionally carries such a striking mutation, suggests that polysialylation changed significantly in Neoteleostei in comparison with other vertebrates.
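The statement that histidine is only partly positively charged at neutral pH, unlike arginine, follows from the Henderson-Hasselbalch relation. The sketch below estimates the protonated (charged) fraction of each side chain at pH 7.0. The pKa values are typical textbook values for free side chains and are an assumption here; the actual pKa of a residue within PSTD depends on its local environment.

```python
def protonated_fraction(pka, ph):
    """Fraction of a basic group carrying a proton (positive charge) at a given
    pH, from the Henderson-Hasselbalch relation: 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# ASSUMED textbook side-chain pKa values (not measured for PSTD residues).
HIS_PKA = 6.0
ARG_PKA = 12.5

if __name__ == "__main__":
    ph = 7.0
    print(f"His charged fraction at pH {ph}: {protonated_fraction(HIS_PKA, ph):.3f}")
    print(f"Arg charged fraction at pH {ph}: {protonated_fraction(ARG_PKA, ph):.6f}")
```

With these assumed values, roughly one histidine side chain in eleven is protonated at pH 7, whereas arginine is essentially fully charged, so an R8 → H8 exchange removes most of the positive charge at that position.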

In addition to sequence alignments, we simulated the PSTD 3D structure of fish ST8Sia II and ST8Sia IV, based on the determined 3D structure of human ST8Sia IV PSTD (PDB 6AHZ) (Figure 7), which was published by Peng and colleagues [66]. Volkers et al. described that PSTD acts as a basic furrow, leading the nascent sialic acid chain to the active site of the polysialyltransferase [32]. Comparison of the simulated fish structures with the human ST8Sia IV PSTD shows that significant differences between the electrostatic potential surfaces are detectable only in the N-terminal region. In particular, the orientation of the basic areas changed between the species. In contrast, the central and C-terminal areas exhibited only minor changes. In the case of ST8Sia II, the most prominent alterations also occurred in the N-terminal region (Figure 8). In addition, through the exchange of N6 for aspartate, an exposed acidic segment is formed in Salmoniformes and Neoteleostei, which may influence the interaction between PSTD and the negatively charged sialic acid polymers. It has to be noted, however, that these 3D models are predictions, and that crystal structures of PSTD and of the whole enzymes are necessary for the generation of unambiguous 3D models.


**Figure 6.** Sequence-based analysis of the polysialyltransferase domain (PSTD) in fish ST8Sia II and ST8Sia IV. Multiple sequence alignments of PSTD were performed with CLUSTAL OMEGA of EMBL-EBI by MUSCLE (3.8), and edited and annotated in the Java alignment editor Jalview [67]. The protein entries used from the different species are listed in Supplemental Table S1. The colors of the Clustal X scheme indicate the following characteristics: hydrophobic (blue), positive charge (red), negative charge (magenta), polar (green), cysteine (pink), glycine (orange), proline (yellow), aromatic (cyan), and gap (white). It should be noted that one additional amino acid was added to the N-terminus and two additional amino acids to the C-terminus of PSTD.

**Figure 7.** Three-dimensional (3D) structure of PSTD motifs in fish ST8Sia IV. The 3D model of human ST8Sia IV PSTD (Protein Data Bank entry 6AHZ), shown as electrostatic potential surfaces, is displayed in addition to the simulated structure of PSTD from *I. punctatus* and *C. maraena* using YASARA. The exchanged amino acids are colored in an additional version of the 3D structure to highlight the position of the exchange: N3 → R3 (orange), K6 → P6 (magenta), R8 → H8 (green), T9 → M9 (green), I17 → V17 (violet), and P30 → N30 (grey) for *I. punctatus* and L1 → V1 (yellow), N3 → R3 (orange), K6 → R6 (magenta), V29 → I29, and P30 → N30 (grey) for *C. maraena*. It should be noted that, for the determination of the 3D structure of human ST8Sia IV PSTD, a peptide was used with one additional amino acid on the N-terminus and two additional amino acids on the C-terminus of PSTD [66].

**Figure 8.** Three-dimensional (3D) structure of PSTD motifs in fish ST8Sia II. The 3D model of human ST8Sia II PSTD, in addition to PSTD from *P. fluviatilis* and *C. maraena*, was simulated based on the 3D model of human ST8Sia IV PSTD (Protein Data Bank entry 6AHZ) using YASARA. The electrostatic potential surfaces are displayed. The exchanged amino acids are colored in an additional version of the 3D structure to highlight the position of the exchange: K2 → L2 (yellow), H4 → T4 (orange), N6 → D6 (magenta), Y11 → F11 (green), and K28 → N28 (red) for *P. fluviatilis* and K2 → L2 (yellow), H4 → T4 (orange), N6 → D6 (magenta), R8 → H8 (violet), Y11 → F11 (green), K28 → N28 (red), and H30 → Q30 (light blue) for *C. maraena*. For the determination of the 3D structure of human ST8Sia IV PSTD, a peptide with one additional amino acid on the N-terminus and two additional amino acids on the C-terminus of PSTD was used [66].

Taken together, our sequence alignments and 3D simulations demonstrate that, in fish, characteristic alterations of the amino acid sequences occurred within PSTD and that several of the replaced amino acids are important for the enzymatic activity in the case of human ST8Sia IV, as demonstrated by Troy and co-workers [31]. These variations might also influence the ability of PSTD to interact with sialic acid chains consisting of sialic acids other than Neu5Ac, such as Neu5Gc and KDN, as well as their O-acetylated forms. However, to definitively prove this hypothesis of neofunctionalization of fish polysialyltransferases, their enzymatic activity has to be characterized in more detail.

#### *2.5. Expression of Poly-*α*2,8-Sialyltransferase Genes in C. maraena Tissues*

Having characterized the chromosomal localization, evolutionary history, and structure of the poly-α2,8-sialyltransferases ST8Sia II and ST8Sia IV encoded by the *st8sia2* and *st8sia4* genes, respectively, we finally profiled their expression in ten organs and tissues of *C. maraena* as a representative of the Salmoniformes (Figure 9A,B). As *st8sia2* is duplicated in salmonid fishes, we investigated whether the expression of both genes is tissue-specific, and thus possibly function-specific. To this end, discriminating primer pairs for *st8sia2-r1* and *st8sia2-r2* as well as for *st8sia4* transcripts were designed. The RT-qPCR analysis revealed that *st8sia2-r1* transcripts were at low levels in liver, heart, spleen, head kidney, gills, hypothalamus, and hind brain (<300 copies/ng RNA), and almost absent in muscle (<10 copies/ng RNA) (Figure 9A). In stark contrast, the copy numbers of *st8sia2-r1* were at a high level in gonads (~1700 copies/ng RNA) and telencephalon (~2140 copies/ng RNA) (Figure 9B). The transcript levels of the gene copy *st8sia2-r2* were generally higher compared with its paralogue, ranging from a 1.5-fold difference in gonads to a 233-fold difference in spleen (Figure 9A). While the expression of *st8sia2-r2* was not detectable in hind brain and telencephalon, it exceeded the expression level of *st8sia2-r1* by 4622-fold in the hypothalamus.

**Figure 9.** Expression profiling of poly-α2,8-sialyltransferase-encoding genes in maraena whitefish. (**A**) Transcript levels of *st8sia2-r1* (black bars), *st8sia2-r2* (gray), and *st8sia4* (blank) were determined in ten tissues from maraena whitefish (*n* = 4), as indicated on the abscissa. Bars represent the averaged copy numbers normalized against three reference genes; error bars represent the standard deviation. (**B**) A heat map represents the same copy numbers per target gene as shown in (**A**) relative to the expression in gonads (set as 1.0). These relative expression values are colored according to the code given at the right. Non-detectable transcript numbers are indicated by gray fields.
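The relative values underlying a heat map such as panel B can be derived from the absolute copy numbers by dividing each tissue by the gonad value. The sketch below uses the two approximate *st8sia2-r1* copy numbers reported in the text (gonads and telencephalon); the function name and the use of `None` for non-detectable transcripts are illustrative assumptions, and the prior normalization against reference genes is not reproduced here.

```python
def relative_to_reference(copies_by_tissue, reference="gonads"):
    """Express copy numbers relative to a reference tissue (reference = 1.0).

    Non-detectable transcripts are passed as None and stay None.
    """
    ref = copies_by_tissue.get(reference)
    if ref is None or ref <= 0:
        raise ValueError("reference tissue needs a positive copy number")
    return {
        tissue: (None if copies is None else copies / ref)
        for tissue, copies in copies_by_tissue.items()
    }


# Approximate st8sia2-r1 copy numbers (copies/ng RNA) as reported in the text.
st8sia2_r1 = {"gonads": 1700.0, "telencephalon": 2140.0}

if __name__ == "__main__":
    for tissue, value in relative_to_reference(st8sia2_r1).items():
        print(f"{tissue}: {value:.2f}")
```

The same division, applied between paralogues within one tissue rather than between tissues, gives the fold differences quoted for *st8sia2-r2* versus *st8sia2-r1*.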

The expression level of *st8sia4* was similarly low or even significantly lower compared with that of *st8sia2-r1*, with the highest copy numbers in spleen (~330 copies/ng RNA). No or only very few *st8sia4* transcripts were detectable in liver, muscle, and heart (Figure 9B). These results differ partially from the mRNA levels determined in rainbow trout using Northern blot analysis and semi-quantitative PCR [33]. For instance, rainbow trout spleen samples were negative for *st8sia2* transcripts, which might result not only from differences in the applied methods, but also from general differences between these two Salmoniformes.

Taken together, profiling the expression of the poly-α2,8-sialyltransferase genes revealed a tissue-specific expression pattern of *st8sia2* genes in *C. maraena* tissues, indicative of their subfunctionalization. Probably one of the most striking differences between the expression profiles in maraena whitefish and humans is the presence of *st8sia2* and *st8sia4* transcripts in the reproductive tract. Whereas in humans only a weak signal for *st8sia2* mRNA and no signal for *st8sia4* mRNA could be detected by Northern blotting [68], in *C. maraena*, the gonads belong to the tissues with the highest expression levels of polysialyltransferases. This was already described by Kitajima and colleagues using rainbow trout ovaries [33]. Besides the gonads, remarkable differences were also observed in spleen. Contrary to humans, where no *st8sia2* mRNA was detectable [68], *st8sia2-r2* expression was extremely high in the spleen of *C. maraena*, indicating that ST8Sia II-r2 might play a role during immunologic reactions in maraena whitefish. Altogether, these results lead us to suggest that, in addition to the number of active polysialyltransferases and their enzymatic activity, the physiological roles of these polysialyltransferases may have changed during the evolution of vertebrates.
