*Article* **Vertebrate Alpha2,8-Sialyltransferases (ST8Sia): A Teleost Perspective**

**Marzia Tindara Venuto 1, Mathieu Decloquement 2, Joan Martorell Ribera 3, Maxence Noel 2, Alexander Rebl 3, Virginie Cogez 2, Daniel Petit 4, Sebastian Peter Galuska 1 and Anne Harduin-Lepers 2,\***


Received: 29 November 2019; Accepted: 10 January 2020; Published: 14 January 2020

**Abstract:** We identified and analyzed α2,8-sialyltransferase sequences among 71 ray-finned fish species to provide the first comprehensive view of the Teleost ST8Sia repertoire. This repertoire expanded over the course of Vertebrate evolution and was primarily shaped by the whole genome duplication events R1 and R2, but not by the Teleost-specific R3. We showed that duplicated *st8sia* genes like *st8sia7*, *st8sia8*, and *st8sia9* have disappeared from Tetrapods, whereas their orthologues were maintained in Teleosts. Furthermore, several fish species-specific genome duplications account for the presence of multiple poly-α2,8-sialyltransferases in the Salmonidae (ST8Sia II-r1 and ST8Sia II-r2) and in *Cyprinus carpio* (ST8Sia IV-r1 and ST8Sia IV-r2). Paralogy and synteny analyses provided more relevant and solid information that enabled us to reconstruct the evolutionary history of *st8sia* genes in fish genomes. Our data also indicated that, while the mammalian ST8Sia family comprises six subfamilies forming di-, oligo-, or polymers of α2,8-linked sialic acids, the fish ST8Sia family, amounting to a total of 10 genes, appears to be much more diverse and shows a patchy distribution among fish species. A focus on Salmonidae showed that (i) the two copies of the *st8sia2* gene have overall contrasting tissue-specific expression patterns, with noticeable changes when compared with the human co-orthologue, and that (ii) *st8sia4* is weakly expressed. Multiple sequence alignments enabled us to detect changes in the conserved polysialyltransferase domain (PSTD) of the fish sequences that could account for variable enzymatic activities. These data provide the basis for further functional studies using recombinant enzymes.

**Keywords:** molecular phylogeny; α2,8-sialyltransferases; polySia motifs; evolution; ST8Sia; functional genomics

### **1. Introduction**

Glycoproteins and glycolipids can be modified with numerous different glycans during their transit to the cell surface. Here, these glycoconjugates form a dense meshwork, the glycocalyx, influencing several essential processes, such as adhesion and migration mechanisms in addition to cell signaling. Intriguingly, all living cells are surrounded by such a sugar coat, which demonstrates the importance of glycans for all living organisms [1]. However, glycoconjugates are found not only on cellular membranes, but also on released extracellular vesicles and as soluble glycoconjugates; likewise, various physiological and pathological processes can be targeted by their released forms. Several different monosaccharides are utilized for the formation of glycans. Nevertheless, the family of sialic acids holds a very special position among the building blocks of glycans [2,3]. These α-keto acids consist of a nine-carbon backbone with a carboxylic acid group at C1 and a ketone group at C2 [4]. Remarkably, more than 50 derivatives are known in nature. Besides N-acetylneuraminic acid (Neu5Ac), N-glycolylneuraminic acid (Neu5Gc) is the most common sialic acid, and the hydroxyl groups of both can be additionally substituted, for example, by acetylation. The same applies to a further common sialic acid, which is mainly used in lower vertebrates, deaminated neuraminic acid (KDN, 2-keto-3-deoxy-D-glycero-D-galacto-nononic acid) [5]. All three of these sialic acids are frequently added by α2,3- and α2,6-sialyltransferases (ST3Gal, ST6Gal, and ST6GalNAc) to nascent glycans. However, in contrast to other commonly utilized monosaccharides of glycans, an attached sialic acid residue can only be extended by another sialic acid residue, which explains the outermost position of sialic acids on sialylated glycans. The elongation at position C8 of α2,3- or α2,6-linked sialic acid residues is catalyzed by sialyltransferases belonging to the group of α2,8-sialyltransferases (ST8Sia), and long polymers of sialic acids can be enzymatically synthesized in this way [6–8].

All these animal sialyltransferases (α2,3-, α2,6-, and α2,8-sialyltransferases) belong to the CAZy glycosyltransferase family GT29, which indicates their common modular organization (GT-A-like fold) and their common ancestral origin [8,9]. These protein sequences are characterized by the presence of four consensus motifs called sialylmotifs (L (Large), S (Small), III, and VS (Very Small)) involved in 3D structure maintenance, substrate binding, and catalysis [10,11]. The sialylmotifs are very useful for in silico identification of sialyltransferase-related sequences [12]. On the basis of their sugar acceptor specificity and the glycosidic linkage formed, GT29 is subdivided into four families in vertebrates, ST3Gal, ST6Gal, ST6GalNAc, and ST8Sia [7,13], each of which is characterized by family motifs likely involved in linkage specificity [14–16]. The biosynthesis of α2,8-sialylated molecules is an ancient pathway achieved by the ST8Sia, a group of enzymes that emerged in the first eukaryotes [8] and expanded very early in animal evolution [14]. Up to now, the ST8Sia enzymes have been studied and characterized in mammalian tissues and primarily in the adult brain. The human and mouse genomes show six ST8Sia subfamilies: ST8Sia I, ST8Sia V, and ST8Sia VI are mono-α2,8-sialyltransferases and constitute a first group of ST8Sia enzymes involved in di-sialylation of glycoconjugates, while ST8Sia III in addition to ST8Sia II and ST8Sia IV form a second group of oligo- and poly-α2,8-sialyltransferases implicated in the polysialylation of glycoproteins [15].

Interestingly, our recent studies pointed to the fact that the *st8sia* gene family appears to be much larger in teleost fish genomes [14,17]. The emergence of several novel vertebrate mono-α2,8-sialyltransferase subfamilies like ST8Sia VII and ST8Sia VIII was described in this first group of ST8Sia, and their enzymatic specificities remain to be determined. These mono-α2,8-sialyltransferase genes arose as a consequence of whole genome duplications (WGDs, R1 and R2) at the base of vertebrates and were maintained in fish, whereas some others such as *st8sia6*, maintained in Tetrapods, have disappeared in fish [17,18]. In the second ST8Sia group, the enzymes responsible for the biosynthesis of sialic acid polymers, the poly-α2,8-sialyltransferases ST8Sia II and ST8Sia IV and the oligo-α2,8-sialyltransferase ST8Sia III, have been cloned and characterized from mammalian tissues, essentially the brain, where they act on α2,3-sialylated N-glycans of the neural cell adhesion molecule (NCAM), leading to an increased neuronal plasticity in embryos [19–26]. From a structural point of view, the poly-α2,8-sialyltransferases share a high degree of similarity in their sequence and structure [27–29] and are characterized by two additional sequence motifs, termed the polysialyltransferase domain (PSTD), of 32 amino acids located upstream of the sialylmotif S, and the polybasic region (PBR), of 35 amino acids located in the stem region of the enzymes involved in protein-specific polysialylation [30,31]. The oligo-α2,8-sialyltransferases ST8Sia III also show additional broadly conserved motifs with respect to ST8Sia II and ST8Sia IV (motifs III-1 and III-2) [14] with potential implication in the oligosialylation activity [32]. Their fish orthologues have been identified, cloned, and characterized in zebrafish (*Danio rerio*) in addition to rainbow trout (*Oncorhynchus mykiss*) [18,33,34].

Our previous phylogenetic studies also identified novel α2,8-sialyltransferase-related sequences like the ST8Sia III-related (ST8Sia III-r) sequences found in a few fish orders like Perciformes, Tetraodontiformes, and Beloniformes, whereas the ST8Sia IV disappeared from the Neognathi fish [14]. It has long been appreciated that gene, segmental, and genome duplications, as well as gene loss events, have played an important role in evolution, providing new genetic material, which may facilitate new adaptations for the organism [35,36].

In this study, we used a BLAST strategy to identify over 700 ST8Sia-related sequences from ray-finned fish genomes and performed phylogenetic analyses and sequence alignments to reevaluate their evolutionary relationships and fate, focusing on those responsible for polysialic acid (polySia) biosynthesis, with implications for the evolution of the nervous system, the immune system, and cell–cell interactions. Our findings point to a particular distribution of ST8Sia in fish, revealing novel *st8sia* gene members and further suggesting their functional divergence in vertebrates.

### **2. Results and Discussion**

#### *2.1. In Silico Identification and Phylogenetic Reconstruction of ST8Sia Sequences*

To investigate *st8sia* genes' expansion and distribution in vertebrates, we performed public database screenings in the National Center for Biotechnology Information (NCBI), ENSEMBL, and Phylofish databases [37] using a BLAST strategy [38]. The obtained results led to the identification of more than 700 ST8Sia-related sequences (Supplemental Data 1) in chordate genomes, including 71 ray-finned fish genomes (68 Teleost genomes). Putative ST8Sia sequences with significant similarity to the known human ST8Sia, based on the presence of the sialylmotifs L, S, III, and VS found in all GT29 sialyltransferases and of family motifs characteristic of the ST8Sia family, were selected, and multiple sequence alignments were performed to select the complete open reading frame. The orthologues of ST8Sia I and ST8Sia V involved in ganglioside biosynthesis were identified in all the investigated genomes, suggesting a high conservation of the ganglioside biosynthetic pathways in vertebrates (Supplemental Table S1). Similarly, the ST8Sia III and the recently described fish ST8Sia VIII [17] could be found in all the Actinopterygii (ray-finned fishes) genomes (Figure 1; Supplemental Table S1). Intriguingly, multiple copies of *st8sia*-related gene sequences were identified in Teleost genomes and their number varied considerably from one fish order or species to another. For example, there are 6 *st8sia*-related genes in the medaka (*Oryzias latipes*), 8 in the clownfish (*Amphiprion ocellaris*) and the common carp (*C. carpio*), and up to 10 in the rainbow trout. Indeed, multiple copies of ST8Sia VIII (>3) were found in Perciformes, Cichliformes, and Cyprinodontiformes; in addition, two copies of ST8Sia VII were found in Cypriniformes, two copies of ST8Sia II in Salmoniformes, and two copies of ST8Sia IV in the Cypriniforme *C. carpio* (Supplemental Table S1). Conversely, some other *st8sia* genes could not be found, like ST8Sia VI in Teleosts and Chondrostei [17]; ST8Sia IV in Neoteleostei genomes [14]; or ST8Sia II in Esociformes, Siluriformes, or Gymnotiformes (except *Electrophorus electricus*) genomes (Table 1). This resulted in a particular distribution of ST8Sia observed in the Actinopterygii compared with the Sarcopterygii (lobe-finned fishes and Tetrapods) and Chondrichthyes (sharks) (Figure 1; Supplemental Table S1), which might have facilitated the acquisition of evolutionary innovations during vertebrate evolution [35]. These observations prompted us to re-examine the genetic events that have shaped α2,8-sialylation in Teleosts.

**Figure 1.** A schematic phylogenetic tree of vertebrate evolution. A simplified phylogenetic tree depicting the evolution of the jawed vertebrates (Gnathostomes) after the two rounds of whole genome duplication (WGD, R1 and R2). It is hypothesized here that WGD-R2 occurred after the Gnathostome-Agnathan (jawless vertebrates) split. The Gnathostome branch is divided into two categories: the cartilaginous fish Chondrichthyes (sharks and rays) and the bony fish Osteichthyes. The Osteichthyes are split into the lobe-finned fish Sarcopterygii that contain Tetrapods, and the ray-finned fish Actinopterygii that contain Neopterygii (Chondrostei, Holostei, and Teleosts).


**Table 1.** Fish orders that have lost *st8sia* genes.

To determine whether the expansion of *st8sia* genes observed in Actinopterygii could be associated with WGD or smaller-scale duplication events, we took advantage of the improved genome sequencing of several critical species, for basal Vertebrates such as Agnathans (Lampreys and Hagfish) and for Actinopterygii such as Chondrostei (Sturgeons) and Holostei (Gars and Bowfin) (Figure 1). A simplified dataset was constructed including sequences of Agnathans (*Lethenteron camtschaticum*, *Petromyzon marinus*, *Eptatretus burgeri*), Chondrichthyans (*Callorhinchus milii*, *Squalus acanthias*, and *Heterodontus zebra*), basal Actinopterygians (*Acipenser sinensis*, *Amia calva*, and *Lepisosteus oculatus*) and basal Teleosteans such as the Elopomorphs *Anguilla anguilla* and *Mastacembelus armatus*, in addition to two Teleosts, the Beloniforme *O. latipes* (medaka) and the Characiforme *Astyanax mexicanus* (cave fish). The potential orthology of the selected sequences was assessed through the construction of phylogenetic trees (Figure 2). The topology of these trees indicated two major phylogenetic groups, mono-α2,8-sialyltransferases on the one hand and oligo- and poly-α2,8-sialyltransferases on the other, as previously described [7,14].

**Figure 2.** Minimum evolution phylogenetic tree of 89 chordate ST8Sia sequences. The evolutionary history of 89 ST8Sia (see names and sequences in Supplemental Data 1) was inferred using the minimum evolution (ME) method. The optimal tree, drawn to scale, with the sum of branch length = 16.02931149 is shown. The evolutionary distances were computed using the JTT (Jones-Taylor-Thornton) matrix-based method and the rate variation among sites was modeled with a gamma distribution (shape parameter = 5). The ME tree was searched using the close-neighbor-interchange (CNI) algorithm at a search level of 1. The neighbor-joining algorithm [39] was used to generate the initial tree. The analysis involved 89 amino acid sequences and all positions with less than 95% site coverage were eliminated. A total of 226 positions were in the final dataset (see multiple sequence alignments in Supplemental Data 2). Evolutionary analyses were conducted in MEGA7.0 [40]. The nine Vertebrate subfamilies of ST8Sia (ST8Sia I to ST8Sia IX) are indicated by various colors.
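The site-coverage cutoff mentioned in the Figure 2 legend can be illustrated with a short Python sketch. This is not the MEGA implementation: the function name `filter_by_site_coverage`, the toy alignment, and the choice to treat only `-` and `?` as missing data are assumptions made for illustration. A column is kept when at least 95% of the sequences carry a residue at that position.

```python
from typing import List


def filter_by_site_coverage(
    alignment: List[str],
    min_coverage: float = 0.95,
    missing: str = "-?",  # assumption: only gaps and '?' count as missing data
) -> List[str]:
    """Return the alignment restricted to columns with enough site coverage.

    Coverage of a column = fraction of sequences that have a residue
    (anything not listed in `missing`) at that position.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    kept_columns = [
        col
        for col in range(length)
        if sum(seq[col] not in missing for seq in alignment) / n_seqs >= min_coverage
    ]
    return ["".join(seq[col] for col in kept_columns) for seq in alignment]


# Toy alignment (hypothetical, 4 sequences x 4 columns).
toy = ["MK-L", "MKAL", "M-AL", "MKAL"]
print(filter_by_site_coverage(toy))        # ['ML', 'ML', 'ML', 'ML']
print(filter_by_site_coverage(toy, 0.75))  # all four columns are kept
```

With 89 sequences and a 95% cutoff, a column can contain missing data in at most four sequences (85/89 ≈ 0.955), since five would drop coverage to 84/89 ≈ 0.944.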

In the mono-α2,8-sialyltransferase group, a series of Agnathan sequences are found at the base of both the ST8Sia I and ST8Sia V subfamilies. The results corroborate previous findings suggesting the emergence of these two subfamilies around 596 and 563 million years ago (MYA), well before the emergence of vertebrates and prior to WGD R1 and R2 [14]. Consistent with our previous data [17], we identified *st8sia7* genes in the genomes of the jawless vertebrates *Lethenteron camtschaticum*, *Petromyzon marinus*, and *Eptatretus burgeri*. Thus, these genes might have arisen from the ancestral *st8sia6*/*7*/*8* gene after the first WGD R1 event (~552 MYA), although the timing of these events with respect to the divergence of agnathans is still a matter of debate [41,42]. Interestingly, Agnathans possess two copies of this latter enzyme, named ST8Sia VII and ST8Sia VII-r in Figure 2, likely resulting from species-specific large-scale gene duplication events. Similarly, in Teleosts, the eel *A. anguilla* (Elopomorphes, see [43]) also harbors two copies of the ST8Sia VII, ST8Sia I, and ST8Sia V enzymes (Figure 2). This observation argues for a large-scale genome duplication event different from the Teleost-specific third round of WGD R3 (TGD) [44,45], which may have taken place in a common ancestor of freshwater eels sometime after the split of Elopomorpha and Osteoglossomorpha [46]. The ST8Sia VI and ST8Sia VIII subfamilies likely arose from the second WGD at the base of Vertebrates; the first one was maintained in Sarcopterygii and disappeared in Actinopterygii, and vice versa for ST8Sia VIII [17]. The many gene copies of *st8sia7* and *st8sia8* identified in Teleost genomes (Supplemental Table S1) are likely the result of single gene duplication events because they were identified on the same chromosome segment (data not shown), and were thus noted with an -A, -B, or -C extension. However, it is difficult to infer the origin of these segmental duplications as they have occurred in many, but not all, terminal branches of clades.

The second branch encompasses both oligo- and poly-α2,8-sialyltransferases. Regarding poly-α2,8-sialyltransferases, the Agnathan sequences were attributable only to ST8Sia IV, indicative of a divergence between ST8Sia II and ST8Sia IV dating back to WGD-R1 (Figure 2) [14] followed by *st8sia2* gene loss in Agnathans. In contrast, the Agnathan sequences of oligo-α2,8-sialyltransferases are at the base of the ST8Sia III and ST8Sia III-r subfamilies, while there are orthologues of ST8Sia III from sharks to Tetrapod lineages, suggesting a genome duplication event linked to WGD-R2, consistent with previous dating around 474 MYA [14]. Although the ST8Sia III-r sequences appear to be restricted to Teleosteans, including Elopomorphes, and are lost in the Chondrichthyan and Tetrapod lineages, they did not originate from the Teleost-specific WGD, and thus were renamed ST8Sia IX according to the previously described nomenclature [12].

#### *2.2. Identification and Phylogenetic Analysis of the Fish St8sia Genes (st8sia2, st8sia4, st8sia3, and st8sia9)*

Interestingly, in the oligo- and poly-α2,8-sialyltransferase group, the ST8Sia II and ST8Sia IV appeared to be duplicated or lost in several Teleost lineages after the divergence of Actinopterygii from Sarcopterygii [47,48], whereas the ST8Sia III was found in all the Actinopterygii. In the basal Elopomorphes and Osteoglossiformes branches, the four *st8sia* genes (*st8sia2*, *st8sia3*, *st8sia4*, and *st8sia9*) could be identified. The results indicate that these genes already existed in the common ancestor of the 68 Teleost fishes examined. All Otocephalan lineages lack the *st8sia9* gene and the Siluriformes lack both the *st8sia9* and *st8sia2* genes. Consequently, the *st8sia9* gene was lost shortly after Otocephala emergence around 176.2 MYA and the *st8sia2* gene was lost more recently (~82.6 MYA) during Siluriformes evolution [49]. As previously observed, all Neoteleostei fish lack the *st8sia4* gene [14], which was lost at the base of the Neoteleostei lineage. Finally, the Esociformes lack the *st8sia2* gene only (Table 1). Furthermore, two ST8Sia II-related sequences were identified in all the investigated Salmoniformes (*Oncorhynchus*, *Coregonus*, *Salmo*, *Salvelinus*, and *Thymallus*) and two ST8Sia IV-related sequences were identified only in the Cypriniformes *C. carpio* and *Sinocyclocheilus anhuiensis* (Supplemental Table S1). We took advantage of the improved genome and transcriptome sequencing of several fish [37,50], selected several representative Salmoniformes and Cypriniformes ST8Sia sequences, and constructed phylogenetic trees (Supplemental Figure S1). The topology of these trees indicated that the latter duplications of *st8sia* genes were not associated with the Teleost-specific genome duplication (TGD, WGD R3), but rather with more recent lineage-specific genome duplication events described in the Salmonidae (SGD) lineage [51] and in the *C. carpio* species [52].
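The lineage-specific losses described in this paragraph can be summarized as a set difference against the four genes inferred for the Teleost common ancestor. The Python sketch below only transcribes the losses stated in the text; the names `ANCESTRAL`, `REPORTED_LOSSES`, and `implied_repertoire` are hypothetical, and the returned sets are what the stated losses imply for a lineage as a whole, not a claim that every species in that lineage carries each remaining gene.

```python
# Oligo-/poly-alpha2,8-sialyltransferase genes found in the basal
# Elopomorphes and Osteoglossiformes branches (taken from the text).
ANCESTRAL = {"st8sia2", "st8sia3", "st8sia4", "st8sia9"}

# Losses as stated in the text; lineage labels are illustrative keys.
REPORTED_LOSSES = {
    "Elopomorphes": set(),
    "Osteoglossiformes": set(),
    "Otocephala": {"st8sia9"},               # lost ~176.2 MYA
    "Siluriformes": {"st8sia9", "st8sia2"},  # st8sia2 lost ~82.6 MYA
    "Neoteleostei": {"st8sia4"},
    "Esociformes": {"st8sia2"},
}


def implied_repertoire(lineage: str) -> list:
    """Genes left after removing the losses reported for `lineage`."""
    return sorted(ANCESTRAL - REPORTED_LOSSES[lineage])


for lineage in REPORTED_LOSSES:
    print(f"{lineage:18s} {', '.join(implied_repertoire(lineage))}")
```

Keeping losses rather than presences as the primary record mirrors how the text and Table 1 report the data, and avoids asserting the presence of genes that the text does not explicitly mention for a given lineage.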
