Article

Revisiting Schistosoma mansoni Micro-Exon Gene (MEG) Protein Family: A Tour into Conserved Motifs and Annotation

by Štěpánka Nedvědová 1,2,3, Davide De Stefano 1, Olivier Walker 1, Maggy Hologne 1 and Adriana Erica Miele 1,4,*
1 UMR 5280 Institute of Analytical Sciences, Université de Lyon, CNRS, Université Claude Bernard Lyon 1, 69100 Villeurbanne, France
2 Department of Chemistry, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences, 16500 Prague, Czech Republic
3 Department of Zoology and Fisheries, Center of Infectious Animal Diseases, Czech University of Life Sciences, 16500 Prague, Czech Republic
4 Department of Biochemical Sciences, Sapienza University of Rome, 00185 Rome, Italy
* Author to whom correspondence should be addressed.
Biomolecules 2023, 13(9), 1275; https://doi.org/10.3390/biom13091275
Submission received: 20 July 2023 / Revised: 15 August 2023 / Accepted: 18 August 2023 / Published: 22 August 2023
(This article belongs to the Special Issue New Insight into Vector Borne Diseases)

Abstract

Genome sequencing of the human parasite Schistosoma mansoni revealed an interesting gene superfamily, called micro-exon gene (meg), that encodes secreted MEG proteins. The genes are composed of short exons (3–81 base pairs) regularly interspersed with long introns (up to 5 kbp). This article reviews 35 S. mansoni-specific meg genes that are distributed over seven autosomes and one pair of sex chromosomes and that code for at least 87 verified MEG proteins. We used various bioinformatics tools to produce an optimal alignment and propose a phylogenetic analysis. This work highlights intriguing conserved patterns/motifs in the sequences of the highly variable MEG proteins. Based on these analyses, we were able to classify the verified MEG proteins into two subfamilies and to hypothesize how they duplicated and colonized all the chromosomes. Together with motif identification, we also propose to revisit the common names and annotation of MEGs in order to avoid duplication, to aid the reproducibility of research results and to prevent possible misunderstandings.

1. Introduction

Global genome sequencing efforts following the human genome project [1,2] have been extremely beneficial to the entire scientific community. The Wellcome Trust Sanger Institute, among others, undertook the task of sequencing the genomes of pathogens, in particular parasites, which affect the vast majority of the world’s population [3,4]. Parasites are complex eukaryotes, whose genomes resemble those of their hosts more than those of their non-parasitic relatives. In the last twenty years, the completion of parasite genomes has opened the way to studying the molecular basis of the diseases caused by these agents and to finding and validating new drug targets. This last point is critical, since most drugs against parasites are old, often have severe side effects, or are no longer effective due to resistance. Nevertheless, focused genetics and molecular biology studies had already been carried out before the genomic era; therefore, quite a number of gene products had been annotated and served as landmarks for the recent annotations.
In many respects, Schistosoma mansoni, the most widespread agent of human schistosomiasis, is a good case study [4,5]. This vector-borne extracellular parasite is a metazoan digenean trematode; it possesses seven autosomes and a pair of sex chromosomes, with a complex genetic structure including long interspersed sequences, transposon-like sequences, alternative splicing, and gene duplications [5,6,7,8,9].
One class of genes in particular has attracted the attention of researchers: the so-called micro-exon genes (MEG). They were discovered well before the completion of S. mansoni genome sequencing [10,11], although their annotation was completed with the version available in February 2023 on WormBase ParaSite (WBPS17–WS282) [12]. In total, 35 genes are annotated with a characteristic alternation of long introns (ranging from 0.1 to 5 kbp) and very short exons, whose length varies from 6 to 81 base pairs, most being 15 bp long [5,12]. The intron structure is also intriguing and characteristic: introns are shorter towards the 5′ and 3′ ends (ranging from 100 to 500 bp) and longer (up to 4 kbp) in the center. This peculiarity complicated their automatic inference from genomic data.
However, the last fifteen years have seen plenty of transcriptomics and proteomics studies accumulating evidence of MEG complexity and variability and allowing MEGs to be traced back to the genome [5,11,13,14,15,16].
In this work, we present a systematic study aiming to understand the putative origins and spread over the entire genome of the MEG superfamily of genes. We highlight the presence of conserved motifs, despite the high variability, and we finally suggest a revision of the nomenclature in order to resolve some ambiguities in the literature.

2. Materials and Methods

We interrogated the WormBase ParaSite (WBPS) database [12] over the last two years, because it is the reference database for helminths, and we present the data from release WS282 (last accessed 31 March 2023). Within the genome of Schistosoma mansoni, we searched for the terms “Micro-Exon Gene”, “MEG”, “antigen 10.3”, “GRAIL”, and “ESP”. For each gene, WBPS presents its structure, the option to retrieve the sequence, the position on the chromosome, and a translation identified with (at least) one associated UniProt ID. On the UniProt website [17], each entry is associated with both its WBPS and GenBank® identifiers, when available.
An example of the WBPS entry for MEG 3.2 isoform 1 is given in Figure 1.
In parallel, we performed exactly the same search on the UniProt KnowledgeBase [17] and cross-verified the results. All the protein sequences downloaded from UniProt were submitted to PSI-BLAST at NCBI [18], restricting the search to S. mansoni in order to collect the maximum number of protein sequences annotated as MEG.
Afterward, we trimmed the dataset by eliminating duplicated entries, i.e., those displaying the same sequence under two different protein accession numbers, one from NCBI and one from UniProt. Whenever this was the case, we arbitrarily kept only the UniProt ID, for consistency with the cross-annotation of WBPS. While we were writing this manuscript, on 13 April 2023, a new release of WBPS was published, which included more transcriptomic data and increased the number of putative megs, without the associated protein data; therefore, we did not change our verified workflow.
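The trimming step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `"uniprot"`/`"ncbi"` source labels, and the accessions and sequences are hypothetical placeholders, not the data or scripts used in this study.

```python
def deduplicate(records):
    """Collapse records that share a sequence, preferring the UniProt accession.

    `records` is an iterable of (accession, source, sequence) tuples, where
    `source` is "uniprot" or "ncbi" (labels assumed for this sketch).
    """
    kept = {}
    for accession, source, sequence in records:
        key = sequence.strip().upper()
        current = kept.get(key)
        # Keep the first record seen, but let a UniProt entry replace an NCBI one.
        if current is None or (current[1] != "uniprot" and source == "uniprot"):
            kept[key] = (accession, source, key)
    return list(kept.values())


# Placeholder accessions and sequences, for illustration only.
records = [
    ("NCBI_0001", "ncbi", "MKFLLAVFC"),
    ("UP_0001", "uniprot", "mkfllavfc"),  # same sequence, different accession
    ("UP_0002", "uniprot", "MRLPPGKK"),
]
unique = deduplicate(records)
```

Keying on the normalized sequence rather than on the accession is what makes the two databases comparable; the preference rule mirrors the choice to retain the UniProt ID.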
The primary structure alignment of MEG proteins was performed with T-Coffee [19], Kalign [20], and MUSCLE [21] on the EBI server [22] (last accessed 31 March 2023), using default parameters throughout. The alignments were then manually inspected, and the one from MUSCLE was preferred for subsequent analysis (supplementary Figure S1) because the inserted gaps respected the exon boundaries and hence better reflected the biological constraints of alternative splicing.
Phylogenetic trees were built with both Simple phylogeny [23] and PRANK [24] on the EBI server [22] (last access on 31 March 2023). The one from PRANK was more consistent with the gene clustering and also with the type of retrotransposon sequences, which had been found at the boundaries of the megs [7,8]; therefore, it was retained and visualized on the iTOL (interactive Tree of Life) server [25].
EMBOSS on the EBI server [22] was used to highlight conserved linear motifs, which were displayed with WebLogo [26].
Primary sequence analysis was performed by the ProtParam tool [27] on the Expasy website [28] (last access on 31 March 2023); the results on calculated molecular weights, isoelectric point, aliphatic index, and GRAVY index are presented in the supplementary Table S1. In this table both the WBPS gene name and the corresponding code from GenBank® (when available) are listed (last access on 18 August 2023).

3. Results and Discussion

The latest annotated genome version of S. mansoni contains at least 35 unique micro-exon genes (meg) with a peculiar structure of 10 to 20 very short exons interspersed with long introns, whose length spans from 100 to almost 5000 base pairs. Unsurprisingly, automatic annotation struggled to trace these genes and to recognize them as protein-coding. In the majority of cases, the exon length is a multiple of three, ranging from a minimum of 6 bp (i.e., two amino acids) to a maximum of 81 bp. However, in a few cases, one or two exons contain a number of base pairs not divisible by three. In the vast majority of cases, the exons are 15 bp long, thus coding for five amino acids [5,11,12].
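The arithmetic behind these exon lengths can be made explicit with a short Python sketch. The function name and the list of exon lengths are hypothetical, chosen only to span the reported 6–81 bp range; they do not describe any real meg.

```python
def exon_summary(exon_lengths):
    """Summarize a list of exon lengths (in bp) for one gene."""
    in_frame = [n for n in exon_lengths if n % 3 == 0]
    return {
        "n_exons": len(exon_lengths),
        "multiple_of_three": len(in_frame),
        "not_multiple_of_three": len(exon_lengths) - len(in_frame),
        "codons": [n // 3 for n in in_frame],  # whole codons per in-frame exon
        "total_bp": sum(exon_lengths),
    }


# Hypothetical exon lengths for illustration only.
example = [81, 15, 15, 6, 15, 20, 15, 21, 15, 45]
summary = exon_summary(example)
```

In this toy gene, nine exons have a length divisible by three, so each can be skipped without shifting the reading frame, and every 15 bp exon contributes five codons.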
These 35 verified protein-expressing genes are distributed over the seven autosomes and the sex chromosomes (Figure 2); most are encoded on the leading strand and a few on the complementary one, such as Smp_010550 (uncharacterized/MEG-15), which interestingly codes for one tRNA on the leading strand.
On chromosomes 1, 3 and 5, they happen to be clustered together, reinforcing the hypothesis of their origin via gene duplication.
Every chromosome contains at least one meg: chromosomes 2, 4 and the sex chromosome host a single one each, while chromosome 3 hosts thirteen distinct megs. Moreover, close to the 5′ end of megs, three types of (retro)transposable elements have been found, suggesting that they spread via gene duplication and transposition followed by mutation. This finding was previously corroborated by studying the ratio of non-synonymous to synonymous (dN/dS) mutations in meg [7,8,29].

3.1. MEG Filiation

In S. mansoni, 35 megs code for at least 87 verified MEG proteins, each with a unique UniProt ID, mainly originating from alternative splicing of the central exons. In the past, the gene sequences were annotated and clustered into 23 families, numbered from 1 to 16 and from 26 to 32. MEG proteins were found in the secretions from eggs and adult worms prior to the genome sequencing, in both transcriptomics and proteomics studies. Initially, they were named “antigen 10.3”, “Egg secreted protein no. 15 (ESP15)”, and “Grail”, before their peculiar gene structure was discovered [5,10,11,13,14,30].
We aligned the 87 verified proteins, whose sequences are given in the supplementary Table S1, and found that, among the three most widely used programs offered on the EBI tools webpage [22], MUSCLE [21] was the one that best reflected the splicing constraints. In fact, it inserted the gaps between exons and not inside them (supplementary Figure S1); moreover, it aligned the N-termini and the C-termini better than T-Coffee or Kalign.
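The criterion used to prefer one alignment over another, namely gaps falling between exons rather than inside them, was assessed here by manual inspection. A minimal Python sketch of how such a check could be automated is given below. It assumes, as a simplification, that each exon encodes a whole number of residues and that the per-exon residue counts are known; the function name and the toy sequences are hypothetical.

```python
from itertools import accumulate


def gaps_on_exon_boundaries(aligned, exon_aa_lengths):
    """Return True if every gap run in `aligned` opens at an exon boundary.

    `aligned` is one row of a multiple alignment ('-' marks a gap) and
    `exon_aa_lengths` lists the number of residues encoded by each exon.
    """
    boundaries = set(accumulate(exon_aa_lengths))  # residue counts where an exon ends
    residues_seen = 0
    in_gap = False
    for char in aligned:
        if char == "-":
            # A new gap run is acceptable only before the first residue
            # or right after the last residue of an exon.
            if not in_gap and residues_seen != 0 and residues_seen not in boundaries:
                return False
            in_gap = True
        else:
            residues_seen += 1
            in_gap = False
    return True


# Toy protein of 14 residues encoded by three exons of 5, 5 and 4 residues.
exons = [5, 5, 4]
respects_exons = gaps_on_exon_boundaries("MKFLL-----AVFCSPGKK", exons)
breaks_exon = gaps_on_exon_boundaries("MKF--LLAVFCSPGKK", exons)
```

The first row places its gap after residue 5, where the first exon ends, so it passes; the second opens a gap after residue 3, inside the first exon, and fails.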
Starting from an unknown ancestor, possibly on chromosome 6, a putative phylogenetic tree based on sequence similarities was built with PRANK [24], and a clustering into two main groups/clades was highlighted (Figure 3).
It is plausible that a “proto-meg” was on chromosome 6, since the first leaves departing from each clade are there. Indeed, MEG-28 and MEG-29 of the red clade and MEG-8 (Smp_172180.1) and MEG-32 of the blue clade are on chromosome 6. Of course, this is purely a hypothesis based on sequence similarity among the paralogous proteins.
The proposed filiation of the red clade, composed of seven gene families, indicates an early jump to chromosome 3 to give rise to the MEG-1 and MEG-3 protein families, which practically colonized the entire chromosome. On the other hand, meg-9 and meg-31 jumped to chromosomes 7 and 1, respectively. Based on sequence similarity, a gene duplication event generating meg-3 and meg-9 prior to their splitting onto two different chromosomes could be hypothesized, while meg-31 could well derive from meg-28.
Again on chromosome 6, we find the genes coding for MEG-13, MEG-8 (Smp_172180.1), MEG-26 and MEG-32 of the second clade, the blue one in Figure 3. Therefore, we might speculate that the first “experiments” on increasing protein variation via gene duplication and alternative splicing were carried out on chromosome 6 and then pursued on other chromosomes, in particular, no. 3 and no. 1. In fact, the protein coded by meg-8 on chromosome 6 is close to the proteins of the MEG-2 family of chromosome 3, which then expanded and duplicated. The goal of increasing variability is quite common among parasites and is usually linked to strategies for achieving host immune evasion [9,11,29,31,32].
meg-9 on chromosome 7 was “joined” by meg-15, whose sequence is similar to the large meg-2 protein family; hence, we might speculate a jump from chromosome 3 to no. 7 rather than an intra-chromosomal duplication and mutation. The fact that they are not clustered together might corroborate this hypothesis.
Chromosome 3 is particularly interesting since it contains the highest number of meg genes and also the majority of egg-secreted MEG proteins (MEG-2, MEG-3, and MEG-1) [5,14,30], as well as the genes with the highest number of spliced isoforms (14 MEG-1 isoforms from Smp_122630, Figure 3).
According to genome annotation, meg-8 (Smp_163710.1), meg-16, and meg-6 are the only representatives on chromosomes 2, 4 and the sex chromosome, respectively. They all belong to the blue clade, but the protein product of meg-16 shares more similarities with MEG-32, which may suggest an early jump from chromosome 6 to 4, while MEG-6 proteins are closer to the MEG-2 protein family, suggesting a transposition from chromosome 3 towards the sex chromosome. Moreover, MEG-8 protein shares 50% similarity with MEG-30, suggesting a transposition from chromosome 1 to chromosome 2. It is worth recalling that the mRNAs of MEG-6 and MEG-8 are abundant among the egg and esophageal excreted/secreted transcripts, and that the respective proteins have been found in proteomic studies [13,14,30].
Looking at the tree in Figure 3, we could also speculate that the clade painted in blue was more successful in diversifying the sequences, an aspect that we have tried to highlight with three shades of blue. On the other hand, we could speculatively infer that the clade painted in red was more successful in implementing the alternative splicing to achieve diversity.

3.2. Conserved Motifs

Going back to the total alignment in supplementary Figure S1, it is clear that there are very few consensus motifs conserved, except the N-terminal signal peptide, which is needed for secretion. We have therefore decided to present this alignment more graphically, by using WebLogo [26] in Figure 4. Apolar residues such as Pro, Phe, Leu are regularly spaced and highly represented. Charged residues, mostly basic Lys, are more conserved in the C-terminal part, while Glu (E) is interspersed at more or less regular intervals of 15–20 aa (taking into account the gaps).
Soon after the signal peptide, all the MEGs present a stretch of hydrophobic and aromatic residues ending with a basic one: FLϕϕFX6Wp(K/H/R), where X is any residue, p is a polar amino acid, and ϕ is a hydrophobic one.
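For readers who wish to search ungapped sequences for this motif, the notation can be translated into a regular expression. The sketch below is illustrative only: the residue sets chosen for ϕ and p are assumptions (the text does not enumerate them), X6 is read as exactly six residues of any type, and the toy sequence is invented.

```python
import re

HYDROPHOBIC = "AVILMFW"  # assumed members of the ϕ class
POLAR = "STNQYC"         # assumed members of the p class

# F L ϕ ϕ F X6 W p (K/H/R)
MOTIF = re.compile(rf"FL[{HYDROPHOBIC}]{{2}}F.{{6}}W[{POLAR}][KHR]")


def find_motif(sequence):
    """Return (start, matched_text) for the first motif hit, or None."""
    match = MOTIF.search(sequence.upper())
    return (match.start(), match.group()) if match else None


# Invented toy sequence: 6 leading residues, then the motif, then 2 trailing residues.
toy = "MKLSTA" + "FLVIF" + "GDSTEA" + "WSK" + "DE"
hit = find_motif(toy)
```

Because the motif was derived from a gapped alignment, a strict regular expression of this kind may miss isoforms in which the spacing differs; widening the residue classes or the X6 spacer would be a reasonable adjustment.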
Independently of the clade, it is apparent that MEG proteins possess sticky sequences, which we have verified by calculating the aliphatic index and the GRAVY index with ProtParam [28]. These data are included in the supplementary Table S1.
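Both indices can be reproduced from the primary sequence alone. The sketch below uses the standard definitions that ProtParam documents: GRAVY is the mean Kyte–Doolittle hydropathy over all residues, and the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent. The function names are ours, and the code is an illustration rather than the tool used to compile Table S1.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

A positive GRAVY value indicates an overall hydrophobic sequence, and a high aliphatic index reflects a large relative volume of aliphatic side chains.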
The stickiness of MEG proteins also emerged from studies aimed at producing the isolated proteins through protein engineering. For example, recombinant MEG-3.2 and MEG-3.4 were purified from the inclusion bodies of Escherichia coli Rosetta cells and refolded by dialysis after purification, before they could be used for immunization studies [33]. MEG-14 was poorly expressed in bacteria and its amphipathic character was studied using circular dichroism with synchrotron light [34] before using the protein for binding studies with host factors [35]. MEG-4 (Sm10.3) was expressed in Escherichia coli Rosetta cells and purified in a buffer containing 0.5 M NaCl [36], a non-physiological hypertonic concentration of salt. MEG-24, MEG-27 and MEG-2.1 could not be produced in a heterologous host and, given their short size, they were chemically synthesized to perform in vitro studies [37,38]. Chemical synthesis was also employed to use peptides of several MEG proteins as baits in the search for host partners; examples include MEG-12 [32], MEG-8 [39], and twelve other MEG proteins expressed in the tegument and esophageal glands [40].
If we split the two clades and align the sequences separately (35 proteins for the red clade and 52 for the blue clade), we can appreciate that the contribution of conserved Cys and Phe to the overall alignment comes from the red clade (Figure 5). On the other hand, the basic residues at the C-termini are contributed by the isoforms of the blue clade. Proline residues are conserved in both clades.
A hydrophobic motif at the N-terminus, soon after the signal peptide, is also present in the red clade (Figure 5), but with a slightly different sequence [FxxLFL(I/R)(V/D/E)Fxx(D/E)]. Moreover, we can appreciate that this first linear motif is followed by four other conserved motifs: CGGLppG; (D/E)F(D/I/E)KCϕϕ(R/K); CX5/7/9HX3/5/7C; and CLYppDX3L(Y/F/D)V. In total, five short linear motifs characterize the red clade from the N- to the C-terminus, the first one being in common with the blue clade. It would be interesting to experimentally check whether these peptides are conserved because they are antigenic or because they confer some structural features to the IDPs.
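The five red-clade motifs can likewise be written as regular expressions and searched for in turn. This is a hedged sketch: the residue sets for ϕ and p are assumptions, spacers such as X5/7/9 are read as exactly 5, 7 or 9 residues, and the toy sequence is invented to contain two of the motifs.

```python
import re

HYDROPHOBIC = "AVILMFW"  # assumed members of the ϕ class
POLAR = "STNQYC"         # assumed members of the p class

# (motif as written in the text, regular expression), listed from N- to C-terminus.
RED_CLADE_MOTIFS = [
    ("FxxLFL(I/R)(V/D/E)Fxx(D/E)", r"F..LFL[IR][VDE]F..[DE]"),
    ("CGGLppG", rf"CGGL[{POLAR}]{{2}}G"),
    ("(D/E)F(D/I/E)KCϕϕ(R/K)", rf"[DE]F[DIE]KC[{HYDROPHOBIC}]{{2}}[RK]"),
    ("CX5/7/9HX3/5/7C", r"C(?:.{5}|.{7}|.{9})H(?:.{3}|.{5}|.{7})C"),
    ("CLYppDX3L(Y/F/D)V", rf"CLY[{POLAR}]{{2}}D.{{3}}L[YFD]V"),
]


def scan(sequence):
    """Return (motif, start) for the first hit of each motif found in `sequence`."""
    seq = sequence.upper()
    hits = []
    for name, pattern in RED_CLADE_MOTIFS:
        match = re.search(pattern, seq)
        if match:
            hits.append((name, match.start()))
    return hits


# Invented toy sequence carrying CGGLppG and one Cys/His-spaced motif.
toy = "MK" + "CGGLSTG" + "AA" + "C" + "PPPPP" + "H" + "GGG" + "C" + "EE"
hits = scan(toy)
```

Reporting the start position of each hit makes it straightforward to check whether the motifs occur in the N- to C-terminal order described above.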

3.3. Nomenclature

Based on this classification and filiation, we would like to propose a more rational annotation of the gene products, trying to eliminate the gap between MEG-16 and MEG-26, and also possibly to rationalize the nomenclature of the large MEG-2 family, whose gene products have been numbered somewhat arbitrarily. It is worth mentioning that this class possesses at least nine more members deposited on UniProt, which we have excluded because they have been found only as mRNA, not yet as protein, and there is apparently no gene associated with them in WBPS.
To start a reclassification, we have taken the sequences of 13 MEG-2 proteins of the blue clade and aligned them (Figure 6), starting from the PRANK results. These proteins are coded by eight genes consecutively clustered together in the second half of chromosome 3 on the leading strand (Table 1). Only two genes not coding for MEG (Smp_326510 and Smp_309120), one after the second and one before the last meg-2, interrupt the chain.
Interestingly, the gene coding for the red clade MEG-2/ESP15 isoform C4QPS0, Smp_183040.1, is located in the same part of chromosome 3, between Smp_183010.1 and Smp_183030.1.
Based on ontology and genome positioning, we propose to keep the name MEG-2 for C4QPS0 of the red clade and to rename the products of the blue clade consecutively, according to their position on the genome, from 5′ to 3′, as indicated in Table 2 below.
Moreover, to disambiguate the MEG-4/antigen 10.3 proteins encoded by the genes Smp_085840, Smp_307220, and Smp_307240, all on chromosome 1, we propose to keep the name MEG-4.1 for the protein products of the originally deposited gene (Smp_307220), to call the product of its closer relative Smp_307240 MEG-4.2, and to call the product of Smp_085840 MEG-17, because it is more similar to MEG-12 than to MEG-4.
Analogously, to disambiguate the two proteins called MEG-8, we propose to keep the name MEG-8 for the product of the gene Smp_172180.1 on chromosome 6, given the filiation described above, and to rename the other MEG-18.
We also propose a slight modification of MEG-10, where two sequences hold the same protein name (Table 2), as well as for the MEG-3/Grail family, which is composed of three genes coding for a total of 12 isoforms (Table 3).
An interesting case is that of MEG-15, which in the latest annotation loses its name and becomes “uncharacterized”. We therefore propose to restore its name, given its similarity with the MEG-2 family and with MEG-6 (see Figure 2). Indeed, the gene Smp_010550 has four splice variants, each one coding for one isoform, unequivocally identified on UniProt. A summary is given in Table 4.
Finally, we think that some order in the MEG-1 family would also improve readability (Table 5), although this is, together with MEG-3, one of the best-annotated families. It should be noted that UniProt lists further isoforms that have only been inferred by transcriptomics and do not (yet) belong to any deposited coding gene; therefore, they are not included in Supplementary Table S1, nor have they been used for the alignment.

3.4. Towards a Function?

The exact role of the MEG proteins is still unknown; their high copy number and their high variability make functional inference difficult. In recent years, many “omics” studies have boosted the research on schistosomes, with the aim of finding new drug targets, developing earlier and more precise diagnostics, and implementing a vaccine. A handful of studies on individual MEG proteins have been carried out, revealing their nature as intrinsically disordered proteins (IDP), with or without morphing/chameleon behavior. One morphing IDP is MEG-14, which is able to fold upon binding to negatively charged membranes or to calgranulin, a human S100 family member involved in inflammation [34,35]. MEG-24 (whose sequence was not deposited in any public repository) and MEG-27 were shown to bind to liposomes and to agglutinate red blood cells, possibly through the formation of amphipathic helices [37].
We have recently characterized three splice variants of Smp_336990 of the MEG-2 family by NMR and confirmed their IDP nature, together with some interesting hairpin loops, which might undergo some morphing and act as a platform for interactions with host partners [38].
Indeed, MEG-2 family members possess the highest content of Cys residues (Figure 5) and a conserved N-terminal motif Cys2X(6/8/10)Cys2, which is reminiscent of either a Zn-finger or a [2Fe2S] cluster. It remains to be proven whether MEG-2 proteins are metal sensors or chelators. A clue might be the fact that they are present in the esophageal secretions, where hemoglobin digestion occurs; this digestion releases high quantities of iron, which is potentially toxic. It is known that hemozoin is regurgitated [41], but MEG-2 proteins might also act as iron scavengers, limiting high oxidative stress.
The high variability of MEG proteins and their high expression in the mammalian host have led researchers to hypothesize a role in host immune evasion or modulation. This was the basis for using short synthetic peptides derived from more or less each MEG family as baits to fish IgG from infected mouse models. The most antigenic ones were then used as protective vaccines, although with low efficacy [40]. The question arises whether the full-length isoforms would be more protective or whether the conserved motifs that we have highlighted could offer a better strategy.

4. Conclusions

The 87 verified protein products of the 35 micro-exon genes (MEG) of Schistosoma mansoni are elusive and interesting macromolecules. They are a Pandora’s box of tools for understanding the complex behavior of these fascinating and dangerous parasitic worms. We have presented a rationalization of the gene families based on sequence similarities and proposed a renaming in order to avoid confusion and to help separate what is known (the tip of the iceberg) from what still remains to be studied.
Although there are very few consensus motifs conserved in the alignment, the N-terminal signal peptide, required for secretion, remains constant among the MEG proteins. MEGs also share a prevalence of apolar residues like Pro, Phe, and Leu, and charged residues like Lys and Glu in the C-terminal region. The presence of sticky sequences, confirmed by the aliphatic index and GRAVY index calculations, suggests their potential importance in interactions with host factors. The stickiness of MEGs was also evident in studies using protein engineering for their isolation and purification. Phylogenetic analysis split the sequences into two clades and revealed distinct contributions from conserved Cys, Phe and basic residues in each clade. Furthermore, specific short linear motifs were identified within the red clade, one shared with the blue clade. These findings warrant further experimental investigation to determine whether these conserved motifs play a role in antigenicity or contribute to structural features in the IDP regions of MEGs. Overall, these insights into the characteristics and sequences of MEG proteins pave the way for future research on their functional roles and potential applications in immunization studies and interactions with host partners.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biom13091275/s1, Figure S1: Multiple alignment of MEG proteins performed with MUSCLE; Table S1: Summary of sequences and primary structure characteristics of MEG proteins analyzed in this study.

Author Contributions

A.E.M. and M.H. conceived the research. Š.N. and A.E.M. acquired the funding. Š.N. and D.D.S. performed analysis. Š.N., D.D.S., O.W., M.H. and A.E.M. analyzed the data. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funds from Improvement in Quality of the Internal Grant Scheme at CZU, reg. no. CZ.02.2.69/0.0/0.0/19_073/0016944, financed from the funds of Operational Programme Research, Development and Education, in the framework of ESF Call no. 02_19_073 for Improving the Quality of Internal Grant Schemes at Higher Educational Institutions in priority axis 2 OP to Š.N. This research also received funds from CNRS project MITI-80PRIME to A.E.M. The APC was partially supported by an agreement between MDPI and Sapienza University of Rome.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created. All the data used in the manuscript are freely available on the WBPS and UniProt databases.

Acknowledgments

We would like to dedicate this paper to the memory of the late Ricardo DeMarco, who pioneered the field of micro-exon genes and dedicated his research to unraveling schistosome biology.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Lander, E.S.; Linton, L.M.; Birren, B.; Nusbaum, C. Initial sequencing and analysis of the human genome. Nature 2001, 409, 860–921.
  2. Venter, J.C.; Adams, M.D.; Myers, E.W.; Li, P.W.; Mural, R.J.; Sutton, G.G.; Smith, H.O.; Yandell, M.; Evans, C.A.; Holt, R.A.; et al. The Sequence of the Human Genome. Science 2001, 291, 1304–1351.
  3. Alliance of Genome Resources Consortium; Agapite, J.; Albou, L.-P.; Aleksander, S.A.; Alexander, M.; Anagnostopoulos, A.V.; Antonazzo, G.; Argasinska, J.; Arnaboldi, V.; Attrill, H.; et al. Harmonizing model organism data in the Alliance of Genome Resources. Genetics 2022, 220, iyac022.
  4. Zerlotini, A.; Oliveira, G. The contributions of the Genome Project to the study of schistosomiasis. Mem. Inst. Oswaldo Cruz 2010, 105, 367–369.
  5. Berriman, M.; Haas, B.J.; LoVerde, P.T.; Wilson, R.A.; Dillon, G.P.; Cerqueira, G.C.; Mashiyama, S.T.; Al-Lazikani, B.; Andrade, L.F.; Ashton, P.D.; et al. The genome of the blood fluke Schistosoma mansoni. Nature 2009, 460, 352–358.
  6. Silva, L.L.; Marcet-Houben, M.; Nahum, L.A.; Zerlotini, A.; Gabaldón, T.; Oliveira, G. The Schistosoma mansoni phylome: Using evolutionary genomics to gain insight into a parasite’s biology. BMC Genom. 2012, 13, 617.
  7. Philippsen, G.S. Transposable Elements in the Genome of Human Parasite Schistosoma mansoni: A Review. Trop. Med. Infect. Dis. 2021, 6, 126.
  8. Venancio, T.M.; Wilson, R.A.; Verjovski-Almeida, S.; DeMarco, R. Bursts of transposition from non-long terminal repeat retrotransposon families of the RTE clade in Schistosoma mansoni. Int. J. Parasitol. 2010, 40, 743–749.
  9. Hull, R.; Dlamini, Z. The role played by alternative splicing in antigenic variability in human endo-parasites. Parasites Vectors 2014, 7, 53.
  10. Davis, R.E.; Davis, A.H.; Carroll, S.M.; Rajkovic, A.; Rottman, F.M. Tandemly Repeated Exons Encode 81-Base Repeats in Multiple, Developmentally Regulated Schistosoma mansoni Transcripts. Mol. Cell. Biol. 1988, 8, 4745–4755.
  11. DeMarco, R.; Mathieson, W.; Manuel, S.J.; Dillon, G.P.; Curwen, R.S.; Ashton, P.D.; Ivens, A.C.; Berriman, M.; Verjovski-Almeida, S.; Wilson, R.A. Protein variation in blood-dwelling schistosome worms generated by differential splicing of micro-exon gene transcripts. Genome Res. 2010, 20, 1112–1121.
  12. Howe, K.L.; Bolt, B.J.; Shafie, M.; Kersey, P.; Berriman, M. WormBase ParaSite—A comprehensive resource for helminth genomics. Mol. Biochem. Parasitol. 2017, 215, 2–10.
  13. Wilson, R.A.; Li, X.H.; MacDonald, S.; Neves, L.X.; Vitoriano-Souza, J.; Leite, L.C.C.; Farias, L.P.; James, S.; Ashton, P.D.; DeMarco, R.; et al. The Schistosome Esophagus Is a ‘Hotspot’ for Microexon and Lysosomal Hydrolase Gene Expression: Implications for Blood Processing. PLoS Neglected Trop. Dis. 2015, 9, e0004272.
  14. Anderson, L.; Amaral, M.S.; Beckedorff, F.; Silva, L.F.; Dazzani, B.; Oliveira, K.C.; Almeida, G.T.; Gomes, M.R.; Pires, D.S.; Setubal, J.C.; et al. Schistosoma mansoni Egg, Adult Male and Female Comparative Gene Expression Analysis and Identification of Novel Genes by RNA-Seq. PLoS Neglected Trop. Dis. 2015, 9, e0004334.
  15. Li, X.H.; de Castro-Borges, W.; Parker-Manuel, S.; Vance, G.M.; Demarco, R.; Neves, L.X.; Evans, G.J.; Wilson, R.A. The schistosome oesophageal gland: Initiator of blood processing. PLoS Neglected Trop. Dis. 2013, 7, e2337.
  16. Lu, Z.; Sankaranarayanan, G.; Rawlinson, K.A.; Offord, V.; Brindley, P.J.; Berriman, M.; Rinaldi, G. The Transcriptome of Schistosoma mansoni Developing Eggs Reveals Key Mediators in Pathogenesis and Life Cycle Propagation. Front. Trop. Dis. 2021, 2, 713123.
  17. The UniProt Consortium; Bateman, A.; Martin, M.-J.; Orchard, S.; Magrane, M.; Ahmad, S.; Alpi, E.; Bowler-Barnett, E.H.; Britto, R.; Bye-A-Jee, H.; et al. UniProt: The Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 2023, 51, D523–D531.
  18. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
  19. Notredame, C.; Higgins, D.G.; Heringa, J. T-coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000, 302, 205–217.
  20. Lassmann, T.; Sonnhammer, E.L.L. Kalign, Kalignvu and Mumsa: Web servers for multiple sequence alignment. Nucleic Acids Res. 2006, 34, W596–W599.
  21. Edgar, R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797.
  22. Madeira, F.; Pearce, M.; Tivey, A.R.N.; Basutkar, P.; Lee, J.; Edbali, O.; Madhusoodanan, N.; Kolesnikov, A.; Lopez, R. Search and sequence analysis tools services from EMBL-EBI in 2022. Nucleic Acids Res. 2022, 50, W276–W279.
  23. Saitou, N.; Nei, M. The neighbor-joining method: A new method for reconstructing evolutionary trees. Mol. Biol. Evol. 1987, 4, 406–425.
  24. Löytynoja, A.; Goldman, N. Phylogeny-Aware Gap Placement Prevents Errors in Sequence Alignment and Evolutionary Analysis. Science 2008, 320, 1632–1635.
  25. Letunic, I.; Bork, P. Interactive Tree of Life (iTOL) v5: An online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021, 49, W293–W296.
  26. Crooks, G.E.; Hon, G.; Chandonia, J.-M.; Brenner, S.E. WebLogo: A Sequence Logo Generator. Genome Res. 2004, 14, 1188–1190.
  27. Gasteiger, E.; Hoogland, C.; Gattiker, A.; Duvaud, S.; Wilkins, M.R.; Appel, R.D.; Bairoch, A. Protein Identification and Analysis Tools on the Expasy Server. In The Proteomics Protocols Handbook; Walker, J.M., Ed.; Humana Press: Totowa, NJ, USA, 2005; pp. 571–607.
  28. Duvaud, S.; Gabella, C.; Lisacek, F.; Stockinger, H.; Ioannidis, V.; Durinx, C. Expasy, the Swiss Bioinformatics Resource Portal, as designed by its users. Nucleic Acids Res. 2021, 49, W216–W227.
  29. Philippsen, G.S.; Wilson, R.A.; DeMarco, R. Accelerated evolution of schistosome genes coding for proteins located at the host-parasite interface. Genome Biol. Evol. 2015, 7, 431–443. [Google Scholar] [CrossRef]
  30. Mathieson, W.; Wilson, R.A. A comparative proteomic study of the undeveloped and developed Schistosoma mansoni egg and its contents: The miracidium, hatch fluid and secretions. Int. J. Parasitol. 2010, 40, 617–628. [Google Scholar] [CrossRef]
  31. Fneich, S.; Théron, A.; Cosseau, C.; Rognon, A.; Aliaga, B.; Buard, J.; Duval, D.; Arancibia, N.; Boissier, J.; Roquis, D.; et al. Epigenetic origin of adaptive phenotypic variants in the human blood fluke Schistosoma mansoni. Epigenetics Chromatin 2016, 9, 27. [Google Scholar] [CrossRef]
  32. Vlaminck, J.; Lagatie, O.; Dana, D.; Mekonnen, Z.; Geldhof, P.; Levecke, B.; Stuyver, L.J. Identification of antigenic linear peptides in the soil-transmitted helminth and Schistosoma mansoni proteome. PLoS Neglected Trop. Dis. 2021, 15, e0009369. [Google Scholar] [CrossRef]
  33. Mambelli, F.S.; Figueiredo, B.; Morais, S.; Assis, N.; Fonseca, C.; Oliveira, S. Recombinant micro-exon gene 3 (MEG-3) antigens from Schistosoma mansoni failed to induce protection against infection but show potential for serological diagnosis. Acta Trop. 2020, 204, 105356. [Google Scholar] [CrossRef] [PubMed]
  34. Lopes, J.L.S.; Orcia, D.; Araujo, A.P.U.; DeMarco, R.; Wallace, B.A. Folding Factors and Partners for the Intrinsically Disordered Protein Micro-Exon Gene 14 (MEG-14). Biophys. J. 2013, 104, 2512–2520. [Google Scholar] [CrossRef] [PubMed]
  35. Orcia, D.; Zeraik, A.E.; Lopes, J.L.; Macedo, J.N.; dos Santos, C.R.; Oliveira, K.C.; Anderson, L.; Wallace, B.; Verjovski-Almeida, S.; Araujo, A.P.; et al. Interaction of an esophageal MEG protein from schistosomes with a human S100 protein involved in inflammatory response. Biochim. Biophys Acta. Gen. Subj. 2017, 1861, 3490–3497. [Google Scholar] [CrossRef] [PubMed]
  36. Martins, V.P.; Morais, S.B.; Pinheiro, C.S.; Assis, N.R.G.; Figueiredo, B.C.P.; Ricci, N.D.; Alves-Silva, J.; Caliari, M.V.; Oliveira, S.C. Sm10.3, a Member of the Micro-Exon Gene 4 (MEG-4) Family, Induces Erythrocyte Agglutination In Vitro and Partially Protects Vaccinated Mice against Schistosoma mansoni Infection. PLoS Neglected Trop. Dis. 2014, 8, e2750. [Google Scholar] [CrossRef]
  37. Felizatti, A.P.; Zeraik, A.E.; Basso, L.G.; Kumagai, P.S.; Lopes, J.L.; Wallace, B.; Araujo, A.P.; DeMarco, R. Interactions of amphipathic α-helical MEG proteins from Schistosoma mansoni with membranes. Biochim. Biophys Acta Biomembr. 2020, 1862, 183173. [Google Scholar] [CrossRef]
  38. Nedvedova, S.; Guillière, F.; Miele, A.E.; Cantrelle, F.-X.; Dvorak, J.; Walker, O.; Hologne, M. Divide, conquer and reconstruct: How to solve the 3D structure of recalcitrant Micro-Exon Gene (MEG) protein from Schistosoma mansoni. PLoS ONE 2023, 18, e0289444. [Google Scholar] [CrossRef]
  39. Romero, A.A.; Cobb, S.A.; Collins, J.N.R.; Kliewer, S.A.; Mangelsdorf, D.J.; Collins, J.J., 3rd. The Schistosoma mansoni nuclear receptor FTZ-F1 maintains esophageal gland function via transcriptional regulation of meg-8.3. PLoS Pathog. 2021, 17, e1010140. [Google Scholar] [CrossRef]
  40. Farias, L.P.; Vance, G.M.; Coulson, P.S.; Vitoriano-Souza, J.; Neto, A.P.d.S.; Wangwiwatsin, A.; Neves, L.X.; Castro-Borges, W.; McNicholas, S.; Wilson, K.S.; et al. Epitope Mapping of Exposed Tegument and Alimentary Tract Proteins Identifies Putative Antigenic Targets of the Attenuated Schistosome Vaccine. Front. Immunol. 2021, 11, 624613. [Google Scholar] [CrossRef]
  41. Soares, J.B.C.; Maya-Monteiro, C.M.; Bittencourt-Cunha, P.R.; Atella, G.C.; Lara, F.A.; D’avila, J.C.; Menezes, D.; Vannier-Santos, M.A.; Oliveira, P.L.; Egan, T.J.; et al. Extracellular lipid droplets promote hemozoin crystallization in the gut of the blood fluke Schistosoma mansoni. FEBS Lett. 2007, 581, 1742–1750. [Google Scholar] [CrossRef]
Figure 1. Screenshot of the entry for MEG-3.2 isoform 1 on WormBase ParaSite [12]. The green box shows an enlarged view of the gene structure: vertical bars represent exons and horizontal lines represent introns. The length of each mark is directly proportional to the number of base pairs in the corresponding exon or intron. This gene codes for 5 proteins deposited in UniProt [17].
Figure 2. Schematic representation of the Schistosoma mansoni haplotype. The approximate position of each MEG-coding gene on each chromosome (colored cylinder) is indicated by a black bar, with its WormBase ParaSite name noted on the left. The chromosome number is shown above each cylinder.
Figure 3. Phylogenetic tree colored in red and blue after clustering the clades by amino acid sequence similarity. Each protein is identified by its unique UniProt code, followed by its common name and WBPS gene number. In the red clade, an early event separated MEG-29 and MEG-2 (ESP15, coded by Smp_183040.1) from the rest, so this branch is colored dark red. Similarly, in the blue clade, MEG-7, MEG-32 and MEG-16 diverged early and are highlighted in light blue.
Figure 4. WebLogo representation of the alignment of all 87 MEG protein sequences. The signal peptide sequences have been omitted for clarity. Although the longest protein is 189 residues long, the gaps introduced by the alignment lengthen the aligned sequences to 246 positions.
Figure 5. WebLogo representation of the alignment of the red clade composed of 35 MEG proteins, i.e., isoforms of MEG-1, MEG-3, MEG-9, MEG-28, MEG-29, MEG-31, and C4QPS0 of the MEG-2 family. The sequence of the signal peptide has been omitted for clarity.
Figure 6. Sequence alignment of proteins coded by the meg-2 family. Sequences 1 to 13 belong to the blue clade and the last sequence (#14) belongs to the red clade. Conserved residues are highlighted in yellow.
Table 1. Summary of the cluster of the blue components of meg-2 on chromosome 3.
MEG-2 Coding Genes from 5′ to 3′ in the Second Half of Chromosome 3
| WBPS gene identifier | UniProt ID |
| Smp_1598030.1 | A0A3Q0KR24 |
| Smp_159800.1 | C4QG05 and D7PD69 |
| Smp_183010.1 | C4QPR6 |
| Smp_183030.1 | C4QPR9 |
| Smp_183020.1 and Smp_183020.2 | C4QPR8 and A0A3Q0KTV3 |
| Smp_345100.1 | A0A5K4FFX0 and D7PD77 |
| Smp_336990 | A0A5K4FDB9, D7DP78, D7DP76, D7DP75 |
Table 2. Proposed changes in the nomenclature of MEG protein products -2, -4, -8, -10 based on the filiation presented in Figure 3.
| WBPS Gene ID | UniProt ID | Old Name | New Name |
| Smp_183040.1 | C4QPS0 | MEG-2/ESP15 | MEG-2 |
| Smp_1598030.1 | A0A3Q0KR24 | MEG-2/ESP15 | MEG-2.1 |
| Smp_159800.1 | C4QG05 | MEG-2/ESP15 | MEG-2.2 isoform 1 |
| | D7PD69 | MEG-2.4 isoform 1 | MEG-2.2 isoform 2 |
| Smp_183010.1 | C4QPR6 | MEG-2/ESP15 | MEG-2.3 |
| Smp_183030.1 | C4QPR9 | MEG-2/ESP15 | MEG-2.4 |
| Smp_183020.1 | C4QPR8 | MEG-2 isoform 1 | MEG-2.5 isoform 1 |
| Smp_183020.2 | A0A3Q0KTV3 | MEG-2 isoform 2 | MEG-2.5 isoform 2 |
| Smp_345100.1 | A0A5K4FFX0 | MEG-2.2 isoform 1 | MEG-2.6 isoform 1 |
| | D7PD77 | MEG-2.2 isoform 2 | MEG-2.6 isoform 2 |
| Smp_336990 | A0A5K4FDB9 | Uncharacterized | MEG-2.7 isoform 1 |
| | D7DP78 | MEG-2.1 isoform 1 | MEG-2.7 isoform 2 |
| | D7DP76 | MEG-2.1 isoform 2 | MEG-2.7 isoform 3 |
| | D7DP75 | MEG-2.1 isoform 3 | MEG-2.7 isoform 4 |
| Smp_307220.1 | A0A5K4F627 | Developmentally regulated antigen 10.3 | MEG-4.1 isoform 1 |
| Smp_307220.2 | A0A5K4F2K5 | Developmentally regulated antigen 10.3 | MEG-4.1 isoform 2 |
| Smp_307220.3 | Q86D79 | Developmentally regulated antigen 10.3 | MEG-4.1 isoform 3 |
| Smp_307240.1 | A0A5K4F4B1 | MEG-4/antigen 10.3 | MEG-4.2 |
| Smp_085840 | C4KE8 | MEG-4/antigen 10.3 | MEG-17 |
| Smp_172180.1 | G4VLP3 | MEG-8 | MEG-8 |
| Smp_171190.1 | G4VCW5 | MEG-8 | MEG-18 |
| Smp_152590.1 | A0A3Q0KQ39 | MEG-10 isoform 2 | MEG-10.1 isoform 1 |
| Smp_152590.2 | G4LYD0 | MEG-10 isoform 2 | MEG-10.1 isoform 2 |
| Smp_243730.1 | A0A0U5KJN7 | MEG-10.2 | MEG-10.2 |
Table 3. Proposed changes in the nomenclature of MEG-3 protein products, based on the filiation presented in Figure 3.
| WBPS Gene ID | UniProt ID | Old Name | New Name |
| Smp_138060.1 | D7PD62 | | MEG-3.3 isoform 1 |
| | D7PD63 | | MEG-3.3 isoform 2 |
| | D7PD64 | | MEG-3.3 isoform 3 |
| | A0A3Q0KMS0 | MEG-3 Grail family | MEG-3.3 isoform 4 |
| Smp_138070.1 | D7PD52 | MEG-3.2 isoform 2.1 | MEG-3.2 isoform 1 |
| | D7PD53 | | MEG-3.2 isoform 2 |
| | D7PD54 | | MEG-3.2 isoform 3 |
| | D7PD57 | MEG-3.2 isoform 6 | MEG-3.2 isoform 4 |
| | D7PD60 | MEG-3.2 isoform 9 | MEG-3.2 isoform 5 |
| Smp_138080.1 | D7PD49 | | MEG-3.1 isoform 1 |
| | D7PD51 | MEG-3.1 isoform 3 | MEG-3.1 isoform 2 |
| | A0A3Q0KMU6 | MEG-3 Grail family | MEG-3.1 isoform 3 |
Table 4. Proposed changes in the nomenclature of MEG-15 protein products.
| WBPS Gene ID | UniProt ID | Old Name | New Name |
| Smp_010550.1 | A0A3Q0KC91 | Uncharacterized protein | MEG-15 isoform 1 |
| Smp_010550.2 | A0A5K4E9M7 | Uncharacterized protein | MEG-15 isoform 2 |
| Smp_010550.3 | G4VMN2 | Uncharacterized protein | MEG-15 isoform 3 |
| Smp_010550.4 | A0A5K4E9G8 | Uncharacterized protein | MEG-15 isoform 4 |
Table 5. Proposed changes in the nomenclature of MEG-1 protein products.
| WBPS Gene ID | UniProt ID | Old Name | New Name |
| Smp_326790.1 | A0A5K4F8B3 | MEG-1 | MEG-1.1 isoform 1 |
| | A0A5K4F8U8 | MEG-1 | MEG-1.1 isoform 2 |
| Smp_122630.1 | A0A3Q0KKC4 | MEG-1 isoform 1 | MEG-1.2 isoform 1 |
| | D7PD83 | MEG-1 isoform 5 | MEG-1.2 isoform 2 |
| | D7PD84 | MEG-1 isoform 6 | MEG-1.2 isoform 3 |
| | D7PD86 | MEG-1 isoform 8 | MEG-1.2 isoform 4 |
| | D7PD88 | MEG-1 isoform 10 | MEG-1.2 isoform 5 |
| | D7PD89 | MEG-1 isoform 11 | MEG-1.2 isoform 6 |
| | D7PD93 | MEG-1 isoform 16 | MEG-1.2 isoform 7 |
| Smp_122630.2 | A0A5K4EKN1 | MEG-1 isoform 2 | MEG-1.2 isoform 8 |
| | D7PD79 | MEG-1 isoform 1 | MEG-1.2 isoform 9 |
| | D7PD91 | MEG-1 isoform 14 | MEG-1.2 isoform 10 |
| | D7PD94 | MEG-1 isoform 18 | MEG-1.2 isoform 11 |
| | D7PD95 | MEG-1 isoform 17 | MEG-1.2 isoform 12 |
| | D7PD99 | MEG-1 isoform 12 | MEG-1.2 isoform 13 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nedvědová, Š.; De Stefano, D.; Walker, O.; Hologne, M.; Miele, A.E. Revisiting Schistosoma mansoni Micro-Exon Gene (MEG) Protein Family: A Tour into Conserved Motifs and Annotation. Biomolecules 2023, 13, 1275. https://doi.org/10.3390/biom13091275
