1. Introduction
SARS-CoV-2 is a newly emerged zoonotic single-stranded RNA virus of the Coronaviridae family and the causative agent of the COVID-19 pandemic, the scale and impact of which are unprecedented in modern times. Genetic analysis of the spike protein (S) of SARS-CoV-2 shows that it has acquired a few features not found among previously emerged human coronaviruses [1]. These include mutations of the receptor binding domain that confer high-affinity binding to the human ACE2 receptor, as well as insertion of a functional polybasic furin cleavage site between the two subunits of the spike protein [2]. Proteolytic processing at the S1/S2 site is suggested to help the protein adopt an open conformation favorable for ACE2 binding and is speculated to contribute to transmissibility and pathogenicity [3,4,5].
Fast-acting measures are needed to relieve the global medical and economic burden imposed by the COVID-19 pandemic. Fortunately, several nucleic acid-based and inactivated virus vaccines have been rolled out at record speed, and vaccination programs have been initiated across the globe [6,7,8,9]. Many more vaccine clinical trials are underway, including subunit formulations, which rely on recombinant viral proteins for immunization [10]. An important element in the development of effective subunit vaccines is the characterization of the glycosylation of viral proteins. There are two major types of human glycosylation relevant for enveloped viruses: N-linked glycosylation, added co-translationally in the ER to N-X-S/T consensus sequons, and mucin type O-linked glycosylation, initiated in the Golgi by a family of polypeptide GalNAc transferases (GalNAc-Ts) modifying Ser, Thr, and possibly Tyr. Different viruses have varying preferences for the two types of glycans; for example, HIV-1, HCV and influenza virus have dense N-glycosylation on their surface proteins, whereas Ebola virus and herpesviruses additionally have many O-linked glycans [11]. Furthermore, both types of site-specific glycosylation have been shown to affect viral glycoprotein secretion or function [12]. Glycans represent structural features that are not encoded in the gene sequence, yet they play a crucial role in immune recognition and thereby affect vaccine design. Subunit vaccine candidates are often expressed in cell lines that do not recapitulate the glycosylation pattern of native pathogens, and hence potentially do not elicit biologically relevant immune responses. Glycosylation of viral surface proteins is paramount for immune shielding, and altering positions of N-glycosylation sites by mutation of underlying acceptor amino acids is a well-established immune evasion strategy [11,12]. A number of different protein formulations and expression systems have been used to deduce the structure and post-translational modifications (PTMs) of the SARS-CoV-2 spike protein (S), and some of those have been explored as candidates in vaccine research. So far, 22 highly occupied N-linked glycosylation sites have been identified on protein S, as well as a variable number of O-linked glycosylation sites at low stoichiometries [13,14,15,16,17,18].
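The N-X-S/T consensus sequon mentioned above lends itself to a simple sequence scan. The following is a minimal sketch, not part of the reported workflow: the function name `find_sequons` and the toy sequence are hypothetical, and the exclusion of proline at the X position is a commonly used convention that is assumed here rather than taken from the text.

```python
import re

# N-X-S/T sequon. Excluding Pro at position X is an assumption
# (a common convention), not something stated in the text.
# The lookahead keeps matches zero-width after the Asn, so
# overlapping sequons such as "NNTS" are both reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "MNATNPSNGT"  # hypothetical toy sequence, not spike protein
    print(find_sequons(toy_sequence))
```

A scan of this kind only lists candidate acceptor sites; whether a sequon is actually occupied has to be determined experimentally.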
Mucin type O-linked glycosylation is initiated by the competitive action of a family of 20 polypeptide GalNAc transferases (GalNAc-Ts), expressed in a tissue-specific manner, which transfer an N-acetylgalactosamine to Ser, Thr, and possibly Tyr residues. All 20 isoforms are found in humans, and 14 homologs have been identified in the fruit fly [19,20]. In humans, O-glycans can be elongated to form four common core structures, of which core 1 (Galβ3GalNAcα-S/T) is the most abundant in epithelial cell lines [21]. Several core 1 synthase genes have also been described in the fly, and core 1 is the predominant O-glycan structure in Drosophila embryos [22,23]. It is difficult to predict such glycosylation events and challenging to map specific locations of O-glycans [19,24]. Collision-induced dissociation (CID)-based MS fragmentation methods often result in loss of the glycan modification from product fragment ions, which makes it difficult to unambiguously map precise O-glycan positions on peptides with multiple Ser and Thr residues. Thus, confident mapping of O-glycosites often requires high-performance instrumentation supporting electron transfer dissociation (ETD) fragmentation, which generates glycan-retaining peptide fragment ions. In some instances, optimized higher energy CID (HCD) methods can be sufficient, especially for simple O-glycan structures [25]. We have previously established paired ETciD and HCD workflows to enable robust peptide sequencing and high-precision glycosite mapping, and applied them to generate human and viral O-glycoproteomes [26,27,28,29,30]. Here, we aimed to compare the O-glycosylation patterns on different recombinant SARS-CoV-2 S proteins expressed in insect or human cells and to investigate whether O-glycosylation may have any implications for the choice of immunogens for vaccination studies. Using the described MS techniques, we mapped a total of 25 O-glycosites on three different subunit vaccine candidates expressed in either human or insect cells and estimated their occupancy, as well as their location in the context of the 3D S protein structure.
2. Materials and Methods
2.1. Protein Expression and Purification in S2 Cells
The nucleotide sequence for the prefusion-stabilized spike protein ectodomain (aa 16–1208), modified with an N-terminal BiP signal peptide, two proline substitutions (aa 986, 987), an AARA substitution at the furin cleavage site, and a twin strep tag (IBA, GmbH), was synthesized and subcloned into a pExpreS2-1 (ExpreS2ion Biotechnologies, Hørsholm, Denmark) vector by Geneart. Transiently transfected Drosophila melanogaster S2 cells (ExpreS2 Cells, ExpreS2ion Biotechnologies, Hørsholm, Denmark) were grown shaking in suspension at 25 °C for three days, after which the supernatant was harvested by centrifugation, concentrated and buffer exchanged approximately 10-fold. The prefusion-stabilized spike protein ectodomain was captured on a 5 mL Strep-Tactin XT column (IBA, GmbH, Göttingen, Germany) and eluted using 50 mM biotin. The protein was further purified by size exclusion chromatography using a Superdex200 column (Cytiva, Marlborough, MA, USA) equilibrated in 1 × PBS. By analytical SEC, the protein behaves predominantly as a dimer.
The nucleotide sequence encoding the spike protein receptor binding domain (RBD, amino acids 305 to 543), fused N-terminally to a BiP secretion signal, a 10× His tag and a proprietary catcher domain (Adaptvac, Hørsholm, Denmark), was synthesized and subcloned into a pExpreS2-1 (ExpreS2ion Biotechnologies, Hørsholm, Denmark) vector by Geneart. The protein was expressed transiently in insect cells (ExpreS2 Cells, ExpreS2ion Biotechnologies, Hørsholm, Denmark) and harvested after three days of culture. The cell supernatant was harvested by centrifugation, concentrated (10-fold) and buffer exchanged to loading buffer (10-fold) by tangential flow filtration (TFF). The protein was loaded onto a 1 mL HisTrap excel column (Cytiva, Marlborough, MA, USA) and washed to baseline in loading buffer. The protein was eluted with imidazole. Elution fractions were concentrated, and the protein was loaded onto a gel filtration column (Superdex 200pg, Cytiva) equilibrated in 1 × PBS (Gibco, Waltham, Massachusetts, USA). The RBD protein is monomeric in solution by analytical SEC analysis (not shown).
2.2. Protein Expression and Purification in HEK 293F Cells
A synthetic gene encoding SARS-CoV-2 S (sequence derived from GenBank accession MN908947.3) was cloned into a customized DNA vector for expression in mammalian tissue culture. Final expression constructs featured a fragment encoding the native S ectodomain, including the viral signal peptide (residues 1–1214), with the prefusion-stabilizing 986P and 987P variants first reported in [18]. HEK 293F-expressed S protein featured a C-terminal leucine zipper (GCN4) motif and a His₈-tag for Ni²⁺-affinity purification. Pure S-encoding plasmid DNA was transfected into HEK 293F cells (approximately 10⁶ mL⁻¹) using PEI (polyethylenimine) MAX at 5:1 w/w (final culture DNA concentration approximately 1 μg mL⁻¹). After 96 h, conditioned media supernatant was harvested by low-speed centrifugation, and recombinant S trimers were purified directly from media by IMAC using a 5 mL HisTrap FF crude column (GE). Protein-containing fractions were pooled and concentrated before application to a Superose 6 10/300 GL column (GE) for final purification via size-exclusion chromatography.
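As a small worked example of the transfection arithmetic, the sketch below scales the stated DNA concentration (approximately 1 μg per mL of culture) to an arbitrary culture volume. It assumes that the 5:1 w/w ratio refers to PEI:DNA, which the text does not state explicitly; the function name and the 500 mL example volume are hypothetical.

```python
def transfection_amounts(culture_ml: float,
                         dna_ug_per_ml: float = 1.0,
                         pei_to_dna_ratio: float = 5.0) -> dict[str, float]:
    """Scale plasmid DNA and PEI amounts to a given culture volume.

    Assumption: the 5:1 w/w ratio is PEI:DNA (not stated in the text).
    """
    if culture_ml <= 0:
        raise ValueError("culture volume must be positive")
    dna_ug = culture_ml * dna_ug_per_ml      # total plasmid DNA in micrograms
    pei_ug = dna_ug * pei_to_dna_ratio       # total PEI in micrograms
    return {"dna_ug": dna_ug, "pei_ug": pei_ug}


if __name__ == "__main__":
    # Hypothetical 500 mL culture
    print(transfection_amounts(500))
```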
2.3. In-Gel Digestion
Next, 2× 20 μg of each of the three different SARS-CoV-2 spike proteins were mixed with 4× NuPAGE™ LDS sample buffer (Thermo Fisher Scientific, Waltham, Massachusetts, USA) and up to 10 mM dithiothreitol (DTT, Sigma-Aldrich, Darmstadt, Germany). The samples were run on Novex 4–12 % gradient gel (Bis-Tris) in 1× NuPAGE™ MES buffer (Invitrogen, Waltham, Massachusetts, USA) at 150 V for 60 min on ice followed by 30 min at 200 V. The gel was then stained with InstantBlue® protein stain (Abcam, Cambridge, UK). Bands of interest were excised, cut into smaller pieces (1 mm × 1 mm), rinsed with water and 2× 30 min washed in 50 mM ammonium bicarbonate (AmBic, Sigma-Aldrich, Darmstadt, Germany) at 37 °C. The gel pieces were then shrunk in 100 % acetonitrile (ACN) and shaken for 5 min. Solvent was removed, and gel pieces rehydrated in 10 mM DTT in 50 mM AmBic, followed by 40 min incubation at 60 °C. The gel pieces were again shrunk in 100 % ACN and shaken for 30 min. Solvent was removed, and gel pieces rehydrated in 55 mM iodoacetamide (IAA, Sigma-Aldrich, Darmstadt, Germany) in 50 mM AmBic, followed by 40 min incubation at room temperature. The gel pieces were again shrunk in 100 % ACN and shaken for 15 min. The solvent was removed, gel pieces briefly rinsed with 50 mM AmBic and rehydrated in a small volume (10 µL) of 50 mM AmBic supplemented with 1 U PNGase F (Roche, Basel, Switzerland) at 37 °C for 30 min. The gel pieces were then topped up with 50 mM AmBic to cover the surface and incubated at 37 °C overnight. After N-glycan removal, the gel pieces were washed in 50 mM AmBic at 37 °C for 30 min, followed by shrinking in 100 % ACN for 5 min. The solvent was removed, briefly rinsed with 50 mM AmBic and rehydrated in a small volume (15 µL) of 50 mM AmBic supplemented with 0.5 µg of chymotrypsin (Roche, Basel, Switzerland) or Glu-C (Roche, Basel, Switzerland) at 25 °C for 30 min. 
The gel pieces were then topped up with 50 mM AmBic to cover the surface and incubated at 25 °C for 8 h. The Glu-C treated gel pieces were then dried by SpeedVac and rehydrated in a small volume (15 µL) of 50 mM AmBic supplemented with 0.5 µg of trypsin (Roche, Basel, Switzerland) at 37 °C for 30 min. The gel pieces were then topped up with 50 mM AmBic to cover the surface and incubated at 37 °C for 8 h. Peptides were collected followed by 2× 15 min water bath sonication in 300 µL 50 mM AmBic followed by 15 min sonication in 300 µL 50 % ACN. For individual samples, all supernatants were pooled and dried using SpeedVac to less than 500 µL. The peptides were desialylated in 50 mM sodium acetate (Merck, Darmstadt, Germany) buffer pH 5 using Clostridium perfringens neuraminidase (Sigma-Aldrich, Darmstadt, Germany) at 0.1 U/mL for 2 h at 37 °C. Desialylated peptides were stage-tip purified as previously described, dried, and reconstituted in 12 µL of 0.1 % formic acid prior to analysis.
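To illustrate how protease specificity determines the peptides available for analysis, the following minimal sketch performs an in silico tryptic digest. It is an illustration only and not part of the protocol above: it models trypsin alone, using the commonly applied rule of cleavage C-terminal to Lys or Arg except before Pro, ignores missed cleavages, and does not model chymotrypsin or Glu-C. The function name and toy sequence are hypothetical.

```python
import re


def tryptic_peptides(sequence: str) -> list[str]:
    """Split a protein sequence after K or R, unless the next residue is P.

    Simplified rule (an assumption for illustration): no missed cleavages,
    no other proteases.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    return [piece for piece in pieces if piece]


if __name__ == "__main__":
    print(tryptic_peptides("AKRPGKDE"))  # hypothetical toy sequence
```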
2.4. Mass Spectrometry Analysis with Orbitrap Fusion Lumos Mass Spectrometer
Site-specific O-glycopeptide LC-MS/MS analysis was performed on an EASY-nLC 1200 UHPLC (Thermo Fisher Scientific, Waltham, MA, USA) interfaced via a nanoSpray Flex ion source to an Orbitrap Fusion Lumos Tribrid MS (Thermo Fisher Scientific, Waltham, MA, USA). The nLC was operated in a single analytical column setup using PicoFrit Emitters (New Objective, 75 µm inner diameter, Littleton, MA, USA) packed in-house with Reprosil-Pure-AQ C18 phase (Dr. Maisch, 1.9 µm particle size, 19–21 cm column length, Ammerbuch-Entringen, Germany). Each sample was injected onto the column and eluted in gradients from 3 to 32 % B in 95 min, from 32 to 100 % B in 10 min and 100 % B for 15 min at 200 nL/min (Solvent A, 100 % H2O; Solvent B, 80 % acetonitrile; both containing 0.1 % (v/v) formic acid). A precursor MS1 scan (m/z 350–1700) was acquired in the Orbitrap at the nominal resolution setting of 120,000, followed by Orbitrap HCD-MS2 and ETD-MS2 at the nominal resolution setting of 60,000 of the five most abundant multiply charged precursors in the MS1 spectrum; a minimum MS1 signal threshold of 50,000 was used for triggering data-dependent fragmentation events. Stepped collision energy (27 ± 5 %) was used for HCD MS/MS fragmentation, and charge-dependent calibrated ETD reaction time was used with CID supplemental activation at 30 % collision energy for ETD MS/MS fragmentation.
For the site-specific glycopeptide identification, the corresponding HCD MS/MS and ETD MS/MS spectra were analyzed with Proteome Discoverer 2.2 software (Thermo Fisher Scientific, Waltham, MA, USA) using Sequest HT as the search engine. Carbamidomethylation at cysteine was used as a fixed modification, and oxidation at methionine, asparagine deamidation, and HexNAc or Hex-HexNAc at serine/threonine/tyrosine were used as variable modifications. Precursor mass tolerance was set to 10 ppm and fragment ion mass tolerance was set to 0.02 Da. Data were searched against the human-specific UniProt KB/SwissProt-reviewed database downloaded in January 2013 and construct-dependent viral protein sequence databases. All spectra of interest were manually inspected and validated to confirm correct peptide identification and glycosite localization.
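The 10 ppm precursor tolerance stated above can be made concrete with a short calculation. This is a minimal sketch of the standard ppm mass-error formula, not the search engine's own implementation; the function names and the example m/z values are hypothetical.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million relative to the theoretical m/z."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the absolute ppm error does not exceed the tolerance.

    The default of 10 ppm mirrors the precursor tolerance given in the text.
    """
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical precursor at m/z 1000: a 0.005 shift is about 5 ppm,
    # a 0.02 shift is about 20 ppm.
    print(within_tolerance(1000.005, 1000.0))
    print(within_tolerance(1000.02, 1000.0))
```

At m/z 1000, a 10 ppm window corresponds to roughly ±0.01 m/z units, which is why the tolerance is expressed relative to mass rather than as an absolute value.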
2.5. In-Solution Digestion and Mass Spectrometry Analysis with Q Exactive HF-X Mass Spectrometer
The COV19 double mutant spike trimer from HEK 293F cells was prepared for MS analysis as previously described [31], with minor modifications. In brief, the protein (50 µg) was denatured and aliquots (10 µg) were digested under five different protease conditions (chymotrypsin, a combination of trypsin and chymotrypsin, trypsin, elastase, and subtilisin) as described. All samples were then pooled and deglycosylated by Endo H followed by PNGase F in O18-water.
The combined sample was analyzed on a Q Exactive HF-X mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). Each sample was run in duplicate. Samples were injected directly onto a 25 cm, 100 μm ID column packed with BEH 1.7 µm C18 resin (Waters, Milford, MA, USA). Samples were separated at a flow rate of 300 nL/min on an nLC 1200 (Thermo Fisher Scientific, Waltham, MA, USA). Solutions A and B were 0.1 % formic acid in 5 % and 80 % acetonitrile, respectively. A gradient of 1–25 % B over 160 min, an increase to 40 % B over 40 min, an increase to 90 % B over another 10 min and a hold at 90 % B for 30 min was used, for a 240 min total run time. The column was re-equilibrated with solution A prior to sample injection. Peptides were eluted directly from the tip of the column and nanosprayed into the mass spectrometer by application of 2.8 kV at the back of the column. The HF-X was operated in data-dependent mode. Full MS1 scans were collected in the Orbitrap at 120k resolution. The ten most abundant ions per scan were selected for HCD MS/MS at 25 NCE. Dynamic exclusion was enabled with an exclusion duration of 10 s, and singly charged ions were excluded.
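The gradient described above can be written as a list of linear segments, which makes it easy to confirm that the steps add up to the stated 240 min and to look up the solvent composition at any time point. This is a minimal sketch; the segment representation and function names are hypothetical, and linear ramps within each segment are assumed.

```python
# (duration in min, start % B, end % B), transcribed from the text.
GRADIENT = [
    (160.0, 1.0, 25.0),   # 1-25 % B over 160 min
    (40.0, 25.0, 40.0),   # increase to 40 % B over 40 min
    (10.0, 40.0, 90.0),   # increase to 90 % B over 10 min
    (30.0, 90.0, 90.0),   # hold at 90 % B for 30 min
]


def total_run_time(segments=GRADIENT) -> float:
    """Sum of all segment durations in minutes."""
    return sum(duration for duration, _, _ in segments)


def percent_b_at(t_min: float, segments=GRADIENT) -> float:
    """Percent B at time t_min, assuming linear ramps within each segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in segments:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient")


if __name__ == "__main__":
    print(total_run_time())
    print(percent_b_at(180.0))
```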
The MS data were processed as described previously [31] with minor modifications. The data were searched against the proteome database and quantified using peak area in Integrated Proteomics Pipeline-IP2. The parameters were set as follows: MS1 tolerance ≤50 ppm, MS2 tolerance ≤20 ppm, no enzyme specificity, carboxyamidomethylation (+57.02146 C) as a fixed modification, and oxidation (+15.9994 M), deamidation (+2.988261 N), GlcNAc (+203.079373 N), GalNAc (+203.079373 S/T), Gal-GalNAc (+365.136681 S/T) and pyroglutamate formation from N-terminal glutamine residue (−17.026549 Q) as variable modifications.
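The search parameters above amount to a lookup table of mass shifts. The sketch below collects them in one place and sums the shifts for a given combination of modifications. The mass values are copied as listed in the text and have not been independently verified; the dictionary keys and the function name are hypothetical labels introduced for illustration.

```python
# Mass shifts in Da, copied as listed in the text (not independently verified).
FIXED_MODS = {
    "carboxyamidomethyl_C": 57.02146,
}
VARIABLE_MODS = {
    "oxidation_M": 15.9994,
    "deamidation_N": 2.988261,
    "GlcNAc_N": 203.079373,
    "GalNAc_ST": 203.079373,
    "Gal-GalNAc_ST": 365.136681,
    "pyroGlu_Nterm_Q": -17.026549,
}


def total_mass_shift(modifications: list[str]) -> float:
    """Sum the mass shifts of the named modifications on one peptide."""
    table = {**FIXED_MODS, **VARIABLE_MODS}
    return sum(table[name] for name in modifications)


if __name__ == "__main__":
    # Hypothetical peptide carrying one GalNAc and one oxidized Met
    print(total_mass_shift(["GalNAc_ST", "oxidation_M"]))
```

Note that GlcNAc on Asn and GalNAc on Ser/Thr are listed with the same mass shift, so in this parameter set they are distinguished by the modified residue rather than by mass.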
2.6. Molecular Modelling
All cryo-EM structures of the SARS-CoV-2 S-protein trimer that are currently available in the Protein Data Bank lack significant parts of the protein structure due to flexibility. To visualize the 3D positions of O-glycosites in the context of a fully glycosylated full-length S protein structure, we downloaded the model 6vsb_1_1_1 from http://www.charmm-gui.org/?doc=archive&lib=covid19 (accessed on 23 January 2021) [32]. This full-length model is currently the most reliable and contains modelled N-glycans in agreement with [16]. To check for accessible O-glycosites on the protein surface, we used Conformational Analysis Tools software (www.md-simulations.de/CAT/, accessed on 23 January 2021) interfaced with STRIDE [33] to estimate the solvent accessible surface (SAS) of Ser/Thr residues. For 3D visualization of the experimentally confirmed O-glycosites, Galβ3GalNAcα disaccharides were attached manually and relaxed based on short MD simulations (12 ns for S-protein trimer, 200 ns for RBD monomer) using YASARA [34].
4. Discussion
Enveloped viruses acquire host glycosylation while their proteins travel through the secretory pathway. Glycosylation of viral envelope proteins significantly alters the shape of the protein molecules, shielding the underlying amino acids and thus limiting their recognition by the host immune system. Viruses can acquire both N- and O-linked glycans, but the former are much more widely explored [12]. Here, we aimed to discover O-linked glycan attachment sites on SARS-CoV-2 S protein and explore the potential implications of such modifications using three different recombinant protein entities. We report 25 unique O-glycosites, most of which were identified unambiguously. These include the previously identified T323 and T678 adjacent to the S1/S2 cleavage site [14,15,16,17].
We found similar glycosylation patterns on insect and human cell-derived ectodomains, which suggests that the peptides can be glycosylated by isoforms of GalNAc-Ts with broad acceptor substrate specificities. We discovered a wide distribution of O-glycosites over the surface areas of the two investigated proteins and found no bias towards specific regions. A larger proportion of sites harboring core 1 structures (Galβ3GalNAcα-S/T) was identified on the insect cell-derived ectodomain, compared to mostly short immature Tn structures (GalNAcα-S/T) on the HEK 293F-derived ectodomain. Since S2 cells abundantly express paucimannosidic N-glycans, it is possible that the surface is more accessible to the core 1 enzyme that elongates the short Tn structure with a galactose residue [35]. Furthermore, the predominantly dimeric conformation of the insect ectodomain may also provide better access for enzymes. However, since the size exclusion chromatography analysis was done on purified protein, we do not know whether it is an accurate representation of the quaternary structure in the cell culture supernatants. In a similar context, more O-glycosites were found on the monomeric RBD compared to the ectodomain-presented RBD, suggesting better accessibility to the initiating GalNAc-Ts. However, low stoichiometries were estimated, making an effect on receptor binding unlikely. In the future, it would nevertheless be relevant to map O-glycosites on native viruses derived from specified respiratory cell subtypes.
Molecular dynamics simulations of N-linked glycan movements on S show that some surfaces of the molecule remain accessible, and these would be the logical regions to find O-glycans [17]. In this context, we were surprised to find that more than 60 % of our identified sites were located right next to N-glycosite positions. After inspecting the sequences of such O-glycopeptides, we realized that they were primarily located on Asn-containing peptides unmodified by N-glycans. This is in contrast to deamidated and presumably de-N-glycosylated peptides, where we found O-glycans only in a few exceptions. Importantly, for seven out of nine investigated regions in the Orbitrap dataset, more than 85 % of non-N-glycosylated peptides were in fact modified with O-glycans, suggesting that the O-glycosylation machinery in most instances fills in the “empty space” left unoccupied by N-glycans. This was confirmed by the Q Exactive data, where PNGase F N-deglycosylation was performed in O18 water, and all the identified peptides contained either N-glycans or O-glycans, but not both.
All of our identified O-glycosites are of very low occupancy, based on integrated MS1 signals of relevant (glyco)peptides. It is important to consider, however, that glycopeptides exhibit poorer ionization efficiencies, so quantification based purely on signal magnitude likely underestimates the true stoichiometry. Regardless, low occupancy of O-glycosites close to N-glycosites fits well with published data on S N-glycosylation. The 22 SARS-CoV-2 S N-glycosites are reported to be more than 95 % occupied in recent glycoproteomic analyses of SARS-CoV-2 S [13,14,15,16,17]. Based on our data, it is unlikely that N- and O-glycans coexist in close vicinity to each other, as we found such O-glycosites almost exclusively on peptides with unoccupied N-X-S/T sequons. It is unlikely that the low abundance of O-glycosites would have biological consequences for protein function or immunogenicity, and they might not be a concern for the choice of immunogen formulation of a highly N-glycosylated protein. However, future intact mass-based experiments combined with immunological studies are necessary for a definitive answer. Our analyses suggest that O-glycans rather serve to ensure maximum shielding of the minor fraction of peptides that are unoccupied by N-glycans. Based on our molecular modelling, N- and O-glycans often “pointed” at different angles away from each other; however, there is likely insufficient accessibility for the initiating enzymes when the N-glycosites are occupied. In this context, it would be interesting to find out whether site-directed mutagenesis of specific N-glycosites would increase the occupancy of O-linked glycans in those regions, as well as to investigate O-glycan occupancy in less densely N-glycosylated viruses.
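Occupancy estimates based on integrated MS1 signals can be illustrated with a simple ratio. The exact formula used for the estimates above is not given in the text, so the sketch below assumes the common approach of dividing the summed glycopeptide peak areas by the summed areas of all forms of the same peptide; the function name and the example peak areas are hypothetical. As noted above, poorer ionization of glycopeptides means a ratio of this kind likely underestimates the true stoichiometry.

```python
def occupancy(glycoform_areas: list[float], unmodified_area: float) -> float:
    """Fraction of a peptide carrying a glycan, from MS1 peak areas.

    Assumption: occupancy = glycosylated / (glycosylated + unmodified),
    with no correction for differences in ionization efficiency.
    """
    glyco_total = sum(glycoform_areas)
    total = glyco_total + unmodified_area
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return glyco_total / total


if __name__ == "__main__":
    # Hypothetical peak areas (arbitrary units): two glycoforms and the
    # unmodified peptide
    print(occupancy([2.0, 3.0], 95.0))
```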
In conclusion, sensitive instrumentation and appropriate techniques make it possible to identify low abundance post-translational modifications and accurately map their positions, which allowed us to contribute to a more comprehensive map of SARS-CoV-2 S protein glycosylation, with wide distribution of O-glycosites on the surface of S protein ectodomains. We suggest that O-glycans constitute a minor component of the S protein glycan shield, yet they may serve an important function of covering up the unoccupied N-glycosylation sequons.