1. Introduction
Efficient immunological protection in COVID-19 is currently the major factor shaping the global perspective on the growth or recession of further pandemic waves. The SARS-CoV-2 virus has constantly been producing a multitude of variants; many of these variants, such as Delta or Omicron, triggered pandemic waves obviously using evasion of existing immunity against the virus to spread within the global population [
1,
2]. Importantly, antigenic drift, driven by the pressure from specific immunity exerted on the virus, significantly contributes to the selection of new variants able to evade immunological control [
3]. For these reasons, the hope for efficient control and suppression of the pandemic simply through global vaccinations and immunological memory from previous infections has been tempered.
Virus-specific antibodies are, in fact, a pool of antibodies targeting different structural proteins. Structural proteins of SARS-CoV-2 include nucleocapsid protein (protein N), spike protein (protein S), envelope protein (protein E), and membrane protein (protein M) [
4]. The most important target for the immune response is protein S, due to the virus-blocking potential of antibodies targeting this protein. This protein mediates binding to the ACE2 receptor; thus, antibodies targeting the S protein may neutralize the virus. The S protein is relatively large, 1273 amino acids (aa) long (acc. no.: YP_009724390.1), and it consists of two major domains: S1 and S2. The S1 subunit function is binding to receptors in human cells. Its major parts are the N-terminal domain (NTD) and receptor binding domain (RBD). The S2 subunit, in turn, is engaged in the viral and host cell membranes’ fusion. S2 consists of the cytoplasmic tail (CT), transmembrane domain (TM), heptad repeat 1 (HR1), heptad repeat 2 (HR2), fusion peptide (FP), central helix (CH), and connector domain (CD). Also, the cleavage site S1/S2 plays a key role in virus entry to a human host cell. Linear epitopes on the S protein have been demonstrated to elicit neutralizing antibodies in COVID-19 patients [
5]. The N protein, in turn, can be a target for serological diagnostics, and at the beginning of pandemics, it was used in the majority of serological tests. Later, it was replaced by protein S, due to the direct indication of S-targeting responses on the neutralization potential of antibodies, and also due to potential problems with specific response induction in asymptomatic infections. However, the relatively conserved protein N is still being used to confirm SARS-CoV-2-related etiology of already cleared infections, since the N protein is unique for SARS-CoV-2, highly immunogenic, and not included in anti-COVID-19 vaccines [
6]. Two other proteins, M and E, seem to have lower significance as potential targets for specific antibodies. However, one cannot exclude the role of these antibodies in virus inhibition.
After two years of the pandemic and the continuous emergence of new variants, it is challenging to thoroughly understand all the possible combinations of previous infections, various vaccination types, and potential exposure to new variants in an individual patient. These combinations further generate myriads of possibilities at the population level. Thus, instead of variant-to-variant comparisons, the identification of key protein regions that are linked to the immune evasion of existing immunity against the virus could be more helpful for predictions on pandemic potential in new SARS-CoV-2 variants.
Here, we validate a new method for the experimental identification of regions that can play a role in
cross-reacting IgG hotspots in SARS-CoV-2 structural proteins. The
cross-reacting IgG hotspots have been identified by a high-throughput assay of cross-reactivity of antibodies elicited by SARS-CoV-2, based on the modified VirScan technology [
7]. This approach employs a phage display library of epitopes, and it allows for significant extension of the most popular method for B-cell epitope identification (by microarrays) and for massive testing of a multitude of epitopes. We affirm the validity of identifying immunogenicity regions in both our current protocol and previous methodologies as it constitutes the core application of all VirScan-derived assays. By extending the scope of the library, experimental assay, and calculations to encompass multivariant analysis, we have introduced a powerful new tool for predictions grounded in empirically observed interactions. Consequently, we propose a combined approach involving the identification of immunogenicity regions and the pinpointing of cross-reacting IgG hotspots. This synergistic strategy promises to yield more valuable insights into the immunogenicity of the virus (or other pathogens). Here, 34,949 oligopeptides have been tested in a single assay, thus representing all variants of SARS-CoV-2 available in databases at the moment of the library creation (
Supplementary Materials).
2. Materials and Methods
2.1. Serum Donors
Blood serum was collected from patients hospitalized due to COVID-19: PCR-confirmed infection with SARS-CoV-2 (at least 1 month after infection), non-vaccinated, over 18 years old, hospitalized in COVID-19 ward in Healthcare Centre Bolesławiec, Poland from 27 January 2021 to 26 March 2022 (alpha variant: n = 15, delta variant: n = 8).
2.2. Bioethics Statements
The study was conducted in accordance with the principles of the Declaration of Helsinki. The research was approved by the local Bioethical Commission of the Regional Specialist Hospital in Wroclaw (approval number: KB/02/2020, policy No. COR193657). During the individual interview, all information about the study was provided and written consent was obtained from each participant. The written consent was accepted by the local Bioethical Commission of the Regional Specialist Hospital in Wroclaw (approval number: KB/02/2020).
2.3. Blood Samples
Blood samples were collected in test tubes (BD SST II Advance), left to clot for 1 h at room temperature (RT), and serum was separated from the clot by centrifugation (15 min, 2000 g, RT) and then stored at −20 °C for further use.
2.4. Identification of SARS-CoV-2 Variants Infecting Investigated Patients
In selected hospitalized patients, SARS-CoV-2 variants were identified from biological material collected as nasopharyngeal swabs. The material was processed for standard RNA isolation and applied to the targeted sequencing approach with the Ion AmpliSeq SARS-CoV-2 Insight Research Assay (Thermo Fisher Scientific, Waltham, MA, USA, according to the manufacturer’s instructions). Briefly, patients were sampled by nasopharyngeal swabbing with sterile medical swabs (Chemivet, Olsztyn, Polska), viral RNA was isolated on silica spin columns (QIAquick), and the viral load was quantified by the qPCR standard diagnostic test (primers 5′-ACAGGTACGTTA ATAGTTAATAGCGT-3′, 5′-ATATTGCAGCAGTACGCACACA-3′, 5′-GGG GAA CTT CTC CTG CTA GAA-3′, 5′-CAG ACA TTT TGC TCT CAA GCT G-3′). Ct values were used to design library preparation parameters, which included standard retrotranscription. Library preparation and sequencing chip loading were completed on Ion Chef. Sequencing was completed on Ion GeneStudio S5 and analyzed with the software provided by the manufacturer (Thermo Fisher Scientific, Waltham, USA). Sequences of sufficient quality were uploaded to the public database GISAID (
https://www.gisaid.org/ accessed on 17 December 2021).
2.5. Epitope Analysis
Identification of oligopeptides interacting with specific IgGs was performed in accordance with the modified protocol published by Xu et al. [
7] and adapted for our research using coding sequences of the investigated SARS-CoV-2 variants as the source for library design [
7]. Part of a library representing protein variants was created from proteomes of SARS-CoV-2 variants downloaded on 7 April 2021 from the Identical Protein Groups Database, National Center for Biotechnology Information. Another part of a library representing the reference proteome of the SARS-CoV-2 virus (prod. ID: UP000464024) was created from the Proteome Database at UniProt Proteome resources, and a set of oligopeptides was created by single and triple alanine substitution, similar to the original procedure [
7]. Alanine substitutions have been established in VirScan technology as the method for epitope disruption. Specifically, alanine is substituted for each amino acid in the investigated sequence of the epitope, both individually and in triplicates. In the case of original sequences containing alanines, the substitution is completed with glycine. These substitutions are employed to identify specific amino acids involved in epitope–antibody interactions. Disruption of the interaction occurs when a key amino acid is substituted.
Variants of each investigated reference protein were aligned with the Clustal Omega software v. 1.2.4 [
8,
9,
10]. Each alignment and protein library consisting of reference proteomes was virtually cut into 56 aa long fragments tailing through protein sequences, starting every 28 aa from the first amino acid. The peptide library, consisting of protein variants in the Identical Protein Group, was cleared from all incomplete sequences (containing missing fragments or undetermined amino acids). If a specific protein showed a gap in the alignment, oligopeptides covering this place were shorter than 56 aa. The start position of each oligopeptide in the reference proteins N, S, M, and E of SARS-CoV-2 is presented in
Supplementary Tables S1–S5. The resulting library contained 34,949 epitopes.
Sequences of all oligopeptides were reverse-translated into DNA sequences using codons optimized for expression in
E. coli. The oligopeptide library was synthesized using the SurePrint technology for nucleotide printing (Agilent, Santa Clara, USA). These oligonucleotides were used to create a phage display library using the T7Select 415-1 Cloning Kit (Merck Millipore, Burlington, USA). Immunoprecipitation of the library was performed in accordance with a previously published protocol [
7,
11]. Briefly, the phage library was amplified in a standard culture as described in the manufacturer’s manual and then purified by hollow fiber dialysis against a Phage Extraction Buffer (20 mM Tris-HCl, 100 mM NaCl, 6 mM MgSO
4, pH 8.0). All plastic containers (96-well plates) used for immunoprecipitation were prepared by blocking with 3% Bovine Serum Albumin (BSA) in a TBST buffer overnight on a rotator (50 rpm, 4 °C). A sample representing an average of 10
5 copies of each clone in 250 µL was mixed with 1 µL of human serum (two technical replicates were applied) and incubated overnight at 4 °C with rotation (50 rpm). A 20 μL aliquot of a 1:1 mixture of Protein A and Protein G Dynabeads (Invitrogen, Waltham, USA) was added and incubated for 4 h at 4 °C with rotation (50 rpm). The liquid in all wells was separated from Dynabeads on a magnetic stand and removed. Beads were washed 5 times with 280 µL of a wash buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 0.1% Tween-20) and the beads were resuspended in 60 µL of water to elute the immunoprecipitated bacteriophages from the beads.
The immunoprecipitated part of the library (for each patient-derived sample), as well as 20 samples representing the library before immunoprecipitation (input samples), was then used for amplification of the insert region according to the manufacturer’s instructions with a Phusion Blood Direct PCR Kit (Thermo Fisher Scientific, Waltham, USA) (primers 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATCCGAATTCTTCTTCTTCT-3′, 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTAACTAGTTACTCGATGC GG-3′). A second round of PCR was carried out with the IDT for Illumina UD indexes—set: A, B, C (Illumina Corp., San Diego, USA) to add adapter tags. The products were purified using AMPure XP magnetic beads (Beckman Coulter, Brea, USA). Sequencing of the amplicons was completed using Illumina next-generation sequencing (NGS) technology (Genomed, Warszawa, Poland). A full list of oligopeptides in the tested library (cloned) is given in
Supplementary Materials.
2.6. Sequencing Data Analysis
Sequenced amplicons were mapped to the original nucleotide library sequences by the bowtie2 software v. 2.3.4.1, similarly as described by Xu et al. [
7,
12]. NGS sequencing reads (after removal of sequences added in PCR) were mapped to the full list of oligonucleotides as designed for the library synthesis as indexes (options: end-to-end mode, ‘-q -5 9 --no-unal --no-hd --no-sq --ignore-quals --mp 3 --rdg 150,100 --rfg 150,100 --score-min L,-0.6,-0.6′) [
12,
13]. The number of hits that were mapped to each reference sequence (only the highest score for each read) was counted (
count,
c).
The signal in each sample was calculated/normalized according to Formula (1):
s—a signal of the i-th sequence in the j-th serum sample and m-th technical replicate.
c—count, a number of readings mapped to the i-th sequence in the m-th technical replicate of the j-th serum sample.
I—set of all reference sequences (used as indexes in mapping by the bowtie2 software).
Signals in each input sample used as negative control samples (n = 20, amplified and purified phage library sequenced before immunoprecipitation) were calculated. In the input samples, the average log10 of a signal and its standard deviation (SD) for each tested sequence were used as the control signal for later calculations of p-values. As for the tested samples, the log10(signal) was determined for each detected sequence (count > 0). In order to avoid false positive results, we set the signal of undetected oligonucleotides (s) in a tested sample to the minimum measured for this sample value and set its standard deviation to the average value of SD in the input samples.
Each log10(signal) in the tested samples was compared to the control signal of the same sequence in the input samples, and the p-value was calculated (assuming normal distribution). Only signals detected with a non-zero count and p-value < 0.05 in both technical replicates of a sample were recognized as positive (significantly enriched). Relative signal (enrichment): an average signal in technical replicates of a sample divided by the average signal in ‘input samples’ of the same sequence resulting in a signal ratio (relative signal). The above procedure was tested on the input samples as controls and proved to yield less than a 5% chance for a false positive (significantly enriched) sequence in each sample.
2.7. Immunogenicity and Cross-Reacting IgG Hot-Spot Determination—Statistical Model
Determination of immunogenicity and cross-reacting IgG hot-spot was performed by adopting binomial distribution. This distribution (provided with chance and size) allows one to calculate the chance of randomly observing a specific amount of positive outcomes in a series of trials. The assumption is that every patient has the same chance of recognizing each measured oligopeptide and its variant separately.
For each group (Alpha/Delta, N/S-protein) in both experiments (immunogenicity research and hot-spot analysis) we used the same protocol separately. First, we calculated
a chance by dividing the sum of all detected (positive) oligopeptides in each patient in a group by all tested patients and all tested oligopeptides/variants.
Size is the amount of patients in a researched group (immunogenicity) or the number of tested variants of a fragment in question (cross-reacting hotspots). Then, in R language, we calculated the chance for a random occurrence of an observed number of positive results for each tested protein fragment. Fragments whose chance for a random occurrence of a measured result is below 0.05 are indicated in the figures with one star and results below 0.001 are indicated with double stars. Exact
p-values are provided (
Supplementary Tables S1–S5) along with the graphical comparison of the statistical model used and observed data (
Supplementary Figure S1).
4. Discussion
The potential significance of antibody cross-reactivity is crucial due to the challenge of existing immunity evasion by viruses and other infectious agents.
IgG hotspots are regions of a protein where the cross-reactivity of antibodies limits the viral potential for evading recognition by the human immune system. Rational strategies for managing expanding pathogenic variants, like those observed in SARS-CoV-2 pandemics, require predictions and efficient estimations of whether emerging variants have a high or limited potential for existing immunity evasion. To some extent, these predictions can be conducted through theoretical comparisons with existing epitope databases (such as those of other viruses), however, experimental verification remains necessary. Given the multitude of emerging variants, assessing pathogens with a high potential for epidemic spread, like SARS-CoV-2, can be challenging for experimental verification of multi-directional cross-reactions. The VirScan method was previously employed to compare the wild-type SARS-CoV-2 virus with other coronaviruses such as MERS, the first SARS, the “common cold” HCov group, and coronaviruses infecting bats [
16]. Here, we propose a high-throughput technology that utilizes a phage display library of epitopes derived from a pool of identified viral variants, enabling massive testing of a multitude of epitopes. In this study, 34,949 oligopeptides representing all variants of SARS-CoV-2 available in databases at the moment of the library creation have been tested in a single assay (
Supplementary Materials).
To assess the proposed approach, we focused on two selected proteins in SARS-CoV-2, due to these proteins’ biological and medical importance: nucleocapsid protein N and spike protein S. In the case of the N protein, cross-reactions enhance its suitability as the diagnostic target in serological testing. Consequently, cross-reactivity hotspots represent the regions where new mutations can probably be tolerated, without losing diagnostic applicability. It is, however, less probable (if at all) that antibodies targeting the N protein might have any effect on virus neutralization. As for the S protein, its major significance lies in anti-COVID-19 vaccines and the protective potential of antibodies targeting this protein. Cross-reacting hotspots in this context indicate regions where new mutations of the virus are relatively ‘safe’ for the human population. That is, the probability of existing immunity evasion by a new variant with a mutation in a cross-reactivity hot-spot is lower. Conversely, regions outside the cross-reacting hotspots suggest a higher probability of immunity evasion by a new variant of that type. IgG hotspots identified in this study were located in domains NTD, RBD, HR1, and HR2/TM, which have been recognized as important targets for virus-neutralizing antibodies.
To identify cross-reactions within protein regions that have overall significant importance for immune responses to the virus, we also assessed the immunogenicity (intensity of region-specific antibody production). Immunogenic regions within proteins N and S of Alpha and Delta variants identified in our work (
Figure 3) were similarly located to those identified by VirScan for the wild-type virus previously [
16]. This advocates for the robustness of VirScan technology and demonstrates at least some conservancy of immunogenic regions in viral variants.
Of note, this study has focused on epitopes in the majority linear. Specifically, structural epitopes with all amino acids located within the tested 56-amino acid regions could be investigated, while more distant locations rendered detection inefficient. Thus, the potential effect of mutations in many structural epitopes could not be identified, and it may still contribute to specific effects of the immune response and protection. However, even linear epitopes have been demonstrated as important targets in anti-SARS-CoV-2 protection [
5,
17]. They are also of key importance for applications where a partial protein is used to elicit or detect the immune response. Therefore, we propose the cross-reacting regions of the N and S proteins in SARS-CoV-2 as the major targets in both diagnostics and vaccine design. Of note, the goal of this study was to assess the applicability of the approach (immunoprecipitation of a phage display library presenting viral variants epitopes) in the experimental detection of IgG cross-reactions and the viral evasion of existing immunity against the virus. However, this assessment was completed with exemplary human sera specific to Alpha and Delta variants of SARS-CoV-2. Future studies should include antibodies induced by other viral variants.
Identified IgG hotspots can be helpful in theoretical predictions of the potential spread of newly identified variants, particularly when experimental data regarding a specific variant are not available yet. Variants with new mutations outside the cross-reacting hotspots have the highest potential for immune evasion and, eventually, for spreading in the human population regardless of the vaccination rate or immunity acquired from infections caused by previous variants.