1. Introduction
Antibodies are proteins that play an important role in the mammalian immune system, and the target molecules of antibodies, such as proteins or chemical ligands, are named antigens. Monoclonal antibodies (mAbs) are currently the largest class of bio-therapeutics in the clinic due to their high binding affinity and target specificity [
1,
2,
3]. Antibody drug candidates often need to be engineered to improve affinity, specificity, stability, solubility and other properties. Improving affinity in particular is important for increasing drug efficacy and decreasing the amount of antibody per dose [
4]. It is known that there is a distinction between the types of interactions used in antibody–antigen (Ab-Ag) binding and those observed in general protein–protein interactions [
5]. Amino acid mutations can be introduced to existing antibodies to increase the binding affinity and specificity of the antibody [
6], but there is no clear rule for identifying mutations that increase affinity. Affinity is experimentally measured with enzyme-linked immunosorbent assay (ELISA), surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC). Constructing and expressing a large number of antibody mutants and measuring their affinity requires substantial time and cost. It therefore makes sense to use a computational method to predict the effect of antibody mutations on affinity before experimental evaluation.
In recent years, a number of methods have been developed to predict changes in binding affinity upon mutation. These computational tools fall largely into two categories: molecular energy-based approaches such as FoldX [
7], EvoEF2 [
8,
9], Rosetta [
10] and machine learning-based approaches such as mCSM toolkit [
11,
12,
13,
14,
15], TopGBT [
16], Hom-ML [
17], GeoPPI [
18], Geometric [
19], BindFormer [
20], GearBind [
21]. All of the aforementioned methods require the 3D structure of the protein–protein complex in the bound state to predict changes in binding affinity upon mutation. However, an accurate complex structure, the prerequisite for ΔΔG prediction, is not readily available for most antibody–antigen pairs [
21]. Furthermore, current multimer structure prediction methods, such as AlphaFold3 [
22,
23] and docking [
24,
25], are still insufficiently reliable as starting points for structure-based affinity maturation. While the performance of AlphaFold3 in predicting the structures of antibody–antigen complexes has improved compared to previous versions, it still lags behind the predictions for other complexes, exhibiting a 60% failure rate for antibody and nanobody docking when sampling a single seed [
26].
To address the need to predict changes in affinity without an Ab-Ag complex structure, we developed a deep learning-based framework, called MutAb, that predicts the effect of mutations on antibody affinity using learnable context-aware structural representations of antigens and antibodies. Because the antigen–antibody complex structure is not a required input, MutAb has clear advantages over competing methods in antibody engineering applications.
2. Results
2.1. Overview
We propose a framework based on deep learning to predict the effect of mutations on antibody affinity without an antigen–antibody structure in the bound state. The learned representation module serves as an encoder in our framework to leverage biological insights (
Figure 1).
To address the lack of a dedicated antibody mutation dataset, we curated a benchmark dataset containing 15 antibody cases and 424 single-point mutation entries, and evaluated our framework against predictors commonly used in this field. This comparison is not entirely fair to our framework because some of the predictors take the structure of an Ab-Ag complex in the bound state as input. Nonetheless, the evaluation shows that our framework performs better than or comparably to these predictors. Furthermore, in predicting the effects of mutations on antibodies against SARS-CoV-2 (P36-5D2 and R3P1-E4), influenza (NC41 and NC10) and human cytomegalovirus (1G2), we demonstrated the advantages of our approach over energy-based methods and a docking protocol, especially in scenarios where precise antigen–antibody complex structures are unavailable.
2.2. Benchmark Composition
The SKEMPI 2.0 dataset and several subsets (e.g., S645 [
16], S1131 [
27], S4169 [
11], M1707 [
28]) are widely used in ΔΔG prediction tasks. However, there is currently no subset made up entirely of mutations on the antibody. Due to the asymmetry of the contact surfaces of antigens and antibodies, a pure antibody mutation dataset is necessary for establishing computer-aided antibody engineering methods. To develop our framework and benchmark it against other methods, we established a benchmark of 15 unique antibodies, with a maximum pairwise sequence similarity between antibodies of 79% (Figure 2A). We obtained the single-point mutation data on the antibody by filtering the SKEMPI 2.0 database and removed the mutations that could not be modeled by FoldX5 and EvoEF2. Specifically, we grouped mutations by unique antibody, excluded groups with fewer than ten mutation data points and calculated metrics for each antibody group separately. As a result, our benchmark comprises 424 mutants across 15 Ab-Ag complex structures (Table 1). Most single-point mutations are located at the binding site, and the most common mutation is from tyrosine to alanine (12.7%) (
Figure 2B).
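The grouping and filtering steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the dictionary keys (`antibody`, `mutation`, `modelable`) are hypothetical, and it assumes that the ten-mutation cutoff is applied after the mutations that could not be modeled have been removed.

```python
from collections import defaultdict

MIN_MUTATIONS_PER_ANTIBODY = 10  # groups with fewer entries are excluded


def build_benchmark(entries):
    """Group single-point mutation entries by antibody and drop small groups.

    `entries` is a list of dicts with hypothetical keys:
      'antibody'  - identifier of the antibody the mutation belongs to
      'mutation'  - mutation label
      'modelable' - True if the mutant could be built by the structure tools
    Assumption: the size cutoff is applied after removing unmodelable entries.
    """
    groups = defaultdict(list)
    for entry in entries:
        if entry["modelable"]:
            groups[entry["antibody"]].append(entry)
    return {
        antibody: mutations
        for antibody, mutations in groups.items()
        if len(mutations) >= MIN_MUTATIONS_PER_ANTIBODY
    }
```

Per-antibody metrics can then be computed by iterating over the returned dictionary.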
2.3. Representation Space for Antibody Mutations Generated by the Pre-Trained Encoder
A pre-training scheme encourages the encoder in MutAb to capture general rules in the amino acid types and positions of the antibody residues that form the paratope. Since mutations lead to different compositions and conformations of the paratope, the pre-trained encoder is helpful for predicting Ab-Ag binding affinity changes upon mutation. In the benchmark, we used the pre-trained encoder to generate representations for each mutation. We then employed principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) to compare the distribution of these representations in a low-dimensional space. These algorithms are widely used in machine learning to reduce feature dimensionality; here, the two most important components of the input representations were preserved.
We show the low-dimensional representation space of mutations in the dataset (
Figure 3). Additionally, we observe significant differences between affinity-increasing and affinity-decreasing mutations in both preserved components of the mutation representations (p < 0.05). In other words, the representations produced by the pre-trained encoder have the potential to reflect the effect of mutations on antibody affinity.
2.4. Prediction of the Mutational Effects on Binding Affinity
We classified mutations as having positive or negative effects based on the sign of ΔΔG. Based on the above mutation representation, we employed AutoGluon-Tabular (an automated machine learning framework) to train and select the best machine learning model for predicting the classification labels. Given the imbalance and limited size of the dataset, we conducted a stratified three-fold cross-validation. Additionally, to more accurately reflect real-world applications of predictive models, we implemented a leave-one-antibody-out cross-validation scheme. We calculated eight metrics that are commonly used in practical applications, namely AUROC, AUPRC, ACC, BACC, F1 score, precision, recall and MCC.
The results for each baseline method are summarized in
Figure 4 and
Figure 5, while the corresponding
p-values are presented in
Tables S3 and S4. In both cross-validation schemes, the average performance of our framework exceeds that of the EvoEF2 tool and the ESM1v model but is slightly lower than that of the FoldX5 tool. The FoldX5 and EvoEF2 tools require an Ab-Ag complex structure in the bound state as input and use an energy function to assess the impact of residue mutations. Nonetheless, as we show in the next section, MutAb is more robust and more widely applicable than tools that depend on precise complex structures.
We further evaluated the performance of MutAb to assess the quality of classifying mutations into highly increasing affinity mutations (ΔΔG > |Threshold|) and highly decreasing affinity mutations (ΔΔG < −|Threshold|). The performance evaluation metrics were calculated based on stratified three-fold cross-validation (randomly repeated five times), and different ΔΔG thresholds were tested.
Figure 6 shows that MutAb performs well in predicting highly affinity-increasing and highly affinity-decreasing mutations, especially when the threshold is greater than 0.5 kcal/mol. A ΔΔG on the order of ±0.5 kcal/mol is within the experimental error [29]. Because a considerable proportion of the ΔΔG entries (36.26%) in the dataset falls within the range of −0.5 to +0.5 kcal/mol, classification is challenging and prediction accuracy is limited when no ΔΔG threshold is set.
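As an illustration of the thresholding described above, the following sketch keeps only the mutations whose ΔΔG magnitude exceeds a chosen threshold and reports the fraction of entries that fall inside the ambiguous band. The function names are hypothetical, and the input is assumed to be a plain list of ΔΔG values in kcal/mol.

```python
def split_by_threshold(ddg_values, threshold):
    """Return (highly_increasing, highly_decreasing) ΔΔG values.

    A mutation is kept as highly increasing if ΔΔG > |threshold| and as
    highly decreasing if ΔΔG < -|threshold|; everything else is dropped.
    """
    cutoff = abs(threshold)
    increasing = [v for v in ddg_values if v > cutoff]
    decreasing = [v for v in ddg_values if v < -cutoff]
    return increasing, decreasing


def fraction_within(ddg_values, threshold):
    """Fraction of entries with -|threshold| <= ΔΔG <= +|threshold|."""
    cutoff = abs(threshold)
    inside = sum(1 for v in ddg_values if -cutoff <= v <= cutoff)
    return inside / len(ddg_values)
```

Calling `fraction_within` with a threshold of 0.5 on a dataset gives the share of entries that lie within the stated experimental error.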
2.5. Prediction of the Mutational Effects on Neutralization Against SARS-CoV-2 Pseudotyped Virus: Validation with a Blind Dataset
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was used as one of the examples to test the practical utility of our framework in antibody engineering. The entry of SARS-CoV-2 into host cells can be effectively blocked by antibodies, thus providing a promising therapeutic solution for the associated disease. With no antibody against SARS-CoV-2 included in the training phase of our framework, we tested whether MutAb can capture the effects of mutations in the antibodies on the neutralization of a SARS-CoV-2 pseudotyped virus. One definition of neutralization is “the reduction in viral infectivity by the binding of antibodies to the surface of viral particles (virions), thereby blocking a step in the viral replication cycle that precedes virally encoded transcription or synthesis” [
30,
31]. Compared with the Ab-Ag affinity, the neutralization activity of the antibody against a target virus better reflects the in vivo effect of the antibody in terms of protection or therapy. An enhanced Ab-Ag binding affinity is often correlated with increased neutralizing activity, as strong binding can prevent the antigen from interacting with its cellular receptors. The pseudovirus neutralization activity data for the wild-type mAbs (P36-5D2 and R3P1-E4) and their single-point mutants were obtained from studies by Sisi Shan et al. [
19] and Lili Li et al. [
32]. The neutralization activity of wild-type mAbs and their single-point mutants against several strains of SARS-CoV-2 was determined by a pseudovirus neutralization assay. The neutralization curves of the antibody mutants were provided in the study by Lili Li et al. and the IC50 (half-maximal inhibitory concentration) values were provided in the study by Sisi Shan et al. For each specific strain of SARS-CoV-2, we calculated the log fold change of IC50 for each mAb mutant relative to the wild-type mAb and used its sign as the classification label. Matched with different viral strains, all mutations of the two antibodies form a single dataset containing 135 entries.
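The labeling rule based on the sign of the log fold change can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: the logarithm base (10), the direction of the ratio (mutant over wild type), and the handling of a fold change of exactly zero (treated here as not enhanced). Under these assumptions, a negative value means the mutant needs a lower concentration for half-maximal inhibition, i.e., enhanced neutralization.

```python
import math


def log_fold_change(ic50_mutant, ic50_wild_type):
    """log10 of the mutant IC50 relative to the wild-type IC50.

    Assumption: base-10 logarithm and mutant/wild-type ratio.
    """
    return math.log10(ic50_mutant / ic50_wild_type)


def neutralization_label(ic50_mutant, ic50_wild_type):
    """Return 1 if the mutation enhances neutralization (lower IC50), else 0.

    Assumption: a log fold change of exactly zero is labeled 0.
    """
    return 1 if log_fold_change(ic50_mutant, ic50_wild_type) < 0 else 0
```

Both IC50 values must be measured against the same viral strain and expressed in the same units for the ratio to be meaningful.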
We focus on the method’s ability to classify mutations into two categories: mutations that enhance neutralizing activity and mutations that weaken neutralizing activity. Here, in addition to FoldX5, we introduce mCSM-AB2 as a comparison method. mCSM-AB2 is a graph-based machine learning approach that requires the input of the 3D structure of Ab-Ag complex to predict the effects of mutations on mAb binding affinity. The evaluation results are summarized in
Figure 7. Interestingly, our framework outperformed FoldX5.
For most strains of SARS-CoV-2, there is no experimental Ab-Ag complex structure. Therefore, the mutant model structure must be generated from the initial complex structure to serve as the input for FoldX and mCSM-AB2. For tools like FoldX, such an input makes the prediction task more difficult than it is for experimentally solved complex structures. Not surprisingly, mCSM-AB2 achieves the best performance: it uses the complex structure in the bound state as input, whereas our framework takes only the separate structures in the unbound state. Additionally, the graph-based machine learning method is more resistant to reductions in structure quality than the energy-based method.
2.6. Comparison with Docking Protocol
Antibodies NC41 [33] and NC10 [34] bind to the subtype N9 neuraminidase (NA) of influenza virus and inhibit its enzyme activity. The 1G2 [
35] antibody is a neutralizing human monoclonal antibody which binds to human cytomegalovirus (HCMV) glycoprotein B (gB) ectodomain. As a case study, we aim to assess whether our framework demonstrates superior predictive performance compared to the conventional molecular docking protocol for the three antibodies.
The structures of the antibody and antigen were separated from the experimentally resolved complex structures (PDB IDs: 1NCA, 1NMB and 5C6T), and 10 docking conformations were generated using ClusPro’s Antibody Mode [
36,
37]. For NC41 and 1G2, ClusPro produced Ab-Ag docking conformations close to the native structure (Figure 8A,B), whereas the docking conformation of NC10 differed substantially from the native structure (
Figure 8C). The docking conformations ranked highest in ClusPro’s output (the center of the largest pose clusters) for NC41 and 1G2 were used as the input of FoldX and mCSM-AB2 to predict the effect of mutations on an antibody’s affinity.
When the experimentally solved structure was used as input, the predicted results of FoldX5 and mCSM-AB2 were consistent with the actual changes in the affinity of the antibody mutants. However, when using the docking conformation as input, the prediction performance of FoldX5 and mCSM-AB2 can be compromised, even resulting in outcomes that contradict those obtained by using an experimentally solved structure (
Figure 9). When there is no experimentally solved antigen–antibody complex structure, our framework offers a distinct advantage. Even a docking conformation that is similar to the native structure may lead to inaccurate predictions when used as input for tools like FoldX and mCSM-AB2. Furthermore, the docking structure itself may not be reliable (as with NC10).
3. Discussion
Predicting mutations in the antibody that are beneficial to its function is a key challenge in guiding conventional affinity maturation. Although some successful examples of computationally guided Ab development have been published in recent years, computational tools have not yet had a broad transformative impact on antibody engineering, owing to the limited scope of application of the available methods or their demanding input requirements, such as the need for accurate antigen–antibody complex structures. In this study, we present a deep learning framework, MutAb, that uses a pre-trained model as an encoder to predict beneficial residue mutations in antibodies. In cross-validation on the benchmark dataset, our results show that the framework can distinguish mutations that increase Ab-Ag affinity from those that decrease it across different antibodies. This also highlights the power of the pre-trained model as an encoder for identifying the contribution of residues on the paratope by efficiently representing the intramolecular and intermolecular structural environment of target residues.
We applied MutAb to predict the impact of mutation on the neutralization activity of mAb (P36-5D2 and R3P1-E4) against different strains of SARS-CoV-2 viruses. The receptor-binding domain (RBD) of the spike protein of different SARS-CoV-2 strains differs by only one to five amino acids. Using the complex structure of the wild-type spike RBD and antibody as a homology reference, we simulated the complex structures of the RBDs of the different strains in combination with the antibody. When FoldX and mCSM-AB2 use the simulated antigen structure as input, the prediction accuracy of our method surpasses that of FoldX, coming in second only to mCSM-AB2. The mCSM-AB2 utilizes graph-based structural signatures and machine learning techniques, enabling it to maintain a robust performance even when utilizing homology models as input.
In more common scenarios, where no existing antigen–antibody complex structure can serve as a homology reference, docking methods must be used to generate an Ab-Ag complex conformation before applying tools such as FoldX and mCSM-AB2. Under these conditions, MutAb demonstrates a distinct advantage. For the NC41, NC10, and 1G2 antibodies, we compared our approach against the docking protocol. When FoldX and mCSM-AB2 use the docking pose as input, our framework achieves better prediction performance. The discrepancy between the docking conformation and the native structure significantly affects the prediction outcomes of both FoldX and mCSM-AB2. In both the external validation set of P36-5D2 and R3P1-E4 (mAbs against SARS-CoV-2) and the case studies of NC41, NC10 (mAbs against influenza) and 1G2 (mAb against human cytomegalovirus), the antibodies and antigens involved are unseen in the training and test datasets. The results demonstrate the generalization ability and usefulness of this method.
We observed that, during cross-validation, the model performs much better on the training dataset compared to the test dataset (
Table S2). Given the limited sample size, overfitting may have occurred during training. However, the distinct mutation entities present in each fold of the cross-validation, along with the unseen data in the external validation set, help mitigate the risk of overestimating the model’s performance due to overfitting. The performance of our model also indicates that it has captured valuable information about the underlying patterns in antibody mutation data. Although our framework has advantages over competing methods, it also has some limitations. The binding affinity of the antibody to the target antigen is determined by the whole interaction surface, whereas the representations of antibody mutations are limited to a given residue and its neighboring residues. Therefore, our framework is better suited to classification tasks, such as predicting whether a mutation in the antibody is advantageous or disadvantageous for Ab-Ag binding, than to directly predicting the ΔΔG value. Another issue is that our method requires the generation of sequence profiles (especially the PSSM), which takes an average of 8.14 ± 3.36 min per structure across all the datasets used in this work. Therefore, our method is more suitable for directed mutation optimization at several known key residue sites (e.g., as determined by alanine scanning mutation experiments) of a monoclonal antibody than for high-throughput deep mutational scanning predictions.
4. Materials and Methods
4.1. Definition of the Task of Predicting Antibody–Antigen Binding Affinity Changes upon Mutations on Antibodies
Given an interacting antibody–antigen pair and a residue on the antibody to be mutated to a new amino acid type, the goal is to estimate the change in affinity, or binding free energy, between the original antibody–antigen complex and the mutant:
$$\Delta\Delta G = \Delta G_{\mathrm{WT}} - \Delta G_{\mathrm{mutant}}$$
where
$$\Delta G = RT \ln K$$
In the equations, R, T and K are the gas constant, temperature in Kelvin, and dissociation (or inhibition) constant (i.e., $K_D$ or $K_i$), respectively. The dissociation constant reflects the affinity of the antibody for the target antigen, with smaller values indicating a stronger affinity. A positive value of ΔΔG indicates a stronger binding affinity between the antibody mutant and antigen, which corresponds to a smaller $K_D$ value for the antibody mutant. Conversely, a negative value of ΔΔG indicates a weaker binding affinity. The binary classification task is to predict whether the value of ΔΔG is negative or non-negative.
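A short worked example may help fix the sign convention. The sketch below assumes that ΔΔG is the wild-type binding free energy minus the mutant binding free energy, with ΔG = RT ln K, which is consistent with the statement that a positive ΔΔG corresponds to a smaller dissociation constant for the mutant. The temperature of 298.15 K and the function names are assumptions chosen for illustration.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant R in kcal/(mol*K)


def binding_free_energy(k_d, temperature=298.15):
    """ΔG = RT ln K in kcal/mol, with K the dissociation constant in mol/L."""
    return GAS_CONSTANT_KCAL * temperature * math.log(k_d)


def ddg(k_d_wild_type, k_d_mutant, temperature=298.15):
    """ΔΔG = ΔG(wild type) - ΔG(mutant); positive means stronger mutant binding.

    Assumption: this sign convention is inferred from the surrounding text.
    """
    return (binding_free_energy(k_d_wild_type, temperature)
            - binding_free_energy(k_d_mutant, temperature))


def affinity_label(ddg_value):
    """Binary label: 1 for non-negative ΔΔG, 0 for negative ΔΔG."""
    return 1 if ddg_value >= 0 else 0
```

For instance, a tenfold decrease in the dissociation constant (from 10 nM to 1 nM) gives ΔΔG = RT ln 10, roughly 1.36 kcal/mol at 298.15 K, and is therefore labeled as affinity-increasing.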
4.2. Data and Preprocessing
To develop our predictive pipeline, we collected the ΔΔG data of single-point mutations from the mCSM-AB2 dataset (a subset of SKEMPI 2.0) and downloaded the experimentally determined 3D structures of the complexes from the SKEMPI 2.0 database. Compared with the mCSM-AB2 dataset, we discarded nanobodies, anti-idiotype antibodies, antibodies with fewer than 10 mutations, mutations not present on the antibodies and mutations that could not be simulated by FoldX5 or EvoEF2. This resulted in a benchmark dataset that consists of 15 unique antibodies and 424 mutation entries. A total of 135 entries with a ΔΔG value of 0 or greater were classified as affinity-increasing mutations, while 289 entries with a ΔΔG value less than 0 were classified as affinity-reducing mutations.
As for the single-point mutation dataset for neutralizing antibodies targeting SARS-CoV-2 spike protein, we collected 2 mAbs (P36-5D2 and R3P1-E4), each complexed with SARS-CoV-2 spike RBD separately. The structures of these mAbs are available (PDB ID: 7FAF and 7VMU). The effects of mutations on these two antibodies on neutralization against SARS-CoV-2 pseudotyped virus were measured by Sisi Shan et al. [
19] and Lili Li et al. [
32]. We included qualitative and quantitative assay data, leading to a total of 135 data points for the classification task. Using EvoEF2, the structures of the spike RBD from different strains were generated by substituting amino acids in the WT structure and subsequently optimized.
We used BIOVIA Discovery Studio 4.5 (Dassault Systèmes BIOVIA, San Diego, CA, USA) to extract the 3D structures of the antigens and antibodies separately from the complexes. The antibody Fv sequence was annotated with the “Annotate Antibody Sequence” tool in BIOVIA Discovery Studio. Only the Fv region was retained for each antibody structure. The antibody mutant structures were generated using EvoEF2, which also optimized the rotamers of the mutated and surrounding residues.
4.3. Constructing the Graph Structure of Antibody and Antigen
To construct a graph representing the structure of a given antibody or antigen, we treated the individual residues as nodes and the Cβ-Cβ distances between them as the edges. For glycine (Gly) residues, which lack a β-carbon, we utilized the Cα atom instead. Edges with a distance of less than 10 Å are retained. We initialized the embeddings for each node by incorporating both structural and sequence features at the residue level from the unbound structure of the antibody or antigen. The structural features are composed of the solvent accessible surface area (ASA) and a half-sphere amino acid composition (HSAAC). The absolute and relative solvent accessible surface areas of the residue (d = 2) were calculated by STRIDE [
38]. As defined in PAIRpred [
39], HSAAC (d = 20) is a local amino acid profile that indicates the frequency of each amino acid type within 8 Å of the target residue. Sequence-based features comprise the amino acid type represented by one-hot encoding (d = 20) and evolutionary information derived from the position-specific scoring matrix (PSSM). The PSSM (d = 20) is generated by employing the PSI-BLAST [
40] to query the NCBI’s Non-Redundant sequence database (
https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz, accessed on 5 October 2023). The PSSM encodes each residue as a vector with 20 elements, representing the probabilities of the 20 amino acids occurring at that position. The effectiveness of PSSM in predicting protein–protein interaction (PPI) sites has been demonstrated [
41]. We modified the myPDB.py Python script from PAIRpred to compute the 62-dimensional feature vector described above (2 + 20 + 20 + 20).
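The graph construction rule (Cβ-Cβ distances, Cα for glycine, 10 Å cutoff) can be sketched with the standard library alone. The residue dictionaries and their keys (`name`, `CA`, `CB`) are a hypothetical input format chosen for this example; parsing the structure file and computing the 62 node features are outside its scope.

```python
import math

DISTANCE_CUTOFF = 10.0  # Å; edges at or beyond this distance are discarded


def representative_atom(residue):
    """Return the Cβ coordinates, or the Cα coordinates for glycine."""
    if residue["name"] == "GLY":
        return residue["CA"]
    return residue["CB"]


def build_residue_graph(residues, cutoff=DISTANCE_CUTOFF):
    """Return {(i, j): distance} for residue pairs closer than `cutoff`.

    `residues` is a list of dicts with hypothetical keys 'name', 'CA', 'CB',
    where the coordinates are (x, y, z) tuples in Å.
    """
    coords = [representative_atom(r) for r in residues]
    edges = {}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            distance = math.dist(coords[i], coords[j])
            if distance < cutoff:
                edges[(i, j)] = distance
    return edges
```

The pairwise loop is quadratic in the number of residues, which is acceptable for Fv-sized structures; a spatial index would be a reasonable substitute for large antigens.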
4.4. Implementation Details of the Representation Learning Module
The representation learning module comprises a graph convolution layer and an attention layer and shares the basic idea of the PECAN framework [42]. The graph convolution implemented by Fout et al. [43] enables order-independent aggregation of properties across a neighborhood of antibody/antigen residues that collectively contribute to the formation of an Ab-Ag binding interface. To summarize, let $x_i$ be the initial representation of the target node i in the antibody’s residue graph, $\{x_j : j \in N_i\}$ the set of representations of the neighbor nodes that define the receptive field of the convolution, $W^{c}$ the aggregation weight matrix of the target node (i.e., center node) and $W^{n}$ the aggregation weight matrix of the neighbor nodes. After the convolution operation, the representation of the target node is updated to
$$h_i = \sigma\left(W^{c} x_i + \frac{1}{|N_i|} \sum_{j \in N_i} W^{n} x_j + b\right)$$
where $\sigma$ is a nonlinear activation function and $b$ is a bias vector, following the node-average operator of Fout et al.
An attention layer, designed by Luong T et al. [44] and Pittala S et al. [42], encodes the contextual representations of the antigen’s residue graph onto the antibody’s residue graph, providing potential information about the interaction surface residues on the antigen for the target residues on the antibody. Similarly, let $x_m$ be the initial representation of node m in the antigen’s residue graph; after the convolution operation, the representation of this node is updated to $h_m$. The attention score between antibody node i and antigen node m is denoted $s_{im}$. By accumulating the product of all antigen node representations and their corresponding normalized attention scores, the antigen context representation $c_i$ for node i of the antibody’s residue graph is obtained. The combination of $h_i$ and $c_i$ forms the representation of the target residue on the antibody, thus realizing the functionality of the encoder in our framework.
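The two operations described above, a neighborhood-averaging graph convolution followed by attention over antigen nodes, can be illustrated with a small pure-Python sketch. It is not the PECAN or MutAb implementation. The following choices are assumptions made only to obtain a runnable example: ReLU as the activation, a plain dot product as the attention score (no learnable attention matrix), softmax as the normalization, and concatenation as the way the node representation and its antigen context are combined.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def graph_conv(features, neighbors, w_center, w_neighbor, bias):
    """Update each node from itself and the average of its neighbors.

    neighbors[i] lists the indices of the nodes adjacent to node i.
    Assumption: ReLU is used as the nonlinearity.
    """
    updated = []
    for i, x_i in enumerate(features):
        total = matvec(w_center, x_i)
        for j in neighbors[i]:
            message = matvec(w_neighbor, features[j])
            total = [t + m / len(neighbors[i]) for t, m in zip(total, message)]
        updated.append([max(0.0, t + b) for t, b in zip(total, bias)])
    return updated


def softmax(scores):
    """Normalize raw scores into weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def antigen_context(h_antibody, h_antigen):
    """For each antibody node, return the attention-weighted sum of antigen nodes.

    Assumption: the score is a dot product between node representations.
    """
    dim = len(h_antigen[0])
    contexts = []
    for h_i in h_antibody:
        scores = [sum(a * b for a, b in zip(h_i, h_m)) for h_m in h_antigen]
        weights = softmax(scores)
        contexts.append(
            [sum(w * h_m[k] for w, h_m in zip(weights, h_antigen)) for k in range(dim)]
        )
    return contexts


def encode(h_antibody, contexts):
    """Combine node and context vectors (assumption: by concatenation)."""
    return [h + c for h, c in zip(h_antibody, contexts)]
```

In a trained model the weight matrices and bias would be learned during pre-training; here they are ordinary arguments so that the data flow can be followed by hand.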
To determine the learnable parameters of the representation generation module of our framework, we retrained the paratope prediction model in the PECAN framework, incorporating a fully connected layer for classification, as described in the original paper. The training and test data are from 460 Ab-Ag complexes (
https://zenodo.org/records/3885236, accessed on 27 March 2024). Such a pre-trained scheme enables our representation generation module to capture general rules regarding the amino acid types and the positions of the antibody residues that form the paratope.
4.5. Model Training and Performance Evaluation Metrics
The pre-trained module encodes the wild-type and mutant residues on the antibody, and the difference between the two was fed into the AutoGluon-Tabular [
45], an open-source AutoML framework used to train the model and predict changes in antibody affinity. A stratified three-fold cross-validation approach was employed on the benchmark dataset using the StratifiedKFold function from the sklearn.model_selection Python module. With this approach, the dataset was divided into three folds, with each fold containing distinct mutation entities not present in the others while preserving the class ratio in each fold to match that of the entire dataset. Two of the folds were utilized for training purposes, while the third fold was reserved for testing. Stratified k-fold cross-validation is an extension of regular k-fold cross-validation that is particularly useful for handling imbalanced and small-sized datasets.
Additionally, a leave-one-antibody-out cross-validation scheme was employed. Mutation entries from 14 of the 15 antibodies were used for training, while the mutation entries from the remaining antibody served as the test set. Since each antibody in the benchmark dataset is unique, this approach rigorously evaluates the model’s generalization.
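The leave-one-antibody-out scheme amounts to a grouped split in which each antibody is held out once. The sketch below is written with the standard library for clarity and uses a hypothetical function name; scikit-learn's `LeaveOneGroupOut` in `sklearn.model_selection` offers equivalent behavior, although the text does not state which implementation was used.

```python
def leave_one_antibody_out(antibody_ids):
    """Yield (held_out, train_indices, test_indices) for each unique antibody.

    antibody_ids[k] is the antibody identifier of the k-th mutation entry.
    """
    for held_out in sorted(set(antibody_ids)):
        train = [i for i, a in enumerate(antibody_ids) if a != held_out]
        test = [i for i, a in enumerate(antibody_ids) if a == held_out]
        yield held_out, train, test
```

With 15 antibodies this produces 15 splits, and no mutation of the held-out antibody ever appears in the corresponding training set.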
The classification performance of the model was evaluated by calculating accuracy (ACC), balanced accuracy (BACC), the area under the precision–recall curve (AUPRC), the area under the receiver operating characteristic curve (AUROC), the F1 score, precision, recall, and the Matthews correlation coefficient (MCC) using the sklearn.metrics Python module.
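For reference, the threshold-based metrics listed above can be written directly in terms of the confusion matrix. The sketch below is a plain-Python illustration rather than the sklearn.metrics code used in the study. AUROC and AUPRC are omitted because they require predicted scores rather than hard labels, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute ACC, BACC, precision, recall, F1 and MCC for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Convention for this sketch: undefined ratios are reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": ratio(tp + tn, len(pairs)),
        "BACC": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "F1": ratio(2 * precision * recall, precision + recall),
        "MCC": ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

Balanced accuracy and MCC are the more informative values for this benchmark because the two classes are imbalanced (135 versus 289 entries).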
4.6. FoldX5, EvoEF2 and ESM1v
FoldX [
7] provides a quantitative analysis of the impact of mutations on the stability, folding, and dynamics of proteins and protein complexes (
http://foldxsuite.crg.es, accessed on 20 October 2023). We compared our framework with FoldX5, the new version of FoldX. EvoEF2 [
8] is an accurate energy function for protein sequence design, protein energy computing, and building mutant models (
https://github.com/tommyhuangthu/EvoEF2, accessed on 20 October 2023). The ComputeStability command in EvoEF2 computes the stability (total energy) of the protein complex. The ComputeBinding command in EvoEF2 calculates the binding interaction energy of a protein–protein complex, dividing the designation of the chains into two components: antigen and antibody. The two are referred to here as EvoEF2_stability and EvoEF2_binding, respectively. ESM1v [
46] is a protein language model designed for predicting the effects of variants (
https://github.com/facebookresearch/esm/, accessed on 27 June 2023). Since the ESM1v models only accept a single sequence as input, we concatenated the sequences of the antibody and antigen proteins for the analysis.
4.7. Antibody–Antigen Docking
ClusPro [
36] employs shape complementarity, electrostatics, and desolvation energy terms to generate the multiple docking poses of the protein complex. ClusPro outputs the docking poses at the centers of the 10 most populated clusters and ranks the models by cluster size. The ClusPro web server (
https://cluspro.org, accessed on 11 August 2024) was used in Antibody Mode with default settings, providing automated masking for the non-CDR regions of antibodies. The quality of protein interfaces in the docking poses was assessed against the experimentally solved structures using DockQ [
47] (
https://github.com/bjornwallner/DockQ, accessed on 11 August 2024).
5. Conclusions
Ab-Ag complex crystal structures will only be available for a tiny fraction of the antibodies of interest [
48,
49]. Due to the flexibility of complementarity-determining regions and the absence of co-evolution signals, Ab-Ag complex modeling has been a long-standing challenge [
25]. It is therefore essential for structure-based ΔΔG predictors to work on antibody and antigen structures in the unbound state. In this study, we developed a deep learning-based framework for antibody affinity maturation. This framework uses a pre-trainable Ab-Ag context-aware structural representation encoder and does not require the structure of the Ab-Ag complex as input. We have demonstrated that this framework is practical for antibody engineering and has the potential to improve the efficiency of antibody development.
Author Contributions
Conceptualization, Z.C., X.C., S.H. and X.B.; methodology, Z.C. and X.C.; software, Z.C.; validation, Z.C. and S.H.; formal analysis, Z.C.; investigation, X.C.; resources, S.H. and X.B.; data curation, Z.C. and X.C.; writing—original draft preparation, Z.C., X.C. and S.H.; writing—review and editing, X.C., S.H. and X.B.; visualization, Z.C.; supervision, X.C., S.H. and X.B.; project administration, X.C., S.H. and X.B.; funding acquisition, X.C., S.H. and X.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research was sponsored by the Beijing Nova Program, grant number 20240484733, and the National Key R&D Program of China, grant number 2023YFC2604400.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Acknowledgments
The authors would like to thank Yanpeng Zhao and Huiyan Xu for their valuable suggestions on the code and experimental design.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
- Sela-Culang, I.; Kunik, V.; Ofran, Y. The structural basis of antibody-antigen recognition. Front. Immunol. 2013, 4, 302.
- Peng, H.P.; Lee, K.H.; Jian, J.W.; Yang, A.S. Origins of specificity and affinity in antibody-protein interactions. Proc. Natl. Acad. Sci. USA 2014, 111, E2656–E2665.
- Robin, G.; Sato, Y.; Desplancq, D.; Rochel, N.; Weiss, E.; Martineau, P. Restricted diversity of antigen binding residues of antibodies revealed by computational alanine scanning of 227 antibody-antigen complexes. J. Mol. Biol. 2014, 426, 3729–3743.
- Kurumida, Y.; Saito, Y.; Kameda, T. Predicting antibody affinity changes upon mutations by combining multiple predictors. Sci. Rep. 2020, 10, 19533.
- Wang, M.; Zhu, D.; Zhu, J.; Nussinov, R.; Ma, B. Local and global anatomy of antibody-protein antigen recognition. J. Mol. Recognit. 2018, 31, e2693.
- Mason, D.M.; Friedensohn, S.; Weber, C.R.; Jordi, C.; Wagner, B.; Meng, S.M.; Ehling, R.A.; Bonati, L.; Dahinden, J.; Gainza, P.; et al. Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat. Biomed. Eng. 2021, 5, 600–612.
- Delgado, J.; Radusky, L.G.; Cianferoni, D.; Serrano, L. FoldX 5.0: Working with RNA, small molecules and a new graphical interface. Bioinformatics 2019, 35, 4168–4169.
- Huang, X.; Pearce, R.; Zhang, Y. EvoEF2: Accurate and fast energy function for computational protein design. Bioinformatics 2020, 36, 1135–1142.
- Pearce, R.; Huang, X.; Setiawan, D.; Zhang, Y. EvoDesign: Designing Protein-Protein Binding Interactions Using Evolutionary Interface Profiles in Conjunction with an Optimized Physical Energy Function. J. Mol. Biol. 2019, 431, 2467–2476.
- Leaver-Fay, A.; Tyka, M.; Lewis, S.M.; Lange, O.F.; Thompson, J.; Jacak, R.; Kaufman, K.W.; Renfrew, P.D.; Smith, C.A.; Sheffler, W.; et al. ROSETTA3: An object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 2011, 487, 545–574.
- Rodrigues, C.H.; Myung, Y.; Pires, D.E.; Ascher, D.B. mCSM-PPI2: Predicting the effects of mutations on protein-protein interactions. Nucleic Acids Res. 2019, 47, W338–W344.
- Pires, D.E.; Ascher, D.B.; Blundell, T.L. mCSM: Predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 2014, 30, 335–342.
- Pires, D.E.; Ascher, D.B. mCSM-AB: A web server for predicting antibody-antigen affinity changes upon mutation with graph-based signatures. Nucleic Acids Res. 2016, 44, W469–W473.
- Myung, Y.; Rodrigues, C.H.; Ascher, D.B.; Pires, D.E. mCSM-AB2: Guiding rational antibody design using graph-based signatures. Bioinformatics 2020, 36, 1453–1459.
- Myung, Y.; Pires, D.E.V.; Ascher, D.B. mmCSM-AB: Guiding rational antibody engineering through multiple point mutations. Nucleic Acids Res. 2020, 48, W125–W131.
- Wang, M.; Cang, Z.; Wei, G.W. A topology-based network tree for the prediction of protein-protein binding affinity changes following mutation. Nat. Mach. Intell. 2020, 2, 116–123.
- Liu, X.; Feng, H.; Wu, J.; Xia, K. Hom-Complex-Based Machine Learning (HCML) for the Prediction of Protein-Protein Binding Affinity Changes upon Mutation. J. Chem. Inf. Model. 2022, 62, 3961–3969.
- Liu, X.; Luo, Y.; Li, P.; Song, S.; Peng, J. Deep geometric representations for modeling effects of mutations on protein-protein binding affinity. PLoS Comput. Biol. 2021, 17, e1009284.
- Shan, S.; Luo, S.; Yang, Z.; Hong, J.; Su, Y.; Ding, F.; Fu, L.; Li, C.; Chen, P.; Ma, J.; et al. Deep learning guided optimization of human antibody against SARS-CoV-2 variants with broad neutralization. Proc. Natl. Acad. Sci. USA 2022, 119, e2122954119.
- Wang, G.; Liu, X.; Wang, K.; Gao, Y.; Li, G.; Baptista-Hon, D.T.; Yang, X.H.; Xue, K.; Tai, W.H.; Jiang, Z.; et al. Deep-learning-enabled protein-protein interaction analysis for prediction of SARS-CoV-2 infectivity and variant evolution. Nat. Med. 2023, 29, 2007–2018.
- Cai, H.; Zhang, Z.; Wang, M.; Zhong, B.; Li, Q.; Zhong, Y.; Wu, Y.; Ying, T.; Tang, J. Pretrainable geometric graph neural network for antibody affinity maturation. Nat. Commun. 2024, 15, 7785.
- Yin, R.; Feng, B.Y.; Varshney, A.; Pierce, B.G. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci. 2022, 31, e4379.
- Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
- Guarra, F.; Colombo, G. Computational Methods in Immunology and Vaccinology: Design and Development of Antibodies and Immunogens. J. Chem. Theory Comput. 2023, 19, 5315–5333.
- Feng, S.; Chen, Z.; Zhang, C.; Xie, Y.; Ovchinnikov, S.; Gao, Y.Q.; Liu, S. Integrated structure prediction of protein–protein docking with experimental restraints using ColabDock. Nat. Mach. Intell. 2024, 6, 924–935.
- Hitawala, F.N.; Gray, J.J. What has AlphaFold3 learned about antibody and nanobody docking, and what remains unsolved? bioRxiv 2024.
- Xiong, P.; Zhang, C.; Zheng, W.; Zhang, Y. BindProfX: Assessing Mutation-Induced Binding Affinity Change by Protein Interface Profiles with Pseudo-Counts. J. Mol. Biol. 2017, 429, 426–434.
- Zhang, N.; Chen, Y.; Lu, H.; Zhao, F.; Alvarez, R.V.; Goncearenco, A.; Panchenko, A.R.; Li, M. MutaBind2: Predicting the Impacts of Single and Multiple Mutations on Protein-Protein Interactions. iScience 2020, 23, 100939.
- Pahari, S.; Li, G.; Murthy, A.K.; Liang, S.; Fragoza, R.; Yu, H.; Alexov, E. SAAMBE-3D: Predicting Effect of Mutations on Protein-Protein Interactions. Int. J. Mol. Sci. 2020, 21, 2563.
- Klasse, P.J. Neutralization of Virus Infectivity by Antibodies: Old Problems in New Perspectives. Adv. Biol. 2014, 1, 157895.
- Burton, D.R. Antiviral neutralizing antibodies: From in vitro to in vivo activity. Nat. Rev. Immunol. 2023, 23, 720–734.
- Li, L.; Gao, M.; Jiao, P.; Zu, S.; Deng, Y.Q.; Wan, D.; Cao, Y.; Duan, J.; Aliyari, S.R.; Li, J.; et al. Antibody engineering improves neutralization activity against K417 spike mutant SARS-CoV-2 variants. Cell Biosci. 2022, 12, 63.
- Pruett, P.S.; Air, G.M. Critical interactions in binding antibody NC41 to influenza N9 neuraminidase: Amino acid contacts on the antibody heavy chain. Biochemistry 1998, 37, 10660–10670.
- Dougan, D.A.; Malby, R.L.; Gruen, L.C.; Kortt, A.A.; Hudson, P.J. Effects of substitutions in the binding surface of an antibody on antigen affinity. Protein Eng. 1998, 11, 65–74.
- Chandramouli, S.; Ciferri, C.; Nikitin, P.A.; Caló, S.; Gerrein, R.; Balabanis, K.; Monroe, J.; Hebner, C.; Lilja, A.E.; Settembre, E.C.; et al. Structure of HCMV glycoprotein B in the postfusion conformation bound to a neutralizing human antibody. Nat. Commun. 2015, 6, 8176.
- Kozakov, D.; Hall, D.R.; Xia, B.; Porter, K.A.; Padhorny, D.; Yueh, C.; Beglov, D.; Vajda, S. The ClusPro web server for protein–protein docking. Nat. Protoc. 2017, 12, 255–278.
- Brenke, R.; Hall, D.R.; Chuang, G.Y.; Comeau, S.R.; Bohnuud, T.; Beglov, D.; Schueler-Furman, O.; Vajda, S.; Kozakov, D. Application of asymmetric statistical potentials to antibody-protein docking. Bioinformatics 2012, 28, 2608–2614.
- Heinig, M.; Frishman, D. STRIDE: A web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res. 2004, 32, W500–W502.
- Minhas, F.; Geiss, B.J.; Ben-Hur, A. PAIRpred: Partner-specific prediction of interacting residues from sequence and structure. Proteins 2014, 82, 1142–1155.
- Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
- Zeng, M.; Zhang, F.; Wu, F.X.; Li, Y.; Wang, J.; Li, M. Protein–protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 2020, 36, 1114–1120.
- Pittala, S.; Bailey-Kellogg, C. Learning context-aware structural representations to predict antigen and antibody binding interfaces. Bioinformatics 2020, 36, 3996–4003.
- Fout, A.; Byrd, J.; Shariat, B.; Ben-Hur, A. Protein interface prediction using graph convolutional networks. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 6533–6542.
- Luong, T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, 17–21 September 2015; pp. 1412–1421.
- Erickson, N.; Mueller, J.; Shirkov, A.; Zhang, H.; Larroy, P.; Li, M.; Smola, A. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv 2020, arXiv:2003.06505.
- Meier, J.; Rao, R.; Verkuil, R.; Liu, J.; Sercu, T.; Rives, A. Language models enable zero-shot prediction of the effects of mutations on protein function. In Proceedings of the 35th International Conference on Neural Information Processing Systems, Online, 6–14 December 2021; pp. 29287–29303.
- Mirabello, C.; Wallner, B. DockQ v2: Improved automatic quality measure for protein multimers, nucleic acids, and small molecules. Bioinformatics 2024, 40, btae586.
- Olsen, T.H.; Boyles, F.; Deane, C.M. Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci. 2022, 31, 141–146.
- Schneider, C.; Raybould, M.I.J.; Deane, C.M. SAbDab in the age of biotherapeutics: Updates including SAbDab-nano, the nanobody structure tracker. Nucleic Acids Res. 2022, 50, D1368–D1372.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).