1. Introduction
Successful enzyme design often hinges on a good understanding of the relationship between protein structure and biological function. A key step in rational design is the introduction of specific amino-acid replacements at particular sites of the protein under study, with the aim of enhancing properties such as thermo-stability and catalytic activity. In practice, simultaneous mutations at two or more sites in the target protein, rather than a single-site mutation, are required. Thus, a critical question concerning mutation design arises: is there any correlation between mutations at different sites of the protein? If so, can it be predicted? If mutations at different sites are independent of one another, the overall effect of a multiple mutation can be estimated by simply summing the effects of the individual single mutations, and the mutations are said to be additive [1]. In contrast, where a strong interplay exists between mutations at different sites, the overall effect cannot be predicted from those of the single mutations, and the mutations are said to be non-additive.
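The additivity criterion described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the tolerance `tol`, and the numerical values are hypothetical and are not taken from the cited work.

```python
def additivity_deviation(effect_double, effect_a, effect_b):
    """Return the deviation of a double mutant from simple additivity.

    All effects are changes relative to the wild type (for example, a
    change in melting temperature in degrees C). A value near zero means
    the single-site effects add up; a large magnitude means non-additivity.
    """
    return effect_double - (effect_a + effect_b)


def is_additive(effect_double, effect_a, effect_b, tol=1.0):
    """Classify a double mutant as additive within a tolerance.

    `tol` is an assumed threshold, not a value from the text; in practice
    it would be chosen from the experimental uncertainty.
    """
    return abs(additivity_deviation(effect_double, effect_a, effect_b)) <= tol


# Hypothetical numbers, for illustration only.
print(additivity_deviation(3.0, 1.0, 2.5))
print(is_additive(3.0, 1.0, 2.5))
print(is_additive(-4.0, 1.0, 2.5))
```

With this sign convention, a negative deviation means that the double mutant changes the measured quantity by less than the sum of the two single-site changes.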
Mutation additivity was studied early on in a variety of systems by many structural biologists. For example, Sandberg and Terwilliger [2] examined the additive effects of mutations in the gene V protein and found that different types of mutations showed strong additivity. In addition, they found that mutations at sites with intense van der Waals interactions tended to be less additive. Boyer and colleagues [3] suggested that the non-additivity of mutations at distant sites indicates communication between the amino acids at these sites, and they used the term “thermodynamic coupling” for the enhanced thermo-stability arising from this non-additive phenomenon. They used atomic-resolution nuclear magnetic resonance (NMR) to examine hydrogen exchange in the enzyme in its native state and attempted to determine the dynamic perturbation between the two mutation sites. They concluded that thermodynamic coupling between distal sites was caused by physical interactions between the amino acids at these sites in the native structure of the protein.
T4 phage lysozyme, a model protein for studying the relationship between protein structure and function, was also used in early studies of mutation non-additivity. Matthews and colleagues [4] observed that mutations introducing negative charges at the ends of α-helices in T4 phage lysozyme, which produce electrostatic interactions at these sites, such as S38D and N144D, were additive. They designed a series of combinatorial mutations at distant sites that do not form direct contact. Interestingly, most of these multiple mutations were found to be strongly additive, whether or not the mutated sites formed direct physical contacts. An extreme example exploiting mutation additivity is a combination of seven mutations, S38D/A82P/N144D/I3L/V131A/A41V/N116D, which was found to give the largest increase in melting temperature, 8.3 °C [5].
On the other hand, some mutations that involve direct physical interactions did exhibit strong non-additivity. For example, the double-site mutant A98V/T152S showed strong non-additivity relative to the corresponding single mutations: the melting temperature change caused by the double-site mutation was 7.6 °C less than the sum of those caused by the two corresponding single mutations [6,7]. In the native structure, residues A98 and T152 are oriented toward one another and form direct contact. Matthews and colleagues [8] suggested that the dynamic perturbation introduced by a mutation starts at the mutation site and spreads to neighboring sites. According to their observations, if the neighborhood of a given site can absorb more of the perturbation caused by a mutation at that site, the mutation is likely to cause a smaller change in protein thermal stability. In the case of A98V/T152S, the two sites interact strongly, so the response of their neighborhood to mutations at these sites depends heavily on the details of the interaction: presumably, mutations that enhance the contact between the two sites weaken the ability of the neighboring structure to relieve the perturbation, thus causing larger changes in thermal stability. Interestingly, other mutations that do not involve direct interactions were also found to exhibit strong non-additivity [9].
Undoubtedly, it is important to understand the mechanism underpinning mutation non-additivity, and determining in advance whether mutations at selected sites are additive can effectively reduce the experimental workload in rational design. Recently, the rapid accumulation of mutation data has stimulated the development of methods for predicting mutation effects. For example, Tian et al. [10] developed a machine-learning method to predict mutation effects on protein thermo-stability based on a database of 3366 mutant proteins. Pires et al. [11] predicted the effects of missense mutations using graph-based signatures. Very recently, Dehghanpoor et al. [12] compared the performance of several machine-learning methods. However, accurate prediction of mutation correlation effects, such as non-additivity, remains a very challenging task.
In this paper, we present a mathematical model based on a protein structural amino-acid network that successfully separated double-site mutations with strong non-additive effects from additive ones for T4 bacteriophage lysozyme. We examined different properties of the protein topological network that correlate strongly with mutation additivity. Double-site mutations of T4 phage lysozyme were selected if the two component single-site mutations had also been measured, and the non-additive effect of the mutations at the two sites was determined from the measured thermodynamic data [9]. The dependence of the non-additive effect on the distance between the two sites was examined. We then constructed a protein topological network model based on amino-acid interactions [13] and examined the network topological quantities and their relationships with the mutation non-additive effects. Finally, we present a mathematical model based on protein network clique analysis to predict mutation additivity or non-additivity. The model was also successfully applied to a different protein, Eglin c, whose structure has a different topology from that of T4 phage lysozyme.
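To make the notion of an amino-acid network concrete, the following Python sketch builds an undirected residue network from a single structure. It is a simplified illustration, not the construction used in [13]: the choice of one representative atom per residue, the distance cutoff of 7.0 Å, the function names, and the toy coordinates are all assumptions introduced here.

```python
import math
from itertools import combinations


def build_contact_network(coords, cutoff=7.0):
    """Build an undirected residue network from one structure.

    coords: dict mapping residue number -> (x, y, z) of a representative
            atom. Which atom to use, and the cutoff in angstroms, are
            assumptions made here, not values from the text.
    Returns a dict mapping each residue to the set of its neighbors.
    """
    network = {res: set() for res in coords}
    for a, b in combinations(coords, 2):
        if math.dist(coords[a], coords[b]) <= cutoff:
            network[a].add(b)
            network[b].add(a)
    return network


def degree(network, res):
    """Number of residues in contact with `res`, a basic topological quantity."""
    return len(network[res])


# Toy coordinates invented for illustration (not from any real structure).
toy = {
    1: (0.0, 0.0, 0.0),
    2: (5.0, 0.0, 0.0),
    3: (0.0, 4.0, 0.0),
    4: (30.0, 0.0, 0.0),
}
net = build_contact_network(toy)
print(net)
print(degree(net, 1))
```

In this toy case residues 1, 2 and 3 are mutually in contact, while residue 4 lies beyond the cutoff from all others and remains isolated.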
4. Conclusions
Protein mutation effects have become a popular topic in cell biology owing to recently developed deep scanning techniques, which create large-scale mutagenesis data that associate intrinsic protein structures and functions with the consequences of relevant genetic variation [29]. A critical question that arises is how natural selection works with the innumerable yet almost random mutations in the evolutionary process. In this paper, we examined possible intrinsic correlations between random protein mutations based on protein structural network calculations. We analyzed the additivity of 13 double-site mutations of T4 bacteriophage lysozyme and found that mutations at distal sites are usually strongly additive, while those at neighboring sites can be either additive or non-additive. To systematically estimate the non-additive effects of double-site mutations, we investigated the amino-acid network structure of each mutant and determined the topological quantities of these networks. We generated equilibrium configuration ensembles of the studied proteins using conventional simulations and built the amino-acid network for each structure. We then analyzed the topological characteristics of the protein networks, such as the distribution of k-cliques, and found a significant correlation between 3-clique associations and double-site mutant additivity: non-additive mutations tended to occur between sites belonging to the same 3-clique structure. The clique model could significantly separate non-additive double-site mutations from additive ones for the examined proteins. Our calculations also suggested that such correlation probabilities can be changed to some extent by applying a third mutation.
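The 3-clique criterion summarized above can be illustrated with a short Python sketch. It checks whether two sites belong to a common 3-clique (a triangle) in each network of an ensemble and reports the fraction of networks in which they do. This is a simplified reading of the idea rather than the model itself: the function names, the decision `threshold`, and the toy networks are hypothetical.

```python
def share_3_clique(network, i, j):
    """True if residues i and j belong to a common 3-clique (triangle).

    network: dict mapping residue -> set of neighboring residues
             (undirected). Two residues share a triangle when they are in
             contact with each other and have at least one common neighbor.
    """
    neighbors_i = network.get(i, set())
    neighbors_j = network.get(j, set())
    if j not in neighbors_i:
        return False
    return bool((neighbors_i & neighbors_j) - {i, j})


def co_clique_fraction(ensemble, i, j):
    """Fraction of networks in the ensemble where i and j share a 3-clique."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one network")
    hits = sum(share_3_clique(net, i, j) for net in ensemble)
    return hits / len(ensemble)


def predict_non_additive(ensemble, i, j, threshold=0.5):
    """Flag a site pair as likely non-additive.

    `threshold` is a hypothetical cutoff, not a value from the text.
    """
    return co_clique_fraction(ensemble, i, j) >= threshold


# Toy ensemble of two networks, invented for illustration.
frame_a = {10: {11, 12}, 11: {10, 12}, 12: {10, 11}, 40: {41}, 41: {40}}
frame_b = {10: {11}, 11: {10}, 12: set(), 40: {41}, 41: {40}}
ensemble = [frame_a, frame_b]
print(co_clique_fraction(ensemble, 10, 11))
print(predict_non_additive(ensemble, 10, 11))
print(predict_non_additive(ensemble, 40, 90))
```

Pairs that are never in contact, such as the hypothetical sites 40 and 90 here, give a fraction of zero and are therefore treated as additive under this sketch.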
Although the clique model used here is very simple, it works very well for lysozyme structures. However, we also noticed that the model cannot explain mutation non-additive effects for some other proteins, such as myoglobin [30]. Another weakness of the model is that it tends to yield very few 3-cliques for many proteins, especially those whose network topology is relatively sparse, which usually results in false-negative predictions. The situation becomes even more complicated when the perturbation due to a third mutation is considered. We therefore expect to refine the model in the near future by combining the simple network analysis shown in this work with a detailed physico-chemical characterization, to provide a fuller understanding of protein mutation effects. Considering that counter-examples always exist in biological phenomena, we regard our model as a simple, rudimentary picture of the mutation-correlation puzzle, to which many more details may be added for a deeper understanding.