Article

NPPD: A Protein-Protein Docking Scoring Function Based on Dyadic Differences in Networks of Hydrophobic and Hydrophilic Amino Acid Residues

Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei 115, Taiwan
* Author to whom correspondence should be addressed.
Biology 2015, 4(2), 282-297; https://doi.org/10.3390/biology4020282
Submission received: 27 November 2014 / Accepted: 16 March 2015 / Published: 24 March 2015
(This article belongs to the Special Issue Protein-Protein Interactions)

Abstract

Protein-protein docking (PPD) predictions usually rely on the use of a scoring function to rank docking models generated by exhaustive sampling. To rank good models higher than bad ones, a large number of scoring functions have been developed and evaluated, but the methods used for the computation of PPD predictions remain largely unsatisfactory. Here, we report a network-based PPD scoring function, the NPPD, in which the network consists of two types of network nodes, one for hydrophobic and the other for hydrophilic amino acid residues, and the nodes are connected when the residues they represent are within a certain contact distance. We showed that network parameters that compute dyadic interactions and those that compute heterophilic interactions of the amino acid networks thus constructed allowed NPPD to perform well in a benchmark evaluation of 115 PPD scoring functions, most of which, unlike NPPD, are based on some sort of protein-protein interaction energy. We also showed that NPPD was highly complementary to these energy-based scoring functions, suggesting that the combined use of conventional scoring functions and NPPD might significantly improve the accuracy of current PPD predictions.

1. Introduction

Living cells are a crowded environment in which most proteins interact with other proteins to exert cellular functions. To understand how protein-protein interactions mediate cellular processes, scientists often need to describe the structures of protein complexes at the atomic level. However, due to the difficulty in determining the atomic structures of protein complexes using experimental methods, protein-protein docking (PPD), a computational approach, is often used to complement results from experimental studies [1].
Most methods for PPD predictions involve a two-step strategy: sampling and scoring. For sampling, numerous docking models, also referred to as docking poses or decoys, are generated from a global search of all possible relative orientations of, and separations between, the two proteins brought together to form a complex; these docking poses are then ranked by a scoring function. To evaluate the performance of a given scoring function on a set of protein complexes, the TopN success rate is usually employed, in which a “success” hit for a complex is recorded when at least one of its top N docking poses, as ranked by the scoring function, satisfies a specified criterion for being a good (i.e., near-native) model. It follows that, for a given scoring function, a higher success rate (i.e., a higher number of correctly predicted complexes) can be obtained by computing the success rate at a larger N, since more poses are considered for each complex and there is, thus, a higher probability that at least one of them is good. The objective when developing a PPD scoring function is, therefore, to rank good poses as high as possible and bad poses as low as possible. Despite significant progress in recent years, this is still an active area of research [2,3], as success rates remain low for small values of N (e.g., using a stringent criterion, Top1 and Top10 success rates are generally below 10% and 20%, respectively), unless docking is guided by experimentally derived data or information [4,5].
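The TopN criterion can be made concrete with a short sketch. It assumes, purely for illustration, that each complex is given as a list of booleans in ranked order (best-scored pose first), each flag stating whether that pose meets the chosen near-native criterion; the function names are hypothetical and not taken from any docking package.

```python
from typing import Sequence


def top_n_success(ranked_is_good: Sequence[bool], n: int) -> bool:
    """A complex is a success hit if any of its top-n ranked poses is good."""
    return any(ranked_is_good[:n])


def top_n_success_rate(complexes: Sequence[Sequence[bool]], n: int) -> float:
    """Fraction of complexes that have a success hit within their top-n poses.

    Each inner sequence lists, best-scored pose first, whether each pose
    satisfies the near-native criterion (e.g. IRMSD < 2.5 Å).
    """
    if not complexes:
        raise ValueError("no complexes given")
    hits = sum(top_n_success(poses, n) for poses in complexes)
    return hits / len(complexes)


# Three toy complexes: the first has its only good pose at rank 2,
# the second has no good pose, the third has a good pose at rank 1.
ranked = [[False, True, False], [False, False, False], [True, False, False]]
for n in (1, 2, 3):
    print(n, round(top_n_success_rate(ranked, n), 3))  # 0.333, 0.667, 0.667
```

Because a larger N only extends the prefix inspected by `any`, the success rate can never decrease as N grows, which is the effect described above.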
Most PPD scoring functions use a set of mathematical equations to compute the energy resulting from the formation of the protein complex. To do so, many use molecular mechanics functions [6,7,8,9,10,11,12,13,14,15,16], while others use statistical mechanics methods to derive potentials from various sources, including experimentally-determined protein structures [8,10,17,18,19], docking decoys [6,20,21,22,23], homology models [24,25,26], or binding energy funnels [27,28]. Many non-energy-based PPD scoring functions have also been developed, including those that utilize bioinformatics-predicted information [29,30], shape complementarity [31,32], machine learning [33,34,35], coevolution [36], and amino acid networks (AANs) [37,38].
As described in the Experimental Section below, NPPD, the network-based PPD scoring function developed in this work, is based on AANs, which have also been referred to as residue contact networks [39], protein contact networks [40], protein structure networks [41], or residue interaction networks [42], although these networks may not be completely identical in terms of their construction (for reviews, see [39,40,41,43,44]). Owing to the appeal of network analysis in the era of post-genomics research, there has been an increase in the number of studies utilizing AANs to predict a protein’s functional sites [45,46,47] and its protein-protein [48,49,50,51] and protein-nucleic acid interactions [52,53], and to probe protein dynamics [42,54,55], folding [56,57,58] and structure [59,60,61,62,63]. Of these studies using AANs, two reports by Pons et al. [37] and Chang et al. [38] on PPD are directly relevant to the present work.
In AANs, the protein structure is modeled by a three-dimensional geometric network, with the amino acid residues (usually the Cα or Cβ atoms) being represented as network nodes and their contacts as network edges to capture the interactions between amino acids within the same protein structure and/or between two interacting proteins. Pons et al. [37] showed that network parameters, such as closeness and betweenness, can be used to suggest protein-protein interaction regions, and that an energy term that models this information can be added to an energy-based scoring function to improve PPD predictions. Chang et al. [38] used two networks for a single protein structure, one formed by hydrophobic residues and the other by hydrophilic residues, and analyzed the two networks from the same complex (docking pose) separately; their results again demonstrated that network properties can be used to assist conventional scoring functions to distinguish between good and bad PPD decoys.
Unlike Chang et al., in developing NPPD, we constructed only a single network for a single protein structure, allowing both the hydrophobic (H) and hydrophilic (i.e., polar, P) residue nodes to coexist in the same network. We were then able to investigate not only the effects of dyadicity calculated from the hydrophobic-hydrophobic (HH) and polar-polar (PP) interactions, but also the effects of heterophilicity calculated from the hydrophobic-polar (HP) interactions on the scoring of PPD poses. Benchmark evaluations showed that, using network parameters alone in all three methods, NPPD performed better than the network-assisted PPD predictions reported by Pons et al. [37] and Chang et al. [38], and that NPPD also performed well compared to most energy-based scoring functions. In addition, further analysis revealed significant complementarity between NPPD and the other scoring functions evaluated, demonstrating the merit of using a combination of NPPD and other types of scoring functions to further improve PPD predictions.

2. Experimental Section

Figure 1 outlines the procedures used to develop NPPD. Briefly, the interface residues of a given complex (i.e., docking pose) of protein A and protein B were determined, yielding the H and P nodes for the construction of the AANs for A and B. Eight parameters were computed for each of the two networks and served as attributes for training and testing a Bayesian network model on a PPD benchmark dataset. Note that, during training of the Bayesian model, the complex context of all poses was removed and each AAN was treated independently, although AANs that came from a good pose were used as positive instances and those from a bad pose as negative instances. Using the Bayesian model thus derived, NPPD can then score any given pose by multiplying together the Bayesian probabilities of its two AANs. This has the advantage of quickly eliminating most bad poses, since a single bad AAN (i.e., one with a low Bayesian probability) is enough to produce a bad product (pose) of two AANs. Note also that, as illustrated in Figure 1, each AAN was constructed on one side of the interface and did not extend to include contacts from the other side, because including inter-protein contacts did not improve the results [64], possibly because the connections of an inter-protein network can change significantly even with minor changes in the configuration of the docking pose. Still, future studies may be warranted to find a way of using inter-protein contacts productively in the Bayesian model.
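The product rule used to score a pose can be illustrated with a minimal sketch. The per-AAN probabilities below are invented numbers standing in for the output of a trained Bayesian model, and the function names are hypothetical; the point is only that one low-probability AAN is enough to push a pose down the ranking.

```python
from typing import Dict, List, Tuple


def pose_score(p_aan_a: float, p_aan_b: float) -> float:
    """Score a pose as the product of the Bayesian probabilities of its two AANs."""
    for p in (p_aan_a, p_aan_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return p_aan_a * p_aan_b


def rank_poses(aan_probs: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return pose IDs ordered from highest to lowest product score."""
    return sorted(aan_probs, key=lambda pose_id: pose_score(*aan_probs[pose_id]),
                  reverse=True)


# Hypothetical probabilities (AAN of protein A, AAN of protein B) for three poses.
aan_probs = {"pose1": (0.90, 0.10), "pose2": (0.60, 0.60), "pose3": (0.95, 0.80)}
print(rank_poses(aan_probs))  # ['pose3', 'pose2', 'pose1']
```

Here pose1 ranks last (score 0.09) even though one of its AANs has a probability of 0.90, because its other AAN scores only 0.10.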
Figure 1. Procedures used to develop NPPD. (a) An example of an amino acid network and the network parameters used in this study for a docking pose; (b) Flowchart of the training and testing of a Bayesian network model of NPPD.

2.1. Docking Datasets, Poses, and Quality Measures

The 176 protein complexes used in this study were retrieved from a PPD benchmark dataset of known atomic structures of complex component proteins in both the bound (complex) and unbound (free) form [65]. Two sets of docking poses generated from the unbound forms were used to evaluate the performance of NPPD and to compare it with those of several other PPD scoring functions. One set contained the top 54,000 poses for each of the 176 complexes generated by ZDOCK [66] and was downloaded from its website (http://zlab.umassmed.edu/zdock/decoys.shtml). The other set, kindly provided by the authors of a large-scale evaluation of 115 scoring functions [67], consisted of ~500 poses generated using SwarmDock [68] for each complex in a subset of 118 complexes. The two sets came with their own quality measures for near-native poses, i.e., the so-called good poses. For the ZDOCK-generated set, a good pose was one with an interface RMSD (IRMSD) < 2.5 Å, where IRMSD is the root mean square deviation of the interface residues’ Cα atoms from the experimentally determined structure of the bound complex, and an interface residue is defined as one having at least one heavy (non-hydrogen) atom within 5 Å of any heavy atom of the other protein in the complex. For the SwarmDock-generated set, the three CAPRI quality measures [2] for acceptable, medium, and high quality were used.
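The interface definition and the IRMSD < 2.5 Å criterion can be sketched in plain Python. This is an illustrative assumption-laden sketch, not the benchmark's own tooling: proteins are represented as hypothetical dictionaries mapping residue IDs to heavy-atom coordinates, and the RMSD function expects coordinates that have already been optimally superimposed (e.g., with the Kabsch algorithm), a step it does not perform.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def interface_residues(protein: Dict[str, List[Coord]],
                       partner: Dict[str, List[Coord]],
                       cutoff: float = 5.0) -> List[str]:
    """Residue IDs of `protein` with any heavy atom within `cutoff` Å of any
    heavy atom of `partner`. Both map residue ID -> heavy-atom coordinates
    (hydrogens are assumed to be filtered out beforehand)."""
    partner_atoms = [xyz for atoms in partner.values() for xyz in atoms]
    return [res_id for res_id, atoms in protein.items()
            if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms)]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between matched coordinates that are ALREADY superimposed.

    For IRMSD, `model` and `reference` would be the interface Cα atoms of a
    pose and of the bound complex after an optimal superposition, which this
    sketch does not perform.
    """
    if not model or len(model) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))


def is_good_pose(irmsd: float, threshold: float = 2.5) -> bool:
    """Near-native criterion used for the ZDOCK set: IRMSD < 2.5 Å."""
    return irmsd < threshold


# Toy example: only residue A1 lies within 5 Å of the partner's atom.
protein_a = {"A1": [(0.0, 0.0, 0.0)], "A2": [(20.0, 0.0, 0.0)]}
protein_b = {"B1": [(3.0, 3.0, 0.0)]}
print(interface_residues(protein_a, protein_b))  # ['A1']
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))  # 1.0
```

The brute-force all-atom distance check is quadratic in the number of atoms; a real implementation would normally use a spatial index, but the simple loop keeps the contact definition easy to read.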

2.2. Amino Acid Networks and Network Parameters

As described above, two AANs were constructed from the interface residues of two interacting proteins locked in a docking pose. In this work, the 20 amino acids were divided into two classes according to Eisenberg et al. [69], the H class consisting of Gly, Ile, Leu, Val, Phe, Met, Trp, Cys, Tyr, and Ala, and the P class consisting of Lys, Thr, Ser, Gln, Asn, Glu, Asp, Arg, His, and Pro. Our AANs, thus, contained two types of nodes, H and P, and a network edge was established to connect any two nodes (residues) if any heavy atom in one of the residues was within 5.0 Å of any heavy atom in the other (Figure 1a).
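Network construction for one protein's interface residues can be sketched as follows. The H/P sets copy the classification listed above; the input format (residue ID mapped to a three-letter name and heavy-atom coordinates) and the function names are assumptions made for illustration only.

```python
import itertools
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]

# H and P classes as listed in the text, using three-letter residue codes.
HYDROPHOBIC = {"GLY", "ILE", "LEU", "VAL", "PHE", "MET", "TRP", "CYS", "TYR", "ALA"}
POLAR = {"LYS", "THR", "SER", "GLN", "ASN", "GLU", "ASP", "ARG", "HIS", "PRO"}


def node_type(resname: str) -> str:
    """Map a three-letter residue name to its node type, 'H' or 'P'."""
    name = resname.upper()
    if name in HYDROPHOBIC:
        return "H"
    if name in POLAR:
        return "P"
    raise ValueError(f"residue not in the H/P classification: {resname}")


def build_aan(residues: Dict[str, Tuple[str, List[Coord]]],
              cutoff: float = 5.0) -> Tuple[Dict[str, str], Set[Tuple[str, str]]]:
    """Build the AAN of ONE protein's interface residues.

    `residues` maps residue ID -> (three-letter name, heavy-atom coordinates).
    Two residues are linked when any heavy atom of one lies within `cutoff` Å
    of any heavy atom of the other. Inter-protein contacts are not considered.
    """
    node_types = {rid: node_type(name) for rid, (name, _) in residues.items()}
    edges: Set[Tuple[str, str]] = set()
    for (r1, (_, atoms1)), (r2, (_, atoms2)) in itertools.combinations(residues.items(), 2):
        if any(math.dist(a, b) <= cutoff for a in atoms1 for b in atoms2):
            edges.add((min(r1, r2), max(r1, r2)))  # store each edge once, order-independent
    return node_types, edges


residues = {"10": ("LEU", [(0.0, 0.0, 0.0)]),
            "11": ("SER", [(4.0, 0.0, 0.0)]),
            "12": ("ALA", [(20.0, 0.0, 0.0)])}
types, edges = build_aan(residues)
print(types)  # {'10': 'H', '11': 'P', '12': 'H'}
print(edges)  # {('10', '11')}
```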
For each AAN, we computed two dyadicity parameters, Dp-p and Dh-h, and one heterophilicity parameter, Hp-h, which, following the work of Park and Barabasi [70], are defined as:
$$D_{p\text{-}p} = \frac{m_{pp}}{E(m_{pp})}, \quad D_{h\text{-}h} = \frac{m_{hh}}{E(m_{hh})}, \quad \text{and} \quad H_{p\text{-}h} = \frac{m_{ph}}{E(m_{ph})}$$
where mpp, mhh, and mph are, respectively, the numbers of P-P, H-H, and P-H edges in the AAN, and the three denominators are the respective expected values of mpp, mhh, and mph, which can be computed as:
$$E(m_{pp}) = \frac{n_p(n_p - 1)}{2}\,p, \quad E(m_{hh}) = \frac{n_h(n_h - 1)}{2}\,p, \quad \text{and} \quad E(m_{ph}) = n_p n_h\,p$$
where np is the number of P nodes, nh the number of H nodes, and p = 2M/[N(N - 1)] (M and N being the total numbers of edges and nodes, respectively) is the connectance, which represents the average probability that two nodes in a dyadic network are connected [71].
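As a minimal sketch (not the authors' code), the eight attributes used later for training can be computed directly from an H/P-labelled network with the formulas above. The input format and function name are hypothetical, and returning NaN when an expected edge count is zero is our assumption, since the text does not say how such networks were handled.

```python
from typing import Dict, Iterable, Tuple


def network_attributes(node_types: Dict[str, str],
                       edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Dyadicity, heterophilicity, and edge/node counts of an H/P network."""
    edges = list(edges)
    n_p = sum(1 for t in node_types.values() if t == "P")
    n_h = sum(1 for t in node_types.values() if t == "H")
    n_nodes, n_edges = n_p + n_h, len(edges)

    # Count P-P, H-H, and mixed P-H edges.
    m_pp = m_hh = m_ph = 0
    for u, v in edges:
        pair = {node_types[u], node_types[v]}
        if pair == {"P"}:
            m_pp += 1
        elif pair == {"H"}:
            m_hh += 1
        else:
            m_ph += 1

    # Connectance p = 2M / [N(N - 1)].
    p = 2 * n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0
    e_pp = n_p * (n_p - 1) / 2 * p
    e_hh = n_h * (n_h - 1) / 2 * p
    e_ph = n_p * n_h * p

    def ratio(observed: int, expected: float) -> float:
        # Assumption: undefined (NaN) when the expected count is zero.
        return observed / expected if expected > 0 else float("nan")

    return {"D_pp": ratio(m_pp, e_pp), "D_hh": ratio(m_hh, e_hh), "H_ph": ratio(m_ph, e_ph),
            "m_pp": m_pp, "m_hh": m_hh, "m_ph": m_ph, "n_p": n_p, "n_h": n_h}


# Worked example: two P nodes, two H nodes, one edge of each type.
attrs = network_attributes({"A": "P", "B": "P", "C": "H", "D": "H"},
                           [("A", "B"), ("C", "D"), ("A", "C")])
print(attrs["D_pp"], attrs["D_hh"], attrs["H_ph"])  # 2.0 2.0 0.5
```

In this toy network N = 4 and M = 3, so p = 0.5; the expected P-P and H-H counts are 0.5 each and the expected P-H count is 2, giving Dp-p = Dh-h = 2.0 and Hp-h = 0.5. Values above 1 simply mean more edges of that type than expected from the connectance alone.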

2.3. Bayesian Network

To infer whether two AANs would generate a near-correct docking pose, we used the machine learning algorithms implemented in the Weka platform [72] to derive a Bayesian network model [73], which we then used to compute, for every AAN, the probability of it being at the interface of a protein complex. The product of the probabilities of two AANs then gives an estimate of the likelihood that the resulting docking pose is a good one (Figure 1b). The aforementioned 176 benchmark complexes and their 54,000 ZDOCK-generated poses per complex were used in a leave-one-out training and testing of the Bayesian model; i.e., each of the 176 complexes was, in turn, left out while the model was trained on AANs randomly selected from poses of the remaining 175 complexes, and was then used as the test case. As shown in Figure 1b, we randomly selected 27,000 AANs from good poses, irrespective of whether they came from the same complex, as positive instances and an equal number of AANs from bad poses as negative instances, and used the values of the eight parameters Dp-p, Dh-h, Hp-h, mpp, mhh, mph, np, and nh of these AANs as training attributes. The Bayesian model derived from the training set was then used to score the poses of the left-out complex.
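The way training and test sets are formed in this leave-one-out protocol can be sketched as below. This is a hypothetical illustration: the data layout and function names are ours, `n_per_class` would be 27,000 in the setting described above, and the Bayesian network classifier itself (built in Weka in this work) is deliberately left out.

```python
import random
from typing import Dict, Iterator, List, Tuple

# (attribute vector, label) with label 1 = AAN from a good pose, 0 = from a bad pose.
Example = Tuple[Tuple[float, ...], int]


def balanced_training_set(aans_by_complex: Dict[str, List[Example]], held_out: str,
                          n_per_class: int, rng: random.Random) -> List[Example]:
    """Draw n_per_class positive and n_per_class negative AANs at random from
    all complexes except `held_out`, ignoring which complex each AAN came from."""
    pool = [ex for cid, exs in aans_by_complex.items() if cid != held_out for ex in exs]
    positives = [ex for ex in pool if ex[1] == 1]
    negatives = [ex for ex in pool if ex[1] == 0]
    if len(positives) < n_per_class or len(negatives) < n_per_class:
        raise ValueError("not enough AANs for a balanced sample")
    return rng.sample(positives, n_per_class) + rng.sample(negatives, n_per_class)


def leave_one_out_splits(aans_by_complex: Dict[str, List[Example]], n_per_class: int,
                         seed: int = 0) -> Iterator[Tuple[str, List[Example], List[Example]]]:
    """Yield (held-out complex, training set, test AANs), one split per complex."""
    rng = random.Random(seed)
    for cid in aans_by_complex:
        train = balanced_training_set(aans_by_complex, cid, n_per_class, rng)
        yield cid, train, aans_by_complex[cid]


# Toy data: three complexes, one good-pose and one bad-pose AAN each.
data = {"c1": [((1.0,), 1), ((2.0,), 0)],
        "c2": [((3.0,), 1), ((4.0,), 0)],
        "c3": [((5.0,), 1), ((6.0,), 0)]}
for cid, train, test in leave_one_out_splits(data, n_per_class=2):
    print(cid, len(train), len(test))  # each split: 4 training AANs, 2 test AANs
```

Each trained model would then score the held-out complex's AANs, and pose scores would follow from the product of the two AAN probabilities.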

3. Results and Discussion

3.1. Performance of NPPD and IRAD

The TopN success rates obtained using poses created and ranked by ZDOCK [66] and IRAD [74], a state-of-the-art PPD scoring function, have often been used as yardsticks to evaluate PPD scoring functions [3,4,5]. Both ZDOCK and IRAD use a multitude of scoring terms, such as shape complementarity, interface atomic contact energy, and electrostatics, and IRAD also uses both atom-based and residue-based potentials [66,74]. As can be seen in Figure 2, using the 54,000 poses created by ZDOCK for each of the 176 benchmark complexes, the Bayesian probabilities of NPPD produced worse Top1 and Top10 success rates than either ZDOCK or IRAD, but, as N increased, the success rates increased faster for NPPD than for ZDOCK or IRAD, with NPPD outperforming the other two when N > 100.
Figure 2. TopN success rates for NPPD, ZDOCK, and IRAD on the benchmark dataset of the unbound docking poses of 176 protein complexes. IRMSD < 2.5 Å was used to determine good (near-correct) poses. The success rates of ZDOCK and IRAD were obtained from the ZDOCK website (http://zlab.umassmed.edu/zdock/perf_decoys.shtml).
Despite the low success rates of NPPD at a low N, it is interesting that, as shown in Table 1, many of the complexes that NPPD succeeded at predicting were different from those predicted by IRAD and vice versa. The complementarity between the two methods, measured as the ratio of the method-unique successes divided by all successes and expressed as a percentage, was especially significant at low N, being as high as 86% for the Top1 success rate (only 3 out of 22 complexes were successfully predicted by both methods).
Table 1. Number of benchmark complexes successfully predicted by NPPD and/or IRAD at different TopN success rates.
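Set | Top1 | Top10 | Top100 | Top1000 | Top2000
NPPD (A) | 9 | 28 | 65 | 102 | 110
IRAD (B) | 16 | 43 | 64 | 92 | 102
Intersection (A∩B) | 3 | 15 | 44 | 80 | 95
Union (A∪B) = a | 22 | 56 | 85 | 114 | 117
Unique to NPPD or IRAD (A⊖B) = b | 19 | 41 | 41 | 34 | 22
Complementarity = b/a | 86% | 73% | 48% | 30% | 19%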
⊖ (Symmetric difference): the set of elements in either of the sets and not in their intersection.
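The complementarity measure in Table 1 is the size of the symmetric difference divided by the size of the union of the two sets of successfully predicted complexes. The sketch below is ours; the complex IDs are placeholders chosen only so that the set sizes match the Top1 column (9 NPPD hits, 16 IRAD hits, 3 shared), and returning 0 when neither method succeeds is an assumption.

```python
from typing import Set


def complementarity(hits_a: Set[str], hits_b: Set[str]) -> float:
    """Method-unique successes (symmetric difference) divided by all successes (union)."""
    union = hits_a | hits_b
    if not union:
        return 0.0  # assumption: neither method succeeded, so nothing to compare
    return len(hits_a ^ hits_b) / len(union)


# Placeholder complex IDs chosen only to reproduce the Top1 counts of Table 1.
nppd_top1 = {f"c{i}" for i in range(0, 9)}    # c0..c8  (9 hits)
irad_top1 = {f"c{i}" for i in range(6, 22)}   # c6..c21 (16 hits, 3 shared)
print(f"{complementarity(nppd_top1, irad_top1):.0%}")  # 86%
```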

3.2. Comparison with Other Network-Based Methods

As mentioned in the Introduction, two other groups have used AANs to help score docking poses [37,38]. Table 2 compares our results with their reported success rates and shows that, using the same benchmark dataset and the same criterion for success hits, and scoring with network parameters alone, NPPD produced better Top1 and Top10 success rates: e.g., the Top10 success rate was 18.5% for NPPD versus 10.6% for Pons et al. [37] on the 176 complexes of the benchmark, and 25.6% for NPPD versus 23.2% for Chang et al. [38] on a subset of 43 complexes. However, different sampling algorithms (FTDOCK [16], RosettaDock [75], and ZDOCK [66]) were used to generate the same number of poses for evaluation, which may have contributed to the differences in the success rates obtained. Several aspects of the use of AANs also differed: (i) as mentioned earlier, our AAN differs from that of Pons et al. [37], which represents all amino acids by a single type of network node, and from that of Chang et al. [38], which, although it has both H and P nodes like ours, creates two separate AANs for the two node types; (ii) as also mentioned earlier, unlike these two other networks, our AAN did not include inter-protein contacts; (iii) whereas we used dyadicity and heterophilicity parameters for scoring, the other two studies used more conventional network parameters, such as degree and clustering coefficient [38] and closeness and betweenness [37]; and (iv) NPPD was used to score docking poses by itself, whereas the network-based scoring functions of the other two studies are additional terms that can be added to an existing scoring function to give a better result [37,38] (Table 2). If this also applies to our method, incorporating NPPD into existing scoring functions should achieve significantly higher success rates.
Table 2. Conditions and Top1/Top10 success rates for NPPD and two other network-based scoring functions.
Conditions of docking poses | 176 complexes: Pons et al. [37] | 176 complexes: NPPD | 43 complexes: Chang et al. [38] | 43 complexes: NPPD
Generation of docking poses | FTDock [16] | ZDOCK | RosettaDock 1.0 [75] | ZDOCK
Number of poses generated | 10,000 | 10,000 | 1000 | 1000
Criterion for a success hit | L-RMSD < 10 Å | L-RMSD < 10 Å | L-RMSD < 5 Å | L-RMSD < 5 Å
Top1 success rate * | 5.0% (7.0%) | 8.0% | 2.3% (25.6%) | 11.6%
Top10 success rate * | 10.6% (29.8%) | 18.5% | 23.2% (53.4%) | 25.6%
* The values in parentheses are success rates produced by combining the network parameters and the energy terms of the sampling method.

3.3. Performance of NPPD in a Comprehensive Evaluation of a Number of PPD Scoring Functions

Since many factors can affect the performance of PPD scoring functions (one example, as mentioned above, being the use of docking poses produced by different sampling methods), it was important to evaluate NPPD further. Recently, a large-scale evaluation of 115 PPD scoring functions was reported [67], in which the authors ranked these scoring functions by comparing their Top1, Top10, and Top100 success rates on a set of docking poses produced by SwarmDock [68]. As shown in Figure 3a, using the same set of docking poses, the leave-one-out Bayesian model of NPPD produced TopN success rates comparable to those of the best performers among the 115 scoring functions evaluated (ranking 7th by Top10 success rate). Note that, with the exception of the 1st-ranked ZRANK2 method [12], an earlier version of IRAD that stands out somewhat from the others, these 20 top performers were roughly equally good, as their absolute ranking depended on which success rate (Top1, Top10, or Top100) and which quality measure (acceptable, medium, or high) was used as the basis for ranking. Note also that, of these top performers, NPPD was the only one using network parameters (the scoring functions of Pons et al. [37] and Chang et al. [38] were not among the 115 PPD scoring functions previously evaluated [67]).
Using the complementarity between two PPD scoring functions as defined in Table 1 (i.e., the number of complexes successfully predicted by either, but not both, of the two functions divided by the total number of successfully predicted complexes), we found that the complementarity of NPPD with each of the 16 other best performers was generally higher than the average complementarity among the other methods, especially for the Top1 and Top10 success rates (Figure 3b). Interestingly, although SPIDER [76], another AAN-based PPD scoring function, ranked only 38th of the 115 scoring functions evaluated, it is especially good at predicting complexes not detected by conventional scoring functions [67]. Unlike NPPD and the methods used by Pons et al. [37] and Chang et al. [38], SPIDER uses motifs of network structures, rather than network parameters, for scoring.
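The two quantities compared in Figure 3b, the complementarity of one scoring function with each of the others and the average complementarity over all pairs of a group of scoring functions, can be sketched as follows. The scoring-function names and success sets are hypothetical toy data, and this is one plausible reading of "averaged over all pairs" (unordered pairs, equally weighted), not the authors' code.

```python
import itertools
from typing import Dict, Set


def complementarity(hits_a: Set[str], hits_b: Set[str]) -> float:
    """Symmetric difference over union of two sets of successfully predicted complexes."""
    union = hits_a | hits_b
    return len(hits_a ^ hits_b) / len(union) if union else 0.0


def complementarity_with(target: str, hits: Dict[str, Set[str]]) -> Dict[str, float]:
    """Complementarity of one scoring function with each of the others."""
    return {name: complementarity(hits[target], s) for name, s in hits.items() if name != target}


def mean_pairwise_complementarity(hits: Dict[str, Set[str]]) -> float:
    """Average complementarity over all unordered pairs of scoring functions."""
    pairs = list(itertools.combinations(hits, 2))
    if not pairs:
        raise ValueError("need at least two scoring functions")
    return sum(complementarity(hits[a], hits[b]) for a, b in pairs) / len(pairs)


# Hypothetical success sets for three scoring functions.
hits = {"f1": {"x", "y"}, "f2": {"y", "z"}, "f3": {"x", "y"}}
print({k: round(v, 3) for k, v in complementarity_with("f1", hits).items()})  # {'f2': 0.667, 'f3': 0.0}
print(round(mean_pairwise_complementarity(hits), 3))  # 0.444
```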

3.4. Some Limitations and Prospects

PPD methods that cannot handle large conformational changes induced by complex formation perform poorly on such complexes [2]. Indeed, both NPPD and IRAD failed to produce a Top100 success hit for the complexes in the benchmark set with the largest unbound/bound IRMSDs, which indicate a significant change in conformation between the unbound and bound forms of the complex (Figure 4). However, conformational change is not the only cause of failure in PPD predictions. Figure 4 shows that if sampling could not produce a sufficient number, say 300, of positive (good) poses as defined by IRMSD < 2.5 Å (see Figure 1b) to score, the likelihood of either NPPD or IRAD succeeding decreased drastically, even for complexes considered “rigid” [65]. Further analysis indicated that some of these “rigid” complexes had a particularly small interface and hence might be difficult to sample and predict [77]. Since the best current scoring functions all performed similarly (Figure 3), we speculate that the same two factors, conformational change and insufficient sampling of good poses, also limit the success of other PPD methods. Note that, while the sampling of good poses was unbalanced among complexes, the distribution of the attributes used by NPPD was not (Figure 4), suggesting that sampling bias would not significantly affect training of the Bayesian model. While it is not entirely clear to us what gave rise to the apparently poor correlation between the number of good poses sampled and the unbound/bound IRMSD observed in Figure 4, it is notable that NPPD was better than IRAD for a few of the complexes with the smallest unbound/bound IRMSDs and poor sampling, whereas IRAD did much better than NPPD for those ranked next in unbound/bound IRMSD (roughly between complexes 1PPE and 2QFW in Figure 4), which contributes partly to the high complementarity between the two methods (Table 1).
Taking all these results together, we conclude that, while combining the different scoring functions is still likely to improve PPD performance significantly, the main barriers to overcome remain those arising from sampling and conformational change.
Figure 3. Benchmark results for NPPD and complementarity of NPPD and several best performing PPD scoring functions. (a) The 20 best performing PPD scoring functions ordered, from left to right, by increasing Top10 success rate. All data except those for NPPD were taken from [67]. Note that the Top1, Top10, and Top100 success rates for each method, shown, respectively, as the left, center, and right bar in each group, were computed using a set of unbound docking poses (~500 for each of 118 complexes) generated by SwarmDock [68], which was different from the set generated by ZDOCK used in Figure 2 and Table 1. The leave-one-out Bayesian model of NPPD was therefore derived using these SwarmDock poses, but otherwise using the same procedures described in Figure 1. The portions of success rates for high, medium, and acceptable quality poses are shown, respectively, in red, orange, and yellow, the criteria for the three quality measures being those used by CAPRI [2]; (b) Complementarity between NPPD and each of the other 16 best performing PPD scoring functions. The blue, purple, and green bars indicate the complementarity, as defined in Table 1, computed based on, respectively, the Top1, Top10, or Top100 success rates. The horizontal blue, purple, and green lines are the averaged complementarity for, respectively, the Top1, Top10, or Top100 success rates over all pairs of the 16 scoring functions (three of the scoring functions (SIPPER, PYDOCK_TOT, and PROPNSTS) of the 19 compared in (a) were not included because the data were not made available to us). References for these 19 PPD scoring functions can be found in Reference [67] and references therein.
In this work, instead of using two-fold validation as did Chang et al. [38], we opted for leave-one-out validation so that every complex of the benchmark set could serve as a test case and the performance of NPPD could be fully compared with that of other scoring functions. Technical differences aside, machine learning techniques are known to be unreliable for extrapolation, and only methods based on first-principles physics can make genuine predictions that would not fail badly when encountering complexes with an unusual interface [78]. However, as such an ideal method is not yet in sight, there is room and merit in further developing empirical methods such as NPPD, since a new method, particularly a nonconventional one, can often reveal shortfalls of existing methods.
Figure 4. Number of positive poses and Dp-p plotted against unbound/bound IRMSD. The 176 benchmark complexes of ZDOCK are ordered by increasing unbound/bound IRMSD (the best RMSD of interface residues superimposed between the unbound and bound forms of the complex), with the PDB ID of every 5th complex indicated on the X-axis. The dashed line marks 300 positive poses. The top half of the figure shows the averages and standard deviations of the parameter Dp-p computed from the positive poses of each complex; all other attributes used by NPPD, and those for negative poses, showed a similar random distribution [64].

4. Conclusions

In this work, we showed that a Bayesian model based on the dyadic parameters of AANs of docking poses performed well compared to the best scoring functions currently used for PPD predictions. Furthermore, the results showed that our method can complement other methods by finding good poses for a significant number of complexes missed by these methods. Taken together with the findings in a recent large-scale evaluation of 115 PPD scoring functions [67], these results suggest that non-conventional scoring functions, such as that developed in the present study, are worthy of further investigation in the effort to improve the prediction of protein complex structures.

Acknowledgments

We thank Fernández-Recio for providing the SwarmDock models. This work was supported by the Ministry of Science and Technology, Taiwan (grant nos. NSC97-2311-B-001-011-MY3 and NSC-97-2627-P-001-004). We thank Tom Barkas for English editing.

Author Contributions

Edward S.C. Shih and Ming-Jing Hwang conceived and designed the experiments, analyzed the data, and wrote the paper, while Edward S.C. Shih performed the experiments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mosca, R.; Pons, T.; Ceol, A.; Valencia, A.; Aloy, P. Towards a detailed atlas of protein-protein interactions. Curr. Opin. Struct. Biol. 2013, 23, 929–940.
  2. Lensink, M.F.; Wodak, S.J. Docking, scoring, and affinity prediction in CAPRI. Proteins 2013, 81, 2082–2095.
  3. Moal, I.H.; Moretti, R.; Baker, D.; Fernandez-Recio, J. Scoring functions for protein-protein interactions. Curr. Opin. Struct. Biol. 2013, 23, 862–867.
  4. Shih, E.S.C.; Hwang, M.J. A critical assessment of information-guided protein-protein docking predictions. Mol. Cell Proteomics 2013, 12, 679–686.
  5. Shih, E.S.C.; Hwang, M.J. On the use of distance constraints in protein-protein docking computations. Proteins Struct. Funct. Bioinform. 2012, 80, 194–205.
  6. Viswanath, S.; Ravikant, D.V.; Elber, R. Improving ranking of models for protein complexes with side chain modeling and atomic potentials. Proteins 2013, 81, 592–606.
  7. Pallara, C.; Jimenez-Garcia, B.; Perez-Cano, L.; Romero-Durana, M.; Solernou, A.; Grosdidier, S.; Pons, C.; Moal, I.H.; Fernandez-Recio, J. Expanding the frontiers of protein-protein modeling: From docking and scoring to binding affinity predictions and other challenges. Proteins 2013, 81, 2192–2200.
  8. Pons, C.; Talavera, D.; de la Cruz, X.; Orozco, M.; Fernandez-Recio, J. Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): A new efficient potential for protein-protein docking. J. Chem. Inf. Model. 2011, 51, 370–377.
  9. Mitra, P.; Pal, D. Using correlated parameters for improved ranking of protein-protein docking decoys. J. Comput. Chem. 2011, 32, 787–796.
  10. Tobi, D. Designing coarse grained-and atom based-potentials for protein-protein docking. BMC Struct. Biol. 2010, 10, 40.
  11. Demir-Kavuk, O.; Krull, F.; Chae, M.H.; Knapp, E.W. Predicting protein complex geometries with linear scoring functions. Genome Inform. 2010, 24, 21–30.
  12. Pierce, B.; Weng, Z. A combination of rescoring and refinement significantly improves protein docking performance. Proteins 2008, 72, 270–279.
  13. Andrusier, N.; Nussinov, R.; Wolfson, H.J. FireDock: Fast interaction refinement in molecular docking. Proteins 2007, 69, 139–159.
  14. Cheng, T.M.; Blundell, T.L.; Fernandez-Recio, J. pyDock: Electrostatics and desolvation for effective scoring of rigid-body protein-protein docking. Proteins 2007, 68, 503–515.
  15. Murphy, J.; Gatchell, D.W.; Prasad, J.C.; Vajda, S. Combination of scoring functions improves discrimination in protein-protein docking. Proteins 2003, 53, 840–854.
  16. Gabb, H.A.; Jackson, R.M.; Sternberg, M.J. Modelling protein docking using shape complementarity, electrostatics and biochemical information. J. Mol. Biol. 1997, 272, 106–120.
  17. Liu, S.; Vakser, I.A. DECK: Distance and environment-dependent, coarse-grained, knowledge-based potentials for protein-protein docking. BMC Bioinform. 2011, 12, 280.
  18. Lu, H.; Lu, L.; Skolnick, J. Development of unified statistical potentials describing protein-protein interactions. Biophys. J. 2003, 84, 1895–1901.
  19. Miyazawa, S.; Jernigan, R.L. Self-consistent estimation of inter-residue protein contact energies based on an equilibrium mixture approximation of residues. Proteins 1999, 34, 49–68.
  20. Omori, S.; Kitao, A. CyClus: A fast, comprehensive cylindrical interface approximation clustering/reranking method for rigid-body protein-protein docking decoys. Proteins 2013, 81, 1005–1016.
  21. Chuang, G.Y.; Kozakov, D.; Brenke, R.; Comeau, S.R.; Vajda, S. DARS (Decoys as the Reference State) potentials for protein-protein docking. Biophys. J. 2008, 95, 4217–4227.
  22. Muller, W.; Sticht, H. A protein-specifically adapted scoring function for the reranking of docking solutions. Proteins 2007, 67, 98–111.
  23. Esmaielbeiki, R.; Nebel, J.C. Scoring docking conformations using predicted protein interfaces. BMC Bioinform. 2014, 15, 171.
  24. Anishchenko, I.; Kundrotas, P.J.; Tuzikov, A.V.; Vakser, I.A. Protein models: The grand challenge of protein docking. Proteins 2014, 82, 278–287.
  25. Kundrotas, P.J.; Vakser, I.A. Global and local structural similarity in protein-protein complexes: Implications for template-based docking. Proteins 2013, 81, 2137–2142.
  26. Torchala, M.; Moal, I.H.; Chaleil, R.A.; Agius, R.; Bates, P.A. A Markov-chain model description of binding funnels to enhance the ranking of docked solutions. Proteins 2013, 81, 2143–2149.
  27. London, N.; Schueler-Furman, O. Funnel hunting in a rough terrain: Learning and discriminating native energy funnels. Structure 2008, 16, 269–279.
  28. Kozakov, D.; Schueler-Furman, O.; Vajda, S. Discrimination of near-native structures in protein-protein docking by testing the stability of local minima. Proteins 2008, 72, 993–1004.
  29. Schneidman-Duhovny, D.; Rossi, A.; Avila-Sakar, A.; Kim, S.J.; Velazquez-Muriel, J.; Strop, P.; Liang, H.; Krukenberg, K.A.; Liao, M.; Kim, H.M.; et al. A method for integrative structure determination of protein-protein complexes. Bioinformatics 2012, 28, 3282–3289.
  30. De Vries, S.J.; Bonvin, A.M. CPORT: A consensus interface predictor and its performance in prediction-driven docking with HADDOCK. PLOS ONE 2011, 6, e17695.
  31. Gu, S.; Koehl, P.; Hass, J.; Amenta, N. Surface-histogram: A new shape descriptor for protein-protein docking. Proteins 2012, 80, 221–238.
  32. Shentu, Z.; al Hasan, M.; Bystroff, C.; Zaki, M.J. Context shapes: Efficient complementary shape matching for protein-protein docking. Proteins 2008, 70, 1056–1073.
  33. Fink, F.; Hochrein, J.; Wolowski, V.; Merkl, R.; Gronwald, W. PROCOS: Computational analysis of protein-protein complexes. J. Comput. Chem. 2011, 32, 2575–2586.
  34. Bourquard, T.; Bernauer, J.; Aze, J.; Poupon, A. A collaborative filtering approach for protein-protein docking scoring functions. PLOS ONE 2011, 6, e18541.
  35. Chae, M.H.; Krull, F.; Lorenzen, S.; Knapp, E.W. Predicting protein complex geometries with a neural network. Proteins 2010, 78, 1026–1039.
  36. Andreani, J.; Faure, G.; Guerois, R. InterEvScore: A novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution. Bioinformatics 2013, 29, 1742–1749.
  37. Pons, C.; Glaser, F.; Fernandez-Recio, J. Prediction of protein-binding areas by small-world residue networks and application to docking. BMC Bioinform. 2011, 12, 378.
  38. Chang, S.; Jiao, X.; Li, C.H.; Gong, X.Q.; Chen, W.Z.; Wang, C.X. Amino acid network and its scoring application in protein-protein docking. Biophys. Chem. 2008, 134, 111–118.
  39. Zhang, X.; Perica, T.; Teichmann, S.A. Evolution of protein structures and interactions from the perspective of residue contact networks. Curr. Opin. Struct. Biol. 2013, 23, 954–963.
  40. Di Paola, L.; de Ruvo, M.; Paci, P.; Santoni, D.; Giuliani, A. Protein contact networks: An emerging paradigm in chemistry. Chem. Rev. 2013, 113, 1598–1613.
  41. Greene, L.H. Protein structure networks. Brief. Funct. Genomics 2012, 11, 469–478.
  42. Giollo, M.; Martin, A.J.; Walsh, I.; Ferrari, C.; Tosatto, S.C. NeEMO: A method using residue interaction networks to improve prediction of protein stability upon mutation. BMC Genomics 2014, 15, S7.
  43. Krishnan, A.; Zbilut, J.P.; Tomita, M.; Giuliani, A. Proteins as networks: Usefulness of graph theory in protein science. Curr. Protein Pept. Sci. 2008, 9, 28–38.
  44. Yan, W.; Zhou, J.; Sun, M.; Chen, J.; Hu, G.; Shen, B. The construction of an amino acid network for understanding protein structure and function. Amino Acids 2014, 46, 1419–1439.
  45. Peng, W.; Wang, J.; Chen, L.; Zhong, J.; Zhang, Z.; Pan, Y. Predicting protein functions by using unbalanced bi-random walk algorithm on protein-protein interaction network and functional interrelationship network. Curr. Protein Pept. Sci. 2014, 15, 529–539.
  46. Axe, J.M.; Yezdimer, E.M.; O’Rourke, K.F.; Kerstetter, N.E.; You, W.; Chang, C.E.; Boehr, D.D. Amino acid networks in a (beta/alpha)(8) barrel enzyme change during catalytic turnover. J. Am. Chem. Soc. 2014, 136, 6818–6821.
  47. Lee, B.C.; Park, K.; Kim, D. Analysis of the residue-residue coevolution network and the functionally important residues in proteins. Proteins 2008, 72, 863–872.
  48. Luo, Q.; Hamer, R.; Reinert, G.; Deane, C.M. Local network patterns in protein-protein interfaces. PLOS ONE 2013, 8, e57031.
  49. Johnson, M.E.; Hummer, G. Interface-resolved network of protein-protein interactions. PLOS Comput. Biol. 2013, 9, e1003065.
  50. Goebels, F.; Frishman, D. Prediction of protein interaction types based on sequence and network features. BMC Syst. Biol. 2013, 7, S5.
  51. Del Sol, A.; O’Meara, P. Small-world network approach to identify key residues in protein-protein interaction. Proteins 2005, 58, 672–682.
  52. Maetschke, S.R.; Yuan, Z. Exploiting structural and topological information to improve prediction of RNA-protein binding sites. BMC Bioinform. 2009, 10, 341.
  53. Sathyapriya, R.; Vijayabaskar, M.S.; Vishveshwara, S. Insights into protein-DNA interactions through structure network analysis. PLOS Comput. Biol. 2008, 4, e1000170.
  54. Montiel Molina, H.M.; Millan-Pacheco, C.; Pastor, N.; del Rio, G. Computer-based screening of functional conformers of proteins. PLOS Comput. Biol. 2008, 4, e1000009.
  55. Bode, C.; Kovacs, I.A.; Szalay, M.S.; Palotai, R.; Korcsmaros, T.; Csermely, P. Network analysis of protein dynamics. FEBS Lett. 2007, 581, 2776–2782.
  56. Li, J.; Wang, J.; Wang, W. Identifying folding nucleus based on residue contact networks of proteins. Proteins 2008, 71, 1899–1907.
  57. Bagler, G.; Sinha, S. Assortative mixing in protein contact networks and protein folding kinetics. Bioinformatics 2007, 23, 1760–1767.
  58. Vendruscolo, M.; Dokholyan, N.V.; Paci, E.; Karplus, M. Small-world view of the amino acids that play a key role in protein folding. Phys. Rev. E 2002, 65, 061910.
  59. Bhattacharyya, M.; Bhat, C.R.; Vishveshwara, S. An automated approach to network features of protein structure ensembles. Protein Sci. 2013, 22, 1399–1416.
  60. Khor, S. Towards an integrated understanding of the structural characteristics of protein residue networks. Theory Biosci. 2012, 131, 61–75.
  61. Estrada, E. Universality in protein residue networks. Biophys. J. 2010, 98, 890–900.
  62. Brinda, K.V.; Vishveshwara, S. A network representation of protein structures: Implications for protein stability. Biophys. J. 2005, 89, 4159–4170.
  63. Bagler, G.; Sinha, S. Network properties of protein structures. Phys. A 2005, 346, 27–33.
  64. Shih, E.S.C.; Hwang, M.-J.; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan. Unpublished data. 2015.
  65. Hwang, H.; Vreven, T.; Janin, J.; Weng, Z. Protein-protein docking benchmark version 4.0. Proteins 2010, 78, 3111–3114.
  66. Pierce, B.G.; Hourai, Y.; Weng, Z.P. Accelerating protein docking in ZDOCK using an advanced 3D convolution library. PLOS ONE 2011, 6, e24657.
  67. Moal, I.H.; Torchala, M.; Bates, P.A.; Fernandez-Recio, J. The scoring of poses in protein-protein docking: Current capabilities and future directions. BMC Bioinform. 2013, 14, 286.
  68. Torchala, M.; Bates, P.A. Predicting the structure of protein-protein complexes using the SwarmDock Web Server. Methods Mol. Biol. 2014, 1137, 181–197.
  69. Eisenberg, D.; Weiss, R.M.; Terwilliger, T.C.; Wilcox, W. Hydrophobic moments and protein structure. Faraday Symp. Chem. Soc. 1982, 17, 109–120.
  70. Park, J.; Barabasi, A.L. Distribution of node characteristics in complex networks. Proc. Natl. Acad. Sci. USA 2007, 104, 17916–17920.
  71. Fienberg, S.E.; Meyer, M.M.; Wasserman, S.S. Statistical analysis of multiple sociometric relations. J. Am. Stat. Assoc. 1985, 80, 51–67.
  72. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA data mining software: An update. SIGKDD Explor. Newsl. 2009, 11, 10–18.
  73. Needham, C.J.; Bradford, J.R.; Bulpitt, A.J.; Westhead, D.R. Inference in Bayesian networks. Nat. Biotechnol. 2006, 24, 51–53.
  74. Vreven, T.; Hwang, H.; Weng, Z. Integrating atom-based and residue-based scoring functions for protein-protein docking. Protein Sci. 2011, 20, 1576–1586.
  75. Gray, J.J.; Moughon, S.; Wang, C.; Schueler-Furman, O.; Kuhlman, B.; Rohl, C.A.; Baker, D. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J. Mol. Biol. 2003, 331, 281–299.
  76. Khashan, R.; Zheng, W.; Tropsha, A. Scoring protein interaction decoys using exposed residues (SPIDER): A novel multibody interaction scoring function based on frequent geometric patterns of interfacial residues. Proteins 2012, 80, 2207–2217.
  77. Ritchie, D.W.; Kozakov, D.; Vajda, S. Accelerating and focusing protein-protein docking correlations using multi-dimensional rotational FFT generating functions. Bioinformatics 2008, 24, 1865–1873.
  78. Moreira, I.S.; Martins, J.M.; Coimbra, J.T.; Ramos, M.J.; Fernandes, P.A. A new scoring function for protein-protein docking that identifies native structures with unprecedented accuracy. Phys. Chem. Chem. Phys. 2015, 17, 2378–2387.

Share and Cite

MDPI and ACS Style

Shih, E.S.C.; Hwang, M.-J. NPPD: A Protein-Protein Docking Scoring Function Based on Dyadic Differences in Networks of Hydrophobic and Hydrophilic Amino Acid Residues. Biology 2015, 4, 282-297. https://doi.org/10.3390/biology4020282

AMA Style

Shih ESC, Hwang M-J. NPPD: A Protein-Protein Docking Scoring Function Based on Dyadic Differences in Networks of Hydrophobic and Hydrophilic Amino Acid Residues. Biology. 2015; 4(2):282-297. https://doi.org/10.3390/biology4020282

Chicago/Turabian Style

Shih, Edward S. C., and Ming-Jing Hwang. 2015. "NPPD: A Protein-Protein Docking Scoring Function Based on Dyadic Differences in Networks of Hydrophobic and Hydrophilic Amino Acid Residues." Biology 4, no. 2: 282-297. https://doi.org/10.3390/biology4020282
