Modeling Side Chains in the Three-Dimensional Structure of Proteins for Post-Translational Modifications
Abstract
1. Introduction
2. Results
- Markov chain Monte Carlo (MCMC) sampling from rotamer libraries (MCMC rotamer). Dunbrack rotamer libraries were used for canonical amino acid residues, and in-house libraries were assembled for five common post-translational modifications.
- Markov chain Monte Carlo sampling outside the rotamer library (MCMC off-rotamer). This algorithm allows side-chain torsion angles to take values beyond those listed in the rotamer library. The rotamer library is used only to control the magnitude of the angle changes.
- Genetic algorithm (GA-rotamer): an evolutionary search algorithm whose initial population is drawn from the rotamer library.
- Genetic algorithm (GA-random): a genetic algorithm whose initial population is drawn from a uniform distribution. The rotamer library is not used in this algorithm.
- All side chains were removed from the PDB structure.
- All side chains were restored, and the side chains within a radius of 10 Å from the mutation point were repacked using the algorithms described above.
- Abnormally close atoms (steric clashes);
- Backbone dihedral angles outside the allowed regions of the Ramachandran map;
- Abnormal side-chain angles or rotamer outliers.
- The best results in our study, in terms of both accuracy and processing speed, were demonstrated by the Rosetta software package. This was expected, since Rosetta is one of the leading molecular modeling packages and is widely used by researchers around the world. According to the published documentation [17], Rosetta also uses the MCMC algorithm inside its software implementation, and the difference in performance apparently depends only on the selected scoring function.
- The FoldX software package also generally shows good results, but it is much slower than all the other algorithms considered. In addition, FoldX supports only two PTMs (SEP and TPO), so we could not fully evaluate its results.
- The MCMC algorithm with sampling from the rotamer library shows good results, close to those of Rosetta, and even better for some PTMs.
- The results of the MCMC off-rotamer algorithm are slightly worse but still acceptable. If we thoroughly analyze the results provided by this algorithm, we can observe that in some cases its performance is better than that of other algorithms, but no regular pattern could be identified.
- The genetic algorithms performed worse overall than all the other methods, yet their results surprised us. The GA initialized with random numbers from a uniform distribution works better than the GA initialized from the rotamer library. This makes it possible to dispense with rotamer libraries altogether when searching for the optimal side-chain positions and still obtain acceptable accuracy, which is especially important for rare non-canonical amino acid residues. A detailed analysis of the GA results shows a picture similar to that for MCMC off-rotamer: some structures are determined better than by the other algorithms, and some worse. In general, the GA results are unstable, but in our view these algorithms show great promise for this problem.
3. Discussion
4. Materials and Methods
4.1. Rotamer Library
4.2. Side-Chain Modeling and Repacking
4.3. Markov Chain Monte Carlo (MCMC) Sampling from the Rotamer Library
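As an illustration of sampling from a rotamer library with the Metropolis acceptance rule, a minimal sketch follows. The names `ROTAMER_LIBRARY`, `toy_energy`, and `kT`, the library values, and the scoring function are hypothetical placeholders and are not taken from this work; proposals are drawn uniformly from the library entries of a randomly chosen site.

```python
import math
import random

# Hypothetical toy library: residue name -> list of chi-angle tuples (degrees).
ROTAMER_LIBRARY = {
    "SER": [(60.0,), (-60.0,), (180.0,)],
    "SEP": [(-60.0,), (180.0,)],
}

def toy_energy(state):
    """Placeholder score (assumption): zero when every chi1 equals -60 degrees."""
    return sum(1.0 - math.cos(math.radians(chis[0] + 60.0)) for chis in state.values())

def mcmc_rotamer_sampling(residues, energy, n_steps=1000, kT=1.0, seed=0):
    """Metropolis sampling over discrete rotamers; returns the best state seen."""
    rng = random.Random(seed)
    # Start from the first library rotamer at every site.
    state = {site: ROTAMER_LIBRARY[name][0] for site, name in residues.items()}
    current_e = energy(state)
    best, best_e = dict(state), current_e
    sites = list(residues)
    for _ in range(n_steps):
        site = rng.choice(sites)
        # Symmetric proposal: uniform choice among the library rotamers of the site.
        trial = dict(state)
        trial[site] = rng.choice(ROTAMER_LIBRARY[residues[site]])
        trial_e = energy(trial)
        delta = trial_e - current_e
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            state, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = dict(state), current_e
    return best, best_e

residues = {10: "SER", 11: "SEP"}  # site id -> residue name (illustrative)
best_state, best_energy = mcmc_rotamer_sampling(residues, toy_energy)
```

In a real setting the placeholder score would be replaced by a physical or statistical scoring function, and only sites inside the repacking radius would be sampled.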
4.4. Markov Chain Monte Carlo Sampling outside the Rotamer Library
- The user defines the number of sampling steps and the radius (by default, R = 10.0 Å).
- At each sampling step, a site is randomly selected from within the user-defined radius. For that site, the side-chain dihedral angles and the mean deviation of each angle are randomly selected from the rotamer library.
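The two steps above can be sketched as follows. This is only an illustration: the function names, the toy coordinates, and the library layout (pairs of mean angles and deviations) are assumptions, as is the use of a Gaussian draw around the library mean, which the text does not specify.

```python
import math
import random

def sites_within_radius(center, coords, radius=10.0):
    """Ids of sites whose representative atom lies within `radius` angstroms of `center`."""
    return [sid for sid, xyz in coords.items() if math.dist(center, xyz) <= radius]

def wrap_angle(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def propose_off_rotamer(site_rotamers, rng):
    """Pick a library entry (means, deviations), then draw each chi around its mean.

    The Gaussian perturbation is an assumption made for this sketch.
    """
    means, sigmas = rng.choice(site_rotamers)
    return tuple(wrap_angle(rng.gauss(m, s)) for m, s in zip(means, sigmas))

rng = random.Random(42)
# Hypothetical representative-atom coordinates (angstroms) for three sites.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (20.0, 0.0, 0.0)}
shell = sites_within_radius((0.0, 0.0, 0.0), coords, radius=10.0)

# Hypothetical library: site id -> list of (mean chi angles, deviations) in degrees.
library = {
    1: [((-60.0, 180.0), (10.0, 12.0)), ((60.0, 60.0), (9.0, 15.0))],
    2: [((180.0, 60.0), (11.0, 14.0))],
}
site = rng.choice(shell)                      # random site inside the radius
new_chis = propose_off_rotamer(library[site], rng)
```

The proposed angles would then be accepted or rejected by the same acceptance rule as in sampling from the rotamer library.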
4.5. Modeling Using a Genetic Algorithm
- Variability: the characteristics of the individuals that make up the population may change;
- Heredity: some traits are consistently transmitted from an individual to its descendants;
- Natural selection: better-adapted individuals are more successful in the struggle for survival and leave more offspring in the next generation.
- Genetic algorithms are rarely used to solve this problem. According to our hypothesis, they can show good results, especially for amino acid residues that are sparsely represented in rotamer libraries, owing to the greater variability of the solutions produced by mutation and crossover.
- Genetic algorithms have a number of advantages over traditional search and optimization algorithms:
- Ability to perform global optimization;
- Applicability to problems with complex mathematical representation;
- Resistance to noise;
- Support for parallelization and distributed processing.
4.5.1. Creating the Initial Population
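The two initialization strategies named in the Results (GA-rotamer, seeded from the rotamer library, and GA-random, seeded from a uniform distribution) can be sketched as below. The encoding of an individual as a flat list of torsion angles, the function names, and the library values are assumptions made for illustration.

```python
import random

def init_from_rotamer_library(site_rotamers, pop_size, rng):
    """GA-rotamer style: every individual takes one library rotamer per site."""
    population = []
    for _ in range(pop_size):
        individual = []
        for rotamers in site_rotamers:
            individual.extend(rng.choice(rotamers))  # append the chi angles of one rotamer
        population.append(individual)
    return population

def init_uniform(n_angles, pop_size, rng, low=-180.0, high=180.0):
    """GA-random style: every torsion angle is drawn uniformly; no library is used."""
    return [[rng.uniform(low, high) for _ in range(n_angles)] for _ in range(pop_size)]

rng = random.Random(1)
# Hypothetical library for two sites: one chi angle and two chi angles, respectively.
site_rotamers = [
    [(60.0,), (-60.0,), (180.0,)],
    [(-60.0, 180.0), (60.0, 60.0)],
]
pop_rotamer = init_from_rotamer_library(site_rotamers, pop_size=20, rng=rng)
pop_random = init_uniform(n_angles=3, pop_size=20, rng=rng)
```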
4.5.2. Selection
- In each round of selection, k randomly selected individuals from the population take part.
- The individual whose fitness is higher wins and is selected to form the next generation.
- The process is repeated until the number of “parents” becomes equal to the population size.
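The selection rounds described above correspond to tournament selection, which can be sketched as follows. Drawing the k contestants without replacement and the default k = 3 are assumptions; the text does not state either.

```python
import random

def tournament_selection(population, fitness, k=3, rng=None):
    """Return as many parents as there are individuals in the population.

    `fitness[i]` is the fitness of `population[i]`; higher is better.
    """
    rng = rng or random.Random()
    parents = []
    while len(parents) < len(population):
        # One round: k distinct individuals compete, the fittest one wins.
        contestants = rng.sample(range(len(population)), k)
        winner = max(contestants, key=lambda i: fitness[i])
        parents.append(population[winner])
    return parents

population = ["A", "B", "C", "D", "E"]      # placeholder individuals
fitness = [0.1, 0.9, 0.4, 0.7, 0.2]         # placeholder fitness values
parents = tournament_selection(population, fitness, k=3, rng=random.Random(7))
```

Larger values of k increase the selection pressure: with k equal to the population size, every round is won by the single fittest individual.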
4.5.3. Fitness Function
4.5.4. Crossing and Mutation
Crossover Operators
(a) offspring1 = 1/2 [(1 + β) parent1 + (1 − β) parent2],
(b) offspring2 = 1/2 [(1 − β) parent1 + (1 + β) parent2].
- The average of descendants is equal to the average of parents.
- When β = 1, the descendants are exact copies of the parents.
- When β < 1, the offspring are located closer to each other than the parents.
- When β > 1, the offspring are further apart than the parents.
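The listed properties can be checked numerically with a short sketch. The first-offspring expression used here is the mirror image of formula (b), which is the form required for the offspring average to equal the parent average. The function `sample_beta`, which draws β from the polynomial distribution commonly used in simulated binary crossover, and its parameter `eta` are assumptions and are not taken from the text.

```python
import random

def sbx_pair(parent1, parent2, beta):
    """Blend two parents gene by gene with spread factor beta."""
    child1 = [0.5 * ((1 + beta) * a + (1 - beta) * b) for a, b in zip(parent1, parent2)]
    child2 = [0.5 * ((1 - beta) * a + (1 + beta) * b) for a, b in zip(parent1, parent2)]
    return child1, child2

def sample_beta(eta, rng):
    """Draw beta as in simulated binary crossover (assumed form; eta is the distribution index)."""
    u = rng.random()
    if u <= 0.5:
        return (2.0 * u) ** (1.0 / (eta + 1.0))
    return (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))

p1, p2 = [60.0, -60.0], [180.0, 60.0]        # illustrative torsion angles (degrees)
same1, same2 = sbx_pair(p1, p2, beta=1.0)    # copies of the parents
near1, near2 = sbx_pair(p1, p2, beta=0.5)    # offspring closer together than the parents
far1, far2 = sbx_pair(p1, p2, beta=2.0)      # offspring further apart than the parents
random_beta = sample_beta(eta=2.0, rng=random.Random(0))
```

Because the difference between the offspring equals β times the difference between the parents, β directly controls how far the search moves away from the parental solutions. For torsion angles, the offspring values may need to be wrapped back into the periodic range.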
Mutation Operators
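The reference list includes a study of polynomial mutation, so one plausible operator for torsion angles is sketched here. This is an assumption for illustration, not a description of the operator used in this work: the function name, the parameters `eta` and `p_mut`, their default values, and the periodic wrapping of the mutated angle are all hypothetical choices.

```python
import random

def polynomial_mutation(angles, eta=20.0, p_mut=0.1, low=-180.0, high=180.0, rng=None):
    """Basic polynomial mutation applied gene by gene with probability p_mut."""
    rng = rng or random.Random()
    span = high - low
    mutated = []
    for x in angles:
        if rng.random() < p_mut:
            u = rng.random()
            # Perturbation delta lies in [-1, 1]; large eta keeps it close to zero.
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
            # Torsion angles are periodic, so wrap instead of clipping (assumption).
            x = (x + delta * span - low) % span + low
        mutated.append(x)
    return mutated

individual = [60.0, -60.0, 180.0, -175.0]    # illustrative torsion angles (degrees)
unchanged = polynomial_mutation(individual, p_mut=0.0, rng=random.Random(3))
mutant = polynomial_mutation(individual, p_mut=1.0, rng=random.Random(3))
```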
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Krassowski, M.; Paczkowska, M.; Cullion, K.; Huang, T.; Dzneladze, I.; Ouellette, B.F.F.; Yamada, J.T.; Fradet-Turcotte, A.; Reimand, J. ActiveDriverDB: Human Disease Mutations and Genome Variation in Post-Translational Modification Sites of Proteins. Nucleic Acids Res. 2018, 46, D901–D910.
- Ramazi, S.; Zahiri, J. Post-Translational Modifications in Proteins: Resources, Tools and Prediction Methods. Database J. Biol. Databases Curation 2021, 2021, baab012.
- Duan, G.; Walther, D. The Roles of Post-Translational Modifications in the Context of Protein Interaction Networks. PLoS Comput. Biol. 2015, 11, e1004049.
- Yang, Y.-H.; Wen, R.; Yang, N.; Zhang, T.-N.; Liu, C.-F. Roles of Protein Post-Translational Modifications in Glucose and Lipid Metabolism: Mechanisms and Perspectives. Mol. Med. 2023, 29, 93.
- Kokkinidis, M.; Glykos, N.M.; Fadouloglou, V.E. Catalytic Activity Regulation through Post-Translational Modification: The Expanding Universe of Protein Diversity. Adv. Protein Chem. Struct. Biol. 2020, 122, 97–125.
- Lee, J.M.; Hammarén, H.M.; Savitski, M.M.; Baek, S.H. Control of protein stability by post-translational modifications. Nat. Commun. 2023, 14, 201.
- Korovesis, D.; Rubio-Tomás, T.; Tavernarakis, N. Oxidative Stress in Age-Related Neurodegenerative Diseases: An Overview of Recent Tools and Findings. Antioxidants 2023, 12, 131.
- Dilek, O. Current Probes for Imaging Carbonylation in Cellular Systems and Their Relevance to Progression of Diseases. Technol. Cancer Res. Treat. 2022, 21, 1–16.
- Tsikas, D. Post-Translational Modifications (PTM): Analytical Approaches, Signaling, Physiology and Pathophysiology—Part I. Amino Acids 2021, 53, 485–487.
- Colbes, J.; Corona, R.I.; Lezcano, C.; Rodríguez, D.; Brizuela, C.A. Protein side-chain packing problem: Is there still room for improvement? Brief. Bioinform. 2017, 18, 1033–1043.
- Dunbrack, R.L.; Cohen, F.E. Bayesian Statistical Analysis of Protein Side-Chain Rotamer Preferences. Protein Sci. Publ. Protein Soc. 1997, 6, 1661–1681.
- Xu, G.; Wang, Q.; Ma, J. OPUS-Rota3: Improving Protein Side-Chain Modeling by Deep Neural Networks and Ensemble Methods. J. Chem. Inf. Model. 2020, 60, 6691–6697.
- Papers with Code. Prediction of Amino Acid Side Chain Conformation Using a Deep Neural Network. Available online: https://paperswithcode.com/paper/prediction-of-amino-acid-side-chain (accessed on 20 July 2023).
- Liu, K.; Ni, Z.; Zhou, Z.; Tan, S.; Zou, X.; Xing, H.; Sun, X.; Han, Q.; Wu, J.; Fan, J. Molecular Modeling with Machine-Learned Universal Potential Functions. arXiv 2021.
- Nagata, K.; Randall, A.; Baldi, P. SIDEpro: A Novel Machine Learning Approach for the Fast and Accurate Prediction of Side-Chain Conformations. Proteins 2012, 80, 142–153.
- Du, Y.; Meier, J.; Ma, J.; Fergus, R.; Rives, A. Energy-Based Models for Atomic-Resolution Protein Conformations. arXiv 2020.
- RepackingRefiner. Available online: https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/movers_pages/RepackingRefinerMover (accessed on 20 July 2023).
- Pracht, P.; Bohle, F.; Grimme, S. Automated Exploration of the Low-Energy Chemical Space with Fast Quantum Chemical Methods. Phys. Chem. Chem. Phys. 2020, 22, 7169–7192.
- Shapovalov, M.V.; Dunbrack, R.L. A Smoothed Backbone-Dependent Rotamer Library for Proteins Derived from Adaptive Kernel Density Estimates and Regressions. Structure 2011, 19, 844–858.
- Holm, L.; Sander, C. Database Algorithm for Generating Protein Backbone and Side-Chain Co-Ordinates from a C Alpha Trace Application to Model Building and Detection of Co-Ordinate Errors. J. Mol. Biol. 1991, 218, 183–194.
- Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087–1091.
- Best, R.B.; Zhu, X.; Shim, J.; Lopes, P.E.M.; Mittal, J.; Feig, M.; Mackerell, A.D. Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone φ, ψ and Side-Chain χ(1) and χ(2) Dihedral Angles. J. Chem. Theory Comput. 2012, 8, 3257–3273.
- Krivov, G.G.; Shapovalov, M.V.; Dunbrack, R.L. Improved Prediction of Protein Side-Chain Conformations with SCWRL4. Proteins 2009, 77, 778–795.
- Von Mises, R. Mathematical Theory of Probability and Statistics; Academic Press: New York, NY, USA, 1964; p. 694.
- Mohammad, H. On the Disruption-level of Polynomial Mutation for Evolutionary Multi-objective Optimisation Algorithms. Comput. Inform. 2010, 29, 783–800.
Metric | Description | Reference: Good | Reference: Caution | Reference: Poor |
---|---|---|---|---|
Clashscore | Number of serious steric overlaps (> 0.4 Å) per 1000 atoms. P: percentile. | P ≥ 66 | 66 > P ≥ 33 | P < 33 |
Poor rotamers | Residues whose side chains deviate markedly from rotamer conformations. Out: outlier %. | Out ≤ 0.3% | 0.3% < Out ≤ 1.5% | Out > 1.5% |
Favored rotamers | Percentage of amino acid residues in the favored rotamer regions. Fav: favored % of the total. | Fav ≥ 98% | Fav ≥ 95% | Fav < 95% |
Ramachandran outliers | Residues that lie outside the allowed regions of the Ramachandran map. Out: outlier % of the total. | Out ≤ 0.05% | 0.05% < Out ≤ 0.5% or Out 0.5% and outlier count = 1 | Out > 1.5% or outlier count ≥ 2 |
Ramachandran favored | Percentage of residues in the favored regions of the Ramachandran map. Fav: favored % of the total. | Fav ≥ 98% | Fav ≥ 95% | Fav < 95% |
Ramachandran Z-score | Checks the overall Ramachandran distribution against the expected distribution [2]. | abs(Z-score) ≤ 2% | 2% < abs(Z-score) ≤ 3% | abs(Z-score) > 3% |
Cβ deviations > 0.25 Å | Number of Cβ atoms with an unacceptable deviation from the expected position. | Outlier count = 0 | 0 < Outliers < 5% | Outliers ≥ 5% |
Bad bonds | Number of covalent bonds that deviate significantly from the expected value. Out: outlier bond % of the total. | Out < 0.01% | 0.01% ≤ Out < 0.2% | Out ≥ 0.2% |
Bad angles | Number of bond angles that deviate significantly from the expected value. Out: outlier angle % of the total. | Out < 0.1% | 0.1% ≤ Out < 0.5% | Out ≥ 0.5% |
MolProbity score | Overall assessment of structure quality by the MolProbity service. The MolProbity score combines the clashscore, rotamer, and Ramachandran evaluations into a single score, normalized to be on the same scale as X-ray resolution. P: percentile. | P ≥ 66 | 66 > P ≥ 33 | P < 33 |
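For automated screening of modeled structures, the reference ranges in the table can be encoded as simple classifiers. The sketch below covers only three rows (percentile-based scores, rotamer outliers, and favored percentages); the function names are hypothetical, and the thresholds are copied from the table.

```python
def classify_percentile(p):
    """Clashscore and MolProbity score: percentile P."""
    if p >= 66:
        return "Good"
    if p >= 33:
        return "Caution"
    return "Poor"

def classify_rotamer_outliers(out_pct):
    """Poor rotamers: outlier percentage."""
    if out_pct <= 0.3:
        return "Good"
    if out_pct <= 1.5:
        return "Caution"
    return "Poor"

def classify_favored(fav_pct):
    """Favored rotamers and Ramachandran favored: favored percentage."""
    if fav_pct >= 98:
        return "Good"
    if fav_pct >= 95:
        return "Caution"
    return "Poor"

# Illustrative values only, not results from this study.
report = {
    "Clashscore": classify_percentile(72),
    "Poor rotamers": classify_rotamer_outliers(0.9),
    "Ramachandran favored": classify_favored(93.5),
}
```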
Precursor AA | PTM code | PTM name | Total PDB entries |
---|---|---|---|
SER | SEP | Phosphoserine | 2437 |
THR | TPO | Phosphothreonine | 1864 |
TYR | PTR | O-phosphotyrosine | 1423 |
LYS | MLY | N-dimethyl-lysine | 5296 |
CYS | CSO | S-hydroxycysteine | 1552 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Petrovskiy, D.V.; Nikolsky, K.S.; Rudnev, V.R.; Kulikova, L.I.; Butkova, T.V.; Malsagova, K.A.; Kopylov, A.T.; Kaysheva, A.L. Modeling Side Chains in the Three-Dimensional Structure of Proteins for Post-Translational Modifications. Int. J. Mol. Sci. 2023, 24, 13431. https://doi.org/10.3390/ijms241713431