Further Development of SAMPDI-3D: A Machine Learning Method for Predicting Binding Free Energy Changes Caused by Mutations in Either Protein or DNA
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Cleaning (ProNAB)
2.2. Training Dataset for Protein Mutations
2.3. Training Dataset for DNA Mutations
2.4. Key Features in the SAMPDI-3D v2 Machine Learning Model
2.4.1. Protein Mutation
- (I) Position-specific scoring matrix (PSSM)
  - (a) Evolutionary sequence composition features: For each of the 20 amino acids, a vector of normalized odds ratios across all positions in the sequence was computed using the logistic function $f(x) = 1/(1 + e^{-x})$. The mean of these normalized odds ratios for each amino acid served as a feature, capturing its sequence composition preference.
  - (b) Evolutionary odds of the mutation: Calculated as the difference in odds ratios between the mutant and wild-type residues at the mutation site.
- (II) Mutation type-related features
- (III) Amino acid category features
- (IV) Accessibility of the mutation site
- (V) Accessibility changes due to mutation
- (VI) Backbone torsion angles of the mutation site
- (VII) Protein secondary structure composition
- (VIII) Protein–DNA contact features
  - (a) Nucleotide amino acid contacts: The total number of interactions between the protein and the DNA.
  - (b) Base amino acid hydrogen bonds: The total number of hydrogen bonds between nucleotide bases in the DNA and protein residues.
  - (c) Phosphate amino acid hydrogen bonds: The total number of hydrogen bonds between the nucleotide phosphate and protein residues.
  - (d) Base amino acid stacks: The total number of stacks identified in the protein–DNA complex structure between the nucleotide bases and protein residues.
- (IX) Changes in protein–DNA contacts due to a mutation
  - (a) Delta nucleotide amino acid contacts: The total change in the number of interactions between the protein and the DNA due to a mutation.
  - (b) Delta base amino acid hydrogen bonds: The total change in the number of hydrogen bonds between nucleotide bases in the DNA and protein residues due to a mutation.
  - (c) Delta phosphate amino acid hydrogen bonds: The total change in the number of hydrogen bonds between the nucleotide phosphate and protein residues due to a mutation.
  - (d) Delta base amino acid stacks: The total change in the number of stacks identified in the protein–DNA complex structure between the nucleotide bases and protein residues due to a mutation.
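The PSSM-derived features (I) and the contact-change features (IX) can be illustrated with a minimal Python sketch. It is not the SAMPDI-3D v2 implementation. It assumes that the PSSM is available as an L × 20 array of log-odds scores, that the columns follow the amino acid order in `AMINO_ACIDS`, that the mutation odds are taken on the logistic-normalized scores, and that each delta is computed as mutant minus wild type. The function names, the dictionary keys, and all numbers are hypothetical.

```python
import numpy as np

# Assumed PSSM column order; the real order depends on the tool that wrote the matrix.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def composition_features(pssm):
    """Mean normalized score of each amino acid over all positions (20 values)."""
    return sigmoid(pssm).mean(axis=0)


def mutation_odds(pssm, position, wild, mutant):
    """Normalized score of the mutant residue minus that of the wild type
    at the mutation site (assumption: difference of normalized values)."""
    row = sigmoid(pssm[position])
    return row[AMINO_ACIDS.index(mutant)] - row[AMINO_ACIDS.index(wild)]


def delta_contacts(wild_counts, mutant_counts):
    """Change in each contact count (assumed sign convention: mutant - wild)."""
    return {key: mutant_counts[key] - wild_counts[key] for key in wild_counts}


# Toy three-residue PSSM (hypothetical values).
pssm = np.zeros((3, 20))
pssm[1, AMINO_ACIDS.index("R")] = 4.0   # arginine strongly favored at site 1
pssm[1, AMINO_ACIDS.index("A")] = -2.0  # alanine disfavored at site 1

comp = composition_features(pssm)
delta = mutation_odds(pssm, position=1, wild="R", mutant="A")

# Hypothetical contact counts for a wild-type and a mutant complex.
wild = {"nucleotide_aa_contacts": 42, "base_aa_hbonds": 6,
        "phosphate_aa_hbonds": 11, "base_aa_stacks": 1}
mutant = {"nucleotide_aa_contacts": 39, "base_aa_hbonds": 5,
          "phosphate_aa_hbonds": 11, "base_aa_stacks": 1}
deltas = delta_contacts(wild, mutant)
```

In this toy case `delta` is negative because the mutation replaces a favored residue with a disfavored one, and `deltas` records the loss of three contacts and one base hydrogen bond.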
2.4.2. DNA Mutation
- (I) Protein structure features
- (II) DNA structural feature of the mutation site
- (III) DNA mutation categorical features
  - (a) Base-pair type: Base pairs are grouped into two classes: AT and TA, which are held together by two hydrogen bonds, and GC and CG, which are held together by three. The label is zero for an AT or TA pair and one for a GC or CG pair. This feature encodes the wild-type base pair.
  - (b) Wild or mutant base-pair mismatch: This feature encodes the matched/mismatched status of the wild-type and mutant base pairs. If both the wild-type and the mutant base pair are matched (i.e., each is one of AT, TA, CG, or GC), the label “zero” is used; otherwise, the label “one” is used.
  - (c) Mutant base-pair category: This categorical feature uses 256 (i.e., 16 × 16) different labels to encode the wild-type to mutant base-pair change. Four nucleotides give 16 possible base pairs, which are assigned 16 distinct indices from 0 to 15. The wild-type/mutant label index is calculated as BPI(wild-type base pair) × 16 + BPI(mutant base pair), where BPI(XY) represents the base-pair index of XY.
- (IV) Protein–DNA interaction features
- (V) Protein mutation site forward strand base interaction features
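The three categorical features in (III) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the published code: the text does not say which index each of the 16 base pairs receives, so the sketch assigns indices in `itertools.product` order over `"ACGT"`, and it raises an error when the base-pair type is requested for a mismatched wild-type pair. All names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Assumed index assignment: AA=0, AC=1, ..., TT=15 (the source does not give the order).
BPI = {a + b: i for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))}

# Watson-Crick (matched) base pairs.
MATCHED = {"AT", "TA", "CG", "GC"}


def base_pair_type(wild_bp):
    """0 for AT/TA (two hydrogen bonds), 1 for GC/CG (three hydrogen bonds)."""
    if wild_bp in ("AT", "TA"):
        return 0
    if wild_bp in ("GC", "CG"):
        return 1
    raise ValueError(f"base-pair type is undefined for mismatched pair {wild_bp!r}")


def mismatch_label(wild_bp, mutant_bp):
    """0 if both the wild-type and mutant pairs are matched, otherwise 1."""
    return 0 if wild_bp in MATCHED and mutant_bp in MATCHED else 1


def base_pair_category(wild_bp, mutant_bp):
    """Label in 0..255: BPI(wild-type pair) * 16 + BPI(mutant pair)."""
    return BPI[wild_bp] * 16 + BPI[mutant_bp]


# Example: an AT pair mutated to GC, and an AT pair mutated to the mismatch AC.
features_at_gc = (base_pair_type("AT"), mismatch_label("AT", "GC"),
                  base_pair_category("AT", "GC"))
features_at_ac = (base_pair_type("AT"), mismatch_label("AT", "AC"),
                  base_pair_category("AT", "AC"))
```

With the assumed ordering, AT has index 3 and GC has index 9, so the AT to GC change maps to category 3 × 16 + 9 = 57. A different index assignment would change the category numbers but not the number of labels.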
2.5. Machine Learning Model Training
2.6. Selecting the Optimal Machine Learning Approach Using PyCaret
2.7. Hyperparameter Tuning and Advanced Model Training
3. Results
3.1. Dataset (Protein Mutations) Comparison of SAMPDI-3D and the Newly Curated Dataset from ProNAB
3.2. Performance of SAMPDI-3D and Other Available Methods Tested on the S177 and D42 (Newly Curated from ProNAB) Datasets
3.3. Performance of SAMPDI-3D v2 Tested on Protein and DNA Mutation Databases
3.4. Web Server Implementation
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
EMSA | Electrophoretic mobility shift assay |
HT-SELEX | High-throughput systematic evolution of ligands by exponential enrichment |
ITC | Isothermal titration calorimetry |
Kd | Dissociation constant |
MSE | Mean squared error |
NLLS | Non-linear least squares |
NRLB | No Read Left Behind |
PCC | Pearson correlation coefficient |
PBM | Protein-binding microarray |
PDI | Protein–DNA interaction |
PSSM | Position-specific scoring matrix |
RMSE | Root mean square error |
SPR | Surface plasmon resonance |
TF | Transcription factor |
XGBoost | Extreme gradient boosting |
Model | PCC | RMSE (kcal/mol) |
---|---|---|
CatBoost regressor | 0.65 | 1.32 |
Extra trees regressor | 0.65 | 1.28 |
Gradient boosting regressor | 0.64 | 1.16 |
Light gradient boosting machine | 0.64 | 1.24 |
Random forest regressor | 0.63 | 1.28 |
Extreme gradient boosting | 0.63 | 1.28 |
AdaBoost regressor | 0.61 | 1.30 |
Linear regression | 0.46 | 1.46 |
Ridge regression | 0.43 | 1.49 |
Huber regressor | 0.32 | 0.56 |
Model | PCC | RMSE (kcal/mol) |
---|---|---|
CatBoost regressor | 0.71 | 0.75 |
Random forest regressor | 0.69 | 0.77 |
Extra trees regressor | 0.68 | 0.79 |
Gradient boosting regressor | 0.67 | 0.79 |
Extreme gradient boosting | 0.67 | 0.79 |
Light gradient boosting machine | 0.65 | 0.81 |
K neighbors regressor | 0.62 | 0.84 |
AdaBoost regressor | 0.59 | 0.86 |
Decision tree regressor | 0.45 | 0.94 |
Decision tree regressor | 0.45 | 0.98 |
Model | PCC (Best Iteration) | RMSE (kcal/mol) (Best Iteration) |
---|---|---|
Extreme gradient boosting | 0.67 | 0.89 |
CatBoost regressor | 0.66 | 0.91 |
Gradient boosting regressor | 0.66 | 0.89 |
Extra trees regressor | 0.65 | 0.91 |
Light gradient boosting machine | 0.65 | 0.90 |
AdaBoost regressor | 0.64 | 0.92 |
Random forest regressor | 0.63 | 0.91 |
Model | PCC (Best Iteration) | RMSE (kcal/mol) (Best Iteration) |
---|---|---|
Extreme gradient boosting | 0.78 | 0.69 |
CatBoost regressor | 0.77 | 0.69 |
Light gradient boosting machine | 0.77 | 0.70 |
Random forest regressor | 0.76 | 0.70 |
Extra trees regressor | 0.76 | 0.70 |
Gradient boosting regressor | 0.76 | 0.70 |
K neighbors regressor | 0.74 | 0.73 |
Mutated Residue | S419 (Count) | S419 (%) | S177 (Count) | S177 (%) |
---|---|---|---|---|
Alanine (A) | 296 | 70.64 | 108 | 61.02 |
Cysteine (C) | 6 | 1.43 | 0 | 0 |
Aspartic acid (D) | 5 | 1.19 | 4 | 2.26 |
Glutamic acid (E) | 6 | 1.43 | 13 | 7.34 |
Phenylalanine (F) | 9 | 2.15 | 4 | 2.26 |
Glycine (G) | 9 | 2.15 | 7 | 3.95 |
Histidine (H) | 6 | 1.43 | 1 | 0.56 |
Isoleucine (I) | 2 | 0.48 | 0 | 0 |
Lysine (K) | 13 | 3.10 | 5 | 2.82 |
Leucine (L) | 12 | 2.86 | 4 | 2.26 |
Methionine (M) | 6 | 1.43 | 7 | 3.95 |
Asparagine (N) | 5 | 1.19 | 3 | 1.69 |
Proline (P) | 2 | 0.48 | 2 | 1.13 |
Glutamine (Q) | 7 | 1.67 | 5 | 2.82 |
Arginine (R) | 9 | 2.15 | 3 | 1.69 |
Serine (S) | 9 | 2.15 | 5 | 2.82 |
Threonine (T) | 6 | 1.43 | 3 | 1.69 |
Valine (V) | 7 | 1.67 | 3 | 1.69 |
Tryptophan (W) | 1 | 0.24 | 0 | 0 |
Tyrosine (Y) | 3 | 0.72 | 0 | 0 |
Mutation | Method | PCC | RMSE (kcal/mol) |
---|---|---|---|
Protein | SAMPDI-3D | 0.17 | 1.34 |
Protein | mCSM-NA | 0.34 | 1.31 |
Protein | PremPDI | 0.36 | 1.35 |
DNA | SAMPDI-3D | 0.71 | 0.91 |
Mutation | PCC (Best Iteration) | Average PCC (50 Iterations) | Number of Features |
---|---|---|---|
Protein | 0.68 | 0.65 ± 0.05 | 49 |
DNA | 0.80 | 0.77 ± 0.06 | 35 |
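For readers who want to reproduce the kind of summary reported in the tables above, the following Python sketch shows how a Pearson correlation coefficient (PCC), a root mean square error (RMSE), and a "best iteration" plus "average ± standard deviation" summary might be computed. It is an illustration only: the function names and all numbers are hypothetical, and the population standard deviation (NumPy's default, `ddof=0`) is an assumption, since the convention behind the ± values is not stated here.

```python
import numpy as np


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between experimental and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the inputs (e.g., kcal/mol)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def summarize(scores):
    """Best, mean, and standard deviation of per-iteration scores.

    Assumption: population standard deviation (ddof=0).
    """
    scores = np.asarray(scores, dtype=float)
    return float(scores.max()), float(scores.mean()), float(scores.std())


# Hypothetical experimental and predicted binding free energy changes (kcal/mol).
y_true = [0.5, 1.2, -0.3, 2.0]
y_pred = [0.7, 1.0, 0.1, 1.6]
r = pcc(y_true, y_pred)
err = rmse(y_true, y_pred)

# Hypothetical PCC values from three repeated train/test iterations.
best, mean, std = summarize([0.60, 0.70, 0.65])
```

The same `summarize` helper would apply to any number of iterations, such as the 50 used for the averages in the last table.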
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rimal, P.; Paul, S.K.; Panday, S.K.; Alexov, E. Further Development of SAMPDI-3D: A Machine Learning Method for Predicting Binding Free Energy Changes Caused by Mutations in Either Protein or DNA. Genes 2025, 16, 101. https://doi.org/10.3390/genes16010101