Review Reports - A Comprehensive Mapping of the Druggable Cavities within the SARS-CoV-2 Therapeutically Relevant Proteins by Combining Pocket and Docking Searches as Implemented in Pockets 2.0

Round 1

Reviewer 1 Report

Summary

The manuscript by Gervasoni et al. presents a new tool for evaluating protein structures or theoretical models to predict the ligand-binding pockets in those proteins. This is important for a novel protein or to further advance the druggability of an existing well-established protein target. In the light of the ongoing COVID-19 pandemic, this work of analyzing SARS-CoV-2 protein targets is well-timed. It would help researchers working in the field of structure-based drug design for small molecule therapeutics against SARS-CoV-2. While this method has been applied to evaluate SARS-CoV-2 related proteins, further evaluation of such a technique would help establish this approach's robustness. In my opinion, this is an exhaustively and thoroughly developed method, and the scientific community working in the field of drug design and discovery would benefit from this work. There are, however, some minor changes that could help elevate the message of this study.

Minor Points

While the authors discuss various parameters and methods for identifying ligand-binding pockets in the protein structures, as presented in Table 2, it would also be helpful to include an additional parameter. It could be an enrichment factor or a ROC plot, showing the early identification of druggable pockets among the predictions.

Also, in some cases, the allosteric pocket is connected with the orthosteric binding site. It would be essential to have a bit more discussion on this aspect of identifying the potential extension of the binding pocket to design and optimize ligands and their substitutions.

Author Response

Dear Reviewer,

We are grateful to you for the valuable comments, which have opened our eyes to some shortcomings in the previous version of the paper.

Here is a list of the amendments made in the revised version. All changes in the revised paper (and the corresponding replies in this letter) are highlighted in yellow for easy identification.

As also suggested by the second reviewer, the performance of the proposed method is now also described by applying the confusion matrix and calculating the corresponding statistical parameters (see Table 2 and the discussion in the text).

A paragraph on the architecture of the allosteric sites of nsp12 was added to the Results.

Reviewer 2 Report

It seems that the authors (1) showed promising binding pockets for targeting SARS-CoV-2 and (2) characterized the chosen pockets for drug repositioning. The authors claim that the characterized pockets were mapped with their own docking program (Pockets 2.0). I respect every developed program and source code. However, I think that every academic article without experimental evidence must provide strict statistical validation to prove its effectiveness. I think that the current manuscript is an incomplete study. If the authors are to publish this manuscript, they need to complement it with (1) experimental evidence or statistical validation and (2) hits through virtual screening.

#1. [Major Issue – Validation]

I think that the history of chiral organocatalysts and chiral ligands cannot be separate. During these development, one chiral scaffold always has played both roles: organocatalysts and ligands. In this study, authors simply introduced thiourea or urea functional group into their chiral ligands. After the introduction, L1-L6 are chiral ligands, never organocatalysts anymore.

In general, if a researcher read the word, ‘combination’, he can consider the combination of two individual molecules. If such urea type functionality was firstly used for chiral phosphines, I strongly recommend that authors change their point from the combination to rational design of chiral ligand. If the design can be explained by any scientific method, it is more valuable.

#2. [Major Issue –‘Systematic search’ without method or workflow]

The terminology, ‘systematic review’, ‘systematic literature search’ are defined method to study bulk materials. This method should be quantitative and show a logical workflow. When I read result section 2.1., I could not find either any quantitative scoring or workflow as system of their systematic literature search. Rather than ‘systematic’, simple collection was described.

#3. [Major Issue – Pockets 2.0 performances]

In general, in silico performance can be validated through comparing predicted results with experimental results (e.g. RMSD value between X-ray and predicted, free energy between experimental and predicted). However, current study simply showed their simulated results (Volume, ChemPLP in Table 2). Such results cannot be a criterion of performance. If authors conducted such comparison, they should show their comparison using general method (confusion matrix, accuracy, precision, etc). For example, ‘Correctly identified pockets’ should be expressed with clear mathematical equation. What is ‘correct’ in their simulations?

#4. [Minor Issue –Methods: 3.1. protein structures]

If the prepared PDB are publicly available, the manuscript is more valuable.

#5. [Minor Issue –Methods: 3.2. Preliminary simulations]

Add concrete simulation condition in the main text and supplementary information.

Check ‘[Error! Bookmark not defined.]’ in the section as well as Table 1.

#6 [Minor Issue –Methods: 3.3. Pockets 2.0 approach]

The writing is not clarified. When a practical user of their program can understand the paragraph. Authors need to rewrite the section to make general readers understood. Personally, current writing looks like the manual of a software.

#7 [Minor Issue –Conclusion]

The web page is not available. https://www.exscalate4cov.network/

#8 [Major-Conclusion]

- two primary objectives: characterizing the druggable binding sites within the therapeutically relevant SARS-CoV-2 proteins as well as presenting and validating a novel strategy to search and prioritize the protein pockets by using the identified SARS-CoV-2 sites as test set. I can’t find any validation index.

- The second paragraph need to be rewritten to brief this study more concisely.

Author Response

Dear Reviewer,

We are grateful to you for the valuable comments, which have opened our eyes to some shortcomings in the previous version of the paper.

Here is a list of the amendments made in the revised version. All changes in the revised paper (and the corresponding replies in this letter) are highlighted in yellow for easy identification.

#1. [Major Issue – Validation]

I think that the history of chiral organocatalysts and chiral ligands cannot be separate. During these development, one chiral scaffold always has played both roles: organocatalysts and ligands. In this study, authors simply introduced thiourea or urea functional group into their chiral ligands. After the introduction, L1-L6 are chiral ligands, never organocatalysts anymore.

In general, if a researcher read the word, ‘combination’, he can consider the combination of two individual molecules. If such urea type functionality was firstly used for chiral phosphines, I strongly recommend that authors change their point from the combination to rational design of chiral ligand. If the design can be explained by any scientific method, it is more valuable.

We are sorry, but we believe that this comment was not intended for our manuscript, which is not focused on organocatalysis or chiral phosphines…

#2. [Major Issue –‘Systematic search’ without method or workflow]

The terminology, ‘systematic review’, ‘systematic literature search’ are defined method to study bulk materials. This method should be quantitative and show a logical workflow. When I read result section 2.1., I could not find either any quantitative scoring or workflow as system of their systematic literature search. Rather than ‘systematic’, simple collection was described.

The Authors agree with the concern raised by the reviewer. The term systematic was indeed used to emphasize that almost all known therapeutically relevant SARS-CoV-2 proteins were considered but is misleading because it seems to concern the methodology by which the literature search was performed. Hence, this term was replaced by other more general words such as “comprehensive”, “extensive” or “careful”, which emphasize the number of investigated proteins without concerning the approaches for the bibliographic search.

#3. [Major Issue – Pockets 2.0 performances]

In general, in silico performance can be validated through comparing predicted results with experimental results (e.g. RMSD value between X-ray and predicted, free energy between experimental and predicted). However, current study simply showed their simulated results (Volume, ChemPLP in Table 2). Such results cannot be a criterion of performance. If authors conducted such comparison, they should show their comparison using general method (confusion matrix, accuracy, precision, etc). For example, ‘Correctly identified pockets’ should be expressed with clear mathematical equation. What is ‘correct’ in their simulations?

As suggested by the reviewer, the rationale by which Pockets 2.0 was validated and the definition of the correct pockets are now described in more detail, and statistical parameters based on the confusion matrix were added to Table 2 and discussed in the text.

#4. [Minor Issue –Methods: 3.1. protein structures]

The PDB files for the homology models were deposited in ModelArchive, as described at the end of the original manuscript. To make this clearer, the text with the links to ModelArchive was moved to section 3.1.

#5. [Minor Issue –Methods: 3.2. Preliminary simulations]

Add concrete simulation condition in the main text and supplementary information.

Done.

Check ‘[Error! Bookmark not defined.]’ in the section as well as Table1.

Done.

#6 [Minor Issue –Methods: 3.3. Pockets 2.0 approach]

The writing is not clarified. When a practical user of their program can understand the paragraph. Authors need to rewrite the section to make general readers understood. Personally, current writing looks like the manual of a software.

As suggested, the description of Pockets 2.0 was largely simplified and shortened.

#7 [Minor Issue –Conclusion]

The web page is not available. https://www.exscalate4cov.network/

The problem was fixed.

#8 [Major-Conclusion]

- two primary objectives: characterizing the druggable binding sites within the therapeutically relevant SARS-CoV-2 proteins as well as presenting and validating a novel strategy to search and prioritize the protein pockets by using the identified SARS-CoV-2 sites as test set. I can’t find any validation index.

See point #3; a sentence commenting on the statistical parameter MCC was added to the Conclusion.

- The second paragraph need to be rewritten to brief this study more concisely.

As suggested, the paragraph was shortened by focusing on the MCC values, as noted above.

Round 2

Reviewer 2 Report

Authors can find my attached review comments. Sorry for the insufficient revision. I expect further improvement.

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

We are grateful once again for the valuable comments. Here is the list of the improvements according to your requests. All changes in the paper and supporting information are highlighted in yellow for easy identification.

Point #1

I could find MCC, precision, accuracy. However, I cannot find confusion matrix to calculate them.

In addition, threshold between two classes can change such statistical values so that the threshold with confusion matrix should be shown.

A binary classification is used between correctly and incorrectly detected pockets. For example, for the case involving the consensus scores, there are 30 correctly detected best pockets (true positives); 10 incorrectly detected best pockets, which count simultaneously as both the false positives and the false negatives; and 1314 non-best pockets that were correctly recognized as non-best pockets (true negatives). Here, 1314 = total number of detected pockets in all analyzed proteins (1364, obtained by summing the number of pockets in Table 2) – number of best pockets (40, one best pocket per analysis) – number of incorrectly detected pockets (10, the false negatives). A paragraph detailing how the confusion matrix is computed and the data used to calculate the MCC, precision and accuracy was added on page 11.
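The counts quoted in this reply can be turned into the statistics it mentions. The following is a minimal sketch, assuming the standard textbook definitions of precision, accuracy and the Matthews correlation coefficient (MCC). The function name `binary_metrics` is illustrative and not part of Pockets 2.0, and the resulting numbers are derived only from the counts given in this letter, not copied from Table 2.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (precision, accuracy, MCC) for a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, accuracy, mcc


# Counts quoted in the response for the consensus scores.
total_pockets = 1364          # all detected pockets (sum over Table 2)
tp, fp, fn = 30, 10, 10       # correct best pockets; wrong picks; missed best pockets
tn = total_pockets - (tp + fn) - fp   # 1364 - 40 - 10 = 1314

precision, accuracy, mcc = binary_metrics(tp, fp, fn, tn)
print(f"precision={precision:.3f} accuracy={accuracy:.3f} MCC={mcc:.3f}")
```

With these counts the sketch gives a precision of 0.75, an accuracy of about 0.985 and an MCC of about 0.74. Because the non-best pockets vastly outnumber the best ones, accuracy is dominated by the true negatives, which is one reason the MCC is the more informative figure here.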

Can you describe the meaning of Table 2? The lines 236-238 described that PDB(%R7Y) has 16 pockets and Fpocket predicted the right pocket with 1st rank and PLANTS also did. The consensus = mean between Fpocket and PLANTS? Clarify the table (especially, the definition of column variables)!!!

Table 2 and its column titles are now described in more detail.

Point #3

How to judge “the correct” in Table 2? To judge the correctness, which index did the authors use? Did the authors use RMSD? If the X-ray and their predicted pocket share 80% of the volume, is the prediction correct? If the overlapping volume is 50%, how can they judge it?

The reviewer emphasizes a very important point and proposes an interesting approach based on RMSD or volume overlap, which, however, can be applied only if the analyses always involve resolved proteins in complex with ligands that can be used as probes for the searches. Unfortunately, this is not the situation in our study, which involves both theoretical models and resolved proteins with no bound ligands. In these cases, the arrangement of the orthosteric and allosteric binding sites was inferred by structural comparison with homologous proteins, but we do not have reference ligands with which RMSD or volume overlap can be evaluated. Hence, we decided to use a more general approach based on the recognition of key residues lining the binding pockets: the detected pockets are analyzed by considering their distance to these key residues, and the correct pocket is the one closest to the key residues. If the pocket is correctly recognized, the closest pocket is ranked no. 1. Table 2 reports these rankings and assesses how many times the correct pocket is ranked first (by FPocket, by PLANTS and by the consensus score). A paragraph detailing this criterion for evaluating the Pockets 2.0 performance was added on page 8.
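The criterion described in this reply can be sketched as follows. This is only an illustration under stated assumptions: each pocket is represented by a single center point, the distance to the key residues is taken as the mean distance from that center to the key-residue coordinates, and higher scores are assumed to be better. The names `closest_pocket` and `rank_of_correct_pocket`, and all coordinates and scores, are invented for the example; the reply does not specify how the distance is actually measured in Pockets 2.0.

```python
import math


def mean_distance(center, key_residues):
    """Mean Euclidean distance from a pocket center to the key residues."""
    return sum(math.dist(center, r) for r in key_residues) / len(key_residues)


def closest_pocket(pocket_centers, key_residues):
    """Id of the pocket closest to the key residues (the 'correct' pocket)."""
    return min(
        pocket_centers,
        key=lambda pid: mean_distance(pocket_centers[pid], key_residues),
    )


def rank_of_correct_pocket(scores, correct_id, higher_is_better=True):
    """Rank (1 = best) of the correct pocket when pockets are sorted by score."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return ordered.index(correct_id) + 1


# Hypothetical toy data: three pockets and two key residues.
key_residues = [(10.0, 10.0, 10.0), (12.0, 10.0, 10.0)]
pocket_centers = {
    "P1": (11.0, 10.5, 10.0),
    "P2": (25.0, 3.0, 8.0),
    "P3": (14.0, 14.0, 12.0),
}
scores = {"P1": 0.62, "P2": 0.80, "P3": 0.41}

correct = closest_pocket(pocket_centers, key_residues)
rank = rank_of_correct_pocket(scores, correct)
print(f"correct pocket: {correct}, rank by score: {rank}")
```

In this toy case the pocket nearest to the key residues is `P1`, but the scoring ranks it second, so under the criterion above it would count as an incorrectly detected best pocket.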

Point #5

Sorry. I can’t find any simulation condition in current supplementary information. Instead of simulation parameter and condition, I could find their result. In addition, Table S1 also could not tell me how they could get the rank.

The computational details were added in the supporting information and Table S1 was rendered almost identical to Table 2 to support its understanding.

Point #6

Sorry. I didn’t ask the authors to simplify or shorten the section. I asked for more understandable writing that allows other researchers among future readers to reproduce the simulations.

We are sorry, but we replied to the reviewer's first comment: “The writing is not clarified. When a practical user of their program can understand the paragraph. Authors need to rewrite the section to make general readers understood. Personally, current writing looks like the manual of a software.” The major criticism concerned the overly technical writing of this part, which read like a manual. Accordingly, we simplified the section and avoided excessive technicalities to make it understandable for general readers. That being said, the computational details now reported in a dedicated section 3.4, together with those added to the SI, allow users to reproduce the performed analyses.