Article
Peer-Review Record

The Evolution, Gene Expression Profile, and Secretion of Digestive Peptidases in Lepidoptera Species

Catalysts 2020, 10(2), 217; https://doi.org/10.3390/catal10020217
by Lucas R. Lima 1,†, Renata O. Dias 2,3,†, Felipe Jun Fuzita 2, Clélia Ferreira 2, Walter R. Terra 2 and Marcio C. Silva-Filho 1,4,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 6 January 2020 / Revised: 24 January 2020 / Accepted: 6 February 2020 / Published: 11 February 2020
(This article belongs to the Special Issue Biocatalysis: Mechanisms of Proteolytic Enzymes)

Round 1

Reviewer 1 Report

The paper tackles a relevant topic: biochemical adaptation in Lepidoptera. The authors collect novel data on the localization of serine peptidases (SPs) in the midgut, giving insight into the physiology of their use and helping to improve SP gene annotation in published genomes with relevant proteomic information. In reconstructing the evolutionary events, however, the authors fail to perform a formal analysis based on an evolutionary model of gene duplication over time, which would help to define gene expansion events, nor do they perform ancestral character reconstruction to help pinpoint the appearance of individual novelties in the protein structure. Because of the very sparse taxon sampling, a full-fledged evolutionary analysis of the whole of Lepidoptera was probably not feasible, but the authors should have concentrated on the few novelties discussed in the paper and formalized the analysis for those for which taxonomic sampling was sufficient.
The utility of the species tree reconstruction is unclear, given that the ages of the species tree nodes are not used to date the duplication events in the two gene trees.

Overall, the authors should make an effort to refocus the manuscript, either concentrating only on the biochemical/proteomic part or reorganizing the phylogenetic part by introducing an evolutionary model for duplication or for ancestral character reconstruction.

If the authors decide to reorganize the phylogenetic analysis, the changes needed are really major, and the editor should consider whether to ask for a resubmission.


Major

The authors focus on 23 genomes, but it seems possible to find 57 Lepidoptera genomes in NCBI. What criteria were used to select the 23 genomes? Given the problem of taxon representation, it would be important to explain why some data were discarded.

The selection of putative SPs is unclear; a figure depicting the workflow would help. In particular, the last selection step includes both NJ reconstruction and blastp score, and it is not clear how the authors combine these two tests. The workflow figure should also help to identify the different categories of sequences (which sequences were included in Figures 2-3? Also those of Tables S1 and S2?).
It is also surprising that a single amino acid (position 189) is sufficient to discriminate between trypsin and chymotrypsin. Furthermore, this choice is made without any reference. It would be interesting to check whether removing this step from the pipeline and going directly to the next one (NJ and blastp) would have produced the same subdivision between putative trypsins and chymotrypsins.

Line 376, and similarly line 109: "Monophyletic branches predicted for both enzyme trees were individually analyzed." If the "monophyletic branches" are those called T1-T18 and C1-C23, then the authors should not use "for both" but "for each", given that sequences are not paired across the two trees and each species has several sequences included in each analysis. Furthermore, the meaning that the authors attach to the word monophyletic is unclear. They seem to use it to indicate a branch with good support, meaning not polytomic, whereas it should mean "single origin" and should be used to describe a group, defined a priori of the phylogenetic tree, that is confirmed by the phylogeny thanks to good support on the branch that groups its members. Saying that a branch is monophyletic is a tautology. Only a group of sequences/species/individuals can be monophyletic.
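The definition of monophyly given here can be made concrete with a short sketch: a group defined in advance is monophyletic in a rooted tree only if some clade contains exactly the members of that group and nothing else. The tree encoding (nested tuples), the function names and the leaf labels are assumptions made for illustration; they are not taken from the manuscript.

```python
def leaves(node):
    """Return the set of leaf names under a node (leaf = str, clade = tuple)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree holds exactly the leaves in group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Toy rooted tree with invented labels: A1 and A2 are sisters, A3 sits elsewhere.
toy_tree = ((("A1", "A2"), "B1"), ("A3", "C1"))

pair_is_clade = is_monophyletic(toy_tree, {"A1", "A2"})
family_a_is_clade = is_monophyletic(toy_tree, {"A1", "A2", "A3"})
```

In this toy tree the pair A1 and A2 forms a clade, while the a priori group A1, A2, A3 does not, which is the situation the reviewer describes for a family that fails to come out as monophyletic.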

Reconstruction of ancestral states using only the topology, such as the one in lines 230-231, should be presented as hypothetical, given that caveats similar to those for gene duplication hold. If the authors are really interested in these points, they should perform an explicit ancestral character reconstruction (using MrBayes or even some function in RAxML).
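As a rough illustration of what an explicit reconstruction adds over reading states off the topology, the sketch below runs the bottom-up pass of Fitch parsimony on a toy binary tree. This is a deliberately simpler technique than the model-based methods in MrBayes or RAxML that the reviewer names, and the tree, the leaf names and the residue states are all invented for the example.

```python
def fitch(node, states):
    """Bottom-up Fitch pass on a binary tree.

    node   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its observed character state
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed here.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Toy example: hypothetical residue observed at one site in four sequences.
toy_tree = ((("a", "b"), "c"), "d")
toy_states = {"a": "D", "b": "D", "c": "S", "d": "S"}

root_states, n_changes = fitch(toy_tree, toy_states)
```

The result is a set of equally parsimonious states per node rather than a single answer, which is one reason such reconstructions are better reported as hypotheses.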


The phylogenetic inference
Given the non-central role of the species tree (the node ages are not used in the discussion, and the family definitions are taken from general taxonomy), the authors should consider using the TimeTree output directly to define an ultrametric species tree containing most of the species used, with the few missing ones inserted on a taxonomic basis and drawn with a dashed line.
If, on the contrary, the authors think that the species phylogeny is a relevant contribution, they should also discuss the state of the art in Lepidoptera systematics. For example, the authors do not comment on why the family Crambidae is not monophyletic in their species phylogeny.

I have some reservations about the choice of proteins for the phylogenetic reconstruction. From Figure S5 we see that not all sequences are single copy, which raises problems in orthology definition. I would advise the authors to remove duplicated genes from the analysis and to consider also removing genes that are not well represented (several orthologs are missing or present only as a fragment). I notice that Chilo suppressalis has a large section of missing data, which could explain the problem with the monophyly of Crambidae.
To estimate the ultrametric tree, the authors use a single TimeTree estimate without taking into account the large confidence interval of 40 Mya reported in the TimeTree database for that node. r8s has several settings; it can accommodate confidence interval information and the use of several calibration points, possibly all taken from TimeTree. In fact, the authors give no explanation for the choice of this specific single calibration point.


Minor

Line 198: what quality criteria were used? A few words should be given that connect to the later Materials and Methods section.

Line 207: predicted by whom? With what kind of software or procedure? Are the sequences with red labels (non-canonical catalytic triad) in Figures 2-3 the same as those cited here, or are there others?


Figures 1 and 2 have serious readability problems.
The sequence labels and branch support values cannot be read at the current resolution.
The authors should simplify the information (e.g., use color coding or sequence group designations).
At the current resolution, it is not possible to distinguish outgroup from ingroup, nor to identify the species or family to which each sequence belongs.
References in the text to sequence names in those figures (line 230) cannot be used, because the names can be neither read nor searched (the image is strictly raster). Sequences referred to in this way should be marked with a symbol such as a star.
To make the discussion in lines 300-310 easier to follow, the sequences found in the midgut by proteomic analysis should be marked on the tree.
The authors should move the current figures, at higher resolution, to the supplementary material or, better (for repeatability and data reuse), the supplementary material should provide a file in Newick format with the actual tree and a table with sequence name, enzyme type, species and family.
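For readers unfamiliar with the format, the following minimal sketch shows what such a pair of supplementary files might contain: a tree serialized as a Newick string and a tab-separated table with the four columns listed above. The sequence labels are invented placeholders, the tree carries no branch lengths or support values, and the helper name `to_newick` is an assumption of this example.

```python
import csv
import io


def to_newick(node):
    """Serialize a tree given as nested tuples (leaves are label strings)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Toy gene tree with invented labels; a real file would also carry
# branch lengths and support values.
toy_tree = (("Sfru_tryp_1", "Sfru_tryp_2"), "Csup_tryp_1")
newick_text = to_newick(toy_tree) + ";"

# Companion table: one row per leaf, with sequence name, enzyme type,
# species and family.
rows = [
    ("Sfru_tryp_1", "trypsin", "Spodoptera frugiperda", "Noctuidae"),
    ("Sfru_tryp_2", "trypsin", "Spodoptera frugiperda", "Noctuidae"),
    ("Csup_tryp_1", "trypsin", "Chilo suppressalis", "Crambidae"),
]
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["sequence_name", "enzyme_type", "species", "family"])
writer.writerows(rows)
table_text = buffer.getvalue()
```

Unlike a raster figure, both outputs are plain text, so sequence names can be searched and the tree can be reloaded in any tree viewer.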


Tables 1 and 2 have unclear formatting.
According to the legend, gray should be used for 100% conserved positions, whereas only X (major class below 50%) is in gray.
Amino acids conserved across all groups are highlighted in green, but this is not described in the legend.
Table 1 has several values out of line, and no explanation is given in the legend. Is this an error, or does it carry meaning?
The Table 1 header "branch" is not correctly spaced.

Author Response

The authors would like to thank the reviewers for the valuable comments and suggestions.

 

Reviewer #1:

 

“The paper tackles a relevant topic: biochemical adaptation in Lepidoptera. The authors collect novel data on the localization of serine peptidases (SPs) in the midgut, giving insight into the physiology of their use and helping to improve SP gene annotation in published genomes with relevant proteomic information. In reconstructing the evolutionary events, however, the authors fail to perform a formal analysis based on an evolutionary model of gene duplication over time, which would help to define gene expansion events, nor do they perform ancestral character reconstruction to help pinpoint the appearance of individual novelties in the protein structure. Because of the very sparse taxon sampling, a full-fledged evolutionary analysis of the whole of Lepidoptera was probably not feasible, but the authors should have concentrated on the few novelties discussed in the paper and formalized the analysis for those for which taxonomic sampling was sufficient.

The utility of the species tree reconstruction is unclear, given that the ages of the species tree nodes are not used to date the duplication events in the two gene trees.

Overall, the authors should make an effort to refocus the manuscript, either concentrating only on the biochemical/proteomic part or reorganizing the phylogenetic part by introducing an evolutionary model for duplication or for ancestral character reconstruction.

If the authors decide to reorganize the phylogenetic analysis, the changes needed are really major, and the editor should consider whether to ask for a resubmission.”

 

Answer: The authors agree with the reviewer that more robust analyses are needed to tackle the evolutionary history of these gene families. However, we face two major problems in conducting more specific phylogenetic analyses. First, as the reviewer pointed out, we have sparse taxon sampling (see answer #2 for the reasons). Second, serine peptidases are a very diverse gene family, which makes them a challenge for multiple sequence alignment tools and, consequently, for phylogenetic tree prediction. We consider our proposed gene trees good indications that several gene duplication events may have occurred during the evolutionary history of Lepidoptera, but the statistical support obtained for the tree branches was not good enough to use these trees in the analyses proposed by the reviewer. To make the limitations of our phylogenetic analysis clear, we reorganized the paper, removing the more specific evolutionary discussions, moving the phylogenetic trees to the Supplementary Materials, and focusing more on the biochemical/proteomic analysis, as suggested by the reviewer. Furthermore, the species tree reconstruction was removed from the analysis.

Major

 

“The authors focus on 23 genomes, but it seems possible to find 57 Lepidoptera genomes in NCBI. What criteria were used to select the 23 genomes? Given the problem of taxon representation, it would be important to explain why some data were discarded”.

 

Answer: The authors agree with the reviewer that adding more Lepidoptera genomes from different taxa would greatly improve the results of our work. However, our work was performed using only 23 Lepidoptera genomes because these were the only ones with annotated protein datasets available on NCBI. Several Lepidoptera genomes were submitted to the NCBI Genome database without gene set annotation, which limited our analyses.

 

“The selection of putative SPs is unclear; a figure depicting the workflow would help. In particular, the last selection step includes both NJ reconstruction and blastp score, and it is not clear how the authors combine these two tests. The workflow figure should also help to identify the different categories of sequences (which sequences were included in Figures 2-3? Also those of Tables S1 and S2?)”

 

Answer: In agreement with the reviewer, we added an overview of the workflow used for sequence identification as Figure S6. We also added a list of all sequences included in the phylogenetic trees in Tables S2 and S4.

In the tables previously named S1 and S2, we show:

Table S1 (now Table S7): additional Spodoptera frugiperda sequences that contain a predicted trypsin domain but were not classified as trypsins or chymotrypsins. Serine peptidases are a large group of enzymes with different functions, and several sequences with a predicted trypsin domain have functions different from those we are looking for. For example, at the end of Table S1 there are sequences that, besides the trypsin domain, contain CUB and CLIP domains, which are involved in immune responses.

Table S2 (now Table S8): the number of sequences with a predicted trypsin domain that were not included in our classification as putative trypsins or chymotrypsins. We now clarify that some of them were removed in our first filter step and others in the second. The aim of this table is to show that the discrepancy among species in the number of sequences selected as putative trypsins or chymotrypsins was not due to the filters. For example, Chilo suppressalis has only seven predicted chymotrypsin sequences, and this table shows that this species has a low overall number of SP genes even before the filters.

 

“It is also surprising that a single amino acid (position 189) is sufficient to discriminate between trypsin and chymotrypsin. Furthermore, this choice is made without any reference. It would be interesting to check whether removing this step from the pipeline and going directly to the next one (NJ and blastp) would have produced the same subdivision between putative trypsins and chymotrypsins”.

Answer: Changing the residue at position 189 does not transform a trypsin into a chymotrypsin or vice versa. However, position 189 lies at the bottom of the S1 binding pocket, which is the main determinant of substrate specificity. Thus, analysis of the residue at position 189 is the easiest and most effective way to discriminate trypsins from chymotrypsins among serine endopeptidase sequences (identified with the use of InterPro) (Steitz et al., 1969, J. Mol. Biol. 46, 337-348; Barrett, A.J., Rawlings, N.D., Woessner, J.F., 2004. Handbook of Proteolytic Enzymes, vol. 2, Elsevier, London). We tried the reviewer's suggestion, but it failed to discriminate trypsins from chymotrypsins, probably because of the difficulty of aligning so many sequences.
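The single-residue filter described in this answer can be sketched as a lookup on one alignment column. The sketch is illustrative only: Asp at position 189 is the classic trypsin signature, but the set of residues treated as chymotrypsin-like, the function name, the toy sequences and the column index are assumptions of this example and are not the criteria actually used in the manuscript.

```python
# Residue sets keyed on position 189 (chymotrypsin numbering).
# Asp (D) is the classic trypsin signature; treating Ser/Gly (S, G) as
# chymotrypsin-like is an assumption made for this sketch.
TRYPSIN_LIKE = {"D"}
CHYMOTRYPSIN_LIKE = {"S", "G"}


def classify_by_189(aligned_seq, column_189):
    """Classify one aligned sequence from the residue in the alignment
    column that corresponds to position 189 (0-based index)."""
    residue = aligned_seq[column_189].upper()
    if residue in TRYPSIN_LIKE:
        return "putative trypsin"
    if residue in CHYMOTRYPSIN_LIKE:
        return "putative chymotrypsin"
    return "unclassified"


# Toy aligned fragments (invented, not real sequences);
# column 3 stands in for position 189.
toy_alignment = {
    "seqA": "GDSDGP",
    "seqB": "GDSSGP",
    "seqC": "GD-AGP",
}
calls = {name: classify_by_189(seq, 3) for name, seq in toy_alignment.items()}
```

A filter of this kind depends on locating the correct alignment column, so the alignment difficulty mentioned in the answer affects it as well, although only locally around the S1 pocket.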

 

 

“Line 376, and similarly line 109: "Monophyletic branches predicted for both enzyme trees were individually analyzed." If the "monophyletic branches" are those called T1-T18 and C1-C23, then the authors should not use "for both" but "for each", given that sequences are not paired across the two trees and each species has several sequences included in each analysis. Furthermore, the meaning that the authors attach to the word monophyletic is unclear. They seem to use it to indicate a branch with good support, meaning not polytomic, whereas it should mean "single origin" and should be used to describe a group, defined a priori of the phylogenetic tree, that is confirmed by the phylogeny thanks to good support on the branch that groups its members. Saying that a branch is monophyletic is a tautology. Only a group of sequences/species/individuals can be monophyletic.”

 

Answer: On this point, the authors fully agree with the reviewer, and we have corrected this error. We would like to thank the reviewer for this observation.

 

“Reconstruction of ancestral states using only the topology, such as the one in lines 230-231, should be presented as hypothetical, given that caveats similar to those for gene duplication hold. If the authors are really interested in these points, they should perform an explicit ancestral character reconstruction (using MrBayes or even some function in RAxML)”.

 

Answer: As previously discussed, we do not think that an ancestral character reconstruction is possible, given the complexity of these gene families and the available taxonomic data. To keep our observations hypothetical, we have moved the phylogenetic analyses to the Supplementary Material and treat them more cautiously throughout the manuscript.

 

The phylogenetic inference

“Given the non-central role of the species tree (the node ages are not used in the discussion, and the family definitions are taken from general taxonomy), the authors should consider using the TimeTree output directly to define an ultrametric species tree containing most of the species used, with the few missing ones inserted on a taxonomic basis and drawn with a dashed line. If, on the contrary, the authors think that the species phylogeny is a relevant contribution, they should also discuss the state of the art in Lepidoptera systematics. For example, the authors do not comment on why the family Crambidae is not monophyletic in their species phylogeny.

I have some reservations about the choice of proteins for the phylogenetic reconstruction. From Figure S5 we see that not all sequences are single copy, which raises problems in orthology definition. I would advise the authors to remove duplicated genes from the analysis and to consider also removing genes that are not well represented (several orthologs are missing or present only as a fragment). I notice that Chilo suppressalis has a large section of missing data, which could explain the problem with the monophyly of Crambidae.
To estimate the ultrametric tree, the authors use a single TimeTree estimate without taking into account the large confidence interval of 40 Mya reported in the TimeTree database for that node. r8s has several settings; it can accommodate confidence interval information and the use of several calibration points, possibly all taken from TimeTree. In fact, the authors give no explanation for the choice of this specific single calibration point.”

 

Answer: The species tree was removed from the work.

 

Minor

“Line 198: what quality criteria were used? A few words should be given that connect to the later Materials and Methods section.”

 

Answer: The criteria are now mentioned as: “The other sequences did not meet the established criteria to chymotrypsins identification used in the present work (see Material and Methods section).”

 

“Line 207: predicted by whom? With what kind of software or procedure? Are the sequences with red labels (non-canonical catalytic triad) in Figures 2-3 the same as those cited here, or are there others?”

 

Answer: We changed this sentence to: “Other protein sequences from the S. frugiperda transcriptome predicted to include a trypsin domain (according InterproScan analysis, IPR001254) but not classified according to our criteria as trypsins or chymotrypsins are presented in Table S7”.

 

“Figures 1 and 2 have serious readability problems.
The sequence labels and branch support values cannot be read at the current resolution.
The authors should simplify the information (e.g., use color coding or sequence group designations). At the current resolution, it is not possible to distinguish outgroup from ingroup, nor to identify the species or family to which each sequence belongs.
References in the text to sequence names in those figures (line 230) cannot be used, because the names can be neither read nor searched (the image is strictly raster). Sequences referred to in this way should be marked with a symbol such as a star.
To make the discussion in lines 300-310 easier to follow, the sequences found in the midgut by proteomic analysis should be marked on the tree.
The authors should move the current figures, at higher resolution, to the supplementary material or, better (for repeatability and data reuse), the supplementary material should provide a file in Newick format with the actual tree and a table with sequence name, enzyme type, species and family”.

Answer: We have moved the phylogenetic trees to the Supplementary Material and provided the files at a higher resolution so that the labels can be read when zooming in.

 

“Tables 1 and 2 have unclear formatting. According to the legend, gray should be used for 100% conserved positions, whereas only X (major class below 50%) is in gray. Amino acids conserved across all groups are highlighted in green, but this is not described in the legend. Table 1 has several values out of line, and no explanation is given in the legend. Is this an error, or does it carry meaning?”

 

Answer: We believe that there was a problem in formatting the tables when finalizing the file in the journal format. We apologize for this mistake. These tables have now been converted into Figures S3 and S4, following the suggestions of the reviewer.

 

“The Table 1 header "branch" is not correctly spaced”

 

Answer: Thank you for the observation. We now present the corrected form.

Reviewer 2 Report

The manuscript reports on a study investigating the evolution of genes coding for chymotrypsin and trypsin in Lepidoptera. The authors demonstrate that several gene expansion events occurred during the evolution of this taxon. An analysis of the conservation/variation of the residues at the binding site of each subgroup of trypsins and chymotrypsins has been carried out.

Experimental data on gene expression and secretion pathways have been included in the work and correlated to the phylogenetic analysis.

The authors may want to consider mapping the sites specific to each subgroup onto the predicted 3D structures of trypsin and chymotrypsin (in a way similar to what was done in one of their previous works, cited as ref. 6). In addition to the description of the conserved and variable sites, the presence of indels specific to a subgroup should be indicated and, if possible, mapped onto the structure. Once this has been accomplished, functional correlations may be attempted. For example, it could be verified whether variable positions involve sites known to interact with PIs, which might suggest the structural basis of the variability in the response to host PIs among different trypsin or chymotrypsin subgroups.

 

Minor points:

Species names should be in italics.

I suspect that “ultrameric” should be “ultrametric”.

Reference format is not homogeneous.

 

Author Response

Reviewer #2:

 

“The manuscript reports on a study investigating the evolution of genes coding for chymotrypsin and trypsin in Lepidoptera. The authors demonstrate that several gene expansion events occurred during the evolution of this taxon. An analysis of the conservation/variation of the residues at the binding site of each subgroup of trypsins and chymotrypsins has been carried out. Experimental data on gene expression and secretion pathways have been included in the work and correlated to the phylogenetic analysis”.

 

“The authors may want to consider mapping the sites specific to each subgroup onto the predicted 3D structures of trypsin and chymotrypsin (in a way similar to what was done in one of their previous works, cited as ref. 6)”.

 

Answer: We added the figures suggested by the reviewer as Supplementary Figures S3 and S4.

 

“In addition to the description of the conserved and variable sites, the presence of indels specific to a subgroup should be indicated and, if possible, mapped onto the structure. Once this has been accomplished, functional correlations may be attempted. For example, it could be verified whether variable positions involve sites known to interact with PIs, which might suggest the structural basis of the variability in the response to host PIs among different trypsin or chymotrypsin subgroups”.

 

Answer: Unfortunately, the presence of indels was difficult to analyze because of the large number of sequences and the challenge of aligning so many of them, particularly in insertion regions. In addition, we failed to find an association between the diversity of amino acids at the enzymes’ substrate-binding sites and host PIs. Our similarity groups were so diverse that no trend was observed along the phylogenetic tree. Moreover, following the suggestions of reviewer #1, we reduced the emphasis on our evolutionary analysis and gave more focus to the RNA-seq and proteomic analyses.

 

Minor points:

 

“Species names should be in italics”

 

Answer: Thank you for the observation; we corrected this mistake.

 

“I suspect that “ultrameric” is “ultrametric””

 

Answer: Following the suggestions of reviewer #1, we removed the species tree from the analyses.

 

“Reference format is not homogeneous”.

 

Answer: Thank you for the observation; we checked all references and corrected their formats.

 

Round 2

Reviewer 2 Report

The authors responded as much as possible to all comments. I have no further issues.
