Delineation of Soil Management Zone Maps at the Regional Scale Using Machine Learning
Round 1
Reviewer 1 Report
General Comments
This manuscript describes a mapping study of soil management zones to support agricultural and environmental goals in Bajestan, Iran. It provides an interesting framework for developing soil management zones using geospatial modeling of soil properties (Random Forests) which are then used to delineate clusters of similar soil pixels (fuzzy classification). I believe that with substantial revision this paper could be a good contribution to science, but it needs a lot of work. One major improvement would be to present more detailed methods as a framework for other studies to follow, then use the specific study region as a proof of concept to guide the discussion.
The introduction reads like a very comprehensive literature review, but it does not do a great job at framing the overall narrative of the paper. It does not identify knowledge gaps that this work will address and, because of this, it makes the presented research seem less interesting and less novel. Similarly, the section of the discussion focusing on targeted fertilizer applications, which is the motivation for this entire research, seems a bit vague and could go much deeper into potential land management strategies for specific MZs. Since this is the closing section of the discussion, it lessens the impact of the work. I recommend reformulating both the introduction and discussion to strengthen the contribution that this paper will provide.
There needs to be substantially more information provided on the methods for the random forests model and model extrapolation. It is a critical part of the work and has limited detail in the methods. Provide a final list of variables that were selected, give us the mtry and ntree values that were chosen. What training and testing splits were used? What do the authors mean when they refer to “validation”? Similarly, more detail on the fuzzy classification approach would be very helpful to the audience, rather than providing a citation and making them dig up another paper. After all, “machine learning” is in the title of this paper, but there is not much detail about it within.
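The manuscript cites its fuzzy classification method without describing it. As a hedged illustration of what such a description could cover, the sketch below implements fuzzy c-means (fuzzy k-means), a common choice for management-zone delineation. Whether the authors used this exact algorithm, the fuzziness exponent m = 2, or Euclidean distance is an assumption, and the function name fuzzy_c_means is hypothetical. Its inputs would be per-pixel scores, such as the principal components.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means with Euclidean distance (illustrative sketch).

    points: list of equal-length tuples (e.g., per-pixel PC scores).
    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n = len(points)
    dims = len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(n_clusters)]
        s = sum(row)
        u.append([v / s for v in row])

    power = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Centers are membership-weighted means with weights u^m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dims)
            ))
        # Memberships: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Point coincides with a center: assign crisp membership.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(row)
                row = [v / s for v in row]
            else:
                row = [
                    1.0 / sum((dists[k] / dists[j]) ** power for j in range(n_clusters))
                    for k in range(n_clusters)
                ]
            new_u.append(row)
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(n_clusters))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

A hard zone map can then be read off each pixel's largest membership, while the full membership rows preserve the transitional character of zone boundaries.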
There are numerous parts of the paper that have minor grammatical issues or strangely worded sentences. I have noted some in the specific comments, but I recommend that the authors review the paper thoroughly for these issues.
Specific Comments
Ln 22: I do not believe that “SQ” has been defined. Please define it.
Ln 50-54: Double check the grammar and punctuation in this passage. With all of the citations in there, it gets a little confusing.
Ln 64: Say “were used by…” instead of “was”.
Ln 71: “Maps” should not be capitalized here.
Ln 79: Add an “a” for “appropriate”.
Ln 81-92: I found the phrasing of these closing sentences to be awkward and a bit confusing. I recommend that the authors rewrite this section, as these are some of the most important sentences in the entire research paper. It would also be good to phrase the study objectives in the past tense.
Ln 96: Do not capitalize “Selected”.
Ln 121-123: This sentence is grammatically incorrect. Maybe try “Therefore, there is a need for…”.
Ln 125: Rewrite to “...applied to the SCORPAN model [45] to map soil properties by linking…”.
Table 1: This table should have some additional information regarding these spatial indices. Just a note on what each index generally means without forcing the reader to look at every reference that is placed next to the covariates.
Ln 146: Please provide more details regarding the variable selection process.
Ln 161: I believe that the “My” is supposed to be a symbol here.
Ln 164: What are the authors referring to as “standardized data”? This is the first we have heard of this.
Ln 164-167: This sentence is confusing. Please revise.
Ln 179: What are the other “related variables” from the PCA other than the PCs? Please elaborate.
Ln 192-193: Please revise for clarity.
Ln 205: This sentence needs to be revised. I cannot tell what the authors are saying.
Figure 2: I am not sure why some of the color bars on this graph have red as high values and some have red as low values. Please use the same color scheme for all graphs for consistency. Also, I would consider submitting this figure as supplemental material and instead presenting maps of the 5 PCs used in the classification method. That will help compress the figure and show more interpretable trends.
Ln 295: It seems like the evaluation of this RF model should be discussed before the mapped soil properties.
Ln 298-299: This is why we need more information on the model training. Was an independent set of data used for error evaluation?
Ln 350-353: Please elaborate on this analysis, possibly add it into the methods section. It is unclear if the authors are saying that they used ANOVA to delineate MZs or if ANOVA was just used to investigate statistical differences between different soil properties among the MZs.
Ln 353-354: This sentence about fertilizer application seems out of context here. Also, I am not sure that “bytes” is the appropriate word to use here.
Ln 400: The authors need to define “SQI”.
Table 7: It would be helpful to provide the percentage of each SQI relative to total area of each MZ on this table.
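The requested percentages are a simple cross-tabulation of two classified rasters. A minimal sketch, assuming parallel per-pixel sequences of MZ labels and SQI classes on an equal-area grid (so pixel shares equal area shares); the function name sqi_percent_by_zone is hypothetical.

```python
from collections import Counter


def sqi_percent_by_zone(zone_labels, sqi_classes):
    """Percentage of each SQI class within each management zone.

    zone_labels and sqi_classes are parallel per-pixel sequences; with
    equal-area pixels, pixel percentages equal area percentages.
    """
    if len(zone_labels) != len(sqi_classes):
        raise ValueError("inputs must have the same length")
    zone_totals = Counter(zone_labels)
    pair_counts = Counter(zip(zone_labels, sqi_classes))
    table = {}
    for (zone, sqi), count in pair_counts.items():
        table.setdefault(zone, {})[sqi] = 100.0 * count / zone_totals[zone]
    return table
```

Each zone's percentages sum to 100, which makes the table directly comparable across zones of different sizes.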
Ln 444-446: This sentence is confusing. What do the authors mean by “evaluated” here?
Ln 487: I recommend removing “the status of his” from this sentence. It reads better that way.
Author Response
Reviewer #1:
General comments
This manuscript describes a mapping study of soil management zones to support agricultural and environmental goals in Bajestan, Iran. It provides an interesting framework for developing soil management zones using geospatial modeling of soil properties (Random Forests) which are then used to delineate clusters of similar soil pixels (fuzzy classification). I believe that with substantial revision this paper could be a good contribution to science, but it needs a lot of work. One major improvement would be to present more detailed methods as a framework for other studies to follow, then use the specific study region as a proof of concept to guide the discussion.
The introduction reads like a very comprehensive literature review, but it does not do a great job at framing the overall narrative of the paper. It does not identify knowledge gaps that this work will address and, because of this, it makes the presented research seem less interesting and less novel. → Thank you for your very helpful suggestions. The authors thank the reviewer for his/her valuable and constructive comments, which were taken into account in this revised version of the manuscript along with the specific comments addressed below. We completely revised the last paragraph of the Introduction and described our aims in lines 85-93.
Similarly, the section of the discussion focusing on targeted fertilizer applications, which is the motivation for this entire research, seems a bit vague and could go much deeper into potential land management strategies for specific MZs. Since this is the closing section of the discussion, it lessens the impact of the work. I recommend reformulating both the introduction and discussion to strengthen the contribution that this paper will provide. → The Introduction and Discussion were fully revised in the new version of the manuscript.
There needs to be substantially more information provided on the methods for the random forests model and model extrapolation. It is a critical part of the work and has limited detail in the methods. Provide a final list of variables that were selected, give us the mtry and ntree values that were chosen. What training and testing splits were used? What do the authors mean when they refer to “validation”? Similarly, more detail on the fuzzy classification approach would be very helpful to the audience, rather than providing a citation and making them dig up another paper. After all, “machine learning” is in the title of this paper, but there is not much detail about it within. → We used leave-one-out (k-fold) cross-validation, which is described in the second paragraph of Section 2.5. Furthermore, four commonly used indices, namely the coefficient of determination (R2), Lin’s concordance correlation coefficient (CCC), mean absolute error (MAE), and root mean square error (RMSE), were used to evaluate the performance of the RF model.
“Machine learning” was replaced with “Random Forest” in the title of the paper.
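The four indices named in this response can be written compactly. A minimal sketch follows, assuming paired observed and cross-validated predicted values and population (1/n) moments for Lin's CCC. R2 is taken here as 1 - SS_res/SS_tot, which is an assumption, since some digital soil mapping studies report the squared Pearson correlation instead. The function name validation_metrics is hypothetical.

```python
import math


def validation_metrics(observed, predicted):
    """Return R2, Lin's CCC, MAE and RMSE for paired values (sketch).

    R2 is computed as 1 - SS_res / SS_tot (assumption; some studies use
    the squared Pearson correlation instead).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    # Population (1/n) variances and covariance, as in Lin's original CCC.
    var_o = ss_tot / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "CCC": 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2),
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
    }
```

Unlike R2, the CCC also penalizes a systematic offset between predictions and observations, which is why it is often reported alongside R2.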
There are numerous parts of the paper that have minor grammatical issues or strangely worded sentences. I have noted some in the specific comments, but I recommend that the authors review the paper thoroughly for these issues. → The English was thoroughly checked and revised, and some parts were completely rewritten.
Detailed comments to the Author
- Ln 22: I do not believe that “SQ” has been defined. Please define it. → Done.
- Ln 50-54: Double check the grammar and punctuation in this passage. With all of the citations in there, it gets a little confusing. → The passage was revised; see lines 53-55 of the new version of the manuscript.
- Ln 64: Say “were used by…” instead of “was”. → Done.
- Ln 71: “Maps” should not be capitalized here. → Done.
- Ln 79: Add an “a” for “appropriate”. → Done.
- Ln 81-92: I found the phrasing of these closing sentences to be awkward and a bit confusing. I recommend that the authors rewrite this section, as these are some of the most important sentences in the entire research paper. It would also be good to phrase the study objectives in the past tense. → These sentences were completely rewritten; see lines 85-93 of the revised manuscript.
- Ln 96: Do not capitalize “Selected”. → Done.
- Ln 121-123: This sentence is grammatically incorrect. Maybe try “Therefore, there is a need for…”. → The sentence was completely revised (lines 123-126).
- Ln 125: Rewrite to “...applied to the SCORPAN model [45] to map soil properties by linking…”. → Done.
- Table 1: This table should have some additional information regarding these spatial indices. Just a note on what each index generally means without forcing the reader to look at every reference that is placed next to the covariates. → Definitions of the indices were added to the new version of the table, which is presented in the supplementary data.
- Ln 146: Please provide more details regarding the variable selection process. → This was added in lines 144-147.
- Ln 161: I believe that the “My” is supposed to be a symbol here. → Revised.
- Ln 164: What are the authors referring to as “standardized data”? This is the first we have heard of this. → Removed.
- Ln 164-167: This sentence is confusing. Please revise. → Revised; see lines 159-161.
- Ln 179: What are the other “related variables” from the PCA other than the PCs? Please elaborate. → Revised. The phrase meant that the important variables were selected based on the principal component analysis.
- Ln 192-193: Please revise for clarity. → Revised.
- Ln 205: This sentence needs to be revised. I cannot tell what the authors are saying. → Revised and explained in line 193 of the new version of the manuscript.
- Figure 2: I am not sure why some of the color bars on this graph have red as high values and some have red as low values. Please use the same color scheme for all graphs for consistency. Also, I would consider submitting this figure as supplemental material and instead presenting maps of the 5 PCs used in the classification method. That will help compress the figure and show more interpretable trends. → The legend of Figure 2 was revised. Because of the importance of Figure 4 and the detailed discussion of the soil maps in Section 3.3, the figure was not moved to the supplementary material. Unfortunately, we did not understand the second part of the comment, regarding presenting maps of the 5 PCs.
- Ln 295: It seems like the evaluation of this RF model should be discussed before the mapped soil properties. → Thank you for your suggestion. The evaluation of the RF model is now presented before the mapped soil properties, in Section 3.2 (lines 223-242).
- Ln 298-299: This is why we need more information on the model training. Was an independent set of data used for error evaluation? → The k-fold cross-validation method was used, and a new sentence was added in line 150.
- Ln 350-353: Please elaborate on this analysis, possibly add it into the methods section. It is unclear if the authors are saying that they used ANOVA to delineate MZs or if ANOVA was just used to investigate statistical differences between different soil properties among the MZs. → The ANOVA is now explained in lines 186-189.
- Ln 353-354: This sentence about fertilizer application seems out of context here. Also, I am not sure that “bytes” is the appropriate word to use here. → It was removed.
- Ln 400: The authors need to define “SQI”. → It is defined in line 384 of the new version of the manuscript.
- Table 7: It would be helpful to provide the percentage of each SQI relative to total area of each MZ on this table. → Revised. A more detailed explanation is available in the revised version of Table 5.
- Ln 444-446: This sentence is confusing. What do the authors mean by “evaluated” here? → “Evaluated” was removed.
- Ln 487: I recommend removing “the status of his” from this sentence. It reads better that way. → Removed.
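The response to the Ln 298-299 comment states that k-fold cross-validation served as the error evaluation. A minimal sketch of how such folds hold each sample out exactly once is shown below; with k equal to the number of samples it reduces to leave-one-out. The function names and the fit_predict callable (which would wrap, for example, a random forest) are hypothetical, and shuffling with a fixed seed is an assumption.

```python
import random


def k_fold_indices(n_samples, k, shuffle=True, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once.

    With k == n_samples this is leave-one-out cross-validation.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = sorted(idx[start:start + size])
        test_set = set(test)
        train = [i for i in range(n_samples) if i not in test_set]
        yield train, test
        start += size


def cross_validated_predictions(x, y, fit_predict, k):
    """Pool out-of-fold predictions so every sample is predicted by a model
    that never saw it. fit_predict(x_train, y_train, x_test) is hypothetical."""
    preds = [None] * len(y)
    for train, test in k_fold_indices(len(y), k):
        out = fit_predict([x[i] for i in train], [y[i] for i in train],
                          [x[i] for i in test])
        for i, p in zip(test, out):
            preds[i] = p
    return preds
```

The pooled predictions can then be compared with the observations using R2, CCC, MAE and RMSE. If the same folds were also used to tune mtry or ntree, the resulting errors would be optimistic; a nested scheme would be needed to keep tuning and evaluation separate.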
Author Response File: Author Response.pdf
Reviewer 2 Report
Scientifically, the work is well written; below I have a couple of questions and suggestions for correction.
Line 22: What is an SQ map? Please explain.
Line 79: “ppropriate”? Did you mean “appropriate”?
Line 117: The title "2.4. Spatial variability of soil properties" should be moved to the next page, where its text begins.
Table 1: "LS" is an acronym for the Slope Length and Steepness factor (in the USLE and RUSLE models); perhaps another acronym would be better?
Please state which software you used for the Random Forest model and for principal component analysis (PCA).
Equation 4: Should there be "MAE" in the formula here?
Line 237: The title "3.2. Digital soil property maps" should be moved to the next page.
Table 4: All names (lines 316-320) should be placed below the table, on one side.
Author Response
Reviewer #2:
General comments
Comments and Suggestions for Authors
Scientifically, the work is well written; below I have a couple of questions and suggestions for correction. → The authors thank the reviewer for his/her valuable and constructive comments; the text was edited according to all of the proposed comments, as shown below.
Detailed comments to the Author
- Line 22: What is an SQ map? Please explain. →
- Line 79: “ppropriate”? Did you mean “appropriate”? →
- Line 117: The title "2.4. Spatial variability of soil properties" should be moved to the next page, where its text begins. →
- Table 1: "LS" is an acronym for the Slope Length and Steepness factor (in the USLE and RUSLE models); perhaps another acronym would be better? → "SPI" is now used instead of "LS".
- Please state which software you used for the Random Forest model and for principal component analysis (PCA). → The software for the Random Forest model is reported in Section 2.5 of the Materials and Methods, lines 148-149 (the caret package in R 3.3.1 was used to predict the soil property maps with the RF function). The PCA was performed with IBM SPSS Statistics 22 (Section 2.6, line 159).
- Equation 4: Should there be "MAE" in the formula here? →
- Line 237: The title "3.2. Digital soil property maps" should be moved to the next page. →
- Table 4: All names (lines 316-320) should be placed below the table, on one side. →
General comments on manuscript
- Please check and revise all the equations; if any are inserted as pictures, please revise them. → All equations were revised.
- Please add the accessed date for ALL the links in the WHOLE paper, not just the main text. →
- Author Contributions (page 4): Who is M.Z.? → It was a typo and has been corrected.
- Please add a citation of ref 93 in the main text or delete ref 93. →
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
General Comments:
In general, this manuscript is improved from its original form. I think this work represents a good regional study for addressing agricultural challenges. However, I do not believe that the authors adequately addressed all of the initial concerns, and the manuscript still needs some revisions.
The closing of the introduction and statement of objectives is much improved from the original draft. Something is still lacking from the closing of the discussion. Perhaps a table with specific recommendations to address the SQ challenges of each MZ would help communicate the overall findings in a way that would be easy for researchers and agricultural managers to interpret?
Something is wrong with the figures in this revised draft. It may have been a download error, but I could not see any of the figures properly, so I don't really have any comments on them. It would be useful to show the spatial distributions of the PCs that were used in the MZ delineation, however.
There are still some grammatical errors and misspellings in this manuscript. The authors need to thoroughly review the work and fix these issues.
Specific Comments:
Ln 44: There needs to be an "and" before "electrical conductivity".
Ln 63: Pluralize "situation".
Figures: It may be a rendering issue on the manuscript document, but figure 1 has a large grey rectangle covering the bottom half and figure 2 onward are just blank white boxes on my copy.
Table S1: This is much better. Thank you for adding more descriptions. However, it would be useful to put an inset in this table to define what B4 and B8A and all of these variables are in the equations.
Ln 111: Provide more detail on the groundwater mapping than just "geostatistical analysis".
Ln 117: Is "mode" supposed to be "model" here?
Ln 128: Was the cross-validation used for tuning model parameters? Expand this. Also, did the variable importance step help remove highly correlated/redundant predictors (e.g., Na+ and SAR)? Random forests are better at dealing with this, but a lot of correlated predictors can still cause issues.
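One common way to address the redundancy concern raised here is a correlation filter applied before model fitting. A minimal sketch, assuming predictors are ordered by priority (for example, RF importance) and a hypothetical |r| threshold of 0.9; the function names are hypothetical and this is not presented as the authors' procedure.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0.0 or sb == 0.0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sa * sb)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a predictor only if its |r| with every predictor
    already kept is <= threshold. Dict order sets priority (e.g., importance)."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the higher-priority member of each correlated pair (for example, Na⁺ versus SAR) is retained and the other is dropped.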
Ln 172-173: Do the authors mean to say "high variability" instead of "availability" here? High CV -> high variability.
Ln 190: Revise spelling of "commn".
Ln 192-193: The positive bias is an interesting finding. The authors could expand with a potential explanation here.
Ln 196: There needs to be a comma before "and" to break up this compound sentence.
Ln 303-304: I still feel that this is a confusing sentence. Would it be easier to just say "ANOVA was used to evaluate differences in soil nutrients among MZs"?
Ln 332: Define TWI.
Ln 355-357: Rephrase this sentence. It is awkwardly worded.
Author Response
Reviewer #1:
General comments
In general, this manuscript is improved from its original form. I think this work represents a good regional study for addressing agricultural challenges. However, I do not believe that the authors adequately addressed all of the initial concerns, and the manuscript still needs some revisions.
The closing of the introduction and statement of objectives is much improved from the original draft. Something is still lacking from the closing of the discussion. Perhaps a table with specific recommendations to address the SQ challenges of each MZ would help communicate the overall findings in a way that would be easy for researchers and agricultural managers to interpret? → Table 7 was added in the new version of the manuscript.
Something is wrong with the figures in this revised draft. It may have been a download error, but I could not see any of the figures properly, so I don't really have any comments on them. It would be useful to show the spatial distributions of the PCs that were used in the MZ delineation, however. → Thank you for your very helpful suggestions. The authors thank the reviewer for his/her valuable and constructive comments, which were taken into account in this revised version of the manuscript along with the specific comments addressed below. We added the PC maps as Figure 5 and described them in Section 3.4 of the new version of the manuscript.
There are still some grammatical errors and misspellings in this manuscript. The authors need to thoroughly review the work and fix these issues. → The English was thoroughly checked and revised, and some parts were completely rewritten.
Specific Comments:
Ln 44: There needs to be an "and" before "electrical conductivity". → Done.
Ln 63: Pluralize "situation". → Done.
Figures: It may be a rendering issue on the manuscript document, but figure 1 has a large grey rectangle covering the bottom half and figure 2 onward are just blank white boxes on my copy. → This problem probably arose from the rendering of the file in the referee's copy; the figures have now been included in the text in EMF format.
Table S1: This is much better. Thank you for adding more descriptions. However, it would be useful to put an inset in this table to define what B4 and B8A and all of these variables are in the equations. → All Sentinel-2 bands are now explained below Table S1. Sentinel-2 bands: B2 = Blue (492.4 nm), B3 = Green (559.8 nm), B4 = Red (664.6 nm), B5 = Vegetation Red Edge (704.1 nm), B6 = Vegetation Red Edge (740.5 nm), B7 = Vegetation Red Edge (782.8 nm), B8 = NIR (832.8 nm), B8A = Narrow NIR (864.7 nm), B10 = SWIR-Cirrus (1373.5 nm), B11 = SWIR1 (1613.7 nm).
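As a hedged illustration of how these band symbols enter the index equations, the sketch below computes NDVI, a widely used Sentinel-2 index, from B8 (NIR) and B4 (Red) reflectance. Whether NDVI is among the indices in Table S1 is an assumption, and the function name ndvi is hypothetical.

```python
def ndvi(b8_nir, b4_red):
    """NDVI = (B8 - B4) / (B8 + B4) from Sentinel-2 reflectance values."""
    denom = b8_nir + b4_red
    if denom == 0:
        raise ValueError("NIR + Red reflectance must be non-zero")
    return (b8_nir - b4_red) / denom
```

Other indices follow the same pattern, substituting the relevant band symbols (for example, B8A or B11) into their published formulas.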
Ln 111: Provide more detail on the groundwater mapping than just "geostatistical analysis". → These sentences were completely rewritten (lines 131-137 of the revised manuscript): "To map water quality parameters, a total of 190 groundwater sources, including qanats and wells, were sampled and analyzed according to standard procedures [52]. The Spatial Analyst tools of ArcMap were then used to generate maps describing the variability of water quality in the study area. Groundwater quality parameters (HCO₃⁻, Cl⁻, SO₄²⁻, Na⁺, Ca²⁺, Mg²⁺, EC, pH, SAR, and TDS) were spatially interpolated using the inverse distance weighting (IDW) method [28]."
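IDW estimates a value at an unsampled location as a distance-weighted average of the sampled wells and qanats, with weights 1/d^p. A minimal sketch, assuming planar coordinates and power p = 2 (a common default, not necessarily the ArcMap setting used in this study); the function name idw is hypothetical.

```python
import math


def idw(known_points, target, power=2.0):
    """Inverse distance weighting estimate at `target`.

    known_points: list of ((x, y), value) pairs in planar coordinates.
    """
    num = 0.0
    den = 0.0
    for (x, y), value in known_points:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value  # IDW is exact at sampled locations
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den
```

Unlike kriging, IDW fits no variogram and provides no estimation variance, which is one reason reviewers may ask for more detail when it is described as a geostatistical analysis.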
Ln 117: Is "mode" supposed to be "model" here? → Revised.
Ln 128: Was the cross-validation used for tuning model parameters? Expand this. Also, did the variable importance step help remove highly correlated/redundant predictors (e.g., Na+ and SAR)? Random forests are better at dealing with this, but a lot of correlated predictors can still cause issues. → Thank you for your comments. We reviewed the cross-validation method. Regarding correlation, it should be noted that terrain attributes and remote sensing covariates were used in random forest models to predict soil properties, and RF models are relatively insensitive to correlation among predictors. In the next step, we applied PCA to the soil properties to convert them into PCs, which in practice removed the correlation among these features.
Ln 172-173: Do the authors mean to say "high variability" instead of "availability" here? High CV -> high variability. → Yes; this was revised.
Ln 190: Revise spelling of "commn". → Done.
Ln 192-193: The positive bias is an interesting finding. The authors could expand with a potential explanation here. → Done. The new sentences are in lines 234-236: "There was a very small positive bias in the predicted values for all parameters (Table 2), indicating that the DSM predictions of soil properties were slightly overestimated. This bias appeared to be closely related to the average soil sampling distance, differences among cultivated land types, and laboratory methods."
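Because MAE and RMSE are non-negative by construction, the direction of a bias is usually read from the mean error (ME). A minimal sketch, assuming ME = mean(predicted - observed), so positive values indicate overestimation; the function names are hypothetical.

```python
def mean_error(observed, predicted):
    """Mean error (bias): positive when predictions overestimate on average."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def mean_absolute_error(observed, predicted):
    """MAE: average error magnitude, always >= 0 regardless of direction."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)
```

Systematic over- and underestimation by the same amount give identical MAE but opposite signs of ME, so ME is the statistic that supports a "positive bias" statement.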
Ln 196: There needs to be a comma before "and" to break up this compound sentence. → Done.
Ln 303-304: I still feel that this is a confusing sentence. Would it be easier to just say "ANOVA was used to evaluate differences in soil nutrients among MZs"? → The sentence was replaced accordingly.
Ln 332: Define TWI. → TWI was defined as the topographic wetness index.
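For reference, the topographic wetness index is conventionally defined as follows, where a is the specific catchment area (upslope contributing area per unit contour width) and β is the local slope angle. The flow-routing algorithm used to derive a in this study is not stated, so this shows only the standard definition.

```latex
\mathrm{TWI} = \ln\!\left(\frac{a}{\tan\beta}\right)
```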
Ln 355-357: Rephrase this sentence. It is awkwardly worded. → Revised: "Therefore, nitrogen fertilizer use should be given serious attention. To achieve the highest uptake efficiency of phosphorus fertilizers, they should be applied at the right time and according to crop requirements."
Author Response File: Author Response.pdf