Article
Peer-Review Record

Assessing Soil and Land Suitability of an Olive–Maize Agroforestry System Using Machine Learning Algorithms

Crops 2024, 4(3), 308-323; https://doi.org/10.3390/crops4030022
by Asif Hayat 1, Javed Iqbal 1, Amanda J. Ashworth 2,* and Phillip R. Owens 3
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 14 May 2024 / Revised: 17 June 2024 / Accepted: 24 June 2024 / Published: 9 July 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Common remarks

Usually, this kind of work follows a generally accepted scheme:

1) Data collection

1.1) Collection of direct data at point locations obtained as a result of field work (field data, "ground truth")

1.2) Collection of indirect data covering the entire work area (DEM features, remote sensing data)

2) Selecting the dependent variable. Either the results of measurements at point locations are used directly as a dependent variable (to build a regression), or the result of the classification of field data (to build a classification)

3) Model building

3.1) Preparing a dataset for modeling. Point data from clause 1.1 are overlaid with indirect data from clause 1.2, resulting in a point dataset with the dependent variable from clause 2 and independent variables from clause 1.2.

3.2) Selection of independent variables. From the data set of clause 3.1, the variables most related to the dependent variable from clause 2 are selected.

3.3) The point dataset created in clause 3.2 is divided into 3 parts to build a model using ML (machine learning) methods: a training dataset, a testing dataset, and a validation dataset, in a proportion of approximately 8:1:1. The training dataset and testing dataset are used to build the model.

3.4) The final ML model is built on the training dataset, and the testing dataset is used to control the construction process and select the optimal parameters.

3.5) The quality of the final model is assessed using the validation dataset. These estimates of the quality of the final model are used as a prediction performance metric. Usually clauses 3.3-3.5 are repeated several times to collect statistics and assess the stability of the model (a minimal code sketch of clauses 3.3-3.5 follows this list).

4) Prediction. The model constructed in clause 3.4 is applied using the complete data set from clause 1.2 to run a prediction for the entire territory.
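To make clauses 3.3-3.5 concrete, here is a minimal, hedged sketch of the split/tune/validate cycle, assuming scikit-learn; the dataset, variable names, and hyperparameter grid are hypothetical and only illustrate the scheme, not any particular study's pipeline.

```python
# Minimal sketch of clauses 3.3-3.5: split the point dataset 8:1:1, tune on the
# testing set, and report final quality on the validation set.
# All data, variable names, and settings below are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 6))       # independent variables from clause 1.2 (DEM, RS, ...)
y = rng.integers(0, 4, size=500)    # dependent classes from clause 2 (e.g., S1, S2, S3, N)

# Clause 3.3: ~8:1:1 split into training, testing, and validation datasets
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Clause 3.4: build the model on the training set; use the testing set to pick parameters
best_model, best_score = None, -1.0
for n_trees in (100, 300, 500):
    model = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    if score > best_score:
        best_model, best_score = model, score

# Clause 3.5: assess the final model on the untouched validation set
print("validation accuracy:", accuracy_score(y_val, best_model.predict(X_val)))
```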

--------------------------

Questions about the article follow, since it is hard to understand from the text what the authors did and how far their methodology meets the standard requirements for building models.

Q1) Clause 2 of Common Remarks is not described, i.e., it is not explained how and on the basis of what data the classes of the dependent variable were calculated (specific variable names and which dataset they belong to).

Q2) One might assume that the authors' term "auxiliary variables" refers to the data in clause 1.2 of Common Remarks (section "2.6 Preparation of the Auxiliary Data"); however, in the text in section "3.2 Selected Auxiliary Features" we see the results of field analyses from clause 1.1 of Common Remarks (pH, CaCO3, etc., similar to Figures 3 and 4). Both datasets in clauses 1.1 and 1.2 of Common Remarks must be precisely defined.

Q3) The article does not indicate which independent variables were used to build the models in clause 3.4 of Common Remarks. Judging by the text, this is point data from clause 1.1 of Common Remarks, which is not available for the entire territory, together with data from clause 1.2 of Common Remarks (authors need to provide specific variable names and which dataset they belong to).

Q4) The article does not describe what data were used to build the model in clauses 3.2-3.5 of Common Remarks (specific variable names and which dataset they belong to, and how the dataset was divided into training, testing, and validation datasets), nor for what part of the data the model performance estimates in Table 2 are given.

Q5) Since, from the text, the model uses point data from clause 1.2 of Common Remarks, which is not available for the entire territory, how were the results in Table 3 obtained for the entire territory?

Author Response

Manuscript ID: crops-3035007

Reviewer 1:

Questions about the article follow, since it is hard to understand from the text what the authors did and how far their methodology meets the standard requirements for building models.

Response: Thank you very much for the time and consideration during the review process. Specific responses are below.

Q1) Clause 2 of Common Remarks is not described, i.e., it is not explained how and on the basis of what data the classes of the dependent variable were calculated (specific variable names and which dataset they belong to).

Response: Indeed, the methodologies in the article follow the data collection, handling, and processing scheme outlined above. Furthermore, the dependent variables used (point data) are now specified in the amended version in L213-215, 311-339, and 144-151. Thank you for this suggestion.

Q2) One might assume that the authors' term "auxiliary variables" refers to the data in clause 1.2 of Common Remarks (section "2.6 Preparation of the Auxiliary Data"); however, in the text in section "3.2 Selected Auxiliary Features" we see the results of field analyses from clause 1.1 of Common Remarks (pH, CaCO3, etc., similar to Figures 3 and 4). Both datasets in clauses 1.1 and 1.2 of Common Remarks must be precisely defined.

Response: Authors can see how this caused confusion, and it has been amended in the revised version. Mention of point data is now removed from article Section 3.2.

Q3) The article does not indicate which independent variables were used to build the models in clause 3.4 of Common Remarks. Judging by the text, this is point data from clause 1.1 of Common Remarks, which is not available for the entire territory, together with data from clause 1.2 of Common Remarks (authors need to provide specific variable names and which dataset they belong to).

Response: Dependent and independent variables are now defined, as are statements confirming that training data and testing data were used to select optimal parameters. The exact ML procedures followed those in previously published work by the authors (doi.org/10.3390/soilsystems5030041 and doi.org/10.32389/JEEG22-001) and were carried out correctly. Thank you for these comments. If additional ML methods need clarification, please specify and they can be added in the amended version.

Q4) The article does not describe what data were used to build the model in clauses 3.2-3.5 of Common Remarks (specific variable names and which dataset they belong to, and how the dataset was divided into training, testing, and validation datasets), nor for what part of the data the model performance estimates in Table 2 are given.

Response: Thank you for these comments; substantial effort has been undertaken to clarify data sources/types and the division of training data. Authors added specific variable names and the source of each dataset in the manuscript (L143-150) and in a table (Table 1). Specifically, data were divided into 70% training, 20% testing, and 10% validation, while Table 3 presents the confusion matrices for the entire study area, comparing the actual target values with the predicted ones. Verbiage was again added on this point in L313-338. The model performance is given for the entire dataset, both for the actual values and for the predicted values of the classified imagery. Hopefully, this fully addresses this Reviewer's queries on model development.

Actual \ Predicted   Positive   Negative
Positive             TP         FN
Negative             FP         TN
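For reference, a small illustrative sketch of how overall accuracy and per-class precision/recall follow from the TP/FN/FP/TN cells of such a matrix (the counts below are invented, not the study's values):

```python
# Illustrative only: derive common performance metrics from a binary confusion
# matrix laid out as above (actual classes in rows, predicted classes in columns).
TP, FN = 120, 15   # hypothetical counts
FP, TN = 10, 155

accuracy  = (TP + TN) / (TP + TN + FP + FN)        # overall agreement
precision = TP / (TP + FP)                         # user's accuracy, positive class
recall    = TP / (TP + FN)                         # producer's accuracy, positive class
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}")
```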

Q5) Since, from the text, the model uses point data from clause 1.2 of Common Remarks, which is not available for the entire territory, how were the results in Table 3 obtained for the entire territory?

Response: Thank you for your comments. Yes, the data are point data, but authors interpolated the point data through modeling for the entire study area to create a layer for each data type; all layers were then overlaid to create multiband images. These were classified into suitability classes, and the land area of each class was measured with the field calculator tool in the GIS software, from which the areas and their respective percentages were obtained, as shown in Table 4. This approach has been widely published by authors and co-authors, as well as other investigators.
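As a rough, hedged sketch of the workflow described in this response (interpolate point observations to full-coverage layers, stack them into a multiband image, classify every pixel, then tally class areas), assuming NumPy, SciPy, and scikit-learn; all names, extents, and values are hypothetical and do not reproduce the authors' actual GIS processing:

```python
# Hedged sketch: interpolate point observations onto a grid, stack the layers
# into a multiband image, classify each pixel, and tally class areas.
# All names and values are hypothetical.
import numpy as np
from scipy.interpolate import griddata
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
pts_xy = rng.uniform(0, 10_000, size=(200, 2))              # sampled point locations (m)
soil_vars = {v: rng.normal(size=200) for v in ("pH", "CaCO3", "EC")}
classes = rng.integers(0, 4, size=200)                       # suitability classes at the points

# Target grid covering the whole study area (30 m cells, hypothetical extent)
cell = 30.0
gx, gy = np.meshgrid(np.arange(0, 10_000, cell), np.arange(0, 10_000, cell))

# Interpolate each soil property to the grid, then stack into a multiband image
bands = [griddata(pts_xy, vals, (gx, gy), method="nearest") for vals in soil_vars.values()]
stack = np.stack(bands, axis=-1)                              # shape: (rows, cols, n_bands)

# Train on the point data, then predict a class for every pixel
point_features = np.column_stack([soil_vars[v] for v in soil_vars])
model = RandomForestClassifier(n_estimators=300, random_state=0).fit(point_features, classes)
pred = model.predict(stack.reshape(-1, stack.shape[-1])).reshape(gx.shape)

# Area per suitability class (pixel count x cell area), as hectares and percentages
labels, counts = np.unique(pred, return_counts=True)
area_ha = counts * cell * cell / 10_000.0
for lab, ha, pct in zip(labels, area_ha, 100.0 * counts / counts.sum()):
    print(f"class {lab}: {ha:.1f} ha ({pct:.1f}%)")
```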

Reviewer 2 Report

Comments and Suggestions for Authors

The current investigation entitled “Land Suitability of an Olive-Maize Agroforestry Systems Using Machine Learning Algorithms” authored by Hayat et al. tested and evaluated several machine learning algorithms namely random forest and support vector machine and traditional techniques of weighted overlay for their ability to identify suitable land classes for an integrated olive and maize agroforestry system for Khyber Pakhtunkhwa province of Pakistan. The current investigation offers valuable information that may be useful to policymakers and land managers for increasing valuable olive-maize systems in the Khyber Pakhtunkhwa province and other adjacent areas.

Comments/suggestion

In the abstract section, the abbreviations provided should be spelled out, such as S2 and S3.

The conclusive statement provided at the end of the abstract section needs to be revised.

I did not find the significance of providing the information in lines 60-72. These statements are too generalised and should be removed.

The title of the manuscript should be revised to soil site suitability rather than just land suitability.

The research gap of the current investigation is not clearly indicated. Moreover, the specific objectives are also not mentioned.

In the materials and methods section, the authors should also provide a table indicating the different sources of the datasets used. Moreover, the details about the source of the climatic factors are also missing.

The titles of the subsections should be revised.

Subsection 3.1 needs to be shifted to the materials and methods section.

The discussion section still needs to be revised.

The conclusion section needs to be revised, and methodological details need to be removed from it.

Overall, the manuscript has novelty but needs minor revision.

Regards 

Comments on the Quality of English Language

Moderate editing of English language required

Author Response

Reviewer 2:

The current investigation entitled “Land Suitability of an Olive-Maize Agroforestry Systems Using Machine Learning Algorithms” authored by Hayat et al. tested and evaluated several machine learning algorithms namely random forest and support vector machine and traditional techniques of weighted overlay for their ability to identify suitable land classes for an integrated olive and maize agroforestry system for Khyber Pakhtunkhwa province of Pakistan. The current investigation offers valuable information that may be useful to policymakers and land managers for increasing valuable olive-maize systems in the Khyber Pakhtunkhwa province and other adjacent areas.

Response: Thank you for the thorough review of our article; suggestions made greatly improved the manuscript. Authors also appreciate the comment about this investigation offering valuable information.

Comments/suggestion

In the abstract section, the abbreviations provided should be spelled out, such as S2 and S3.

Response: Great suggestion, this has been carried out in the amended version.

The conclusive statement provided at the end of the abstract section needs to be revised.

Response: Agreed, the conclusions statement is now revised.

I did not find the significance of providing the information in lines 60-72. These statements are too generalised and should be removed.

Response: Authors concede that this information was not germane to the foci of the paper, and it has been largely removed (e.g., L62-70 are now deleted).

The title of the manuscript should be revised to soil site suitability rather than just land suitability.

Response: Thank you for this suggestion. “Soil” is now prominently placed in the title per this remark.

The research gap of the current investigation is not clearly indicated. Moreover, the specific objectives are also not mentioned.

Response: Thank you for this comment. The goal is now clearly listed in the abstract and the end of the introduction. Research gaps are mentioned “however, few studies have used machine learning (ML) algorithms to evaluate multiple variables (i.e. soil physicochemical properties, climatic, and topographic data) for the selection of suitable rainfed sites in mountainous terrain systems” AND “However, little work has investigated the suitability of maize-olive intercrop agroforestry systems in Pakistan for closing yield gaps and improving food security” AND “However, no studies to date have tested and evaluated ML models [i.e., RF, SVM, and traditional techniques of weighted overlay (WOL)] for their ability to identify suitable land classes for an integrated olive and maize agroforestry system for 1,757 km² in Khyber Pakhtunkhwa province, Pakistan”.

In the materials and methods section, the authors should also provide a table indicating the different sources of the datasets used. Moreover, the details about the source of the climatic factors are also missing.

Response: Thank you, a Table is now added to the methodology. Details about the sources of climatic factors are also added per this point in Table 1. 

The titles of the subsections should be revised.

Response: Thank you for this comment. Sub-heads were edited and numbering was corrected, thank you for noticing this error.

Subsection 3.1 needs to be shifted to the materials and methods section.

Response: Because 3.1 only presents results of the soil analyses, authors felt that it belonged in Results. However, if the Editor and this Reviewer feel strongly about its inclusion in the methods, please let us know and it can be moved.

The discussion section still needs to be revised.

Response: Thank you for the attention to this. The sections have been split up in the amended version and revised.

The conclusion section needs to be revised, and methodological details need to be removed from it.

Response: Thank you. Authors agree and have removed methodological details from the conclusions.

Overall, the manuscript has novelty but needs minor revision.

Response: Thank you. Authors hope that the amended version is deemed more acceptable.

General comments 

Line 14. Is it dual cropping or an agroforestry system? Kindly justify. The abstract section needs to be revised: (i) authors should adhere to the word limit in the abstract section; (ii) the writing style needs to be revised, especially the methodological part; (iii) the conclusive statement at the end of the abstract section should indicate the implications of the current investigation or be a constructive concluding statement.

Response: Thank you for reiterating previous comments. It is both agroforestry and dual cropping (annual systems integrated with an agroforestry system). All suggestions have been incorporated.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Unfortunately, it was not possible to obtain specific and direct answers to the questions from the authors, neither in the revised version nor in the cover letter. What we managed to understand from the corrected text and the answers:

1) There are field data at points (soil characteristics, etc.)

2) Additional data are available for the entire territory (relief, climate, remote sensing)

3) The authors calculated interpolated soil characteristics for the entire territory.

4) There is a simple method of determining land suitability classes for an olive-maize agroforestry system, through the weighted overlay technique, using the point data from clause 1) supplemented with data from clause 2)

5) The point data from clause 1) is supplemented with data from clause 2), and an ML model is trained on them with the land suitability classes as the dependent variable, which should reproduce the simple technique from clause 4)

6) The simple technique from clause 4) is applied to the entire territory, using the data from clause 3) and clause 2)

7) The trained ML models from clause 5) are applied to the entire territory, using the data from clause 3) and clause 2), and the results are compared with the results from clause 6)

Thus, the main goal of the work is to reproduce the simple methodology from clause 4) using machine learning methods on the same data as in clause 5). This is devoid of scientific and practical meaning. Publishing such work in a scientific journal will also look strange.

What can be done:

1) Remove ML models from the work, leave the classification results from clause 6) and their discussion.

2) Assess by cross-validation the change in classification results using the method from clause 4) on point data from clause 1) supplemented by data from clause 2), when replacing real soil characteristics with interpolation results from clause 3)

3) Train ML models on the results of classification of point data, using only the data from clause 2), and evaluate the performance of such models using cross-validation. Obtain a classification for the entire territory using these models, and compare with the results from clause 6).

Author Response

Response: Thank you for your thorough and in-depth remarks; hopefully this feedback will clarify and address this Reviewer's concerns. When evaluating the results of the classification done by the weighted overlay and machine learning techniques, it was clear that the more straightforward weighted overlay technique was compromised: it showed a very small not-suitable (N2) area and very large highly suitable (S1) and moderately suitable (S2) classes. Therefore, authors are reluctant to remove the ML methods. Further, the machine learning methods classified suitable land more accurately, as illustrated in Figure 5 and Table 4. Authors chose two machine learning models, RF and SVM; both function effectively when data are not available at a large scale. The exact ML methods used in this study were followed because of their successful applicability in previous research, e.g., Taghizadeh-Mehrjardi et al. (2020) and Agrawal et al. (2024). These studies used AHP and ML (RF and SVM) for land suitability classification of wheat-mustard intercropping using geospatial parameters such as soil, climatic, and topographic data (the methodology followed in the present study), and likewise found that ML methods provided more accurate results than traditional methods, proposing that ML-based approaches surpass the limitations of conventional methods and give precise decisions. Habibie et al. (2020) also suggested ML methods for classification of suitable land classes for sole maize in drought-prone areas of Indonesia, based on methods which our paper follows. Given the results of these papers and the importance of ML algorithms like RF and SVM, we incorporated these methodologies in our research to correctly classify suitable land classes. Therefore, authors maintain that ML modeling was indeed appropriate and followed the approach of Taghizadeh-Mehrjardi et al. (2020) and Agrawal et al. (2024).
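For comparison purposes, a minimal sketch of the mechanics of the traditional weighted overlay (WOL) technique referenced above, assuming NumPy; the criterion layers, weights, and class breaks are hypothetical and only illustrate the method, not the study's actual criteria:

```python
# Hedged sketch of a weighted-overlay suitability classification, the
# "traditional" technique the ML results are compared against.
# Criterion rasters, weights, and class breaks are hypothetical.
import numpy as np

rng = np.random.default_rng(7)
rows, cols = 100, 100
layers = {                                   # criterion rasters rescaled to 1 (poor) .. 4 (ideal)
    "slope":    rng.integers(1, 5, size=(rows, cols)),
    "pH":       rng.integers(1, 5, size=(rows, cols)),
    "rainfall": rng.integers(1, 5, size=(rows, cols)),
}
weights = {"slope": 0.4, "pH": 0.35, "rainfall": 0.25}   # must sum to 1

# Weighted sum of the reclassified criteria
score = sum(weights[name] * layers[name].astype(float) for name in layers)

# Slice the continuous score into suitability classes (breaks are illustrative)
suitability = np.digitize(score, bins=[1.75, 2.5, 3.25])  # 0=N, 1=S3, 2=S2, 3=S1
for cls, label in enumerate(["N", "S3", "S2", "S1"]):
    pct = 100.0 * np.count_nonzero(suitability == cls) / suitability.size
    print(f"{label}: {pct:.1f}%")
```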

Response: Thank you for this suggestion. A cross-validation assessment of the change in classification results was added in the results section (L334-387; 368-383; 416-419) and discussion section (L442-451 and 460-464) in hopes that it appeases this reviewer's concerns.

Response: Authors appreciate this suggestion. Authors indeed followed the same method as suggested by this Reviewer. Please refer to the flow chart diagram (L-142): we used the point data (clause 1) and the climatic, topographic, and RS data (clause 2), interpolated all the data, and then overlaid all the layers to train the ML algorithms and classify them. The same applies to clause 6), as the classification results (accuracy) for the entire territory were then compared. Authors hope that adding the cross-validation assessment will address concerns about the ML models, which were appropriately handled, are scientifically robust, and follow CRISP-DM (cross-industry standard process for data mining) guidelines and widely accepted procedures in the literature (Löw et al., 2012; Sheykhmousa et al., 2020; Pal, 2005).
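A minimal sketch of the kind of cross-validation assessment mentioned above, comparing RF and SVM on a point dataset, assuming scikit-learn; the data and settings are hypothetical and not the study's:

```python
# Minimal sketch: 10-fold cross-validation comparison of RF and SVM on a
# point dataset. Data, feature names, and hyperparameters are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 8))        # soil + climatic + topographic predictors at the points
y = rng.integers(0, 4, size=300)     # suitability classes at the sampled points

models = {
    "RF":  RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```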

Comments from Editors

Is the use of leading dots for equation numbers the correct format? Looks odd.

Response: In the amended version, equations have been corrected. Authors also request that special attention be paid to Reviewer 1's comments on removing the ML models and ask that the TE weigh in on this decision. Authors felt that removing the ML models, which are scientifically robust, appropriately handled, and follow CRISP-DM (cross-industry standard process for data mining) guidelines as well as widely accepted procedures [Taghizadeh-Mehrjardi et al. (2020); Liu et al. (2018); Rienow et al. (2021); Thanh Noi et al. (2017)], would greatly weaken the article. Authors hope that adding the cross-validation assessment will address concerns over the ML models; however, we hope that, as Editor, you will evaluate this request based on what is widely accepted in the literature.
