Article
Peer-Review Record

A Hybrid Approach for Soil Total Nitrogen Anomaly Detection Integrating Machine Learning and Spatial Statistics

Agronomy 2023, 13(11), 2669; https://doi.org/10.3390/agronomy13112669
by Wengang Zheng 1,2,†, Renping Lan 1,2,†, Lili Zhangzhong 3, Linnan Yang 4, Lutao Gao 4,* and Jingxin Yu 3,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 24 September 2023 / Revised: 13 October 2023 / Accepted: 21 October 2023 / Published: 24 October 2023
(This article belongs to the Section Water Use and Irrigation)

Round 1

Reviewer 1 Report

In this paper, the authors proposed a method for identifying TN outliers by combining the isolated forest algorithm and the local spatial autocorrelation analysis method, identified the outliers of TN data in Yunnan Province in 2009, and reached the following conclusions: (1) The overall situation of TN content in Yunnan Province is good; TN is positively correlated with SOM, pH, AP and AK, and has a strong correlation with soil organic matter (R=0.74). (2) The proposed method has high detection validity and accuracy in detecting TN anomalies; on the 2009 TN data from Yunnan Province, the TR reached 99.97% and the FNR was 0.01%. (3) Compared with the standalone isolated forest (FNR=4.76%), box plot (FNR=3.90%) and OneClassSVM (FNR=4.77%), and with the combinations of box plot and OneClassSVM with spatial analysis, the proposed method (FNR=0.01%) performs better than the single methods and the other combination methods. (4) The spatial distribution of TN content in Yunnan Province showed a trend of high in the northwest and low in the southwest, with the high value area mainly distributed in northwest Yunnan and the low value area mainly concentrated in southwest Yunnan. These results can be used for soil nutrient analysis and evaluation to provide guidance for agricultural production management.

The authors should address the following observations in their manuscript:

1. In Table 1, split poin should be split point. Kindly check

2. In section 2.5, second line of first paragraph, true (TP) should be true positive(TP).

3. T1- T3 not explained in Section 2.4 as claimed in the manuscript.

4. How is the value of K selected in KNN?

5. In figure 9, add (a), (b), (c) and (d) in the captions of subfigures.

6. Table 10 is referred on page 16, but there is no Table 10 in the manuscript.

7. The authors are advised to refer to the following research articles for more quality evaluation parameters for their proposed techniques: Quantitative Estimation of Soil Properties using Hybrid Features and RNN Variants, Chemosphere, Vol. 287, pp. 1-10, 2022; Estimation of Soil Properties from the EU Spectral Library using Long Short Term Memory Networks, Geoderma Regional, Vol. 18, pp. 1-12 (e00233), 2019.

Author Response

(Comment 1) In Table 1, split poin should be split point. Kindly check.

Reply: Thank you for your careful reading of this manuscript, we have corrected "split poin" to "split point" in Table 1.

 

(Comment 2) In section 2.5, second line of first paragraph, true (TP) should be true positive(TP).

Reply: Thank you very much for your careful reading of this manuscript. We have corrected "true (TP)" to "true positive (TP)" in section 2.5.

 

(Comment 3) T1 - T3 not explained in Section 2.4 as claimed in the manuscript.

Reply: Your comments on the review manuscript are greatly appreciated. This is described at the end of Section 2.4 "According to HL/LH abnormality, HH/LL aggregation and non-significance, the spatial outlier judgement can be divided into the following three types: (1) If the TN detection result is HL/LH abnormality and the auxiliary indicator detection result is no abnormality or the abnormality result is inconsistent, it indicates that the data is an outlier. (2) If the TN shows clustering of HH or LL and the auxiliary indicator is opposite, the data is an outlier. (3) If the TN shows insignificant results and the auxiliary indicator shows HL abnormality or LH abnormality, the data is also an outlier." With discrimination rules (1)-(3) corresponding to T1 - T3, respectively. To improve readability, we have changed the corresponding serial numbers from (1) - (3) to T1 - T3.
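The three discrimination rules quoted in this reply can be sketched in code. This is a minimal illustration, not the authors' implementation: the label strings ("HH", "LL", "HL", "LH", and "NS" for non-significant) and the function name classify_spatial_outlier are hypothetical, and rule T2 assumes that "opposite" means the auxiliary indicator falls in the other cluster type (HH versus LL).

```python
from typing import Optional

# Assumed label vocabulary for local spatial autocorrelation results.
ABNORMAL = {"HL", "LH"}               # high-low / low-high spatial outliers
OPPOSITE = {"HH": "LL", "LL": "HH"}   # assumed meaning of "opposite" clustering


def classify_spatial_outlier(tn: str, aux: str) -> Optional[str]:
    """Return "T1", "T2" or "T3" if the sample is judged an outlier, else None.

    tn  -- result for total nitrogen
    aux -- result for the auxiliary indicator
    """
    # T1: TN is HL/LH and the auxiliary indicator shows no abnormality
    # or an abnormality that does not match TN.
    if tn in ABNORMAL and aux != tn:
        return "T1"
    # T2: TN clusters as HH or LL and the auxiliary indicator clusters the other way.
    if tn in OPPOSITE and aux == OPPOSITE[tn]:
        return "T2"
    # T3: TN is not significant but the auxiliary indicator is HL/LH.
    if tn == "NS" and aux in ABNORMAL:
        return "T3"
    return None
```

Encoding the rules as a pure function of two labels keeps the judgement step separate from however the labels themselves are computed.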

 

(Comment 4) How is the value of K selected in KNN?

Reply: Thank you for your comments on the review of this manuscript. In this study, the K value in KNN was selected based on changes in spatial outliers. We selected the K value below the corresponding K value where the results of the spatial autocorrelation analysis of the soil total nitrogen data were in a relatively stable state. This is described in detail in section 2.4 of this study.
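The reply does not state the exact stability criterion. As a rough sketch of the general idea (choosing K where the number of detected spatial outliers stops changing much), the helper below scans candidate K values in ascending order. The function name first_stable_k, the tolerance tol, the window length and any counts passed to it are hypothetical and are not taken from the manuscript.

```python
from typing import Dict, Optional


def first_stable_k(outlier_counts: Dict[int, int],
                   tol: float = 0.05,
                   window: int = 2) -> Optional[int]:
    """Return the smallest K whose outlier count stays within a relative
    tolerance `tol` over the next `window` candidate K values.

    outlier_counts maps each candidate K to the number of spatial outliers
    detected with that K. Returns None if no K satisfies the criterion.
    """
    ks = sorted(outlier_counts)
    for i in range(len(ks) - window):
        base = outlier_counts[ks[i]]
        # Compare the next `window` counts against the count at this K.
        stable = all(
            abs(outlier_counts[ks[j]] - base) <= tol * max(base, 1)
            for j in range(i + 1, i + window + 1)
        )
        if stable:
            return ks[i]
    return None
```

With made-up counts such as {4: 120, 6: 95, 8: 80, 10: 79, 12: 78}, the function returns 8, the first K after which the count changes by less than 5%. Whether this matches the authors' rule would need to be checked against Section 2.4 of the paper.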

 

(Comment 5) In figure 9, add (a), (b), (c) and (d) in the captions of subfigures.

Reply: Thank you for your very valuable review comments. We have added subheadings to Figure 9 based on your suggestion.

 

(Comment 6) Table 10 is referred on page 16, but there is no Table 10 in the manuscript.

Reply: Thank you very much for your careful reading of this manuscript. The reference to Table 10 on page 16 should be to Table 9 and we have changed the presentation here.

 

(Comment 7) The authors are advised to refer to the following research articles for more quality evaluation parameters for their proposed techniques: Quantitative Estimation of Soil Properties using Hybrid Features and RNN Variants, Chemosphere, Vol. 287, pp. 1-10, 2022; Estimation of Soil Properties from the EU Spectral Library using Long Short Term Memory Networks, Geoderma Regional, Vol. 18, pp. 1-12 (e00233), 2019.

Reply: Thank you for providing us with specific research articles to refer to as additional quality assessment parameters. We appreciate your efforts in suggesting these relevant publications and understand the importance of referring to relevant research articles as quality assessment parameters. In our study, we decided to borrow the confusion matrix to evaluate the model quality because the binary classification problem is more concerned with the model's classification accuracy and misclassification of different categories. However, we have considered your suggestion and added the reasons for the choice of evaluation parameters to further explain what is described in the article.
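For readers unfamiliar with confusion-matrix metrics, the sketch below shows how rates such as the false negative rate (FNR) reported in this record are conventionally derived for a binary problem. It assumes that "positive" means "anomalous sample"; the function name and the returned keys are illustrative, and the paper's own metric definitions should be taken from its Section 2.5.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary-classification rates from confusion-matrix counts.

    tp/fn -- true anomalies that were detected / missed
    fp/tn -- normal samples that were flagged / correctly passed
    """
    def ratio(num: int, den: int) -> float:
        # Return 0.0 when a class is empty instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "fnr": ratio(fn, fn + tp),   # share of true anomalies that were missed
        "fpr": ratio(fp, fp + tn),   # share of normal samples wrongly flagged
    }
```

A low FNR matters most for outlier screening, because a missed anomaly stays in the dataset, whereas a false alarm only costs a manual check.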

Author Response File: Author Response.docx

Reviewer 2 Report

Please find some comments.

Section 3.1 and overall text, nutrients are ranked as high, medium, and low content/concentration, not rich or poor. Please review and correct this.

Please add the meaning of the acronyms used at the foot of all figures and tables.

Rewrite the conclusions; almost 80% is just a summary of the results. Provide at least 3 statements on the relevance of your research. Include further considerations. Include critical limitations of your work.

The real use of this study is not clear; please provide realistic examples of application.

Why was this done using data from 2009? What happened with the data from 2010 to 2023?

How can data almost 15 years old be used nowadays to create agricultural production management?

Author Response

(Comment 1) Section 3.1 and overall text, nutrients are ranked as high, medium, and low content/concentration, not rich or poor. Please review and correct this.

Reply: Thank you very much for your review comments. We have changed the corresponding nutrient descriptions.

 

(Comment 2) Please add the meaning of the acronyms used at the foot of all figures and tables.

Reply: Thank you for your careful review of this manuscript. Following your suggestions, we have added the meanings of the abbreviations used at the bottom of the pictures and tables where the abbreviations appear.

 

(Comment 3) Rewrite the conclusions; almost 80% is just a summary of the results. Provide at least 3 statements on the relevance of your research. Include further considerations. Include critical limitations of your work.

Reply: Thank you very much for your careful consideration of the structure of this paper. We had presented the strengths and weaknesses of the methodology and future work in the Discussion section. Based on your suggestions, we have revised the conclusion and moved the strengths and weaknesses of the methodology and future work into the Conclusion section.

"For the problem of detecting anomalies in TN data, we proposed a method for identifying TN outliers by combining the isolated forest algorithm and the local spatial autocorrelation analysis method, and identified the outliers of TN data in Yunnan Province in 2009, and reached the following conclusions: (1) The overall situation of TN content in Yunnan Province is good, and at the same time, TN is positively correlated with SOM, PH, AP and AK relationship, and had a strong correlation with soil organic matter (R=0.74). (2) The proposed method has high detection validity and accuracy in detecting TN anomalies, and in the detection effect of 2009 TN data in Yunnan Province, it showed that the TR could reach 99.97% and the FNR was 0.01%. (3) Comparing the proposed method with the independent isolated forest (FNR=4.76%), box plot (FNR=3.90%), OneClassSVM (FNR=4.77%), and the combination method of box plot, OneClassSVM combined with spatial analysis, respectively, the results show that the proposed method (FNR=0.01%) is better than the single method and other combination methods. (4) The spatial distribution of TN content in Yunnan Province showed a trend of high in the northwest and low in the southwest, with the high value area mainly distributed in northwest Yunnan and the low value area mainly concentrated in southwest Yunnan. The fusion method of isolated forest and local spatial autocorrelation analysis can be used to effectively identify outliers in the analytical processing of soil total nitrogen data, thereby improving data quality. After outlier removal, the variability of the data is reduced and the spatial autocorrelation is enhanced.

The method has the following advantages and innovations: (1) The method fully considers the spatial characteristics of TN data and improves the accuracy and comprehensiveness of outlier identification. (2) The IForest algorithm, as a global outlier detection tool, is capable of handling complex data with high speed and accuracy [39]. (3) Local spatial autocorrelation as a local outlier detection tool is capable of detecting spatial anomalies [40]. (4) The method integrates the discriminative results of global and local outliers for the final classification of the data, which avoids the possible misjudgment or omission of a single method and improves the reliability and stability of outlier identification. This study provides a new idea and method for the identification of TN outliers and supports soil nutrient analysis and evaluation, as well as the mining of soil fertility models.

Although the proposed method shows better results in identifying TN outliers, both the IForest algorithm and the local spatial autocorrelation analysis include some parameter settings, and the anomaly detection results are affected by these settings, which may lead to unstable or inconsistent detection results. In future work, we can consider using outlier detection methods based on integrated learning or deep learning to improve the stability and consistency of the detection results, as well as spatial outlier detection methods based on multiple indicators or multiple scales to improve the accuracy and sensitivity of the detection results."

 

(Comment 4) The real use of this study is not clear; please provide realistic examples of application.

Reply: Thank you very much for your deeper consideration of this study. In this study, a method for detecting soil total nitrogen anomalies is proposed, and the results show that this method can effectively identify the anomalous data of soil total nitrogen. In the summary and analysis of soil test data, the use of this method can help to accurately identify and exclude abnormal values in soil total nitrogen data, improve the quality of soil total nitrogen data, so as to better understand the nature and characteristics of the soil and improve soil management.

 

(Comment 5) Why was this done using data from 2009? What happened with the data from 2010 to 2023?

Reply: Thank you for your valuable review comments. As long-term data collection requires more resources and time, and the 2009 data are the largest and most complete data we have obtained, this dataset is representative enough to validate the effectiveness of the proposed method for detecting outliers in soil total nitrogen. In subsequent studies, data samples can be expanded to include data from more years to further validate the reliability and applicability of the method.

 

(Comment 6) How can data almost 15 years old be used nowadays to create agricultural production management?

Reply: Thank you for your review comments. The first step is to collect and compile data to improve the quality of the data; secondly, to analyse the data and look for correlations and trends between the data; thirdly, to build a forecasting model based on the data of the past 15 years, which can be used to predict trends and changes in agricultural production in the future; fourth, to formulate an appropriate strategy for managing agricultural production based on the results of data analysis and forecasting; and finally, to establish a monitoring system to collect data on agricultural production at regular intervals and to make adjustments and optimisations according to the actual situation.

Author Response File: Author Response.docx

Reviewer 3 Report

The reviewed article is devoted to methods for predicting and classifying soils based on the analysis of nitrogen emissions. The article is written in good, competent scientific language. The analysis of the state of research in the field has been carried out quite thoroughly. Classification models were built based on the methods of decision trees and SVM algorithms.

The subject of the article corresponds to the subject of the journal. Machine learning methods were originally used to predict the content of substances in the soil. The method of self-organizing Kohonen maps was also used to solve the clustering problem. Metrics for assessing their quality are correctly applied to the machine learning methods used. The use of machine learning methods will make it possible to solve current problems in the field of agronomy in conditions of uncertainty of initial data.

I recommend this article for publication with minor adjustments:

1) it is necessary to expand the conclusion, indicating the strengths and weaknesses of the proposed approach, as well as describe directions for further research on this topic

2) Figures 4 and 8 have low resolution and are not readable. The picture quality needs to be improved

3) the references can include specific publications on machine learning methods and the effects of their use in various fields, for example,

Orlova, E.V. Methodology and Models for Individuals’ Creditworthiness Management Using Digital Footprint Data and Machine Learning Methods. Mathematics 2021, 9, 1820. https://doi.org/10.3390/math9151820

Orlova, E.V. Innovation in Company Labor Productivity Management: Data Science Methods Application. Appl. Syst. Innov. 2021, 4, 68. https://doi.org/10.3390/asi4030068

Li, W.; Finsa, M.M.; Laskey, K.B.; Houser, P.; Douglas-Bate, R. Groundwater Level Prediction with Machine Learning to Support Sustainable Irrigation in Water Scarcity Regions. Water 2023, 15, 3473. https://doi.org/10.3390/w15193473

 

correct syntax and punctuation errors

Author Response

(Comment 1) it is necessary to expand the conclusion, indicating the strengths and weaknesses of the proposed approach, as well as describe directions for further research on this topic.

Reply: Thank you for your valuable review comments. In the last two paragraphs of the discussion, we detail the advantages and disadvantages of the proposed method and elaborate on the research directions that can be considered in future work to improve the detection results. Based on your suggestions, we have restructured the paper by placing the strengths and weaknesses of the method and future work in the conclusion section.

 

(Comment 2) Figures 4 and 8 have low resolution and are not readable. The picture quality needs to be improved

Reply: Thank you very much for your careful review of the images in this manuscript. We have re-exported the images at a higher resolution.

 

(Comment 3) the references can include specific publications on machine learning methods and the effects of their use in various fields, for example,

Orlova, E.V. Methodology and Models for Individuals’ Creditworthiness Management Using Digital Footprint Data and Machine Learning Methods. Mathematics 2021, 9, 1820. https://doi.org/10.3390/math9151820

Orlova, E.V. Innovation in Company Labor Productivity Management: Data Science Methods Application. Appl. Syst. Innov. 2021, 4, 68. https://doi.org/10.3390/asi4030068

Li, W.; Finsa, M.M.; Laskey, K.B.; Houser, P.; Douglas-Bate, R. Groundwater Level Prediction with Machine Learning to Support Sustainable Irrigation in Water Scarcity Regions. Water 2023, 15, 3473. https://doi.org/10.3390/w15193473

Reply: Thank you for your suggestion. Based on your suggestion, we have carefully read the relevant literature and included citations in the article. “For example, Orlova used hierarchical clustering, k-means and gradient boosting algorithms to construct clustering and classification models for credit scoring and risk prediction of borrowers [20]; Orlova analysed the application of k-means and support vector machine algorithms in the management of labour productivity in the enterprise [21]; and Li et al. used AI algorithms to predict groundwater depth [22].”

Author Response File: Author Response.docx
