Groundwater Quality Assessment Based on the Random Forest Water Quality Index—Taking Karamay City as an Example
Round 1
Reviewer 1 Report
This paper introduces an improved groundwater quality index (RFWQI) based on random forest, taking Karamay City as an example. The random forest model was used to rank the importance of 11 indicators, and the weights of these indicators were calculated by ROC analysis. Through data processing, model training, hyperparameter tuning and model verification, a reliable groundwater quality prediction model was established. This study provides an effective method for groundwater quality assessment and a reference for water resources management and protection. However, the article still has the following problems, which need further improvement.
1. The article has no discussion section; one needs to be added. It should discuss the advantages and disadvantages of the improved Random Forest Water Quality Index (RFWQI) proposed in this article compared with existing water quality index evaluation methods, and what improvements it makes.
2. The water quality evaluation results obtained for Karamay City with the improved Random Forest Water Quality Index (RFWQI) should be analyzed in light of the city's specific conditions, so as to demonstrate the reliability of the RFWQI method in practical application.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Please see the attachment.
Comments for author File: Comments.pdf
Moderate editing of English language required
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
1. The title does not make sense. Regarding "improved WQI index": the improvement lies in the method, not in the WQI. Change the title accordingly.
2. What is the reason for selecting random forest rather than other methods? Was any comparison study done before selecting RF? Several other ML models are used for WQI prediction. Justify the choice with a proper statement.
3. The introduction needs to be rewritten to present the need for the study and its background. More statistical data on pollution levels and freshwater availability over the past decade need to be included to give a clear picture of water pollution.
4. In the location map, it is recommended to include the country (China).
5. Why was a health risk assessment not conducted? It could be included using the available data.
6. Data were collected during summer, but conclusions cannot be drawn from summer sampling alone. Samples need to be collected in summer, winter, pre-monsoon and post-monsoon, since pollutant concentrations vary considerably between seasons.
Some spelling mistakes need to be rectified.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
The authors have studied groundwater and introduced a new water quality index for measuring its quality. The manuscript should be revised before being accepted. Some points for improvement:
1. For an ML-based quality index, it is very important to make the datasets and the optimized ML parameters publicly accessible. I think this will enable others to use it, validate it and improve it in the future.
2. The motivation for the choice of location is not clear. Why was Karamay City used for defining the WQI? Further, the literature review should be more detailed. How abundant is the sampling in the present case compared to previous studies?
3. Figure 2 is not at all useful for drawing any conclusions.
4. Figure 3: How do the authors explain the non-overlapping region for the two approaches?
5. An accuracy of 1.0 in Table 6 makes me wonder whether this is a case of overfitting.
6. Have the authors tried other ML-based algorithms to obtain the WQI, such as gradient boosting, or more elementary approaches such as SVM? Again, the motivation for using the random forest model appears arbitrary.
7. Moreover, the authors should discuss the dataset, in particular how diverse the features are. How many outliers did it have? And how transferable is the dataset to new samples from a different region or landscape?
8. Finally, the discussion of the improved results of the WQM over the NSF should be elaborated further.
Minor editing would be good. References to previous studies and established approaches should be incorporated; many are missing where these are discussed in the manuscript.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The article has been greatly improved. It is recommended to add comparisons with existing research in the discussion section to highlight the value of this research.
Author Response
Please see the attachment.
Author Response File: Author Response.docx