Article
Peer-Review Record

KNN vs. Bluecat—Machine Learning vs. Classical Statistics

Hydrology 2022, 9(6), 101; https://doi.org/10.3390/hydrology9060101
by Evangelos Rozos 1,*, Demetris Koutsoyiannis 2 and Alberto Montanari 3
Submission received: 16 May 2022 / Revised: 28 May 2022 / Accepted: 31 May 2022 / Published: 6 June 2022

Round 1

Reviewer 1 Report

 

1.     The manuscript presents machine learning (KNN) versus classical statistics (Bluecat), which is interesting.

2.     However, the manuscript, in its present form, contains several weaknesses. Appropriate revisions to the following points should be undertaken in order to justify recommendation for publication.

3.     More justifications are needed to demonstrate that it is within the scope of the journal, which is on climate change.

4.     Full names should be given for all abbreviations at their first occurrence in the text. For example, XOR in p.1, 1D in p.4, 2D in p.4, etc.

5.     So that readers can quickly grasp your contribution, it would be better to highlight the major difficulties and challenges, and your original achievements in overcoming them, more clearly in the abstract and introduction.

6.     It is shown in the reference list that the authors have several publications in this field. This raises some concerns regarding the potential overlap with their previous works. The authors should explicitly state the novel contribution of this work, the similarities, and the differences of this work with their previous publications.

7.     p.1 – KNN and Bluecat are adopted in this study. What are the other feasible alternatives? What are the advantages of adopting these data-driven methods over others in this case? How will this affect the results? More details should be furnished.

8.     p.2 - Arno River at Subbiano and Sieve River at Fornacina are adopted as the case studies. What are other feasible alternatives? What are the advantages of adopting these case studies over others in this case? How will this affect the results? The authors should provide more details on this.

9.     p.2 - historical records from 1992 to 2014 are used. Why are more recent data not included in the study? Is there any difficulty in obtaining more recent data? Has the situation changed in recent years? What is the effect on the results?

10.  p.3 - “…The data of Arno River and Sieve River case studies were split (without shuffling) into train and test tests.…” This means that no cross-validation was performed. How, then, is the risk of overfitting addressed? More justification should be furnished on this.

11.  p.4 - three functions are adopted as f in Equation (1). What are the other feasible alternatives? What are the advantages of adopting these functions over others in this case? How will this affect the results? More details should be furnished.

12.  p.4 - four options are adopted regarding the vector x. What are the other feasible alternatives? What are the advantages of adopting these options over others in this case? How will this affect the results? More details should be furnished.

13.  p.4 - z-score normalisation is adopted in KNN. What are other feasible alternatives? What are the advantages of adopting this approach over others in this case? How will this affect the results? The authors should provide more details on this.

14.  p.4 - the mlpack tool was run with the dual-tree option to obtain the nearest neighbours. What are other feasible alternatives? What are the advantages of adopting this approach over others in this case? How will this affect the results? The authors should provide more details on this.

15.  p.4 - Bayesian inference is adopted to estimate the probability on the right-hand side of Equation (2). What are other feasible alternatives? What are the advantages of adopting this approach over others in this case? How will this affect the results? The authors should provide more details on this.

16.  p.5 – Eq.(4) suggested by Koutsoyiannis and Montanari is adopted as an approximative estimation of the conditional distribution. What are other feasible alternatives? What are the advantages of adopting this approach over others in this case? How will this affect the results? The authors should provide more details on this.

17.  p.6 - “…Significant differences between the plots of the two models appear in the region of very high flows. These differences are related to the.…” More justification should be furnished on this issue.

18.  p.9 - “…very large Euclidean distance because the difference Qt–1 – Qt fluctuates strongly when passing from the rising to the falling limb (Figure 8b). This is probably the reason the.…” More justification should be furnished on this issue.

19.  p.9 - “…It should be noted that this may be happening just because the.…” More justification should be furnished on this issue.

20.  p.10 - “…Fq|Q(q|Q) at high and low Q by KNN because of an unbalanced number of values below and above Q. This bias is the reason that.…” More justification should be furnished on this issue.

21.  Some key model parameters are not mentioned. The rationale for the chosen set of parameters should be explained in more detail. Have the authors experimented with other sets of values? How sensitive are the results to these parameters?

22.  Some assumptions are stated in various sections. Justification should be provided for these assumptions, and their effect on the results should be evaluated.

23.  The discussion section in the present form is relatively weak and should be strengthened with more details and justifications.

24.  Moreover, the manuscript could be substantially improved by relying on and citing more recent literature about contemporary real-life case studies of soft computing techniques and/or uncertainty in hydrology, such as the following. Discussion comparing results with, and/or incorporating concepts from, these works is encouraged:

•   Sharafati, A., et al., “A strategy to assess the uncertainty of a climate change impact on extreme hydrological events in the semi-arid Dehbar catchment in Iran,” Theoretical and Applied Climatology 139 (1-2): 389-402, 2020.

•   Ehteram, M., et al., “Reservoir operation based on evolutionary algorithms and multi-criteria decision-making under climate change and uncertainty,” Journal of Hydroinformatics 20 (2): 332-355, 2018.

•   Zhao, C.P., et al., “Drought Monitoring of Southwestern China Using Insufficient GRACE Data for the Long-term Mean Reference Frame under Global Change,” Journal of Climate 31 (17): 6897-6911, 2018.

25.  Some inconsistencies and minor errors that need attention are:

•   Replace “…into train and testing tests…” with “…into training and testing tests…” in the third column of line 91 in p.3

•   Replace “…in the train set to…” with “…in the training set to…” in line 113 of p.3

26.  In the conclusion section, the limitations of this study, suggested improvements of this work and future directions should be highlighted.
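
Comments 10 and 13 above concern the unshuffled train/test split and the z-score normalisation applied before KNN. The following is only a minimal sketch of those two steps, not the authors' implementation: the data are synthetic, the function names (`chronological_split`, `zscore_fit`, `knn_predict`) and the 70/30 split are hypothetical, the neighbour search is brute force rather than the mlpack dual-tree search named in comment 14, and averaging the neighbours' observations is an assumed choice for the function f.

```python
import numpy as np


def chronological_split(X, y, train_fraction=0.7):
    """Split without shuffling: the earliest records train, the latest test."""
    n_train = int(len(X) * train_fraction)
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def zscore_fit(X_train):
    """Return per-feature mean and standard deviation of the training set."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return mean, std


def knn_predict(X_train, y_train, X_query, k=5):
    """Average the targets of the k nearest training points (Euclidean)."""
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


# Synthetic stand-in for a simulated/observed discharge record (assumption).
rng = np.random.default_rng(0)
simulated = rng.gamma(shape=2.0, scale=5.0, size=200)
observed = 1.1 * simulated + rng.normal(0.0, 1.0, size=200)

X = simulated.reshape(-1, 1)
X_tr, X_te, y_tr, y_te = chronological_split(X, observed)

# Normalisation statistics come from the training period only,
# so no information from the test period leaks into the model.
mean, std = zscore_fit(X_tr)
pred = knn_predict((X_tr - mean) / std, y_tr, (X_te - mean) / std, k=5)
```

Because the split keeps temporal order, the test period acts as a single hold-out sample; a rolling-origin scheme that repeats this split over successive periods would be one way to approximate the cross-validation the comment asks about.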

Author Response

Please find our reply in the attached pdf file.

Author Response File: Author Response.pdf

Reviewer 2 Report

This work examines uncertainty in hydrological modelling, bearing in mind that uncertainty has multiple sources, including measurement errors in the stresses (the model inputs) and measurement errors in the hydrological process of interest (the observations against which the model is calibrated).

Thematically the work is interesting for the researchers and professionals and the proposed manuscript is relevant to the scope of the journal.

I found it appropriate for publication in the Hydrology journal, but only after some modifications and clarification from the Authors.

The title is a clear representation of the manuscript's content. The abstract reflects realistically the substance of the work. It covers the research context and background, motivation, hypothesis, method, main results, and conclusion, underlining the implications of the main findings.

The overall organization and structure of the manuscript are appropriate. The paper is well written and the topic is appropriate for the journal.

The aim of the paper is well described and the discussion is well approached; the results and discussion are related to the cited literature.

The literature review is comprehensive and properly done.

The novelty of the work must be more clearly demonstrated.

The significance of the Work: Given the large number of analyzed data, this is an interesting study with a possible significant impact in this area.

Statistical interpretation of the analytical data must be more properly presented. The verification of the model should be performed.

Other Specific Comments: The work is properly presented in terms of the language. The work presented here is very interesting and well done, it is presented in a compact manner.

Some specific comments include the following:

1. Please specify the software used in the machine learning investigation.

2. Please include more details of the software settings used to perform the analysis (it should be more informative).

3. In the conclusion section, please add more quantitative conclusions.

Author Response

Please find our reply in the attached pdf file.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

 

The revised paper has addressed all my previous comments, and I recommend ACCEPTING the paper as it is now.
