Machine Learning Reveals a Significant Shift in Water Regime Types Due to Projected Climate Change
Round 1
Reviewer 1 Report
The article by Ayzel presents an interesting and somewhat original approach to predicting water regime types on a regional basis in western Russia by employing a Random Forest model. This machine-learning approach is not entirely unprecedented in water resource forecasting, but it does seem to be a novel application to Russia and to the data used in this article specifically. Both the approach and the results are interesting, and I recommend publication. I found the article to be well written; it describes the problem being addressed quite well. English grammar and spelling were acceptable for publication. The use of only open-source software to construct the models is good news for those who seek to replicate this method in other areas for water resource prediction.
I have only two minor quibbles. The first concerns the assumption that the monthly runoff data are adequate input for the Random Forest (lines 138-140). It is not necessarily true that a Random Forest will self-organize the data to extract all of the most meaningful patterns in them. This implicit extraction of means, deviations, and other measures may not, in fact, be happening. On the other hand, it may be that the monthly runoff means are all the data necessary to obtain broadly convergent models under different GCM scenarios. I appreciate this simple approach, as it seems adequate based on Table 4.
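The first point could be tested directly: train the same Random Forest once on the raw monthly values and once on those values plus explicit summary statistics, then compare performance on held-out grid cells. Below is a minimal sketch of the feature step only, assuming each grid cell is represented by 12 monthly runoff values; the helper names and the chosen statistics are hypothetical, not taken from the manuscript.

```python
import statistics


def hydrograph_features(monthly_runoff):
    """Explicit summary features for one grid cell's 12 monthly runoff values.

    Hypothetical helper: the feature set (mean, standard deviation, coefficient
    of variation, month of peak runoff) is an illustrative assumption.
    """
    values = list(monthly_runoff)
    if len(values) != 12:
        raise ValueError("expected 12 monthly values (January..December)")
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    cv = std / mean if mean else float("nan")
    peak_month = values.index(max(values)) + 1  # 1 = January
    return [mean, std, cv, peak_month]


def augmented_row(monthly_runoff):
    """Raw monthly values followed by the explicit summary features (16 columns)."""
    values = list(monthly_runoff)
    return values + hydrograph_features(values)
```

If adding these columns does not change held-out accuracy, that would support the view that the forest already captures this information from the monthly values alone.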
The second issue is that no formal accuracy assessment is provided. I may have missed a statement on how accurately the model, trained on the training dataset, replicates the historical water regimes. Figure 6 shows the sensitivity of the model and its input to changes in the boundary conditions (i.e., the choice of GCM). This does not demonstrate that the predictions are correct or accurate, only that the model can replicate its own results. A more formal accuracy assessment based on an independent set of in-situ test data ("ground truth") would give readers much greater confidence in the conclusions of this article.
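A minimal sketch of such an assessment, assuming a set of grid cells withheld from training for which reference regime types are available; the function names are hypothetical, and overall accuracy and Cohen's kappa are common choices rather than metrics named in the manuscript:

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of held-out grid cells whose predicted regime type matches the reference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected if predictions were independent of the reference labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1:
        # Degenerate case: reference and prediction are the same single class.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, `overall_accuracy([3, 3, 15, 15], [3, 15, 15, 15])` returns 0.75, while `cohen_kappa` on the same labels returns 0.5, showing how kappa discounts the agreement expected by chance.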
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
This study explores the use of RF for future water regime prediction, based on time-series runoff projected by the GCMs. A few things are not clear to me, and I hope the author can clarify them:
- Random Forest is used here simply as a classification algorithm; it can, of course, classify based on whatever inputs are provided. The accuracy of the future water regime classification therefore relies entirely on the accuracy of the projected future runoff. The title of this manuscript, "Machine learning reveals a significant shift in water regime types due to projected climate change", gives the impression that the paper aims to emphasize the importance of ML (RF) in the prediction. Maybe consider modifying the title.
- The author used historical data from 1979 to 1991 for model calibration, and then jumped directly to predicting a future period, 2087-2099. There is a huge gap between 1992 and 2087. Why was no validation done using more recent or current data, e.g., 2000-2020? Are such data not available? As it stands, the results do not convince me that the prediction is reliable.
- From the perspective of visualization: first, the produced water regime maps are of low quality. Second, why are those maps made up of "dots" instead of smooth pixels (grid cells), given that the maps were derived from continuous gridded data?
- Reference [49] is an unnecessary self-citation. It would be better to cite other authors' similar work that demonstrates Random Forest.
[49] Ayzel, G. Random Forest-based model for water regime type prediction in the northwest of the European part of Russia, 2021. Last access: 25 June 2020, doi:10.5281/zenodo.4966175.
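On the question of "dots" versus pixels: if each grid cell is stored as a (latitude, longitude, regime type) record, the records can be placed back onto a regular array and drawn as filled cells. A minimal sketch, assuming a regular latitude-longitude grid; the function name, grid origin, and cell size are hypothetical inputs:

```python
def points_to_grid(cells, lat0, lon0, step, n_rows, n_cols, fill=None):
    """Place (lat, lon, regime_type) records onto a regular 2-D grid.

    lat0/lon0: centre coordinates of the south-west grid cell; step: cell size
    in degrees. These are assumed inputs; row 0 is the southernmost row.
    Cells without a record keep the `fill` value.
    """
    grid = [[fill] * n_cols for _ in range(n_rows)]
    for lat, lon, regime in cells:
        # Convert cell-centre coordinates to integer row/column indices.
        row = round((lat - lat0) / step)
        col = round((lon - lon0) / step)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = regime
    return grid
```

The resulting array can then be drawn with an image or mesh function such as Matplotlib's `imshow(..., origin="lower")` or `pcolormesh`, so that each cell fills its footprint rather than appearing as a point at its centre.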
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
The author presents an interesting paper that links modern flow regime types to future changes in hydrology in northwest Russia. The author uses machine learning to train a model to relate flow regime types and hydrograph data, then uses this model and outputs from CMIP models of future climate states to map back from modelled hydrograph data to predicted changes in flow regimes. The methods are novel and the paper presents interesting and relevant results.
I think it's a solid paper and worthy of publication following some revisions. I have listed the paper as requiring major revisions since I believe there is a key part of the thesis that is missing. I have also included some suggestions for how the readability of the paper could be improved.
The machine learning model is trained using the water regime types as the target variable and the hydrographs as the training data; however, the paper doesn’t establish that the hydrographs are a valid training dataset for the water regime types, and Figure 4 further calls this assumption into question. I struggled to see the ‘distinctive differences’ (lines 182/3) and ‘noticeable differences’ (line 188) between the water regime types in Figure 4 that are referred to in the text. Looking at Figure 4, I see essentially the same annual pattern in relative runoff across most of the water regime types. I think adding some shading or visual cues to the plot may help highlight the differences the author is trying to establish. Even so, I struggle to see differences between types 3 and 15, which is a key point of the paper. The figure shows a high degree of variability within each water regime type; this variability overlaps strongly between types and depends on the number of grid cells in each regime. The relationship between the water regime types and the hydrographs needs to be explored and established before it can be used as a predictor for CMIP model outputs. I see this as a key uncertainty in the thesis of the paper that needs to be addressed in the revised paper.
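Beyond visual cues, the separability of two regime types could also be quantified. A minimal sketch, assuming each grid cell's hydrograph is stored as 12 monthly relative-runoff values; the per-month standardized mean difference (Cohen's d) is one possible measure chosen here for illustration, not one used in the manuscript, and the function name is hypothetical:

```python
import statistics


def monthly_separation(hydrographs_a, hydrographs_b):
    """Per-month standardized mean difference (Cohen's d) between two regime types.

    Each argument is a list of 12-value relative-runoff hydrographs, one per
    grid cell assigned to that type. Returns 12 values; small values in every
    month mean the two types overlap heavily in this feature space.
    """
    n_a, n_b = len(hydrographs_a), len(hydrographs_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("need at least two hydrographs per regime type")
    separation = []
    for month in range(12):
        a = [h[month] for h in hydrographs_a]
        b = [h[month] for h in hydrographs_b]
        # Pooled variance weights each type by its degrees of freedom, which
        # accounts for unequal numbers of grid cells per type.
        pooled_var = ((n_a - 1) * statistics.variance(a)
                      + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
        diff = abs(statistics.fmean(a) - statistics.fmean(b))
        if pooled_var == 0:
            separation.append(0.0 if diff == 0 else float("inf"))
        else:
            separation.append(diff / pooled_var ** 0.5)
    return separation
```

Applied to types 3 and 15, this would show in which months, if any, the mean hydrographs differ by more than the within-type spread.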
Suggestions for improvement:
I found it difficult to follow the author’s arguments with the water regime types referred to by number. This puts the onus on the reader to translate each water regime type number into the description of that type while reading, which disrupts the flow of the paper (particularly through section 3.1). I would suggest using a short description or letter combination as the name of each regime type instead, to give the reader quick hints as to which regime the author is referring to.
I also noticed that the descriptors the author provides in Table 1 are not the same ones used in section 3.1, again adding to the confusion. In Table 1 the author notes the seasonal water characteristics, but in section 3.1 the author comments on what the flow regimes mean in the real world; e.g., in lines 187/188 the author speaks about thawing floods as the cause of the water regime properties. I found this more practical description much easier to understand and would suggest adding these descriptors to Table 1.
Section 2.3 lists four GCMs that are used in the paper, but no discussion or citations are provided justifying their use. Why were these four models chosen over the other CMIP models? Have these models been shown to have realistic runoff projections?
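One common check of whether GCM-driven runoff is realistic is to compare its historical-period output against reference runoff for the same cells, for example with the Nash-Sutcliffe efficiency (NSE). A minimal sketch, assuming paired monthly series of equal length; the metric is an illustrative choice and the function name is hypothetical:

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the simulation versus variance of the observations.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series has zero variance")
    return 1 - sse / sst
```

A value of 1 indicates a perfect match, and a value at or below 0 means the GCM-driven series is no better than simply using the observed mean.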
Figure 4 is a key figure for the results section, but I found it hard to interpret. Could shading be added to the plots to highlight the seasons the author refers to (e.g., ‘winter’ is a relative term depending on where in the world you are)? Again, I found the use of water regime type numbers difficult to follow in this figure, and I think more descriptive titles would help the reader. Could the number of hydrographs plotted in each panel be added to the figure? The author notes later in the text that the number of grid cells per type is not uniform.
While the paper contains all the relevant information to present the author’s thesis, I found the paper itself a little out of order, which made the arguments hard to follow in places. The description of the runoff data in section 2.3 covers both the historical-period data and the data from the RCPs. Could these two datasets be listed as two distinct dot points on new lines, to make it clear to the reader that this is where the paper introduces both datasets? The author seems to have started to do this (line 94 has ‘(1)’ but no ‘(2)’), and I think it will help the readability of the paper.
The paper becomes hard to follow at sections 2.4 and 2.5 because the end of section 2.4 discusses methods for resampling and preparing data for machine learning before the overall workflow is introduced. I was able to follow the argument by reading this part of the paper out of order, but I think it needs to be rearranged. My suggestions are (1) moving section 2.5 and the associated figure above section 2.4, or (2) splitting section 2.4 into the justification and description of the model (lines 106-133) and moving the more detailed discussion in lines 134-155 into or below section 2.5, so that it is introduced alongside the overall workflow. I found section 2.5 and Figure 3 really clear and easy to follow, and in hindsight these made section 2.4 make more sense.
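The data-preparation step mentioned here could be presented alongside the workflow as a short, self-contained step. A minimal sketch, assuming (this is not confirmed by the review) that "resampling" means aggregating daily runoff to monthly means and that "relative runoff" means each calendar month's long-term mean divided by the mean of all 12; both function names are hypothetical:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def monthly_means(daily: Iterable[Tuple[date, float]]) -> Dict[Tuple[int, int], float]:
    """Average daily runoff values into one mean per (year, month)."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for day, value in daily:
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


def relative_hydrograph(monthly: Dict[Tuple[int, int], float]) -> List[float]:
    """Long-term mean for each calendar month, scaled by the mean of those 12 values.

    The scaling (division by the mean monthly value) is an assumed definition
    of 'relative runoff' used only for illustration.
    """
    by_month: Dict[int, List[float]] = defaultdict(list)
    for (_year, month), value in monthly.items():
        by_month[month].append(value)
    missing = [m for m in range(1, 13) if not by_month[m]]
    if missing:
        raise ValueError(f"no data for months {missing}")
    climatology = [sum(by_month[m]) / len(by_month[m]) for m in range(1, 13)]
    overall = sum(climatology) / 12
    if overall == 0:
        raise ValueError("mean runoff is zero; relative values are undefined")
    return [value / overall for value in climatology]
```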
I find section 3.3 onwards really interesting and well presented. The figures are easily interpreted and I find the thesis really compelling. I think the prediction uncertainty analysis is a great addition to the discussion.
Small things to note:
- Line 77: there is a word missing – it should read ‘The proposed study region is in the northwest...’
- The Figure 2 caption is not descriptive enough to make clear what Figure 2 is showing. I can see that it’s a decision tree, but what are the decisions based on? For example, the first decision lists ‘<=4.19’ and ‘>4.19’, but the figure doesn't provide any context for these numbers.
- Figure 3 has a typo in the yellow text box in the middle - ‘hydrographs’ is misspelled.
- Section 3.2 has the same heading as section 2.4, which is confusing when trying to follow the story of the paper. Section 3.2 seems to be about the classification model results and accuracy; I suggest changing the heading to reflect this.
- Lines 238-240: could a citation be provided for the claim that the model is comparable to a trained hydrologist?
- Lines 245-260: While I love seeing pseudocode in papers since I think it adds useful detail, this pseudocode snippet is mostly about importing and defining variables. It could be cut from the paper since I don’t think it provides any information that’s not in the text.
- Table 3: The table caption doesn’t explain what the numbers in the table refer to. I assume it’s the percentage of the total study area under each water regime type?
- The author refers to seasons throughout the paper. Could these be changed to months of the year instead, to help international readability? For example, winter could be referred to as DJF (December-February), and so on.
- Conclusion: the conclusion refers back to the two research questions in the introduction. It would be helpful to restate them here to save the reader flicking back up to the introduction.
- References 8 and 9 list the author names in an unusual format.
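Regarding Table 3: if the numbers are indeed percentages of the study area, they could be computed as in the following minimal sketch. It assumes a regular latitude-longitude grid (the grid type is not stated in this review), in which case cells are weighted by the cosine of their latitude so that the percentages reflect area rather than cell counts; the function name is hypothetical.

```python
import math
from collections import defaultdict


def area_share_percent(cells):
    """Percentage of the study area occupied by each water regime type.

    cells: iterable of (latitude_deg, regime_type), one entry per grid cell of
    a regular latitude-longitude grid. Such cells shrink towards the pole, with
    area approximately proportional to cos(latitude), so each cell is weighted
    by that factor instead of being counted equally.
    """
    weights = defaultdict(float)
    for lat, regime in cells:
        weights[regime] += math.cos(math.radians(lat))
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no grid cells supplied")
    return {regime: 100.0 * w / total for regime, w in sorted(weights.items())}
```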
Author Response
Please see the attachment.
Author Response File: Author Response.pdf