Article
Peer-Review Record

Enhancing Monthly Streamflow Prediction Using Meteorological Factors and Machine Learning Models in the Upper Colorado River Basin

by Saichand Thota 1, Ayman Nassar 2,3, Soukaina Filali Boubrahimi 1,*, Shah Muhammad Hamdi 1 and Pouya Hosseinzadeh 1
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 26 February 2024 / Revised: 25 April 2024 / Accepted: 27 April 2024 / Published: 1 May 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Please see comments in the pdf file attached.

Comments for author File: Comments.pdf

Author Response

Manuscript # (2911618): Enhancing Streamflow Prediction Using Meteorological Factors and Machine Learning Models in the Upper Colorado River Basin

Responses to the issues raised by the reviewer 1

The authors would like to express sincere gratitude to the anonymous reviewer for their helpful and constructive comments. We also appreciate the reviewer’s insightful observations. In this document, the following sections present the responses of the authors to the issues raised by the reviewer.

 

Too many abbreviations are used; I would avoid using them without explaining them.
Done!

 

space missing

Done!

 

The term is not used uniformly in the paper. Please revise "stream flow" vs. "streamflow".

Done! We have updated the manuscript to consistently use 'streamflow' throughout.

 

extra space
Done!

 

wrong format, superscript

Done!

 

superscript

Done!

 

3300 m 900 m

Done!

 

superscript

Done!

 

superscript

Done!

 

I believe only the right figure is needed, the left one does not bring anything new.

Correct! We have combined both figures.

 

Figure 4 can completely replace Figure 3; it contains all the information needed, so Figure 3 adds nothing new.

Done! We have removed Figure 3 and kept Figure 4

 

Also, Latitude and longitude are missing the degree symbol and N and E

Done!

 

It would be interesting to know the threshold for the percentage of missing values above which a variable is disregarded from the analysis.

Done! We added an explanation regarding the missing values. After monthly data aggregation, no missing values were found for temperature, precipitation, and snow water equivalent. However, snow depth is still missing 24% of its values and snow density 47.5%. Hence, excluding these two attributes from the analysis is justified.
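The screening described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the 20% cutoff, the function names, and the toy values are assumptions, since the response reports the missing shares but not the threshold itself.

```python
def missing_share(values):
    """Return the percentage of entries that are None (treated as missing)."""
    if not values:
        raise ValueError("empty series")
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


def split_by_threshold(columns, max_missing_pct):
    """Split column names into (kept, dropped) by their missing-value share."""
    kept, dropped = [], []
    for name, values in columns.items():
        if missing_share(values) <= max_missing_pct:
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped


# Hypothetical monthly records; None marks a missing month.
monthly = {
    "temperature": [31.0, 35.5, 42.1, 50.3],
    "snow_depth": [12.0, None, 9.5, 4.0],
    "snow_density": [None, 0.30, None, 0.35],
}

# Hypothetical cutoff of 20% missing values per variable.
kept, dropped = split_by_threshold(monthly, max_missing_pct=20.0)
```

Stating the cutoff explicitly, as the reviewer requests, makes the exclusion of a variable reproducible rather than a case-by-case judgment.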

 

it is hard to state that the temperature is increasing, moving average of 12 or 24 months for each parameter in the same chart would be more interesting denoting thus the increase of temperature and eventually the decrease of accumulated precipitation

Done! Please refer to Figures 4 and 5 in current version of the manuscript.
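The smoothing the reviewer suggests can be sketched as a trailing moving average. This is an illustrative sketch only; the function name and the convention of leaving the first incomplete windows empty are assumptions, and a window of 12 or 24 would be used for monthly data.

```python
def trailing_mean(series, window):
    """Trailing moving average; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= series[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


# Small window on toy values, just to show the shape of the output.
smoothed = trailing_mean([1, 2, 3, 4], window=2)
```

With a 12-month window each point averages one full seasonal cycle, so the seasonal swing cancels and a slow trend in temperature or precipitation becomes visible.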

 

Both Figure 5 and 6 are missing units of each parameter. Also index = months?

Done! Index has been changed to “Year”.

 

I believe the authors should develop more the captions of all the figures, they are too simple.

Done! Captions of Figures 1, 3, 4, 5, 6, and 7 have been extended.

 

I cannot understand these figures without units. What values are on the left? Maybe create a second axis on the right for flow, and use the left axis for each parameter respectively.

Done! Please refer to Figure 6 in the current version of the manuscript.

 

also define "x"

Done!

 

remove?!

Removed!

 

insert space

Done!

 

there is a parenthesis missing in the end of equation 3

Done!

 

what is the difference? not described

$\bar{C(k)}$ is the memory cell candidate. A memory cell candidate refers to the information that is proposed to be stored in the memory cell during the processing of input data. Please refer to the paragraph highlighted in yellow in the text.
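For reference, the distinction can be written out in the standard LSTM formulation. The manuscript's exact notation may differ; the weight and bias symbols $W_C$ and $b_C$, the hidden state $h$, and the forget and input gates $f(k)$ and $i(k)$ are the conventional ones and are assumed here.

```latex
\bar{C}(k) = \tanh\!\left(W_C\,[\,h(k-1),\, x(k)\,] + b_C\right),
\qquad
C(k) = f(k) \odot C(k-1) + i(k) \odot \bar{C}(k)
```

The candidate $\bar{C}(k)$ is only a proposal computed from the current input; the cell state $C(k)$ is what is actually stored after the forget gate scales the old state and the input gate scales the candidate.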

 

what is "i"?

We have already mentioned this in the manuscript, where we state that equation (4) defines the input gate.

 

use same terminology (italic) or symbol

Done!

 

dont understand this.

The optimizer is a key element in training neural networks. It adjusts the model’s parameters to minimize the loss function, improving overall performance. We have added appropriate explanations to the manuscript to clarify this.

 

What is "g"?

Thank you for your keen observation on this. “g” was a typo. We changed it to “h”.

 

Where are "d" and "D" in the equation? The equation is incomplete.

We have added an explanation about this. The values "d" and "D" are 0 for our SARIMA model. Setting d = D = 0 means that no differencing is applied to the time series data, indicating that the data is already stationary and does not require differencing.
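In the usual backshift-operator notation, the role of "d" and "D" can be made explicit. This is the standard textbook form, not copied from the manuscript; $B$ is the backshift operator, $m$ the seasonal period, and $\varepsilon_t$ white noise.

```latex
% General SARIMA(p,d,q)(P,D,Q)_m model
\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d\,(1-B^m)^D\, y_t
  = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t

% With d = D = 0 both differencing factors equal 1
\phi_p(B)\,\Phi_P(B^m)\, y_t
  = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t
```

This shows why "d" and "D" do not appear in an equation written for the d = D = 0 case: the factors $(1-B)^d$ and $(1-B^m)^D$ drop out.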

 

 

where and what is "s"?

We meant “m“. We have fixed this.

 

Months

Done!

 

Months

Done!

 

where are they? maybe create a supplementary material table with these combinations.

Thank you for your insightful comment. While the combinations are visually represented in the heatmaps of Figures 19 and 21, we acknowledge the importance of providing a detailed breakdown of the 15 combinations for clarity in this section. To address this, we have included a thorough explanation of these combinations within the manuscript. We opted against creating an additional table or figure to avoid overwhelming the reader with an excess of visual aids, considering the considerable number of figures already present in the manuscript. Please refer to the highlighted text in yellow.

 

for better readability I would advise to discuss both range and density of each model individually, instead of range for all models and density for all models.

Done! We have added explanation to the manuscript.

 

it is less than 2 but does it reach 6?

It is approximately 6 as shown in the Figure. We have added the word “approximately” to make the statement precise.

 

According to Figure 12, it is clearly above 8.

Same thing here! We have added the word “above” to be more precise.

 

all models show this. I believe the authors refer to the difference.

We have rephrased the sentence.

 

Since Figure 15 is discussed first, maybe swap the order of Figures 14 and 15.

Done!

 

How does the reader know this? Where are the combinations?

According to the paragraph you are mentioning, there is no specific combination used. We are just stating that using multivariate time series data (snow water equivalent, temperature, and precipitation) leads to a reduction in RMSE and enhanced R-squared.

 

It would greatly improve the analysis of the boxplots if the authors drew notches in them, so that the plots alone show whether the median values of the models differ statistically.

Done!
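What a notch encodes can be sketched numerically. This is an illustration under stated assumptions, not the authors' plotting code: it uses the conventional notch half-width of 1.57 × IQR / √n around the median, the function names are hypothetical, and the samples are toy values.

```python
import math
import statistics


def notch_interval(sample):
    """Return (low, high) of the conventional boxplot notch around the median."""
    n = len(sample)
    q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(n)
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """True if the notch intervals of two samples overlap."""
    low_a, high_a = notch_interval(a)
    low_b, high_b = notch_interval(b)
    return low_a <= high_b and low_b <= high_a


# Toy error samples for two hypothetical models.
errors_a = [1, 2, 3, 4, 5]
errors_b = [11, 12, 13, 14, 15]
different_medians = not notches_overlap(errors_a, errors_b)
```

Non-overlapping notches are commonly read as rough evidence that two medians differ; overlapping notches do not show that the medians are equal.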

 

Units are missing.

We believe that measures such as error or accuracy should not carry units. Please refer to Figures 4, 6, etc., where we have added units such as mm and Fahrenheit in this version of the manuscript for the sake of clarity.

 

Again, for greater readability, I would advise describing one chart at a time and then concluding which model the authors find produces better results, and so on.

Done!

 

what do you mean superior? not counting outliers the lower value is higher but it also has the lowest MAE values of them all.

We added more explanation about this in the manuscript. Please refer to the highlighted paragraph in yellow.

 

superscript

Done!

 

again true but overall?

Done!

 

superscript

Done!

 

superscript

Done!

 

Are you sure? Figure 14 vs. Figure 16 says otherwise. This is the density variation, unless you change the text to address each criterion.

Since the density variability DECREASED, the results will improve, and thus the results improved (even though the improvement for SARIMA is slight).

We have changed this. Thank you for bringing this up to our attention.

 

units are missing

We believe that measures such as error or accuracy should not carry units. Please refer to Figures 4, 6, etc., where we have added units such as mm and Fahrenheit in this version of the manuscript for the sake of clarity.

 

a table with these values for all models (except SARIMA) would help readers reach the same conclusion.

Furthermore, are these values computed between the model and the USBR observed values?

We have added a table to address your concerns regarding this.

Regarding the second issue: Yes, the performance of the models will be assessed by comparing their predicted values with the actual streamflow values obtained from the USBR, using evaluation metrics. We have added proper explanation in Data Analysis section.
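The comparison described here can be sketched with the three metrics named elsewhere in this record (RMSE, MAE, and R-squared). This is a minimal sketch using the standard definitions; the function names and the toy values are assumptions, not the authors' implementation.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values standing in for observed and predicted monthly streamflow.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 2.0]
scores = (rmse(observed, predicted), mae(observed, predicted),
          r_squared(observed, predicted))
```

Note that a model predicting the observed mean everywhere, as in the toy case, scores an R-squared of 0, which is a useful baseline when reading the reported values.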

 

The years seem to be a little cut off in the figures.

Done!

 

figure 19 is not mentioned in the main text.

Done!

 

What would be the impact if more than 6 years were used in the test sample? Would the same input/output sequence be the better choice? Maybe compare and discuss other research papers?

Assigning 80% of the dataset to training is commonly regarded as a best practice in many articles. This allocation ensures an ample amount of data for both modeling and testing. Using a smaller portion of the data for training can lead to less effective models. We have added this to the manuscript.
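For a time series, an 80/20 allocation is normally made chronologically rather than by random sampling. The sketch below assumes that convention; the function name and toy data are hypothetical, and the response does not state how the authors implemented the split.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = round(len(series) * train_fraction)
    # Earlier observations train the model; later ones are held out.
    return series[:cut], series[cut:]


# Ten toy time steps in chronological order.
train, test = chronological_split(list(range(10)))
```

Keeping the held-out block at the end of the record avoids training on observations that come after the ones being predicted.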

 

superscript

Done!

 

Why not consider only one figure with all heatmaps? The paper already has too many figures.

Done! We have merged those Figures.

 

Consistency is very important in research papers; use either R-squared or R2 throughout.

Done!

 

x axis should show months not year (same 2018 or 2019)

Done!

 

I believe this whole section should be in the methodology or before the results. Only now do you show what the combinations are, even though there are not 15 as mentioned previously in the text. Or am I mistaken?

Thank you for bringing this up. On further investigation, we realized that the title of this subsection may mislead readers. We have changed it to “Ablation Analysis”, since it presents the results of analyzing the impact of different features on the predictions. Therefore, we ask you to let us keep this subsection where it is, with the changed title. It should be a useful result after the main experimental results.

 

not described

Done! We have already addressed this.

 

Way too many figures; consider putting some in the supplementary material, including this one.

We have removed some figures and combined others, resulting in 24 figures in the current version of the manuscript.

 

This is not part of the discussion of your paper; it should be in the introduction.

Done!

 

which model is this? first time mentioned show complete name.

Done! Added!

 

could it be because it is a bigger basin? or because snowfall plays an important role in determining streamflow? this type of discussion would greatly improve the paper.

Yes! We have added explanations in the Discussion section.

 

This is a good example of discussion.

Thank you!

 

this seems out of place. consider reordering your paper structure.

We have rephrased the first sentences and paraphrased some parts of the paragraph to make sure we have a clear text there.

 

conclusions

Done!

 

Conclusions should be one or maybe two short paragraphs summarizing the results, not a discussion of the results.

Done!

 

USBR

Done!

 

USBR

Done!

 

This is not needed here, only in the discussion.

Done!

 

what subsection? this is very confusing

Done!

 

 

 

 

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The axis ticks in Figures 18, 19, 21 and 25 should include the trimester to which they relate.

 

Despite the considerable effort made to explain the three chosen methods, there is a need for a stronger justification for selecting those methods over the rest of the available methods (for example, Random Forest, CatBoost, XGBoost, and LSTM networks). They have added in the conclusions that they will consider more options in future work, but I believe they should include a better justification for the chosen methods.

Author Response

Manuscript # (2911618): Enhancing Streamflow Prediction Using Meteorological Factors and Machine Learning Models in the Upper Colorado River Basin

Responses to the issues raised by the reviewer 2

The authors would like to express sincere gratitude to the anonymous reviewer for their helpful and constructive comments. We also appreciate the reviewer’s insightful observations. In this document, the following sections present the responses of the authors to the issues raised by the reviewer.

 

The axis ticks in Figures 18, 19, 21 and 25 should include the trimester to which they relate.
Done! We have fixed those issues. Please refer to the current version of the manuscript.

 

Despite the considerable effort made to explain the three chosen methods, there is a need for a stronger justification for selecting those methods over the rest of the available methods (for example, Random Forest, CatBoost, XGBoost, and LSTM networks). They have added in the conclusions that they will consider more options in future work, but I believe they should include a better justification for the chosen methods.

We have rephrased that paragraph. We indeed opted to use Random Forest Regression, LSTM, GRU, and SARIMA in our work. However, we plan to extend our analysis using graph-structured models such as Graph Neural Networks (GNNs) in the future. These graph-based models differ from the ones we used in this work and need considerable explanation and thorough, in-depth analysis.

 

 

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

The paper deals with testing different models and different modalities of their use to predict streamflows in the upper Colorado river basin.

I report below some general comments, while in the attached PDF the authors can find more detailed comments.

- The introduction would benefit from a more rigorous structure, aiming to avoid repetitions which are currently prevalent in the paper. It is also not fully clear what distinguishes this paper from others where similar analyses have been conducted. In addition, literature references for the models used should be provided.

- The current number of figures could be reduced, as some of them may not be necessary for conveying essential information. The quality of the figures could also be enhanced with more attention to detail.

- The method descriptions could be clearer and more accurately explained to ensure they are understandable even to non-experts in machine learning.

- I believe the presentation of the results could be streamlined to enhance readability, making it easier for the reader to follow. Furthermore, some conclusions drawn from the results may benefit from stronger alignment with the data obtained.

In conclusion, I believe that the paper, in its current state, may require some more work before it would be suitable for publication.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

The paper could greatly benefit from a thorough language revision. Some sections appear to be unclearly structured and, in my opinion, certain terms used by the authors, such as 'diligently,' may not be the most suitable for scientific publications. Additional detailed comments and suggestions are provided in the attached PDF.

Author Response

Manuscript # (2911618): Enhancing Streamflow Prediction Using Meteorological Factors and Machine Learning Models in the Upper Colorado River Basin

Responses to the issues raised by the reviewer 3

The authors would like to express sincere gratitude to the anonymous reviewer for their helpful and constructive comments. We also appreciate the reviewer’s insightful observations. In this document, the following sections present the responses of the authors to the issues raised by the reviewer.

 

The paper deals with testing different models and different modalities of their use to predict streamflows in the upper Colorado river basin. I report below some general comments, while in the attached PDF the authors can find more detailed comments. The introduction would benefit from a more rigorous structure, aiming to avoid repetitions which are currently prevalent in the paper. It is also not fully clear what distinguishes this paper from others where similar analyses have been conducted. In addition, literature references for the models used should be provided. The current number of figures could be reduced, as some of them may not be necessary for conveying essential information. The quality of the figures could also be enhanced with more attention to detail. The method descriptions could be clearer and more accurately explained to ensure they are understandable even to non-experts in machine learning. I believe the presentation of the results could be streamlined to enhance readability, making it easier for the reader to follow. Furthermore, some conclusions drawn from the results may benefit from stronger alignment with the data obtained. In conclusion, I believe that the paper, in its current state, may require some more work before it would be suitable for publication.
Thank you for your feedback. We respond to your comments in the following section.

 Please do not use acronyms in the abstract
Done!

 

Please specify what Lees Ferry is. Is it a location?

Done! We made it clearer in the Abstract.

 

involves

Changed to “they involve”

 

involves

Changed to “they involve”

 

and

Removed!

 

involves

Changed to “they involve”

 

streamflow levels. 4

“and” added.

 

Please provide a reference for this model

Done!

 

This sentence is not very clear. Please clarify

Sentence has been rephrased.

 

This sentence could be rephrased for better clarity

Rephrased!

 

Please provide references for these models

Added

 

The way this reference is introduced is a bit general

Changed!

 

The way this reference is introduced is a bit general

Changed!

 

Please rephrase this sentence

Done!

 

Please provide a reference

Added!

 

Please provide a reference

Added!

 

This general description of ML models should be moved earlier in the paper

Done!

 

Please rephrase this sentence

Done!

 

The authors are referring to more than one study, but only one reference is provided at the end of the sentence. Please be more precise.

Two more references added!

 

Expansive

Synonym added.

 

I feel that in its current form, this figure doesn't add very much to the content already present in the text

We would like to kindly request that you reconsider the importance of Figure 1 in the context of our study. While we acknowledge that the information presented in the figure is discussed in the text, we believe that the visual representation provided by the figure offers a quick and intuitive overview for readers. In our view, this visualization serves as a valuable tool for enhancing the accessibility and understanding of the features (climate variables) utilized in our study.
We are open to removing this if you still believe we should do so.

 

This part would look better in the introduction section

Moved!

 

the utilization of

Removed!

 

Please use a synonym

Done!

 

Please use a synonym

Done!

 

Please consider using a different color instead of yellow for better visibility in this figure. Furthermore, some reference background could be included.

Done! We have used green color instead of yellow. We have also added some background and geographic locations.

 

Please use a synonym

Done!

 

Please use a synonym

Done!

 

to provide

Changed to “provides”

 

In my opinion this description is a bit overdone, Figure 4 depicts a very commonly used representation

Rephrased to: “Figure 3 provides a clear visual representation of the distribution of SNOTEL sensors within the UCRB and their relation to the USGS monitoring gauges”.

 

Please replace the x-axis with months

We added “Year” instead of “index” for Figures 4 and 5 (current version).

 

This statement appears a bit harsh and lacks supporting scientific evidence

Rephrased!

 

Why did the authors choose to use Pearson correlation?

We believe Pearson correlation is an appropriate choice for our analysis given the nature of our data and research question. Both streamflow and the meteorological data (temperature, precipitation, and snow water equivalent) in our study are continuous variables. Pearson correlation is specifically designed to measure the linear association between two continuous variables.
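The coefficient referred to here can be sketched from its standard definition (covariance divided by the product of the standard deviations). The function name and the toy values are assumptions; this is not the authors' code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Toy values: a perfectly linear pair and a monotone but nonlinear pair.
r_linear = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_curved = pearson_r([1, 2, 3, 4], [1, 4, 9, 16])
```

Because the coefficient measures linear association only, a monotone but curved relationship scores below 1, which is the limitation behind the reviewer's question.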

 

In previous sections the acronym RF was used to refer to the random forest model. Please be consistent

Done!

 

Could you please provide the reasons for that?

By its nature, the SARIMA model requires the entire dataset to estimate its parameters. Therefore, in SARIMA modeling, the entire dataset was used for training the model instead of sequences. We have added explanations regarding this to the manuscript.
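For contrast, the sequence-based training used by models such as LSTM is usually built from sliding input/output windows. The sketch below illustrates that general idea only; the function name and the window lengths are hypothetical and are not taken from the manuscript.

```python
def make_windows(series, n_in, n_out):
    """Build (input, output) pairs by sliding a window over an ordered series."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        inputs = series[start:start + n_in]
        outputs = series[start + n_in:start + n_in + n_out]
        pairs.append((inputs, outputs))
    return pairs


# Toy series: three past steps predict the next one.
pairs = make_windows([1, 2, 3, 4, 5], n_in=3, n_out=1)
```

A SARIMA model has no such windows: its coefficients are estimated once from the whole training series.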

 

Is this a typo?

Yes, it was a typo! We have removed it.

 

Please replace with a dot

Done!

 

In my opinion, this section belongs more in the results rather than the methods

We removed that sentence.

 

What are sigmoid layers?

We have defined the Sigmoid layers. Please refer to the highlighted text in yellow.

 

What is the Rectified Linear Unit? The methods should be presented clearly enough for non-experts in machine learning to understand

We have explained this in the new version of the manuscript.
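For readers outside machine learning, the two activation functions raised in these comments can be sketched from their standard definitions. The function names are illustrative; this is not taken from the manuscript.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def relu(z):
    """Rectified Linear Unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)


gate_value = sigmoid(0.0)   # a gate that is "half open"
activation = relu(-3.0)     # negative input is clipped to zero
```

In an LSTM the sigmoid output acts as a gate, a fraction between 0 and 1 that scales how much information passes, while ReLU is a common choice for ordinary hidden layers.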

 

Can you clarify what is the batch size?

We have explained the batch size. Please see the highlighted text.
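The batch size is the number of training samples processed before each parameter update. A minimal sketch of splitting samples into batches follows; the function name and the values are hypothetical.

```python
def iter_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Five toy samples with a batch size of two; the last batch is smaller.
batches = list(iter_batches([0, 1, 2, 3, 4], batch_size=2))
```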

 

It is not very clear what the authors mean

Explanation regarding the Adam optimizer has been added.
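The Adam update rule can be sketched on a one-dimensional toy problem. This is an illustration of the published algorithm with its commonly used default decay rates, not the authors' training code; the learning rate, step count, function name, and toy loss are assumptions.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a 1-D function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy loss (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3.
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Dividing by the root of the running squared gradient gives each parameter its own effective step size, which is the main difference from plain gradient descent.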

 

And what about the equation 13?

We have explained this in the current version of the manuscript.

 

Please provide some motivations for that

Added!

 

Please correct

Done!

 

I believe this figure is not necessary since it repeats information already covered in the text

Deleted!

 

This aspect has already been extensively covered in the introduction, so it may not be necessary to repeat it

Deleted!

 

The authors are right, but the improvement in the R2 value is only by a few percentage points

The median R-squared plot comparing univariate and multivariate models (Figure 13) demonstrated that all models performed better in terms of R-squared values when meteorological factors were included in the input sequence. Although this difference is not large, it is still an improvement.

 

Please add the indication of months in the x axis

“Date” (month and year) added.

 

Please add the indication of months in the x axis

“Date” (month and year) added.

 

please use a synonym

Synonym added.

 

Please add the indication of months in the x axis

“Date” (month and year) added.

 

Please add the indication of months in the x axis

“Date” (month and year) added.

 

 

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I agree with all the revisions.

Author Response

The authors would like to express sincere gratitude to the reviewer for their helpful and constructive comments.

Reviewer 3 Report

Comments and Suggestions for Authors

I thank the authors for this new version of the paper. I first point out that during resubmission an additional author was included that was not in the previous version. I am not sure whether this is allowed, but I would defer to the editor the final decision regarding this aspect.

In any case, I thank the authors for incorporating the suggestions provided. The work has been improved. In the attached PDF I have indicated additional changes that I believe are necessary before the paper is published. 

 

Comments for author File: Comments.pdf

Comments on the Quality of English Language

In my opinion, the language requires further minor adjustments. Please refer to the attached PDF for some suggestions.

Author Response

The authors would like to express sincere gratitude to the reviewer for their helpful and constructive comments. We also appreciate the Editor’s insightful observations.

We have already addressed the issues raised by the reviewers in our second-round submission. Please find the responses and the tracked manuscript.

Author Response File: Author Response.pdf
