Flash Flood Forecasting Based on Long Short-Term Memory Networks
Round 1
Reviewer 1 Report
Please see the enclosed file
Comments for author File: Comments.pdf
Author Response
We truly appreciate your insightful comments and suggestions, which have helped us to improve this paper significantly. Our detailed responses are presented point by point in the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
In general the paper is well written and informative, and could be interesting for researchers working in the field of flood forecasting and hydrological modelling. However, I would suggest some moderate changes before publishing. In particular, the topic of uncertainty is not covered at all. Predictive uncertainty is essential in flood forecasting and has to be taken into consideration: on the one hand, the uncertainties from precipitation forecasts will increase with lead time, especially in small mountainous catchments; on the other hand, the LSTM modelling approach itself, trained on such a small sample data set, will show large variability. Neither aspect has been mentioned in the paper.
Some more detailed comments:
In the abstract the qualified rates are mentioned. I do not know whether this term is well known outside China; although it is explained later, its meaning is difficult to understand in the abstract.
In this paper the importance of having a precise precipitation forecast is not mentioned at all. It is written on page 6, line 181, that 10 hours of short-term precipitation forecast information are considered. Which kind of forecast (NWP model) has been used? Looking at the results, I suppose that the observed precipitation measurements are taken as a perfect forecast? Otherwise I would expect the quality of the forecast to decrease more rapidly.
Disadvantages of neural network approaches are not mentioned:
The huge number of parameters and the hidden layers are difficult to interpret, and extrapolation is not possible. Especially if extreme events occur in the forecast period that have not been observed during the training period, this will degrade the usefulness of neural networks and LSTMs for forecasting purposes.
A benchmark model is missing. It is difficult to estimate the quality if there is nothing to compare against. I would imagine a simple AR(1) model could give similar results for the first lead time?
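To make the suggestion concrete, a minimal AR(1) benchmark could look like the sketch below. The discharge series and lead time are hypothetical, chosen purely for illustration; nothing here is taken from the paper.

```python
# Minimal AR(1) benchmark: Q(t+1) = a*Q(t) + b, fitted by least squares
# on a training series. All numbers below are hypothetical.

def fit_ar1(series):
    """Least-squares fit of Q(t+1) = a*Q(t) + b on a 1-D series."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx
    return a, b

def forecast_ar1(a, b, last_q, lead):
    """Iterate the AR(1) recursion 'lead' steps ahead."""
    q = last_q
    for _ in range(lead):
        q = a * q + b
    return q

# Hypothetical hourly discharge series (m^3/s)
train = [10.0, 12.0, 15.0, 20.0, 28.0, 35.0, 30.0, 24.0, 18.0, 14.0]
a, b = fit_ar1(train)
one_step = forecast_ar1(a, b, train[-1], lead=1)
```

Such a model has two parameters and no training stochasticity, so it would give a cheap lower bound against which the LSTM's skill at each lead time can be judged.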
Parameters are not explained: units, batches, and epochs are essential for the LSTM model but are not well known to the hydrological community.
There are options other than trial and error for circumventing overfitting (Step 5 on page 4), such as the dropout rate, which are not mentioned.
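For readers less familiar with the technique: dropout randomly zeroes activations during training so that no single unit can be relied on, which regularizes the network without repeated manual resizing. A pure-Python sketch of (inverted) dropout, not the paper's implementation, with a hypothetical hidden-state vector:

```python
# Inverted dropout: during training, zero each activation with
# probability p and rescale the survivors by 1/(1-p) so the expected
# value is unchanged; at inference, pass activations through untouched.
import random

def dropout(activations, p, training=True, rng=random):
    if not training or p == 0.0:
        return list(activations)  # inference: identity
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]

random.seed(0)
h = [0.5, -1.2, 0.8, 0.3, 2.0]              # hypothetical hidden state
h_train = dropout(h, p=0.4)                  # some entries zeroed, rest scaled
h_test = dropout(h, p=0.4, training=False)   # unchanged at inference
```

In common deep-learning frameworks this is a one-line layer or argument, so it would be easy for the authors to report whether it was tried.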
Regarding NSE:
Since the qualified rate is defined as the number of qualified flood events (NSE > 0.7) divided by the total number of events, you have to calculate one NSE value for each event, which seems a bit strange to me. Maybe I misunderstand it, but in that case you have to subjectively define the beginning and the end of each event, which will have a great impact on the individual NSE values. Could you please explain that?
When you calculate the QR for all criteria, how do you do that? It does not look like you take the averages (Figures 5-8), so what else did you do?
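To make my reading of the definition concrete, this is the computation I understand from the text; the two event series below are hypothetical and only illustrate the mechanics:

```python
# My reading of the qualified rate (QR): one NSE per flood event, an
# event is "qualified" if NSE > 0.7, and QR is the share of qualified
# events. Event series are hypothetical.

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency for a single event."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

def qualified_rate(events, threshold=0.7):
    """events: list of (observed, simulated) pairs, one per flood event."""
    qualified = sum(1 for obs, sim in events if nse(obs, sim) > threshold)
    return qualified / len(events)

events = [
    ([10, 30, 80, 50, 20], [12, 28, 75, 52, 22]),  # well simulated
    ([ 5, 15, 60, 40, 10], [20, 20, 20, 20, 20]),  # poorly simulated
]
qr = qualified_rate(events)  # 1 of 2 events qualified -> 0.5
```

If this is what the authors did, the sensitivity of each event's NSE to where the event window is cut should be discussed.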
The sample size is quite small for the application of deep learning methods. The only thing you show is that calibration with 19 events alone decreases the quality (although the description of this sample-size problem is rather unclear to me; Fig. 8 and page 10). In that case I could imagine that the stochastic initialization of the weights will produce different results each time you calibrate. Could you give some indication of the uncertainties?
Definitions: on page 7 you write that you use one LSTM layer with 5 units, while on page 13 you mention that the number of memory cells is 5. So memory cells = units? Information on this number of memory cells and a hydrological explanation would be helpful. For me the visual interpretation of the memory cells (Figure 11) is difficult, and I can hardly see what you write on page 13 (the weight matrices are exactly alike for all gates?).
Why are the validation and test periods separated? What is the purpose of the validation period?
On page 13 you write: 5*(9+5). So I suppose that the 5 before the brackets refers to the memory cells, and inside the brackets 9 is the input and 5 is the hidden state? But the hidden state is not mentioned.
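For the record, the standard LSTM parameter count I would expect from these dimensions is the following; the 5*(9+5) quoted in the paper would then be the weight matrix of a single gate, without biases (dimensions below follow my reading of the paper: 9 inputs, 5 units):

```python
# Standard LSTM parameter count: four gates (input, forget, output, and
# the cell candidate), each with a weight matrix of shape
# (units, inputs + units) plus a bias vector of length units.

def lstm_param_count(n_input, n_units):
    per_gate = n_units * (n_input + n_units) + n_units  # weights + bias
    return 4 * per_gate

one_gate_weights = 5 * (9 + 5)   # 70, the number quoted on page 13
total = lstm_param_count(9, 5)   # 4 * (70 + 5) = 300
```

Stating the full count (and what 5*(9+5) covers) would make the model size transparent to hydrological readers.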
Generalization: what do you mean? From the training to the test period?
What do you mean by "the error distribution is close" (page 9)?
Author Response
We truly appreciate the insightful comments and suggestions from you, which have helped us to improve this paper significantly. Our detailed responses are presented point by point in the attachment.
Author Response File: Author Response.docx