4.2.1. PMI-Based Input Variable Selection
To test the approach that combines the hydrodynamic model and ANNs, the performance of the hydrodynamic model, ANNs, and a hybrid model (the hydrodynamic model with an ANN-based error correction model) was compared. An ANN model, labeled here as ANN1, was developed to forecast water levels at specific locations using historical observations as inputs; its input candidates and output are shown in Figure 7. A second ANN model, denoted as ANN2, was constructed for water level forecasting with input candidates that include not only the inputs used in ANN1 but also the prescribed boundary conditions at future time steps that were required for the hydrodynamic forecast model (Figure 8). Finally, an ANN for error correction of the hydrodynamic model, labeled as ANN3, was developed with candidate inputs including the predicted boundary conditions of the hydrodynamic model, historical observations, water levels calculated by the hydrodynamic model, and recorded errors of the hydrodynamic model (Figure 9).
The variables shown in Figure 7, Figure 8 and Figure 9 are the observed and prescribed discharges at the Paldang Dam; the observed and prescribed tributary inflows; the observed and prescribed water levels at the Junryu gauge station; the observed water levels at the gauging stations; the water levels computed by the hydrodynamic model (labeled with the prefix HM_); the time derivatives of the observed water levels and of the computed water levels (HM_); the forecasted water level at the k-th station; and the errors of the hydrodynamic model at the k-th station.
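To make the error-correction idea behind ANN3 concrete, the short Python sketch below shows how recorded hydrodynamic-model errors could be formed and how a predicted error could be added back to the hydrodynamic forecast. It assumes the error is defined as observed minus computed water level; the section does not state the sign convention, and the function names `hm_errors` and `hybrid_forecast` are hypothetical.

```python
import numpy as np

def hm_errors(observed, computed):
    """Hydrodynamic-model error, ASSUMED here to be observed minus computed water level."""
    return np.asarray(observed, dtype=float) - np.asarray(computed, dtype=float)

def hybrid_forecast(hm_forecast, predicted_error):
    """Corrected forecast: hydrodynamic forecast plus the error predicted by an ANN3-type model.

    With the observed-minus-computed convention above, adding the predicted
    error moves the hydrodynamic forecast toward the expected observation.
    """
    return np.asarray(hm_forecast, dtype=float) + np.asarray(predicted_error, dtype=float)
```

If the opposite convention (computed minus observed) were used, the predicted error would be subtracted instead.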
For the considered ANNs, the number of potential inputs can be quite large, as shown in Figure 7, Figure 8 and Figure 9; however, because these inputs are correlated, many may be redundant. The travel time of flow from the upstream boundary (Paldang Dam) to the downstream boundary (Junryu gauging station) is approximately 1 to 2 h for most historical flood events. However, this travel time is difficult to determine precisely owing to the tidal effect. Therefore, lagged variables from the current time t back to 5 h earlier (i.e., a maximum lag time d of 5 h) were assumed to be sufficient in this study.
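A minimal sketch of how lagged candidate inputs from t back to t − d could be assembled, assuming an hourly series and d = 5 as stated above. The helper `lagged_candidates` is hypothetical and illustrates only the construction of the candidate matrix, not the study's preprocessing.

```python
import numpy as np

def lagged_candidates(series, max_lag=5):
    """Build candidate inputs x(t), x(t-1), ..., x(t-max_lag) from an hourly series.

    Returns one row per time t that has a complete lag history
    (t = max_lag, ..., len(series) - 1) and max_lag + 1 columns,
    ordered from lag 0 (current time) to lag max_lag.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    if n <= max_lag:
        raise ValueError("series is shorter than the maximum lag")
    # Column for lag L holds x[t - L] for every admissible t.
    columns = [x[max_lag - lag : n - lag] for lag in range(max_lag + 1)]
    return np.column_stack(columns)
```

Applying the same helper to each observed series (water levels, discharges, inflows) and concatenating the columns yields the full pool of lagged candidates passed to the input selection step.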
The time derivative of the water level can be used to describe the flow situation. For example, a zero or low derivative indicates normal or base flow, a high positive derivative indicates the rising limb of a flood event, and a high negative derivative indicates the recession limb. Therefore, the rate of change of the water level and its lagged values were also considered as candidate input variables for the ANN models.
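The rate-of-change inputs can be illustrated with a backward difference, which uses only current and past observations and therefore introduces no look-ahead. This is a sketch under assumptions: the differencing scheme, the 1-h time step, and the threshold used to label rising, recession, and base flow are hypothetical choices, not values from the study.

```python
import numpy as np

def stage_derivative(levels, dt_hours=1.0):
    """Backward-difference rate of change of water level (m/h); first value is undefined."""
    h = np.asarray(levels, dtype=float)
    dh = np.full_like(h, np.nan)
    dh[1:] = (h[1:] - h[:-1]) / dt_hours
    return dh

def flow_phase(rate, threshold=0.05):
    """Label each rate as rising, recession or base flow (threshold is hypothetical)."""
    labels = []
    for r in rate:
        if np.isnan(r):
            labels.append("unknown")
        elif r > threshold:
            labels.append("rising")
        elif r < -threshold:
            labels.append("recession")
        else:
            labels.append("base")
    return labels
```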
From the input candidates, the input variables for the ANN models were selected by applying the PMI technique, using data from 12 flood events (Table 1). Based on the Akaike information criterion (AIC), the input sets satisfying the AIC stopping condition were selected. The selected input variables for the ANNs at different lead times and sites are listed in Table 2, Table 3 and Table 4. In the tables, the subscripts P, B, H, and W represent the Paldang Bridge, Banpo Bridge, Hangang Bridge, and Wangsook Stream, respectively.
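For readers unfamiliar with PMI, the following Python sketch shows a forward PMI selection loop in the spirit of the procedure discussed by May et al. [55]. At each step, the output and every remaining candidate are conditioned on the already-selected inputs by kernel regression; the candidate whose residual shares the most mutual information with the output residual is chosen; and selection stops once the AIC no longer decreases. Several details are assumptions rather than the study's settings: the Gaussian reference bandwidth, the leave-one-out Nadaraya–Watson regression, and a simplified AIC, n ln(MSE) + 2p, with p taken as the number of selected inputs. All function names are hypothetical.

```python
import numpy as np

def _standardize(a):
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)

def _bandwidth(n, d):
    # Gaussian reference bandwidth for standardized data (an assumed choice).
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4))

def _kde_at_samples(z):
    """Gaussian kernel density of standardized data z (n x d), evaluated at every sample."""
    n, d = z.shape
    h = _bandwidth(n, d)
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=2) / h ** 2
    k = np.exp(-0.5 * sq) / ((2 * np.pi) ** (d / 2) * h ** d)
    return k.mean(axis=1)

def mutual_information(u, v):
    """KDE estimate of I(u; v) as the sample mean of log f(u, v) / (f(u) f(v))."""
    zu, zv = _standardize(u)[:, None], _standardize(v)[:, None]
    f_uv = _kde_at_samples(np.hstack([zu, zv]))
    return float(np.mean(np.log(f_uv / (_kde_at_samples(zu) * _kde_at_samples(zv)))))

def _residual(target, S):
    """Target minus its leave-one-out Nadaraya-Watson estimate given inputs S (n x k)."""
    if S.shape[1] == 0:
        return target - target.mean()
    z = _standardize(S)
    n, d = z.shape
    sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=2) / _bandwidth(n, d) ** 2
    w = np.exp(-0.5 * sq)
    np.fill_diagonal(w, 0.0)  # leave each sample out of its own estimate
    fitted = w @ target / np.maximum(w.sum(axis=1), 1e-300)
    return target - fitted

def pmi_select(X, y, names, max_inputs=10):
    """Forward PMI selection with a simplified AIC stopping rule (an assumption)."""
    n, m = X.shape
    selected, best_aic = [], np.inf
    while len(selected) < min(max_inputs, m):
        S = X[:, selected]
        u = _residual(y, S)  # part of the output not explained by selected inputs
        scores = {j: mutual_information(u, _residual(X[:, j], S))
                  for j in range(m) if j not in selected}
        j_best = max(scores, key=scores.get)
        trial = selected + [j_best]
        resid = _residual(y, X[:, trial])
        aic = n * np.log(np.mean(resid ** 2)) + 2 * len(trial)
        if aic >= best_aic:
            break  # adding the best candidate no longer lowers the AIC
        selected, best_aic = trial, aic
    return [names[j] for j in selected]
```

The pairwise kernel matrices scale as O(n²), which is acceptable for a few hundred samples per step but would need subsampling for long hourly records.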
The selected input variables for ANN1 show that water level forecasting at the Paldang Bridge is mainly determined by the observed water levels at the Paldang Bridge and the discharges at the Paldang Dam. The inflow from the Wangsook Stream also becomes influential as the lead time increases (Table 2).
Table 2 shows that 1-h lead time forecasting at the Banpo Bridge and Hangang Bridge is related to the observed water levels at four gauge stations, i.e., Paldang Bridge, Banpo Bridge, Hangang Bridge, and Junryu gauge station.
Table 3 shows the selected input variables for ANN2, which considers not only historical observations but also the predicted boundary conditions used in the hydrodynamic model. The selected variables show that water level forecasting at the Paldang Bridge is determined by the latest observations at the Paldang Bridge and the predicted discharges at the Paldang Dam. For the 1-h lead time forecasting at the Banpo Bridge and Hangang Bridge, observations at four gauge stations are selected as input variables. As the lead time increases, the water levels at the Banpo Bridge and Hangang Bridge are forecasted with additional predicted inputs, such as the water levels at the Junryu station and the discharges from the Paldang Dam and Wangsook Stream.
Table 4 shows that all the selected input sets for the error correction models (i.e., ANN3) at different locations and lead times share a common input variable, suggesting that this variable contributes significantly to the forecasting of simulation errors. Several studies have indicated that this variable contains most of the information on the error to be forecast (e.g., [7]). Therefore, a good error prediction could be obtained by using it as the only input variable of the model. However, Torres Rua [61] performed internal tests and found that a single input variable is less robust than a combination of variables for an error correction model.
Table 4 shows that the 1-h lead time forecasting of errors for the Paldang Bridge is mainly related to the inputs listed for that site, whereas the 1-h lead time forecasting of errors for the Banpo Bridge and Hangang Bridge is determined by the latest errors, the time derivatives of the forecasted water levels at their own locations, and the latest observed water levels at the Junryu gauge station.
Table 4 also shows that a relatively large number of input variables were selected for the 3-h lead time forecasting of errors at the Banpo Bridge and Hangang Bridge, which makes the error forecasts generated by the ANN model difficult to interpret; this lack of interpretability is a shortcoming of the ANN modeling approach. May et al. [55] stated that it is not surprising, because the PMI variable selection method for ANN development is somewhat holistic and does not consider the contribution of individual input variables to the model.
4.2.2. Application of ANN Models
The data from six flood events that were used for the calibration of the hydrodynamic model were used for the ANN training, and six events that were used for the validation of the hydrodynamic model were used for the ANN validation (
Table 1). Therefore, approximately 55% and 45% of the data were used for training and validation, respectively.
ANNs for water level forecasting at the different sites were constructed from the selected input variables, and the number of hidden neurons was determined by trial and error: for the given inputs, networks with 1 to 10 hidden neurons were trained on the training data set, and the network that performed best in validation was selected as the optimal network. The resulting network architectures for the different ANNs are listed in Table 5. Given an ANN architecture, the weights and biases of the network were determined in the model training.
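The trial-and-error search over hidden-layer sizes can be sketched as follows. The section does not state the training algorithm or activation functions, so this sketch assumes a single tanh hidden layer trained by full-batch gradient descent on the mean squared error with a fixed learning rate and epoch count; validation RMSE is used to choose among 1 to 10 hidden neurons, mirroring the selection rule described above. Function names and hyperparameters are hypothetical.

```python
import numpy as np

def train_mlp(X, y, n_hidden, epochs=1000, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent on 0.5 * MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations, n x n_hidden
        err = H @ W2 + b2 - y                     # prediction errors, length n
        dZ = np.outer(err, W2) * (1.0 - H ** 2)   # error back-propagated through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dZ) / n
        b1 -= lr * dZ.mean(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def select_hidden_neurons(X_tr, y_tr, X_va, y_va, candidates=range(1, 11)):
    """Train one network per hidden-layer size and keep the size with the lowest validation RMSE."""
    scores = {}
    for h in candidates:
        params = train_mlp(X_tr, y_tr, h)
        scores[h] = rmse(predict(params, X_va), y_va)
    best = min(scores, key=scores.get)
    return best, scores
```

Inputs are assumed to be standardized beforehand; fixing the random seed for every candidate size keeps the comparison between architectures from being dominated by initialization noise.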
Figure 10, Figure 11 and Figure 12 show the observed and predicted stage hydrographs at three sites for different lead times for flood event 2. As shown in the figures, the combination of the hydrodynamic model and ANNs provides highly accurate predictions of flood water levels and adequately captures the behavior of the flow. Even for 2-h and 3-h lead times, the hybrid approach yields estimates that agree well with the observations.
Two statistics, the Nash–Sutcliffe efficiency coefficient (NSE) and the RMSE, were used to evaluate the accuracy of the forecasting models. Their values for the training period are summarized in Table 6, where the best result for each location and performance metric is highlighted in bold. Both NSE and RMSE indicate that the hydrodynamic model alone has relatively low accuracy compared with the other forecasting models. The forecasting accuracy of ANN1 decreases as the lead time increases, which is expected because ANN1 uses only historical observations to forecast water levels. Forecasts by the hybrid approach are the most accurate at the Paldang Bridge, while ANN2 also performs well at the Banpo Bridge and Hangang Bridge.
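For reference, the two statistics follow their standard definitions: RMSE is the square root of the mean squared difference between observed and predicted water levels, and NSE is one minus the ratio of the sum of squared errors to the sum of squared deviations of the observations from their mean, so NSE = 1 for a perfect forecast and NSE = 0 for a forecast no better than the observed mean. A minimal Python sketch (function names are illustrative):

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error, in the units of the water level."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((o - p) ** 2)))

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations from the observed mean."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2))
```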
The validation results are shown in Table 7, with the best results again highlighted in bold. Table 7 shows that the hybrid approach is the most accurate at all three locations in terms of both RMSE and NSE, and that its accuracy decreases as the lead time increases. ANN2 was more accurate in training at the Banpo Bridge and Hangang Bridge, but it performs worse than the hybrid approach in validation. The prediction accuracy of ANN1 decreases as the lead time increases, and for 3-h lead time forecasting at the Hangang Bridge it performs worse than the hydrodynamic model. The simulation errors of the hydrodynamic model are clearly reduced after error correction by the ANNs, which implies that coupling a hydrodynamic model with an ANN (as opposed to using either one alone) yields highly accurate water level predictions. For 1-h lead time forecasting, the hybrid approach reduced the RMSE by 70%, 80%, and 76% at the Paldang Bridge, Banpo Bridge, and Hangang Bridge, respectively, compared with the hydrodynamic model, and by 18%, 13%, and 33% at the same three locations compared with ANN2, indicating that the hybrid model outperforms the ANN model. For 2-h lead time forecasting, the hybrid model reduced the RMSE by 65%, 69%, and 65% at the Paldang, Banpo, and Hangang Bridges, respectively, compared with the hydrodynamic model, and by 26%, 16%, and 33% compared with ANN2. ANN1 is less accurate than both the hybrid model and ANN2. The NSE of the hybrid model exceeds 0.99 for 1-h and 2-h forecasting at all locations and exceeds 0.98 for 3-h forecasting.
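The percentage reductions quoted above can be recomputed from tabulated RMSE values with a relative-difference formula. The section does not spell the formula out, so the sketch below assumes reduction = 100 × (RMSE_reference − RMSE_hybrid) / RMSE_reference; the function name is hypothetical.

```python
def rmse_reduction_percent(rmse_reference, rmse_candidate):
    """Percentage reduction in RMSE of a candidate model relative to a reference model (assumed formula)."""
    if rmse_reference <= 0:
        raise ValueError("reference RMSE must be positive")
    return 100.0 * (rmse_reference - rmse_candidate) / rmse_reference
```

For example, with hypothetical values not taken from Table 7, `rmse_reduction_percent(0.20, 0.05)` gives approximately 75.0.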
These results indicate that the hybrid model performs better than the data-driven models (i.e., ANN1 and ANN2), and that the data-driven models perform better than the physically based model (i.e., the hydrodynamic model), which is consistent with the findings of Cho and Kim [18].