**3. Study Area and Datasets**

The study area was the Erren River basin (Figure 2), located in southern Taiwan. The river is 63.2 km long, rising to 460 m above sea level, and the catchment area is 350.4 km², which is divided into 10,744 grids (40 m × 40 m). In this study, six actual storm events and twelve designed rainfall patterns with various return periods, together with their corresponding 2D flood simulation results (simulated by the SOBEK software of Deltares), were collected from the Water Resources Agency. A total of 631 hourly datasets were used to configure the ML models. There were also 631 regional flood inundation maps generated by SOBEK from the rainfall patterns mentioned above, which served as the virtual inundation maps for the ML models' training and testing. These datasets were further divided into 391 datasets (10 events) for training, 144 datasets (4 events) for validation, and 96 datasets (4 events) for testing (Table 1).

**Figure 2.** Study area of the Erren River catchment.





<sup>1,2</sup> 24H800 mm and 24H500 y are the design storm events of 800 mm and a 500-year return period for 24 h rainfall, respectively. <sup>3</sup> 20190719 (i.e., 2019/07/19) denotes the real event that occurred on 19 July 2019.

Figure 3 shows the rainfall and average regional inundation depth (ARID) histories of the 10 training events, which included three quantitative rainfalls (200, 450, and 800 mm), two different return period rainfalls (10 and 500 years), and five actual rainfall events. The quantitative rainfalls and the two return period rainfalls adopted the centrally concentrated rain pattern, where the maximum rainfall is placed in the center and the remaining amounts are sorted alternately to the right and left in descending order. The ARID at a specific time was calculated by summing the inundation depths over all grids and then dividing by the number of grids. The time series of ARID shown in Figure 3 presents a complete hydrograph for the corresponding rainfall event (pattern), with the ARID value evolving over time from the rising limb to the peak and then to the recession limb. We noticed that the actual rainfall events show more irregular rainfall patterns and produce more variable ARID hydrographs. Through this design of training cases, the model can learn the relationship between different magnitudes of rainfall (large or small) and rainfall types (design rainfall or actual rainfall) and their corresponding average flooding depths during the training process.
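The ARID definition above (total inundation depth over all grids divided by the number of grids) can be sketched as follows; the function name and the toy 4-grid example are illustrative, not from the original study.

```python
import numpy as np

def average_regional_inundation_depth(depth_grid):
    """ARID at one time step: sum of the inundation depths over all
    grids divided by the total number of grids (10,744 in the study)."""
    return depth_grid.sum() / depth_grid.size

# Toy example: 4 grids with depths in metres
depths = np.array([0.0, 0.2, 0.5, 0.1])
arid = average_regional_inundation_depth(depths)
print(round(arid, 3))  # → 0.2
```

Applying this at every hour of an event yields the ARID hydrograph plotted in Figure 3.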


**Figure 3.** Rainfall histogram and average regional inundation depth hydrograph for training events. (**a**) 24H800 mm; (**b**) 24H450 mm; (**c**) 24H200 mm; (**d**) 24H10 y; (**e**) 24H500 y; (**f**) 20190719; (**g**) 20170601; (**h**) 20160925; (**i**) 20160912; (**j**) 20190813.

#### **4. Model Construction**

The research process was divided into three stages, as shown in Figure 4. The first stage was data collection, which included the two-dimensional flood simulation data, rainfall data from 5 stations, and flood sensor data from 25 stations. The total datasets were divided into three independent sets for model training, validation, and testing. The second stage was to construct the forecast models. Three models were built to make multi-step-ahead forecasts of the average regional inundation depth (ARID); their parameters are shown in Table 2. The input factors of Model 1 were the data of the 5 rainfall stations (*R*1 ∼ *R*5) and the model's self-feedback value *ARID*ˆ(*t* + *n* − 1), a total of 6 input factors, and the number of weights was 41. Model 2 used the data of the 5 rainfall stations (*R*1 ∼ *R*5), the 25 flood sensors (*S*01 ∼ *S*25), and the model's self-feedback value *ARID*ˆ(*t* + *n* − 1), a total of 31 input factors, and the number of weights was 166. Model 3 used the data of the 5 rainfall stations (*R*1 ∼ *R*5), 7 representative flood sensors (i.e., *S*02, *S*04, *S*05, *S*09, *S*13, *S*15, *S*22) selected through correlation analysis, and the model's self-feedback value *ARID*ˆ(*t* + *n* − 1), a total of 13 input factors, and the number of weights was 76. The third stage was to assess the results of the three models using evaluation indicators.
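The self-feedback structure, where the forecast *ARID*ˆ(*t* + *n* − 1) is fed back as an input when producing the step-*n* forecast, can be sketched as a recursive loop. This is a minimal illustration only: the trained network is replaced here by a stub linear function, and all variable names are assumptions, not the authors' implementation.

```python
import numpy as np

def multi_step_forecast(model, rain, sensors, arid_t, horizon=3):
    """Recursive multi-step-ahead ARID forecast: the forecast for
    step n is fed back as the ARID_hat(t + n - 1) input for step n + 1.
    `rain` is a (horizon, 5) array of rainfall inputs and `sensors`
    a (horizon, k) array of flood-sensor inputs (k = 25 or 7)."""
    feedback = arid_t
    forecasts = []
    for n in range(horizon):
        x = np.concatenate([rain[n], sensors[n], [feedback]])
        feedback = model(x)        # becomes ARID_hat(t + n)
        forecasts.append(feedback)
    return forecasts

# Stub "model": a fixed linear layer standing in for the trained network
rng = np.random.default_rng(0)
w = rng.normal(size=13) * 0.01     # Model 3: 5 rain + 7 sensors + 1 feedback
model = lambda x: float(np.dot(w, x))
rain = np.abs(rng.normal(size=(3, 5)))
sensors = np.abs(rng.normal(size=(3, 7)))
print(len(multi_step_forecast(model, rain, sensors, arid_t=0.05)))  # → 3
```

The same loop covers Models 1–3; only the width of the input vector (6, 31, or 13 factors) changes.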

**Figure 4.** Research framework.



Input variable selection is an essential step in the development of machine learning models. In recent years, various input selection methods have been satisfactorily used to improve prediction accuracy and produce parsimonious models in numerous applications [42–45]. For instance, in hydrological applications, Taormina and Chau [46] used binary-coded particle swarm optimization and extreme learning machines for rainfall–runoff modeling, and Chang et al. used the Gamma test to identify the most suitable input variables for multi-step-ahead water level forecasting [26] and for estimating stream total phosphate concentration [47]. While new methods continue to emerge, each has its own advantages and limitations, and no single method is best for all modeling purposes. In this study, we aimed to learn whether the 25 newly implemented sensors in the study area could benefit regional flood inundation forecasting. The input variable selection was mainly based on correlation analysis between the sensors. As shown above, the main difference between Model 2 and Model 3 was that Model 2 used all 25 sensors as model input, while Model 3 used 7 selected sensors. A 25 × 25 correlation matrix was constructed; the selection process is shown in Figure 5 and explained as follows.


**Figure 5.** The procedure of selecting representative sensors.

In this way, one of the most representative sensors is selected in each round until all sensors have been either picked or removed. In this study, a total of 7 rounds were conducted, picking out S22, S13, S02, S09, S04, S15, and S05, i.e., 7 representative sensors in total.
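Since the exact steps of Figure 5 are not reproduced here, the sketch below shows one plausible implementation of such a round-based, correlation-driven selection: in each round, the sensor most correlated with the remaining sensors is picked as representative, and sensors highly correlated with it are removed. The `threshold` value and the mean-correlation ranking are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def select_representatives(data, threshold=0.9):
    """Round-based selection sketch: `data` has one column per sensor.
    Each round picks the sensor with the highest mean absolute
    correlation to the remaining sensors, then drops every sensor
    correlated with it above `threshold`; repeats until all sensors
    are picked or removed."""
    corr = np.abs(np.corrcoef(data.T))   # sensor-by-sensor matrix (25 x 25 in the paper)
    remaining = list(range(data.shape[1]))
    picked = []
    while remaining:
        sub = corr[np.ix_(remaining, remaining)]
        rep = remaining[int(sub.mean(axis=1).argmax())]
        picked.append(rep)
        remaining = [j for j in remaining
                     if j == rep or corr[rep, j] <= threshold]
        remaining.remove(rep)
    return picked

# Toy data: sensors 0 and 1 are near-duplicates, sensor 2 is distinct,
# so one of the duplicates is dropped and two representatives remain.
t = np.linspace(0, 1, 50)
data = np.column_stack([np.sin(t), np.sin(t) + 0.001 * t, np.cos(6 * t)])
print(select_representatives(data))
```

With such a scheme, redundant sensors that share the same trend (as observed among the 25 sensors) collapse onto a single representative.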

The hydrographs show the 25 flood sensors and their corresponding ARID during the 500-year event used in Model 2 (Figure 6a) and the 7 representative sensors used in Model 3 (Figure 6b), respectively. We noticed that many of the 25 sensors showed the same trend (e.g., *S*11, *S*12, *S*13, *S*14, *S*15, *S*16, *S*17, *S*19, *S*20, *S*21, *S*23, *S*24, *S*25), which would introduce additional parameters into the model and cause noise. In Model 3, which used only the 7 representative sensors, the number of parameters and the noise were both greatly reduced. Thus, the model could more accurately describe the relationship between the rainfall, the flood sensors, and the model output.

The evaluation indicators used in this study were the root-mean-square error (RMSE), the coefficient of determination (R<sup>2</sup>), and the Nash–Sutcliffe efficiency coefficient (NSE). The formulae are given in Equations (15)–(17).

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{N} (d\_i - y\_i)^2}{N}} \tag{15}$$

$$R^2 = \left(\frac{\sum\_{i=1}^{N} \left(d\_i - \overline{d}\right) \times \left(y\_i - \overline{y}\right)}{\sqrt{\sum\_{i=1}^{N} \left(d\_i - \overline{d}\right)^2} \times \sqrt{\sum\_{i=1}^{N} \left(y\_i - \overline{y}\right)^2}}\right)^2\tag{16}$$

$$NSE = 1 - \frac{\sum\_{i=1}^{N} (d\_i - y\_i)^2}{\sum\_{i=1}^{N} (d\_i - \overline{d})^2} \tag{17}$$

where *d<sub>i</sub>* and *y<sub>i</sub>* are the observed value and the forecasted value of the *i*th data point, respectively, and *d* and *y* are the means of the observed and forecasted values.
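Equations (15)–(17) translate directly into code; the sample observed/forecasted values below are illustrative only.

```python
import numpy as np

def rmse(d, y):
    """Root-mean-square error, Equation (15)."""
    return np.sqrt(np.mean((d - y) ** 2))

def r_squared(d, y):
    """Coefficient of determination, Equation (16): squared Pearson correlation."""
    num = np.sum((d - d.mean()) * (y - y.mean()))
    den = np.sqrt(np.sum((d - d.mean()) ** 2)) * np.sqrt(np.sum((y - y.mean()) ** 2))
    return (num / den) ** 2

def nse(d, y):
    """Nash-Sutcliffe efficiency coefficient, Equation (17)."""
    return 1 - np.sum((d - y) ** 2) / np.sum((d - d.mean()) ** 2)

d = np.array([0.10, 0.25, 0.40, 0.30])   # observed ARID (m), illustrative
y = np.array([0.12, 0.22, 0.38, 0.31])   # forecasted ARID (m), illustrative
print(round(rmse(d, y), 4))  # → 0.0212
```

A perfect forecast gives RMSE = 0 and R<sup>2</sup> = NSE = 1; NSE can go negative when the forecast is worse than the observed mean.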

**Figure 6.** The sensor data vs. T + 1 forecasted average regional inundation depths by the models. (**a**) 25 sensors' data vs. T + 1 forecasted ARIDs of Model 2; (**b**) 7 sensors' data vs. T + 1 forecasted ARIDs of Model 3.

#### **5. Results**

There is a temporal relationship between the observations of rainfall and the IoT sensors' inundation levels in the study area; hence, the value of an observation at a particular time depends to some extent on past values. This study sought to model this temporal relationship and forecast the average regional inundation depth by using machine learning models. Our reason for employing IoT sensor data was that the sensors could provide monitored (real) inundation values for the models' training and testing and thus improve the multi-step forecast accuracy. Moreover, the real-time monitoring datasets could be used to adjust the model's parameters online and to visually assess the model's performance, which could largely promote the model's applicability and reliability. The results of the three models are shown in Table 3. There were two very large flood events (water depth > 0.5 m) in the training case. The Model 1 results indicated that those high flood events were underestimated, but its performance, in general, was good, with RMSE values of 0.049, 0.050, and 0.061 m in the training, validation, and testing cases, respectively. The small RMSE and high R<sup>2</sup> and NSE values indicate that Model 1, which was based solely on rainfall information as input, could provide suitable 1–3 h ahead forecasts, although it largely underestimated the peak flood inundation depth.

Model 2 used the data of the 5 rainfall stations and the 25 flood sensors. In the training phase, its errors at T + 1–T + 3 were significantly reduced as compared with those of Model 1; for instance, the RMSE values of Model 1 and Model 2 at T + 3 were 0.083 and 0.049, respectively. We noticed that Model 2 performed much better in multi-step-ahead forecasts (T + 1–T + 3) than Model 1 in the training phase, but not in the validation and testing phases. To investigate this inconsistency, we explored the relationship between the sensors' data and the models' forecast values in the 24H100y event of the testing case. Figure 7a shows that the time series of the 25 sensors were inconsistent, especially during the 12th–18th hours, when some of them were ascending while the rest were descending, and the 3 h ahead forecasted ARID made by Model 2 was in sharp descent. The inconsistency among the sensors' hydrographs might be due to the propagation time between the upstream and downstream locations of the sensors, which results in different lag times between the rising and falling limbs of their associated hydrographs. This inconsistency among the sensors' datasets and the forecasted ARID values could result in a large error (i.e., RMSE = 0.113, as shown in Table 3). The results indicated that adding the IoT sensor information could, in general, reduce the forecast error of the ARID in the training phase owing to the additional information (25 sensors' data) used as input. Nevertheless, using a large number of mutually inconsistent sensor time series as input might introduce too many parameters into the model and result in an overfitting (overtraining) problem.


**Table 3.** Performance of one- to three-hour-ahead forecasts of the Model 1, Model 2, and Model 3.

RMSE: root-mean-square error, NSE: Nash–Sutcliffe coefficient.

To reduce the number of input variables (sensors), correlation analysis was used to select a limited number of representative sensors. Model 3 used the data of the 5 rainfall stations and the 7 representative sensors as inputs. Figure 7b shows the time series of the 3 h ahead forecasted ARID together with the inundation depths of the 7 representative sensors. As shown, the time series of the 7 selected sensors as well as the forecasted ARID were relatively consistent compared with those in Figure 7a. The results show that Model 3, in general, was superior to Model 1 and Model 2 in the validation and testing phases, and its R<sup>2</sup> values at T + 1–T + 3 were consistently higher than 0.9 in all phases (Table 3). Thus, the Model 3 forecasts maintained fairly high accuracy (very small RMSE and high R<sup>2</sup> and NSE values).

**Figure 7.** The sensor data vs. T + 3 forecasted average regional inundation depths by the models. (**a**) 25 sensors' data vs. T + 3 forecasted ARIDs of Model 2; (**b**) 7 sensors' data vs. T + 3 forecasted ARIDs of Model 3.

Figure 8 presents the T + 1 forecast results of Model 1 and Model 3 in the three phases. Model 1 underestimated the peaks in the first and fifth events of the training phase (i.e., 24H800mm and 24H500y), while it overestimated the peaks in the validation and testing phases. In contrast, the results of Model 3 indicated that the overestimation and underestimation of the ARID peaks were significantly mitigated, and its estimation errors (RMSE), in general, were also much smaller than those of Model 1 in all three phases.


**Figure 8.** Comparison of simulation, Model 1, and Model 3 in three phases. (**a**) training phase; (**b**) validation phase; (**c**) testing phase.

Figure 9 shows the RMSE charts of the three models in the training, validation, and testing phases. Model 3 clearly had the most reliable and accurate performance compared with Models 1 and 2. Model 2 could be very well trained and was superior to Model 1 in all cases except T + 2 and T + 3 in the testing phase, which was mainly caused by the inconsistency issue in the 24H100y event. Model 3 performed better than Model 2 in the validation and testing phases, with average improvement rates of 22.65% and 42.71%, respectively. Moreover, the model's weights were reduced from 166 (Model 2) to 76 (Model 3), a significant reduction of 54.21%. These results provide extra evidence that using the selected sensors as model input not only produces a parsimonious model but also improves prediction accuracy in multi-step-ahead flood inundation forecasts, demonstrating the great value and benefit of fusing IoT sensor data into machine learning models as inputs.

**Figure 9.** The radar chart of models' performance (RMSE) in three phases.

#### **6. Conclusions**

This study sought to model the temporal relationship and forecast the average regional inundation depth by using an IoT-based machine learning model. The datasets obtained from rainfall stations and IoT flood sensors were used to model the average regional flood forecasts and to explore the effectiveness and usefulness of the IoT sensor data for the model's reliability and accuracy. The results show that adding IoT sensor data as model input can reduce the model error, especially for long-horizon (T + 2 and T + 3) forecasts in high-flood-depth conditions, where the underestimations are significantly mitigated. For instance, by adding the inundation depths of 25 IoT sensors as extra inputs to Model 2, an average error improvement rate of up to 18.49% could be reached as compared with Model 1 (which used only rainfall datasets). However, we also noticed that the inconsistent relationships among the 25 sensors' datasets could overtrain the model and cause erratic behavior in later application; for instance, Model 2 provided the worst performance in the T + 3 case of the testing phase. On the other hand, Model 3 used 7 representative IoT sensors, selected by correlation analysis, as the extra inputs to model the multi-step flood forecasts. The number of parameters of Model 3 was greatly reduced (by over 50%) as compared with Model 2. Furthermore, Model 3 was superior to Model 2, with an average improvement rate of up to 18%, and it also provided much better forecast performance (small RMSE and high R<sup>2</sup> and NSE values) than Model 1 in all the T + 1–T + 3 cases. Therefore, these results give very promising evidence that using representative IoT sensors as model input to configure machine learning models not only produces a parsimonious model but also significantly improves the models' reliability and accuracy in multi-step-ahead regional flood inundation forecasts.
The most fascinating achievement of this study is that the constructed model can now be adjusted online, and its real-time forecasts can be visually assessed and numerically evaluated against the IoT inundation levels currently implemented in the study area. Thus, the performance of the IoT-based machine learning model can be continuously assessed and adjusted, keeping its reliability and accuracy consistent.

**Author Contributions:** Conceptualization, L.-C.C.; Methodology, L.-C.C. and S.-N.Y.; Software, Validation, Data Curation and Visualization, S.-N.Y.; Formal Analysis and Investigation, L.-C.C. and S.-N.Y.; Resources, Writing-Original Draft Preparation, Writing-Review and Editing, Supervision, Project Administration and Funding Acquisition, L.-C.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Water Resource Agency, Ministry of Economic Affairs, Taiwan, R.O.C. (grant number: MOEAWRA1060468).

**Acknowledgments:** The authors gratefully acknowledge the Water Resources Agency (WRA), Taiwan for the financial support on this research and for providing the investigative data. The authors would like to thank the Editors and anonymous Reviewers for their valuable and constructive comments related to this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


on Signal Processing and its Applications IEEE, Kuala Lumpur, Malaysia, 7–9 March 2014; pp. 204–207. [CrossRef]


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
