Water Level Forecasting Combining Machine Learning and Ensemble Kalman Filtering in the Danshui River System, Taiwan

Fu, Jin-Cheng; Su, Mu-Ping; Liu, Wen-Cheng; Huang, Wei-Che; Liu, Hong-Ming

doi:10.3390/w16233530


by Jin-Cheng Fu ¹, Mu-Ping Su ², Wen-Cheng Liu ²,*, Wei-Che Huang ² and Hong-Ming Liu ²

¹ National Science and Technology Center for Disaster Reduction, New Taipei City 23143, Taiwan

² Department of Civil and Disaster Prevention Engineering, National United University, Miaoli 360302, Taiwan

* Author to whom correspondence should be addressed.

Water 2024, 16(23), 3530; https://doi.org/10.3390/w16233530

Submission received: 31 October 2024 / Revised: 28 November 2024 / Accepted: 5 December 2024 / Published: 8 December 2024

(This article belongs to the Special Issue Application of Machine Learning Models for Flood Forecasting)


Abstract:

Taiwan faces intense rainfall during typhoon seasons, leading to rapid increases in water level in rivers. Accurate flood forecasting in rivers is essential for protecting lives and property. The objective of this study is to develop a river flood forecasting model combining multiple additive regression trees (MART) and ensemble Kalman filtering (EnKF). MART, a machine learning technique, predicts water levels for internal boundary conditions, correcting a one-dimensional (1D) unsteady flow model. EnKF further refines these predictions, enabling precise real-time forecasts of water levels in the Danshui River system for up to three hours lead time. The model was calibrated and validated using observed data from four historical typhoons to evaluate its accuracy. For the present time at three water level stations in the Danshui River system, the root mean square error (RMSE) ranged from 0.088 to 0.343 m, while the coefficient of determination (R²) ranged from 0.954 to 0.999. The validated model (module 1) was divided into two additional modules: module 2, which combined the ensemble unsteady flow model with inner boundary correction and MART, and module 3, which featured an ensemble 1D unsteady flow model without inner boundary correction. These modules were employed to forecast water levels at three stations from the present time to 3 h lead time during Typhoon Muifa in 2022. The study revealed that the Tu-Ti-Kung-Pi station was less affected by inner boundaries due to significant tidal influences. Consequently, excluding the upstream and downstream boundaries, Tu-Ti-Kung-Pi station showed a superior RMSE trend from present time to 3 h lead time across all three modules. Conversely, the Taipei Bridge and Bailing Bridge stations began using inner boundary forecast values for correction from 1 h to 3 h lead times. This increased the uncertainty of the inner boundary, resulting in higher RMSE values for these locations in modules 1 and 2 compared to module 3.

Keywords:

machine learning; multiple additive regression trees; ensemble Kalman filter; water level forecasting; Danshui River system

1. Introduction

Taiwan’s subtropical climate results in approximately 70% of its annual precipitation occurring during the typhoon and rainy seasons, often leading to localized, intense, and short-duration heavy rainfall events. These events are further exacerbated by Taiwan’s distinctive topography, characterized by steep mountains and short river courses, which amplify the impact of rainfall. Excessive precipitation can rapidly exceed river capacities, causing sudden water level surges that result in severe flooding and landslides. The limited response time intensifies the damage to lives and property. Consequently, the development of an effective river flood forecasting system that provides timely and accurate warnings is essential for mitigating flood-related disasters and enabling swift remedial actions [1,2].

Traditionally, flood control efforts in Taiwan have emphasized structural measures. However, mitigating flood damage now requires an integrated approach that incorporates non-structural measures, such as advanced flood forecasting systems. By leveraging modern communication technologies to collect real-time, accurate observational data ahead of impending typhoons and applying dynamic analyses to estimate runoff and river flood levels at critical points, timely warnings can be issued for flood-prone areas. This proactive strategy enhances preparedness for disaster response teams and residents, enabling precautionary measures that reduce flood-related losses. Contemporary disaster mitigation strategies highlight the importance of adopting and advancing flood warning systems. As a result, countries frequently affected by flood disasters are continuously working to refine and improve the accuracy of flood level forecasts. To assess flood risks and mitigate associated losses, studies by researchers [3,4,5,6] have employed advanced technologies to develop flood warning systems. These systems disseminate alerts to relevant institutions, key industries, and residents in high-risk areas, facilitating early preparedness and the implementation of protective measures. This proactive approach significantly reduces the potential loss of life and property, thereby advancing the objectives of flood prevention and disaster mitigation.

One-dimensional hydrodynamic models have been extensively employed for river water level forecasting [7,8,9,10]. Recent advancements in river water level predictions have leveraged machine learning and deep learning techniques, yielding promising results [11,12,13]. A comprehensive review of machine learning methods for estimating water levels and discharges in tidal rivers and estuaries is also available [14]. Furthermore, integrating hydrodynamic models with the Kalman filter has proven effective in enhancing the accuracy of water level predictions [15,16,17,18,19,20]. For instance, Barthelemy et al. [18] reported that the ensemble Kalman filter improved the root mean square error of water level predictions by up to 88% at the analysis time and 40% at a 4 h forecast lead time compared to a standalone model. Similarly, Lee et al. [20] demonstrated that the conditional bias-penalized ensemble Kalman filter increased the multi-basin mean skill score of the mean square error by approximately 0.15 over the ensemble Kalman filter, for lead times extending up to the basin’s time to peak.

The application of multiple additive regression trees (MART) in water level forecasting has also been investigated [21,22]. Fu et al. [21] proposed three MART models, each with distinct approaches to model training and error correction. Their findings indicated that while the original MART and real-time MART models were more effective at capturing overall river stage variations, the naïve MART model demonstrated a higher accuracy in predicting peak river stages. They concluded that the proposed MART models are both efficient and accurate, making them suitable for practical flash flood early warning systems. Jang et al. [22] developed a hybrid model integrating MART with Runge–Kutta numerical schemes to enhance water level forecasting. Their study showed that the original MART model without Runge–Kutta schemes and the hybrid models reduced errors in mean and peak river stage predictions by 29% and 53%, respectively.

However, the combined use of MART and the ensemble Kalman filter in a flash flood forecasting model for river water level forecasting remains underexplored. This study presents a novel approach to water level forecasting by integrating a one-dimensional unsteady flow model with multiple additive regression trees (MART) and the ensemble Kalman filter (EnKF). The primary objective is to improve the accuracy of river water level predictions. To achieve this, the hyperparameters of the MART model were first optimized using the grid search method. Next, the river forecasting model was calibrated and validated with observational data from four historical typhoon events at multiple locations, ensuring the reliability of the predictions. Finally, the impact of different forecasting modules on the accuracy of the water level forecasts was analyzed.

2. Materials and Methods

2.1. Description of Study Area

The Danshui River system, the largest river in northern Taiwan, serves as the study area for this research. Originating from Mount Pintian in the central mountain range at an elevation of 3529 m, the Danshui River spans a watershed area of 2726 km². It is formed by the confluence of three main tributaries: the Dahan River, the Xindian River, and the Keelung River (Figure 1). The main stream of the Danshui River, resulting from the merging of the Dahan and Xindian Rivers at Jiangzicui, extends approximately 158.7 km to Youchekou. It traverses several administrative regions, including Taipei City, New Taipei City, Keelung City, and Taoyuan City. The river basin lies within a subtropical climate zone, experiencing a rainy season from May to October and a dry season from November to April. The average annual rainfall in the basin is 2996.1 mm.

The Dahan River’s main stream is 135 km long, with a watershed area of 1163 km². The Xindian River’s main stream is 82 km long, covering a watershed area of 916 km². The Keelung River’s main stream is 86.4 km long, with a watershed area of 490 km². Due to the river capture effect, the Keelung River changes direction in its upper, middle, and lower reaches before ultimately joining the Danshui River near Guandu [23,24].

Figure 1. Map of the Danshui River system in northern Taiwan [25].

2.2. Data Collection

2.2.1. Geographical Data

River cross-sections are essential for calculating discharge and predicting water level variations. In this study, survey data from 102 cross-sections within the Danshui River system were collected and incorporated into a one-dimensional hydrodynamic model. The Manning’s coefficient (n), a critical parameter for estimating flow velocity and water levels, quantifies the frictional resistance of flow within the channel. The Manning’s coefficients used in this study were based on prior research [26,27]. Following model validation, the Manning’s coefficients for the Danshui River system were determined to range from 0.016 to 0.050.

2.2.2. Rainfall Data

This study utilized rainfall data from 11 stations, sourced from the Water Information Integration Platform of the Taiwan Water Resources Agency. The stations included Dapao, Sanxia, Shanjia, Quchi, Huoshaoliao, Ruifang, Zhongzheng Bridge, Shezi, Guandu, Wudu, and Shehou Bridge. The locations of these rainfall stations are presented in Figure 1.

2.2.3. Water Level Data

Water level data were also collected from the Water Information Integration Platform of the Taiwan Water Resources Agency. The dataset includes seven stations located from downstream to upstream: River Mouth, Tu-Ti-Kung-Pi, Bailing Bridge, Dazhi Bridge, Taipei Bridge, Zhongzheng Bridge, and Xinhai Bridge. The locations of these water level stations are also shown in Figure 1.

2.3. Flash Flood Forecasting Model in River

This study considers several factors characteristic of Taiwan, including the 12 to 24 h duration of typhoon impacts, short river lengths, steep slopes, tidal influences, and rapid changes in water levels. These characteristics constrain the availability of observational data from rivers during typhoon events. Consequently, simple polynomial equations are insufficient for estimating water levels under such conditions. Accurate predictions require a more sophisticated hydrodynamic model grounded in robust theoretical principles. To address this, this study adopts a river flood forecasting model based on the dynamic wave algorithm of the Saint-Venant equations. The Saint-Venant equations, comprising continuity and momentum equations, account for gravity, friction, convective acceleration, and local acceleration in the equation of motion. Together they relate water depth (Y) and discharge (Q). Because these hyperbolic partial differential equations generally have no analytical solution, they must be solved numerically. In this study, the four-point implicit finite difference approximation [28] was employed to solve for the flow variables. Details of the discretization process and the solution of the nonlinear system are provided in Hsu et al. [1,8].
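For orientation, one common textbook form of the Saint-Venant equations is written out below. This is only a sketch under stated assumptions: the exact formulation used in the model is the one given in Hsu et al. [1,8] and may differ in detail. The symbols A (flow cross-sectional area), q_l (lateral inflow per unit channel length, assumed to carry no streamwise momentum), S_0 (bed slope), S_f (friction slope), R_h (hydraulic radius), and g (gravitational acceleration) are introduced here for illustration, and the friction slope is closed with Manning's formula in SI units.

```latex
% Continuity equation
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = q_{l}

% Momentum equation: local acceleration, convective acceleration,
% pressure gradient, and gravity/friction terms
\frac{\partial Q}{\partial t} + \frac{\partial}{\partial x} \left( \frac{Q^{2}}{A} \right) + g A \frac{\partial Y}{\partial x} - g A (S_{0} - S_{f}) = 0

% Friction slope from Manning's coefficient n
S_{f} = \frac{n^{2} Q |Q|}{A^{2} R_{h}^{4 / 3}}
```

Retaining all terms of the momentum equation is what distinguishes the dynamic wave approach from the kinematic and diffusive wave simplifications.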

For both past and real-time computations, the river flood forecasting model utilizes upstream observed water levels and tidal model-calculated water levels at the river mouth as boundary conditions. The model also incorporates internal boundary condition corrections, including the initial value correction and real-time water level correction [1]. Data for internal boundary conditions were obtained by predicting water levels at three stations (Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge) within the watershed using a MART model.

During the internal boundary condition correction, the least squares method is employed to adjust the internal boundary conditions while conserving mass and momentum. This ensures that the model's calculated results closely match the observed water depth at hydrological stations, thereby enhancing forecast accuracy [1]. Figure 2 illustrates the flow chart for calculating water levels, which comprises three main components: present time calculation, lead time calculation, and water level computation at the inner boundary. The lead time calculation incorporates the EnKF, while the MART method is employed to forecast water levels at the inner boundary.

2.4. Multiple Additive Regression Trees (MART)

This study utilizes multiple additive regression trees (MART) to predict water levels at various river stations. These predictions are used for inner boundary correction in the river flood level forecasting model, with the aim of enhancing its accuracy. Decision trees are the basic building block of tree-based machine learning models, whereas MART is a type of gradient boosting algorithm. Gradient boosting algorithms are ensemble learning methods that iteratively build multiple weak learners (i.e., decision trees) and combine them into a robust predictive model. In MART, each new regression tree is fitted to the negative gradient of the loss function evaluated at the current prediction, and the trees are summed to fit and predict the target variable. Because every additional tree reduces the remaining model residuals, the overall predictive capability improves progressively.

To mitigate the overfitting caused by complex decision trees, MART combines many simple trees through boosting, which proceeds in four steps:

1st step: Input the training data, consisting of the input factors x and the target values y, together with the loss function [29].

\{(x_{i}, y_{i})\}_{i = 1}^{N}, \quad L (y_{i}, f (x))

(1)

2nd step: Before training gradient boosting trees, it is essential to determine an initial prediction value. The optimal initial prediction can be obtained by solving the model to find the value that minimizes the loss function [30,31]. Utilizing this value as the initial prediction enhances the model’s predictive performance.

f_{0} (x) = \underset{γ}{\arg\min} \sum_{i = 1}^{N} L (y_{i}, γ)

(2)

3rd step: For each of the M trees (m = 1, 2, …, M), the residuals between the current predictions and the actual values are computed, the training data are grouped into the terminal regions of a regression tree fitted to those residuals, and an updated value is obtained for each region by minimizing the loss function [32]. The prediction for the mth iteration is obtained through the following sub-steps:

1.: Compute the residual between each prediction value and the actual value.

r_{i m} = - {[\frac{\partial L (y_{i}, f (x_{i}))}{\partial f (x_{i})}]}_{f = f_{m - 1}}, \quad i = 1, 2, \ldots, N

(3)

2.: Fit a regression tree to the residuals r_im and split the data into multiple regions R_jm, j = 1, 2, …, J_m.
3.: For each region R_jm, calculate the updated value γ_jm, j = 1, 2, …, J_m, by minimizing the loss function.

γ_{j m} = \underset{γ}{\arg\min} \sum_{x_{i} \in R_{j m}} L (y_{i}, f_{m - 1} (x_{i}) + γ)

(4)

4.: To obtain the prediction for the mth iteration (f_m(x)), add the product of the updated value and the learning rate ν to the previous prediction value [21].

f_{m} (x) = f_{m - 1} (x) + ν \sum_{j = 1}^{J_{m}} γ_{j m} I (x \in R_{j m})

(5)

Final step: After completing M iterations, produce the final value.

\hat{f} (x) = f_{M} (x)

(6)
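The boosting loop of Equations (1) to (6) can be sketched in a few lines of plain Python. This is a minimal illustration and not the study's implementation: it assumes a squared-error loss L = (y − f)²/2 (so the negative gradient is simply y − f and the optimal leaf value is the leaf mean), a single input feature with at least two distinct values, and depth-one trees. The names `fit_stump`, `boost`, and `predict` and the toy data are hypothetical.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Fit a depth-one regression tree: one threshold, two leaf values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # splits leaving both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        g_left, g_right = mean(left), mean(right)  # Eq. (4) for squared loss
        sse = (sum((r - g_left) ** 2 for r in left)
               + sum((r - g_right) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, g_left, g_right)
    return best[1:]

def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Return (f0, learning_rate, stumps) after n_trees boosting iterations."""
    f0 = mean(ys)  # Eq. (2): for squared loss the best constant is the mean
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # Eq. (3)
        threshold, g_left, g_right = fit_stump(xs, residuals)
        stumps.append((threshold, g_left, g_right))
        # Eq. (5): add the leaf value, scaled by the learning rate
        preds = [p + learning_rate * (g_left if x <= threshold else g_right)
                 for x, p in zip(xs, preds)]
    return f0, learning_rate, stumps

def predict(model, x):
    """Evaluate the final additive model of Eq. (6) at a single input x."""
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(g_left if x <= t else g_right
                                    for t, g_left, g_right in stumps)

# Toy data: a step from 1.0 to 3.0 at x = 5
xs = [float(i) for i in range(10)]
ys = [1.0 if x < 5 else 3.0 for x in xs]
model = boost(xs, ys)
```

With a learning rate of 0.1 each iteration removes about one tenth of the remaining residual, which is why many small steps are needed before `predict(model, 2.0)` approaches 1.0.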

This study employs the gradient boosting regressor from Python's scikit-learn (sklearn) machine learning library to establish the MART model. Hyperparameters must be configured manually before training, and they affect both the training process and the model's results. The hyperparameters include the learning rate, number of trees (N estimators), subsample rate, and loss function. The Huber loss function [29] can be expressed as follows:

L (y_{i}, f (x_{i})) = \begin{cases} \frac{1}{2} {[y_{i} - f (x_{i})]}^{2}, & \text{for } |y_{i} - f (x_{i})| \leq α \\ α |y_{i} - f (x_{i})| - \frac{1}{2} α^{2}, & \text{otherwise} \end{cases}

(7)
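A direct transcription of Equation (7) for a single observation is shown below as a minimal sketch. Here `alpha` is treated as the threshold itself, as in the equation; note that in scikit-learn's gradient boosting regressor the `alpha` parameter is instead a quantile from which the threshold is derived, so the two are not interchangeable. The function name `huber_loss` is hypothetical.

```python
def huber_loss(y, f, alpha):
    """Huber loss of Eq. (7) for one observation y and one prediction f."""
    r = abs(y - f)
    if r <= alpha:
        return 0.5 * r ** 2              # quadratic branch for small errors
    return alpha * r - 0.5 * alpha ** 2  # linear branch for large errors
```

The two branches meet at |y − f| = α, so large errors (for example, an erroneous gauge reading) are penalized linearly rather than quadratically.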

When utilizing the Huber loss function, setting the parameter α in Equation (7) is essential. This study applies grid search to optimize hyperparameters, particularly targeting the learning rate, number of trees, sampling rate, and Huber α. Grid search systematically partitions the hyperparameter ranges into a grid and explores each combination to identify the optimal set. This approach enables a thorough assessment of various hyperparameter configurations to determine the most effective combination.
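The exhaustive search described above can be sketched with the standard library alone. The grid values and the scoring function below are hypothetical placeholders (the ranges actually searched in the study are not given in this section); in practice the score would be a validation error such as the RMSE of a model trained with the candidate parameters.

```python
from itertools import product

def grid_search(grid, score):
    """Return the (params, score) pair with the lowest score over the full grid."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Hypothetical grid over the four hyperparameters named in the text
grid = {
    "learning_rate": [0.05, 0.1, 0.2],
    "n_estimators": [50, 100, 200],
    "subsample": [0.5, 0.75, 1.0],
    "alpha": [0.7, 0.8, 0.9],
}

def toy_score(p):
    """Stand-in for a validation error; a real score would train and evaluate a model."""
    return (abs(p["learning_rate"] - 0.1) + abs(p["n_estimators"] - 100) / 100
            + abs(p["subsample"] - 0.5) + abs(p["alpha"] - 0.8))

best_params, best_score = grid_search(grid, toy_score)
```

The cost grows multiplicatively with the number of values per hyperparameter (here 3 × 3 × 3 × 3 = 81 evaluations), which is the main practical limit of grid search.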

For water level forecasting, this study used the MART algorithm. Out of 10 typhoon events, 7 were used as the training set, 2 as the validation set, and 1 as the test set (Table 1). The model was fitted on the training set, and its predictions were evaluated on the test set. Water level predictions were conducted for 5 typhoon events, with the model predicting water level variations for the next 3 h based on observational data from water level and rainfall stations. The prediction model is expressed as follows:

(H_{z}^{t + 1}, H_{z}^{t + 2}, H_{z}^{t + 3}) = f \left( \begin{matrix} R_{o}^{t - 3}, R_{o}^{t - 2}, R_{o}^{t - 1}, R_{o}^{t}, R_{o}^{t + 1}, R_{o}^{t + 2}, R_{o}^{t + 3}, R_{o}^{t + 4}, R_{o}^{t + 5}, R_{o}^{t + 6} \\ H_{r}^{t - 3}, H_{r}^{t - 2}, H_{r}^{t - 1}, H_{r}^{t} \end{matrix} \right)

(8)

where H_{z}^{t + 1} ~ H_{z}^{t + 3} are the forecasted water levels for hours t + 1~t + 3 at water level station z, where z = 1~6, corresponding to the six water level stations: Tu-Ti-Kung-Pi, Taipei Bridge, Dazhi Bridge, Bailing Bridge, Zhongzheng Bridge, and Xinhai Bridge.

R_{o}^{t - 3} ~ R_{o}^{t + 6} denote the hourly rainfall for hours t − 3~t + 6 at rainfall station o, where o = 1~11, corresponding to the eleven rainfall stations: Dapao, Sanxia, Shanjia, Quchi, Huoshaoliao, Ruifang, Zhongzheng Bridge, Shezi, Guandu, Wudu, and Shehou Bridge.

H_{r}^{t - 3} ~ H_{r}^{t} represent the water levels for hours t − 3~t at water level station r, where r = 1~7, corresponding to the seven water level stations: river mouth, Tu-Ti-Kung-Pi, Taipei Bridge, Dazhi Bridge, Bailing Bridge, Zhongzheng Bridge, and Xinhai Bridge.
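The input vector of Equation (8) can be assembled as sketched below. The data layout (`rain[o][hour]`, `level[r][hour]`) and the function name `build_features` are assumptions made for illustration. The sketch also assumes that rainfall for hours t + 1 to t + 6 is available when the forecast is issued (for example, from rainfall forecasts, or from observations in a hindcast); the text does not state the source of these values.

```python
def build_features(rain, level, t):
    """Assemble the inputs of Eq. (8) for forecast origin t.

    rain:  list of hourly rainfall series, one per rainfall station
    level: list of hourly water-level series, one per water level station
    """
    n_hours = min(len(s) for s in rain + level)
    if t < 3 or t + 6 >= n_hours:
        raise ValueError("t must leave 3 h of history and 6 h of rainfall ahead")
    features = []
    for series in rain:                       # rainfall at hours t-3 .. t+6
        features.extend(series[t - 3:t + 7])
    for series in level:                      # water level at hours t-3 .. t
        features.extend(series[t - 3:t + 1])
    return features

# Synthetic example: 11 rainfall and 7 water level stations, 24 hourly values each
rain = [[100 * o + h for h in range(24)] for o in range(11)]
level = [[1000 * r + h for h in range(24)] for r in range(7)]
x = build_features(rain, level, t=5)
```

With 11 rainfall stations (10 values each) and 7 water level stations (4 values each), each sample has 11 × 10 + 7 × 4 = 138 inputs.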

2.5. Ensemble Kalman Filter (EnKF)

The Ensemble Kalman Filter (EnKF) is a data assimilation technique derived from the Kalman filter. This method accounts for uncertainties in both model forecasts and observations, assuming these uncertainties follow a normal distribution with a mean of zero. The primary concept involves updating forecast analysis values by integrating information from both model forecasts and observations, thereby providing a more precise estimation of the system’s true state.

The EnKF creates an ensemble based on initial model forecasts and observations. An ensemble comprises multiple system models with varied initial states, generated through random sampling or other techniques. Each ensemble member generates a model forecast, producing a forecast value for the subsequent time step. These forecast values are compared with observations to calculate errors, which are then used to update the forecast values.

Incorporating the ensemble concept into the Kalman filter provides an efficient approach to managing the computational challenges associated with storing and updating the error covariance matrix. By employing a finite set of random samples to represent system uncertainties, the Ensemble Kalman Filter (EnKF) eliminates the need for large error covariance matrices, particularly in high-dimensional and nonlinear models, thereby significantly reducing computational demands [27]. Figure 2 illustrates the EnKF workflow. The key steps in executing the EnKF include computing the model error covariance, estimating the observation error covariance, and calculating the Kalman gain matrix. Subsequently, the model is updated to correct the forecasted water levels. The theoretical foundation of the EnKF is detailed below.

2.5.1. Error Covariance and Analysis Equations

In a discrete time process, forecast variables can be represented in vector form, referred to as the state vector. To differentiate between the original forecast value and the analysis value obtained after assimilating observation data, the symbols x^f and x^a are used, respectively, where x represents the forecast water level. The errors of the elements in the state vector can be calculated using the error covariance matrix P.

e^{f} = x^{f} - {\hat{x}}^{f}

(9)

e^{a} = x^{a} - {\hat{x}}^{a}

(10)

P^{f} = \mathrm{cov} [x^{f}, x^{f}] = E [e^{f} {(e^{f})}^{T}]

(11)

P^{a} = \mathrm{cov} [x^{a}, x^{a}] = E [e^{a} {(e^{a})}^{T}]

(12)

where the superscripts f and a denote the forecast state and the analysis state, respectively, cov stands for covariance, \hat{x} indicates the estimated value of the state vector, e is the error, and E represents the expectation function.

The ensemble forecast results serve as sample representations of the probability distribution of the state vector. These samples can be directly utilized to calculate the error covariance.

P^{f} = \mathrm{cov} [x^{f}, x^{f}] = \frac{1}{N - 1} \sum_{k = 1}^{N} [x_{k}^{f} - {\bar{x}}^{f}] {[x_{k}^{f} - {\bar{x}}^{f}]}^{T}

(13)

P^{a} = \mathrm{cov} [x^{a}, x^{a}] = \frac{1}{N - 1} \sum_{k = 1}^{N} [x_{k}^{a} - {\bar{x}}^{a}] {[x_{k}^{a} - {\bar{x}}^{a}]}^{T}

(14)

where subscript k denotes the index of the ensemble member, N indicates the total number of ensemble members, the overline expresses the ensemble mean, and [ ]^T represents the transpose of a matrix.
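As a small illustration of Equations (13) and (14), the sample covariance of an ensemble can be computed as follows. The function name `ensemble_covariance` and the three-member toy ensemble are hypothetical; an operational ensemble would hold one water level per cross-section and many more members.

```python
def ensemble_covariance(members):
    """Ensemble mean and sample covariance (Eq. 13) of a list of state vectors."""
    n = len(members)
    dim = len(members[0])
    mean = [sum(m[i] for m in members) / n for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for m in members:
        dev = [m[i] - mean[i] for i in range(dim)]   # deviation from the ensemble mean
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return mean, cov

# Toy ensemble: three members, two state variables
mean, cov = ensemble_covariance([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

The divisor N − 1 gives the unbiased sample estimate; with few members this estimate is noisy, which motivates the localization discussed in Section 2.5.3.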

The data assimilation cycle comprises two main steps: forecast and analysis. In the forecast step, the ensemble Kalman filter (EnKF) method is employed to compute forecast results for all ensemble members. Each member represents a potential sample of the state vector, and their forecast values are aggregated, typically by averaging, to produce the ensemble forecast for the state vector.

In the subsequent analysis step, upon receiving new observational data, the EnKF updates the forecast values of all ensemble members to derive the analysis values. During this updating process, uncertainty is assessed using the error covariance between the observed and forecast values, allowing for adjustments to the analysis values of the state vector and the associated error covariance.

This forecast-analysis cycle is iteratively repeated. Notably, the model can continue to refine and enhance the forecast results even in the absence of new observational data. The EnKF innovates by incorporating ensemble forecasting in the forecast step. The equations for the analysis step are as follows:

K = P^{f} H^{T} {(H P^{f} H^{T} + R)}^{- 1}

(15)

x_{k}^{a} = x_{k}^{f} + K [y_{k}^{0} - H x_{k}^{f}]

(16)

where H represents the observation operator that transforms the model’s state vector into an observation vector, R is the error covariance matrix of the observation values, and K denotes the Kalman gain matrix, which reflects the weighting balance between the model forecast and the observation data.

During the update of each ensemble member, the observation vector (e.g., observed water level) must be perturbed with a random error ε_k of covariance R. This procedure is essential to generate unbiased statistical estimates among the updated ensembles [33].

y_{k}^{0} = y^{0} + ε_{k}

(17)

\mathrm{cov} [y^{0}, y^{0}] = \frac{1}{N - 1} \sum_{k = 1}^{N} ε_{k} {ε_{k}}^{T} = R

(18)

where y_{k}^{0} represents the observed water level y⁰ at a future time step in the Kalman filter model, including a random error ε_k. While ensemble members are treated as independent samples in the analysis equation, they are actually updated together and influenced by their interrelationships. Consequently, any modification to one ensemble member impacts the overall error covariance.
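The analysis step of Equations (15) to (17) can be sketched for the simplest case of a single scalar observation that directly measures one component of the state, so that H merely selects that component and R reduces to a variance. These simplifications, the function name `enkf_update`, and the toy numbers are assumptions for illustration, not the study's configuration.

```python
import random

def enkf_update(members, obs_index, y_obs, r_var, rng):
    """Perturbed-observation EnKF analysis for one scalar observation.

    members:   list of state vectors, one per ensemble member
    obs_index: index of the observed state component (H selects it)
    r_var:     observation error variance (scalar R)
    """
    n = len(members)
    dim = len(members[0])
    mean = [sum(m[i] for m in members) / n for i in range(dim)]
    hx = [m[obs_index] for m in members]
    hx_mean = mean[obs_index]
    # P^f H^T is a vector and H P^f H^T a scalar in this single-observation case
    pht = [sum((m[i] - mean[i]) * (h - hx_mean) for m, h in zip(members, hx)) / (n - 1)
           for i in range(dim)]
    hpht = pht[obs_index]
    gain = [c / (hpht + r_var) for c in pht]                  # Eq. (15)
    updated = []
    for m, h in zip(members, hx):
        y_k = y_obs + rng.gauss(0.0, r_var ** 0.5)            # Eq. (17)
        updated.append([m[i] + gain[i] * (y_k - h) for i in range(dim)])  # Eq. (16)
    return updated

# Toy ensemble of two perfectly correlated water levels; the first one is observed
prior = [[1.0, 1.5], [2.0, 2.5], [3.0, 3.5], [4.0, 4.5]]
posterior = enkf_update(prior, obs_index=0, y_obs=2.0, r_var=1e-4,
                        rng=random.Random(42))
```

Because the observation error variance here is far smaller than the prior spread, the gain is close to one and every member is pulled almost onto the observation; the unobserved second component follows through its covariance with the first.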

2.5.2. Ensemble Square Root Filtering

In the analysis phase of the Ensemble Kalman Filter (EnKF), deviations (i.e., ensemble perturbations) between each ensemble member and the ensemble mean are employed to determine the analysis values. This method effectively captures the variability among ensemble members and accounts for possibilities beyond the ensemble mean. By incorporating ensemble perturbations into the ensemble mean, a new set of updated ensemble members is produced.

{x'}_{k}^{f} = x_{k}^{f} - {\bar{x}}^{f}

(19)

{x'}_{k}^{a} = x_{k}^{a} - {\bar{x}}^{a}

(20)

where {x'}_{k}^{f} represents the forecast ensemble perturbation of member k, and {x'}_{k}^{a} denotes the updated (analysis) ensemble perturbation.

The ensemble square root filtering applied in this study follows the sequential processing method [34]. This approach notably diminishes computational demands by incorporating each observation separately. By assimilating observations individually, the observation vector is reduced to a one-dimensional scalar. The calculations for P^fH^T and HP^fH^T are conducted as follows:

P^{f} H^{T} = \mathrm{cov} [x^{f}, H x^{f}] = \frac{1}{N - 1} \sum_{k = 1}^{N} [x_{k}^{f} - {\bar{x}}^{f}] [H x_{k}^{f} - \overline{H x^{f}}]

(21)

H P^{f} H^{T} = \mathrm{cov} [H x^{f}, H x^{f}] = \frac{1}{N - 1} \sum_{k = 1}^{N} {[H x_{k}^{f} - \overline{H x^{f}}]}^{2}

(22)

where H denotes the observation operator that transforms the state vector into observation, and Hx^f represents a scalar.

This ensemble square root filtering method separates the analysis step into two parts: updating the ensemble mean and updating the ensemble perturbations. The update of the ensemble mean adheres to the same procedure as standard Kalman filtering.

K = P^{f} H^{T} {(H P^{f} H^{T} + R)}^{- 1}

(23)

{\bar{x}}^{a} = {\bar{x}}^{f} + K [y^{0} - H {\bar{x}}^{f}]

(24)

The equation for updating the ensemble perturbations can be expressed as follows:

{x'}_{k}^{a} = {x'}_{k}^{f} - \tilde{K} H' x_{k}^{f}

(25)

H' x_{k}^{f} = H x_{k}^{f} - \overline{H x^{f}}

(26)

By utilizing sequential processing, the initial complex computation of the matrix square root for \tilde{K} can be simplified as follows:

\tilde{K} = γ K, \quad γ = {(1 + \sqrt{\frac{R}{H P^{f} H^{T} + R}})}^{- 1}

(27)
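The serial square root update can be sketched for one scalar observation as follows. The sketch assumes that H selects a single state component and follows the usual serial square root convention in which the reduced-gain term is subtracted from each perturbation, so that the ensemble spread contracts after assimilation. The function name `ensrf_update` and the toy values are hypothetical.

```python
def ensrf_update(members, obs_index, y_obs, r_var):
    """Serial ensemble square root update for one scalar observation."""
    n = len(members)
    dim = len(members[0])
    mean = [sum(m[i] for m in members) / n for i in range(dim)]
    pert = [[m[i] - mean[i] for i in range(dim)] for m in members]   # Eq. (19)
    h_pert = [p[obs_index] for p in pert]                            # Eq. (26)
    pht = [sum(p[i] * h for p, h in zip(pert, h_pert)) / (n - 1)
           for i in range(dim)]                                      # Eq. (21)
    hpht = sum(h * h for h in h_pert) / (n - 1)                      # Eq. (22)
    gain = [c / (hpht + r_var) for c in pht]                         # Eq. (23)
    gamma = 1.0 / (1.0 + (r_var / (hpht + r_var)) ** 0.5)            # Eq. (27)
    mean_a = [mean[i] + gain[i] * (y_obs - mean[obs_index])
              for i in range(dim)]                                   # Eq. (24)
    # Perturbation update with the reduced gain, then recombine with the new mean
    return [[mean_a[i] + p[i] - gamma * gain[i] * h for i in range(dim)]
            for p, h in zip(pert, h_pert)]

prior = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
analysis = ensrf_update(prior, obs_index=0, y_obs=4.0, r_var=1.0)
obs_values = [a[0] for a in analysis]
analysis_mean = sum(obs_values) / len(obs_values)
analysis_var = sum((v - analysis_mean) ** 2 for v in obs_values) / (len(obs_values) - 1)
```

In this example the prior variance of the observed component is 1.0 and R is 1.0, so the Kalman gain is 0.5; the analysis variance is then 1.0 × 1.0 / (1.0 + 1.0) = 0.5, which the factor γ reproduces without perturbing the observation.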

2.5.3. Covariance Localization

When the EnKF is applied to river flood forecasting models, water level observations at any cross-section update all cross-section water levels using the error covariance matrix. According to the principle of water level continuity, the correlation decreases with the distance from the observed water level. However, with a small number of ensemble samples, errors in estimating the correlation of cross-section water levels may occur, causing abnormal updates for distant cross-sections.

To address this issue, Hamill et al. [35] proposed a distance filter, known as covariance localization, which allows the error covariance matrix to decrease with distance. At the observation location, the maximum value is set to 1.0, and it gradually decreases with distance, eventually reaching 0 beyond a certain range.

P^{f} H^{T} = S \circ \frac{1}{N - 1} \sum_{k = 1}^{N} [x_{k}^{f} - {\bar{x}}^{f}] [H x_{k}^{f} - \overline{H x^{f}}]

(28)

where the operator ∘ denotes the Schur product, i.e., element-wise multiplication between the correlation matrix S and the corresponding elements of the ensemble-estimated covariance, yielding a new matrix P^fH^T. The correlation matrix S is constructed using the fifth-order function Ω [36].
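A sketch of such a taper is given below. It assumes that the fifth-order function is the compactly supported piecewise polynomial of Gaspari and Cohn, which is the usual choice for covariance localization; the text identifies it only by citation, so this is an assumption. The half-width parameter, the function names `gaspari_cohn` and `localize`, and the toy positions are hypothetical.

```python
def gaspari_cohn(distance, half_width):
    """Fifth-order compactly supported correlation: 1 at zero distance, 0 beyond 2 * half_width."""
    r = abs(distance) / half_width
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r < 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
                - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0

def localize(pht, positions, obs_position, half_width):
    """Schur product of Eq. (28) for one observation: taper each entry of P^f H^T."""
    return [c * gaspari_cohn(x - obs_position, half_width)
            for c, x in zip(pht, positions)]

# Toy example: cross-sections at 0, 5, 10, and 30 km; observation at 0 km
tapered = localize([1.0, 1.0, 1.0, 1.0], [0.0, 5.0, 10.0, 30.0],
                   obs_position=0.0, half_width=10.0)
```

Entries farther than twice the half-width from the observation are set exactly to zero, so a noisy sample correlation cannot move distant cross-sections.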

2.6. Comparison Criteria

Four metrics were used to evaluate the discrepancies between the model simulations and observational data: mean absolute error (MAE), root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE) [37], and the coefficient of determination (R²) [38]. These metrics are defined by the following equations:

\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} |y_{\mathrm{obs}, i} - y_{\mathrm{sim}, i}|

(29)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{N} {(y_{\mathrm{obs}, i} - y_{\mathrm{sim}, i})}^{2}}{N}}

(30)

\mathrm{NSE} = 1 - \frac{\sum_{i = 1}^{N} {(y_{\mathrm{obs}, i} - y_{\mathrm{sim}, i})}^{2}}{\sum_{i = 1}^{N} {(y_{\mathrm{obs}, i} - {\bar{y}}_{\mathrm{obs}})}^{2}}

(31)

R^{2} = {(\frac{\sum_{i = 1}^{N} (y_{\mathrm{obs}, i} - {\bar{y}}_{\mathrm{obs}}) (y_{\mathrm{sim}, i} - {\bar{y}}_{\mathrm{sim}})}{\sqrt{\sum_{i = 1}^{N} {(y_{\mathrm{obs}, i} - {\bar{y}}_{\mathrm{obs}})}^{2}} \sqrt{\sum_{i = 1}^{N} {(y_{\mathrm{sim}, i} - {\bar{y}}_{\mathrm{sim}})}^{2}}})}^{2}

(32)

where N represents the total number of data points, y_{\mathrm{obs}, i} denotes the observational data, y_{\mathrm{sim}, i} indicates the simulation value, {\bar{y}}_{\mathrm{obs}} indicates the mean of the observational data, and {\bar{y}}_{\mathrm{sim}} indicates the mean of the simulation values.

The MAE and RMSE provide insights into the degree of dispersion between the predicted and observed values. Smaller MAE and RMSE values indicate lower dispersion, suggesting that the predictions are closer to the observed values and that the forecasting performance is more accurate. An NSE value closer to 1 indicates a higher degree of agreement between predicted and observed values, reflecting greater forecasting accuracy and improved model fit. The R² value ranges from 0 to 1, with values closer to 1 indicating higher predictive accuracy.
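The four metrics of Equations (29) to (32) can be computed as in the following sketch. The function names and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt

def mae(obs, sim):
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)

def rmse(obs, sim):
    return sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    mean_obs = sum(obs) / len(obs)
    err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - err / var

def r_squared(obs, sim):
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)

# Simulated levels that are uniformly 0.5 m too high
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.5, 3.5, 4.5]
scores = (mae(obs, sim), rmse(obs, sim), nse(obs, sim), r_squared(obs, sim))
```

For this constant 0.5 m bias the MAE and RMSE are both 0.5 m and the NSE drops to 0.8, while R² remains 1.0. Because R² measures only linear association, it is best read together with the error-based metrics.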

3. Results

3.1. Determination of Hyperparameter in MART

The tuning of hyperparameters in machine learning directly impacts model results. The goal of hyperparameter tuning is to identify the best combination of hyperparameters within the hyperparameter space to optimize model performance. Default hyperparameter values are set as follows: learning rate at 0.1, number of trees at 200, Huber alpha at 0.75, and sampling rate at 1 [39]. Typhoon data is divided into a training set (seven typhoon events), validation set (one typhoon event), and test set (two typhoon events) as shown in Table 1.

Using the grid search method, all possible combinations of hyperparameters within the hyperparameter space are evaluated to identify the optimal combination. Given that the grid search method outputs only one optimal combination each time, multiple iterations of hyperparameter tuning are conducted to obtain an average of the optimal hyperparameters. The number of hyperparameter tuning iterations is set to 5, 8, 10, 15, and 20 times. The best parameters from each iteration are stored, and the average of these parameters is used as the final setting. These optimal parameters are then used to build the model with the training set, and the test set is used to evaluate the forecast results.

The optimal average parameter values across different hyperparameter tuning iterations were identified using the validation set. Results indicated that the RMSE, MAE, and NSE achieved after 10 iterations of hyperparameter tuning were superior compared to other iteration counts. Consequently, the MART hyperparameters used in this study are set to a learning rate of 0.09, 95 trees, Huber alpha of 0.76, and a sampling rate of 0.46.
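For readers who wish to reproduce a comparable configuration, the sketch below passes the tuned values reported above to scikit-learn's `GradientBoostingRegressor`. It is an illustration under stated assumptions: the synthetic data stand in for the station records, `random_state` is added only for repeatability, and, because this regressor predicts a single target, one model per station and lead time is assumed here rather than taken from the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data (hypothetical): 300 samples, 5 input features
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

model = GradientBoostingRegressor(
    loss="huber",        # Huber loss, Eq. (7)
    learning_rate=0.09,  # tuned learning rate
    n_estimators=95,     # tuned number of trees
    subsample=0.46,      # tuned sampling rate (fraction of samples per tree)
    alpha=0.76,          # tuned Huber alpha (a quantile in scikit-learn)
    random_state=0,
)
model.fit(X[:200], y[:200])
pred = model.predict(X[200:])
```

A subsample value below 1 means each tree is fitted on a random fraction of the training samples (stochastic gradient boosting), which typically reduces variance at the cost of a small increase in bias.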

Figure 3 illustrates the RMSE, MAE, and NSE values for 1 h lead time water level forecasts at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge during Typhoons Dujuan and Megi, using both default and tuned hyperparameters. For Typhoon Dujuan, the RMSE values of the 1 h forecast with default parameters at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge are 0.17 m, 0.25 m, and 0.28 m, respectively. The corresponding MAE values are 0.12 m, 0.16 m, and 0.13 m, with NSE values of 0.96, 0.95, and 0.93. When applying parameters obtained through the grid search method, the RMSE values improve to 0.14 m, 0.19 m, and 0.27 m, respectively, while the MAE values remain at 0.12 m, 0.16 m, and 0.13 m, with NSE values of 0.97, 0.97, and 0.94.

For Typhoon Megi, the RMSE values with default parameters at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge are 0.27 m, 0.29 m, and 0.26 m, respectively. The corresponding MAE values are 0.21 m at all three stations, with NSE values of 0.81, 0.86, and 0.83. Using grid search parameters, the RMSE values are 0.27 m (unchanged), 0.22 m, and 0.25 m, respectively, with MAE values of 0.23 m, 0.16 m, and 0.20 m, and NSE values of 0.81, 0.92, and 0.85.

Overall, the results demonstrate that hyperparameters tuned through the grid search method generally yield better RMSE values for 1 h water level forecasts at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge for both typhoons, compared to default parameters. However, for Typhoon Megi, the MAE value at Tu-Ti-Kung-Pi is slightly higher with grid search parameters, likely due to the model’s slight overestimation of observed water levels at this location during high water events.
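To make the size of these improvements easier to compare, the relative RMSE reduction can be worked out from the values reported above. The short sketch below uses only the RMSE figures quoted in this section; the helper name `percent_reduction` is illustrative.

```python
stations = ["Tu-Ti-Kung-Pi", "Taipei Bridge", "Bailing Bridge"]

# RMSE values (m) for the 1 h lead time forecasts, as reported in this section.
rmse_default = {"Dujuan": [0.17, 0.25, 0.28], "Megi": [0.27, 0.29, 0.26]}
rmse_tuned = {"Dujuan": [0.14, 0.19, 0.27], "Megi": [0.27, 0.22, 0.25]}

def percent_reduction(before, after):
    """Relative reduction of an error measure, in percent."""
    return 100.0 * (before - after) / before

reduction = {
    typhoon: {
        station: percent_reduction(before, after)
        for station, before, after in zip(
            stations, rmse_default[typhoon], rmse_tuned[typhoon]
        )
    }
    for typhoon in rmse_default
}

for typhoon, by_station in reduction.items():
    for station, pct in by_station.items():
        print(f"{typhoon:7s} {station:15s} {pct:5.1f} %")
```

The largest relative reductions, about 24%, occur at Taipei Bridge for both typhoons, whereas the RMSE at Tu-Ti-Kung-Pi for Typhoon Megi is unchanged.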

3.2. Water Level Forecasting

This study utilizes the Ensemble Kalman Filter (EnKF) in conjunction with a flood routing model for river flood forecasting. The boundary conditions are defined as follows: for the upstream boundaries at Hsinhai Bridge, Zhongzheng Bridge, Dazhi Bridge, and Erchong Floodway, observed water levels from the Water Information Integration Platform of the Taiwan Water Resources Agency are used. The downstream boundary at the river mouth utilizes water levels calculated from a tidal level equation established in this study. For the internal boundary, the MART forecasts water levels for 1 h to 3 h lead time at Bailing Bridge (Keelung River) and at Tu-Ti-Kung-Pi and Taipei Bridge (main stream of Danshui River). These boundary conditions generate an ensemble through random sampling, after which the unsteady flow model forecasts water levels. The Kalman filter then corrects and updates these forecasted levels, which serve as initial conditions for the next time step. Before each new flood routing calculation, a fresh ensemble of boundary conditions is randomly generated and applied for forecasting over the next hour. This iterative process of flood routing, correction, and updating via the Kalman filter is repeated to obtain the probabilistic distribution of forecasted water levels for a 1 h lead time, and similarly for 2 h and 3 h lead times.
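The forecast and update cycle described above can be illustrated with a deliberately simplified scalar sketch. It assumes a single, directly observed water level and uses a square-root update in the style of Whitaker and Hamill [34]; `toy_routing` is a hypothetical stand-in for the 1D unsteady flow model, and all numerical values (ensemble size, standard deviations, water levels) are invented for illustration. The state vector, observation operator, and covariance localization of the actual model are not reproduced.

```python
import math
import random

def perturb_boundary(value, std, n_members, rng):
    """Generate an ensemble of boundary water levels by random sampling."""
    return [value + rng.gauss(0.0, std) for _ in range(n_members)]

def ensrf_update(ensemble, observation, obs_var):
    """Scalar ensemble square-root update for a directly observed state."""
    n = len(ensemble)
    mean = sum(ensemble) / n
    perturbations = [x - mean for x in ensemble]
    forecast_var = sum(d * d for d in perturbations) / (n - 1)
    gain = forecast_var / (forecast_var + obs_var)        # Kalman gain
    analysis_mean = mean + gain * (observation - mean)    # mean update
    # Perturbations are shrunk so that the analysis variance equals
    # (1 - gain) * forecast_var without perturbing the observation.
    shrink = math.sqrt(obs_var / (forecast_var + obs_var))
    return [analysis_mean + shrink * d for d in perturbations]

def toy_routing(upstream_level, state):
    # Hypothetical stand-in for the 1D unsteady flow model (not the study's model).
    return 0.7 * state + 0.3 * upstream_level

rng = random.Random(0)
state = [1.0] * 20                     # initial ensemble of water levels (m)
observations = [1.2, 1.5, 1.9]         # hourly observations at the station (m)
upstream = [1.4, 1.8, 2.2]             # hourly upstream boundary levels (m)

for obs, up in zip(observations, upstream):
    boundary = perturb_boundary(up, 0.1, len(state), rng)   # fresh ensemble
    forecast = [toy_routing(b, s) for b, s in zip(boundary, state)]
    state = ensrf_update(forecast, obs, obs_var=0.05 ** 2)  # initial condition for next step
    print(round(sum(state) / len(state), 3))
```

Because the update is deterministic, the only randomness enters through the sampled boundary conditions, which mirrors the way the ensemble is regenerated before each routing step.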

3.2.1. Model Calibration

The calibration of the model in this study involved evaluating the forecasting performance for water levels at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge during Typhoon Dujuan in 2015 and Typhoon Nepartak in 2016. Figure 4 illustrates the RMSE values of the forecasts at these locations from present time to 3 h lead time for both typhoons, demonstrating a decline in forecast accuracy with increasing lead times. Figure 5 compares the observed and model-forecasted water levels at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge at present time for both typhoons. It indicates that the model, employing internal boundary adjustments and Kalman filter corrections, can provide relatively accurate forecasts at present time, though forecast errors increase from 1 h to 3 h lead times due to various uncertainties. Figures 6–8 show the R² values calculated at the three stations for Typhoon Dujuan and Typhoon Nepartak. R² decreases with longer lead times, yet all values remain above 0.8. Although the 3 h lead time forecasts at Bailing Bridge and Tu-Ti-Kung-Pi are less accurate, the results therefore still fall within an acceptable range.
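Evaluating accuracy by lead time requires pairing each forecast with the observation at its valid time rather than at its issue time. The sketch below shows one way to do this; the data layout (`forecasts_by_lead[k][i]` is the forecast issued at hour i for hour i + k) is an assumption, and the persistence forecast used as input is a toy example, not the model of this study.

```python
import math

def rmse_by_lead(observed, forecasts_by_lead):
    """RMSE per lead time; forecasts_by_lead[k][i] is issued at i, valid at i + k."""
    result = {}
    for lead, issued in forecasts_by_lead.items():
        pairs = [
            (observed[i + lead], forecast)
            for i, forecast in enumerate(issued)
            if i + lead < len(observed)   # drop forecasts without a matching observation
        ]
        result[lead] = math.sqrt(sum((o - f) ** 2 for o, f in pairs) / len(pairs))
    return result

# Illustrative hourly water levels (m) on a rising and then falling limb.
observed = [1.0, 1.2, 1.5, 1.9, 2.1, 2.0]

# Toy persistence forecast: the level at issue time is carried forward unchanged.
forecasts_by_lead = {lead: list(observed) for lead in range(4)}

errors = rmse_by_lead(observed, forecasts_by_lead)
for lead, value in errors.items():
    print(f"lead {lead} h: RMSE = {value:.3f} m")
```

Even this naive forecast reproduces the qualitative pattern reported here: the error is zero at present time and grows as the lead time increases.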

3.2.2. Model Validation

Further validation of the model was conducted using observed data from Typhoon Megi in 2016 and Typhoon Hinnamnor in 2022 to assess its forecasting performance. Figure 9 displays the RMSE values for forecasts at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge from present time to 3 h lead time for both typhoons, indicating good forecast accuracy at various lead times. Figure 10 compares the observed water levels with the model-forecasted water levels at the three stations at present time for Typhoon Megi and Typhoon Hinnamnor, showing that the simulated water levels are generally close to the observed levels. Figures 11–13 illustrate the R² values obtained at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge for both typhoons, spanning from the present time to a 3 h lead time. The results indicate a decrease in R² values as the forecast lead time increases. Similarly, Wang and Babovic [17] applied a hybrid Kalman filter to enhance water level forecasting and reported that RMSE values increased with longer lead times.

The forecasted water levels for Typhoon Hinnamnor are more accurate than those for Typhoon Megi. For Typhoon Megi, uncertainties in boundary conditions likely caused a significant decrease in R² values at 2 h and 3 h lead time.

Additionally, for Typhoon Hinnamnor, the water level profiles along the Danshui River to Dahan River and along the Keelung River were plotted at present time (2:00 AM on 4 September 2022), and for the forecasted flood peak times over the next 1 to 3 h, as shown in Figure 14. It demonstrates that the internal boundary adjustments and Kalman filter corrections can align the forecasted water levels closely with the observations. Nonetheless, as the forecast lead time extends, the correction effect weakens, leading to a gradual divergence between the forecasted and observed water levels. Similarly, Fu et al. [21] employed multiple additive regression trees for water level forecasting and observed a decline in forecasting accuracy with increasing lead time.

This study applies the MART model, recognized for its rapid training capabilities and robust performance, to predict flood water levels with lead times of 1 to 3 h. The model integrates rainfall data spanning from t − 3 to t + 6 and water level data from t − 3 to t. The findings demonstrate that MART effectively captures peak water levels and flood trends at a 1 h lead time, even under conditions of abrupt, extreme changes or outlier observations. However, forecast errors increase with lead times of 2 and 3 h, due to factors such as unaccounted errors from the preceding forecast step, uncertainties in rainfall forecasts, and the omission of lateral inflows along riverbanks in the modeling process.
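The input window described above (rainfall from t − 3 to t + 6 and water level from t − 3 to t) can be assembled as in the following sketch. It assumes hourly series stored as Python lists and that the rainfall values after time t come from a rainfall forecast; the function names and sample numbers are hypothetical.

```python
def build_features(rainfall, water_level, t, past=3, rain_ahead=6):
    """Feature vector at index t: rainfall t-3..t+6 followed by water level t-3..t."""
    if t - past < 0 or t + rain_ahead >= len(rainfall) or t >= len(water_level):
        raise IndexError("time index too close to the series boundary")
    rain_part = rainfall[t - past : t + rain_ahead + 1]   # 10 rainfall values
    level_part = water_level[t - past : t + 1]            # 4 water level values
    return rain_part + level_part

def build_dataset(rainfall, water_level, lead, past=3, rain_ahead=6):
    """Pair each feature vector with the water level `lead` steps ahead."""
    features, targets = [], []
    last_t = min(len(rainfall) - 1 - rain_ahead, len(water_level) - 1 - lead)
    for t in range(past, last_t + 1):
        features.append(build_features(rainfall, water_level, t, past, rain_ahead))
        targets.append(water_level[t + lead])
    return features, targets

# Illustrative hourly series (not data from the study).
rain = [float(i) for i in range(12)]
level = [10.0 + 0.1 * i for i in range(12)]

X, y = build_dataset(rain, level, lead=1)
print(len(X), len(X[0]), y[0])
```

With these window lengths each sample has 14 inputs, and separate target vectors can be built for lead times of 1, 2, and 3 h by changing `lead`.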

4. Discussion

This study presents a water level forecast by integrating the flood forecasting model with MART and EnKF (module 1). Module 1 is compared with two other methods: module 2, which uses an ensemble unsteady flow model with internal boundary correction combined with MART, and module 3, which utilizes an ensemble unsteady flow model without internal boundary correction. The effects of these different correction methods on water level forecast results are analyzed. The evaluation, using a 95% confidence interval, is based on data from Typhoon Muifa in 2022, providing insights into the performance of the three forecasting methods.

4.1. Water Level Forecast with Module 1

Figure 15 illustrates the water level hydrographs and 95% confidence intervals for Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge, forecasted from 1 to 3 h lead times, using the unsteady flow model with internal boundary correction combined with MART and EnKF. The forecasted hydrographs at these stations generally match the observed water level trends at a 1 h lead time, but accuracy decreases with longer forecast times. At Taipei Bridge, deviations from observed values occur between the 1st and 6th hours and the 13th and 18th hours, due to the influence of internal boundary forecast values. However, accurate forecasts of internal boundary water levels allow the 95% confidence interval to reliably capture actual values, indicating that boundary condition uncertainty is closely related to forecast accuracy.
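A 95% confidence interval of this kind can be derived from the ensemble members at each time step, and its reliability can be checked by counting how often the observation falls inside it. The sketch below assumes the interval is taken as the 2.5th to 97.5th percentile range of the ensemble, which is one common choice and may differ from the procedure used in this study; all names and numbers are illustrative.

```python
def percentile(sorted_values, q):
    """Percentile by linear interpolation; q is given in percent (0 to 100)."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac

def interval_95(ensemble):
    """Lower and upper bounds of the central 95% range of the ensemble."""
    values = sorted(ensemble)
    return percentile(values, 2.5), percentile(values, 97.5)

def coverage(observed, ensembles):
    """Fraction of observations that fall inside the 95% interval."""
    hits = 0
    for obs, ensemble in zip(observed, ensembles):
        low, high = interval_95(ensemble)
        hits += low <= obs <= high
    return hits / len(observed)

# Illustrative ensemble of 101 forecasted water levels (not data from the study).
ensemble = [float(i) for i in range(101)]
low, high = interval_95(ensemble)
print(low, high)
print(coverage([50.0, 99.0], [ensemble, ensemble]))
```

A coverage well below 0.95 at a station would suggest that the sampled boundary uncertainty is too narrow or that the boundary forecasts are biased.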

4.2. Water Level Forecast with Module 2

Figure 16 shows the water level hydrographs and 95% confidence intervals for Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge, forecasted from 1 to 3 h lead times, using the ensemble unsteady flow model with internal boundary correction and MART. The forecasted hydrographs generally match observed trends at a 1 h lead time, though some deviations occur as the forecast time increases. At Taipei Bridge, the 1 to 3 h forecast results are influenced by forecasted internal boundary water levels, leading to deviations from observed values between the 1st and 6th hours and the 13th and 18th hours. Within the 95% confidence interval, accurate forecasted internal boundary water levels result in the effective prediction of actual levels. As forecast time increases, the confidence interval widens, indicating increasing uncertainty.

4.3. Water Level Forecast with Module 3

Figure 17 displays the water level hydrographs and 95% confidence intervals for Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge, forecasted from 1 to 3 h lead times, using the ensemble unsteady flow model without internal boundary correction. The forecasted hydrographs are similar to observed levels from 1 to 3 h lead times, but accuracy decreases as forecast time increases.

Figure 18 depicts the RMSE values of forecasted water levels for the three modules at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge from present time to 3 h lead time. The RMSE at Tu-Ti-Kung-Pi indicates relatively ideal forecast results: module 1 produces superior results compared to module 2, which in turn outperforms module 3. This is because Tu-Ti-Kung-Pi is more strongly affected by tides and less by the internal boundaries, making its RMSE more reasonable than those at Taipei Bridge and Bailing Bridge. Forecasts at Taipei Bridge and Bailing Bridge are influenced by forecasted internal boundary water levels, so module 3, which does not use internal boundary forecasts, yields better RMSE values than modules 1 and 2 from 1 to 3 h lead times.

Analysis of Variance (ANOVA) [40] is a widely used statistical method for identifying sources of variation in data and assessing whether the mean values of different groups differ significantly. In this study, ANOVA was employed to evaluate the variance between observed water levels and predicted water levels generated by various ensemble models at multiple stations for lead times of 1–3 h. The results indicate that all significance p-values, ranging from 0.148 to 0.997, exceed the threshold of 0.05. This finding suggests no statistically significant differences between the predicted water levels from the ensemble models and the observed water levels, indicating that the ensemble models provide appropriate predictions of water level distributions across the groups analyzed.
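For readers unfamiliar with the test, the F statistic of a one-way ANOVA between an observed and a predicted series can be computed as in the sketch below. The grouping into one observed and one predicted series is an assumption about how the test was set up, and the sample values are illustrative. Converting F into a p-value requires the F distribution, which is available, for example, through `scipy.stats.f_oneway`; that step is omitted here to keep the example dependency-free.

```python
def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the samples around their own group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means)
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Illustrative water levels in metres (not data from the study).
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]

f_stat, df1, df2 = one_way_anova_f([observed, predicted])
print(f_stat, df1, df2)
```

A small F value relative to the critical value for the given degrees of freedom corresponds to a large p-value, that is, no detectable difference between the group means.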

Previous research by Chung [41] used neural networks to forecast water levels at internal hydrological stations within river segments, correcting initial values in dynamic wave calculations. This method improved forecast accuracy for the first and second hours, with limited improvement for the third hour. Tsao [26] and Wu et al. [15] enhanced flood forecast accuracy by incorporating the error statistics features of the Kalman filter method, effectively reducing error propagation over time. Yu [27] demonstrated that a 95% confidence interval from probabilistic forecasts could effectively predict flood levels. The results from Chung [41] and Tsao [26] align with this study’s findings, showing that corrections using internal boundaries and the Kalman filter significantly improve forecast accuracy, though accuracy degrades over time. Consistent with Yu [27], this study found that accurate boundary condition forecasts allow the 95% confidence interval to reliably predict flood levels. However, improvements are needed in internal boundary water level forecasts. Comparisons between internal boundary forecasts and forecasted water level hydrographs indicate that boundary condition uncertainty is closely related to forecast accuracy.

4.4. Advantages, Limitations, and Future Work

To enhance stand-alone model performance, this study highlights the potential of machine learning and ensemble Kalman filtering in forecasting river water levels. The combination of machine learning and ensemble models typically yields more accurate results. This approach effectively captures the overall trends in river water level changes during typhoon events, providing vital information for issuing flood warnings, coordinating evacuations, and implementing disaster prevention and response measures. However, a limitation of the current model lies in its lack of integration of additional monitoring and forecasting data.

Future work should aim to expand the model by incorporating more observational data, such as additional water level measurements, upstream reservoir release discharges, lateral inflows from riverbanks, and pumping volumes. Forecasting information, such as predicted tidal levels at the river mouth, should also be included. Furthermore, the boundary scope of the water level forecasting model should be extended to encompass a broader region, enabling forecasts with lead times exceeding 12 h. These advancements would enhance the ability to conduct disaster prevention and emergency response operations with increased preparation time and improved readiness.

5. Conclusions

This study developed a flash flood forecasting model integrating multiple additive regression trees (MART) and the ensemble Kalman filter (EnKF) to predict water levels in the Danshui River system during typhoon periods. The model was calibrated and validated using observational data from four typhoons, and the impact of different correction methods on water level forecasting was evaluated. The findings of this study are summarized as follows.

(1): Hyperparameters in MART were tuned using the grid search method, and the performance of adjusted hyperparameters was compared against default values using RMSE and MAE metrics. Results demonstrated that water level forecasts with adjusted parameters outperformed those with default parameters.
(2): Model calibration and validation indicated that simulated water levels at present time generally aligned with observed levels. However, as the forecast lead time extended, the accuracy of the corrections diminished, leading to increasing divergence between forecasted and observed levels.
(3): A 95% confidence interval generated through probabilistic forecasting was utilized to explore the potential range of forecasted water levels. Findings indicated that with smaller uncertainty in boundary conditions, the confidence interval was more accurate and often encompassed the actual water level. However, as the forecast lead time increased, uncertainties in boundary conditions and other factors grew, resulting in an expanded confidence interval.
(4): Comparison of RMSE values for water level forecasts from present time to 3 h lead time at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge under modules 1, 2, and 3 revealed that the RMSE value at Tu-Ti-Kung-Pi was consistently more reasonable than those at Taipei Bridge and Bailing Bridge. This is attributed to the significant tidal influence and reduced impact from internal boundaries at Tu-Ti-Kung-Pi. After accounting for uncertainties in upstream and downstream boundaries, the primary source of uncertainty affecting Taipei Bridge and Bailing Bridge was the forecasted water level at the internal boundary.

Author Contributions

Conceptualization, J.-C.F., W.-C.L. and W.-C.H.; methodology, J.-C.F., M.-P.S. and W.-C.H.; software, M.-P.S.; validation, J.-C.F., W.-C.L. and W.-C.H.; formal analysis, M.-P.S.; investigation, J.-C.F. and W.-C.L.; resources, W.-C.L.; data curation, M.-P.S. and W.-C.H.; writing, W.-C.L. and M.-P.S.; original draft preparation, M.-P.S.; writing—review and editing, W.-C.L. and H.-M.L.; visualization, M.-P.S. and H.-M.L.; supervision, J.-C.F. and W.-C.L.; project administration, W.-C.L. and W.-C.H.; funding acquisition, W.-C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council, Taiwan, under grant number 112-2625-M-239-001.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors express their sincerest appreciation for funding support.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

1. Hsu, M.H.; Fu, J.C.; Liu, W.C. Flood routing with real-time stage correction method for flash flood forecasting in the Tanshui River, Taiwan. J. Hydrol. 2003, 283, 267–280.
2. Hsu, M.H.; Lin, S.H.; Fu, J.C.; Chen, A.S. Longitudinal stage profiles forecasting in rivers for flash floods. J. Hydrol. 2010, 388, 426–437.
3. Monro, J.C.; Anderson, E.A. National weather service river forecasting system. J. Hydraul. Div. 1974, HY5, 621–630.
4. Corradini, C.; Melone, F.; Ubertin, L. A semi-distributed adaptive model for real-time flood forecasting. Water Resour. Bull. 1986, 22, 1031–1038.
5. Solomon, S.I.; Bosso, E.; Dsorio, C.; Melo de Moraes, H.; Serrano, A. Flood forecasting for Tucurui Hydroelectrical plant, Brazil. Water Resour. Bull. 1986, 22, 209–217.
6. Goppert, H.; Ihringer, J.; Plate, E.J. Flood forecast model for improved reservoir management in the Lenne River catchment, Germany. Hydrol. Sci. 1988, 43, 215–241.
7. Förster, S.; Kneis, D.; Gocht, M.; Bronstert, A. Flood risk reduction by the use of retention areas at the Elbe River. Int. J. River Basin Manag. 2005, 3, 21–29.
8. Hsu, M.H.; Fu, J.C.; Liu, W.C. Dynamic routing model with real-time roughness updating for flood forecasting. J. Hydraul. Eng. 2006, 132, 605–619.
9. Kimura, N.; Hsu, M.H.; Tsai, M.Y.; Tsao, M.C.; Yu, S.L.; Tai, A. A river flash flood forecasting coupled with ensemble Kalman filter. J. Flood Risk Manag. 2016, 9, 178–192.
10. Patel, S.B.; Mehta, D.J.; Yadav, S.M. One dimensional hydrodynamic flood modeling for Ambica River, South Gujarat. J. Emerg. Technol. Innov. Res. 2018, 5, 595–601.
11. Belyakova, P.A.; Moreido, V.M.; Tsyplenkov, A.S.; Amerbaev, A.N.; Grechishnikova, D.A.; Kurochkina, L.S.; Filippov, V.A.; Makeev, M.S. Forecasting water levels in Krasnodar Krai Rivers with the use of machine learning. Water Resour. 2022, 49, 10–22.
12. Kim, D.; Park, J.; Han, H.; Lee, H.; Kim, H.S.; Kim, S. Application of AI-based models for flood water level forecasting and flood risk classification. KSCE J. Civ. Eng. 2023, 27, 3163–3174.
13. Li, S.; Yang, J. Improved river water-stage forecasts by ensemble learning. Eng. Comput. 2023, 39, 3293–3311.
14. Mihel, A.M.; Lerga, J.; Krvavica, N. Estimating water levels and discharges in tidal rivers and estuaries: Review of machine learning approaches. Environ. Model. Softw. 2024, 176, 106033.
15. Wu, X.L.; Xiang, X.H.; Wang, C.H.; Chen, X.; Xu, C.Y.; Yu, Z. Coupled hydraulic and Kalman filter model for real-time correction of flood forecast in the Three Gorges interzone of Yangtze River, China. J. Hydrol. Eng. 2013, 18, 1416–1425.
16. Chen, J.C.; Chang, C.H.; Wu, S.J.; Hsu, C.T.; Lien, H.C. Real-time correction of water stage forecast using combination of forecasted errors by time series models and Kalman filter method. Stoch. Environ. Res. Risk Assess. 2015, 29, 1903–1920.
17. Wang, X.; Babovic, V. Application of hybrid Kalman filter for improving water level forecast. J. Hydroinform. 2016, 18, 773–790.
18. Barthélémy, S.; Ricci, S.; Rochoux, M.C.; Le Pape, E.; Thual, O. Ensemble-based data assimilation for operational flood forecasting-On the merits of state estimation for 1D hydrodynamic forecasting through the example of the “Adour Maritime” river. J. Hydrol. 2017, 552, 210–224.
19. Yu, L.; Tan, S.K.; Chua, L.H. Online ensemble modeling for real time water level forecasts. Water Resour. Manag. 2017, 31, 1105–1119.
20. Lee, H.; Shen, H.; Noh, S.J.; Kim, S.; Seo, D.J.; Zhang, Y. Improving flood forecasting using conditional bias-penalized ensemble Kalman filter. J. Hydrol. 2019, 575, 596–611.
21. Fu, J.C.; Huang, H.Y.; Jang, J.H.; Huang, P.H. River stage forecasting using multiple additive regression trees. Water Resour. Manag. 2019, 33, 4491–4507.
22. Jang, J.H.; Lee, K.F.; Fu, J.C. Improving river-stage forecasting using hybrid models based on the combination of multiple additive regression trees and Runge-Kutta schemes. Water Resour. Manag. 2022, 36, 1123–1140.
23. Liu, W.C.; Hsu, M.H.; Wu, C.R.; Wang, C.F.; Kuo, A.Y. Modeling salt water intrusion in Tanshui River estuarine system-Case-study contrasting now and then. J. Hydraul. Eng. 2004, 130, 849–859.
24. Young, C.C.; Liu, W.C.; Liu, H.M. Uncertainty assessment for three-dimensional and fecal coliform modeling in Danshuei River estuarine system: The influence of first-order parametric decay reaction. Mar. Pollut. Bull. 2023, 193, 115220.
25. Liu, W.C.; Liu, H.M.; Young, C.C.; Huang, W.C. The influence of freshwater discharge and wind forcing on dispersal of river plumes using a three-dimensional circulation model. Water 2022, 14, 429.
26. Tsao, M.C. A River Flood Forecast Model with Data Assimilation Based on Ensemble Kalman Filter. Master’s Thesis, Department of Bioenvironmental Systems Engineering, National Taiwan University, Taiwan, 2011.
27. Yu, S.L. River Flood Ensemble Forecast Model. Master’s Thesis, Department of Bioenvironmental Systems Engineering, National Taiwan University, Taiwan, 2012.
28. Amein, M.; Fang, C.S. Implicit flood routing in natural channel. J. Hydraul. Div. ASCE 1970, 96, 2481–2500.
29. Huber, P.J. Robust estimation of a location parameter. Ann. Math. Stat. 1964, 53, 73–101.
30. Friedman, J.H.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting (with discussion). Ann. Stat. 2000, 28, 337–407.
31. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
32. Friedman, J.H.; Meulman, J.J. Multiple additive regression trees with application in epidemiology. Stat. Med. 2003, 22, 1365–1381.
33. Burgers, G.; van Leeuwen, P.J.; Evensen, G. Analysis scheme in the ensemble Kalman filter. Mon. Weather Rev. 1998, 126, 1719–1724.
34. Whitaker, J.S.; Hamill, T.M. Ensemble data assimilation without perturbed observations. Mon. Weather Rev. 2002, 130, 1913–1924.
35. Hamill, T.M.; Whitaker, J.S.; Snyder, C. Distance-dependent filtering of background error covariance in an ensemble Kalman filter. Mon. Weather Rev. 2001, 129, 2776–2790.
36. Gaspari, G.; Cohn, S.E. Construction of correlation functions in two and three dimensions. Q. J. R. Meteorol. Soc. 1999, 125, 723–757.
37. Nash, J.E.; Sutcliffe, J.V. River Flow Forecasting through Conceptual Model. Part 1-A Discussion of Principles. J. Hydrol. 1970, 10, 282–290.
38. Steel, R.G.D.; Torrie, J.H. Principles and Procedures of Statistics with Special Reference to the Biological Sciences; McGraw Hill: New York, NY, USA, 1960; pp. 187–287.
39. Lee, K.F. Using Numerical Methods to Improve Machine Learning Models in River Stage Forecast. Master’s Thesis, National Cheng Kung University, Taiwan, 2021.
40. Ståhle, L.; Wold, S. Analysis of variance (ANOVA). Chemom. Intell. Lab. Syst. 1989, 6, 259–272.
41. Chung, S.F. Application of Artificial Neural Networks on Flood Routing and Forecasting. Master’s Thesis, Department of Bioenvironmental Systems Engineering, National Taiwan University, Taiwan, 2007.

Figure 2. The flow chart for calculating water levels at the present time and during the lead time.

Figure 3. RMSE, MAE, and NSE values for 1 h lead time water level forecasts at Tu-Ti-Kung-Pi, Taipei Bridge, and Bailing Bridge during Typhoons Dujuan and Megi, using default (a–c) and tuned (d–f) hyperparameters.

Figure 4. RMSE values from present time to a 3 h lead time for Typhoon Dujuan (a–c) and Typhoon Nepartak (d–f) at Tu-Ti-Kung-Pi (a,d), Taipei Bridge (b,e), and Bailing Bridge (c,f).

Figure 5. Comparison of the observed and model-forecasted water levels at present time for Typhoon Dujuan (a–c) and Typhoon Nepartak (d–f) at Tu-Ti-Kung-Pi (a,d), Taipei Bridge (b,e), and Bailing Bridge (c,f).

Figure 6. R² values calculated at Tu-Ti-Kung-Pi for Typhoon Dujuan and Typhoon Nepartak, (a,e) present time, (b,f) 1 h lead time, (c,g) 2 h lead time, and (d,h) 3 h lead time.

Figure 7. R² values calculated at Taipei Bridge for Typhoon Dujuan and Typhoon Nepartak, (a,e) present time, (b,f) 1 h lead time, (c,g) 2 h lead time, and (d,h) 3 h lead time.

Figure 8. R² values calculated at Bailing Bridge for Typhoon Dujuan and Typhoon Nepartak, (a,e) present time, (b,f) 1 h lead time, (c,g) 2 h lead time, and (d,h) 3 h lead time.

Figure 9. RMSE values from present time to a 3 h lead time for Typhoon Megi (a–c) and Typhoon Hinnamnor (d–f) at Tu-Ti-Kung-Pi (a,d), Taipei Bridge (b,e), and Bailing Bridge (c,f).

Figure 10. Comparison of the observed and model-forecasted water levels at present time for Typhoon Megi (a–c) and Typhoon Hinnamnor (d–f) at Tu-Ti-Kung-Pi (a,d), Taipei Bridge (b,e), and Bailing Bridge (c,f).

Figure 11. R² values calculated at Tu-Ti-Kung-Pi for Typhoon Megi and Typhoon Hinnamnor, (a,e) present time, (b,f) 1 h lead time, (c,g) 2 h lead time, and (d,h) 3 h lead time.

Figure 12. R² values calculated at Taipei Bridge for Typhoon Megi and Typhoon Hinnamnor, (a,e) present time, (b,f) 1 h lead time, (c,g) 2 h lead time, and (d,h) 3 h lead time.

Figure 13. R² values calculated at Bailing Bridge for Typhoon Megi and Typhoon Hinnamnor, (a,e) present time, (b,f) 1 h lead time, (c,g) 2 h lead time, and (d,h) 3 h lead time.

Figure 14. Water level profiles along the Danshui River to Dahan River (a–d) and along the Keelung River (e–h) for Typhoon Hinnamnor at (a,e) present time (2:00 AM on 4 September 2022), (b,f) 1 h lead time, (c,g) 2 h lead time, and (d,h) 3 h lead time.

Figure 15. Hydrographs of water levels and 95% confidence intervals from 1 h to 3 h lead times using module 1 at Tu-Ti-Kung-Pi (a–c), Taipei Bridge (d–f), and Bailing Bridge (g–i). Note that the first hour corresponds to 5:00 AM on 12 September 2022.

Figure 16. Hydrographs of water levels and 95% confidence intervals from 1 h to 3 h lead times using module 2 at Tu-Ti-Kung-Pi (a–c), Taipei Bridge (d–f), and Bailing Bridge (g–i). Note that the first hour corresponds to 5:00 AM on 12 September 2022.

Figure 17. Hydrographs of water levels and 95% confidence intervals from 1 h to 3 h lead times using module 3 at Tu-Ti-Kung-Pi (a–c), Taipei Bridge (d–f), and Bailing Bridge (g–i). Note that the first hour corresponds to 5:00 AM on 12 September 2022.

Figure 18. RMSE values of water levels from present time to 3 h lead time using three modules at (a) Tu-Ti-Kung-Pi, (b) Taipei Bridge, and (c) Bailing Bridge.

Table 1. Data sets for determining the hyperparameters in MART.

Category	Typhoon Event	Time Period (Day Month Year)	Duration (hours)
Training set	Typhoon Saola	30 July 2012~3 August 2012	120
	Typhoon Haikui	6 August 2012~7 August 2012	48
	Typhoon Soulik	12 July 2013~13 July 2013	42
	Typhoon Trami	20 August 2013~22 August 2013	72
	Typhoon Usagi	19 September 2013~21 September 2013	51
	Typhoon Fitow	4 October 2013~7 October 2013	96
	Typhoon Chan-hom	9 July 2015~11 July 2015	72
Validation set	Typhoon Soudelor	6 August 2015~9 August 2015	96
Test set	Typhoon Dujuan	27 September 2015~29 September 2015	72
Test set	Typhoon Megi	25 September 2016~28 September 2016	96


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

