1. Introduction
Accurate and reliable runoff forecasting provides data support for reservoir scheduling decisions, enhances the benefits of reservoirs while ensuring flood safety, and ultimately helps optimize water resources allocation, flood control, and disaster reduction [1]. However, with intensifying climate change and human activities, runoff series exhibit more pronounced nonlinearity and nonstationarity, which increases the complexity of runoff forecasting [2]. Therefore, improving the precision of runoff forecasting is a vital challenge for hydrologists [3]. In general, runoff forecasting models are classified by modeling approach into process-driven and data-driven models [4]. Process-driven models analyze the runoff process on the basis of hydrological concepts to construct a physical model, which is then used for runoff forecasting. Building physical models on hydrological concepts makes process-driven models highly interpretable, but it also makes them prone to systematic biases and reliant on high-precision data. Therefore, the accuracy of the underground hydrological characteristic data and meteorological data in the basin is an important factor affecting process-driven model performance [5]. When the underground hydrological characteristics and long-term historical meteorological data cannot be obtained explicitly, process-driven models struggle to reach their full potential [6]. Unlike process-driven models, data-driven models are fitted to the data to find patterns in them, so there is no need to build a physical model. As a result, although data-driven models are less interpretable, they are better able to explore regularities in the data [7]. Li et al. [4] compared the runoff forecasting results of the data-driven Long Short-Term Memory (LSTM) model and the process-driven Gridded Surface Subsurface Hydrologic Analysis (GSSHA) model and showed that the data-driven model exhibited better performance and robustness in forecasting and calibration. Partal and Sezen [8] used a wavelet-based artificial neural network (WANN) for daily runoff forecasting, and their results showed that the data-driven WANN model outperformed the process-driven GR4J model. Owing to their strong generalization ability, data-driven models have been widely used for runoff prediction in recent years. Widely used data-driven models include artificial neural networks, extreme learning machines, genetic programming, and support vector machines (SVMs) [9]. Among these, the SVM was developed by Vapnik [10] and is based on Vapnik–Chervonenkis dimension theory and the principle of structural risk minimization. In theory, the SVM offers global optimization, avoids the curse of dimensionality, and performs well on small samples, and it has therefore been widely used in runoff forecasting [11,12]. However, owing to the nonlinear and nonstationary characteristics of runoff series, a single model struggles to capture their periodicity and regularity, and its predictive ability is usually limited [13]. Typically, two strategies are used to improve SVM monthly runoff forecasts. The first is to use intelligent algorithms to optimize the SVM parameters, and the second is to apply data preprocessing techniques that decompose the runoff series into its underlying subseries so that its periodicity and regularity are captured better.
The intelligent algorithms commonly used for parameter optimization are the genetic algorithm (GA) and particle swarm optimization (PSO). Because the PSO algorithm has few parameters and is easy to implement, it has found broad application in runoff forecasting. Li et al. [14] applied PSO to search for the optimal parameters of a back propagation (BP) neural network and compared the forecasting results of the BP neural network with those of PSO–BP. The comparison indicated that the BP model optimized by PSO provides better prediction performance. Yang et al. [15] coupled PSO with an LSTM model to predict glacier runoff, and the PSO–LSTM model exhibited better forecasting accuracy than the LSTM model alone. Sudheer et al. [16] developed a PSO–SVM model to forecast flow, which improved on the forecasting accuracy of a single SVM model. Therefore, this study adopted PSO to search for the optimal parameters of the SVM model.
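To make the parameter search concrete, the following is a minimal sketch of a basic global-best PSO in plain Python. It is not the implementation used in this study: the function name `pso_minimize`, the inertia and acceleration coefficients, the search bounds, and the toy objective are all assumptions chosen for illustration. In a real PSO–SVM setting the objective would train an SVM with the candidate parameters and return a validation error; here a simple quadratic stands in for that error surface.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for an SVM validation error as a function of
# (log10 C, log10 gamma); a real objective would train and score an SVM.
def toy_validation_error(params):
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = pso_minimize(
    toy_validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)]
)
```

Searching over the logarithms of the parameters is a common design choice because useful SVM parameter values often span several orders of magnitude, but it is an assumption here rather than something stated in the text.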
Commonly applied techniques for data preprocessing are empirical mode decomposition (EMD) [17] and ensemble empirical mode decomposition (EEMD) [18]. EMD is a commonly used nonlinear series decomposition method with strong adaptability, but it is prone to mode mixing and end effects. As a solution to the mode mixing problem of EMD, EEMD was presented by Wu and Huang [19]. EEMD is an improved EMD method that suppresses mode mixing by introducing zero-mean, well-characterized white noise. However, EEMD still has some problems: offsetting the auxiliary white noise added during decomposition requires a larger number of ensemble averages, and the mode components cannot be controlled, which results in large errors in the model results. To solve these problems, Gilles [20] proposed the empirical wavelet transform (EWT) in 2013. EWT combines EMD and the wavelet transform: it has the adaptability of EMD and the theoretical completeness of the wavelet transform, and its calculation is simple and fast. Because of its relatively reliable performance in separating nonlinear and nonstationary signals, EWT is widely used in mechanical fault diagnosis, medical disease diagnosis, intelligent wind speed forecasting, and financial time series forecasting [21,22]. Chegini et al. [23] used EWT to decompose a bearing vibration signal in order to denoise it and identify bearing faults. Their experiments showed that denoising after EWT decomposition can detect early faults. Hu et al. [24] decomposed a wind speed series by EWT and effectively extracted the real information in the series, which made the forecasting model more accurate. He et al. [25] used EWT to decompose financial time series data into series more suitable for forecasting, which effectively reduced the influence of noise on the forecasting results. Many studies have shown that EWT is well suited to the decomposition of time series data and helps models achieve good forecasting results. Therefore, this study adopted EWT to help the SVM model capture the periodicity and regularity embedded in runoff series.
Karst areas cover about 12% of the earth’s land surface and provide drinking water for almost 25% of the global population, so the study of karst areas is significant for economic development [26]. Nevertheless, the structure of karst areas is highly heterogeneous, which makes their hydrological processes more complex than those of non-karst areas, so constructing an accurate runoff forecasting model is a challenging task [27]. Consequently, the simulation results of single models in karst areas are usually of poor accuracy, and hybrid models urgently need to be explored.
In summary, although many previous studies have addressed hybrid models for runoff forecasting, some problems remain. The motivations and contributions of this study are summarized as follows. Firstly, EWT has been widely used for series decomposition in other fields but is still rarely used to decompose runoff series for forecasting. Therefore, applying EWT to runoff forecasting in different basins can further demonstrate its generalizability. Secondly, previous studies of hybrid models have focused on non-karst basins, so it is necessary to discuss their feasibility in karst basins. Finally, previous studies on the performance of hybrid models have mainly used sequentially structured input data, while few have discussed their performance with monthly structured input data. Therefore, it is meaningful to discuss the stability of hybrid models under different input data structures. To achieve these objectives, the following work is carried out: (1) A hybrid EWT–PSO–SVM model based on “decomposition-forecasting-reconstruction” is constructed. (2) Using runoff data collected from a karst area as model input, the forecasting results of the SVM, GA–SVM, PSO–SVM, EMD–PSO–SVM, and EWT–PSO–SVM models are compared using different performance metrics to validate the superiority and feasibility of the developed model. (3) To further assess the stability of the developed model, the forecasting results under different input data structures are compared. A schematic diagram of the EWT–PSO–SVM hybrid model and its performance investigation is shown in Figure 1.
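The “decomposition-forecasting-reconstruction” structure can be illustrated with a deliberately simplified sketch. Everything in it is a stand-in and an assumption: a trailing moving average replaces EWT as the decomposition, two naive forecasters replace the PSO-optimized SVM, and the input is a synthetic series rather than data from this study. The only point is the control flow, in which the series is split into additive subseries, each subseries is forecast separately, and the subseries forecasts are summed.

```python
import math


def moving_average(series, window):
    """Trailing moving average; shorter windows are used near the start."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def decompose(series, window=12):
    """Stand-in for EWT: a smooth part and a remainder that sum to the input."""
    smooth = moving_average(series, window)
    remainder = [x - s for x, s in zip(series, smooth)]
    return [smooth, remainder]


def persistence(subseries):
    """Stand-in forecaster: repeat the last observed value."""
    return subseries[-1]


def seasonal_naive(subseries, period=12):
    """Stand-in forecaster: repeat the value from one period earlier."""
    return subseries[-period]


def forecast_next(series):
    """Decompose, forecast each subseries separately, then sum the forecasts."""
    smooth, remainder = decompose(series)
    parts = [persistence(smooth), seasonal_naive(remainder)]
    return sum(parts), parts


# Synthetic monthly series (not data from the study): seasonal cycle plus slow drift.
runoff = [20 + 15 * math.sin(2 * math.pi * t / 12) + 0.05 * t for t in range(48)]
subseries = decompose(runoff)
prediction, parts = forecast_next(runoff)
```

Reconstruction by summation is valid here because the stand-in decomposition is additive, so the subseries add up to the original series exactly.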
The remainder of the paper is organized as follows. Section 2 describes the construction of the model and the evaluation system for the results. Section 3 presents an overview of the study area and data sources. Section 4 analyzes the predictive performance of the model and further discusses the findings of this study. Section 5 summarizes this work.
3. Research Areas and Data
The Chengbi River Karst Basin is located in the northeast of Baise City, Guangxi, and belongs to the Xijiang River system. The basin covers an area of 2087 km², of which about 1123 km² is karst landform, making it a typical karst basin. The area above the middle of the basin consists mostly of soil mountains and karst landforms, whereas the area below the middle consists of alluvial deposits along the river, where the terrain is relatively flat and belongs to hilly landforms [34]. Due to the complexity of the karst basin, its underground hydrological characteristics are difficult to obtain. Meanwhile, the sparse meteorological stations in the Chengbi River basin before 2001 could not provide accurate meteorological input data for a process-driven model. This difficulty in obtaining underground hydrological characteristics, together with the lack of long-term meteorological data, is common in karst basins. Therefore, it is necessary to explore data-driven models based on runoff series. The catchment area above the Chengbi River Reservoir dam site covers 2000 km², which is 95.8% of the total catchment area of the basin. The annual average precipitation is 1560 mm, about 87% of which occurs during the flood season. The average annual flow is about 37.8 m³/s. The basin has a subtropical monsoon climate with long summers and short winters, and it is hot and rainy in spring and summer under the influence of the ocean monsoon. In addition, because the basin has many tributaries, floods of long duration often form, and the flood season is long, lasting from April to October. The Chengbi River Reservoir provides residential water supply, flood control, power generation, agricultural irrigation, and other social benefits, which play very important roles in the development of Baise City. Therefore, it is essential to establish an appropriate, high-precision runoff forecasting model that can help adjust the optimal scheduling model of the Chengbi River Reservoir and make full use of water resources in the flood season. A total of 12 telemetric rainfall stations are set up in the Chengbi River Karst Basin. This paper focuses on Bashou station and uses a dataset of 492 monthly runoff records from 1979 to 2019 for runoff forecasting. The runoff data are provided by the Chengbi River Reservoir Authority, and runoff depth data are obtained by dividing the runoff data of the designated section by the catchment area of the reservoir. The approximate location of the Chengbi River Karst Basin is depicted in Figure 3.
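As a worked illustration of the runoff depth conversion and a consistency check on the figures quoted above, the sketch below converts a mean discharge into an equivalent depth over the catchment. It assumes that the runoff data are mean discharges in m³/s and uses a nominal 30-day month; the text states only that runoff is divided by the catchment area, so the unit handling and the helper name `runoff_depth_mm` are assumptions.

```python
SECONDS_PER_DAY = 86_400


def runoff_depth_mm(mean_flow_m3s, days, area_km2):
    """Convert a mean discharge over `days` into an equivalent runoff depth in mm."""
    volume_m3 = mean_flow_m3s * days * SECONDS_PER_DAY  # total volume of water
    area_m2 = area_km2 * 1_000_000                      # km^2 -> m^2
    return volume_m3 / area_m2 * 1000.0                 # metres -> millimetres


# Average annual flow of 37.8 m^3/s over a nominal 30-day month
# on the 2000 km^2 catchment above the dam site.
depth_mm = runoff_depth_mm(37.8, 30, 2000)

# Consistency checks on numbers quoted in the text.
n_months = (2019 - 1979 + 1) * 12   # monthly records from 1979 to 2019
catchment_share = 2000 / 2087       # dam-site catchment as a share of the basin

print(f"{depth_mm:.2f} mm, {n_months} months, {catchment_share:.1%}")
```

Under these assumptions, the average flow corresponds to roughly 49 mm of runoff depth per 30-day month, and the record length and catchment share agree with the 492 records and 95.8% stated in the text.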
5. Conclusions
Due to the complexity of karst basin structure, runoff forecasting in such basins has always faced great challenges. Following the “decomposition-prediction-reconstruction” process, the hybrid EWT–PSO–SVM model forecasts runoff in three steps. First, EWT is employed to split the runoff series into subseries in order to reduce the nonlinearity and nonstationarity of the series. Second, the parameters of the SVM are selected using PSO, and the subseries are then fed into the optimized SVM model for prediction. Finally, the predicted values of all subseries are reconstructed to give the final runoff forecast. The runoff data from the Chengbi River Karst Basin were fed into the different models to test the superiority of the developed model. The comprehensive evaluation index of the developed model reached 0.68, and the maximum error was reduced by 43.12% compared with the SVM model. Meanwhile, monthly structured data were fed into the different models to further verify the stability of the developed model. The composite evaluation index of the developed model under the monthly structure reached 0.67. The results show that the EWT–PSO–SVM model performs better than the single SVM model under different data structures, which indicates that data decomposition and parameter optimization can effectively enhance the precision of a single SVM model. The developed EWT–PSO–SVM model is therefore a promising method for predicting nonlinear and nonstationary runoff series in karst basins.
Although the developed model has shown good overall forecasting performance, it still has some limitations. In particular, it shows some errors when predicting low values in the series. Future research could therefore consider predicting the high and low values of the runoff series separately to obtain higher accuracy. It would also be valuable to investigate hybrids of EWT with other artificial intelligence or machine learning methods to further enhance the accuracy of runoff forecasting.