1. Introduction
Hydrological forecasts are obtained by identifying the variation and characteristics of hydrological systems from past observations and applying them to predict future hydrological data. Accurate and immediate hydrological forecasting (e.g., immediate flood forecasting for a watershed) is essential because it serves as a reference for disaster decision-making. Long-term hydrological forecasting (e.g., annual runoff) can provide a critical reference for water resource planning. Many data-driven models, including linear, nonparametric and nonlinear approaches, have been developed for hydrologic discharge time series prediction in the past decades [1].
However, hydrological systems exhibit randomness, chaos, fuzziness, grayness, fractals and other uncertainties because of the influence of human activity; therefore, the results of hydrological forecasting are typically affected by uncertainty. To reduce the uncertainty of the upcoming state of a hydrological system, it is crucial to apply the observed hydrological data and prediction theories to forecast changes in the hydrological system for a certain future period. There are many approaches to represent and quantify uncertainty, such as generalized likelihood uncertainty estimation (GLUE), probability, grey and fuzzy set theory. One of the favored methods for uncertainty assessment in rainfall-runoff modeling is GLUE. However, some fundamental questions related to the application of GLUE remain unresolved [2]. Set pair analysis (SPA) is a novel method for dealing with uncertainty problems [3]. SPA can express the overall and local structure of a relationship by using the connection degree, which can express a variety of uncertainties [3]. A hydrological time series can be effectively predicted using the SPA-based similarity forecast (SPA-SF) model proposed by Wang et al. [3]. The statistical and physical concepts of the SPA-SF model are distinctive; its computation method is visual, its precision is high and its modeling scheme is simple and effective [4]. Therefore, this study presents a discussion of the data-driven SPA-SF model and its application to forecasting annual runoff time series from identical, discrepant and contrary viewpoints.
The SPA theory, proposed by Zhao [5], is a novel uncertainty theory. The core of this theory is to consider certainties and uncertainties as a certain-uncertain system and to depict uniformly all types of uncertainties, such as random uncertainty, fuzzy uncertainty, indeterminate-known uncertainty, unknown and unexpected incident uncertainty and uncertainty that results from imperfect information, using a connection degree formula that can fully embody this idea [6]. SPA has been successfully applied to many fields, such as industry, agriculture, forestry, education, physical education, military affairs, traffic, data fusion, decision-making, forecasting, comprehensive evaluation and network planning [6]. In the hydrology field, the SPA-SF model has been applied to hydrological time series forecasting. Jin et al. [4] used the SPA method to compute the similarity between the estimated and historical main physical vectors from identical, discrepant and contrary viewpoints. They regarded the weighted average of the water resource values of the nearest-neighbor historical samples as the predicted value of the estimated water resources and thereby established the SPA-SF model of water resources change. They observed that the statistical and physical concepts of the SPA-SF model can be applied to forecast different hydrological time series with abundant representative historical samples.
Li and Fu [7] used the combined SPA test and correlation coefficient test to identify the primary physical vectors that determine the highest flood level, in view of the unexpected flood variation process, the complex nonlinear relationship between the highest flood level and its influencing factors, and the extensive hazard scope. They then established the SPA-SF model of the highest flood level change. Their results also indicate that the SPA-SF model can provide a new approach for predicting the highest flood level. Wang and Li [3] introduced the basic theory of the SPA and presented its applications in the field of water resources and hydrology. They also outlined the research targets, contents and key questions of the SPA that assist in the resolution of hydrological problems [3].
A limitation of annual runoff forecasting based on the SPA model is that it uses a finite-length annual runoff sequence to estimate future runoff patterns; when the future runoff follows a pattern that lies outside the historical data, the model cannot forecast it [8]. Annual runoff data are also often affected by noise caused by sampling errors. The extent of the noise in the hydrological data reduces the performance of data-driven models [9]. Thus, reducing the noise in the data with an appropriate denoising scheme may enhance the performance of a data-driven model [10]. To enhance the accuracy of runoff forecasting in the SPA-SF model, denoised annual runoff data were therefore used to compute the similarity between estimated runoff and historical runoff samples. Wavelet analysis, a multi-resolution analysis (MRA), can effectively separate the approximate and detailed signals in original hydrological time series data. Wavelet techniques are effective for denoising, and their numerous applications include image research [11]. The wavelet denoising algorithm can satisfy the requirements of various denoising procedures. Wavelet denoising is clearly superior to conventional methods and has recently been applied in the hydrology field. Lim and Lye [12] applied wavelet-based denoising to correct a series of high temporal resolution streamflow data after the data had been corrupted by tidal data. Their study confirmed that there was a tidal influence at the fluvial flow gauging site. Their method also demonstrated the potential use of wavelet analysis for solving similar problems. Liu et al. [13] applied wavelet analysis to decompose runoff time series data into approximate data and detailed data. Before wavelet reconstruction, thresholds were applied to the detail components to reduce noise in the original runoff data. The convergence capability, learning precision and network generalization were greatly improved in a back-propagation (BP) network model with denoised runoff data.
Wang and Fei [14] applied wavelet analysis to obtain the yearly periodic components in a data series for hydrological runoff. After eliminating periodic components, the remaining data series were denoised through wavelet analysis to obtain the dependent stochastic components. An autoregression (AR) model was then constructed using dependent stochastic series data. Their modelling results showed that, compared with traditional stochastic methods, simulation by the wavelet method obtained hydrological series parameters that were closer to those of the measured series. Wang et al. [15] indicated that, during wavelet analysis of a hydrological series, denoising methods should be used to eliminate the effects of noise. The affected range of the data of the hydrological series should be discarded before analysis, and the anomalous data should be used to highlight the actual undulation of the hydrological series. Cui et al. [16] applied the gray topological prediction method based on wavelet denoising to forecast precipitation. Their computational results showed that their model was simpler and more accurate than the basic gray topological prediction model and, therefore, provided a vital tool for forecasting precipitation, as well as preventing and mitigating disasters.
Wang et al. [17] developed a new wavelet transform method for the synthetic generation of daily stream flow sequences. The advantage of their method was that the generated sequences could capture the dependence structure and statistical properties presented in the data. Wavelet denoising could be combined with synthetic data generation. Chou [18] developed a novel framework for considering wavelet denoising in linear perturbation models (LPMs) and simple linear models (SLMs). The rainfall and runoff time series denoised by wavelet denoising were applied to the SLM and regarded as the smooth seasonal mean used in the LPM. The noise (i.e., the original time series value minus the denoised time series value) was used as the perturbation term in the LPM. Chou analyzed daily rainfall-runoff data for an upstream area of the Kee-Lung River to verify the accuracy of the proposed method. He observed that wavelet denoising enhanced the rainfall-runoff modeling precision of the LPM. Li and Lu [19] applied wavelet denoising characteristics to establish the wavelet denoising SPA model, using the 1959–1989 data of the Fenhe reservoir Baxia station. A comparison of the single prediction model and the synthetic prediction model of hydrological forecasting with the measured series showed that the synthetic prediction model was superior to the single prediction model.
Nejad and Nourani [10] applied the wavelet-based global soft thresholding method to denoise a daily time series of river stream discharges, observed at the outlet of the Murder Creek River at Brewton, Alabama. Thereafter, the denoised time series was applied to an artificial neural network (ANN) model to forecast the flow discharge value on the following day. They observed that the outcome of the ANN model for streamflow forecasting could be 11% more accurate when the data were pre-processed using the wavelet-based denoising approach, compared with the results obtained using raw data. Nourani et al. [9] proposed applying the ANN approach, focusing on the wavelet-based denoising method for modeling the daily streamflow-sediment relationship. A Daubechies wavelet was used as the mother wavelet to decompose both streamflow and sediment time series into detailed and approximation subseries. The appropriate input combination with raw data to estimate the current suspended sediment load (SSL) was determined and re-applied to the ANN with the denoised data. Their results revealed that, regarding the determination coefficient, the result obtained using denoised data was 23.2% more accurate than that obtained using noisy, raw data. This showed that denoised data could be used successfully for ANN-based daily SSL forecasting.
Instead of using the original annual runoff time series, wavelet denoising was applied in this study to acquire the denoised annual runoff time series. The denoised runoff time series data were then applied to the SPA-SF model. The uncertainty of any model mainly depends on the input data, the parameter values and the model structure. The wavelet denoising applied in this study reduces the uncertainty due to the input data. The remainder of this study is organized as follows. First, the structure of the SPA-SF model is introduced; wavelet denoising is then described. A case study of six hydrological stations in Eastern Taiwan is introduced to demonstrate the effectiveness of the proposed method. Finally, analytical results are discussed and conclusions are given.
5. Results and Discussion
The wavelet denoising method with a fixed threshold, proposed by Donoho [24], was applied to the annual runoff time series. The original and denoised annual runoff time series of the six stations are shown in Figure 3. When a traditional denoising algorithm (e.g., Fourier filtering) is applied, data distortion is generated, and the true meaning of the data is lost. Figure 3 shows that the wavelet denoising method exerted a smoothing effect while introducing little distortion in the data. This result suggests that the wavelet denoising method can meet different denoising requirements and has clear advantages over the traditional denoising method (e.g., Fourier filtering).
Figure 3.
Original annual runoff time series and denoised annual runoff time series using wavelet denoising of the six stations. (a) Lijia Station; (b) Yanping Station; (c) Tateyama Station; (d) Mizuho Bridge Station; (e) Renshou Bridge Station; (f) Hualien Bridge Station.
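The fixed-threshold denoising step can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in this study: the Haar wavelet, soft thresholding and the median-based noise estimate are assumptions (the mother wavelet and thresholding rule are not specified in this section), the function name `haar_denoise` is hypothetical, and the sample series is made up. A single resolution level is used, as stated in the conclusions.

```python
import math
from statistics import median


def haar_denoise(series):
    """One-level Haar wavelet denoising with a fixed (universal) soft threshold."""
    n = len(series)
    if n < 2 or n % 2:
        raise ValueError("expects an even-length series with at least two values")
    root2 = math.sqrt(2.0)
    # One-level decomposition into approximation and detail coefficients.
    approx = [(series[i] + series[i + 1]) / root2 for i in range(0, n, 2)]
    detail = [(series[i] - series[i + 1]) / root2 for i in range(0, n, 2)]
    # Assumed noise estimate: median absolute detail coefficient / 0.6745.
    sigma = median(abs(d) for d in detail) / 0.6745
    # Fixed threshold: sigma * sqrt(2 ln n).
    threshold = sigma * math.sqrt(2.0 * math.log(n))
    # Soft thresholding (assumed): shrink each detail coefficient toward zero.
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    # Reconstruct the series from the approximation and shrunken details.
    denoised = []
    for a, d in zip(approx, shrunk):
        denoised.append((a + d) / root2)
        denoised.append((a - d) / root2)
    return denoised


# Hypothetical series for illustration only (not data from this study).
sample = [12.0, 15.0, 9.0, 30.0, 11.0, 13.0, 14.0, 10.0]
smoothed = haar_denoise(sample)
```

Because only the detail coefficients are shrunk, the sum of each pair of adjacent values is preserved, which is one way to see why the smoothing introduces little distortion in the overall level of the series.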
In this study, the SPA-SF model was applied to forecast an annual runoff time series. The value of $m$ was selected to be five through the analysis [20]; that is, the runoff time series $x_i$ ($i = 1, 2, \ldots, n$) was dependent on the five previous historical values. Using a given classification criterion, the elements of each set $B_i$ were converted to symbolic ranks. The current set $B_{n+1}$ was constructed and quantified according to the same classification symbols. In addition, the values of $I$ and $J$ were selected as 0.5 and −1 [20], respectively. The connection coefficient, $\mu'_{B_{n+1} \sim B_i}$, of the set pair, $H(B_{n+1}, B_i)$, was then obtained. A suitable value of $K$ was chosen based on the number of sets with the largest connection coefficient, $\mu'_{B_{n+1} \sim B_i}$. Equation (2) was used to forecast the annual runoff time series. For example, the calculation process for the Hualien Bridge station in 2003 is shown in Table 3. Table 3 shows that the values of the largest connection coefficient and $K$ were 0.8 and 2, respectively. These data indicate that the values of the largest connection coefficient and $K$ were reasonable. This result suggests that the SPA-SF model can be applied to forecast annual runoff time series.
Table 3.
The calculation process of forecasting annual runoff for the Hualien Bridge station in 2003.
Sets | xi | xi+1 | xi+2 | xi+3 | xi+4 | Subsequent values xi+5 | Identity | Differences | Compositionality | Connection coefficients |
---|---|---|---|---|---|---|---|---|---|---|
B1 | III | III | III | I | II | 37,977 | 0.4 | 0.2 | 0.4 | 0.1 |
B2 | III | III | I | II | II | 34,293 | 0.2 | 0.4 | 0.4 | 0.0 |
B3 | III | I | II | III | II | 22,702 | 0.6 | 0.4 | 0.0 | 0.8 |
B4 | I | II | III | II | I | 39,307 | 0.4 | 0.4 | 0.2 | 0.4 |
B5 | II | II | II | I | II | 34,200 | 0.0 | 0.8 | 0.2 | 0.2 |
B6 | II | II | I | I | II | 27,551 | 0.2 | 0.6 | 0.2 | 0.3 |
B7 | II | I | III | II | I | 36,767 | 0.6 | 0.4 | 0.0 | 0.8 |
B8 | I | II | II | I | II | 46,204 | 0.0 | 0.6 | 0.4 | −0.1 |
B9 | II | II | I | II | III | 30,304 | 0.0 | 0.6 | 0.4 | −0.1 |
B10 | II | I | II | III | II | 34,553 | 0.4 | 0.6 | 0.0 | 0.7 |
B11 | I | II | III | II | II | 28,901 | 0.2 | 0.6 | 0.2 | 0.3 |
B12 | II | III | II | II | I | 31,007 | 0.2 | 0.6 | 0.2 | 0.3 |
B13 | III | II | II | I | II | 49,942 | 0.2 | 0.6 | 0.2 | 0.3 |
B14 | II | II | I | II | III | 31,395 | 0.0 | 0.6 | 0.4 | −0.1 |
B15 | II | I | II | III | II | 28,567 | 0.4 | 0.6 | 0.0 | 0.7 |
B16 | I | II | III | II | I | 14,963 | 0.4 | 0.4 | 0.2 | 0.4 |
B17 | II | III | II | I | I | 34,098 | 0.2 | 0.4 | 0.4 | 0.0 |
B18 | III | II | I | I | II | 29,349 | 0.2 | 0.4 | 0.4 | 0.0 |
B19 | II | I | I | II | I | 32,706 | 0.4 | 0.4 | 0.2 | 0.4 |
B20 | I | I | II | II | II | 24,671 | 0.2 | 0.6 | 0.2 | 0.3 |
B21 | I | II | II | II | I | 46,275 | 0.2 | 0.6 | 0.2 | 0.3 |
B22 | II | I | II | I | III | 28,012 | 0.2 | 0.4 | 0.4 | 0.0 |
B23 | I | II | I | III | I | 52,494 | 0.4 | 0.2 | 0.4 | 0.1 |
B24 | II | I | III | I | III | 53,543 | 0.4 | 0.2 | 0.4 | 0.1 |
B25 | I | III | I | III | III | 16,919 | 0.2 | 0.0 | 0.8 | −0.6 |
B26 | III | I | III | III | I | – | – | – | – | – |
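The calculation in Table 3 can be retraced with a short sketch. It assumes that ranks that match count as identical, ranks one class apart as discrepant and ranks two classes apart as contrary, and that the connection coefficient is (S + I·F + J·P)/m with I = 0.5 and J = −1; these assumptions reproduce the tabulated coefficients. The weighting of the K nearest neighbors in proportion to their connection coefficients is an assumption, since Equation (2) is not reproduced in this section, and the function names are hypothetical. Only five of the 25 historical sets are included for brevity.

```python
RANK = {"I": 1, "II": 2, "III": 3}


def connection_coefficient(current, past, i_coef=0.5, j_coef=-1.0):
    """Connection coefficient of the set pair H(current, past)."""
    m = len(current)
    gaps = [abs(RANK[c] - RANK[p]) for c, p in zip(current, past)]
    identical = gaps.count(0)   # same rank
    discrepant = gaps.count(1)  # one rank apart
    contrary = gaps.count(2)    # two ranks apart
    return (identical + i_coef * discrepant + j_coef * contrary) / m


def spa_sf_forecast(current, history, k):
    """Weighted mean of the subsequent values of the k most similar sets."""
    scored = sorted(
        ((connection_coefficient(current, ranks), nxt) for ranks, nxt in history),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(mu for mu, _ in scored)
    # Assumed weighting: proportional to the connection coefficient
    # (only meaningful when the selected coefficients sum to a positive value).
    return sum(mu * nxt for mu, nxt in scored) / total


# Subset of Table 3: (symbolic ranks, subsequent annual runoff).
history = [
    (("III", "III", "III", "I", "II"), 37977),  # B1
    (("III", "I", "II", "III", "II"), 22702),   # B3
    (("II", "I", "III", "II", "I"), 36767),     # B7
    (("II", "I", "II", "III", "II"), 34553),    # B10
    (("I", "III", "I", "III", "III"), 16919),   # B25
]
current = ("III", "I", "III", "III", "I")       # B26
forecast_2003 = spa_sf_forecast(current, history, k=2)
```

With K = 2, the two sets with a coefficient of 0.8 (B3 and B7) are selected, and the weighted mean of 22,702 and 36,767 is approximately 29,734.5, in line with the SPA-SF value of 29,734 reported for Hualien Bridge in 2003.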
The results obtained by applying the traditional SPA-SF model to annual runoff time series forecasting are denoted as SPA-SF (Table 4). By contrast, the results that combined the SPA-SF model and wavelet denoising are denoted as SPA-SFW (Table 4). To compare the SPA-SF with the SPA-SFW, only the similarity between estimated and historical runoff data (i.e., the weights in Equation (2)) was changed. In both cases, the predicted value of the estimated annual runoff was obtained from the weighted average of the non-denoised annual runoff of the nearest-neighbor historical samples. Table 4 shows that, based on the RMSE, the results obtained using the SPA-SFW model were better than those obtained using the SPA-SF model for all six stations.
Table 4.
The forecasting results and RMSE obtained using the SPA-SF model and the SPA-SF model with wavelet denoising (SPA-SFW) for the six stations.
Year | Observed (m3/s) | SPA-SF (m3/s) | SPA-SFW (m3/s) | Year | Observed (m3/s) | SPA-SF (m3/s) | SPA-SFW (m3/s) |
---|---|---|---|---|---|---|---|
Lijia | | | | Yanping | | | |
2003 | 3,385 | 3,051 | 2,192 | 2003 | 9,928 | 8,174 | 9,844 |
2004 | 2,587 | 2,774 | 2,920 | 2004 | 10,674 | 14,195 | 9,477 |
2005 | 3,448 | 2,890 | 2,789 | 2005 | 25,568 | 7,635 | 11,234 |
2006 | 4,954 | 3,880 | 4,492 | 2006 | 24,889 | 13,261 | 13,037 |
2007 | 3,757 | 2,943 | 4,029 | 2007 | 6,589 | 8,547 | 8,487 |
2008 | 2,357 | 3,124 | 2,151 | 2008 | 7,510 | 4,147 | 9,844 |
RMSE | | 691 | 619 | RMSE | | 9,013 | 7,707 |
Tateyama | | | | Mizuho Bridge | | | |
Year | Observed (m3/s) | SPA-SF (m3/s) | SPA-SFW (m3/s) | Year | Observed (m3/s) | SPA-SF (m3/s) | SPA-SFW (m3/s) |
2003 | 9,730 | 6,403 | 6,725 | 2003 | 30,552 | 34,723 | 34,158 |
2004 | 7,455 | 6,901 | 7,117 | 2004 | 34,808 | 38,753 | 32,795 |
2005 | 11,458 | 8,154 | 7,047 | 2005 | 40,134 | 38,702 | 37,280 |
2006 | 8,664 | 8,207 | 8,765 | 2006 | 39,159 | 32,655 | 45,371 |
2007 | 18,772 | 5,639 | 7,411 | 2007 | 51,320 | 44,931 | 52,029 |
2008 | 7,257 | 8,207 | 6,939 | 2008 | 34,877 | 51,015 | 42,048 |
RMSE | | 5,713 | 5,128 | RMSE | | 7,943 | 4,391 |
Renshou Bridge | | | | Hualien Bridge | | | |
Year | Observed (m3/s) | SPA-SF (m3/s) | SPA-SFW (m3/s) | Year | Observed (m3/s) | SPA-SF (m3/s) | SPA-SFW (m3/s) |
2003 | 4,624 | 7,428 | 4,225 | 2003 | 31,077 | 29,734 | 31,007 |
2004 | 6,783 | 4,851 | 5,628 | 2004 | 38,921 | 37,552 | 36,088 |
2005 | 17,327 | 6,896 | 7,834 | 2005 | 68,780 | 34,293 | 34,293 |
2006 | 12,114 | 5,135 | 5,737 | 2006 | 57,138 | 28,371 | 29,611 |
2007 | 16,804 | 11,979 | 11,979 | 2007 | 68,247 | 28,901 | 34,707 |
2008 | 9,657 | 6,939 | 7,166 | 2008 | 47,589 | 30,673 | 41,024 |
RMSE | | 5,770 | 5,192 | RMSE | | 25,347 | 22,815 |
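The RMSE values in Table 4 can be checked directly from the tabulated forecasts. The sketch below uses the Lijia columns and assumes the usual definition of the RMSE, the square root of the mean squared error over the six validation years; the function name `rmse` is illustrative.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Lijia station, 2003-2008 (values from Table 4).
observed = [3385, 2587, 3448, 4954, 3757, 2357]
spa_sf = [3051, 2774, 2890, 3880, 2943, 3124]
spa_sfw = [2192, 2920, 2789, 4492, 4029, 2151]

print(round(rmse(observed, spa_sf)))   # 691
print(round(rmse(observed, spa_sfw)))  # 619
```

Both values agree with the RMSE row reported for Lijia.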
The overall comparisons between the SPA-SF and SPA-SFW models, using the RMSE for the six stations, are shown in Table 5. Based on the RMSE, the average value of the SPA-SFW model (7642) was better than that of the SPA-SF model (9080), as shown in Table 5. These results imply that using a denoised annual runoff time series in the SPA-SF model allows the similarity between the estimated runoff and historical runoff data to be computed more accurately. Hence, better weights could be obtained using denoised runoff data. Moreover, the weighted average of the most similar historical runoff samples resulted in more accurate runoff forecasts than those obtained using the original runoff data in the SPA-SF model. This was because the high-frequency and low-frequency components of the signals could be effectively separated by wavelet decomposition. In addition, the high-frequency components of the signals could be denoised using threshold quantification. The denoised annual runoff was used to compute the similarity between estimated and historical data in the SPA-SF model more accurately, enhancing the accuracy of runoff forecasting. The values of the coefficients $S/m$, $F/m$ and $P/m$ in Equation (1) are different for SPA-SF and SPA-SFW. The values of these coefficients in the SPA-SF and SPA-SFW models are obtained from the original and denoised annual runoff data, respectively. The noise reduction of the data through wavelet denoising leads to an enhanced performance of the proposed data-driven model. This is the reason why SPA-SFW outperforms SPA-SF based on the SPA analysis model of Equation (1).
Table 5.
The average forecasting results obtained using SPA-SF, SPA-SFW and autoregression (AR) based on RMSE.
Name of stations | SPA-SF | SPA-SFW | AR |
---|---|---|---|
Lijia | 691 | 619 | 1,061 |
Yanping | 9,013 | 7,707 | 9,141 |
Tateyama | 5,713 | 5,128 | 4,674 |
Mizuho Bridge | 7,943 | 4,391 | 9,030 |
Renshou Bridge | 5,770 | 5,192 | 6,036 |
Hualien Bridge | 25,347 | 22,815 | 21,146 |
Average | 9,080 | 7,642 | 8,515 |
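The averages in Table 5 and the per-station ranking can be retraced with a few lines. This is a simple sketch using the tabulated RMSE values; the variable names are illustrative.

```python
from statistics import mean

# RMSE per station from Table 5: (SPA-SF, SPA-SFW, AR).
models = ("SPA-SF", "SPA-SFW", "AR")
rmse_by_station = {
    "Lijia": (691, 619, 1061),
    "Yanping": (9013, 7707, 9141),
    "Tateyama": (5713, 5128, 4674),
    "Mizuho Bridge": (7943, 4391, 9030),
    "Renshou Bridge": (5770, 5192, 6036),
    "Hualien Bridge": (25347, 22815, 21146),
}

# Column-wise average over the six stations.
averages = {
    name: mean(column)
    for name, column in zip(models, zip(*rmse_by_station.values()))
}

# Model with the smallest RMSE at each station.
best_model = {
    station: models[values.index(min(values))]
    for station, values in rmse_by_station.items()
}
```

The averages come out at about 9080, 7642 and 8515 for SPA-SF, SPA-SFW and AR, respectively. Station by station, SPA-SFW has the smallest RMSE at four stations, whereas AR has the smallest RMSE at Tateyama and Hualien Bridge.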
To compare the results obtained using the proposed method with those obtained using a traditional method, the autoregressive (AR) model was applied to annual runoff forecasting. The validated results obtained using SPA-SF and SPA-SFW were compared with the results obtained via the AR model, as shown in Table 5. For consistency, the value of $m$ for the AR model was selected to be five. The values of the calibrated coefficients in the AR model for the six stations are shown in Table 6. Based on the RMSE, the average value of the SPA-SF (9080) was higher than that of the AR (8515), whereas the average value of the SPA-SFW (7642) was lower than that of the AR (8515). On average, the SPA-SFW (7642) therefore outperforms both the AR (8515) and the SPA-SF (9080) based on the RMSE, although the AR model yielded the smallest RMSE at the Tateyama and Hualien Bridge stations (Table 5).
Table 6.
The values of the calibrated coefficients in the AR model for six stations.
Name of stations | m = 1 | m = 2 | m = 3 | m = 4 | m = 5 |
---|---|---|---|---|---|
Lijia | 0.31699 | 0.25847 | 0.25648 | −0.03414 | 0.13865 |
Yanping | 0.22231 | 0.26237 | 0.15771 | −0.14061 | 0.43697 |
Tateyama | 0.08698 | 0.30733 | 0.47995 | 0.11940 | −0.00456 |
Mizuho Bridge | 0.31934 | 0.13720 | 0.39881 | 0.18277 | −0.05631 |
Renshou Bridge | 0.21177 | 0.09370 | 0.22852 | 0.28814 | 0.14405 |
Hualien Bridge | 0.06826 | 0.21847 | 0.36847 | 0.20636 | 0.12162 |
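How the calibrated coefficients in Table 6 enter a one-step forecast can be illustrated with a short sketch. It assumes a plain AR(5) form without an intercept or mean removal, and that the column m = j multiplies the value j years back; neither detail is stated in this section, so the result is illustrative only and is not meant to reproduce the AR results in Table 5. The function name `ar_forecast` is hypothetical.

```python
def ar_forecast(history, coefficients):
    """One-step AR forecast: sum of coefficients[j] times the value at lag j + 1."""
    m = len(coefficients)
    if len(history) < m:
        raise ValueError("history must contain at least m values")
    lags = history[-1:-m - 1:-1]  # most recent value first
    return sum(c * x for c, x in zip(coefficients, lags))


# Lijia coefficients for m = 1..5 (Table 6) and observed runoff 2003-2007 (Table 4).
lijia_coefficients = [0.31699, 0.25847, 0.25648, -0.03414, 0.13865]
lijia_observed_2003_2007 = [3385, 2587, 3448, 4954, 3757]

# Illustrative forecast for 2008 under the assumptions stated above.
prediction_2008 = ar_forecast(lijia_observed_2003_2007, lijia_coefficients)
```

In this form, the forecast is simply a linear combination of the five preceding annual values, in contrast to the SPA-SF model, which selects the most similar historical patterns.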
Because the SPA-SF model refers to historical samples to predict runoff, values that are substantially different from the historical minimum or maximum could result in poor predictions. In other words, the size of the historical sample affects the predictive results obtained with the SPA-SF model. When the historical samples are highly representative, reasonable and reliable predictions can be expected. In addition, the resolution level of the wavelet decomposition is also related to the size of the historical sample.
6. Conclusions
In this study, the SPA-SF model and wavelet denoising were applied to forecast annual runoff. Annual runoff data are often affected by noise, which reduces the performance of data-driven models. The similarity between estimated and historical runoff data in the SPA-SF model was computed from identical, discrepant and contrary viewpoints. Instead of the original annual runoff, the wavelet-denoised annual runoff was used to compute this similarity, which was thereby obtained more accurately than in the traditional SPA-SF model calibrated with the original annual runoff. The estimated annual runoff was assumed equal to the weighted average of the historical values characterized by the highest connection coefficients. The noise reduction of the data through wavelet denoising therefore led to an enhanced performance of the proposed data-driven model.
The annual runoff data of six stations in Eastern Taiwan were analyzed to verify the accuracy of the proposed method. The observed runoff data used as a benchmark for model validation were the same for the applications with and without wavelet denoising. The results obtained using the annual runoff with and without wavelet denoising were compared. For the historical samples analyzed, the SPA-SFW model obtained a smaller average RMSE than the SPA-SF and AR models. Considering the length of the data, one resolution level was applied to carry out wavelet decomposition and denoising, obtaining acceptable results when compared with the SPA-SF and AR models based on the RMSE. The obtained results are encouraging, but further analyses of different case studies should be carried out, and some specific hydrological understanding should be added, to confirm the usefulness of the proposed method for water resources planning activities.