Article

A Hybrid Model for Forecasting Groundwater Levels Based on Fuzzy C-Mean Clustering and Singular Spectrum Analysis

by Dušan Polomčić 1, Zoran Gligorić 1, Dragoljub Bajić 1,* and Čedomir Cvijović 2
1 University of Belgrade-Faculty of Mining and Geology, Đušina 7, 11000 Belgrade, Serbia
2 Department of Geodesy, Belgrade University College of Applied Studies in Civil Engineering and Geodesy, Hajduk Stanka 2, 11000 Belgrade, Serbia
* Author to whom correspondence should be addressed.
Water 2017, 9(7), 541; https://doi.org/10.3390/w9070541
Submission received: 27 April 2017 / Revised: 11 July 2017 / Accepted: 15 July 2017 / Published: 19 July 2017
(This article belongs to the Special Issue Modeling of Water Systems)

Abstract

Having the ability to forecast groundwater levels is very significant because of their vital role in basic functions related to efficiency and the sustainability of water supplies. The uncertainty which dominates our understanding of the functioning of water supply systems is of great significance and arises as a consequence of the time-unbalanced water consumption rate and the deterioration of the recharge conditions of captured aquifers. The aim of this paper is to present a hybrid model based on fuzzy C-mean clustering and singular spectrum analysis to forecast the weekly values of the groundwater level of a groundwater source. This hybrid model demonstrates how the fuzzy C-mean can be used to transform the sequence of the observed data into a sequence of fuzzy states, serving as a basis for the forecasting of future states by singular spectrum analysis. In this way, the forecasting efficiency is improved, because we predict the interval rather than the crisp value where the level will be. It gives much more flexibility to the engineers when managing and planning sustainable water supplies. A model is tested by using the observed weekly time series of the groundwater source, located near the town of Čačak in south-western Serbia.

1. Introduction

Maintaining the stability of groundwater exploitation represents a key issue in attaining efficient and sustainable water supplies. It involves stable recharge conditions for the captured aquifer during the exploitation, absence or the slight degradation of the initial seepage characteristics of the aquifer, as well as the selection of an appropriate exploitation regime. An optimal-yield exploitation over a period of many years produces effects related to the spread of general drawdown. It occurs as a consequence of the exploitation regime of all of the intake objects. The fluctuation of the drawdown values is influenced by seasonal wavering in the values of balance elements participating in the recharge of the captured aquifer and the exploitation regime caused by changes of consumption rate.
The deterioration of the recharge conditions of the captured aquifer and its overexploitation lead to an increase in the values of the drawdown and the effective groundwater source radius [1]. For the purpose of the effective management of the exploitation, it is necessary to know the data regarding the drawdown of the groundwater source, independent of the conditions influencing the wavering of drawdown values. In this way, we can define the range of possible total flow of the groundwater source, primarily in dry season periods.
Many models and techniques have been proposed to forecast time series in hydrogeology: the nonlinear optimization technique, the multiple linear regression method, the hybrid soft-computing technique, the hybrid wavelet packet-support vector regression method, artificial neural-network techniques, the adaptive neuro-fuzzy inference system method, and hydrodynamic modeling [2,3,4,5,6,7]. Singular spectrum analysis, which is used in this paper, has also been applied by various other authors [8,9,10,11,12,13,14,15,16,17].
City water consumption represents a highly dynamic temporal appearance, which causes great difficulties in the water supply management system. Reliable groundwater level forecasting is broadly recognized for its key role in the efficient management of water resources and consumption. In this paper, we proposed a hybrid model that could effectively forecast groundwater levels and improve the efficiency of the process of their management. The hybrid model combined the fuzzy C-mean clustering algorithm (FCM) and singular spectrum analysis (SSA). The FCM is able to effectively classify the monitored data into temporal states of the groundwater level. In this way, the behavior of the observed system can be defined much more flexibly. The SSA is able to effectively forecast the state of the groundwater level and provide opportunities to make different combinations within the obtained components of the data series. The proposed methodology represents an easier way of modelling groundwater levels and offers an opportunity to describe the behavior of a groundwater source without including the physical characteristics of the location. Furthermore, it can be easily updated with new information. There is an opportunity to transform this one single time series model into a multi-dimensional model by adding another observed parameter; in which case, we can use a multivariate singular spectrum analysis.
The development of the model is related to the forecasting of the future states of the groundwater level (the general drawdown) using data obtained during the period of exploitation. The model is composed of two stages: in the first stage, we make fuzzy states of the monitored data, while in the second, we forecast the future states. By using a fuzzy C-mean clustering algorithm, the original time series is divided into an adequate number of fuzzy states. Accordingly, we can create the adequate fuzzy time series. In many cases, the creation of fuzzy relations among fuzzy time series is a very difficult task. In order to avoid this, we represent fuzzy time series by cluster time series, where each cluster is defined by its center, minimum and maximum value. This approach enables us to apply a deterministic forecasting model based on the singular spectrum analysis. This analysis reveals the structure of the time series, i.e., components such as trend, oscillations and noise. Planners can create different scenarios using different combinations of components. This model is very beneficial to city authorities due to its effective water resource management.

2. Forecasting Model

In this paper, we study the forecasting of time-invariant fuzzy time series of groundwater levels. The fuzzy C-mean algorithm is used for the fuzzification of the observed data, while the SSA is applied to build the forecasting model.
By applying linear recurrent formulae, we predict the future values of cluster centers. After that, the sequence of the forecasted cluster centers is transformed into a sequence of the actual centers obtained by fuzzy C-mean clustering. The transformation uses the equation of the fuzzy C-mean clustering algorithm, which calculates the membership degree. Finally, the developed model produces the interval time series, characterized by the minimum and maximum value of the groundwater level for every point in the future.
The developed model was tested by using the real data obtained by monitoring the groundwater source Perminac. It is located in the upstream area of Čačak city. The groundwater source contains 14 wells with a maximum total capacity of 131 l/s and an average of 90 l/s. In recent years, overexploitation caused a significant decrease in the groundwater level in the wider area of the groundwater source. Accordingly, some wells were excluded from the exploitation, and supply restrictions were introduced as a way of stabilizing consumption during the summer months.

2.1. Fuzzy Time Series

Song and Chissom [18] first introduced the definition of fuzzy time series as follows [19]:
“Let $X(t) \subset \mathbb{R}^1$, $t = 0, 1, 2, \ldots$, be the universe of observed data on which fuzzy sets $f_i(t)$, $i = 1, 2, \ldots$, are defined and let $F(t)$ be a collection of $f_i(t)$. Then, $F(t)$ is called a fuzzy time series on $X(t)$.”
Song and Chissom [18] defined fuzzy relations among fuzzy time series, which are based on the assumption that the values of the fuzzy time series $F(t)$ are fuzzy sets, and the observation at time $t$ is caused by the observations at the previous times [19].
If for any $f_j(t) \in F(t)$, there exist $f_i(t-1) \in F(t-1)$ and a fuzzy relation $R_{ij}(t, t-1)$ such that $f_j(t) = f_i(t-1) \circ R_{ij}(t, t-1)$, where “$\circ$” is the relation (composition) operator, then $F(t)$ is said to be caused by $F(t-1)$ only. It is expressed as follows:
$$f_i(t-1) \rightarrow f_j(t), \quad F(t-1) \rightarrow F(t) \tag{1}$$
Suppose that $F(t)$ is caused by $F(t-1)$ only, or by $F(t-1)$ or $F(t-2)$ or $\ldots$ or $F(t-k)$, $k > 0$. This relation can be expressed as follows:
$$F(t) = F(t-1) \circ R(t, t-1) \tag{2}$$
Equation (2) represents the first-order model of $F(t)$. If $F(t)$ is caused by $F(t-1), F(t-2), \ldots, F(t-k)$, $k > 0$, simultaneously, then their relations are represented as:
$$F(t) = \left( F(t-1) \times F(t-2) \times \cdots \times F(t-k) \right) \circ R(t, t-k) \tag{3}$$
Equation (3) represents the k-th order model of $F(t)$, and $R(t, t-k)$ is a relation matrix describing the fuzzy relationship between $F(t-1) \times F(t-2) \times \cdots \times F(t-k)$ and $F(t)$.
To fuzzify the observed data, we apply the fuzzy C-mean algorithm.

2.2. A Brief Description of the Fuzzy C-mean Algorithm

In order to divide the observed data into an adequate number of fuzzy states, we apply the fuzzy C-mean clustering algorithm [20,21,22,23] over the set $S = \{s_i\}$, $i = 1, 2, \ldots, N$. We cluster the time series primarily because we need models that use the monitoring results in a form that represents the states of the observed phenomenon. Decision-making models based on interval inputs are much more flexible than deterministic models. Management models have a much higher confidence because they incorporate uncertainties expressed by intervals into management systems.
The fuzzy C-mean algorithm is a method based on the minimization of a generalized least-squared errors function. Consider a set $S = \{s_1, s_2, \ldots, s_N\} \subset \mathbb{R}^{N \times q}$, where $N$ is the number of the observed data and $q$ is the dimension of the sample $s_i$ $(i = 1, 2, \ldots, N)$; here $q = 1$. Every cluster is a fuzzy set defined by the relative closeness of space $S$. Suppose that there is a groundwater level vector composed of $M$ cluster centers, $C_m = \{c_m\}$, $m = 1, 2, \ldots, M$. For the i-th relative closeness and m-th cluster center, there is a membership degree $u_{im} \in [0, 1]$ indicating with what degree the relative closeness $s_i$ belongs to the cluster center $c_m$, which results in a fuzzy partition matrix $U = [u_{im}]_{N \times M}$.
Let $u_{im}$ be the membership, $c_m$ the center of the cluster, $N$ the number of observed data and $M$ the number of clusters. This algorithm aims to determine cluster centers and the fuzzy partition matrix by minimizing the following function:
$$J(U, c_1, c_2, \ldots, c_M; S) = \sum_{i=1}^{N} \sum_{m=1}^{M} u_{im}^{\omega} \, d_{im}^{2}(s_i, c_m) \tag{4}$$
subject to
$$\sum_{m=1}^{M} u_{im} = 1, \quad i = 1, 2, \ldots, N \tag{5}$$
$$0 \le u_{im} \le 1, \quad m = 1, 2, \ldots, M; \; i = 1, 2, \ldots, N \tag{6}$$
$$0 < \sum_{i=1}^{N} u_{im} < N, \quad m = 1, 2, \ldots, M \tag{7}$$
where $d_{im}$ is the Euclidean distance between the observation and the center of the cluster, defined as:
$$d_{im} = \sqrt{(s_i - c_m)^2}, \quad i = 1, 2, \ldots, N \tag{8}$$
Finally, the objective function is:
$$J(U, c_1, c_2, \ldots, c_M; S) = \sum_{i=1}^{N} \sum_{m=1}^{M} u_{im}^{\omega} (s_i - c_m)^2 \tag{9}$$
The objective function J represents the intra-cluster variance. If we want to have those elements that are most similar to the cluster center in a given cluster, we can do this by minimizing the variance inside the cluster. The exponent ω is used to adjust the weighting effect of membership values. A large ω will increase the fuzziness of the function J. Pal and Bezdek [24] suggested that ω in the interval [1.5, 2.5] was generally recommended for use in FCM.
In this paper, the value of $\omega$ is set to 2 as the midpoint of the suggested interval. The objective function is iteratively minimized. In the j-th iteration, the values of $c_m$ and $u_{im}$ are updated as follows:
$$c_m = \frac{\sum_{i=1}^{N} u_{im}^{\omega} s_i}{\sum_{i=1}^{N} u_{im}^{\omega}} \tag{10}$$
$$u_{im} = \frac{\left( \dfrac{1}{|s_i - c_m|} \right)^{\frac{1}{\omega - 1}}}{\sum_{k=1}^{M} \left( \dfrac{1}{|s_i - c_k|} \right)^{\frac{1}{\omega - 1}}} \tag{11}$$
The iteration process stops when $|J^{(j+1)} - J^{(j)}| < \delta$, where $\delta$ represents the minimum amount of improvement. Sorting the sequence of obtained centers in ascending order gives us $c_1 < c_2 < \cdots < c_M$.
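The update loop of Equations (10) and (11) can be sketched in Python for the one-dimensional case. This is a minimal illustration rather than the authors' implementation: the function name `fuzzy_c_means_1d`, the evenly spaced initial centers, the iteration cap and the sample levels are all assumptions. The membership update uses the standard squared-distance form of FCM, which may differ from Equation (11) as printed but gives the same ordering of memberships for each observation.

```python
import numpy as np

def fuzzy_c_means_1d(s, n_clusters, omega=2.0, delta=1e-10, max_iter=500):
    """Fuzzy C-means for a 1-D series; returns sorted centers and the N x M partition matrix."""
    s = np.asarray(s, dtype=float)
    # Assumption: deterministic start with centers spread evenly over the data range.
    centers = np.linspace(s.min(), s.max(), n_clusters)
    j_prev = np.inf
    for _ in range(max_iter):
        # Squared distances to every center; a tiny floor avoids division by zero.
        d2 = np.maximum((s[:, None] - centers[None, :]) ** 2, 1e-12)
        # Membership update: proportional to (1 / d^2)^(1 / (omega - 1)), rows sum to 1.
        inv = d2 ** (-1.0 / (omega - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the observations weighted by u^omega.
        w = u ** omega
        centers = (w * s[:, None]).sum(axis=0) / w.sum(axis=0)
        # Objective J with the current memberships and the updated centers.
        j = float((w * (s[:, None] - centers[None, :]) ** 2).sum())
        if abs(j_prev - j) < delta:
            break
        j_prev = j
    order = np.argsort(centers)  # report centers in ascending order
    return centers[order], u[:, order]

# Hypothetical groundwater levels forming two obvious groups.
levels = [244.58, 244.60, 244.62, 245.48, 245.50, 245.52]
centers, u = fuzzy_c_means_1d(levels, n_clusters=2)
```

The returned memberships belong to the last membership update before the final center update, which is indistinguishable from the converged partition once the stopping rule is met.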
The fuzzification of the data is done according to the results of the final fuzzy partition matrix.
$$U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1M} \\ u_{21} & u_{22} & \cdots & u_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ u_{N1} & u_{N2} & \cdots & u_{NM} \end{bmatrix} \tag{12}$$
The number of fuzzy sets corresponds to the number of clusters. Each row of the matrix $U$ represents the fuzzy state of that observation. Accordingly, we obtain the fuzzy state matrix of the observed data:
$$A_i = \left| \begin{array}{l} A_1 = \dfrac{u_{11}(c_1)}{c_1} + \dfrac{u_{12}(c_2)}{c_2} + \cdots + \dfrac{u_{1M}(c_M)}{c_M} \\ A_2 = \dfrac{u_{21}(c_1)}{c_1} + \dfrac{u_{22}(c_2)}{c_2} + \cdots + \dfrac{u_{2M}(c_M)}{c_M} \\ \quad \vdots \\ A_N = \dfrac{u_{N1}(c_1)}{c_1} + \dfrac{u_{N2}(c_2)}{c_2} + \cdots + \dfrac{u_{NM}(c_M)}{c_M} \end{array} \right| \tag{13}$$
The state of the observed data is defined as:
$$A_{im} = \max \left( \frac{u_{im}(c_m)}{c_m} \right) \tag{14}$$
Finally, the sequence $A_{im}$ represents a fuzzy time series on $X(t)$. In this way, we obtain the transitions from one state to another over the time of observation: $A_m(t=0) \rightarrow A_m(t=1) \rightarrow \cdots \rightarrow A_m(t=N)$, $m \in \{1, 2, \ldots, M\}$.
The creation of a set of certain transition rules for fuzzy relationships between states can be very difficult. To overcome this situation, we transform the fuzzy time series into an adequate time series of the center of the clusters. This approach enables us to apply a deterministic forecasting model based on the singular spectrum analysis.

2.3. Forecasting Model Based on the Singular Spectrum Analysis

The process of the transformation of the fuzzy time series into a crisp time series is based on the fact that each fuzzy state $A_m(t)$, $m \in \{1, 2, \ldots, M\}$, can be represented by the corresponding cluster center. Accordingly, the following time series is obtained: $c_m(t=0) \rightarrow c_m(t=1) \rightarrow \cdots \rightarrow c_m(t=N)$, $m \in \{1, 2, \ldots, M\}$.
The forecasting algorithm is based on SSA methodology [25,26,27]. In SSA terminology, it is often assumed that the series is noisy with an arbitrary series length $N$. The SSA technique consists of two main complementary stages: decomposition and reconstruction. The noisy series is decomposed in the first stage, and the noise-reduced series is reconstructed in the second stage. The reconstructed series is then used for forecasting the future values.
Consider the stochastic process $\{c_m(t)\}$, $t = 1, 2, \ldots, N$; $m \in \{1, 2, \ldots, M\}$, and suppose that a realization of size $N$ from this process is available: $C_m(t) = \{c_m(1), c_m(2), \ldots, c_m(N)\}$. Since we are faced with a time-invariant series, and for simplicity, we can rewrite the realization as follows: $C_N = \{c_1, c_2, \ldots, c_N\}$.
The first stage of the algorithm, called decomposition, includes the following two steps: embedding and singular value decomposition (SVD).
Embedding is a mapping that transfers a one-dimensional time series of centers $C_N = \{c_1, c_2, \ldots, c_N\}$ into a multidimensional matrix $[Y_1, \ldots, Y_K]$ with vectors $Y_j = (c_j, \ldots, c_{j+L-1})^T \in \mathbb{R}^L$, where $L$ $(1 < L \le N - 1)$ is the window length and $K = N - L + 1$. The window length is the number of consecutive observations of the original series held in each lagged vector. Recalling Equation (3), the window length model is similar to the k-th order model of the fuzzy time series, but it takes into account the original values from $t = 1$ to $t = L$. The usual value of $L$ is $(N + 1)/2$ if $N$ is odd and $N/2$ or $(N/2) + 1$ if $N$ is even (for more details see [27]). The result of this step is the trajectory matrix:
$$Y = [Y_1, \ldots, Y_K] = (c_{ij})_{i,j=1}^{L,K} = \begin{bmatrix} c_1 & c_2 & \cdots & c_K \\ c_2 & c_3 & \cdots & c_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ c_L & c_{L+1} & \cdots & c_N \end{bmatrix} \tag{15}$$
The trajectory matrix $Y$ is a Hankel matrix, in which all elements along each anti-diagonal $i + j = \text{const}$ are equal.
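The embedding step can be illustrated with a short Python sketch. The helper name `trajectory_matrix` and the toy series are assumptions made for illustration; each column is one lagged vector of $L$ consecutive values, so the matrix has $L$ rows and $K = N - L + 1$ columns.

```python
import numpy as np

def trajectory_matrix(series, window):
    """Build the L x K trajectory (Hankel) matrix of a 1-D series."""
    series = np.asarray(series, dtype=float)
    k = series.size - window + 1
    # Column j holds the lagged vector (c_j, ..., c_{j+L-1}).
    return np.column_stack([series[j:j + window] for j in range(k)])

# Toy series c_t = t for t = 1..7, with window length L = 3 (so K = 5).
Y = trajectory_matrix(np.arange(1, 8), window=3)
```

Because the toy series equals its own index, entry $(i, j)$ of `Y` (zero-based) is $i + j + 1$, which makes the constant anti-diagonals easy to see.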
The SVD of matrix $Y$ is based on the spectral decomposition of the lag-covariance matrix $Y Y^T \in \mathbb{R}^{L \times L}$. Denote $\lambda_1, \ldots, \lambda_L$ as the eigenvalues of $Y Y^T$, arranged in decreasing order $\lambda_1 \ge \cdots \ge \lambda_L \ge 0$, and $U_1, \ldots, U_L$ the corresponding eigenvectors. The SVD of the trajectory matrix $Y$ can be represented as
$$\hat{Y} = \sum_{i=1}^{d} U_i U_i^T Y = \hat{Y}_1 + \hat{Y}_2 + \cdots + \hat{Y}_d \tag{16}$$
where $d$ is the rank of $Y$.
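A minimal sketch of this decomposition in Python is given below, assuming NumPy's symmetric eigen-solver `numpy.linalg.eigh` is an acceptable way to obtain the eigenpairs of $Y Y^T$. The function name `ssa_decompose` and the synthetic series (a slow trend plus an oscillation) are hypothetical. The sketch returns one elementary matrix $U_i U_i^T Y$ per eigenvector; those belonging to zero eigenvalues are numerically zero, so summing all of them recovers $Y$.

```python
import numpy as np

def ssa_decompose(Y):
    """Eigen-decompose Y Y^T; return eigenvalues, eigenvectors and elementary matrices."""
    eigvals, eigvecs = np.linalg.eigh(Y @ Y.T)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # reorder to decreasing
    eigvals = np.clip(eigvals[order], 0.0, None)    # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    # Elementary matrix for eigenvector U_i: the projection U_i U_i^T Y.
    elementary = [np.outer(u, u) @ Y for u in eigvecs.T]
    return eigvals, eigvecs, elementary

# Hypothetical series: slow trend plus an oscillation with period 5.
t = np.arange(20)
series = 245.0 + 0.01 * t + 0.2 * np.sin(2 * np.pi * t / 5)
L = 8
Y = np.column_stack([series[j:j + L] for j in range(series.size - L + 1)])
eigvals, eigvecs, elementary = ssa_decompose(Y)
```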
The second stage of the algorithm, called reconstruction, includes the following two steps: grouping and diagonal averaging or Hankelization.
The grouping step corresponds to the splitting of the set of matrices $\{\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_d\}$ into several disjoint subsets and the summing of the matrices within each subset. The procedure of choosing the subsets $\{I_1, \ldots, I_k\}$ is called grouping. In the simple case where we have only signal and noise components ($k = 2$), we use two subsets, $I_1 = \{1, \ldots, r\}$ and $I_2 = \{r + 1, \ldots, d\}$, and associate the subset $I_1$ with the signal component and the subset $I_2$ with noise. Selecting the appropriate number of eigenvalues ($r$) to be included in the reconstruction is very important. If we take an $r$ smaller than it should be, some parts of the signal will be lost and the accuracy of the reconstructed series will be lower. On the other hand, if the value of $r$ is too large, then a lot of noise will be included in the reconstructed series. After performing a singular value decomposition of the trajectory matrix, singular values ordered in a decreasing manner are obtained. The plot of the logarithms of the obtained singular values gives very useful information regarding breaks in the eigenvalue spectra. The component where a significant drop in values occurs can be interpreted as the start of the noise floor [28].
Diagonal averaging or Hankelization represents the last step in SSA, where each reconstructed trajectory matrix (see Equation (16)) is transformed into a new one-dimensional time series of length $N$. This corresponds to the averaging of the matrix elements over the anti-diagonals $i + j = k + 1$: the selection $k = 1$ gives $\hat{c}_1 = \hat{c}_{11}$, $k = 2$ gives $\hat{c}_2 = (\hat{c}_{12} + \hat{c}_{21})/2$, $k = 3$ gives $\hat{c}_3 = (\hat{c}_{13} + \hat{c}_{22} + \hat{c}_{31})/3$, and so on. For example, the reconstructed trajectory matrix $\hat{Y}_1$ is transformed into a new one-dimensional time series $\hat{C}_1$. Finally, the original time series $C_N$ is decomposed into a sum of $r$ vectors or principal components:
$$C_N = \sum_{i=1}^{r} \hat{C}_i^N = \hat{C}_1^N + \hat{C}_2^N + \cdots + \hat{C}_r^N \tag{17}$$
The reconstructed (extracted) series will be used to forecast new data points.
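Diagonal averaging is simple to state in code. The sketch below is illustrative only; the function name `diagonal_average` and the small example matrix are assumptions. Each row of the matrix is added to the output positions it covers, and every position is then divided by the number of matrix elements on its anti-diagonal.

```python
import numpy as np

def diagonal_average(matrix):
    """Average a matrix over its anti-diagonals, giving a series of length rows + cols - 1."""
    matrix = np.asarray(matrix, dtype=float)
    rows, cols = matrix.shape
    total = np.zeros(rows + cols - 1)
    count = np.zeros(rows + cols - 1)
    for i in range(rows):
        # Row i contributes element (i, j) to series position i + j.
        total[i:i + cols] += matrix[i]
        count[i:i + cols] += 1
    return total / count

# A small non-Hankel matrix: the second value is (2 + 4) / 2, the third (3 + 5) / 2.
example = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])
averaged = diagonal_average(example)
```

Applied to a matrix that is already Hankel, the function returns the underlying series unchanged, which is a convenient sanity check.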
The third stage of the algorithm concerns forecasting the future states of the groundwater level and is based on the linear recurrent formulae. Let $U_i^{\nabla}$ denote the vector of the first $L - 1$ coordinates of the eigenvector $U_i$ and $\pi_i$ the last coordinate of the eigenvector $U_i$, $i = 1, 2, \ldots, r$. Define the verticality coefficient as
$$v^2 = \sum_{i=1}^{r} \pi_i^2 = \pi_1^2 + \pi_2^2 + \cdots + \pi_r^2 \tag{18}$$
If $v^2 < 1$, then the h-step ahead SSA forecasting exists. The value of $r$ must therefore be carefully selected to satisfy this inequality, as well as to separate the signal from the noise components. The main concept behind the choice of $r$ is related to the dependence between the different reconstructed (principal) components [28]. The weighted correlation represents the level of dependence between the two series $\hat{C}_1^N$ and $\hat{C}_2^N$:
$$\rho_{1,2}^{w} = \frac{\left| \left\langle \hat{C}_1^N, \hat{C}_2^N \right\rangle_w \right|}{\left\| \hat{C}_1^N \right\|_w \left\| \hat{C}_2^N \right\|_w} \tag{19}$$
where
  • $\left| \left\langle \hat{C}_1^N, \hat{C}_2^N \right\rangle_w \right|$: absolute value of the weighted Frobenius inner product,
  • $\left\| \hat{C}_i^N \right\|_w$, $i = 1, 2, \ldots, r$: the weighted norm,
  • $w(t) = \min(t, L, N - t)$: vector of weights.
If the two reconstructed components have zero w-correlation, it means that these two components are well separated. Large values of w-correlations between the reconstructed components indicate that the components should possibly be gathered into one group and correspond to the same component in SSA decomposition [28]. The obtained correlations can be effectively represented by the $\lambda_L \times \lambda_L$ grey-scaled correlation matrix.
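The w-correlation can be sketched in a few lines of Python. The function name `w_correlation` and the test vectors are assumptions. One further assumption should be noted: the weights below count how many times each series element appears in the trajectory matrix, that is $\min(t, L^*, N - t + 1)$ for one-based $t$ with $L^* = \min(L, K)$, which differs by one in the last term from the weight expression as printed above.

```python
import numpy as np

def w_correlation(x, y, window):
    """Absolute weighted correlation between two reconstructed series of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    l_star = min(window, n - window + 1)
    t = np.arange(1, n + 1)
    # Assumed weights: number of occurrences of element t in the trajectory matrix.
    weights = np.minimum(np.minimum(t, l_star), n - t + 1)
    inner = np.sum(weights * x * y)
    norm_x = np.sqrt(np.sum(weights * x * x))
    norm_y = np.sqrt(np.sum(weights * y * y))
    return abs(inner) / (norm_x * norm_y)

# Two series with no overlap are perfectly separated; a scaled copy is fully correlated.
a = np.array([1.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 0.0])
separated = w_correlation(a, b, window=2)
identical = w_correlation(a, 2.0 * a, window=2)
```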
The linear vector of coefficients $\mathcal{R} = (\alpha_{L-1}, \alpha_{L-2}, \ldots, \alpha_1)^T$ is calculated as follows:
$$\mathcal{R} = \frac{1}{1 - v^2} \sum_{i=1}^{r} \pi_i U_i^{\nabla} \tag{20}$$
The h-step ahead SSA forecasting is achieved by the following equation:
$$\tilde{c}(t) = \begin{cases} \hat{c}(t), & t = 1, \ldots, N \\ \mathcal{R}^T C_h(t), & t = N + 1, \ldots, N + h \end{cases} \tag{21}$$
where
$$C_h(t) = \{ \tilde{c}(N - L + h + 1), \ldots, \tilde{c}(N + h - 1) \}^T \tag{22}$$
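The recurrent forecast can be sketched end to end in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `ssa_recurrent_forecast` and the noise-free linear test series are hypothetical, and the eigenvectors are obtained with `numpy.linalg.eigh`. The coefficient vector is ordered so that its first entry multiplies the oldest of the last $L - 1$ values and its last entry multiplies the most recent one.

```python
import numpy as np

def ssa_recurrent_forecast(series, window, r, steps):
    """Reconstruct a series from r leading components and extend it by `steps` values."""
    series = np.asarray(series, dtype=float)
    n = series.size
    k = n - window + 1
    Y = np.column_stack([series[j:j + window] for j in range(k)])

    eigvals, eigvecs = np.linalg.eigh(Y @ Y.T)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:r]]   # L x r matrix of leading eigenvectors

    # Reconstruction: project Y on the signal subspace, then average the anti-diagonals.
    Y_hat = U @ (U.T @ Y)
    total = np.zeros(n)
    count = np.zeros(n)
    for i in range(window):
        total[i:i + k] += Y_hat[i]
        count[i:i + k] += 1
    values = list(total / count)

    pi = U[-1, :]                                   # last coordinates of the eigenvectors
    v2 = float(np.sum(pi ** 2))                     # verticality coefficient
    if v2 >= 1.0:
        raise ValueError("verticality coefficient must be below 1")
    coeffs = (U[:-1, :] @ pi) / (1.0 - v2)          # (alpha_{L-1}, ..., alpha_1)

    for _ in range(steps):
        last = np.array(values[-(window - 1):])     # the L - 1 most recent values
        values.append(float(coeffs @ last))
    return np.array(values)

# Hypothetical noise-free linear series; two components describe it completely.
t = np.arange(20)
series = 2.0 + 0.5 * t
forecast = ssa_recurrent_forecast(series, window=5, r=2, steps=3)
```

A linear series satisfies a linear recurrence exactly, so this check should reproduce the observed values and continue the line, up to floating-point round-off. Real groundwater data contain noise, and the quality of the forecast then depends on the choice of $L$ and $r$.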
The accuracy of the proposed model is estimated by the mean absolute percentage error (MAPE) and the coefficient of determination (R2):
$$MAPE = \frac{1}{N} \sum_{t=1}^{N} \left( \frac{|s(t) - \tilde{c}(t)|}{s(t)} \right) \tag{23}$$
$$R^2 = 1 - \frac{\sum_{t=1}^{N} \left( s(t) - \tilde{c}(t) \right)^2}{\sum_{t=1}^{N} \left( s(t) - \bar{s}(t) \right)^2} \tag{24}$$
where $s(t)$ is the actual value, $\tilde{c}(t)$ is the forecasted value of the cluster center and $\bar{s}(t)$ is the average of the observed set. $R^2$ is a positive number which demonstrates how well the model fits the data. It can take values between zero and one, where zero indicates that there is a poor correlation between the model output and the actual data. Note that there is a difference between the actual value $\tilde{c}_m(t)$ and the forecasted value $\tilde{c}(t)$ of the cluster center. The sequence of the forecasted cluster centers is now transformed into a sequence of the actual centers by Equation (11): $\tilde{C}(t) = \{\tilde{c}(1), \ldots, \tilde{c}(N), \ldots, \tilde{c}(N + h)\} \rightarrow C_m(t) = \{c_m(1), \ldots, c_m(N), \ldots, c_m(N + h)\}$.
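Both accuracy measures are straightforward to compute. In the sketch below the function names `mape` and `r_squared` and the sample numbers are assumptions chosen so that the arithmetic can be followed by hand; `mape` returns a fraction, as Equation (23) contains no factor of 100.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast) / actual))

def r_squared(actual, forecast):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ss_res = np.sum((actual - forecast) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical values: errors of 10 and 20 on observations of 100 and 200.
error = mape([100.0, 200.0], [110.0, 180.0])
fit = r_squared([100.0, 200.0], [110.0, 180.0])
```

For these numbers each relative error is 0.1, so the MAPE is 0.1, and the residual and total sums of squares are 500 and 5000, giving $R^2 = 0.9$.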
According to the concept of the C-mean clustering algorithm, each fuzzy state can be defined as a triplet, $A(t) = [a_m^{\min}(t), c_m(t), a_m^{\max}(t)]$, $m \in \{1, 2, \ldots, M\}$, where $a_m^{\min}(t)$ is equal to the element of the cluster with the minimum value, $a_m^{\max}(t)$ is equal to the element with the maximum value and $c_m(t)$ has already been explained. Finally, the developed model produces the interval time series $A(t) = [a_m^{\min}(t), a_m^{\max}(t)]$, $m \in \{1, 2, \ldots, M\}$; $t = 1, 2, \ldots, N, \ldots, N + h$.
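The step from forecasted states to intervals can be sketched as a lookup. The function name `state_intervals`, the levels, the state labels and the forecasted state sequence below are all hypothetical; the idea is only that each state carries the minimum and maximum of the observations assigned to it, and a forecasted state is reported as that interval.

```python
import numpy as np

def state_intervals(levels, states):
    """Map each fuzzy state to the (min, max) of the observations assigned to it."""
    levels = np.asarray(levels, dtype=float)
    states = np.asarray(states, dtype=int)
    return {
        int(m): (float(levels[states == m].min()), float(levels[states == m].max()))
        for m in np.unique(states)
    }

# Hypothetical observations and the state assigned to each of them.
levels = [244.58, 244.62, 245.48, 245.55, 244.60]
states = [1, 1, 2, 2, 1]
intervals = state_intervals(levels, states)

# A hypothetical forecasted state sequence is translated into an interval series.
forecast_states = [2, 1]
forecast_intervals = [intervals[m] for m in forecast_states]
```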

3. Numerical Example

The groundwater source of Perminac was formed in the alluvion of the Zapadna Morava river, in the Zapadna Morava valley in the south-western region of the Republic of Serbia. Alluvial sediments are composed of sand and gravel varying from 4 to 6 m in thickness. The presence of a hydraulic connection to the Zapadna Morava river enables the intensive recharge of the aquifer. The groundwater source was formed along the left bank of the river, upstream from the town of Čačak. The location of the study area is represented by Figure 1.
The data used in this paper consist of a weekly groundwater level time series. We divided the data into a training subset, on which the model is built, and a validation subset, on which the forecasted and actual values are compared. About 85% of the data (weeks 1–45) was used for training, while about 15% (weeks 46–52) was used for validation. This division was chosen primarily because of a lack of funds for a longer period of exploration; the monitoring lasted only one year. By dividing the data in this way, we wanted to be sure about the confidence of the model. Usually, 2/3 of the data is used for training and 1/3 for validation.
The observed data is represented in Table 1.
We used the exponent ω = 2 and seven clusters to partition the original time series (from week 1 to 45) and the resulting cluster centroids were as follows: C={c1,c2,c3,c4,c5,c6,c7}={244.603;244.994;245.162;245.329;245.552;245.740;246.641}. Next, the historical data was fuzzified with respect to where the maximum membership degree occurred. For example, the fuzzy state for week 7 was A5 because c5 had the greatest membership degree. Table 2 and Figure 2 and Figure 3 give the results of the fuzzification of the data based on the application of the fuzzy C-mean clustering algorithm.
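The assignment of a state by maximum membership can be sketched with the seven centroids quoted above. The function name `fuzzy_state` and the level 245.50 are assumptions (the level is not taken from Table 1), and the membership uses the standard squared-distance form of the FCM update with $\omega = 2$.

```python
import numpy as np

# Cluster centers reported in the text for the training weeks 1-45.
CENTERS = np.array([244.603, 244.994, 245.162, 245.329, 245.552, 245.740, 246.641])

def fuzzy_state(level, centers=CENTERS, omega=2.0):
    """Return (state number, membership vector) for one observed level."""
    # Squared distances to the centers; a tiny floor avoids division by zero.
    d2 = np.maximum((level - centers) ** 2, 1e-12)
    inv = d2 ** (-1.0 / (omega - 1.0))
    u = inv / inv.sum()
    # States are numbered from 1, so A1 corresponds to index 0.
    return int(np.argmax(u)) + 1, u

# 245.50 is a hypothetical level used only to illustrate the lookup.
state, memberships = fuzzy_state(245.50)
```

For this hypothetical level the nearest center is c5 = 245.552, so the sketch assigns state A5; the memberships of the remaining centers are smaller and together with it sum to one.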
Having obtained the sequence of fuzzy state transitions, we can search for the relation which describes it. For that purpose, we performed an SSA decomposition of the cluster center time series. The window length L in the SSA decomposition was set to 23, which gives K = 23 as well. The initial cluster center time series was decomposed into 22 principal components, ordered by decreasing eigenvalue. Figure 4 depicts the plot of the logarithms of the 22 singular values. Here, a significant drop in the logarithm values occurs around component 9, and we adopted this as the start point of the noise floor.
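As a quick check of these dimensions, taking the $N = 45$ training weeks and $L = 23$ in the relation $K = N - L + 1$ from the embedding step gives:

```latex
K = N - L + 1 = 45 - 23 + 1 = 23, \qquad L = \frac{N + 1}{2} = \frac{45 + 1}{2} = 23
```

The trajectory matrix is therefore square ($23 \times 23$), and the window length matches the usual choice $(N + 1)/2$ for odd $N$.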
Figure 5 represents the eigenvectors related to the first nine eigenvalues.
A matrix of weighted grey-scaled correlations between the 22 principal components is represented by Figure 6. The first nine principal components were selected for the reconstruction stage (see Figure 7).
The reconstruction of the original time series $C_N$ ($N = 45$) using the first nine principal components ($r = 9$) is represented by Table 3 and Figure 8.
Bold letters indicate the difference between the original and reconstructed fuzzy state for the training subset of data (weeks 1–45). In ten cases, the model missed the original fuzzy state within a range of ±1 state, while the difference was ±2 states in only one case.
The accuracy of the proposed model is estimated by Equations (23) and (24) and represented by Table 4.
The results represented in Table 4 indicate that the model has a very high accuracy and can be used for the forecasting of future states. We used Equation (21) over the period t = 46, 47,…, 52 for the purpose of measuring the validity of the developed model; the results are represented in Table 5.
Bold letters indicate the difference between the original and forecasted fuzzy state for the validation subset of data (weeks 46–52). Only in two cases did the model miss the original fuzzy state within a range of ±1 state. The accuracy of the model for the period of validation is represented by Table 6.
When applying Equation (21), we forecasted the future values of the groundwater level for t = 53, 54,…, 85 (see Figure 9).

4. Conclusions

Having the ability to forecast the future states of any system plays a key role in the planning process. The main aim of this paper was to develop a forecasting model for future states of the groundwater level (the general drawdown) using data obtained during the period of exploitation.
The model is composed of two stages. In the first stage, we make fuzzy states of the monitored data, while in the second, we forecast future states. Using a fuzzy C-mean clustering algorithm, the original time series is divided into an adequate number of fuzzy states. After that, an adequate number of fuzzy time series are created. In many cases, creating the fuzzy relations among the fuzzy time series is a very difficult task. In order to avoid this, the fuzzy time series is represented by an adequate cluster of time series, where each cluster is defined by its center, minimum and maximum value. This approach enables us to apply a deterministic forecasting model based on a singular spectrum analysis.
The validation of the developed hybrid model has been performed using real data obtained by monitoring the groundwater level. The values of the mean absolute percentage error and the coefficient of determination show the high accuracy of the developed model. The application of the model is not limited to the numerical example presented here. It can be used to forecast the future states of any time series in hydrogeology, for example precipitation, the yield of a groundwater source, and the inflow or outflow of a defined area, using different time spans (day, week, month, year).
The forecasted states of the flow or groundwater level that can be obtained by the application of this model enable us to set up state boundary conditions for water supply planners more efficiently. Further research will be focused on the creation of the multivariable forecasting model.

Acknowledgments

Our gratitude goes to the Ministry of Education, Science and Technological Development of the Republic of Serbia for financing projects “OI176022”, “TR33039” and “III43004”.

Author Contributions

All authors contributed to the finalization of the paper by joint efforts: D.P. designed the methodology and wrote the final manuscript with Z.G. and D.B. The research plan, fieldwork and initial data analysis were conducted by Č.C.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bear, J. Hydraulics of Groundwater; McGraw-Hill: New York, NY, USA, 1979.
  2. He, B.; Takase, K.; Wang, Y. Regional groundwater prediction model using automatic parameter calibration SCE method for a coastal plain of Seto Inland Sea. Water Resour. Manag. 2007, 21, 947–959.
  3. Chang, F.; Chang, L.; Huang, C.; Kao, I. Prediction of monthly regional groundwater levels through hybrid soft-computing techniques. J. Hydrol. 2016, 541, 965–976.
  4. Raghavendra, S.; Deka, P.C. Forecasting monthly groundwater level fluctuations in coastal aquifers using hybrid Wavelet packet-Support vector regression. Cogent Eng. 2005.
  5. Emamgholizadeh, S.; Moslemi, K.; Karami, G. Prediction the Groundwater Level of Bastam Plain (Iran) by Artificial Neural Network (ANN) and Adaptive Neuro-Fuzzy Inference System (ANFIS). Water Resour. Manag. 2014, 28, 5433–5446.
  6. Sahoo, S.; Madan, K.J. Groundwater-level prediction using multiple linear regression and artificial neural network techniques: A comparative assessment. Hydrogeol. J. 2013, 21, 1865–1887.
  7. Polomčić, D.; Bajić, D. Application of Groundwater modeling for designing a dewatering system: Case study of the Buvač Open Cast Mine, Bosnia and Herzegovina. Geol. Croat. 2015, 68, 123–137.
  8. Barco, J.; Hogue, T.S.; Girotto, M.; Kendall, D.R.; Putti, M. Climate signal propagation in southern California aquifers. Water Resour. Res. 2010.
  9. Cao, G.; Zheng, C. Signals of short-term climatic periodicities detected in the groundwater of North China Plain. Hydrol. Process. 2015, 30, 515–533.
  10. Dickinson, J.E.; Hanson, R.T.; Ferré, T.P.A.; Leake, S.A. Inferring time-varying recharge from inverse analysis of long-term water levels. Water Resour. Res. 2004.
  11. James, S.C.; Doherty, J.E.; Eddebbarh, A. Practical Postcalibration Uncertainty Analysis: Yucca Mountain, Nevada. Groundwater 2009, 47, 851–869.
  12. Markovic, D.; Koch, M. Stream response to precipitation variability: A spectral view based on analysis and modelling of hydrological cycle components. Hydrol. Process. 2014, 29, 1806–1816.
  13. Masbruch, M.D.; Rumsey, C.A.; Gangopadhyay, S.; Susong, D.D.; Pruitt, T. Analyses of infrequent (quasi-decadal) large groundwater recharge events in the northern Great Basin: Their importance for groundwater availability, use, and management. Water Resour. Res. 2016, 52, 7819–7836.
  14. Shun, T.; Duffy, C.J. Low-frequency oscillations in precipitation, temperature, and runoff on a west facing mountain front: A hydrogeologic interpretation. Water Resour. Res. 1999, 35, 191–201.
  15. Tiwari, R.K.; Rajesh, R. Imprint of long-term solar signal in groundwater recharge fluctuation rates from Northwest China. Geophys. Res. Lett. 2014, 41, 3103–3109.
  16. Yiou, P.; Baert, E.; Loutre, M.F. Spectral analysis of climate data. Surv. Geophys. 1996, 17, 619–663.
  17. Zhang, Q.; Wang, B.; He, B.; Peng, Y.; Ren, M. Singular Spectrum Analysis and ARIMA Hybrid Model for Annual Runoff Forecasting. Water Resour. Manag. 2011, 25, 2683–2703.
  18. Song, Q.; Chissom, B.S. Fuzzy time series and its models. Fuzzy Sets Syst. 1993, 54, 269–277.
  19. Li, S.-T.; Cheng, Y.-C.; Lin, S.-Y. A FCM-based deterministic forecasting model for fuzzy time series. Comput. Math. Appl. 2008, 56, 3052–3063.
  20. Bezdek, J.C.; Ehrlich, R.; Full, W. FCM: The fuzzy c-means clustering algorithm. Comput. Geosci. 1984, 10, 191–203.
  21. Lu, Y.H.; Ma, T.H.; Yin, C.H.; Xie, X.Y.; Tian, W.; Zhong, S.M. Implementation of the Fuzzy C-Means Clustering Algorithm in Meteorogical Data. Int. J. Database Theory Appl. 2013, 6, 1–18.
  22. Wang, X.; Wang, Y.; Wang, L. Improving fuzzy c-means clustering based on feature-weight learning. Pattern Recognit. Lett. 2004, 25, 1123–1132.
  23. Chang, C.T.; Lai, J.Z.C.; Jeng, M.D. A fuzzy K-means clustering algorithm using cluster center displacement. J. Inf. Sci. Eng. 2011, 27, 995–1009.
  24. Pal, N.R.; Bezdek, J.C. On cluster validity for fuzzy c-means model. IEEE Trans. Fuzzy Syst. 1995, 1, 370–379.
  25. Hassani, H.; Zhigljavsky, A. Singular Spectrum Analysis: Methodology and Application to Economics Data. J. Syst. Sci. Complex. 2009, 22, 372–394.
  26. Harris, T.J.; Yuan, H. Filtering and frequency interpretations of Singular Spectrum Analysis. Physica D 2010, 239, 1958–1967.
  27. Hassani, H.; Mahmoudvand, R. Multivariate Singular Spectrum Analysis: A general view and new vector forecasting approach. Int. J. Energy Stat. 2013, 1, 55–83. [Google Scholar] [CrossRef]
  28. Hassani, H. Singular Spectrum Analysis: Methodology and Comparison. J. Data Sci. 2007, 5, 239–257. [Google Scholar]
Figure 1. Location of the groundwater source.
Figure 2. Surface plot of membership functions by a fuzzy C-mean algorithm for the observed data (training subset).
Figure 3. Interval plot by a fuzzy C-mean algorithm for the observed data (training subset).
Figure 4. Logarithms of the 22 eigenvalues.
Figure 5. One-dimensional plots of the first nine eigenvectors.
Figure 6. The weighted grey-scaled correlation matrix; the white color corresponds to zero values; the black color corresponds to absolute values equal to 1.
Figure 7. Principal components obtained by singular spectrum analysis (SSA) decomposition (horizontal axis: week; vertical axis: level).
Figure 8. Interval plot by the SSA algorithm for the observed data (training subset).
Figure 9. Interval plot by the SSA forecasting algorithm.
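Table 1. Historical data of the groundwater level.

| Week | Level | Week | Level | Week | Level | Week | Level |
|---|---|---|---|---|---|---|---|
| 1 | 245.384 | 14 | 244.725 | 27 | 245.539 | 40 | 245.335 |
| 2 | 245.173 | 15 | 244.918 | 28 | 245.389 | 41 | 245.161 |
| 3 | 246.492 | 16 | 244.810 | 29 | 245.253 | 42 | 245.078 |
| 4 | 246.806 | 17 | 244.564 | 30 | 245.134 | 43 | 244.980 |
| 5 | 246.676 | 18 | 244.458 | 31 | 245.021 | 44 | 245.057 |
| 6 | 245.776 | 19 | 244.547 | 32 | 245.559 | 45 | 245.303 |
| 7 | 245.571 | 20 | 244.515 | 33 | 245.932 | 46 | 245.584 |
| 8 | 245.663 | 21 | 244.639 | 34 | 245.977 | 47 | 245.694 |
| 9 | 245.648 | 22 | 245.386 | 35 | 245.735 | 48 | 245.785 |
| 10 | 245.304 | 23 | 245.698 | 36 | 245.539 | 49 | 245.794 |
| 11 | 245.162 | 24 | 245.311 | 37 | 245.489 | 50 | 245.551 |
| 12 | 245.002 | 25 | 245.244 | 38 | 245.552 | 51 | 245.400 |
| 13 | 244.747 | 26 | 245.547 | 39 | 245.556 | 52 | 245.159 |

Note: Weeks 46–52 form the validation subset (set in bold in the original table).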
Table 2. Fuzzification of the observed data (training subset). Columns c1–c7 are the membership values.

| Week No. | c1 | c2 | c3 | c4 | c5 | c6 | c7 | Cluster center | Interval [min; max] | Fuzzy state Am |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0353 | 0.0708 | 0.1244 | 0.5061 | 0.1638 | 0.0776 | 0.0220 | 245.329 | [245.253; 245.389] | A4 |
| 2 | 0.0150 | 0.0480 | 0.8394 | 0.0543 | 0.0225 | 0.0150 | 0.0058 | 245.162 | [245.133; 245.244] | A3 |
| 3 | 0.0443 | 0.0559 | 0.0630 | 0.0720 | 0.0891 | 0.1113 | 0.5643 | 246.641 | [246.492; 246.806] | A7 |
| 4 | 0.0450 | 0.0548 | 0.0603 | 0.0672 | 0.0791 | 0.0930 | 0.6005 | 246.641 | [246.492; 246.806] | A7 |
| 5 | 0.0146 | 0.0179 | 0.0199 | 0.0224 | 0.0269 | 0.0322 | 0.8661 | 246.641 | [246.492; 246.806] | A7 |
| 6 | 0.0216 | 0.0324 | 0.0413 | 0.0568 | 0.1135 | 0.7052 | 0.0293 | 245.740 | [245.648; 245.976] | A6 |
| 7 | 0.0144 | 0.0242 | 0.0341 | 0.0577 | 0.7744 | 0.0822 | 0.0130 | 245.552 | [245.488; 245.570] | A5 |
| 8 | 0.0309 | 0.0490 | 0.0654 | 0.0981 | 0.2959 | 0.4272 | 0.0335 | 245.740 | [245.648; 245.976] | A6 |
| 9 | 0.0318 | 0.0509 | 0.0685 | 0.1044 | 0.3480 | 0.3628 | 0.0335 | 245.740 | [245.648; 245.976] | A6 |
| 10 | 0.0250 | 0.0567 | 0.1240 | 0.6708 | 0.0703 | 0.0401 | 0.0131 | 245.329 | [245.253; 245.389] | A4 |
| 11 | 0.0010 | 0.0034 | 0.9893 | 0.0034 | 0.0015 | 0.0010 | 0.0004 | 245.162 | [245.133; 245.244] | A3 |
| 12 | 0.0169 | 0.8950 | 0.0420 | 0.0206 | 0.0122 | 0.0091 | 0.0041 | 244.994 | [244.810; 245.078] | A2 |
| 13 | 0.3888 | 0.2257 | 0.1345 | 0.0959 | 0.0693 | 0.0563 | 0.0295 | 244.603 | [244.458; 244.747] | A1 |
| 14 | 0.4420 | 0.1997 | 0.1231 | 0.0890 | 0.0650 | 0.0530 | 0.0281 | 244.603 | [244.458; 244.747] | A1 |
| 15 | 0.1219 | 0.4989 | 0.1568 | 0.0931 | 0.0604 | 0.0466 | 0.0223 | 244.994 | [244.810; 245.078] | A2 |
| 16 | 0.2692 | 0.3011 | 0.1578 | 0.1070 | 0.0749 | 0.0598 | 0.0304 | 244.994 | [244.810; 245.078] | A2 |
| 17 | 0.7671 | 0.0707 | 0.0509 | 0.0398 | 0.0308 | 0.0259 | 0.0147 | 244.603 | [244.458; 244.747] | A1 |
| 18 | 0.5110 | 0.1384 | 0.1055 | 0.0852 | 0.0679 | 0.0579 | 0.0340 | 244.603 | [244.458; 244.747] | A1 |
| 19 | 0.7030 | 0.0891 | 0.0648 | 0.0510 | 0.0397 | 0.0334 | 0.0191 | 244.603 | [244.458; 244.747] | A1 |
| 20 | 0.6147 | 0.1130 | 0.0837 | 0.0665 | 0.0522 | 0.0442 | 0.0255 | 244.603 | [244.458; 244.747] | A1 |
| 21 | 0.7641 | 0.0765 | 0.0520 | 0.0394 | 0.0298 | 0.0247 | 0.0136 | 244.603 | [244.458; 244.747] | A1 |
| 22 | 0.0359 | 0.0717 | 0.1254 | 0.4969 | 0.1685 | 0.0793 | 0.0224 | 245.329 | [245.253; 245.389] | A4 |
| 23 | 0.0235 | 0.0365 | 0.0479 | 0.0697 | 0.1763 | 0.6189 | 0.0273 | 245.740 | [245.648; 245.976] | A6 |
| 24 | 0.0200 | 0.0448 | 0.0955 | 0.7378 | 0.0584 | 0.0329 | 0.0106 | 245.329 | [245.253; 245.389] | A4 |
| 25 | 0.0440 | 0.1130 | 0.3447 | 0.3298 | 0.0914 | 0.0569 | 0.0202 | 245.162 | [245.133; 245.244] | A3 |
| 26 | 0.0059 | 0.0101 | 0.0145 | 0.0257 | 0.9100 | 0.0288 | 0.0051 | 245.552 | [245.488; 245.570] | A5 |
| 27 | 0.0118 | 0.0203 | 0.0293 | 0.0527 | 0.8210 | 0.0550 | 0.0100 | 245.552 | [245.488; 245.570] | A5 |
| 28 | 0.0366 | 0.0729 | 0.1268 | 0.4829 | 0.1759 | 0.0820 | 0.0230 | 245.329 | [245.253; 245.389] | A4 |
| 29 | 0.0432 | 0.1087 | 0.3098 | 0.3665 | 0.0937 | 0.0577 | 0.0202 | 245.329 | [245.253; 245.389] | A4 |
| 30 | 0.0351 | 0.1337 | 0.6489 | 0.0949 | 0.0444 | 0.0307 | 0.0123 | 245.162 | [245.133; 245.244] | A3 |
| 31 | 0.0441 | 0.6939 | 0.1305 | 0.0597 | 0.0347 | 0.0256 | 0.0114 | 244.994 | [244.810; 245.078] | A2 |
| 32 | 0.0056 | 0.0095 | 0.0136 | 0.0235 | 0.9131 | 0.0297 | 0.0050 | 245.552 | [245.488; 245.570] | A5 |
| 33 | 0.0537 | 0.0760 | 0.0926 | 0.1183 | 0.1879 | 0.3709 | 0.1006 | 245.740 | [245.648; 245.976] | A6 |
| 34 | 0.0577 | 0.0808 | 0.0974 | 0.1226 | 0.1871 | 0.3351 | 0.1194 | 245.740 | [245.648; 245.976] | A6 |
| 35 | 0.0044 | 0.0068 | 0.0088 | 0.0124 | 0.0276 | 0.9345 | 0.0055 | 245.740 | [245.648; 245.976] | A6 |
| 36 | 0.0121 | 0.0208 | 0.0301 | 0.0541 | 0.8164 | 0.0563 | 0.0103 | 245.552 | [245.488; 245.570] | A5 |
| 37 | 0.0342 | 0.0613 | 0.0928 | 0.1903 | 0.4745 | 0.1206 | 0.0263 | 245.552 | [245.488; 245.570] | A5 |
| 38 | 0.0004 | 0.0007 | 0.0010 | 0.0017 | 0.9939 | 0.0020 | 0.0003 | 245.552 | [245.488; 245.570] | A5 |
| 39 | 0.0032 | 0.0055 | 0.0078 | 0.0135 | 0.9506 | 0.0166 | 0.0028 | 245.552 | [245.488; 245.570] | A5 |
| 40 | 0.0063 | 0.0136 | 0.0268 | 0.9171 | 0.0212 | 0.0114 | 0.0035 | 245.329 | [245.253; 245.389] | A4 |
| 41 | 0.0021 | 0.0071 | 0.9780 | 0.0070 | 0.0030 | 0.0020 | 0.0008 | 245.162 | [245.133; 245.244] | A3 |
| 42 | 0.0616 | 0.3513 | 0.3464 | 0.1162 | 0.0616 | 0.0442 | 0.0187 | 244.994 | [244.810; 245.078] | A2 |
| 43 | 0.0325 | 0.8208 | 0.0670 | 0.0349 | 0.0213 | 0.0161 | 0.0074 | 244.994 | [244.810; 245.078] | A2 |
| 44 | 0.0621 | 0.4504 | 0.2681 | 0.1035 | 0.0569 | 0.0413 | 0.0178 | 244.994 | [244.810; 245.078] | A2 |
| 45 | 0.0251 | 0.0569 | 0.1246 | 0.6696 | 0.0705 | 0.0402 | 0.0131 | 245.329 | [245.253; 245.389] | A4 |
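
The membership values in Table 2 can be related to the cluster centers through the standard fuzzy C-means membership formula. The sketch below is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the fuzzifier value of 3 is an assumption, since this section does not state the value used. Under that assumption, the week 1 level of 245.384 gives memberships close to the first row of Table 2, and the fuzzy state is taken as the cluster with the largest membership.

```python
# Cluster centers for fuzzy states A1..A7, copied from Table 2.
CENTERS = [244.603, 244.994, 245.162, 245.329, 245.552, 245.740, 246.641]


def fcm_memberships(x, centers, fuzzifier=3.0):
    """Standard FCM membership of one observation x in each cluster.

    fuzzifier=3.0 is an assumed value, not one stated in the tables.
    """
    exponent = 2.0 / (fuzzifier - 1.0)
    dists = [abs(x - c) for c in centers]
    # An observation lying exactly on a center belongs fully to that cluster.
    if 0.0 in dists:
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    inverse = [d ** -exponent for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuzzy_state(memberships):
    """Label of the cluster with the largest membership, e.g. 'A4'."""
    best = max(range(len(memberships)), key=memberships.__getitem__)
    return f"A{best + 1}"


u = fcm_memberships(245.384, CENTERS)  # week 1 level from Table 1
print([round(v, 4) for v in u], fuzzy_state(u))
```

Because the centers are rounded to three decimals, small differences from the tabulated memberships are to be expected.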
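Table 3. Reconstructed original time series.

| Week | $s(t)$ | $A_m$ | $c_m$ | $\hat{c}(t)$ | $\hat{c}_m(t)$ | $\hat{A}_m$ |
|---|---|---|---|---|---|---|
| 1 | 245.384 | A4 | 245.329 | 245.212 | 245.162 | A3 |
| 2 | 245.173 | A3 | 245.162 | 245.425 | 245.329 | A4 |
| 3 | 246.492 | A7 | 246.641 | 246.355 | 246.641 | A7 |
| 4 | 246.806 | A7 | 246.641 | 246.848 | 246.641 | A7 |
| 5 | 246.676 | A7 | 246.641 | 246.656 | 246.641 | A7 |
| 6 | 245.776 | A6 | 245.740 | 245.805 | 245.740 | A6 |
| 7 | 245.571 | A5 | 245.552 | 245.478 | 245.552 | A5 |
| 8 | 245.663 | A6 | 245.740 | 245.697 | 245.740 | A6 |
| 9 | 245.648 | A6 | 245.740 | 245.624 | 245.552 | A5 |
| 10 | 245.304 | A4 | 245.329 | 245.291 | 245.329 | A4 |
| 11 | 245.162 | A3 | 245.162 | 245.229 | 245.162 | A3 |
| 12 | 245.002 | A2 | 244.994 | 245.090 | 245.162 | A3 |
| 13 | 244.747 | A1 | 244.603 | 244.631 | 244.603 | A1 |
| 14 | 244.725 | A1 | 244.603 | 244.563 | 244.603 | A1 |
| 15 | 244.918 | A2 | 244.994 | 244.971 | 244.994 | A2 |
| 16 | 244.810 | A2 | 244.994 | 244.987 | 244.994 | A2 |
| 17 | 244.564 | A1 | 244.603 | 244.576 | 244.603 | A1 |
| 18 | 244.458 | A1 | 244.603 | 244.478 | 244.603 | A1 |
| 19 | 244.547 | A1 | 244.603 | 244.621 | 244.603 | A1 |
| 20 | 244.515 | A1 | 244.603 | 244.626 | 244.603 | A1 |
| 21 | 244.639 | A1 | 244.603 | 244.825 | 244.994 | A2 |
| 22 | 245.386 | A4 | 245.329 | 245.339 | 245.329 | A4 |
| 23 | 245.698 | A6 | 245.740 | 245.496 | 245.552 | A5 |
| 24 | 245.311 | A4 | 245.329 | 245.255 | 245.329 | A4 |
| 25 | 245.244 | A3 | 245.162 | 245.246 | 245.329 | A4 |
| 26 | 245.547 | A5 | 245.552 | 245.510 | 245.552 | A5 |
| 27 | 245.539 | A5 | 245.552 | 245.540 | 245.552 | A5 |
| 28 | 245.389 | A4 | 245.329 | 245.367 | 245.329 | A4 |
| 29 | 245.253 | A4 | 245.329 | 245.329 | 245.329 | A4 |
| 30 | 245.134 | A3 | 245.162 | 245.242 | 245.162 | A3 |
| 31 | 245.021 | A2 | 244.994 | 245.059 | 244.994 | A2 |
| 32 | 245.559 | A5 | 245.552 | 245.206 | 245.162 | A3 |
| 33 | 245.932 | A6 | 245.740 | 245.647 | 245.740 | A6 |
| 34 | 245.977 | A6 | 245.740 | 245.843 | 245.740 | A6 |
| 35 | 245.735 | A6 | 245.740 | 245.724 | 245.740 | A6 |
| 36 | 245.539 | A5 | 245.552 | 245.636 | 245.552 | A5 |
| 37 | 245.489 | A5 | 245.552 | 245.592 | 245.552 | A5 |
| 38 | 245.552 | A5 | 245.552 | 245.453 | 245.552 | A5 |
| 39 | 245.556 | A5 | 245.552 | 245.341 | 245.329 | A4 |
| 40 | 245.335 | A4 | 245.329 | 245.301 | 245.329 | A4 |
| 41 | 245.161 | A3 | 245.162 | 245.174 | 245.162 | A3 |
| 42 | 245.078 | A2 | 244.994 | 244.972 | 244.994 | A2 |
| 43 | 244.980 | A2 | 244.994 | 244.938 | 244.994 | A2 |
| 44 | 245.057 | A2 | 244.994 | 245.165 | 245.162 | A3 |
| 45 | 245.303 | A4 | 245.329 | 245.452 | 245.552 | A5 |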
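
In Table 3, each reconstructed value appears to be assigned to the fuzzy state whose cluster center lies nearest to it. The sketch below illustrates that assignment under a nearest-center assumption; the rule is inferred from the tabulated values rather than stated in this section, and the function name is hypothetical.

```python
# Cluster centers per fuzzy state, copied from the tables above.
CENTERS = {
    "A1": 244.603, "A2": 244.994, "A3": 245.162, "A4": 245.329,
    "A5": 245.552, "A6": 245.740, "A7": 246.641,
}


def snap_to_state(value, centers=CENTERS):
    """Return (state, center) for the cluster center closest to value."""
    state = min(centers, key=lambda s: abs(value - centers[s]))
    return state, centers[state]


# Reconstructed values for weeks 1 and 45 in Table 3.
for reconstructed in (245.212, 245.452):
    print(reconstructed, snap_to_state(reconstructed))
```

For a one-dimensional series, picking the nearest center gives the same state as picking the largest fuzzy C-means membership, so the two views are interchangeable here.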
Table 4. Accuracy of the model.

| Error | MAPE (%) | R2 |
|---|---|---|
| $E = f(s, \hat{c})$ | 0.000382 | 0.943 |
| $E = f(s, \hat{c}_m)$ | 0.000404 | 0.931 |
Table 5. Validation of the original series.

| Week | $s(t)$ | $A_m$ | $c_m$ | $\tilde{c}(t)$ | $\tilde{c}_m(t)$ | $\tilde{A}_m$ |
|---|---|---|---|---|---|---|
| 46 | 245.584 | A5 | 245.552 | 245.646 | 245.552 | A5 |
| 47 | 245.694 | A6 | 245.740 | 245.736 | 245.740 | A6 |
| 48 | 245.785 | A6 | 245.740 | 245.697 | 245.740 | A6 |
| 49 | 245.794 | A6 | 245.740 | 245.519 | 245.552 | A5 |
| 50 | 245.551 | A5 | 245.552 | 245.345 | 245.329 | A4 |
| 51 | 245.400 | A4 | 245.329 | 245.280 | 245.329 | A4 |
| 52 | 245.159 | A3 | 245.162 | 245.208 | 245.162 | A3 |
Table 6. Error of the validation.

| Error | MAPE (%) | R2 |
|---|---|---|
| $E = f(s, \tilde{c})$ | 0.000490 | 0.522 |
| $E = f(s, \tilde{c}_m)$ | 0.000384 | 0.649 |
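
The error measures in Tables 4 and 6 can be recomputed from the tabulated series. The sketch below uses the validation data of Table 5 and rests on two assumptions, since neither definition is given in this section: MAPE is computed as a fraction (not multiplied by 100), and R2 is computed as 1 − SSE/SST. The function names are illustrative.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, returned as a fraction (assumed form)."""
    ratios = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return sum(ratios) / len(ratios)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SSE/SST (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Weeks 46-52 from Table 5: observed level and crisp SSA forecast.
observed = [245.584, 245.694, 245.785, 245.794, 245.551, 245.400, 245.159]
forecast = [245.646, 245.736, 245.697, 245.519, 245.345, 245.280, 245.208]

print(f"MAPE = {mape(observed, forecast):.6f}")
print(f"R2   = {r_squared(observed, forecast):.3f}")
```

Under these assumptions the output should land close to the first row of Table 6. The same two functions can be applied to the training columns of Table 3.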

Polomčić, D.; Gligorić, Z.; Bajić, D.; Cvijović, Č. A Hybrid Model for Forecasting Groundwater Levels Based on Fuzzy C-Mean Clustering and Singular Spectrum Analysis. Water 2017, 9, 541. https://doi.org/10.3390/w9070541
