1. Introduction
Consecutive droughts and increasing water demands mean that prediction and planning for the usage of precipitation are necessary for decision makers [1,2]. In addition, precipitation forecasting is necessary to prevent floods and to design flood-control structures. Moreover, precipitation forecasting is one of the main issues of water resources planning that can reduce the effects of drought [3]. Researchers have used various statistical and hydrological models to predict precipitation; however, precipitation patterns depend on various parameters and exhibit nonlinear behavior. These complexities result in uncertainties in precipitation forecasting models. In recent years, researchers have used soft computing models to forecast hydrological variables such as precipitation [4]. Models such as the artificial neural network (ANN), adaptive neuro-fuzzy inference system (ANFIS), support vector machine (SVM), improved regression models, and optimization algorithms are not only simple in structure and application but also widely applicable owing to their high precision and short computation time [5]. High flexibility and the ability to predict from different kinds of data are further characteristics of soft computing models. Kisi and Cimen [6] used a wavelet neural network model to forecast precipitation. Daily precipitation was forecasted with a hybrid support vector machine–wavelet transform approach. The results showed that the RMSE (root-mean-square error) obtained by the hybrid model was 50% lower than that of the simple support vector machine. Ramana et al. [7] used a wavelet neural network to forecast monthly precipitation. Input data included the maximum temperature, precipitation, and humidity of previous months. The results showed that the wavelet neural network model had a larger Nash–Sutcliffe efficiency (NSE) coefficient than the neural network and regression models. Shamshirband et al. [8] used a support vector machine and a neuro-fuzzy model to forecast monthly precipitation. The results showed that the neuro-fuzzy model significantly reduced the mean absolute error (MAE) compared to the support vector machine model. Kisi and Sanikhani [9] used an adaptive neuro-fuzzy inference system (ANFIS) with grid partition (GP), ANFIS with subtractive clustering (SC), a support vector machine, and a neural network to predict monthly and annual precipitation in Iran. The results showed that the highest precipitation occurs in the north, west, and southwest of Iran and the lowest in the east and southeast, which was modeled precisely by the ANFIS with grid partition. Shenify et al. [10] forecasted precipitation based on a wavelet neural network–support vector machine, genetic programming, and a neural network. Their results showed that the wavelet neural network–support vector machine model forecasts precipitation more accurately than the neural network and genetic programming. Amiri et al. [11] used ANN and wavelet neural network models for precipitation forecasting in the Aidoghmush basin in Iran. The results showed an increase in the R² coefficient and a decrease in the relative error (RE) of the wavelet neural network model compared to the ANN model. Du et al. [12] used a support vector machine hybridized with the particle swarm algorithm to forecast precipitation. The particle swarm algorithm increased the accuracy of the model by finding the optimal parameters of the support vector machine, so that the hybrid support vector–particle swarm model reduced the relative error by 20% compared to the simple support vector model.
Mirabbasi et al. [13] forecasted precipitation based on tree modeling, genetic programming, and the least squares support vector machine. The results showed that the RMSE values of the least squares support vector machine, genetic programming, and decision tree models were 13.96, 36.74, and 37.22 mm, respectively; therefore, the performance of the least squares support vector machine model was confirmed. Mehr et al. [14] used hybrid models of the support vector machine and firefly algorithm (FFA) to forecast precipitation in Iran. The results showed that the improved support vector machine increased the accuracy of the results and reduced the relative error by up to 30% compared to the simple support vector machine. Azad et al. [15] used a fuzzy model along with the particle swarm algorithm, the ant colony optimization algorithm, and the genetic algorithm to forecast precipitation. The results showed that the neuro fuzzy–ant colony optimization model not only decreases the relative error and mean absolute error (MAE) but also converges faster than the other models. Kumar et al. [16] used a recurrent neural network to forecast precipitation in India, considering monthly precipitation data from 1871 to 2016. The results showed the high accuracy of the recurrent neural network based on a high NSE coefficient. Another study used a hybrid wavelet–M5 model tree to forecast precipitation [17]. Hossain et al. [18] used neural network and regression models to forecast monthly precipitation. The results showed that the neural network outputs had a high correlation with the precipitation of Western Australia and a higher NSE coefficient than the multiple regression model. In any case, previous studies have shown that methods such as the neural network, fuzzy–neural network, and support vector machine have good capabilities to forecast precipitation [19,20,21,22,23,24]. Models such as the neural network and support vector machine perform well; however, they have unknown parameters whose values must be obtained before the precipitation forecast [25,26,27]. The number of hidden layers and neurons and the values of weight and bias must be quantified in neural networks, as must the parameters associated with the kernel function in the support vector machine, so that the accuracy of the forecasting models is acceptable. Although neural networks include training algorithms in their structure and other algorithms exist to calculate the parameters of the support vector machine, optimization and evolutionary algorithms are preferred due to their ease of use, high convergence rate, and high accuracy in finding the optimal solution.
It should be noted that the novelty of the current research is the introduction of a new precipitation prediction model utilizing an advanced machine learning approach. In fact, the proposed model is a further enhancement of a traditional machine learning model, obtained by improving its backpropagation algorithm with a nature-inspired optimization algorithm. In addition, a new form of model performance analysis, uncertainty analysis, has been introduced. Finally, the model outputs are presented as precipitation forecasting zone maps of the case study area, which is a new visual presentation for a forecasting model.
The present study trains new models of the multilayer neural network and the support vector machine based on the new flow regime optimization algorithm and then compares the results with the M5 tree model. These models are used to predict precipitation, and a comprehensive comparison is made between them considering the uncertainties of the models. The optimization algorithm is used to determine the weights, biases, number of hidden layers and neurons, and the parameters associated with the support vector model. The monthly temperature and precipitation are used as the inputs to the soft computing models.
Section 2 explains the materials and methods. Section 3 describes the case study and the data sources used in the current article. Section 4 presents the results and discussion. Finally, Section 5 presents the general conclusions and the next steps of the current research.
2. Materials and Methods
2.1. Support Vector Machine
The support vector machine is one of the most widely used hydrological simulation models, providing a simple structure and acceptable results. This model is a regression analysis used to predict or simulate time series. The linear form of the support vector machine is based on Equation (1) [23,24,25,26,27]:

$y = w^{Tr}x + b \quad (1)$

where $x$ is the input variable, $b$ is the bias, $w$ is the weight, and $Tr$ is the transpose. The calculation process of the support vector machine minimizes the difference between the observed and simulated data; that is, the support vector machine minimizes the error function of an optimization problem. If the prediction error values are within the permissible range ($\varepsilon$), the error is ignored. The mathematical form of the optimization problem is given by Equation (2) [28,29,30]:

$\min \; \frac{1}{2}\left\| w \right\|^{2} + C\sum_{i=1}^{N}\left( \xi_{i} + \xi_{i}^{*} \right) \quad \text{subject to} \quad \begin{cases} y_{i} - \left( w^{Tr}x_{i} + b \right) \leq \varepsilon + \xi_{i} \\ \left( w^{Tr}x_{i} + b \right) - y_{i} \leq \varepsilon + \xi_{i}^{*} \\ \xi_{i},\ \xi_{i}^{*} \geq 0 \end{cases} \quad (2)$
where $C$ is the penalty factor, $\xi_{i}$ and $\xi_{i}^{*}$ are the penalties for training data whose error falls outside the $\varepsilon$ range, $w$ is the weight, $x$ is the input variable, and $y$ is the output variable. The values of $w$ and $b$ are calculated from Equation (2) and then substituted into Equation (1). Multiple kernel functions exist for the support vector machine; according to previous studies, the radial kernel function (Equation (3)) is an effective and widely used function:

$K\left( x_{i}, x_{j} \right) = \exp\left( -\gamma\left\| x_{i} - x_{j} \right\|^{2} \right) \quad (3)$

where $K$ is the kernel function and $\gamma$ is the radial kernel parameter. In addition to the radial kernel parameter, $C$ and $\varepsilon$ are parameters that have to be calculated precisely to achieve a suitable prediction model. In the present study, an optimization method was used to obtain these parameters and increase the accuracy of the support vector machine model. The flow regime optimization algorithm was used to obtain the best values of the SVR parameters. The algorithm was created based on fluid flow concepts [30] and is introduced in Section 2.4.
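The paper does not specify a software implementation, but the role of the three SVR parameters can be illustrated with a short sketch. The following uses scikit-learn's `SVR` with an RBF kernel on synthetic stand-in data; the data, library choice, and parameter values are illustrative assumptions, not the authors' setup:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for the real inputs (e.g., lagged temperature/precipitation).
X = rng.uniform(0.0, 30.0, size=(120, 2))            # 120 months, 2 predictors
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0, 1, 120)

# Standardize the inputs before fitting the kernel model.
X_s = StandardScaler().fit_transform(X)

# C is the penalty factor, epsilon the tolerated error band, and gamma the
# radial kernel parameter -- the three quantities the optimizer must tune.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
model.fit(X_s, y)

pred = model.predict(X_s)
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```

In the hybrid model of Section 2.5, the triplet (C, epsilon, gamma) becomes a particle of the flow regime algorithm rather than being fixed by hand.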
2.2. Multilayer Perceptron (MLP)
Previous studies have shown that the multilayer neural network is one of the most effective and accurate tools for predicting hydrological variables. It contains input, hidden, and output layers. The network works based on Equation (5) [29]:

$y_{k} = \sum_{j=1}^{m} w_{j}\, f_{j}\!\left( \sum_{i=1}^{n} w_{ij}\, x_{i}(k) + b_{j} \right) + b \quad (5)$

where $y_{k}$ is the objective variable, $w_{j}$ and $w_{ij}$ are the weight coefficients of the neurons, $f_{j}$ is the nonlinear activation function of the $j$th neuron, $n$ is the number of neurons of the input layer, $m$ is the number of neurons of the hidden layer, $x_{i}(k)$ is the input variable at time $k$, $b$ is the bias of the output layer, and the superscript $T$ denotes the transpose when the weights are written in matrix form.
The performance of the multilayer neural network depends on a precise determination of the number of hidden layers, the number of hidden neurons, and the weight and bias values. Although multilayer neural networks include training algorithms such as Levenberg–Marquardt in their structure, the training time and accuracy of these algorithms do not reach an acceptable level in some problems. Starting from a point that may lie far from the ultimate minimum, the Levenberg–Marquardt algorithm iteratively minimizes the error measure; as with other training algorithms, the initial values of the neural network parameters are obtained from an initial guess. Studies have shown that neural network models attain high accuracy when an accurate and fast algorithm is used to find their unknown values. The present study therefore used the fluid flow regime algorithm in addition to the Levenberg–Marquardt algorithm to train the neural network. In fact, the aim of the paper is to develop an artificial neural network in which the flow regime optimization algorithm takes the place of a traditional training algorithm.
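As a concrete illustration of Equation (5), a single forward pass of a one-hidden-layer perceptron can be sketched as follows. The tanh activation, the layer sizes, and the random parameter values are illustrative assumptions; in the paper, these weights and biases are exactly the quantities tuned by the training algorithm:

```python
import numpy as np

def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """One forward pass of a single-hidden-layer perceptron:
    y = sum_j w_out[j] * tanh(W_hidden[j] . x + b_hidden[j]) + b_out."""
    hidden = np.tanh(W_hidden @ x + b_hidden)   # outputs of the m hidden neurons
    return float(w_out @ hidden + b_out)        # weighted sum plus output bias

rng = np.random.default_rng(1)
n, m = 3, 5                       # n input neurons, m hidden neurons
x = rng.normal(size=n)            # one input pattern at time k
W_hidden = rng.normal(size=(m, n))
b_hidden = rng.normal(size=m)
w_out = rng.normal(size=m)
b_out = 0.1

y = mlp_forward(x, W_hidden, b_hidden, w_out, b_out)
```

Because tanh is bounded, the output magnitude is bounded by the sum of the absolute output weights plus the output bias, which makes the role of the weight values in shaping predictions easy to see.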
2.3. Decision Tree Model
The decision tree model is a simple model without high complexity that achieves good accuracy and is widely used in the field of hydrological prediction [29]. The decision tree model creates a multivariate linear model for the data in each inner node. A two-stage method is used to build the decision tree: in the first stage, the tree is generated by splitting the input or output data into subsets. Applying a divide-and-conquer strategy, each node of the model is associated either with a leaf or with a test criterion that divides the data into subsets corresponding to the test outcomes. The splitting criterion is based on the standard deviation of the class values and the resulting decrease in error.
The standard deviation reduction is calculated based on Equation (6):

$SDR = sd(T) - \sum_{i} \frac{\left| T_{i} \right|}{\left| T \right|}\, sd\left( T_{i} \right) \quad (6)$

where $T$ is the set of samples that reach the node, $T_{i}$ is the $i$th subset produced by the split, $\left| T_{i} \right|$ is the number of data records in that subset, and $sd$ is the standard deviation. After all candidate splits have been examined, the tree model selects the split that maximizes the expected error reduction. A high number of divisions and branches causes overfitting in the tree model; therefore, a pruning stage is performed, in which the overgrown tree is cut back to smaller regions. In addition, based on a greedy algorithm, the model eliminates the variables that contribute little to the model.
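The standard deviation reduction criterion of Equation (6) can be sketched directly; the helper name `sdr` and the toy data below are illustrative:

```python
import numpy as np

def sdr(parent, subsets):
    """Standard deviation reduction used by the M5 split criterion:
    SDR = sd(T) - sum(|T_i|/|T| * sd(T_i))."""
    parent = np.asarray(parent, dtype=float)
    total = np.std(parent)
    weighted = sum(len(s) / len(parent) * np.std(np.asarray(s, dtype=float))
                   for s in subsets)
    return total - weighted

# A split that separates low values from high values reduces the deviation,
# so it would be preferred by the tree-growing stage.
values = [1.0, 1.1, 0.9, 9.0, 9.2, 8.8]
gain = sdr(values, [values[:3], values[3:]])
```

A degenerate "split" that keeps all samples together yields zero reduction, which is why the model only keeps splits with a positive expected error decrease.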
2.4. Flow Regime Optimization Algorithm
The flow regime optimization algorithm is one of the new optimization algorithms used in mathematical modeling, engineering problems, structural problems, and other optimization areas. It is known as a successful algorithm due to its fast convergence, the high accuracy of its results, its use of advanced operators to diversify solutions, and the great balance it creates between diversification and intensification [30]. The algorithm was created based on fluid flow concepts. According to the laws of fluid mechanics and hydraulics, fluid flow is divided into laminar and turbulent states based on the ratio of inertial to viscous forces. The laminar state of the flow regime corresponds to a local search and the turbulent state to a global search. The Reynolds number is responsible for determining the type of flow; in the present study, a number similar to the Reynolds number determines the type of search. Another hypothesis of the algorithm is that the optimal solution is the starting point of the movement of the fluid in the boundary layer. As the fluid flows over a surface, a velocity profile is formed due to the viscosity of the flow: the flow velocity is zero on the surface, and at the elevation known as the boundary layer thickness, it equals 99% of the free-stream velocity. Like other algorithms, the flow regime algorithm starts with an initial population; here, the fluid particles are the initial population (Equation (7)):

$X_{i} = \left\lbrack x_{1}, x_{2}, \ldots, x_{N_{var}} \right\rbrack \quad (7)$

where $X_{i}$ is the $i$th particle of the fluid and $N_{var}$ is the number of decision variables. The objective function is then calculated for each fluid particle, and the particle with the best objective function is selected as the global optimum. The best solutions replace the worst ones in each iteration of the algorithm. As the number of iterations increases, the algorithm shifts the search from the global state to the local one. To distinguish the local search from the global one, a number similar to the Reynolds number is used (Equation (8)):
$STF = \left( 1 - \frac{t}{T_{max}} \right) \times \frac{\left\| X_{i} - X_{best} \right\|}{\left\| X_{j} - X_{k} \right\|} \quad (8)$

where $STF$ is the search type factor, $t$ is the current iteration number, $T_{max}$ is the maximum number of iterations, $i$, $j$, and $k$ index the solutions, and $X_{best}$ is the global optimum solution. Then, Equation (9) is used to determine the type of search (local or global):

$\text{search type} = \begin{cases} \text{turbulent (global search)}, & STF \geq 1 \\ \text{laminar (local search)}, & STF < 1 \end{cases} \quad (9)$
This formulation was used because the search type factor (STF) is similar to the Reynolds number used to determine the type of flow in the boundary layer. As mentioned, evolutionary algorithms have both local and global search phases: the algorithms first make a global search for the region of the global optimum and then try to find better solutions around it. The first term of Equation (8) depends on the iteration number and the search space, in such a way that in the initial iterations the search is global, and as the number of iterations increases, the search space becomes smaller and the search changes to the local type. The second term of Equation (8) is the ratio of two Euclidean distances: the first is the distance between the $i$th solution and the best solution, and the second is the distance between two random solutions. This term can be larger or smaller than one. If it is larger than 1, the distance between the two random solutions is smaller than the distance between the $i$th solution and the optimal one, meaning the particle is still far from the optimum; in this case, the STF value increases to improve the global search ability. If the second term of Equation (8) is less than 1, the algorithm is searching the space adjacent to the best solution, and thus the local search ability is increased.
Figure 1 shows two possibilities for the solutions, where STF is the search type factor. If STF is greater than or equal to 1, the $i$th particle is far from the starting point of the boundary layer (the global best solution). If STF is less than 1, the particle approaches the boundary layer, which contains the optimum solution. The figure also marks the thickness of the stable (laminar) boundary layer and the thickness of the turbulent boundary layer: the yellow curve shows the threshold between the stable and turbulent thicknesses, and the dashed lines depict the radii of the turbulent and stable areas.
Finally, Equations (10) and (11) are used for the global and local searches, respectively. In these equations, the first quantity is a number created by a random distribution, the second is a random number, and the scaling factor is equal to 0.30.
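Under the reading of Equation (8) described above, the search type factor can be sketched as follows. The exact functional form used in the published flow regime algorithm may differ; this is a minimal illustration, assuming STF is the product of the iteration term and the distance ratio described in the text:

```python
import numpy as np

def search_type_factor(t, t_max, x_i, x_best, x_j, x_k):
    """Reynolds-like search type factor (assumed form of Equation (8)):
    STF = (1 - t/t_max) * ||x_i - x_best|| / ||x_j - x_k||.
    Large STF -> turbulent regime (global search);
    small STF -> laminar regime (local search)."""
    iteration_term = 1.0 - t / t_max            # shrinks as iterations grow
    distance_ratio = (np.linalg.norm(x_i - x_best)
                      / np.linalg.norm(x_j - x_k))
    return iteration_term * distance_ratio

rng = np.random.default_rng(2)
x_best = np.zeros(4)                            # hypothetical global best
x_i, x_j, x_k = rng.normal(size=(3, 4))         # one particle, two random ones

early = search_type_factor(1, 100, x_i, x_best, x_j, x_k)
late = search_type_factor(99, 100, x_i, x_best, x_j, x_k)
```

With the same particles, the factor decays across iterations, which is exactly the mechanism that moves the algorithm from global (turbulent) to local (laminar) search.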
2.5. Construction of Hybrid Models (ANN–Flow Regime Optimization Algorithm (FRA) and SVM–FRA)
As mentioned above, the multilayer neural network model requires determining the number of hidden layers, the number of hidden neurons, the weight values, and the bias values. In addition, the support vector machine requires determining the penalty factor, the permissible error range, and the radial kernel parameter. These parameters were explained in the earlier sections and are regarded as the kernel and SVR parameters.
In the present paper, the hybrid model of the neural network–flow regime and the support vector machine–flow regime is used as follows to find the above values:
1. The input data are identified.
2. The data are normalized.
3. The training phase is completed.
4. The criterion for stopping the modeling process is checked. If it is satisfied, go to step 10; otherwise, go to step 5.
5. The parameters of the multilayer neural network (weights, biases, and the numbers of neurons and hidden layers) and of the support vector model are defined as the initial population of fluid particles. These parameters are the decision variables of the flow regime algorithm.
6. The objective function is computed for the members of the fluid particle population. The present study uses the RMSE value as the objective function. The particle with the best objective function is then selected as the global optimum solution.
7. Two random particles, J and K, are chosen, and the STF value is calculated.
8. Particle movement is controlled by the STF value, using Equations (9)–(11).
9. The maximum number of iterations is checked. If it is reached, the algorithm stops and goes to step 3; otherwise, it goes to step 5.
10. The test phase is performed, and the output, the monthly precipitation value, is provided.
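Steps 5–9 amount to minimizing an RMSE objective over the model's free parameters. A minimal sketch of that objective for the SVM branch, with a one-shot random population standing in for the flow regime algorithm's particle updates, might look like the following; the synthetic data and the use of scikit-learn's `SVR` are assumptions, since the real study uses monthly temperature and precipitation series:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(3)

# Synthetic stand-in for the monthly temperature/precipitation records.
X = rng.uniform(size=(120, 2))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 120)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def rmse_objective(params):
    """Objective of steps 6-8: RMSE of an SVR built from one particle's
    decision variables (penalty factor, error band, kernel parameter)."""
    c, eps, gamma = params
    model = SVR(kernel="rbf", C=c, epsilon=eps, gamma=gamma).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return float(np.sqrt(np.mean((pred - y_te) ** 2)))

# Stand-in for the flow regime algorithm: evaluate a small particle
# population and keep the global best particle (step 6).
population = rng.uniform([0.1, 0.01, 0.1], [100.0, 0.5, 5.0], size=(10, 3))
scores = [rmse_objective(p) for p in population]
best_particle = population[int(np.argmin(scores))]
best_rmse = min(scores)
```

In the actual hybrid model, the population would be moved with Equations (9)–(11) over many iterations instead of being sampled once, and the ANN branch would encode weights, biases, and layer sizes in the same particle vector.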
3. Case Study
The present case study forecasts precipitation in one of Iran's major basins, the Aidoghmush basin. The Aidoghmush River is one of the main rivers of the Ghezel Ozen basin. The area of the basin is 1800 km². The river is located in the East Azarbaijan province of Iran. The study area lies between the geographical coordinates of 46°52′ to 47°45′ E longitude and 36°43′ to 37°26′ N latitude (Figure 2). The annual discharge of the basin is 170 million m³. The average precipitation over the entire basin is 336.2 mm, and the length of the Aidoghmush River is 80 km. The elevation of the basin ranges from 1100 to 2500 m. In the present study, precipitation data from 2000 to 2010 were analyzed to predict precipitation. In total, 70% of the data were used for the training phase and 30% for the test phase; this split was selected following strong recommendations from previous studies and is therefore suitable for this study [18,19,20,21]. The data were obtained from rain gauge stations operated by the meteorological department of Iran. The model has been structured to provide one-month-ahead precipitation forecasts.
The average annual precipitation of the basin is 236.2 mm, and the maximum monthly average precipitation occurs in April. The basin has an average annual temperature of 11.6 °C. July has the absolute maximum temperature of 21.9 °C, and the minimum temperature is −16.8 °C in January. The average normal monthly temperature was below zero in January, with a maximum of −1.79 °C, while temperatures above 25 °C were observed in summer. Monthly precipitation exceeds 30 mm in winter and spring; winter and spring have the highest precipitation, and the precipitation in summer is nearly zero.
Figure 3 demonstrates the monthly average of temperature and precipitation in the research region.
Three scenarios were taken into account to forecast the precipitation. A one-month-ahead forecasting horizon was adopted after an initial screening of the correlation between lag times and precipitation values. The precipitation and temperature values were selected as the input variables of the model because the high correlation between these two variables has been proven; hence, when developing a precipitation forecasting model, it is necessary to consider the temperature as one of the input variables [18,19,20,21].
1. The inputs of the precipitation forecasting models are the average temperatures at time delays of 1 to 12 months.
2. The inputs of the precipitation forecasting models are the average precipitations at time delays of 1 to 12 months.
3. The inputs of the precipitation forecasting models are the average temperatures and precipitations at time delays of 1 to 12 months.
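The scenario inputs above are lagged versions of the monthly series. A sketch of how such a design matrix can be assembled follows; the helper name `make_lagged_inputs` is hypothetical, and the toy series stands in for the real records:

```python
import numpy as np

def make_lagged_inputs(series, lags):
    """Build a design matrix whose columns are the series delayed by
    1..lags months, aligned with the target month."""
    series = np.asarray(series, dtype=float)
    rows = len(series) - lags
    # Column l-1 holds the value observed l months before the target.
    X = np.column_stack([series[lags - l : lags - l + rows]
                         for l in range(1, lags + 1)])
    y = series[lags:]                      # the month to be forecasted
    return X, y

precip = np.arange(24, dtype=float)        # toy monthly series
X, y = make_lagged_inputs(precip, lags=12)
```

For scenario 3, two such matrices (temperature and precipitation) would be concatenated column-wise before being passed to the forecasting model.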
When the number of input variables is high as in these three scenarios, the principal component analysis method can be used. In this method, the initial variables of the problem are transformed into new and independent components. The new components are a linear combination of the initial variables. Given that all variables are used in the formation of components, the components are able to provide preliminary information on all variables without losing details. First, the Kaiser–Meyer–Olkin (KMO) coefficient was used to determine the applicability of the principal component analysis method (Equation (12)). If the coefficient is greater than 0.5, the implementation of the proposed method is allowed. Each component has a percentage of the information provided by the initial variables. The component that is most important in providing the data has the highest variance, and the component that has the least variance is considered as the last component. If all the initial variables are used in component generation, it will be difficult to analyze the components. Therefore, component rotation is usually selected to simplify component analysis. The orthogonal component rotation maintains the independence between the components. Therefore, the most influential variables are expressed in every component after each rotation.
$KMO = \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}} \quad (12)$

where $r_{ij}$ is the correlation coefficient and $p_{ij}$ is the partial correlation coefficient between variables $i$ and $j$.
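The KMO measure of Equation (12) can be computed from the correlation matrix and the partial correlations obtained from its inverse. The following sketch uses synthetic correlated columns to illustrate a KMO value above the 0.5 applicability threshold; the implementation details are assumptions, not the authors' code:

```python
import numpy as np

def kmo(data):
    """Kaiser-Meyer-Olkin measure: ratio of summed squared correlations to
    summed squared correlations plus summed squared partial correlations."""
    r = np.corrcoef(data, rowvar=False)
    inv_r = np.linalg.inv(r)
    # Partial correlations from the precision (inverse correlation) matrix.
    d = np.sqrt(np.outer(np.diag(inv_r), np.diag(inv_r)))
    partial = -inv_r / d
    off = ~np.eye(r.shape[0], dtype=bool)          # off-diagonal mask
    return float(np.sum(r[off] ** 2)
                 / (np.sum(r[off] ** 2) + np.sum(partial[off] ** 2)))

rng = np.random.default_rng(4)
base = rng.normal(size=(200, 1))
# Columns sharing a common factor are strongly intercorrelated,
# so the KMO score should clear the 0.5 threshold.
data = base + 0.3 * rng.normal(size=(200, 4))
score = kmo(data)
```

When the score falls below 0.5, the principal component analysis step described above would not be applied.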
The error indices presented in Equations (13) to (15) are used to evaluate the efficiency of the different models:

$MAE = \frac{1}{N}\sum_{i=1}^{N}\left| O_{i} - S_{i} \right| \quad (13)$

$RSR = \frac{\sqrt{\sum_{i=1}^{N}\left( O_{i} - S_{i} \right)^{2}}}{\sqrt{\sum_{i=1}^{N}\left( O_{i} - \bar{O} \right)^{2}}} \quad (14)$

$NSE = 1 - \frac{\sum_{i=1}^{N}\left( O_{i} - S_{i} \right)^{2}}{\sum_{i=1}^{N}\left( O_{i} - \bar{O} \right)^{2}} \quad (15)$

where $O_{i}$ is the observed data, $S_{i}$ is the simulated data, $\bar{O}$ is the mean of the observed data, $MAE$ is the mean absolute error, $RSR$ is the root-mean-square error–observations standard deviation ratio, and $NSE$ is the Nash–Sutcliffe efficiency coefficient. These indices are widely used to evaluate the performance of soft computing models [15,16,17,18,19,20,21,30].
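The three indices can be sketched in a few lines; here RSR uses the population standard deviation of the observations, since the paper does not state which convention it follows:

```python
import numpy as np

def error_indices(obs, sim):
    """MAE, RMSE-observations standard deviation ratio (RSR), and
    Nash-Sutcliffe efficiency (NSE), as in Equations (13)-(15)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    mae = np.mean(np.abs(obs - sim))
    rmse = np.sqrt(np.mean((obs - sim) ** 2))
    rsr = rmse / np.std(obs)
    nse = 1.0 - (np.sum((obs - sim) ** 2)
                 / np.sum((obs - np.mean(obs)) ** 2))
    return float(mae), float(rsr), float(nse)

obs = np.array([10.0, 20.0, 30.0, 40.0])
mae, rsr, nse = error_indices(obs, obs)     # a perfect forecast
```

A perfect forecast gives MAE = 0, RSR = 0, and NSE = 1; models degrade toward NSE = 0 (no better than the mean of the observations) and below.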