1. Introduction
Side orifices, sluice gates and side weirs are diversion structures used in open channels to divert part of the main channel flow into side channels and to regulate the head of distributaries. They are also commonly used in wastewater treatment systems, irrigation and drainage networks, sedimentation tanks and aeration basins [1,2]. Accurate estimation of the discharge coefficient is necessary to determine the volume of water passing through diversion structures. To date, many experimental and analytical studies have estimated the discharge coefficient of flow diversion structures.
Ramamurthy et al. [3,4] were the first to study the hydraulic properties of rectangular side orifices experimentally. They provided equations for computing the orifice discharge coefficient from the orifice length, the main channel width and the ratio of the main channel velocity to the orifice jet velocity. Gill [5] proposed equations for predicting the discharge of circular and rectangular side orifices in closed conduit flow (both gravity and pressurized) by analytically solving the steady varied flow equation. Hussain et al. [1] performed experiments under free-flow conditions and provided regression equations for the discharge coefficient of small and large rectangular side orifices, based on the Froude number and the ratio of orifice width to channel width. Hussain et al. [6] derived an analytical equation that gives the discharge through rectangular side orifices with ±5% accuracy. Hussain et al. [7,8] carried out experimental and analytical studies of the flow through sharp-crested circular side orifices and presented equations for the discharge coefficient based on the ratio of the orifice diameter to the main channel width and the flow Froude number, under free- and submerged-flow conditions. Bryant et al. [9] and Guo and Stitt [10] investigated the flow through circular orifices under different hydraulic conditions using physical and analytical models and derived analytical relationships for the discharge. Vatankhah and Mirnia [2] were the first to investigate, experimentally and analytically, the flow through a sharp-crested triangular side orifice (Δ-shaped orifice) located in a rectangular channel, and they presented equations for computing the discharge coefficient.
To estimate the discharge coefficient of side weirs, another diversion structure in open channels, experimental studies have been performed for rectangular [11,12,13,14] and triangular sections [15,16,17] in subcritical conditions.
Soft computing is now accepted as an efficient tool for modeling complex, nonlinear systems and is widely used in the water sciences to predict hydraulic and hydrological parameters. Various data-intelligent techniques, including the adaptive neuro-fuzzy inference system (ANFIS), gene expression programming (GEP), multi-layer perceptron (MLP), extreme learning machine (ELM), group method of data handling (GMDH) and support vector machine (SVM), have been used to predict the discharge coefficient of rectangular side weirs [18,19,20,21,22,23,24,25,26,27]. Granata et al. [28] applied two lazy machine learning algorithms, k-Nearest Neighbor and K-Star, to predict the Cd of a side weir in a circular channel under a supercritical flow regime. Both algorithms outperformed the empirical equations of Biggiero et al. and Hager. Li et al. [29] applied artificial neural network (ANN), SVM and ELM models to predict the discharge of rectangular sharp-crested weirs. They found that all three models predicted the Cd with high accuracy, but the SVM performed somewhat better.
The discharge coefficient of triangular labyrinth side weirs has also been successfully predicted using intelligent models [30,31,32,33,34,35,36,37]. Dutta et al. [38] carried out experiments in a rectangular flume under free-flow conditions to investigate the discharge capacity of a multi-cycle W-form labyrinth weir and a sharp-crested circular arc weir. They used the experimental data to build predictive models with multiple linear regression (MLR), SVM and ANN, and observed that the SVM model predicted the discharge better than the other models.
A few studies have been conducted to predict the discharge coefficient of circular and rectangular side orifices using data-driven techniques [39,40,41]. Roushangar et al. [42] evaluated the potential of two machine learning methods, namely support vector machines combined with a genetic algorithm (SVM–GA) and GEP, for predicting the Cd of trapezoidal and rectangular sharp-crested side weirs. The results showed that the SVM–GA model gave more accurate outputs than the GEP.
Because the discharge of side orifices in open channels must be determined accurately, the present work applies, for the first time, data-driven models including SVM, least squares support vector machine (LSSVM) and LSSVM with the gravitational search algorithm (LSSVM-GSA) to model the discharge coefficient of triangular (Δ-shaped) side orifices. For this purpose, the laboratory data of Vatankhah and Mirnia [2] were used. Only a few studies have applied GSA in water resources and related fields: a hybrid model of an artificial neural network with GSA (ANN–GSA) in rainfall-runoff modeling [43], an LSSVM-GSA hybrid model in the prediction of wind power [44,45], an MLP-GSA hybrid model in the prediction of lake water surface [46], GSA in water tank optimization [47] and LSSVM-GSA in river flow forecasting [48].
4. Application of the Models
Three data-driven methods, SVM, LSSVM and LSSVM-GSA, were employed for predicting the discharge coefficient of a Δ-shaped side orifice with different hydraulic and geometric conditions.
Table 3 summarizes the five input scenarios considered to predict the discharge coefficient of Δ-shaped side orifices (Cd). Because B/H had the highest correlation with Cd (−0.770), this parameter was selected as the first input combination. The other combinations were composed by adding one further parameter at each step. The fifth combination includes all input parameters.
The correlation analysis showed that the geometric parameters of the Δ-shaped side orifice, namely height (H) and length (L), had the highest correlation with Cd; therefore, the ratios containing them (B/H and B/L) were used as input variables in the first and second scenarios. The remaining variables were ranked by their correlation coefficients and added in the subsequent input scenarios.
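The scenario construction described above (rank the candidate inputs by the strength of their correlation with Cd, then add them one at a time) can be sketched as follows. This is only an illustration: the numbers are made up rather than taken from the experiments, and the helper names `pearson` and `nested_scenarios` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def nested_scenarios(candidates, target):
    """Rank inputs by |r| with the target and build nested input combinations."""
    r = {name: pearson(values, target) for name, values in candidates.items()}
    ranked = sorted(r, key=lambda name: abs(r[name]), reverse=True)
    scenarios = {f"M{i + 1}": ranked[: i + 1] for i in range(len(ranked))}
    return r, scenarios

# Made-up values for illustration only (not the laboratory data).
cd = [0.58, 0.52, 0.47, 0.41, 0.36, 0.33]
candidates = {
    "B/H": [1.0, 1.5, 2.0, 2.6, 3.1, 3.5],
    "B/L": [0.5, 0.9, 0.8, 1.4, 1.3, 1.9],
    "Fr1": [0.30, 0.22, 0.35, 0.28, 0.40, 0.31],
}
r, scenarios = nested_scenarios(candidates, cd)
for name, inputs in scenarios.items():
    print(name, inputs)
```

Ranking by the absolute value of r keeps strongly negative correlations, such as the one reported for B/H, at the top of the list.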
To calibrate the models, various kernel functions were first tested for the SVM model. The radial basis function (RBF) was found to be the optimal kernel because of its better prediction accuracy. The parameter combinations (C, γ) of the SVM and (γ, σ) of the LSSVM were obtained by trial and error. For each parameter (regularization factor γ and kernel parameter σ), values in the ranges 10⁻⁵–10⁵, 10⁻²–10² and 10⁻³–10³, 10⁻⁵–10⁵ were tried, following previous literature [59,60,61]. In addition, GSA was employed to optimize the LSSVM hyper-parameters. The control parameters of the GSA, i.e., the gravitational constant (G0) and the constant alpha (α), play a key role in this optimization. The LSSVM parameters were found to be optimal with G0 in the range 108–114 and α in the range 18–20.
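To make the roles of the regularization factor γ and the RBF kernel parameter σ concrete, the following is a minimal sketch of LSSVM regression with a trial-and-error search over powers of ten. It assumes the standard LSSVM linear system and the kernel form exp(−‖a − b‖²/(2σ²)); the data, the grid and all function names are illustrative and do not reproduce the toolboxes or settings used in the study.

```python
import math

def rbf(a, b, sigma):
    """RBF kernel exp(-||a - b||^2 / (2 sigma^2)) for two feature vectors."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LSSVM system for the bias b and the multipliers alpha."""
    n = len(y)
    A = [[0.0] + [1.0] * n]  # first row enforces sum(alpha) = 0
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term set by the regularization factor
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, sigma, x):
    """Evaluate the fitted LSSVM model at a single feature vector x."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X_train))

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

# Made-up one-input data for illustration only (not the laboratory data).
X_tr = [[1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
y_tr = [0.58, 0.53, 0.47, 0.42, 0.37, 0.33]
X_va = [[1.25], [2.25], [3.25]]
y_va = [0.555, 0.445, 0.350]

grid = [10.0 ** k for k in range(-3, 4)]  # illustrative grid of powers of ten
best = None
for gamma in grid:
    for sigma in grid:
        b, alpha = lssvm_fit(X_tr, y_tr, gamma, sigma)
        pred = [lssvm_predict(X_tr, b, alpha, sigma, x) for x in X_va]
        score = rmse(y_va, pred)
        if best is None or score < best[0]:
            best = (score, gamma, sigma)
print("best validation RMSE, gamma, sigma:", best)
```

A small γ shrinks the multipliers and flattens the fit, while a small σ makes each training point influence only its immediate neighbourhood; the search simply keeps the pair with the lowest validation RMSE.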
Root mean square error (RMSE) was used as the fitness function in the present work. The optimal hyper-parameters of the LSSVM model obtained by GSA for each input scenario are given in Table 4.
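The way G0 and α steer a GSA search can be illustrated with a compact sketch. It assumes the commonly used decay G(t) = G0·exp(−αt/T) and, as a simplification, lets every agent attract every other agent instead of only the best ones. The fitness function is a hypothetical smooth stand-in for "train the LSSVM with (γ, σ) and return the validation RMSE", and the search runs in log10 space; none of this reproduces the implementation used in the study.

```python
import math
import random

def gsa_minimize(fitness, bounds, n_agents=20, n_iter=100,
                 g0=100.0, alpha=20.0, seed=0):
    """Minimal gravitational search; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_pos, best_fit = None, float("inf")
    for t in range(n_iter):
        fit = [fitness(p) for p in pos]
        for p, f in zip(pos, fit):  # keep the best solution seen so far
            if f < best_fit:
                best_pos, best_fit = p[:], f
        worst, best = max(fit), min(fit)
        if worst == best:
            mass = [1.0 / n_agents] * n_agents
        else:
            raw = [(f - worst) / (best - worst) for f in fit]  # best -> 1, worst -> 0
            total = sum(raw)
            mass = [m / total for m in raw]
        g = g0 * math.exp(-alpha * t / n_iter)  # decaying gravitational constant
        for i in range(n_agents):
            acc = [0.0] * dim
            for j in range(n_agents):
                if i == j:
                    continue
                dist = math.dist(pos[i], pos[j])
                w = rng.random() * g * mass[j] / (dist + 1e-12)
                for d in range(dim):
                    acc[d] += w * (pos[j][d] - pos[i][d])
            for d in range(dim):
                vel[i][d] = rng.random() * vel[i][d] + acc[d]
        for i in range(n_agents):  # move only after all accelerations are known
            for d, (lo, hi) in enumerate(bounds):
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
    for p in pos:  # evaluate the final positions as well
        f = fitness(p)
        if f < best_fit:
            best_pos, best_fit = p[:], f
    return best_pos, best_fit

def surrogate_rmse(p):
    """Hypothetical stand-in for the validation RMSE of an LSSVM run."""
    log_gamma, log_sigma = p
    return 0.01 + 0.001 * ((log_gamma - 2.0) ** 2 + (log_sigma - 0.5) ** 2)

bounds = [(-3.0, 3.0), (-5.0, 5.0)]  # log10(gamma), log10(sigma); illustrative
best_pos, best_fit = gsa_minimize(surrogate_rmse, bounds, g0=110.0, alpha=19.0)
gamma, sigma = 10.0 ** best_pos[0], 10.0 ** best_pos[1]
print(gamma, sigma, best_fit)
```

A larger α makes G(t) decay faster, so the agents settle earlier; a larger G0 strengthens the early attraction and widens the initial exploration.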
5. Results and Discussion
Table 5 provides the performance metrics of the models for the five scenarios defined in Table 3. As can be seen, the models respond similarly to the applied scenarios. In addition, the models’ error decreased from M1 to M5 as effective variables were added. Moreover, adding the W/H variable to the input combination (M4) significantly improved the accuracy of the models: RMSE decreased from 0.0272 to 0.0156 for SVM, from 0.0227 to 0.0129 for LSSVM and from 0.0217 to 0.0108 for LSSVM-GSA. All three methods performed best for the M5 input scenario, which includes all five variables (B/H, B/L, Fr1, W/H, H/y1).
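The relative size of these RMSE reductions follows directly from the reported values. A short check, using only the numbers quoted above (the helper name `pct_reduction` is illustrative):

```python
def pct_reduction(before, after):
    """Relative reduction in percent when an error drops from `before` to `after`."""
    return 100.0 * (before - after) / before

# RMSE before and after adding W/H to the inputs, as reported in the text.
rmse_pairs = {
    "SVM": (0.0272, 0.0156),
    "LSSVM": (0.0227, 0.0129),
    "LSSVM-GSA": (0.0217, 0.0108),
}
for name, (before, after) in rmse_pairs.items():
    print(f"{name}: {pct_reduction(before, after):.1f}% lower RMSE")
```

The three reductions come out at about 42.6%, 43.2% and 50.2%.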
The R2, RMSE, MAE and NSE values for the optimal pattern (M5) were obtained as 0.938, 0.0134, 0.0101 and 0.9849 for the SVM model, 0.958, 0.0125, 0.0099 and 0.9895 for the LSSVM model, and 0.965, 0.0099, 0.0077 and 0.9934 for the LSSVM-GSA model, respectively.
It is worth noting that the NSE values were greater than 0.8 for all the models, indicating acceptable accuracy [62,63].
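For reference, the four indices can be computed as in the sketch below. It assumes the usual definitions, with R2 taken as the squared Pearson correlation and NSE as one minus the ratio of the squared error to the variance of the observations; the exact definitions used in the study are not restated in this section, so treat these as assumptions.

```python
import math

def metrics(obs, pred):
    """Return R2 (squared Pearson r), RMSE, MAE and NSE for paired values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    spp = sum((p - mp) ** 2 for p in pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    return {
        "R2": cov ** 2 / (sst * spp),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
        "NSE": 1.0 - sse / sst,
    }

# Tiny made-up example: the last prediction is off by one unit.
m = metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(m)
print("acceptable by the NSE > 0.8 rule:", m["NSE"] > 0.8)
```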
The outcomes reveal that the LSSVM-GSA performs better than the other methods in all of the scenarios. For the M5 pattern, this model has the highest R2 and NSE values and the lowest errors, and therefore the highest power for predicting the Cd of Δ-shaped side orifices. The LSSVM and SVM models rank second and third, respectively. The superiority of the LSSVM-GSA over the SVM and LSSVM was also observed by Yuan et al. [44] and Lu et al. [45].
The observed Cd values were in the range of 0.3246–0.5843, while the Cd values estimated by the superior model (i.e., LSSVM-GSA) were in the range of 0.3412–0.5872.
Figure 5 plots the observed and predicted Cd values against the experiment number. The two detailed graphs in the lower part of the figure show that the LSSVM-GSA captures the Cd of Δ-shaped side orifices better than the SVM and LSSVM models.
The measured and predicted Cd values over the test period for the best input combination (M5) are compared in Figure 6. For all the models, the points cluster closely around the 45° line, indicating the high capability of the models used in the current work. Of the 170 discharge data in the test stage, 95%, 96% and 97% of the points lay within the ±5% error band for SVM, LSSVM and LSSVM-GSA, respectively.
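The share of points inside such a band can be computed as sketched below, assuming the band means a relative error of at most 5% of the observed value (the section does not define it explicitly). The values are made up for illustration.

```python
def share_within_band(obs, pred, band=0.05):
    """Fraction of predictions whose relative error is within +/- band."""
    inside = sum(1 for o, p in zip(obs, pred) if abs(p - o) <= band * abs(o))
    return inside / len(obs)

# Made-up Cd values for illustration only.
observed = [0.40, 0.50, 0.60, 0.35]
predicted = [0.41, 0.53, 0.60, 0.36]
print(share_within_band(observed, predicted))  # second point is 6% off
```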
Figure 7 shows a boxplot of the statistical distribution of the measured and predicted discharge coefficients for the testing period, including the lower quartile, upper quartile and median for the optimal SVM, LSSVM and LSSVM-GSA models. For the lower quartile, the SVM model agrees better with the observations than the other two approaches, although all models over-predict in this range. For the median, the LSSVM-GSA model agrees fully with the observed values, and the spread of this model closely resembles that of the observed Cd values, whereas the SVM and LSSVM models deviate somewhat from the observed values in this range. For the upper quartile, the SVM and LSSVM-GSA models match the observed distribution with similar accuracy, while the LSSVM model overestimates the higher Cd values.
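The quartile comparison read off the boxplot can also be made numerically. The sketch below uses Python's `statistics.quantiles` with the inclusive method, which is one of several quartile conventions and not necessarily the one behind Figure 7; the values are made up.

```python
import statistics

def quartile_offsets(obs, pred):
    """Predicted minus observed lower quartile, median and upper quartile."""
    q_obs = statistics.quantiles(obs, n=4, method="inclusive")
    q_pred = statistics.quantiles(pred, n=4, method="inclusive")
    names = ("lower quartile", "median", "upper quartile")
    return {name: p - o for name, o, p in zip(names, q_obs, q_pred)}

# Made-up Cd values for illustration only.
observed = [0.33, 0.36, 0.41, 0.47, 0.52, 0.55, 0.58]
predicted = [0.35, 0.37, 0.41, 0.47, 0.52, 0.55, 0.58]
print(quartile_offsets(observed, predicted))
```

A positive offset at the lower quartile corresponds to the over-prediction of low Cd values noted above.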
The agreement between the data-intelligent methods and the observed values was also checked using another visualization, the Taylor diagram. This graph combines three statistical indices: centered RMSE, correlation coefficient and standard deviation [64].
Figure 8 displays the Taylor diagram of the predicted and observed values of Cd for the test period and the best input combination. As observed in Figure 8, the markers of LSSVM and LSSVM-GSA have similar positions; however, the LSSVM-GSA model shows better accuracy than the two other models in terms of the RMSE, r and SD indices.
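The three quantities shown on a Taylor diagram are tied together by the relation E′² = σ_o² + σ_p² − 2σ_oσ_p r, where E′ is the centered RMSE. A small sketch with made-up values, using population (divide-by-n) standard deviations so that the relation holds exactly:

```python
import math

def taylor_stats(obs, pred):
    """Return (sd_obs, sd_pred, r, centered_rmse) for paired values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sd_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sd_p = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * sd_o * sd_p)
    crmse = math.sqrt(sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(obs, pred)) / n)
    return sd_o, sd_p, r, crmse

# Made-up Cd values for illustration only.
obs = [0.33, 0.41, 0.47, 0.52, 0.58]
pred = [0.35, 0.40, 0.48, 0.51, 0.59]
sd_o, sd_p, r, crmse = taylor_stats(obs, pred)
print(sd_o, sd_p, r, crmse)
```

Because of this relation, a model marker that sits close to the observation point on the diagram has, at the same time, a similar standard deviation, a high correlation and a low centered RMSE.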
Moreover, the run times of the applied models were evaluated. The simulations were done in MATLAB R2017b on a computer running Windows 10 (64 bit) with an Intel(R) Core(TM) i5-10500 CPU @ 3.10 GHz and 16 GB of RAM. In MATLAB R2017b, the LIBSVM toolkit and LS-SVMlab were used to develop the SVM and LSSVM models, respectively.
Figure 9 provides the convergence graphs of the implemented models for the best input combination (M5). For this scenario, the run times of the SVM, LSSVM and LSSVM-GSA models were 1.534 s, 0.027 s and 26.2 s, respectively. Although the computational cost of the LSSVM-GSA model was higher than that of the other two models, it had the best prediction accuracy, and the time required to run it is acceptable from an engineering point of view. Similarly, in the research performed by Lu et al. [45], the LSSVM-GSA model was selected as the superior model, with both the highest performance and the highest computational time among the implemented models.
The outcomes of the methods were also assessed by one-way analysis of variance (ANOVA) to check their robustness, i.e., whether the observations and the model predictions differ significantly. The test was carried out at a 95% confidence level (significance level of 0.05).
Table 6 summarizes the test results. As observed from the table, the LSSVM-GSA yielded a smaller test statistic (0.00004) with a higher significance value (0.995) than LSSVM and SVM, which indicates that the LSSVM-GSA is more robust in predicting the discharge coefficient of triangular side orifices than the other models.
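The one-way ANOVA statistic compares the variation between groups (here, observations versus predictions) with the variation within them. The sketch below computes only the F statistic on made-up values; the accompanying significance value requires the F distribution, for which a statistics library such as SciPy (`scipy.stats.f_oneway`) would normally be used.

```python
def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA for two or more groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up Cd values for illustration only.
observed = [0.33, 0.41, 0.47, 0.52, 0.58]
predicted = [0.35, 0.40, 0.48, 0.51, 0.59]
print(one_way_anova_f(observed, predicted))
```

An F value close to zero means the group means are nearly identical relative to the scatter within each group, which is the situation reported for the LSSVM-GSA.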
6. Conclusions
Given the importance and wide application of orifices in irrigation and drainage networks, urban water and sewage treatment plants and hydroelectric power facilities, this study was performed to estimate the discharge coefficient of Δ-shaped side orifices. To this end, data-intelligent models including SVM, LSSVM and LSSVM-GSA were applied. Geometric and hydraulic variables from a set of 570 experimental data were used as inputs to the models. After a sensitivity analysis, five input combinations were identified according to the influence of the input variables on the output (Cd). According to the statistical indices, the models generally performed satisfactorily in estimating Cd for all input combinations, and they showed similar sensitivity to the scenario changes and to the addition of hydraulic and geometric variables to the input combinations. In all of the models, adding the ratio of orifice crest height to orifice height (W/H) to the model input (M4) produced the largest improvement, reducing RMSE by 42.6%, 43.2% and 50.2% for the SVM, LSSVM and LSSVM-GSA models, respectively. The outcomes also indicated that the LSSVM-GSA improved prediction accuracy over SVM and LSSVM. For the best input combination (M5), optimizing the LSSVM model with the gravitational search algorithm (GSA) improved the RMSE by 26% and 20% compared to the SVM and LSSVM models, respectively. In addition, based on visual inspection (i.e., scatter plots, boxplots and Taylor diagram), the highest correlation and the best statistical distribution of model values belonged to the LSSVM-GSA, although the low discharge coefficients tended to be slightly overestimated compared to the respective measured values. The overall results of the present study suggest the LSSVM optimized with GSA as a new and efficient model for estimating the discharge coefficient and other hydraulic parameters in open channels.
The present study used a set of 570 experimental data covering 12 geometric configurations. The outcomes indicated that the models do not capture the low discharge coefficients well, which implies that more data are needed to better calibrate the implemented models and improve estimation accuracy. The overestimation of the low values may be related mainly to the convergence criterion, the root mean square error (RMSE), because this criterion makes the models focus on capturing the high values. Another criterion, such as the mean absolute error (MAE), can be tried in future studies to balance the estimation accuracy between high and low values. The recommended model (LSSVM-GSA) can also be compared with other improved LSSVM and SVM models that use new metaheuristic algorithms, and with deep learning methods.