Article

Jucazinho Dam Streamflow Prediction: A Comparative Analysis of Machine Learning Techniques

by Erickson Johny Galindo da Silva, Artur Paiva Coutinho *, Jean Firmino Cardoso and Saulo de Tarso Marques Bezerra
Agreste Campus, Federal University of Pernambuco, Av. Marielle Franco, Km 59, Caruaru 55014-900, Brazil
* Author to whom correspondence should be addressed.
Hydrology 2024, 11(7), 97; https://doi.org/10.3390/hydrology11070097
Submission received: 3 April 2024 / Revised: 11 June 2024 / Accepted: 11 June 2024 / Published: 4 July 2024
(This article belongs to the Section Hydrological and Hydrodynamic Processes and Modelling)

Abstract

The centuries-old history of dam construction, from the Saad el-Kafara Dam to the global expansion of the 1950s, highlights the importance of these structures in water resource management. The Jucazinho Dam, built in 1998, emerged as a response to water scarcity in the Agreste region of Pernambuco, Brazil. After falling to less than 1% of its water storage capacity in 2016, the dam recovered in 2020 after interventions by the local water utility. In this context, the reliability of inflow prediction models for dams becomes crucial for managers. This study proposed hydrological models based on artificial intelligence to generate flow series and evaluated the adaptability of these models for the operation of the Jucazinho Dam. Data were normalized between 0 and 1 to avoid the predominance of variables with high values. The models were based on machine learning and employed support vector machine (SVM), random forest (RF) and artificial neural network (ANN) algorithms, as provided by the Python Sklearn library. The monitoring stations were selected via the HIDROWEB portal of the Brazilian National Water and Sanitation Agency (ANA), and Spearman’s correlation was used to identify the relationship between precipitation and flow. Model performance was evaluated through graphical analyses and statistical criteria: the Nash–Sutcliffe model efficiency coefficient (NSE), the percentage of bias (PBIAS), the coefficient of determination (R²) and the root mean standard deviation ratio (RSR). The statistical coefficients for the test data indicated unsatisfactory performance for long-term predictions (8, 16 and 32 days ahead), revealing a downward trend in the quality of the fit as the forecast horizon increased. The SVM model stood out by obtaining the best NSE, PBIAS, R² and RSR values.
The graphical results of the SVM models showed underestimation of the flow values as the forecast horizon increased, which was attributed to the sensitivity of the SVM to complex patterns in the time series. In contrast, the RF and ANN models overestimated the flow values as the number of forecast days increased, which was mainly attributed to overfitting. In summary, this study highlights the relevance of artificial intelligence in flow prediction for the efficient management of dams, especially under water scarcity and in data-scarce scenarios. A proper choice of models and reliable input data are crucial for obtaining accurate forecasts and can contribute to water security and the effective operation of dams such as Jucazinho.

1. Introduction

Many archaeologists consider the Saad el-Kafara dam to be one of the first in the world. It was probably built during the reign of Khufu, who was the king of Egypt around 2900–2877 B.C. [1]. Since then, dams have been built for millennia and for various purposes, such as flood control, human water supply, irrigation and livestock watering and, more recently, industrial use and electricity generation.
By the 1950s, a period of great expansion of global populations and economies, dams were increasingly seen as a solution to meet the growing demands for water and energy. Since then, according to the World Commission on Dams (WCD) [2], at least 45,000 large dams have been built worldwide, with almost half of the world’s rivers having at least one large dam in their course.
The Jucazinho Dam, inaugurated in 1998 in the Brazilian city of Surubim in the state of Pernambuco, is located in the Capibaribe watershed and dams the Capibaribe River. It was built to reduce water scarcity in the rural region of Pernambuco and to control floods on the Capibaribe River.
According to the Pernambuco Water and Climate Agency (APAC) [3], the dam has an accumulation capacity of 204.82 million cubic meters of water at an elevation of 292 m, measured at the main spillway crest. As recorded by the Brazilian National Water and Sanitation Agency (ANA) [4], 92% of the total withdrawal demand is for human supply, 7% for livestock watering and 1% for irrigation. This volume of water is relevant because of the large number of municipalities it serves: a total of 15 cities.
The Hydroenvironmental Plan for the Capibaribe Hydrographic Basin of the Secretariat of Water Resources of the State of Pernambuco highlights the economic potential of the municipalities served by the dam: nine belong to the second largest textile and clothing hub in Brazil, three have relevant agricultural activities, two belong to the furniture and tourism hub and one has a strong thermal tourism industry.
Even with the efforts of the Pernambuco Sanitation Company (COMPESA), which is responsible for the operation of the Jucazinho Dam, to combat the prolonged drought, the dam’s water level dropped to below 0.01% in 2016 and only recovered four years later, in 2020, surpassing 1% [5]. At the height of the water crisis, the company implemented a rotation in the supply, providing the population with potable water only seven days per month [6].
To address such challenges and improve the management of water resources, researchers have increasingly turned to advanced computational methods. In recent decades, several artificial intelligence models have been used to predict inflows because of their accuracy and their flexibility in representing physical processes. Among the methods used, artificial neural network (ANN) [7,8,9], random forest (RF) [10,11,12] and support vector machine (SVM) [13,14,15] models stand out.
Based on the literature, there are models that are better suited depending on the watershed. Adnan et al. [13] conducted an evaluation of various models, including the Optimally Pruned Extreme Learning Machine (OP-ELM), Least Square Support Vector Machine (LSSVM), Multivariate Adaptive Regression Splines (MARS) and M5 Model Tree (M5Tree), for modeling monthly streamflows with precipitation and temperature inputs. Their findings highlighted the superiority of LSSVM and MARS-based models for streamflow prediction without the need for local input data, surpassing the OP-ELM and M5Tree models. Parisouj et al. [8] investigated the predictive accuracy of three renowned machine learning algorithms—Support Vector Regression (SVR), ANN and Extreme Learning Machine—for monthly and daily streamflows across four rivers in the United States. The study identified SVR as the most effective model at both the monthly and daily scales, whereas the ANN model exhibited the least satisfactory performance. Meshram et al. [9] compared the efficacy of three AI techniques—Adaptive Neuro Fuzzy Inference System (ANFIS), Genetic Programming and ANN—for forecasting streamflow within India’s Shakkar watershed. The findings underscored that models incorporating cyclic terms outperformed those that did not consider periodicity and relied solely on previous streamflow data. Sousa Jr. et al. [12] assessed K-Nearest Neighbor, SVM, RF and ANN models for daily streamflow prediction in a transitional region between the Savanna and Amazon biomes in Brazil. The results demonstrated that these models achieved promising streamflow predictions for up to three days ahead, even in basins with scarce hydrological data.
The motivation of this study is the delivery of a hydrological model for reservoir management based on emerging artificial intelligence techniques, adding to the literature machine learning adaptations for the prediction of tributary flows and optimizing the operation of water reservoirs.
Therefore, the objective of the present study is to generate synthetic series of tributary flows for the Jucazinho Dam, which is located in the Agreste region of Pernambuco, from stochastic models. To determine the best model for the hydrological variables under study, we evaluate the adaptability of SVM, RF and ANN machine learning models for generating these series, analyze how the quality of the fit changes as the number of forecast days increases, and verify the quality of the fit using statistical metrics.

2. Materials and Methods

2.1. Study Area

Located in the Brazilian city of Surubim, in the state of Pernambuco, the Jucazinho Dam dams the Capibaribe River in the Capibaribe watershed, as shown in the location map in Figure 1. Its construction began in 1995 and was completed in 1998; at the time, more than two thousand hectares of riverside land were expropriated, affecting about 5000 people [16].
One of the reasons for the construction of the dam was the scarcity of water supply in the rural region of Pernambuco. With the construction of Jucazinho, which has a maximum storage capacity of 245.26 million cubic meters of water at the maximum maximorum elevation of 295 m, 21 municipalities could be served, which impacted the lives of approximately 800 thousand inhabitants. As shown in Figure 2, 92% of the dam’s use is for human water supply. In addition, the dam is used for fish farming, livestock and agriculture.
The other reason for the construction of the dam was flood control planning for the Capibaribe, which involved the construction of several dams in order to protect the metropolitan region of Recife (about 135 km away) from historical floods such as those that occurred between the years 1960 and 1980. The dam has a flood control volume of 10⁶ m³.
It is a roller-compacted concrete gravity dam with a central stepped spillway and a ski-jump-type dissipation basin. There are also two side spillways connected to the dam’s abutment. A gallery, accessible from both abutments, extends along the entire embankment of the dam. Above the central spillway is a bridge that connects the abutments. The dam has one water intake for supply and another for the release of the ecological flow downstream, both with a pipe of 2.0 m in diameter and a reduction to 1.5 m. In Table 1, we provide data from the dam’s technical file, with information provided by the Department of Water Resources of Pernambuco and data measured by Neves et al. [17].

2.2. Input Data

The “garbage in, garbage out” principle refers to the fundamental idea that the quality of the output of a data processing system is directly influenced by the quality of the input data. In other words, if inaccurate, incomplete or inadequate information is fed into a system, it is inevitable that the resulting output will also be inaccurate or of poor quality. This principle highlights the critical importance of reliable and high-quality data entry to ensure accurate and useful results in any computing or decision-making process.
The data for this study were obtained from the HIDROWEB portal: an online platform by ANA that offers information on Brazil’s water resources. The portal provides real-time data from a vast network of monitoring stations across the country, covering hydrometeorological, hydrographic and water quality aspects. In the Jucazinho Dam’s basin, 32 rainfall monitoring stations and two river flow monitoring stations were identified, as shown in Figure 3.
To choose stations for the study, those with records covering the same time frame were selected first. Fluviometric stations 39100000 and 39130000 had records overlapping those of pluviometric stations 735159, 736040, 736041, 736042 and 836092; these stations were also the most recently updated and were therefore chosen for the study.
A key aspect of hydrological modeling is the correlation between rainfall and flow data. Rainfall drives surface runoff and groundwater recharge, directly affecting river flow levels. Spearman’s correlation ($\rho$), a robust non-parametric measure, was used to assess this correlation, as shown in Equation (1). It ranges from −1 to 1 and indicates negative ($\rho < 0$), positive ($\rho > 0$) or no correlation ($\rho = 0$).
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n \left( n^{2} - 1 \right)} \quad (1)$$
where $d_i$ is the difference between the ranks of the two series (precipitation and flow) for the i-th observation, with each rank obtained by sorting the series in ascending order, and $n$ is the total number of observations.
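As an illustration of Equation (1), the following minimal sketch computes $\rho$ from rank differences in plain Python. The function names and the sample values are hypothetical, and the simple ranking assumes no tied values; daily rainfall series contain many zeros, so in practice a tie-aware routine such as `scipy.stats.spearmanr` would likely be preferable.

```python
def ranks(values):
    """Return 1-based ranks in ascending order (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via Equation (1): 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical daily values, not data from the stations in the study.
precip = [0.0, 12.5, 3.2, 30.1, 7.8]
flow = [1.1, 2.5, 4.0, 9.7, 3.0]
print(spearman_rho(precip, flow))
```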
The highest correlation was observed between pluviometric station 736042 and fluviometric station 39130000, as demonstrated in Figure 4, with a correlation coefficient of 0.18, leading to their selection for the study.
Despite the low correlation, this research addresses a real situation where the case study (Jucazinho Dam) was selected by the study funder (COMPESA). The choice was due to its importance to the region, its history of “collapse”, and its operation based on the technicians’ empiricism. Precisely due to data limitations, this study can contribute to the literature by evaluating artificial intelligence models in real situations with scarce data.
The general data of both stations are contained in Table 2. Due to the beginning and end of both series, the records from 1 January 1986 to 1 June 2023 were used to develop the hydrological model.
Regarding the flow data, gaps at station 39130000 were filled with data from station 39100000 multiplied by a factor of 1.57, which is the ratio between the drainage area of station 39130000 (2450 km²) and that of station 39100000 (1560 km²); both areas were obtained from ANA’s HIDROWEB portal.
It is noteworthy that both stations are located on the Capibaribe River, the main river of the Capibaribe Hydrographic Basin and the river dammed by the Jucazinho Dam, and that the number of gaps is insignificant compared with the total series, so this type of gap filling does not significantly harm the model.
The remaining missing data, both flow and precipitation, for stations 736042 and 39130000 were interpolated linearly. The gap-filled series is presented in Figure 5 and covers a total of 13,668 days. Initially, stations 736042 and 39130000 had gaps in 0.61% and 5.38% of these days, respectively. After using the data from station 39100000 multiplied by the factor of 1.57, the percentage of gaps at station 39130000 dropped to 4.15%. Finally, both series contained 13,668 days of records without gaps.
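The two gap-filling steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the short sample series are hypothetical, gaps are represented as NaN, and the drainage areas are the ones quoted in the text.

```python
import math

# Ratio of drainage areas quoted in the text: 2450 km2 / 1560 km2 (about 1.57).
AREA_RATIO = 2450.0 / 1560.0


def fill_from_donor(target, donor, ratio=AREA_RATIO):
    """Replace NaN in target with donor * ratio wherever the donor has a value."""
    return [
        d * ratio if math.isnan(t) and not math.isnan(d) else t
        for t, d in zip(target, donor)
    ]


def interpolate_linear(series):
    """Fill NaN runs lying between two valid values by linear interpolation.

    Leading or trailing NaN values are left untouched.
    """
    filled = list(series)
    valid = [i for i, v in enumerate(filled) if not math.isnan(v)]
    for left, right in zip(valid, valid[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


nan = float("nan")
q_39130000 = [2.0, nan, nan, 0.5, nan, 0.1]   # hypothetical flows with gaps
q_39100000 = [1.2, 1.0, nan, 0.3, nan, 0.05]  # hypothetical donor flows

step1 = fill_from_donor(q_39130000, q_39100000)
step2 = interpolate_linear(step1)
print(step2)
```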
It should be noted that linear interpolation is generally not an appropriate method for filling gaps in daily precipitation or flow series. However, because the gaps fell in a period of low precipitation and flow, with values equal or close to zero, the method could be applied without introducing significant errors into the filled series.
It should also be noted that fluviometric station 39130000 is about 35.16 km from the Jucazinho Dam embankment and 14.64 km from the Jucazinho inundation area, and the side spillway crest elevations are about 295 m. Therefore, the flow series generated by the artificial intelligence models trained from the data presented in Figure 5 should be multiplied by a factor of 1.69 for a practical application of reservoir management. This factor is the ratio between the drainage area of the Jucazinho Dam (4149.90 km²) and the drainage area of station 39130000 (2450.00 km²).
Figure 6 presents the mean monthly rainfall totals over all years for the catchment basin. Rainfall in the area is low, with approximately four months of rain and eight months of drought, indicating that the Capibaribe watershed has no hydrological memory.
The hydrological memory of a watershed represents its ability to store and release water over time in response to climatic conditions. It is influenced by factors such as geology and land use, and basins with permeable soils tend to have greater hydrological memory.
Hydrological models face challenges in basins without hydrological memory because they may have less predictable responses, impairing the model’s ability to capture anomalous climate events, given that temporal variability in water retention and release directly influences the hydrological response.

2.3. Model Construction

For the development of the model, the input variables are presented in Table 3; the model predicts a sequence of next steps from a sequence of past observations. As $Q(t-1)$ represents the flow rate for the time prior to $Q(t)$, the delayed flows of 1, 2, 3, …, 32 days with respect to $t$ are called, respectively, $Q(t-1)$, $Q(t-2)$, …, $Q(t-32)$; likewise, the flows with an advance of 1, 2, 3, …, 32 days in relation to $t$ are called, respectively, $Q(t+1)$, $Q(t+2)$, …, $Q(t+32)$. The same nomenclature logic is used for precipitation.
Thus, in the first scenario, the prediction of the flow for the next day was made from the previous data of one day of flow and precipitation (C-1). In the second scenario, previous data from two days of flow and precipitation were considered in order to predict the next two days of flow (C-2). In the other scenarios, the same logic was used but with 4 (C-4), 8 (L-8), 16 (L-16) and 32 days of data (L-32). The models were grouped into:
  • (Group C): Short-term prediction for 1 (C-1), 2 (C-2) and 4 (C-4) days of prediction;
  • (Group L): Long-term prediction for 8 (L-8), 16 (L-16) and 32 days (L-32) of prediction.
The models were constructed in an orderly manner without mixing the chronological order of the pairs containing the input and output variables and with recursive, repeating values in these pairs. Using model C-1 as an example, we show the pairs in Table 4.
Note that $P(t-2)$, $Q(t-2)$ and $Q(t+1)$ are repeated in both the input and output sets. In this way, historical values are used both to predict future flow values and to provide information about past patterns that influence these predictions.
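The construction of chronologically ordered input and output pairs can be sketched as below. This is a hypothetical helper, not the authors’ code: it assumes each scenario with $k$ days uses the previous $k$ days of precipitation and flow as inputs and the next $k$ days of flow as outputs, and it does not reproduce the exact column layout of Tables 3 and 4.

```python
def make_pairs(precip, flow, k):
    """Build ordered (input, output) pairs for a k-day scenario.

    Input:  P(t-k), ..., P(t-1), Q(t-k), ..., Q(t-1)
    Output: Q(t), ..., Q(t+k-1)
    """
    pairs = []
    for t in range(k, len(flow) - k + 1):
        x = precip[t - k:t] + flow[t - k:t]
        y = flow[t:t + k]
        pairs.append((x, y))
    return pairs


# Hypothetical five-day series.
precip = [0.0, 5.0, 2.0, 0.0, 1.0]
flow = [1.0, 1.5, 3.0, 2.0, 1.2]

for x, y in make_pairs(precip, flow, k=1):
    print(x, "->", y)
```

In this sketch, the output of one pair reappears as an input of the next pair, which is the repetition of values between the input and output sets noted in the text.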
In a short-term context, usually covering periods of up to a week, flow prediction is essential for immediate decision-making. This includes real-time control of water flow, flood prevention, and reservoir water level management. The ability to anticipate intense weather events or sudden changes in hydrological conditions allows for rapid responses, such as the controlled release of water to prevent flooding or the immediate adjustment of the stored volume to meet current demand.
On the other hand, from a long-term perspective, usually covering periods longer than a week, flow forecasting is crucial for strategic planning. This allows for gradual adaptation to seasonal hydrological conditions, contributing to the sustainable management of water resources over time.
Regarding their classification, hydrological models based on artificial intelligence can be adapted to generate both probabilistic and deterministic predictions. Because the model generates unique and specific predictions based on initial conditions and defined parameters, it is reasonable to classify it as deterministic.
The input and output data were normalized between 0 and 1 to prevent the predominance of variables with high values, a common issue in machine learning models. It is noteworthy that the normalization was done by variable: for example, the normalization of $Q(t-1)$ considers only the data series of $Q(t-1)$. For this process, the MinMaxScaler function of the Sklearn library was used, which performs the transformation through Equation (2):
$$x'_n = \frac{x_n - x_{min}}{x_{max} - x_{min}} \quad (2)$$
where $x_n$ is the nth value of the data series, $x'_n$ is the value after normalization, and $x_{min}$ and $x_{max}$ are, respectively, the minimum and maximum values of the data series.
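A minimal sketch of the per-variable normalization in Equation (2) is shown below, comparing `MinMaxScaler` with the formula applied by hand. The small array is hypothetical; each column stands for one variable.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: each column is one variable, e.g. P(t-1) and Q(t-1).
X = np.array([[0.0, 10.0],
              [5.0, 30.0],
              [20.0, 50.0]])

scaler = MinMaxScaler()              # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)   # scales each column independently

# Equation (2) applied column by column, for comparison.
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Predictions made on the normalized scale can be mapped back to flow units.
X_back = scaler.inverse_transform(X_scaled)
print(X_scaled)
```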
To apply the machine learning models, the data were divided into training and testing sets using the train_test_split function from the Sklearn library, and a sensitivity analysis was performed considering the following proportions:
  • 65% for training and 35% for testing (65–35);
  • 70% for training and 30% for testing (70–30);
  • 75% for training and 25% for testing (75–25);
  • 80% for training and 20% for testing (80–20).
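The four proportions listed above can be reproduced with `train_test_split` as in the sketch below. The use of `shuffle=False` is an assumption: the function shuffles by default, and the text states only that chronological order was preserved, not which arguments were passed. The arrays are placeholders for the real input and output pairs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 ordered samples standing in for the real pairs.
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

splits = {}
for test_size in (0.35, 0.30, 0.25, 0.20):
    # shuffle=False keeps the first rows for training and the last for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, shuffle=False
    )
    splits[test_size] = (len(X_train), len(X_test))

print(splits)
```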

2.3.1. Defining Hyperparameters

Hyperparameters are external parameters that are not learned directly by the model during training but need to be specified before the training process begins. These parameters influence the overall behavior of the model and affect training performance and effectiveness. In contrast, model parameters are the weights that the model adjusts during training to make predictions based on the data.
Because the time series is large-scale, which significantly increases the computational cost, and we aim to obtain an initial reference point to evaluate the adaptability of machine learning models, the present work used the standard values for the hyperparameters. These values, because they are generic and work well in a variety of situations, provide computational efficiency, avoiding the initial need for extensive experimentation.
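The default hyperparameters of the three estimators can be inspected with `get_params()`, as in the sketch below. The values returned depend on the installed scikit-learn version, so they may differ from those used in the study.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Estimators created without arguments use the library defaults.
models = {
    "SVM": SVR(),
    "RF": RandomForestRegressor(),
    "ANN": MLPRegressor(),
}

for name, model in models.items():
    print(name, model.get_params())
```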

2.3.2. Support Vector Machine

In the SVM model, the SVR function of the Sklearn library was used, for which the list of hyperparameters with their respective variable types and default values is provided in Table 5.
The SVR machine learning model is an extension of the SVM algorithm for regression tasks. The original SVM was initially developed for classification problems, but the SVR variant has been adapted to handle the prediction of numerical values instead of classes.
The core logic behind SVR is the same as for SVM and involves searching for an optimal hyperplane that best fits the training data. However, unlike SVM classification, where the goal is to find a hyperplane that separates classes efficiently, SVR seeks a hyperplane that optimizes the prediction of continuous values.

2.3.3. Random Forest

In the RF model, the RandomForestRegressor function of the Sklearn library was used, for which the list of hyperparameters with their respective types of variables and default values is provided in Table 6.
The RandomForestRegressor machine learning model is built on an ensemble of decision trees, which are the fundamental component of RF. Each tree is trained independently using random sampling of both the dataset instances and the features considered at each node split. This randomness contributes to the diversity among the trees and, consequently, to the robustness of the model.
During the training process, each tree makes individual predictions for the dataset instances according to the decisions made in their structures. The final RandomForestRegressor prediction is obtained by averaging these predictions, resulting in a more stable estimate that is less susceptible to overfitting.
Overfitting is a common phenomenon in machine learning in which a model over-adapts to the specific details of the training data, losing the ability to generalize to new data. This occurs when the model is too complex relative to the inherent complexity of the data, capturing irrelevant patterns, noise, or specific variations of the training set.
As a result, the model’s performance may be excellent on the training data but may fail to tackle new data, hindering its ability to make accurate predictions in real-world situations.

2.3.4. Artificial Neural Network

In the ANN model, the MLPRegressor function from the Sklearn library was used, for which the list of hyperparameters with their respective types of variables and default values is provided in Table 7.
The MLPRegressor machine learning model belongs to the category of ANNs known as MLPs. This model is specifically designed for regression tasks as it is able to perform predictions of numerical values based on input data.
The fundamental structure of MLPRegressor is composed of multiple layers of neurons, with each layer connected to its adjacent layers. This architecture allows the model to capture complex relationships between input and output variables. Unlike simple linear models, MLPRegressor is able to learn nonlinear patterns in data.
During training, MLPRegressor uses an iterative process known as backpropagation. This process involves forward-passing inputs through the network to generate predictions and then comparing those predictions with the actual values to calculate the error. The error is then backpropagated through the network, and the weights of the connections between neurons are adjusted to minimize the error in the next iteration.
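To make the workflow concrete, the sketch below fits the three estimators on a synthetic series arranged in the C-1 layout. Everything about the data is an assumption: the toy flow recursion stands in for the HIDROWEB records, and `random_state=0` is set only for reproducibility; all other hyperparameters keep their defaults. `MLPRegressor` may emit a convergence warning with the default number of iterations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption, not the station records):
# a toy flow that responds to the previous day's rain and flow.
rng = np.random.default_rng(0)
n_days = 300
precip = rng.random(n_days)
flow = np.zeros(n_days)
for t in range(1, n_days):
    flow[t] = 0.6 * flow[t - 1] + 0.4 * precip[t - 1]

# C-1 layout: inputs P(t-1) and Q(t-1), output Q(t).
# The toy values already lie in [0, 1), so no extra scaling is applied here.
X = np.column_stack([precip[:-1], flow[:-1]])
y = flow[1:]

# Chronological 80-20 split.
n_train = int(0.8 * len(X))
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

models = {
    "SVM": SVR(),
    "RF": RandomForestRegressor(random_state=0),
    "ANN": MLPRegressor(random_state=0),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(name, predictions[name][:3])
```

For scenarios that predict several days at once, the output has more than one column; `SVR` handles a single output only, so a wrapper such as `sklearn.multioutput.MultiOutputRegressor` would be one possible option, although the text does not say how this was handled.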

2.4. Model Evaluation

The models were evaluated for their performance by means of a graphical analysis and statistical criteria. The purpose of the evaluation was to verify the quality of the calibration and validation results by comparing the flow data simulated by the models with the actual observed data.
The adopted graphical analysis sought to verify the trend of the correlation of the observed and predicted data along the increase in the flow and also with the increase in days according to the models.
The statistical criteria adopted were the Nash–Sutcliffe model efficiency coefficient [18] (NSE), the percentage of bias (PBIAS), the coefficient of determination (R²) and the root mean standard deviation ratio (RSR). The NSE ranges from −∞ to 1, with 1 being the optimal value. Values between 0 and 1 are seen as acceptable performance levels; however, values ≤ 0 indicate that the observed mean is a better predictor than the simulated value, indicating poor performance of the model.
The optimal value of the PBIAS is 0, and positive or negative values with low magnitudes represent good performance. Positive values indicate that the model underestimated measured values, while negative values indicate that the model overestimated measured values.
The R² estimates the correlation between the measured and simulated values and ranges from 0 to 1, with a value of 1 representing perfect agreement. The RSR ranges from the optimal value of 0, which indicates that the root mean square error (RMSE) is zero, to a large positive value. Therefore, the lower the RSR, the lower the RMSE and the better the simulation performance.
In the quantitative analysis, the Moriasi et al. [19] classification presented in Table 8 was used. The classification used the evaluation of a monthly model, and therefore, for simulations with daily values, it can be considered that NSE values above 0.36 are still satisfactory.
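The four criteria can be computed as in the sketch below. The formulas are the commonly used definitions associated with Moriasi et al. [19] and are an assumption here, since the equations are not listed in this section; the function name and the sample values are hypothetical. The sign convention matches the text: a positive PBIAS means the model underestimates the observed values.

```python
import math


def evaluate(obs, sim):
    """Return NSE, PBIAS (%), R2 and RSR for observed and simulated series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))       # squared errors
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)          # spread of obs
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)          # spread of sim
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    return {
        "NSE": 1 - sse / ss_obs,
        "PBIAS": 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs),
        "R2": cov ** 2 / (ss_obs * ss_sim),  # squared Pearson correlation
        "RSR": math.sqrt(sse) / math.sqrt(ss_obs),
    }


# Hypothetical example: the simulation is half of every observed value.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [0.5, 1.0, 1.5, 2.0]
print(evaluate(observed, simulated))
```

In this example R² equals 1 even though the NSE is negative, which illustrates why R² alone cannot detect a systematic bias.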
Graphical analyses were performed using scatter plots for all models and graphs of readings over time for models C-1 and L-32. The graph of readings over time for the L-32 model was created considering only the first, fifteenth and last day of the pairs, considering that the values are repeated. Both graphs depict the relationships between two variables visually.
The scatter plot displays observations comparing the independent variable on the x-axis to the dependent variable on the y-axis. The pattern of points on this graph offers insights into the nature of the relationship, indicating whether it follows a linear trend and how well the model fits. The second graph shows readings over time, with each point representing observations comparing dates on the x-axis to flows measured at the fluviometric station and predicted by the model on the y-axis. The closer the two curves, the better the model fit. This graphical representation allows for the identification of difficulties with predicting highs or lows in the series and the detection of any delays between the curves.

3. Results and Discussion

The statistical efficiency coefficients of the models for the test data are shown in Table 9. In general, all models showed unsatisfactory performance for long-term prediction, that is, for 8 (L-8), 16 (L-16) and 32 (L-32) days, for every training and testing division (65–35, 70–30, 75–25 and 80–20, where the first number is the percentage of data used for training and the second is the percentage used for testing, as described in the methodology). In addition, the quality of the fit clearly tends to decrease as the number of prediction days increases.
It is also important to note that, despite the low correlation between the pluviometric and fluviometric stations used in the study, the artificial intelligence models SVM, RF and ANN presented acceptable results for the models for short-term prediction, which includes one day (C-1), two days (C-2) and four days (C-4).
Cheng et al. [20] adopted an ANN and long short-term memory (LSTM) to predict flow rates at daily and monthly scales. For the monthly flow forecast, a recursive prediction approach was used. The two models were trained and validated using precipitation and flow datasets collected in the Nan and Ping river basins in Thailand, covering the period from 1974 to 2014. The main findings of the study highlight that both ANN and LSTM models can provide accurate daily predictions of up to 20 days.
It is not uncommon for the performance of flow prediction models using artificial intelligence to decrease with increasing prediction horizons. This comparison suggests that the accuracy of long-term flow forecasting can vary significantly depending on the modeling method and the approach taken.
The decrease in the quality of adjustments with the increase in the number of forecast days is attributed to increasing uncertainty regarding future conditions and the accumulation of errors over the forecast horizon. According to the statistical criteria, the model that best adapted to the series was SVM 80–20 (support vector machine using 80% for training and 20% for testing), which resulted in the best statistical indices of NSE, PBIAS, R² and RSR according to Table 10.
Al-Mukhtar [21] investigated the modeling and prediction of sediment in the Tigris River in Baghdad, which is an influential parameter for the pollution of water bodies. Three artificial intelligence methods (RF, SVM and ANN) were employed using observed flow (m³/s) and suspended sediment concentration (mg/L) data collected between 1962–1981 and 2000–2010. The predictive results of the three methods evaluated were analyzed based on the coefficients R², RMSE and NSE and indicated that RF presented the best performance.
These findings highlight the importance of considering the variation in performance between different forecasting methods as well as the need to properly assess the uncertainties associated with models, especially for long-term forecasting scenarios.
The results of the models are presented in Figure 7, Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12. The models took 90% of the code execution time to train, which means that the computational cost of training is significantly higher than what is required to use the trained model.
In the SVM models, with the increase in the number of days, there was underestimation of the flow forecast values (points approaching the x-axis). Despite being effective at modeling nonlinear relationships, the SVM model is sensitive to complex patterns in time series. For this reason, as the forecast horizon increased, the model was unable to predict the significant changes in flow patterns, as can be seen in Figure 7.
The RF and ANN models showed the opposite behavior to the SVM model: as the number of days increased, the flow prediction values were overestimated (points approaching the y-axis), as can be seen in Figure 8 and Figure 9. This is mainly due to overfitting, in which the models adjusted too closely to the training data and incorporated transient noise and patterns that are not representative of the actual flow behavior.
Han et al. [22] applied SVM to the Bird Creek watershed for flood prediction and found that, like ANN models, SVM also suffers from underfitting and overfitting problems, with overfitting being more harmful than underfitting. The study also reveals an interesting result in the response of the SVM to different rainfall inputs, where lighter rains generated very different responses than more intense rainfall, similar to what occurred in the present work.
The time series plot in Figure 10 confirms the underestimation of the peaks by the SVM models mentioned above. The C-1 model predicted peaks better than the L-32 model. For the L-32 model, a delay between the observed and simulated flow curves also appears as the number of input and output pairs increases. This delay explains the low statistical coefficients, because the errors accumulate over the series.
According to Figure 11 and Figure 12, the RF models presented results similar to those of the ANN, generating noise that oscillated well above the measured flow. The underlying relationship between the features and the target variable is highly nonlinear and complex, which is why both RF and ANN struggled to capture it effectively, leading to instability and noisy predictions. As with the SVM model, a delay between the observed and simulated flow curves appears for the L-32 model as the number of input and output pairs increases.
The ANN models were able to represent the flow peaks well, as Figure 12 shows. However, the generated noise made values that should be zero oscillate slightly above or below zero, and the models even predicted negative flow values. As with the two models presented above, a delay between the observed and simulated flow curves appears for the L-32 model as the number of input and output pairs increases.
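The delay between the observed and simulated curves described above can be quantified in several ways. One possible approach, sketched below in plain Python, is to shift the simulated series backwards by a trial lag and keep the lag that maximizes the Pearson correlation with the observed series. This is an assumption made for illustration and not the procedure used in this study; the function names and the sample series are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def estimate_delay(observed, simulated, max_lag):
    """Return (lag, correlation) for the lag that best aligns the series.

    A lag of k means that simulated[t + k] is compared with observed[t],
    that is, the simulation trails the observations by k time steps.
    """
    best_lag, best_corr = 0, float("-inf")
    for lag in range(max_lag + 1):
        obs_part = observed[:len(observed) - lag]
        sim_part = simulated[lag:]
        corr = pearson(obs_part, sim_part)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


# Hypothetical daily flows; the simulation repeats them two days late.
observed = [0, 0, 1, 5, 2, 1, 0, 0, 0, 3, 1, 0, 0, 0]
simulated = [0, 0] + observed[:-2]
lag, corr = estimate_delay(observed, simulated, max_lag=4)
print(lag, corr)
```

A lag that grows with the forecast horizon would be consistent with the accumulation of errors discussed above.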
The results confirm that the Capibaribe watershed does not have hydrological memory. Future work should apply SVM to different watersheds to verify how the model adapts to the hydrological memory of each basin.
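Hydrological memory can be explored before any model is trained. A simple proxy, assumed here for illustration only, is the autocorrelation of the flow series: a basin with little memory shows an autocorrelation that falls quickly as the lag grows. The sketch below is plain Python; the threshold of 0.2 and the sample series are hypothetical and do not come from this study.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag (in time steps)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


def memory_length(series, max_lag, threshold=0.2):
    """Largest lag, counted from 1, before the autocorrelation first
    drops below `threshold`. Returns 0 if lag 1 is already below it."""
    length = 0
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) < threshold:
            break
        length = lag
    return length


# Hypothetical series: a slowly varying flow and a flashy, intermittent one.
smooth = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
flashy = [0, 10, 0, 10, 0, 10, 0, 10]
print(memory_length(smooth, max_lag=5), memory_length(flashy, max_lag=3))
```

A short memory length suggests that lagged flows carry little information about flows several days ahead, which is in line with the decline in performance observed for the long-term models.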

4. Conclusions

The general objective of this research was to generate synthetic series of tributary flows to the Jucazinho Dam, which is located in the Agreste region of Pernambuco, based on stochastic models. The specific objectives were to evaluate the adaptability of the SVM, RF and ANN machine learning models for generating the synthetic flow series, to analyze how the quality of the fit changes as the number of forecast days increases, and to verify the quality of the fit of the models using statistical metrics. Based on the results obtained and the discussions presented, it is concluded that:
  • All models showed satisfactory performance for short-term prediction (1, 2 and 4 days) and unsatisfactory performance for long-term prediction (8, 16 and 32 days).
  • The graphical results of the SVM models showed underestimation of the flow values with an increase in the forecast horizon due to the sensitivity of the SVM to complex patterns in the time series.
  • On the other hand, the RF and ANN models showed overestimation of the flow values as the number of forecast days increased, which was mainly attributed to overfitting.
  • For all models, an increase in the number of prediction days tended to reduce the quality of the fit; this was mainly attributed to the delay in the predictions, which generated an accumulation of errors.
  • According to the statistical criteria, the model that best adapted to the series was SVM, which resulted in the best statistical indices of NSE, PBIAS, R2 and RSR.
  • Even in situations where data are scarce, the SVM, RF and ANN artificial intelligence models have the potential to be applied for short-term prediction.
  • The Capibaribe watershed does not have hydrological memory, which impacted model training. Future work should apply ANNs to different watersheds to verify how the model adapts to the hydrological memory of each basin.

Author Contributions

Conceptualization, E.J.G.d.S. and A.P.C.; Formal analysis, A.P.C. and S.d.T.M.B.; Investigation, E.J.G.d.S.; Methodology, E.J.G.d.S.; Project administration, S.d.T.M.B.; Software, E.J.G.d.S. and J.F.C.; Supervision, A.P.C. and S.d.T.M.B.; Validation, A.P.C.; Writing—original draft, E.J.G.d.S.; Writing—review and editing, E.J.G.d.S. and S.d.T.M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Academic Master’s and Doctoral Program for Innovation (MAI/DAI), which was promoted by the Brazilian National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico—CNPq, Brazil), in a partnership between the Federal University of Pernambuco (Universidade Federal de Pernambuco—UFPE, Brazil) and the Pernambuco Sanitation Company (Companhia Pernambucana de Saneamento—COMPESA, Brazil).

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Acknowledgments

The authors would like to thank Vinnycius Luz and Milton Melo Neto from the Pernambuco Sanitation Company (Companhia Pernambucana de Saneamento—COMPESA, Brazil), the National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico—CNPq, Brazil) for the productivity scholarships for Artur Coutinho [process 315927/2021-6] and Saulo Bezerra [process 308202/2022-8], the Foundation for Support of Science and Technology of the State of Pernambuco (Fundação de Amparo à Ciência e Tecnologia de Pernambuco—FACEPE, Brazil) [process APQ-1767-3.01/22], and the Coordination for the Improvement of Higher Education Personnel (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—CAPES, Brazil) [Finance Code 001].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jansen, R.B. Dams and Public Safety: A Water Resources Technical Publication; United States Printing Office: Denver, CO, USA, 1980.
  2. World Commission on Dams. Dams and Development: A New Framework for Decision-Making: The Report of the World Commission on Dams; Earthscan: Oxford, UK, 2000.
  3. Agência Pernambucana de Águas e Clima-Apac. Available online: https://acesse.one/hyipu (accessed on 20 November 2022).
  4. Agência Nacional das Águas-Ana. Reservatórios do Semiárido Brasileiro: Hidrologia, Balanço Hídrico e Operação; ANA: Brasília, Brazil, 2017; Volume Anexo E, 178p.
  5. Companhia Pernambucana de Saneamento-Compesa. Available online: https://l1nq.com/RCLDM (accessed on 20 November 2022).
  6. Santana, R.A.; Bezerra, S.T.M.; Santos, S.M.; Coutinho, A.P.; Coelho, I.C.L.; Pessoa, R.S.V. Assessing alternatives for meeting water demand: A case study of water resource management in the Brazilian Semiarid region. Util. Policy 2019, 61, 100974.
  7. Ali, S.; Shahbaz, M. Streamflow forecasting by modeling the rainfall–streamflow relationship using artificial neural networks. Model. Earth Syst. Environ. 2020, 6, 1645–1656.
  8. Parisouj, P.; Mohebzadeh, H.; Lee, T. Employing machine learning algorithms for streamflow prediction: A case study of four river basins with different climatic zones in the United States. Water Resour. Manag. 2020, 34, 4113–4131.
  9. Meshram, S.G.; Meshram, C.; Santos, C.A.G.; Benzougagh, B.; Khedher, K.M. Streamflow prediction based on artificial intelligence techniques. Iran. J. Sci. Technol. Trans. Civ. Eng. 2022, 46, 2393–2403.
  10. Sun, N.; Zhang, S.; Peng, T.; Zhang, N.; Zhou, J.; Zhang, H. Multi-variables-driven model based on random forest and Gaussian process regression for monthly streamflow forecasting. Water 2022, 14, 1828.
  11. Islam, K.I.; Elias, E.; Carroll, K.C.; Brown, C. Exploring random forest machine learning and remote sensing data for streamflow prediction: An alternative approach to a process-based hydrologic modeling in a snowmelt-driven watershed. Remote Sens. 2023, 15, 3999.
  12. De Sousa, M.F., Jr.; Uliana, E.M.; Aires, R.V.; Rápalo, L.M.; da Silva, D.D.; Moreira, M.C.; Lisboa, L.; da Silva Rondon, D. Streamflow prediction based on machine learning models and rainfall estimated by remote sensing in the Brazilian Savanna and Amazon biomes transition. Model. Earth Syst. Environ. 2024, 10, 1191–1202.
  13. Adnan, R.M.; Liang, Z.; Heddam, S.; Kermani, M.; Kisi, O.; Li, B. Least square support vector machine and multivariate adaptive regression splines for streamflow prediction in mountainous basin using hydro-meteorological data as inputs. J. Hydrol. 2020, 586, 124371.
  14. Essam, Y.; Huang, Y.F.; Ng, J.L.; Birima, A.H.; Ahmed, A.N.; El-Shafie, A. Predicting streamflow in Peninsular Malaysia using support vector machine and deep learning algorithms. Sci. Rep. 2022, 12, 3883.
  15. Ikram, R.M.A.; Hazarika, B.B.; Gupta, D.; Heddam, S.; Kisi, O. Streamflow prediction in mountainous region using new machine learning and data preprocessing methods: A case study. Neural Comput. Appl. 2023, 35, 9053–9070.
  16. Girão, L.C.P. Uma Análise da Contribuição dos Programas Básicos Ambientais Como Instrumento de Gestão Ambiental Para a Barragem de Jucazinho Localizada no Município de Surubim/PE. Master’s Thesis, Universidade Federal de Pernambuco-UFPE, Recife, Brazil, 2004.
  17. Neves, Y.T.; Rodrigues, A.; Cabral, J.J.S.P. Modelagem computacional do rompimento hipotético da barragem de Jucazinho no estado de Pernambuco (Brasil). Rev. DAE 2021, 69, 167–182.
  18. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I-A discussion of principles. J. Hydrol. 1970, 10, 282–290.
  19. Moriasi, D.N.; Arnold, J.G.; van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Am. Soc. Agric. Biol. Eng. 2007, 50, 885–900.
  20. Cheng, M.; Fang, F.; Kinouchi, T.; Navon, I.M.; Pain, C.C. Long lead-time daily and monthly streamflow forecasting using machine learning methods. J. Hydrol. 2020, 590, 125376.
  21. Al-Mukhtar, M. Random forest, support vector machine, and neural networks to modelling suspended sediment in Tigris River-Baghdad. Environ. Monit. Assess. 2019, 191, 673.
  22. Han, D.; Chan, L.; Zhu, N. Flood forecasting using support vector machines. J. Hydroinform. 2007, 9, 267–276.
Figure 1. Situation map.
Figure 2. Total withdrawal demands.
Figure 3. Pluviometric and fluviometric stations in the Jucazinho Dam’s catchment area.
Figure 4. Correlation matrix graph.
Figure 5. Data series used in this study.
Figure 6. Average of the cumulative monthly index of all the years in the series.
Figure 7. Scatter plots of the tests for SVM 80-20.
Figure 8. Scatter plots of tests for RF 80-20.
Figure 9. Scatter plots of tests for ANN 80-20.
Figure 10. Comparison between observed and simulated flows for SVM 80-20.
Figure 11. Comparison between observed and simulated flows for RF 80-20.
Figure 12. Comparison between observed and simulated flows for ANN 80-20.
Table 1. Technical data sheet of the Jucazinho Dam.
| Name | Dimension | Unit |
|---|---|---|
| Embankment | | |
| Latitude | 07°57′59.39″ S | - |
| Longitude | 35°44′3.16″ W | - |
| Incremental drainage area | 2865.60 | km² |
| Total drainage area | 4149.90 | km² |
| Maximum volume | 204.82 | hm³ |
| Minimum volume | 0.29 | hm³ |
| Usable volume | 326.75 | hm³ |
| Maximum operating water level | 292.00 | m |
| Minimum operating water level | 253.00 | m |
| Elevation of the bottom of the lake | 238.00 | m |
| Crest | | |
| Length | 442.00 | m |
| Width | 8.00 | m |
| Elevation | 299.00 | m |
| Main Spillway | | |
| Length | 170.00 | m |
| Crest elevation | 292.00 | m |
| Distance between spillway and embankment crest | 7.00 | m |
| Maximum flow rate | 5446.69 | m³·s⁻¹ |
| Side Spillways | | |
| Length | 57.00 | m |
| Crest elevation | 295.00 | m |
| Maximum flow rate | 1291.30 | m³·s⁻¹ |
| Gallery | | |
| Length | 2.00 | m |
| Width | 2.00 | m |
| Elevation | 250.00 | m |
| Maximum flow rate | 2.72 | m³·s⁻¹ |
Table 2. General data of the pluviometric and fluviometric stations used.
| Information | Pluviometric Station | Fluviometric Station |
|---|---|---|
| Station Name | Taquaritinga do Norte | Toritama |
| Code | 736042 | 39130000 |
| Basin | 3 - Atlantic, NW/NE section | 3 - Atlantic, NW/NE section |
| Sub-basin | 39 - Capibaribe, Ipojuca, Una, Goiana, Mundaú, Paraíba do Meio, Coruripe, Sirinhaém, São Miguel and Camaragibe Rivers | 39 - Capibaribe, Ipojuca, Una, Goiana, Mundaú, Paraíba do Meio, Coruripe, Sirinhaém, São Miguel and Camaragibe Rivers |
| City | Taquaritinga do Norte | Santa Cruz do Capibaribe |
| State | Pernambuco | Pernambuco |
| Accountable | ANA | ANA |
| Operator | Geological Survey of Brazil (CPRM) | CPRM |
| Latitude | −7.9039 | −8.0128 |
| Longitude | −36.0469 | −36.0578 |
| Elevation (m) | 785 | 376 |
| Drainage area (km²) | - | 2450 |
| Distance to Jucazinho Dam (km) | 34.31 | 35.16 |
| Start of the series | 1 January 1986 | 1 January 1973 |
| End of the series | 30 June 2023 | 1 June 2023 |
| Series size (years) | 36.5 | 49.5 |
Table 3. Structure of the model for predicting flows.
| Model | Input | Output |
|---|---|---|
| C-1 | $P(t-1)$, $Q(t-1)$ | $Q(t)$ |
| C-2 | $P(t-1)$, $P(t-2)$, $Q(t-1)$, $Q(t-2)$ | $Q(t)$, $Q(t+1)$ |
| C-4 | $P(t-1)$, $P(t-2)$, …, $P(t-4)$, $Q(t-1)$, $Q(t-2)$, …, $Q(t-4)$ | $Q(t)$, $Q(t+1)$, $Q(t+2)$, $Q(t+3)$ |
| L-8 | $P(t-1)$, $P(t-2)$, …, $P(t-8)$, $Q(t-1)$, $Q(t-2)$, …, $Q(t-8)$ | $Q(t)$, $Q(t+1)$, …, $Q(t+7)$ |
| L-16 | $P(t-1)$, $P(t-2)$, …, $P(t-16)$, $Q(t-1)$, $Q(t-2)$, …, $Q(t-16)$ | $Q(t)$, $Q(t+1)$, …, $Q(t+15)$ |
| L-32 | $P(t-1)$, $P(t-2)$, …, $P(t-32)$, $Q(t-1)$, $Q(t-2)$, …, $Q(t-32)$ | $Q(t)$, $Q(t+1)$, …, $Q(t+31)$ |
Table 4. Ordering and recursion of models, using model C-1 as an example.
| Pair | Input | Output |
|---|---|---|
| 1 | $P(t-1)$, $P(t-2)$, $Q(t-1)$, $Q(t-2)$ | $Q(t)$, $Q(t+1)$ |
| 2 | $P(t)$, $P(t-1)$, $Q(t)$, $Q(t-1)$ | $Q(t+1)$, $Q(t+2)$ |
| 3 | $P(t+1)$, $P(t)$, $Q(t+1)$, $Q(t)$ | $Q(t+2)$, $Q(t+3)$ |
| n | $P(t+n-2)$, $P(t+n-3)$, $Q(t+n-2)$, $Q(t+n-3)$ | $Q(t+n-1)$, $Q(t+n)$ |
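The input and output structure of Tables 3 and 4 corresponds to a sliding window over the precipitation and flow series. The following is a minimal sketch in plain Python of how such pairs could be assembled; it illustrates the structure under the stated assumptions and is not the code used in this study. The function name `make_pairs` and the sample series are hypothetical.

```python
def make_pairs(precip, flow, n):
    """Build (input, output) pairs for a model with n lagged days.

    Input:  P(t-1), ..., P(t-n), Q(t-1), ..., Q(t-n)
    Output: Q(t), Q(t+1), ..., Q(t+n-1)
    Consecutive pairs are shifted by one day, as in Table 4.
    """
    pairs = []
    # t is the index of the first forecast day; it needs n past days
    # behind it and n days (including itself) ahead of it.
    for t in range(n, len(flow) - n + 1):
        past_p = [precip[t - k] for k in range(1, n + 1)]
        past_q = [flow[t - k] for k in range(1, n + 1)]
        future_q = [flow[t + k] for k in range(n)]
        pairs.append((past_p + past_q, future_q))
    return pairs


# Hypothetical daily precipitation (mm) and flow (m3/s) series.
precip = [10, 11, 12, 13, 14, 15]
flow = [0, 1, 2, 3, 4, 5]
pairs = make_pairs(precip, flow, n=2)
for inputs, outputs in pairs:
    print(inputs, "->", outputs)
```

With n = 2 the first pair reproduces the layout of the first row of Table 4; an 80-20 split would then keep the first 80% of the pairs for training, assuming the chronological order is preserved.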
Table 5. Hyperparameters of the SVR function from the Sklearn library.
| Parameter | Variable Type | Default Value |
|---|---|---|
| kernel | {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable | 'rbf' |
| degree | int | 3 |
| gamma | {'scale', 'auto'} or float | 'scale' |
| coef0 | float | 0.0 |
| tol | float | 0.001 |
| C | float | 1.0 |
| epsilon | float | 0.1 |
| shrinking | bool | True |
| cache_size | float | 200 |
| verbose | bool | False |
| max_iter | int | −1 |
Table 6. Hyperparameters of the RandomForestRegressor function of the Sklearn library.
| Parameter | Variable Type | Default Value |
|---|---|---|
| n_estimators | int | 100 |
| criterion | string: {"squared_error", "absolute_error", "friedman_mse", "poisson"} | "squared_error" |
| max_depth | int | None |
| min_samples_split | int or float | 2 |
| min_samples_leaf | int or float | 1 |
| min_weight_fraction_leaf | float | 0.0 |
| max_features | string: {"sqrt", "log2", None}, int or float | 1.0 |
| max_leaf_nodes | int | None |
| min_impurity_decrease | float | 0.0 |
| bootstrap | bool | True |
| oob_score | bool | False |
| n_jobs | int | None |
| random_state | int, RandomState or None | None |
| verbose | int | 0 |
| warm_start | bool | False |
| ccp_alpha | non-negative float | 0.0 |
| max_samples | int or float | None |
Table 7. Hyperparameters of the MLPRegressor function from the Sklearn library.
| Parameter | Variable Type | Default Value |
|---|---|---|
| hidden_layer_sizes | array-like of shape (n_layers - 2,) | (100,) |
| activation | {'identity', 'logistic', 'tanh', 'relu'} | 'relu' |
| solver | {'lbfgs', 'sgd', 'adam'} | 'adam' |
| alpha | float | 0.0001 |
| batch_size | int | 'auto' |
| learning_rate | {'constant', 'invscaling', 'adaptive'} | 'constant' |
| learning_rate_init | float | 0.001 |
| power_t | float | 0.5 |
| max_iter | int | 200 |
| shuffle | bool | True |
| random_state | int | None |
| tol | float | 0.0001 |
| verbose | bool | False |
| warm_start | bool | False |
| momentum | float | 0.9 |
| nesterovs_momentum | bool | True |
| early_stopping | bool | False |
| validation_fraction | float | 0.1 |
| beta_1 | float | 0.9 |
| beta_2 | float | 0.999 |
| epsilon | float | 0.00000001 |
| n_iter_no_change | int | 10 |
| max_fun | int | 15,000 |
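The default values listed in Tables 5 to 7 can be checked against the installed version of Sklearn, since defaults occasionally change between releases. The sketch below instantiates the three estimators without arguments and prints their parameters through `get_params()`; it assumes that Sklearn is installed and does not reproduce the configuration used in this study. The dictionary name `models` is hypothetical.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Estimators created with their library defaults (no tuning applied).
models = {
    "SVM": SVR(),
    "RF": RandomForestRegressor(),
    "ANN": MLPRegressor(),
}

for name, model in models.items():
    params = model.get_params()
    print(name)
    for key in sorted(params):
        print(f"  {key} = {params[key]!r}")
```

Note that `SVR` predicts a single target, so multi-day outputs such as those of models C-2 to L-32 require some wrapper or one model per output; `sklearn.multioutput.MultiOutputRegressor` is one option, although this section does not state which approach was adopted.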
Table 8. Classification of modeling efficiency coefficients.
| Classification | NSE | PBIAS | R2 | RSR |
|---|---|---|---|---|
| Very Good | 0.75 < NSE ≤ 1.00 | PBIAS ≤ ±10 | 0.8 < R2 ≤ 1.0 | 0.0 < RSR ≤ 0.5 |
| Good | 0.65 < NSE ≤ 0.75 | ±10 < PBIAS ≤ ±15 | 0.7 < R2 ≤ 0.8 | 0.5 < RSR ≤ 0.6 |
| Satisfactory | 0.50 < NSE ≤ 0.65 | ±15 < PBIAS ≤ ±25 | 0.6 < R2 ≤ 0.7 | 0.6 < RSR ≤ 0.7 |
| Unsatisfactory | NSE ≤ 0.50 | PBIAS > ±25 | R2 ≤ 0.6 | RSR > 0.7 |
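The classification ranges of Table 8 can be expressed as simple threshold functions. The sketch below, in plain Python, covers NSE, PBIAS and RSR. The treatment of values that fall exactly on a boundary is an assumption, as is the choice of classifying RSR values above 0.7 as unsatisfactory so that the classes do not overlap. The function names are hypothetical, and the sample values are training results for SVM 80-20 and RF 80-20 taken from Table 9.

```python
def classify_nse(value):
    """Class of a Nash-Sutcliffe efficiency value."""
    if value > 0.75:
        return "Very Good"
    if value > 0.65:
        return "Good"
    if value > 0.50:
        return "Satisfactory"
    return "Unsatisfactory"


def classify_pbias(value):
    """Class of a percent bias value; only the magnitude matters."""
    magnitude = abs(value)
    if magnitude <= 10:
        return "Very Good"
    if magnitude <= 15:
        return "Good"
    if magnitude <= 25:
        return "Satisfactory"
    return "Unsatisfactory"


def classify_rsr(value):
    """Class of an RSR value (assumed: above 0.7 is unsatisfactory)."""
    if value <= 0.5:
        return "Very Good"
    if value <= 0.6:
        return "Good"
    if value <= 0.7:
        return "Satisfactory"
    return "Unsatisfactory"


# Training values from Table 9: (NSE, PBIAS, RSR).
samples = {
    "SVM 80-20, C-1": (0.94, 5.52, 0.25),
    "SVM 80-20, L-32": (0.57, -14.68, 0.66),
    "RF 80-20, L-32": (-1.81, 128.36, 1.68),
}
for label, (nse_value, pbias_value, rsr_value) in samples.items():
    print(label, classify_nse(nse_value), classify_pbias(pbias_value),
          classify_rsr(rsr_value))
```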
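Table 9. Statistical criteria of training.
NSE
| Model | SVM 65-35 | SVM 70-30 | SVM 75-25 | SVM 80-20 | RF 65-35 | RF 70-30 | RF 75-25 | RF 80-20 | ANN 65-35 | ANN 70-30 | ANN 75-25 | ANN 80-20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C-1 | 0.54 | 0.83 | 0.91 | 0.94 | 0.70 | 0.87 | 0.85 | 0.82 | 0.73 | 0.92 | 0.92 | 0.93 |
| C-2 | 0.40 | 0.75 | 0.85 | 0.91 | 0.66 | 0.85 | 0.80 | 0.69 | 0.71 | 0.87 | 0.88 | 0.79 |
| C-4 | 0.27 | 0.63 | 0.78 | 0.86 | 0.57 | 0.63 | 0.72 | 0.56 | 0.64 | 0.76 | 0.77 | 0.71 |
| L-8 | 0.16 | 0.47 | 0.68 | 0.80 | 0.50 | 0.49 | 0.62 | 0.48 | 0.53 | 0.60 | 0.70 | 0.59 |
| L-16 | 0.08 | 0.30 | 0.52 | 0.70 | 0.37 | 0.35 | 0.50 | 0.20 | 0.41 | 0.37 | 0.57 | 0.38 |
| L-32 | 0.01 | 0.17 | 0.41 | 0.57 | 0.21 | 0.05 | 0.06 | −1.81 | 0.24 | 0.15 | 0.42 | 0.00 |
PBIAS
| Model | SVM 65-35 | SVM 70-30 | SVM 75-25 | SVM 80-20 | RF 65-35 | RF 70-30 | RF 75-25 | RF 80-20 | ANN 65-35 | ANN 70-30 | ANN 75-25 | ANN 80-20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C-1 | −27.73 | −10.17 | −0.09 | 5.52 | −9.63 | 1.15 | 7.23 | 9.04 | −20.12 | 8.42 | −11.74 | 20.32 |
| C-2 | −37.38 | −16.13 | −4.18 | 2.61 | −5.61 | 1.27 | 9.64 | 13.25 | −11.94 | 3.61 | −15.1 | 6.78 |
| C-4 | −48.06 | −23.69 | −7.71 | 0.94 | −13.05 | 5.97 | 16.49 | 20.56 | −19.35 | 6.67 | 61.92 | 37.23 |
| L-8 | −56.63 | −32.24 | −13.74 | −1.23 | −22.89 | 8.78 | 23.27 | 31.75 | −16.22 | −0.04 | −1.77 | 55.23 |
| L-16 | −62.5 | −41.70 | −23.98 | −6.69 | −34.88 | 15.48 | 34.72 | 49.17 | −24.01 | 15.22 | 39.53 | 22.71 |
| L-32 | −70.99 | −51.05 | −30.33 | −14.68 | −44.57 | 23.63 | 82.32 | 128.36 | −35.57 | 12.38 | 53.87 | 63.46 |
R2
| Model | SVM 65-35 | SVM 70-30 | SVM 75-25 | SVM 80-20 | RF 65-35 | RF 70-30 | RF 75-25 | RF 80-20 | ANN 65-35 | ANN 70-30 | ANN 75-25 | ANN 80-20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C-1 | 0.54 | 0.83 | 0.91 | 0.94 | 0.70 | 0.87 | 0.85 | 0.82 | 0.73 | 0.92 | 0.92 | 0.93 |
| C-2 | 0.40 | 0.75 | 0.85 | 0.91 | 0.66 | 0.85 | 0.80 | 0.69 | 0.71 | 0.87 | 0.88 | 0.79 |
| C-4 | 0.27 | 0.63 | 0.78 | 0.86 | 0.57 | 0.63 | 0.72 | 0.56 | 0.64 | 0.76 | 0.77 | 0.71 |
| L-8 | 0.16 | 0.47 | 0.68 | 0.80 | 0.50 | 0.50 | 0.62 | 0.48 | 0.53 | 0.60 | 0.70 | 0.59 |
| L-16 | 0.08 | 0.30 | 0.52 | 0.70 | 0.37 | 0.35 | 0.50 | 0.20 | 0.41 | 0.37 | 0.57 | 0.38 |
| L-32 | 0.01 | 0.17 | 0.41 | 0.57 | 0.21 | 0.05 | 0.03 | −1.81 | 0.24 | 0.15 | 0.41 | 0.00 |
RSR
| Model | SVM 65-35 | SVM 70-30 | SVM 75-25 | SVM 80-20 | RF 65-35 | RF 70-30 | RF 75-25 | RF 80-20 | ANN 65-35 | ANN 70-30 | ANN 75-25 | ANN 80-20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C-1 | 0.67 | 0.41 | 0.30 | 0.25 | 0.55 | 0.36 | 0.38 | 0.42 | 0.52 | 0.28 | 0.28 | 0.27 |
| C-2 | 0.78 | 0.50 | 0.38 | 0.30 | 0.58 | 0.38 | 0.44 | 0.55 | 0.54 | 0.35 | 0.34 | 0.46 |
| C-4 | 0.86 | 0.61 | 0.47 | 0.37 | 0.66 | 0.61 | 0.53 | 0.66 | 0.60 | 0.49 | 0.48 | 0.54 |
| L-8 | 0.92 | 0.73 | 0.56 | 0.44 | 0.70 | 0.71 | 0.61 | 0.72 | 0.69 | 0.63 | 0.55 | 0.64 |
| L-16 | 0.96 | 0.84 | 0.69 | 0.55 | 0.80 | 0.80 | 0.71 | 0.90 | 0.77 | 0.79 | 0.65 | 0.79 |
| L-32 | 0.99 | 0.91 | 0.77 | 0.66 | 0.89 | 0.97 | 0.97 | 1.68 | 0.87 | 0.92 | 0.76 | 1.00 |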
Table 10. Best statistical criteria for training.
| Model | NSE | PBIAS | R2 | RSR |
|---|---|---|---|---|
| C-1 | SVM 80-20 | SVM 75-25 | SVM 80-20 | SVM 80-20 |
| C-2 | SVM 80-20 | RF 70-30 | SVM 80-20 | SVM 80-20 |
| C-4 | SVM 80-20 | SVM 80-20 | SVM 80-20 | SVM 80-20 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
