Article

Toward Smart SCADA Systems in the Hydropower Plants through Integrating Data Mining-Based Knowledge Discovery Modules

by Gheorghe Grigoras *, Răzvan Gârbea and Bogdan-Constantin Neagu
Department of Power Engineering, “Gheorghe Asachi” Technical University of Iasi, 700050 Iasi, Romania
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(18), 8228; https://doi.org/10.3390/app14188228
Submission received: 14 August 2024 / Revised: 9 September 2024 / Accepted: 11 September 2024 / Published: 12 September 2024
(This article belongs to the Special Issue Intelligent Computing Systems and Their Applications)

Abstract

The increasing importance of hydropower generation has led to the development of new smart technologies and the need for reliable and efficient equipment in this field. Because hydropower plants are more complex to build than other power plants, the operation regimes and maintenance activities become essential for hydropower companies to optimize their performance, and including data-driven approaches in the decision-making process represents a challenge. In this paper, a comprehensive multi-task framework integrated into a Knowledge Discovery module based on Data Mining has been designed, developed, and tested to support the decisions of the control room operators and facilitate the transition from the classical to the smart Supervisory Control and Data Acquisition (SCADA) system in hydropower plants. It integrates tasks related to detecting the outliers through advanced statistical procedures, identifying the operating regimes through the patterns associated with typical operating profiles, and developing strategies for loading the generation units that consider the number of operating hours and minimize the water amount used to satisfy the power required by the system. The proposed framework has been tested using the SCADA system’s database of a hydropower plant belonging to the Romanian HydroPower Company. The framework can offer the control room operators comparative information for a time horizon longer than one year. The tests demonstrated the utility of a Knowledge Discovery module in ensuring the transition toward smart SCADA systems that will help the decision-makers improve the management of hydropower plants.

1. Introduction

Hydropower is one of the most important sources of electricity globally, providing over 15% of the world’s electricity, with a total hydropower fleet of 1412 GW in 2023 [1]. Along with wind and solar, it helps to cut down on greenhouse gas emissions, which are a major cause of global warming. The electrical power sector is playing its part in reducing its impact on the environment by utilizing clean and renewable energy sources [2]. Hydropower plants currently account for over 75% of the world’s renewable energy sources and around 30% of the world’s flexible electricity supply capacity [3,4]. One of the most important factors that contributes to the security and flexibility of power systems is the ability of hydropower plants to generate electricity rapidly compared to other power plants such as coal, natural gas, and nuclear. They can also be stopped and started relatively quickly. Because of this high degree of flexibility, hydropower plants can quickly adapt to changes in energy demand and compensate for the variations in supply from other energy production sources. This makes them an ideal choice to support the integration of wind and solar power sources. Despite their widespread use, they still have a huge potential to expand globally [3].
On the other hand, efficient planning and computational enhancements can increase the energy output by using the same available water [2]. The optimal operation of a hydroelectric power plant involves the gathering and processing of vast amounts of input data. Unfortunately, the techniques used in the design and implementation of the plant’s operations are not always able to extract the most value from the data. This paper proposes a clustering-based method that can help the Decision-Makers (DEMA) identify the optimal hourly load patterns of the generators. The daily generation scheduling method is an integral part of the decision-making process in a power plant. It helps reduce the time it takes to make critical decisions by implementing effective maintenance plans and energy production techniques [5].
The increasing importance of hydropower generation has led to the development of new technologies and the need for reliable and efficient equipment, which has become a vital factor in the operations and maintenance of the plants. Hydropower plants are typically more expensive to set up than other energy sources. Nonetheless, they have a longer lifespan than other power plants. The lifespan of a hydropower plant can be up to 50 years or more, which is longer than that of thermal plants. Usually, the economic and financial analyses assume a lifespan of 30–40 years [6]. The operations and maintenance (O&M) costs should not be overlooked by the DEMA. They are typically around 2% of the investment. The specific O&M cost is around 2% to 2.5% for large projects and from 1% to 6% for small projects [7]. Although routine maintenance is carried out on the equipment, digital solutions can improve predictive maintenance, allowing the hydropower plants to increase their efficiency [8] and helping to maximize the life of a plant’s resources and assets. A study presented in [9] concluded that digitalization could improve the efficiency of a hydropower plant by 1% by better distributing the flow among the different turbine units, and that the annual energy production can be increased by approximately 11%, depending on the site, if the spills and the hours spent on manual operations are reduced. Digitalization can also prevent costly repairs by identifying potential issues early. One of the most significant decisions in implementing this approach is to continuously monitor the equipment’s condition through a smart SCADA system into which the Knowledge Discovery modules are integrated. The core step used in the Knowledge Discovery modules is Data Mining. It involves extracting information and transforming it into a format (containing the operation patterns) that is more understandable for the DEMA, represented by the operators in the control room of the hydropower plant.
The information provided through such Knowledge Discovery modules can help improve the lifetime of the power units by reducing their downtime and enhancing their production. It can also minimize the costs of operations and maintenance [10].
Compared to other sustainable initiatives, such as eco-friendly products and renewable energy projects, the lack of visibility of the smart SCADA systems makes them less apparent. But they can play a vital role in helping the hydropower sector to improve resource utilization, optimize maintenance operations, and fulfill sustainability objectives [11]. The following strong points can be highlighted regarding the sustainability objectives of a smart SCADA [12,13,14]:
  • Remote monitoring and control: Through remote terminal units (RTUs), a SCADA system allows the aggregates/equipment of the hydropower plant to be monitored and controlled from the control room. This reduces the need for the service team to travel to every important point of the plant, lowering the carbon emissions associated with transportation.
  • Energy efficiency: A smart SCADA system can help reduce the amount of water used from the dam by monitoring data in real time and storing historical data. This enables the operators to identify improper operation and leaks in the aggregates and installations, which can result in timely repairs and lower energy consumption.
  • Compliance with environmental regulations: A smart SCADA system can generate reports and track key environmental metrics, allowing hydropower companies to reduce their ecological footprint by monitoring the working hours of their generators and turbines. This enables them to plan their maintenance operations more precisely, which helps them save on resources and minimizes downtime.
  • Predictive maintenance: Predictive maintenance is ensured by a smart SCADA system, which keeps track of the hydro aggregate’s performance and running hours. This helps in identifying potential issues and planning preventive maintenance activities, which extends the lifespan of the equipment and reduces waste and the impact of manufacturing new machinery and parts. These systems can also provide notifications in real time in the event of malfunctions.
  • Rapid response to issues: Integrating with the Internet of Things can help a SCADA system provide rapid response capabilities. This allows it to monitor and respond to environmental incidents and operational issues in real time. It can also prevent more significant issues, such as failure or water leaks.
The challenge associated with a smart SCADA system raises the following two questions:
  • How can a deeper analysis of the data from various processes and equipment within the hydropower plant be performed?
  • Which is the best approach to perform the analysis?
This process must be carried out efficiently to maximize the information extracted from the SCADA database. The literature presents different Knowledge Discovery applications in the hydropower industry. Parvez et al. [2] proposed a linear regression procedure to determine the energy production relationship between upstream and downstream hydro plants, together with a cluster analysis to find the typical generation curves. Feng et al. [15] developed a class-based extreme learning machine that can determine the optimal operation rule for a hydropower reservoir. Through a k-means clustering algorithm, the cluster analysis is performed to split the influence factors into several subregions. The extreme learning machine is then optimized by particle swarm optimization to identify the complex relationship between the inputs and outputs of each cluster. Zhang et al. [16] proposed an approach that improves the quality of the monitoring data collected from hydropower units by implementing a clustering algorithm. This approach can be used to solve various problems related to condition monitoring. A standard system to classify the huge amount of information that is collected and stored was proposed in [17]; the system can meet the needs of the DEMA and provide them with the necessary services. Ahmed et al. [18] used three approaches (Local Outlier Factor as a density-based method, Feature Bagging for Outlier Detection as an ensemble method, and Subspace Outlier Degree) to analyze the anomalous data collected from a hydropower plant and compare their performance. The outliers were then verified by an expert using a feature selection process and a decision tree to identify the critical variables that could be associated with the anomalies. Valencia et al. [19] presented a procedure that uses Knowledge Discovery to analyze a data set and extract structured information related to a hydroelectric power plant. This method can be utilized to train systems focused on identifying faults. Zhang et al. [20] proposed a decision tree-based clustering scheme that can be used to determine the various operating regimes in hydropower plants. The method uses k-means++ clustering to group the data. A decision tree is then constructed using the group labels and other features, and it is analyzed and pruned according to the classification accuracy and complexity requirements. Reference [21] introduced the data mining concept integrated into a SCADA system to help the hydropower plant’s operators make informed decisions. The patterns extracted by the data mining process can determine the typical loading profiles of each generation unit. Sahin and Karakus [22] presented a study on the energy generation forecasting of a hydroelectric plant based on Machine Learning and a hybrid Genetic Grey Wolf Optimizer-based Convolutional Neural Network/Recurrent Neural Network-Long Short-Term Memory regression approach. The findings can help improve the efficiency of resource management and energy generation.
Two aspects draw attention concerning the current applications of Knowledge Discovery (KD) in hydropower plants:
  • The vast majority of applications aim for a single task implemented at the level of data analysis regarding the energy production relationship between upstream and downstream hydro plants, energy production forecasting, identifying the operating regimes, improving the data quality, outliers’ detection, identifying faults, or determining the typical operating profiles.
  • The analysis time horizon corresponds with a day, season, or year.
Using as a starting point these two remarks, the main contribution of the paper is associated with designing, developing, and testing an original clustering-based data mining framework integrated into a Knowledge Discovery module from the SCADA software of a hydropower plant which fulfills more tasks regarding:
  • Performing an advanced statistical analysis and outliers’ detection,
  • Identifying the operating regimes and hourly typical operating profiles,
  • Developing the strategies for loading the generation units that consider the number of hours of operation and the minimization of the amount of water used to satisfy the power required by the system.
The framework can offer comparative information for a time horizon longer than one year, allowing the quick identification of key performance indicators that characterize the operation of the hydropower plant.
The remainder of the paper includes four sections. Section 2 presents the theoretical aspects regarding the integration of Knowledge Discovery and Data Mining in the smart SCADA, Section 3 details the multi-task framework integrated into the Data Mining-based Knowledge Discovery module, Section 4 covers the case study where the proposed framework has been tested using the SCADA system’s database of a hydropower plant belonging to the Romanian HydroPower Company, and Section 5 highlights the conclusions and future work.

2. Knowledge Discovery and Data Mining in Smart SCADA

2.1. Knowledge Discovery vs. Data Mining

Knowledge Discovery (KD) and Data Mining (DM) have transformed power engineering research. To carry out effective and meaningful research, a deep understanding of various aspects of data mining and knowledge discovery is necessary [23]. This includes the expertise of specialists who work in all components of the generation, transmission, and distribution chain that forms the power system.
There are some misunderstandings about the terms data mining and knowledge discovery defined in databases. Although many specialists and researchers use DM as a synonym for knowledge discovery, DM is not the entire knowledge discovery process. In addition to being defined as data mining, it comes with other names, such as information discovery or knowledge extraction. The KD is a process that aims to identify relationships and patterns in large datasets. It is defined typically as a non-trivial process that involves identifying novel, useful, and understandable patterns. In a narrow sense, KD refers to extracting information from a data source. While it can be performed through various methods, the term refers to obtaining knowledge from textual or database data. The combined process is referred to as the KD process.
Figure 1 shows the steps of a KD process, integrating the Data Mining, and the details are introduced in the following [21,23,24].
Data Selection. KD’s initial step is data selection, which involves gathering information from the SCADA database. This process is carried out to create a raw dataset.
Data Cleaning (Preprocessing). Ensuring that the data collected are of good quality is performed through preprocessing. This process involves handling noise, missing values, and inconsistencies.
Data Transformation. After cleaning the data, it is usually necessary to transform it into a form suitable for mining. This can be carried out through various methods such as feature engineering and scaling. To make the data easier for the machine learning tools to process, label encoding is used to convert categorical data into a numerical format.
Data Mining. The DM step involves uncovering patterns, anomalies, or relationships between the data. The DM process is composed of numerous steps, each of which is related to a specific discovery task. The extraction of knowledge involves the process of gathering and storing information. It also involves analyzing and visualizing the data, designing models for machine and human interaction, and learning how to use efficient methods. One of the most frequently used techniques is clustering, which enables us to group the data into distinct groups based on similarity.
Interpretation. After Data Mining, the next step is to interpret the results. This involves understanding the clusters (patterns) and their features.
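The cleaning and transformation steps described above can be illustrated with a minimal Python sketch. It is not the authors’ implementation: the record layout, the `regime` category, and the helper names (`clean`, `label_encode`, `min_max_scale`) are hypothetical and serve only to show how missing values, label encoding, and scaling might be handled before the Data Mining step.

```python
# Hypothetical hourly records; field names and values are illustrative only.
raw = [
    {"hour": 1, "P_GU1": 6.5, "regime": "peak"},
    {"hour": 2, "P_GU1": None, "regime": "base"},  # missing value
    {"hour": 3, "P_GU1": 4.0, "regime": "base"},
    {"hour": 4, "P_GU1": 5.0, "regime": "peak"},
]


def clean(records):
    """Data Cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def label_encode(values):
    """Data Transformation: map each category to an integer code."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def min_max_scale(values):
    """Data Transformation: rescale numeric values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cleaned = clean(raw)
regime_codes, mapping = label_encode([r["regime"] for r in cleaned])
scaled_power = min_max_scale([r["P_GU1"] for r in cleaned])
```

Dropping incomplete records is only one possible cleaning policy; imputing the missing value would be an equally valid choice, depending on the parameter.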

2.2. Data Mining Techniques

The Data Mining techniques are divided into two main categories: descriptive and predictive, see Figure 2 [25,26,27].
Cross-tabulation, correlation, and frequency are some of the characteristics used in the production of descriptive data mining. This process is utilized to identify the similarities between the data and the existing patterns. Another characteristic used in this type of analysis is associated with developing captivating subgroups. This is performed by analyzing the data and transforming it into meaningful information. Descriptive data mining involves techniques, such as clustering, association rule mining, and sequence discovery analysis.
Clustering. It is commonly used in data mining to organize information by grouping related data points. It helps the DEMA identify patterns and similarities between different sets of information. It can be used to classify and extract patterns in the data, identify anomalies, or analyze spatial data.
Association Rules. The goal is to find the correlations in the data sets from the database. Even if the data sets come from different sources, these correlations can identify patterns that can help reveal process trends or explain the operation characteristics of the equipment/installations.
Sequence discovery analysis. The goal of sequence discovery analysis is to find interesting data sets that contain sequential patterns. This process usually involves identifying the patterns that are frequent according to a certain frequency (support) measure.
The second category aims to predict the future results of a given variable with a high degree of accuracy. This data mining is carried out using supervised learning techniques. There are three categories of methods that are used in this type of mining: regression, classification, and time-series analysis. The latter two are utilized in predictive analysis to model the data.
Regression. It is similar to the clustering technique in that it focuses on the relationship between a target and an independent variable. The target variables can be influenced by different predictors or independent factors, which is why a regression analysis is utilized. It predicts the outcomes based on the input fields relevant to the target.
Classification. It is carried out by identifying the various features of the data. This process helps to identify patterns and extract meaningful insights. It also helps in improving the quality of the data. The appropriate features are selected to classify it after the data has been collected and analyzed. A suitable algorithm is then chosen to implement.
  • Support Vector Machine. This algorithm creates a boundary between the different classes/patterns. It identifies the features that are most important to the classification process.
  • Decision Tree. The classification process is carried out using a tree-based structure. This algorithm uses a set of conditions to categorize the data. The root nodes of the structure are set for the test conditions, while the leaf nodes are for the outcome.
  • Neural Network. A neural network model is a computational resource that can recognize the relationships between various data sets. These units, which act like neurons, are formed by connecting the inputs and outputs. The model considers the connection strength and outputs the information in a hidden layer. The neural network is similar to the human brain in that it requires training to be effective. Although it can be hard to interpret, the models are reliable and can even classify past training procedures.
The Smart SCADA can improve the efficiency of the hydropower plant operation by combining Knowledge Discovery and statistical analysis. Figure 3 displays the interdependencies between the Knowledge Discovery and Smart SCADA (adapted after [28]).
Data are everything in the operation of a hydropower plant. The collection, analysis, and control actions make up the difference between classical and smart SCADA systems [29]. One of the biggest obstacles to the transformation from a classical into a smart SCADA system is the lack of investment in hydropower processes. Many monitoring and control devices, such as the remote terminal units and sensors of the hydropower processes, are old and should be replaced, but their upgrade is very costly. A smart infrastructure that integrates data analytics modules alongside the existing hardware can represent the solution for hydropower companies, because starting with the software part is much easier than replacing the hardware part. Thus, the transition from the classical to a smart SCADA system will be gradual. The goal of a gradual transition is to make sure that the new system is stable before starting work on a smart SCADA infrastructure. The smart SCADA is built using knowledge discovery tools and delivers a unique and comprehensive solution for processing the data from the hydropower aggregates/equipment. This holistic approach allows the tasks to be fulfilled at the hydropower plant level by the control room operator and at the central level by the hydropower arrangement dispatcher.

3. Multi-Task Framework Integrated into the Knowledge Discovery Module

The main steps in the multi-task framework related to developing the Knowledge Discovery module based on Data Mining to facilitate the transition from the classical to smart SCADA system in the hydropower plants are discussed in this section.
Figure 4 presents the basic structure of an automation architecture presented in [30], including the proposed Knowledge Discovery module in the SCADA system implemented at the level of the power plant.
The data acquisition is performed on main hydro components (turbine, generator, power transformer, substation level), recorded in the SCADA system, and made available to the operator from the control room through the Human–Machine Interface. The Data Mining-based Knowledge Discovery module will be implemented at the SCADA level to ensure data-driven decision-making, ensuring the transition toward smart SCADA.
Figure 5 shows the flow chart of the multi-task framework integrated into the Knowledge Discovery module. Details regarding each task are provided below.
The SCADA system collects the water flows in the pipes that supply the turbines (WF_Pipe_1,…, WF_Pipe_n, and WFtot_Pipes), in [m3], the active powers produced by the generation units (GUs), which are supplied through Pipe_1, …, Pipe_n (P_GU_Pipe_1, …, P_GU_Pipe_n), the requested active and reactive power of the system to the hydropower plant (P_req and Q_req), in [MW] and [MVAr]. It also contains information about the various technical parameters of the generation units, such as the active and reactive powers (P_GU1, …, P_GUn, and Q_GU1, …, Q_GUn), in [MW] and [MVAr], the stator voltage and current (Vs_GU1, …, Vs_GUn and Is_GU1, …, Is_GUn), in [kV] and [kA], the excitation voltage or current (Vex_GU1, …, Vex_GUn and Iex_GU1,…, Iex_GUn), in [V] and [A], water levels of the reservoir upstream and downstream (WLr_u and WLr_d), in [mdMB]. Appendix A includes in Table A1, as an example, the above information for a day. All variables are recorded in a table with the time details in the format month–day–hour for each year, as seen in Figure 6.
Task 1—Statistical analysis and outliers’ detection.
The statistical analysis is associated with exploring and presenting large amounts of data on the technical parameters based on indicators such as the mean, standard deviation, confidence degree, and quartiles: Q0 (minimum value), Q1 (25th percentile), Q2 (50th percentile), Q3 (75th percentile), and Q4 (maximum value). Also, the boxplot is used to show the spread and skewness of the variables from the database through their quartiles. It can also include lines known as whiskers, which extend from the box to indicate variability outside the lower and upper quartiles. Outliers with significant differences from the rest of the data can be plotted on the box-and-whisker diagram [31]. Thus, the DEMA (represented by the operator from the control room) can select the fields containing the values of the monitoring parameters. Regarding the outliers’ detection, a rules-based algorithm based on the values of the quartiles has been integrated [32,33]. The two rules refer to Q0 and Q4:
If <Xh < Q0x> then <Xh is outlier>
If <Xh > Q4x> then <Xh is outlier>
where Xh is the hourly value of the analyzed technical parameter and x refers to the name of the analyzed technical parameter (identically with the name of the field).
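A minimal Python sketch of the statistical summary and the two outlier rules follows. The source does not state how the limits Q0x and Q4x are computed, so the sketch assumes they are the boxplot whisker limits obtained with the common 1.5 × IQR convention; this convention, the function names, and the sample values are assumptions made for illustration only.

```python
import statistics


def summary_limits(values, k=1.5):
    """Return mean, standard deviation, quartiles, and assumed whisker limits."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "Q1": q1,
        "Q2": q2,
        "Q3": q3,
        "lower": q1 - k * iqr,  # assumed stand-in for the limit Q0x
        "upper": q3 + k * iqr,  # assumed stand-in for the limit Q4x
    }


def flag_outliers(values, lower, upper):
    """Apply the two rules: Xh < lower or Xh > upper marks Xh as an outlier."""
    return [x < lower or x > upper for x in values]


# Illustrative hourly values of one technical parameter (not measured data).
hourly = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 40.0]
stats = summary_limits(hourly)
flags = flag_outliers(hourly, stats["lower"], stats["upper"])
```

In this illustrative series, only the last value falls above the upper limit and is flagged.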
However, there are cases when the outliers identified for certain technical parameters can be associated with the different operating regimes compared to most regimes but which do not lead to the violation of the allowable limits. In these cases, an attention message will alert the DEMA, who will verify only those regimes. For all other cases, when the outliers are identified, the least squares method is applied to estimate the “true” value. The approach is based on a regression model determined for that parameter.
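The least squares estimation can be sketched as follows. The source does not specify the form of the regression model, so the sketch assumes a simple linear model fitted on the non-outlier hours between the analyzed parameter and one explanatory variable; the variable names (`flow`, `level`) and the values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def replace_outliers(xs, ys, flags):
    """Fit on the non-outlier points, then estimate the flagged values."""
    good = [(x, y) for x, y, f in zip(xs, ys, flags) if not f]
    a, b = fit_line([x for x, _ in good], [y for _, y in good])
    return [a + b * x if f else y for x, y, f in zip(xs, ys, flags)]


# Illustrative data: the third value of `level` is assumed to be an outlier.
flow = [10.0, 20.0, 30.0, 40.0, 50.0]
level = [101.0, 102.0, 109.0, 104.0, 105.0]
flags = [False, False, True, False, False]
estimated = replace_outliers(flow, level, flags)
```

Fitting only on the unflagged hours keeps the outlier itself from distorting the model that is used to replace it.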
A comparison across several years can be performed by choosing the analysis period.
Task 2—Determining the operating regimes.
This task is based on the clustering-based data mining process. The input data are associated with a matrix structure built with the values of technical parameters selected by the DEMA. The regimes are identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GUn) or hourly loading patterns, including the hourly average power of each generation unit. The DEMA can choose any algorithm from the hierarchical clustering category (complete-linkage clustering, single-linkage clustering, average linkage clustering, centroid linkage clustering, median linkage clustering, or Ward linkage clustering) or K-means clustering algorithm. In hierarchical clustering, a similarity measure between the sets of observations is necessary to determine which patterns should be grouped or separated. Usually, in hierarchical clustering, the set’s similarity is determined by using a distance between the observations. It is carried out through a linkage criterion that specifies the set’s similarity.
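To illustrate how a linkage criterion drives hierarchical clustering, the following pure-Python sketch implements a naive agglomerative procedure with single, complete, and average linkage over Euclidean distances. It is a didactic sketch rather than the implementation used in the paper; the function names and the sample loading patterns are assumptions, and a production system would more likely rely on an optimized library.

```python
import math


def euclidean(a, b):
    """Distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linkage_distance(c1, c2, points, method):
    """Similarity between two sets of observations under a linkage criterion."""
    d = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if method == "single":
        return min(d)
    if method == "complete":
        return max(d)
    if method == "average":
        return sum(d) / len(d)
    raise ValueError(f"unsupported linkage: {method}")


def agglomerative(points, n_clusters, method="average"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = linkage_distance(clusters[a], clusters[b], points, method)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Illustrative two-unit loading patterns in MW (not measured data).
profiles = [[5.0, 5.5], [5.2, 5.4], [20.0, 21.0], [19.5, 20.5], [5.1, 5.6]]
groups = agglomerative(profiles, n_clusters=2, method="complete")
```

With these sample values, the low-load patterns (indices 0, 1, 4) and the high-load patterns (indices 2, 3) end up in separate clusters.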
When the DEMA requests characterization of the operating regimes through the typical operating profiles, then the matrix structure of the input data corresponding to the clustering process is identified with the fields of the active power produced by each generation unit in a certain time horizon (usually a year to cover all operating regimes).
The matrix size is Hyear × (NHPPGU × 24 + 2), where Hyear represents the number of hours when at least one generation unit worked and NHPPGU represents the number of the generation units from the hydropower plant. The additional columns correspond to the water levels of the reservoir—upstream and downstream.
Each obtained pattern is associated with a typical operating profile that characterizes the operating regime of the hydropower plant in certain periods (days inside a year). The DEMA can identify the operating regimes of the hydropower plant through an evolution along the time axis of the degree of hourly loading of all generating units and the water volume used in each operating regime. The DEMA can establish an operating strategy for the plant for the next day depending on the forecasting of the requested powers by the system.
The input data matrix is different when the DEMA requests that the operating regimes be characterized through hourly loading patterns. The structure is identified with the fields of the active power produced by each generation unit in a certain time horizon (usually a year to cover all operating regimes). The size of the matrix is Hyear × (NHPPGU + 2), where Hyear and NHPPGU have the same meaning as above. The last two columns correspond to the water levels of the reservoir.
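The construction of the Hyear × (NHPPGU + 2) matrix for the hourly loading patterns can be sketched as below. The field names follow the notation introduced earlier (P_GU1, …, WLr_u, WLr_d), while the record layout, the treatment of blank cells as zero load, and the sample values are assumptions made for illustration.

```python
def build_hourly_matrix(records, n_units):
    """Keep hours with at least one loaded unit; append the two water levels."""
    rows = []
    for rec in records:
        # A blank (None) power cell is assumed to mean the unit was not loaded.
        powers = [rec.get(f"P_GU{k}") or 0.0 for k in range(1, n_units + 1)]
        if any(p > 0 for p in powers):
            rows.append(powers + [rec["WLr_u"], rec["WLr_d"]])
    return rows


# Illustrative records for a plant with two units (not measured data).
records = [
    {"P_GU1": 6.0, "P_GU2": None, "WLr_u": 250.1, "WLr_d": 231.4},
    {"P_GU1": None, "P_GU2": None, "WLr_u": 250.0, "WLr_d": 231.2},
    {"P_GU1": 5.5, "P_GU2": 6.2, "WLr_u": 249.9, "WLr_d": 231.6},
]
matrix = build_hourly_matrix(records, n_units=2)
shape = (len(matrix), len(matrix[0]))
```

The hour in which no unit is loaded is dropped, so the three sample records yield a matrix with two rows and NHPPGU + 2 = 4 columns.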
Task 3—Developing the strategies to load the generation units.
The task is associated with an expert system that uses the operating regimes characterized by the hourly loading patterns determined above and the number of hours of operation of each generation unit. This last information is recorded in a database and considered in the decision-making process to avoid overloading the generation units over a long period, which leads to minimizing the number of maintenance operations. Minimizing the water amount used to satisfy the power required by the system represents the main objective. The main components of the expert system are presented synthetically in the following [34,35].
  • The knowledge base is composed of two main elements: the rules base (which contains the knowledge required to solve problems) and the facts base (the patterns obtained in the clustering-based data mining are recorded in this base).
  • The inference engine determines the mode in which the knowledge derived from the rules base is utilized to interpret the data from the facts base. It can perform various tasks, such as confirming or rejecting a hypothesis or the solution of a problem.
  • The editor of the knowledge base provides the DEMA with the ability to update and inspect the content of the knowledge base, particularly the content of its rules base.
  • The explanation system can provide explanations for the stages in the Expert System’s reasoning.
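The interaction between the facts base and the rules base can be sketched with a toy selection routine. The rules below (match the requested power, prefer the smallest water amount, then prefer the units with the fewest accumulated operating hours) are one hypothetical reading of the stated objectives, not the rules base of the paper; the pattern structure, the tolerance, and all values are illustrative assumptions.

```python
def choose_pattern(patterns, p_req, operating_hours, tolerance=0.5):
    """Select a loading pattern from the facts base for a requested power."""
    # Rule 1: keep the patterns whose total power matches the requested power.
    feasible = [
        p for p in patterns
        if abs(sum(p["loading"].values()) - p_req) <= tolerance
    ]
    if not feasible:
        return None

    # Rule 2: prefer the smallest water amount.
    # Rule 3: on ties, prefer the units with the fewest operating hours.
    def cost(p):
        used = [u for u, mw in p["loading"].items() if mw > 0]
        return (p["water"], sum(operating_hours[u] for u in used))

    return min(feasible, key=cost)


# Illustrative facts base: loadings in MW, water in arbitrary units.
patterns = [
    {"name": "A", "loading": {"GU1": 6.0, "GU2": 0.0, "GU5": 0.0}, "water": 16.0},
    {"name": "B", "loading": {"GU1": 0.0, "GU2": 0.0, "GU5": 6.0}, "water": 14.0},
    {"name": "C", "loading": {"GU1": 3.0, "GU2": 3.0, "GU5": 0.0}, "water": 14.0},
    {"name": "D", "loading": {"GU1": 0.0, "GU2": 0.0, "GU5": 20.0}, "water": 45.0},
]
operating_hours = {"GU1": 1200, "GU2": 900, "GU5": 3000}
selected = choose_pattern(patterns, p_req=6.0, operating_hours=operating_hours)
```

For the sample data, patterns B and C use the same water amount, and C is selected because its units have accumulated fewer operating hours.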

4. Case Study

The proposed framework has been tested using the SCADA system’s database of a hydropower plant belonging to the Romanian HydroPower Company.
The plant, identified by the red circle in Figure 7, is the first in a hydro arrangement located on an important river in eastern Romania. The plant has six Francis-type units. The first water pipe supplies the last two units, and the second pipe supplies the first four units. The first four units have a total installed power of 27.5 MW, while units five and six have 50 MW.
The SCADA database includes fields recorded every hour with the following technical parameters: the individual water flows of each pipe; the total water flow of the two pipes; the total power produced by the first four generation units (GU1–GU4); the power produced by GU5; the power produced by GU6; the total active and reactive power produced by the plant; the frequency; for each generation unit, the stator voltage, the stator current, the produced active and reactive powers, the excitation voltage, and the excitation current; and the water levels of the reservoir (upstream and downstream).
The SCADA file associated with a day from the database, containing the fields highlighted above, is shown in Figure 8. Blank cells indicate that the generation unit was not loaded. The results obtained at the year level for each task integrated into the proposed framework are presented below.
After data processing, the DEMA can see summary information on the operation of the plant, namely the number of operating hours and the total energy produced by each generation unit (GU1–GU6), see Figure 9.
Table 1 presents the results of the advanced statistical analysis: the mean (m), standard deviation (σ), and quartiles (Q0, Q1, Q2, Q3, and Q4) for each technical parameter from the database. The values have been calculated only for the hours when at least one unit was in operation. The DEMA also has boxplots available, from which outliers can be identified quickly.
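The statistics listed above, and a boxplot-style outlier check, can be sketched with the Python standard library. The fences at 1.5 times the interquartile range are the usual boxplot convention and are an assumption here, since the exact rule used in the module is not stated; the sample powers are hypothetical.

```python
import statistics


def summarize(values):
    """Mean, standard deviation, and quartiles Q0..Q4 of one technical parameter."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "m": statistics.mean(values),
        "sigma": statistics.stdev(values),
        "Q0": min(values), "Q1": q1, "Q2": q2, "Q3": q3, "Q4": max(values),
    }


def boxplot_outliers(values, k=1.5):
    """Values outside the fences Q1 - k*IQR and Q3 + k*IQR (assumed rule)."""
    s = summarize(values)
    iqr = s["Q3"] - s["Q1"]
    lower, upper = s["Q1"] - k * iqr, s["Q3"] + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical hourly active powers of one unit, in MW.
p_gu5 = [38, 40, 41, 39, 40, 42, 41, 12]
stats = summarize(p_gu5)
outliers = boxplot_outliers(p_gu5)
```

A flagged value is only a candidate: as described below, the operator still has to decide whether it reflects a disturbance or a valid but unusual regime.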
Two situations are presented below. The first concerns the loading of the generation units: the outliers identified correspond to operating regimes that differ from most regimes but did not violate the allowable limits, see Figure 10. In these cases, a warning message is sent to the DEMA, who must verify the regimes and establish whether any disturbance appeared.
The second situation concerns the recorded downstream level of the water reservoir, where several outliers exceeding the upper limit have been identified. For these values, the least squares method has been applied to estimate the “true” value.
The outliers below Q0 depend on the upstream level of the water reservoir and did not lead to the violation of the allowable limits. Figure 11 and Figure 12 present the results obtained in these cases (with and without outliers over the maximum limit).
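The least squares replacement of values above the upper limit can be illustrated as follows. The sketch assumes a straight line fitted over the hour index using only the valid readings; the regressors actually used by the module are not specified, and the function names, the limit, and the levels are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def replace_outliers(levels, upper_limit):
    """Replace readings above upper_limit with the fitted value at that hour."""
    good = [(h, v) for h, v in enumerate(levels) if v <= upper_limit]
    slope, intercept = fit_line([h for h, _ in good], [v for _, v in good])
    return [v if v <= upper_limit else slope * h + intercept
            for h, v in enumerate(levels)]


# Hypothetical downstream water levels (one per hour); the fourth is an outlier.
levels = [370.0, 370.1, 370.2, 375.9, 370.4, 370.5]
cleaned = replace_outliers(levels, upper_limit=372.0)
```

The fit needs at least two valid readings at different hours; otherwise the slope is undefined.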
The second task refers to determining the operating regimes identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6) and hourly loading patterns, including the hourly average power of each generation unit. This task is based on the clustering-based data mining process.
The input data have been associated with a matrix structure built with the values of hourly active powers of all six generation units selected from the database from three successive years (2017–2019). The K-means clustering algorithm has been used to obtain the typical operating profiles presented in Figure 13, Figure 14, Figure 15 and Figure 16.
Appendix A includes in Table A2, Table A3, Table A4 and Table A5 the hourly values of each operating regime (identified through the Patterns P1–P4) containing the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6). The input data used to obtain hourly loading patterns contain the following fields: water flows on the two pipes, the loading of each generator unit, and the upstream and downstream water level of the reservoir.
The K-means clustering algorithm has been used to extract the patterns, and the optimal number of patterns was found to be 25. Figure 17, Figure 18 and Figure 19 show how the hydropower plant was operated in three consecutive years (2017–2019), based on the obtained patterns of hourly loading of the generation units.
Appendix A includes, in Table A6 and Table A7, details of the patterns regarding the hourly loading of the generation units and their features regarding the operating conditions in the analyzed three-year period. This task improves the quality of the obtained solutions and the decision-making processes involved in the loading of the generation units.
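The clustering step behind the second task can be sketched with a plain implementation of the standard K-means (Lloyd) iteration. This is not the authors' code: the random initialization, the function names, and the six sample hours are assumptions, and two clusters are used instead of 25 to keep the example short.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two loading vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each hour joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical hourly loadings of GU1..GU6, in MW (two visibly different regimes).
hours = [
    [20, 0, 19, 0, 40, 0], [20, 0, 19, 0, 41, 0], [19, 0, 19, 0, 40, 0],
    [20, 20, 19, 18, 0, 30], [20, 20, 19, 18, 0, 32], [20, 20, 20, 18, 0, 30],
]
labels, centroids = kmeans(hours, k=2)
```

Each centroid is the average loading of the units over the hours assigned to it, which is the form in which the patterns are reported in Table A6.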
The third task is based on an expert system integrated into the module that analyzes the hourly patterns and operating conditions of the units to determine the optimal loading solution depending on the number of operating hours and the power requested by the system. Figure 20, Figure 21 and Figure 22 present the results obtained for a representative day from 2020.
It can be observed that the loading of the six GUs in the developed strategy differs from that in the experience-based strategy adopted by the DEMA. Table 2 presents the differences between the experience-based strategy and the developed expert system-based strategy. The colors have the following meaning:
  • Red is associated with the generation units that have been loaded in the experience-based strategy but not in the expert system-based strategy (sign “−”).
  • Blue is associated with the generation units that have been loaded in the expert system-based strategy but not in the experience-based strategy (sign “+”).
  • Green is associated with the generation units loaded in both strategies, with the same loading (value “0”) or a different loading in the expert system-based strategy (signs “+” or “−”).
  • Yellow color is associated with the generation units that have not been loaded in either strategy.
The proposed strategy loads the first four GUs between 60% and 72% of the rated power and loads GU5 and GU6 only when the power required by the system is higher. GU5 and GU6 thus remain available for ancillary services.
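The loading band of the first four units can be turned into a simple feasibility check. The sketch treats the reported 60–72% band as a constraint and prefers the units with the fewest operating hours; both choices are assumptions made for illustration, as are the function name and the operating hours. The rated power of 27.5 MW is taken from the plant description.

```python
RATED_SMALL_MW = 27.5          # rated power of each of GU1..GU4
BAND = (0.60, 0.72)            # loading band, as a fraction of rated power


def plan_small_units(p_req, hours):
    """Split p_req equally over the least-used small units, if the band allows it.

    hours maps unit name -> accumulated operating hours.
    Returns {unit: MW} or None when no unit count keeps the share in the band.
    """
    low, high = BAND[0] * RATED_SMALL_MW, BAND[1] * RATED_SMALL_MW
    order = sorted(hours, key=hours.get)  # least-used units first
    for n in range(1, len(order) + 1):
        share = p_req / n
        if low <= share <= high:
            return {unit: share for unit in order[:n]}
    return None  # outside the band: GU5/GU6 or another rule would be needed


# Hypothetical accumulated operating hours of the four small units.
hours = {"GU1": 5200, "GU2": 4100, "GU3": 4800, "GU4": 3900}
plan = plan_small_units(55.0, hours)
no_plan = plan_small_units(120.0, hours)
```

For a request of 55 MW, three units share the load at about 18.3 MW each; a request of 120 MW cannot be met within the band by the small units alone, which is where the larger units would come in.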
Table 3 presents the specific indicators for each GU obtained for operating the HPP in 2020 with the proposed strategy regarding the total operating time, the total energy production, and the average loading.
A comparison between the results obtained with the expert system-based strategy in 2020 and those obtained with the experience-based strategy in 2017 revealed an increase in the average loading for all six GUs: 21.5% (GU1), 18.9% (GU2), 16.9% (GU3), 14.8% (GU4), 6.3% (GU5), and 57.7% (GU6). The smallest increase was observed for GU5 and the largest for GU6, while the first four GUs recorded increases between 14.8% and 21.5%. However, the higher loading of GU6 occurred because it was in maintenance for two years and worked for very few hours during that time. The total energy production increased by 20.8%, from 313.6 GWh to 378.7 GWh, although the total operating time of all GUs decreased by 5.1%, from 13,653 h to 12,960 h. The years 2018 and 2019 are not considered in the analysis because GU6 was stopped in 2018 and had only a few hours of operation in 2019.
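The two plant-level percentages quoted above follow directly from the reported totals, as the short check below shows. The helper name `pct_change` is illustrative; the input values are those given in the text.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


energy_change = pct_change(313.6, 378.7)    # total energy production, 2017 vs. 2020
hours_change = pct_change(13653, 12960)     # total operating time of all GUs, in h
```

Rounded to one decimal place, the results reproduce the reported increase of 20.8% in energy and the decrease of 5.1% in operating time.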

5. Conclusions

As the world moves toward a more sustainable energy future, the need for more reliable and dispatchable sources of electricity is increasing. An important factor that can help improve the optimal operation of hydropower plants is associated with quick data processing and extracting hidden patterns. Knowledge discovery can represent an efficient tool for addressing various challenges, among which is the optimal operation of hydropower plants.
This paper proposes a framework that combines data mining and knowledge discovery to help the transition from a traditional SCADA system to a smart one in hydropower plants. It will allow the DEMA (operators from the control room) to identify the outliers and implement effective strategies to minimize water consumption and maximize power generation. Based on the advanced statistical tools, the framework will also help them identify the optimal operating conditions for the plant. The performance has been tested in a Romanian hydropower plant using the SCADA database. It allowed the control room operators to obtain comparative information on the plant’s performance over a longer time horizon (in our case study, three years). The results of the tests revealed the utility of a knowledge discovery module in helping the control room operators improve the efficiency of their operations by transitioning toward smart SCADA systems. Thus, a comparison between results obtained with the adopted expert system-based strategy in 2020 and those obtained with the experience-based strategy in 2017 revealed an increase in the average loading at the level of the HPP from 133 MW to 166 MW (representing the sum of powers produced by all six GUs), which means that the HPP was loaded from 64% to 80% of the total installed power of 210 MW. Also, the total energy production has increased by 20.8%, although the total operating time of all GUs decreased by 5.1%.
Future work will focus on developing a new task associated with the uncertainty modeling of two variables: the hourly powers requested by the system and the upstream level in the water reservoir. The modeling of the first variable is based on the results obtained with a forecasting method that uses rainfall, temperature, and historical values recorded in the SCADA database as input data. The operating regime of the hydro cascade, of which the plant is a part, and the historical data from the other plants represent the factors used in the developed models for the second variable. This new task will be a new component of the Data Mining-based Knowledge Discovery Module, used to determine quickly the best strategy to load the generation units in uncertain conditions.

Author Contributions

Conceptualization, G.G.; methodology, G.G. and R.G.; software, G.G.; validation, G.G., R.G. and B.-C.N.; formal analysis, R.G. and B.-C.N.; investigation, G.G., R.G. and B.-C.N.; resources, G.G.; data curation, R.G.; writing—original draft preparation, G.G., R.G. and B.-C.N.; writing—review and editing, G.G.; visualization, G.G. and R.G.; supervision, G.G.; project administration, G.G.; funding acquisition, G.G and B.-C.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

| Symbol | Meaning |
|---|---|
| DEMA | Decision-Makers |
| DM | Data Mining |
| GU | Generation Unit |
| HPP | Hydropower Plant |
| KD | Knowledge Discovery |
| O&M | Operations and Maintenance |
| SCADA | Supervisory Control and Data Acquisition |
| RTU | Remote Terminal Units |
| WF_Pipe_n | Water flows in the pipe n from the hydropower plant HPP, in [m³/s] |
| Iex_GUn | Excitation current of the generation unit GUn from the hydropower plant HPP, in [A] |
| Is_GUn | Stator current of the generation unit GUn from the hydropower plant HPP, in [A] |
| Hyear | the number of hours when at least one GU works, in [hours] |
| m | mean of the data set |
| NHPPGU | the number of generation units from the hydropower plant |
| P_GU_Pipe_n | Active power produced by the generation units, GU, which are supplied through the pipe n from the hydropower plant HPP, in [MW] |
| P_GUn | Active power produced by the generation unit GUn from the hydropower plant HPP, in [MW] |
| P_req | Requested active power of the system to the hydropower plant HPP, in [MW] |
| Q0 | zeroth quartile (minimum value of the data set) |
| Q1 | first quartile (25%) |
| Q2 | second quartile (50%) |
| Q3 | third quartile (75%) |
| Q_GU_Pipe_n | Reactive power produced by the generation units, GU, which are supplied through the pipe n from the hydropower plant HPP, in [MVAr] |
| Q_GUn | Reactive power produced by the generation unit GUn from the hydropower plant HPP, in [MVAr] |
| Q_req | Requested reactive power of the system to the hydropower plant HPP, in [MVAr] |
| Vex_GUn | Excitation voltage of the generation unit GUn from the hydropower plant HPP, in [V] |
| Vs_GUn | Stator voltage of the generation unit GUn from the hydropower plant HPP, in [kV] |
| WLr_d | Water levels of the reservoir downstream, in [mdMB] |
| WLr_u | Water levels of the reservoir upstream, in [mdMB] |
| σ | standard deviation of the data set |

Appendix A

Table A1. The technical parameters of the generation units (GU1–GU6) in the operating regimes of the analyzed HPP within a day recorded in the SCADA database.
Table A1. The technical parameters of the generation units (GU1–GU6) in the operating regimes of the analyzed HPP within a day recorded in the SCADA database.
HourGU1GU2GU3
Vs
[kV]
Is
[A]
P
[MW]
Q
[MVAr]
Vex
[V]
Iex
[A]
Vs
[kV]
Is
[A]
P
[MW]
Q
[MVAr]
Vex
[V]
Iex
[A]
Vs
[kV]
Is
[A]
P
[MW]
Q
[MVAr]
Vex
[V]
Iex
[A]
110.51.102019029000000010.51.0019195290
210.51.102019029000000010.51.0019195290
310.51.102019029000000010.51.0019195290
410.51.102019029000000010.51.0019195290
510.51.102019029000000010.51.0019195290
610.51.102019029000000010.51.0019195290
710.51.102019029000000010.51.0019195290
810.51.102019029010.51.1020110029010.51.0019195290
910.51.102019029010.51.1020110029010.51.0019195290
1010.51.1020511031010.51.1020511030010.51.00195105300
1110.51.1020511031010.51.1020511030010.51.00195105300
1210.51.1020511031010.51.1020511030010.51.00195105300
1310.51.1020511031010.51.1020511030010.51.00195105300
1410.51.1020511031010.51.1020511030010.51.00195105300
1510.51.1020511031010.51.1020511030010.51.00195105300
1610.51.1020511031010.51.1020511030010.51.00195105300
1710.51.1020511031000000010.51.00195105300
1810.51.1020511031000000010.51.00195105300
1810.51.1020511031000000010.51.00195105300
2010.51.1020511031000000010.51.00195105300
2110.51.1020511031000000010.51.00195105300
2210.51.102019029000000010.51.0019195290
2310.51.102019029000000010.51.0019195290
2410.51.102019029000000010.51.0019195290
HourGU4GU5GU6
Vs
[kV]
Is
[A]
P
[MW]
Q
[MVAr]
Vex
[V]
Iex
[A]
Vs
[kV]
Is
[A]
P
[MW]
Q
[MVAr]
Vex
[V]
Iex
[A]
Vs
[kV]
Is
[A]
P
[MW]
Q
[MVAr]
Vex
[V]
Iex
[A]
100000010.52.20401110340000000
200000010.52.20401110340000000
300000010.52.20401110340000000
400000010.52.20401110340000000
500000010.52.20401110340000000
600000010.52.20401110340000000
7000000000000000000
8000000000000000000
9000000000000000000
1010.51.00185100310000000000000
1110.51.0018510031000000010.51.7030190300
1210.51.0018510031000000010.51.7030190300
1310.51.0018510031000000010.61.7032195300
1410.51.00185100310000000000000
1510.51.00185100310000000000000
1610.51.00185100310000000000000
1700000010.52.20405120360000000
1800000010.52.20405120360000000
1800000010.52.20405120360000000
2000000010.52.20405120360000000
2100000010.52.20405120360000000
2200000010.52.20401110340000000
2300000010.52.20401110340000000
2400000010.52.20401110340000000
Table A2. The operating regime of Pattern P1 identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6), in [MW/MWh].
| Hour | GU1 | GU2 | GU3 | GU4 | GU5 | GU6 |
|---|---|---|---|---|---|---|
| 1 | 0.00575 | 0.00334 | 0.00319 | 0.00091 | 0.01470 | 0.00181 |
| 2 | 0.00494 | 0.00315 | 0.00259 | 0.00092 | 0.01385 | 0.00169 |
| 3 | 0.00479 | 0.00311 | 0.00332 | 0.00093 | 0.01399 | 0.00134 |
| 4 | 0.00458 | 0.00310 | 0.00282 | 0.00091 | 0.01260 | 0.00108 |
| 5 | 0.00451 | 0.00294 | 0.00295 | 0.00073 | 0.01334 | 0.00107 |
| 6 | 0.00562 | 0.00317 | 0.00323 | 0.00071 | 0.01596 | 0.00104 |
| 7 | 0.00656 | 0.00455 | 0.00367 | 0.00097 | 0.02124 | 0.00122 |
| 8 | 0.00651 | 0.00510 | 0.00448 | 0.00119 | 0.02218 | 0.00145 |
| 9 | 0.00698 | 0.00638 | 0.00507 | 0.00131 | 0.02658 | 0.00157 |
| 10 | 0.00697 | 0.00640 | 0.00523 | 0.00142 | 0.02702 | 0.00176 |
| 11 | 0.00690 | 0.00660 | 0.00548 | 0.00169 | 0.02510 | 0.00176 |
| 12 | 0.00719 | 0.00751 | 0.00519 | 0.00188 | 0.02749 | 0.00177 |
| 13 | 0.00854 | 0.00720 | 0.00524 | 0.00174 | 0.02658 | 0.00194 |
| 14 | 0.00910 | 0.00592 | 0.00516 | 0.00166 | 0.02548 | 0.00192 |
| 15 | 0.00909 | 0.00573 | 0.00482 | 0.00163 | 0.02595 | 0.00168 |
| 16 | 0.00778 | 0.00536 | 0.00486 | 0.00171 | 0.02440 | 0.00169 |
| 17 | 0.00683 | 0.00566 | 0.00527 | 0.00138 | 0.02433 | 0.00182 |
| 18 | 0.00693 | 0.00648 | 0.00480 | 0.00151 | 0.02508 | 0.00202 |
| 19 | 0.00792 | 0.00619 | 0.00485 | 0.00186 | 0.02556 | 0.00212 |
| 20 | 0.00825 | 0.00558 | 0.00470 | 0.00215 | 0.02507 | 0.00200 |
| 21 | 0.00879 | 0.00586 | 0.00469 | 0.00192 | 0.02590 | 0.00185 |
| 22 | 0.00936 | 0.00549 | 0.00470 | 0.00193 | 0.02723 | 0.00174 |
| 23 | 0.00729 | 0.00483 | 0.00428 | 0.00190 | 0.02345 | 0.00160 |
| 24 | 0.00580 | 0.00418 | 0.00366 | 0.00102 | 0.01833 | 0.00161 |
Table A3. The operating regime of Pattern P2 identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6), in [MW/MWh].
| Hour | GU1 | GU2 | GU3 | GU4 | GU5 | GU6 |
|---|---|---|---|---|---|---|
| 1 | 0.00714 | 0.00723 | 0.00385 | 0.00000 | 0.02189 | 0.00000 |
| 2 | 0.00595 | 0.00156 | 0.00325 | 0.00058 | 0.00867 | 0.00000 |
| 3 | 0.00353 | 0.00180 | 0.00179 | 0.00051 | 0.00342 | 0.00000 |
| 4 | 0.00401 | 0.00123 | 0.00056 | 0.00099 | 0.00091 | 0.00000 |
| 5 | 0.00263 | 0.00123 | 0.00056 | 0.00091 | 0.00202 | 0.00000 |
| 6 | 0.00287 | 0.00274 | 0.00177 | 0.00148 | 0.00597 | 0.00000 |
| 7 | 0.00334 | 0.00352 | 0.00177 | 0.00119 | 0.00477 | 0.00000 |
| 8 | 0.00547 | 0.00317 | 0.00116 | 0.00115 | 0.00240 | 0.00000 |
| 9 | 0.00720 | 0.00184 | 0.00243 | 0.00115 | 0.00764 | 0.00000 |
| 10 | 0.00775 | 0.00155 | 0.00109 | 0.00167 | 0.00311 | 0.00000 |
| 11 | 0.00583 | 0.00298 | 0.00225 | 0.00109 | 0.00772 | 0.00000 |
| 12 | 0.00741 | 0.00053 | 0.00211 | 0.00058 | 0.00998 | 0.00000 |
| 13 | 0.00683 | 0.00294 | 0.00141 | 0.00058 | 0.01011 | 0.00000 |
| 14 | 0.00674 | 0.00498 | 0.00262 | 0.00148 | 0.01545 | 0.00000 |
| 15 | 0.00816 | 0.00514 | 0.00148 | 0.00106 | 0.02402 | 0.00000 |
| 16 | 0.01109 | 0.00683 | 0.00524 | 0.00102 | 0.04126 | 0.00000 |
| 17 | 0.01380 | 0.00830 | 0.00751 | 0.00510 | 0.05050 | 0.00000 |
| 18 | 0.02052 | 0.01275 | 0.00860 | 0.00575 | 0.05867 | 0.00000 |
| 19 | 0.01872 | 0.01230 | 0.01201 | 0.00280 | 0.06564 | 0.00000 |
| 20 | 0.01555 | 0.01080 | 0.01157 | 0.00328 | 0.05483 | 0.00000 |
| 21 | 0.01391 | 0.00992 | 0.00739 | 0.00281 | 0.04916 | 0.00000 |
| 22 | 0.01258 | 0.00719 | 0.00516 | 0.00181 | 0.04508 | 0.00000 |
| 23 | 0.00832 | 0.00541 | 0.00366 | 0.00146 | 0.03031 | 0.00000 |
| 24 | 0.00575 | 0.00512 | 0.00299 | 0.00146 | 0.01821 | 0.00000 |
Table A4. The operating regime of Pattern P3 identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6), in [MW/MWh].
| Hour | GU1 | GU2 | GU3 | GU4 | GU5 | GU6 |
|---|---|---|---|---|---|---|
| 1 | 0.01713 | 0.00741 | 0.00655 | 0.00192 | 0.01527 | 0.00062 |
| 2 | 0.01045 | 0.00646 | 0.00763 | 0.00154 | 0.00945 | 0.00062 |
| 3 | 0.00950 | 0.00852 | 0.00764 | 0.00099 | 0.00676 | 0.00062 |
| 4 | 0.00903 | 0.00841 | 0.00805 | 0.00058 | 0.00552 | 0.00062 |
| 5 | 0.00850 | 0.00765 | 0.00798 | 0.00084 | 0.00623 | 0.00062 |
| 6 | 0.01580 | 0.00840 | 0.00871 | 0.00144 | 0.00901 | 0.00062 |
| 7 | 0.01595 | 0.00643 | 0.00792 | 0.00255 | 0.01280 | 0.00095 |
| 8 | 0.01405 | 0.00548 | 0.00728 | 0.00382 | 0.01031 | 0.00095 |
| 9 | 0.01227 | 0.00727 | 0.01146 | 0.00465 | 0.01336 | 0.00095 |
| 10 | 0.01166 | 0.00675 | 0.01059 | 0.00456 | 0.01138 | 0.00095 |
| 11 | 0.01041 | 0.00494 | 0.00936 | 0.00475 | 0.00521 | 0.00095 |
| 12 | 0.01072 | 0.00556 | 0.00967 | 0.00458 | 0.00330 | 0.00057 |
| 13 | 0.01224 | 0.00515 | 0.01046 | 0.00525 | 0.00272 | 0.00058 |
| 14 | 0.01063 | 0.00528 | 0.00961 | 0.00463 | 0.00108 | 0.00058 |
| 15 | 0.00989 | 0.00507 | 0.00890 | 0.00402 | 0.00073 | 0.00096 |
| 16 | 0.01135 | 0.00445 | 0.00857 | 0.00504 | 0.00217 | 0.00066 |
| 17 | 0.01331 | 0.00520 | 0.01076 | 0.00451 | 0.00385 | 0.00066 |
| 18 | 0.01266 | 0.00598 | 0.00919 | 0.00431 | 0.00145 | 0.00066 |
| 19 | 0.01525 | 0.00567 | 0.01098 | 0.00511 | 0.00382 | 0.00064 |
| 20 | 0.01608 | 0.00567 | 0.00932 | 0.00467 | 0.00467 | 0.00064 |
| 21 | 0.01984 | 0.00560 | 0.01101 | 0.00747 | 0.01211 | 0.00068 |
| 22 | 0.02218 | 0.00490 | 0.01453 | 0.00588 | 0.01447 | 0.00068 |
| 23 | 0.02410 | 0.00390 | 0.01394 | 0.00540 | 0.01314 | 0.00068 |
| 24 | 0.02336 | 0.00414 | 0.01110 | 0.00468 | 0.00939 | 0.00033 |
Table A5. The operating regime of Pattern P4 identified through the typical operating profiles divided into the intervals associated with the generating units (GU1, …, GU6), in [MW/MWh].
| Hour | GU1 | GU2 | GU3 | GU4 | GU5 | GU6 |
|---|---|---|---|---|---|---|
| 1 | 0.01024 | 0.00938 | 0.00177 | 0.00000 | 0.05302 | 0.00000 |
| 2 | 0.00779 | 0.00898 | 0.00170 | 0.00226 | 0.05455 | 0.00000 |
| 3 | 0.00762 | 0.00958 | 0.00215 | 0.00226 | 0.05302 | 0.00083 |
| 4 | 0.00750 | 0.00668 | 0.00215 | 0.00000 | 0.04260 | 0.00083 |
| 5 | 0.01061 | 0.00667 | 0.00215 | 0.00000 | 0.04609 | 0.00000 |
| 6 | 0.01355 | 0.00796 | 0.00170 | 0.00000 | 0.05410 | 0.00000 |
| 7 | 0.01236 | 0.02001 | 0.00177 | 0.00000 | 0.06093 | 0.00000 |
| 8 | 0.00967 | 0.00388 | 0.00045 | 0.00000 | 0.02815 | 0.00000 |
| 9 | 0.00640 | 0.00420 | 0.00263 | 0.00000 | 0.02356 | 0.00000 |
| 10 | 0.00629 | 0.00347 | 0.00319 | 0.00080 | 0.01695 | 0.00000 |
| 11 | 0.00747 | 0.00262 | 0.00416 | 0.00000 | 0.00504 | 0.00000 |
| 12 | 0.00540 | 0.00275 | 0.00133 | 0.00000 | 0.00239 | 0.00000 |
| 13 | 0.00413 | 0.00272 | 0.00000 | 0.00000 | 0.00143 | 0.00000 |
| 14 | 0.00335 | 0.00181 | 0.00000 | 0.00063 | 0.00120 | 0.00000 |
| 15 | 0.00171 | 0.00235 | 0.00167 | 0.00063 | 0.00251 | 0.00000 |
| 16 | 0.00221 | 0.00313 | 0.00250 | 0.00000 | 0.00566 | 0.00000 |
| 17 | 0.00300 | 0.00236 | 0.00246 | 0.00000 | 0.00937 | 0.00000 |
| 18 | 0.00631 | 0.00512 | 0.00550 | 0.00372 | 0.01772 | 0.00083 |
| 19 | 0.00807 | 0.00492 | 0.00445 | 0.00362 | 0.02117 | 0.00083 |
| 20 | 0.00883 | 0.00515 | 0.00445 | 0.00301 | 0.02311 | 0.00083 |
| 21 | 0.01128 | 0.00748 | 0.00488 | 0.00131 | 0.02587 | 0.00083 |
| 22 | 0.01156 | 0.00647 | 0.00575 | 0.00060 | 0.02382 | 0.00000 |
| 23 | 0.00453 | 0.00568 | 0.00595 | 0.00000 | 0.01412 | 0.00000 |
| 24 | 0.00193 | 0.00529 | 0.00431 | 0.00000 | 0.01227 | 0.00000 |
Table A6. The patterns regarding the hourly loading of the generation units GU1–GU6 in an analyzed three-year period.
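| Pattern | 2017 GU1 | 2017 GU2 | 2017 GU3 | 2017 GU4 | 2017 GU5 | 2017 GU6 | 2018 GU1 | 2018 GU2 | 2018 GU3 | 2018 GU4 | 2018 GU5 | 2018 GU6 | 2019 GU1 | 2019 GU2 | 2019 GU3 | 2019 GU4 | 2019 GU5 | 2019 GU6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P1 | 0 | 0 | 0 | 0 | 34 | 0 | 0 | 21 | 20 | 20 | 40 | 0 | 20 | 0 | 0 | 0 | 39 | 0 |
| P2 | 16 | 16 | 16 | 16 | 33 | 0 | 18 | 0 | 0 | 0 | 0 | 0 | 19 | 19 | 19 | 18 | 0 | 0 |
| P3 | 17 | 0 | 0 | 0 | 35 | 31 | 20 | 20 | 20 | 18 | 0 | 0 | 20 | 19 | 18 | 0 | 0 | 44 |
| P4 | 15 | 16 | 16 | 0 | 33 | 32 | 20 | 20 | 20 | 20 | 39 | 0 | 0 | 20 | 19 | 18 | 42 | 0 |
| P5 | 0 | 0 | 17 | 0 | 35 | 0 | 19 | 19 | 0 | 18 | 38 | 0 | 20 | 0 | 19 | 0 | 39 | 49 |
| P6 | 18 | 0 | 17 | 17 | 35 | 0 | 19 | 0 | 19 | 18 | 38 | 0 | 19 | 0 | 0 | 18 | 0 | 0 |
| P7 | 0 | 18 | 0 | 0 | 36 | 0 | 0 | 20 | 20 | 18 | 0 | 0 | 20 | 20 | 0 | 19 | 42 | 0 |
| P8 | 0 | 17 | 16 | 16 | 34 | 0 | 0 | 20 | 19 | 0 | 40 | 0 | 0 | 0 | 19 | 18 | 41 | 0 |
| P9 | 17 | 0 | 0 | 17 | 34 | 0 | 19 | 0 | 18 | 18 | 0 | 0 | 18 | 18 | 0 | 17 | 0 | 0 |
| P10 | 0 | 0 | 16 | 16 | 34 | 0 | 19 | 0 | 18 | 0 | 38 | 0 | 20 | 20 | 0 | 0 | 41 | 50 |
| P11 | 17 | 0 | 17 | 0 | 34 | 30 | 0 | 0 | 0 | 0 | 36 | 0 | 20 | 20 | 19 | 19 | 42 | 0 |
| P12 | 17 | 0 | 0 | 0 | 0 | 32 | 19 | 0 | 0 | 0 | 38 | 0 | 0 | 0 | 19 | 0 | 39 | 0 |
| P13 | 17 | 17 | 16 | 0 | 0 | 0 | 0 | 17 | 16 | 17 | 0 | 0 | 0 | 20 | 0 | 18 | 40 | 0 |
| P14 | 0 | 17 | 0 | 16 | 34 | 0 | 0 | 0 | 18 | 17 | 0 | 0 | 20 | 20 | 19 | 0 | 42 | 0 |
| P15 | 18 | 0 | 17 | 0 | 0 | 31 | 0 | 18 | 0 | 0 | 36 | 0 | 20 | 0 | 19 | 19 | 41 | 27 |
| P16 | 18 | 18 | 0 | 17 | 35 | 0 | 19 | 20 | 19 | 0 | 39 | 0 | 19 | 19 | 0 | 0 | 0 | 0 |
| P17 | 16 | 16 | 16 | 0 | 32 | 31 | 19 | 0 | 17 | 17 | 37 | 0 | 0 | 19 | 0 | 18 | 0 | 0 |
| P18 | 0 | 0 | 16 | 0 | 0 | 31 | 19 | 19 | 0 | 18 | 0 | 0 | 20 | 0 | 0 | 18 | 39 | 0 |
| P19 | 0 | 17 | 16 | 15 | 0 | 0 | 0 | 20 | 10 | 18 | 39 | 0 | 0 | 0 | 0 | 0 | 40 | 36 |
| P20 | 18 | 18 | 17 | 17 | 0 | 0 | 0 | 0 | 18 | 0 | 38 | 0 | 19 | 0 | 19 | 18 | 0 | 22 |
| P21 | 0 | 15 | 17 | 17 | 0 | 0 | 0 | 0 | 17 | 17 | 35 | 0 | 0 | 20 | 19 | 18 | 0 | 0 |
| P22 | 17 | 17 | 0 | 0 | 33 | 32 | 0 | 0 | 0 | 17 | 35 | 0 | 0 | 0 | 18 | 18 | 0 | 0 |
| P23 | 0 | 0 | 0 | 15 | 0 | 32 | 19 | 0 | 0 | 18 | 0 | 0 | 19 | 0 | 0 | 0 | 0 | 0 |
| P24 | 17 | 17 | 0 | 0 | 0 | 30 | 19 | 19 | 19 | 0 | 0 | 0 | 0 | 20 | 19 | 0 | 41 | 0 |
| P25 | 17 | 17 | 0 | 17 | 0 | 0 | 17 | 18 | 0 | 0 | 36 | 0 | 0 | 0 | 0 | 18 | 40 | 0 |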
Table A7. Comparison between the features of the patterns in an analyzed three-year period.
PatternsOperating TimePowers Required by the System [MW]
HoursDaysHours/DayMinimum ValueAverage ValueMaximum Value
201720182019201720182019201720182019201720182019201720182019201720182019
P14324713231283549313730754234102634010759
P236023553954815573107115608218791102276
P3810908276152664251474464475378108888558
P4165424467233645712106090898611910010812999
P5267166276602729461045756752941296010579
P61162884730401647342753465954010110737
P74881373729911345121143458854581036064101
P82353134295240525885760636978801008478
P9115165209294527448603048674659816454
P10953971291659306746060736675132738482
P114524255161011123844145830103693612210044120
P1228780610110311816376134547185762516458
P1315111913130461953735055491981593971
P141569114925332163745159256231027642100
P1518341824349672246112645863655128686499
P164851591869134425545878287296429710638
P17922161311539426636260158973401268423
P1871198355304840249143058194582506177
P1910271280441764244156030187690349040
P20119201191154132856504533685693946349
P212388117111620266286035356958478456
P22161101582031218334645158253381066327
P23555212125164823303015103721334119
P24921723440338254303975365582626580
P2576293179225844354306046407161588558

References

  1. International Hydropower Association. World Hydropower Outlook. Opportunities to Advance Net Zero. 2024. Available online: https://www.hydropower.org/publications/2024-world-hydropower-outlook (accessed on 5 September 2024).
  2. Parvez, I.; Shen, J.; Hassan, I.; Zhang, N. Generation of Hydro Energy by Using Data Mining Algorithm for Cascaded Hydropower Plant. Energies 2021, 14, 298.
  3. International Energy Agency. Hydropower Special Market Report Analysis and Forecast to 2030. 2021. Available online: https://iea.blob.core.windows.net/assets/83ff8935-62dd-4150-80a8-c5001b740e21/HydropowerSpecialMarketReport.pdf (accessed on 5 September 2024).
  4. Essenfelder, A.H.; Larosa, F.; Broccoli, D.; Mazzoli, P.; Bagli, S.; Luzzi, V.; Mysiak, J.; dalla Vallw, F. Smart Climate Hydropower Tool: A Machine-Learning Seasonal Forecasting Climate Service to Support Cost–Benefit Analysis of Reservoir Management. Atmosphere 2020, 11, 1305.
  5. Garbea, R.; Scarlatache, F.; Grigoras, G.; Neagu, B.C. Extracting the Operating Characteristics of Hydropower Plants Using a Clustering-based Efficient Methodology. In Proceedings of the IEEE 9th International Conference on Modern Power Systems (MPS), Cluj-Napoca, Romania, 16–17 June 2021.
  6. International Finance Corporation. Hydroelectric Power. A Guide for Developers and Investors. 2024. Available online: https://documents1.worldbank.org/curated/en/917841468188335073/pdf/99392-WP-Box393199B-PUBLIC-Hydropower-Report.pdf (accessed on 5 September 2024).
  7. International Renewable Energy Agency. Renewable Energy Technologies: Cost Analysis Series. Hydropower. 2012. Available online: https://www.irena.org/-/media/Files/IRENA/Agency/Publication/2012/RE_Technologies_Cost_Analysis-HYDROPOWER.pdf (accessed on 5 September 2024).
  8. Eker, O.F. Data Science for Industry: Hydropower Condition Monitoring and Predictive Maintenance. 2022. Available online: https://medium.com/@omerfarukeker/data-science-for-industry-hydropower-condition-monitoring-and-predictive-maintenance-49952215fdd7 (accessed on 31 July 2024).
  9. Quaranta, E.; Aggidis, G.; Boes, R.; Comoglio, C.; De Michele, C.; Ritesh Patro, E.; Georgievskaia, E.; Harby, A.; Kougias, I.; Muntean, S.; et al. Assessing the energy potential of modernizing the European hydropower fleet. Energy Convers. Manag. 2021, 246, 114655.
  10. Betti, A.; Crisostomi, E.; Paolinelli, G.; Piazzi, A.; Ruffini, F.; Tucci, F. Condition Monitoring and Predictive Maintenance Methodologies for Hydropower Plants Equipment. Renew. Energy 2021, 171, 246–253.
  11. European Commission. 2050 Long-Term Strategy. Striving to Become the World’s First Climate-Neutral Continent by 2050. Available online: https://climate.ec.europa.eu/eu-action/climate-strategies-targets/2050-long-term-strategy_en (accessed on 5 September 2024).
  12. Ovarro. Five Ways SCADA Systems Can Benefit Sustainability: Although Hidden in the Background, SCADA Systems Are Crucial to Energy Savings. 2024. Available online: https://www.linkedin.com/pulse/five-ways-scada-systems-can-benefit-sustainability-although-hidden-sbdae/ (accessed on 31 July 2024).
  13. Yang, S.; Stempfle, T.; Thiede, S.; Lanza, G. Approach for the Development of a Sustainability-oriented Implementation Strategy of Smart Automation Technologies. Procedia CIRP 2024, 122, 849–854.
  14. Vagnoni, E.; Gezer, D.; Anagnostopoulos, I.; Cavazzini, G.; Doujak, E.; Hočevar, M.; Rudolf, P. The New Role of Sustainable Hydropower in Flexible Energy Systems and its Technical Evolution Through Innovation And Digitalization. Renew. Energy 2024, 230, 120832.
  15. Feng, Z.K.; Niu, W.J.; Zhang, R.; Wang, S.; Cheng, C.-T. Operation Rule Derivation of Hydropower Reservoir by K-Means Clustering Method and Extreme Learning Machine Based on Particle Swarm Optimization. J. Hydrol. 2019, 576, 229–238.
  16. Zhang, F.; Guo, J.; Yuan, F.; Qiu, Y.; Wang, P.; Cheng, F.; Gu, Y. Enhancement Methods of Hydropower Unit Monitoring Data Quality Based on the Hierarchical Density-Based Spatial Clustering of Applications with a Noise–Wasserstein Slim Generative Adversarial Imputation Network with a Gradient Penalty. Sensors 2024, 24, 118.
  17. Luo, W.; Xu, J.; Zhou, Z. Mobile Information Systems, Retracted: Design of Data Classification and Classification Management System for Big Data of Hydropower Enterprises Based on Data Standards, Mobile Information Systems. 2022. Available online: https://onlinelibrary.wiley.com/doi/10.1155/2022/8103897 (accessed on 31 July 2024).
  18. Ahmed, I.; Dagnino, A.; Bongiovi, A.; Ding, Y. Outlier Detection for Hydropower Generation Plant. In Proceedings of the IEEE 14th International Conference on Automation Science and Engineering (CASE), Munich, Germany, 20–24 August 2018.
  19. Valencia, A.M.; Caratar, J.; Caicedo, G.; Chamorro, C. Proposal for a KDD-Based Procedure to Obtain a Set of Intelligent Systems Training Applied to the Identification of Failures in Hydroelectric Power Plants. J. Appl. Res. Technol. 2021, 18, 376–389.
  20. Zhang, W.; Ge, Y.; Liu, G.; Qi, W.; Xu, S.; Peng, Z.; Li, Y. Clustering and Decision Tree Based Analysis of Typical Operation Modes of Power Systems. Energy Rep. 2023, 9, 60–69.
  21. Garbea, R.; Scarlatache, F.; Grigoras, G.; Neagu, B.C. Integration of Data Mining Techniques in SCADA System for Optimal Operation of Hydropower Plants. In Proceedings of the IEEE 13th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), Pitesti, Romania, 1–3 July 2021.
  22. Sahin, M.E.; Ozbay Karakus, M. Smart Hydropower Management: Utilizing Machine Learning and Deep Learning Method to Enhance Dam’s Energy Generation Efficiency. Neural Comput. Appl. 2024, 36, 11195–11211.
  23. Shu, X.; Ye, Y. Knowledge Discovery: Methods from Data Mining and Machine Learning. Soc. Sci. Res. 2023, 110, 102817.
  24. Monika; Shauib, M. Implementation Platforms and Strategy for the Knowledge Discovery from the Data. In Proceedings of the International Conference on Computational Modelling, Simulation and Optimization (ICCMSO), Pathum Thani, Thailand, 23–25 December 2022.
  25. Ghongade, T.G.; Khobragade, R.N. Evaluation on Utilization and Emaciation of Data Mining Techniques in Information System. In Proceedings of the OPJU International Technology Conference on Emerging Technologies for Sustainable Development (OTCON), Raigarh, Chhattisgarh, India, 8–10 February 2023.
  26. Järvinen, P.; Siltanen, P.; Kirschenbaum, A. Data Analytics and Machine Learning. In Big Data in Bioeconomy; Södergård, C., Mildorf, T., Habyarimana, E., Berre, A.J., Fernandes, J.A., Zinke-Wehlmann, C., Eds.; Springer Nature: Cham, Switzerland, 2021; pp. 129–146.
  27. Garbea, R.; Grigoras, G. Clustering-Using Data Mining-based Application to Identify the Hourly Loading Patterns of the Generation Units from the Hydropower Plants. In Proceedings of the IEEE International Conference and Exposition on Electrical and Power Engineering (EPE), Iasi, Romania, 20–22 October 2022.
  28. Odrynska, A. What is Data Mining: Definition, Process, Techniques and Role in Business Intelligence. 2023. Available online: https://www.alphaservesp.com/blog/what-is-data-mining-definition-process-techniques-and-business-intelligence (accessed on 31 July 2024).
  29. Onlogic. Setting Up Smart SCADA for Digital Transformation. 2024. Available online: https://www.onlogic.com/blog/smart-scada-digital-transformation (accessed on 5 September 2024).
  30. Kaur, S.; Kathpal, N.; Munjal, N. Role of SCADA in Hydro Power Plant Automation. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2015, 4, 8085–8090.
  31. Mirzargar, M.; Whitaker, R.T.; Kirby, R.M. Curve Boxplot: Generalization of Boxplot for Ensembles of Curves. IEEE Trans. Vis. Comput. Gr. 2023, 20, 2654–2663.
  32. Chelaru, E.; Grigoras, G. Decision Support System to Determine the Replacement Ranking of the Aged Transformers in Electric Distribution Networks. In Proceedings of the IEEE 12th International Conference on Electronics, Computers and Artificial Intelligence (ECAI) Proceedings, Bucharest, Romania, 25–27 June 2020.
  33. Neagu, B.C.; Grigoras, G.; Scarlatache, F. Outliers Discovery from Smart Meters Data Using a Statistical Based Data Mining Approach. In Proceedings of the IEEE 10th International Symposium on Advanced Topics in Electrical Engineering (ATEE), Bucharest, Romania, 23–25 April 2017.
  34. Wang, Z.; Wang, S.; Zhang, S.; Zhan, J. An Expert System Based on Data Mining for a Trend Diagnosis of Process Parameters. Processes 2023, 11, 3311.
  35. Dandea, V.; Grigoras, G. Expert System Integrating Rule-Based Reasoning to Voltage Control in Photovoltaic-Systems-Rich Low Voltage Electric Distribution Networks: A Review and Results of a Case Study. Appl. Sci. 2023, 13, 6158.
  36. Dunca, G.; Ghergu, C.M.; Rosioru, O.; Bucur, M.D. Analysis of the Areas with Optimal Working of Aggregates in CHE Stejaru, Symposium on Informatics, Automation and Telecommunications in Energy, Sinaia, Romania. 2010. Available online: https://www.researchgate.net/publication/281646497_Analiza_zonelor_cu_functionare_optima_ale_agregatelor_din_CHE_Stejaru (accessed on 31 July 2024). (In Romanian).
  37. Cojoc, G.M. Analysis of The Hydrological Regime of the Bistrita River in the Context of Hydrotechnical Developments; Terra Nostra Publishing House: Iasi, Romania, 2016.
Figure 1. The steps of a KD process.
Figure 2. Data Mining techniques.
Figure 3. Interdependencies between the Knowledge Discovery and Smart SCADA.
Figure 4. The basic structure of an automation architecture including the SCADA system.
Figure 5. The multi-task framework integrated into the Knowledge Discovery module.
Figure 6. The fields of the SCADA database.
Figure 7. The hydro arrangement of which the analyzed plant is a part (adapted after [36,37]).
Figure 8. SCADA file associated with a day from the database.
Figure 9. Summary of the plant's operation: the number of operating hours and the total energy produced by each generation unit.
Figure 10. The boxplots corresponding to the loading of the generation units over a period of one year.
Figure 11. The values corresponding to the level of the water reservoir—downstream ((a) taken from the database containing outliers; (b) after data processing, without outliers).
Figure 12. The boxplots corresponding to the level of the water reservoir—downstream.
Figure 13. The typical operating profile of the hydropower plant assigned to pattern P1.
Figure 14. The typical operating profile of the hydropower plant assigned to pattern P2.
Figure 15. The typical operating profile of the hydropower plant assigned to pattern P3.
Figure 16. The typical operating profile of the hydropower plant assigned to pattern P4.
Figure 17. The patterns obtained for the hourly loading of the generation units in 2017.
Figure 18. The patterns obtained for the hourly loading of the generation units in 2018.
Figure 19. The patterns obtained for the hourly loading of the generation units in 2019.
Figure 20. The active power requested by the system.
Figure 21. The active power distributed among the six GUs—the strategy adopted by the DM without the Knowledge Discovery module.
Figure 22. The active power distributed among the six GUs—the strategy adopted by the DM based on the Knowledge Discovery module.
Table 1. The extracted results from the statistical analysis.
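| Category | Parameter | m | σ | Q0 | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|---|---|---|---|
| Water flow | WF_Pipe1 [m³/s] | 36.42 | 9.75 | 18.70 | 31.50 | 34.50 | 36.30 | 74.40 |
| Water flow | WF_Pipe2 [m³/s] | 29.92 | 14.02 | 13.40 | 17.30 | 30.80 | 36.60 | 78.20 |
| Water flow | Total [m³/s] | 56.14 | 21.35 | 10.50 | 44.50 | 54.10 | 70.80 | 133.20 |
| Active and reactive powers | GU1-GU4 [MW] | 29.60 | 13.54 | 0.90 | 18.00 | 30.00 | 37.00 | 78.00 |
| Active and reactive powers | GU5 [MW] | 34.33 | 3.09 | 29.00 | 31.00 | 35.00 | 37.00 | 40.00 |
| Active and reactive powers | GU6 [MW] | 31.51 | 2.01 | 1.00 | 30.00 | 31.00 | 33.00 | 40.00 |
| Active and reactive powers | Total [MW] | 56.34 | 20.39 | 13.00 | 45.00 | 56.00 | 72.00 | 126.00 |
| Active and reactive powers | Total [MVAr] | 6.39 | 5.90 | 1.00 | 2.00 | 3.00 | 10.00 | 30.00 |
| Frequency | [Hz] | 49.99 | 0.02 | 49.10 | 49.98 | 50.00 | 50.01 | 50.40 |
| GU 1 | Vs [kV] | 10.43 | 0.71 | 1.40 | 10.40 | 10.40 | 10.50 | 50.00 |
| GU 1 | Is [kA] | 0.96 | 0.11 | 0.08 | 0.90 | 0.95 | 1.00 | 1.90 |
| GU 1 | P [MW] | 17.20 | 1.89 | 0.90 | 15.00 | 17.00 | 18.00 | 22.00 |
| GU 1 | Q [Mvar] | 2.60 | 1.99 | 1.00 | 1.00 | 1.00 | 5.00 | 11.00 |
| GU 1 | Vex [V] | 88.28 | 8.80 | 8.00 | 80.00 | 90.00 | 95.00 | 110.00 |
| GU 1 | Iex [A] | 287.91 | 17.29 | 100.00 | 280.00 | 290.00 | 300.00 | 360.00 |
| GU 2 | Vs [kV] | 10.48 | 0.38 | 1.10 | 10.50 | 10.50 | 10.50 | 10.70 |
| GU 2 | Is [kA] | 0.96 | 0.10 | 0.70 | 0.90 | 0.95 | 1.05 | 1.20 |
| GU 2 | P [MW] | 17.35 | 1.86 | 14.00 | 16.00 | 17.00 | 19.00 | 21.00 |
| GU 2 | Q [Mvar] | 2.58 | 1.99 | 1.00 | 1.00 | 1.00 | 5.00 | 21.00 |
| GU 2 | Vex [V] | 89.84 | 7.80 | 70.00 | 85.00 | 90.00 | 95.00 | 110.00 |
| GU 2 | Iex [A] | 289.65 | 15.01 | 245.00 | 280.00 | 290.00 | 300.00 | 320.00 |
| GU 3 | Vs [kV] | 10.41 | 0.42 | 1.60 | 10.40 | 10.50 | 10.50 | 10.70 |
| GU 3 | Is [kA] | 0.91 | 0.08 | 0.70 | 0.85 | 0.90 | 0.95 | 1.15 |
| GU 3 | P [MW] | 16.48 | 1.41 | 13.00 | 15.00 | 16.00 | 17.00 | 20.00 |
| GU 3 | Q [Mvar] | 2.76 | 2.00 | 0.50 | 1.00 | 1.00 | 5.00 | 10.00 |
| GU 3 | Vex [V] | 89.60 | 40.80 | 65.00 | 80.00 | 90.00 | 95.00 | 870.00 |
| GU 3 | Iex [A] | 289.21 | 58.37 | 115.00 | 280.00 | 290.00 | 300.00 | 2980.00 |
| GU 4 | Vs [kV] | 10.45 | 0.09 | 10.10 | 10.40 | 10.50 | 10.50 | 10.60 |
| GU 4 | Is [kA] | 0.92 | 0.08 | 0.75 | 0.85 | 0.90 | 1.00 | 1.10 |
| GU 4 | P [MW] | 16.59 | 1.36 | 14.00 | 15.00 | 17.00 | 18.00 | 20.00 |
| GU 4 | Q [Mvar] | 2.78 | 1.99 | 1.00 | 1.00 | 1.00 | 5.00 | 5.00 |
| GU 4 | Vex [V] | 87.38 | 8.87 | 60.00 | 80.00 | 90.00 | 95.00 | 105.00 |
| GU 4 | Iex [A] | 290.13 | 15.75 | 260.00 | 280.00 | 290.00 | 300.00 | 330.00 |
| GU 5 | Vs [kV] | 10.41 | 0.52 | 1.40 | 10.40 | 10.50 | 10.50 | 10.70 |
| GU 5 | Is [kA] | 1.89 | 0.17 | 0.85 | 1.70 | 1.90 | 2.05 | 2.70 |
| GU 5 | P [MW] | 34.33 | 3.09 | 29.00 | 31.00 | 35.00 | 37.00 | 40.00 |
| GU 5 | Q [Mvar] | 2.56 | 1.96 | 1.00 | 1.00 | 1.00 | 5.00 | 10.00 |
| GU 5 | Vex [V] | 104.78 | 17.89 | 10.00 | 100.00 | 100.00 | 110.00 | 1110.00 |
| GU 5 | Iex [A] | 332.53 | 108.88 | 235.00 | 315.00 | 330.00 | 340.00 | 3210.00 |
| GU 6 | Vs [kV] | 10.49 | 0.07 | 10.30 | 10.50 | 10.50 | 10.50 | 10.60 |
| GU 6 | Is [kA] | 1.70 | 0.07 | 1.50 | 1.65 | 1.70 | 1.75 | 1.85 |
| GU 6 | P [MW] | 31.52 | 1.33 | 29.00 | 30.00 | 31.00 | 33.00 | 33.00 |
| GU 6 | Q [Mvar] | 2.07 | 1.77 | 1.00 | 1.00 | 1.00 | 5.00 | 5.00 |
| GU 6 | Vex [V] | 94.47 | 6.15 | 80.00 | 90.00 | 95.00 | 100.00 | 110.00 |
| GU 6 | Iex [A] | 303.34 | 10.77 | 280.00 | 300.00 | 300.00 | 310.00 | 340.00 |
| Water level | WLr_u [mdMB] | 492.66 | 5.86 | 479.28 | 489.80 | 493.40 | 497.99 | 500.80 |
| Water level | WLr_d [mdMB] | 368.21 | 5.90 | 356.43 | 368.90 | 369.06 | 369.20 | 497.48 |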
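The quartiles in Table 1 are enough to show how a boxplot-style screen flags suspicious extremes. The sketch below is illustrative only: it assumes the conventional Tukey fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR), which may differ from the exact statistical procedure used in the framework, and the function names are hypothetical. The numbers are copied from three rows of Table 1.

```python
# (Q0, Q1, Q3, Q4) = (minimum, lower quartile, upper quartile, maximum) from Table 1.
ROWS = {
    "GU 3 Vex [V]": (65.00, 80.00, 95.00, 870.00),
    "GU 4 Vs [kV]": (10.10, 10.40, 10.50, 10.60),
    "WLr_d [mdMB]": (356.43, 368.90, 369.20, 497.48),
}


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; k = 1.5 is the usual boxplot choice (an assumption here)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def extremes_outside(q0, q1, q3, q4, k=1.5):
    """Return (min_is_outlier, max_is_outlier) for the recorded extremes."""
    lower, upper = tukey_fences(q1, q3, k)
    return q0 < lower, q4 > upper


for name, (q0, q1, q3, q4) in ROWS.items():
    lower, upper = tukey_fences(q1, q3)
    low_flag, high_flag = extremes_outside(q0, q1, q3, q4)
    print(f"{name}: fences [{lower:.2f}, {upper:.2f}], "
          f"min flagged: {low_flag}, max flagged: {high_flag}")
```

Under this assumed rule, the 870 V maximum of the GU 3 excitation voltage and both extremes of the downstream water level fall outside the fences, which agrees with the outliers visible in Figure 11a.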
Table 2. The differences between the experience-based strategy and expert system-based strategy.
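| Hour | P_GU1 [MW] | P_GU2 [MW] | P_GU3 [MW] | P_GU4 [MW] | P_GU5 [MW] | P_GU6 [MW] |
|---|---|---|---|---|---|---|
| 1 | +16 | 0 | −16 | 0 | 0 | 0 |
| 2 | +16 | 0 | −16 | 0 | 0 | 0 |
| 3 | +16 | 0 | −16 | 0 | 0 | 0 |
| 4 | +16 | 0 | −16 | 0 | 0 | 0 |
| 5 | +16 | 0 | −16 | 0 | 0 | 0 |
| 6 | +16 | 0 | −16 | 0 | 0 | 0 |
| 7 | +16 | 0 | −16 | 0 | 0 | 0 |
| 8 | −1 | +17 | +1 | +17 | −34 | 0 |
| 9 | −1 | +17 | +1 | +17 | −34 | 0 |
| 10 | −1 | +17 | +1 | +17 | −34 | 0 |
| 11 | −1 | 0 | −1 | 0 | −34 | +36 |
| 12 | −1 | 0 | −1 | 0 | −34 | +36 |
| 13 | +16 | 0 | −16 | 0 | 0 | 0 |
| 14 | +16 | 0 | −16 | 0 | 0 | 0 |
| 15 | +16 | 0 | −16 | 0 | 0 | 0 |
| 16 | +16 | 0 | −16 | 0 | 0 | 0 |
| 17 | 0 | 0 | +2 | 0 | +38 | −40 |
| 18 | 0 | 0 | +2 | 0 | +38 | −40 |
| 19 | +16 | 0 | −16 | 0 | 0 | 0 |
| 20 | +15 | +16 | 0 | +15 | 0 | −46 |
| 21 | +15 | +16 | 0 | +15 | 0 | −46 |
| 22 | +15 | +16 | 0 | +15 | 0 | −46 |
| 23 | +15 | +16 | 0 | +15 | 0 | −46 |
| 24 | +15 | +16 | 0 | +15 | 0 | −46 |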
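A quick consistency check on Table 2: in every hour the per-unit differences cancel out, so the two strategies deliver the same total active power and differ only in how it is shared among the six GUs. The sketch below verifies this for one representative row of each distinct pattern in the table; the helper names are hypothetical.

```python
# Differences [MW] for (GU1, ..., GU6), copied from representative hours of Table 2.
DIFFS = {
    1: (16, 0, -16, 0, 0, 0),
    8: (-1, 17, 1, 17, -34, 0),
    11: (-1, 0, -1, 0, -34, 36),
    17: (0, 0, 2, 0, 38, -40),
    20: (15, 16, 0, 15, 0, -46),
}


def net_change(row):
    """Net difference in plant output for one hour; zero means the total is unchanged."""
    return sum(row)


def shifted_power(row):
    """Amount of power reallocated between units (sum of the positive differences)."""
    return sum(d for d in row if d > 0)


for hour, row in DIFFS.items():
    print(f"hour {hour:2d}: net change {net_change(row)} MW, "
          f"reallocated {shifted_power(row)} MW")
```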
Table 3. The specific indicators for each GU obtained for operating the HPP in 2020.
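| Generation Unit | GU1 | GU2 | GU3 | GU4 | GU5 | GU6 | HPP |
|---|---|---|---|---|---|---|---|
| Operating time [hours] | 3460 | 1970 | 1520 | 1484 | 1271 | 3255 | 12960 |
| Energy production [GWh] | 72.33 | 40.60 | 29.3 | 28.26 | 46.37 | 161.85 | 378.71 |
| Average loading [MW] | 21 | 21 | 19 | 19 | 36 | 50 | 166 |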
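The three rows of Table 3 are linked by a simple relation: average loading equals energy produced divided by operating time. The sketch below recomputes the average loading from the other two rows, reading the energy row as GWh (an assumption; it is the magnitude that the operating hours and average loadings imply). The variable and function names are hypothetical.

```python
# Values copied from Table 3; the energy row is interpreted as GWh (assumption).
HOURS = {"GU1": 3460, "GU2": 1970, "GU3": 1520, "GU4": 1484, "GU5": 1271, "GU6": 3255}
ENERGY_GWH = {"GU1": 72.33, "GU2": 40.60, "GU3": 29.3, "GU4": 28.26,
              "GU5": 46.37, "GU6": 161.85}


def average_loading_mw(energy_gwh, hours):
    """Average loading in MW: energy converted from GWh to MWh, divided by hours."""
    return energy_gwh * 1000.0 / hours


loading = {gu: round(average_loading_mw(ENERGY_GWH[gu], HOURS[gu])) for gu in HOURS}

for gu, mw in loading.items():
    print(f"{gu}: {mw} MW")
print("Sum of unit averages:", sum(loading.values()), "MW")
print("Total operating time:", sum(HOURS.values()), "hours")
```

With these inputs the rounded averages match the last row of the table, and the HPP column is the sum of the per-unit values for all three indicators.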
