1. Introduction
Reservoirs are considered the most effective form of water infrastructure for realizing the comprehensive utilization of water resources [1]. The objectives of a multi-purpose reservoir, such as flood control, hydroelectric power generation, water supply, navigation, and ecological conservation, generally conflict with one another, so no single optimal solution simultaneously satisfies all of them [2,3]. Hence, the optimization of a multi-purpose reservoir is a typical multi-objective optimization problem (MOP) [4,5].
Methods for solving MOPs can be roughly divided into three categories according to how preferences are articulated [6]: (1) with a prior articulation of preference, all but one of the objectives are transformed into constraints and the objectives are ordered based on this preference; (2) when preference knowledge becomes available during the search, decision-making and optimization are interleaved in an interactive search; (3) with a posterior articulation of preference, the MOP is solved by first generating the Pareto optimal set, from which a satisfactory solution is subsequently selected according to this preference [7].
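The posterior approach in category (3) starts from the Pareto optimal set. As a minimal sketch, assuming every objective is to be minimized, the non-dominated (Pareto optimal) solutions of a finite candidate set can be filtered as follows; the function name `non_dominated_mask` is illustrative only.

```python
import numpy as np


def non_dominated_mask(F):
    """Boolean mask of the non-dominated rows of F (all objectives minimized).

    Row j dominates row i if it is no worse in every objective and strictly
    better in at least one.
    """
    F = np.asarray(F, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        no_worse = np.all(F <= F[i], axis=1)
        strictly_better = np.any(F < F[i], axis=1)
        if np.any(no_worse & strictly_better):
            mask[i] = False  # some other row dominates row i
    return mask


# Two minimized objectives, e.g. an eco-goal and a flood control target
F = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
print(non_dominated_mask(F))  # [ True  True False  True]
```

The third row is dropped because the second row is better in both objectives; the remaining rows form the trade-off front.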
For multi-objective reservoir operation, the MOP may be transformed into a single-objective optimization problem [8] or solved with multi-objective evolutionary algorithms (MOEAs) [9,10]. Because they generate the Pareto optimal set effectively, MOEAs have been increasingly adopted [11,12]. In the optimization of a multi-purpose reservoir system in India, Reddy and Kumar [3] proposed a multi-objective genetic algorithm (MOGA) to generate the Pareto optimal set and demonstrated that the MOGA offers the decision-maker (DM) many alternatives. In the joint operation optimization of two cascade reservoirs, Chang and Chang [13] applied the non-dominated sorting genetic algorithm II (NSGA-II) [14] to solve the MOP, simultaneously minimizing the shortage indices of both reservoirs; their results demonstrated that NSGA-II performs well in analyzing water resource systems. Qin et al. [15] developed the multi-objective cultured differential evolution (MOCDE) approach to achieve trade-offs between two conflicting flood control goals in multi-objective reservoir optimization and showed that MOCDE is efficient and robust, with an improved ability to overcome premature convergence. More MOEAs have been proposed in recent works [16,17,18]. These studies focus on developing powerful MOEAs rather than on selecting one or several particularly apt solutions.
In these studies, trade-off ranking techniques are usually adopted to choose a solution from the MOEA results. When the preference towards the objectives is clear, methods such as the technique for order preference by similarity to ideal solution (TOPSIS), elimination and choice translating reality (ELECTRE) [19], and the analytic hierarchy process (AHP) are common approaches for ranking and selecting solutions from the Pareto optimal set [20]. However, when MOEAs are applied to real-world reservoir optimization problems, the DM may not have a clear articulation of preference (e.g., between maximizing the total power production and maximizing the guaranteed output). Relative distance ranking [21] and new trade-off ranking [22] have therefore been introduced to rank solutions without additional preferences.
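To show how a preference-based ranking such as TOPSIS works once weights are available, here is a minimal sketch of the standard procedure, assuming all objectives are minimized and no objective column is identically zero. The equal weights are a hypothetical DM preference, not a value from this study.

```python
import numpy as np


def topsis_closeness(F, weights):
    """Relative closeness of each solution to the ideal point (higher is better).

    F: (n_solutions, n_objectives) objective values, all to be minimized.
    weights: (n_objectives,) non-negative weights expressing the DM's preference.
    """
    F = np.asarray(F, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    V = w * F / np.linalg.norm(F, axis=0)  # vector-normalize, then weight each column
    ideal = V.min(axis=0)                  # best value of each objective
    anti_ideal = V.max(axis=0)             # worst value of each objective
    d_plus = np.linalg.norm(V - ideal, axis=1)
    d_minus = np.linalg.norm(V - anti_ideal, axis=1)
    return d_minus / (d_plus + d_minus)


F = np.array([[1.0, 5.0], [2.0, 3.0], [4.0, 1.0]])
scores = topsis_closeness(F, weights=[0.5, 0.5])
ranking = np.argsort(-scores)  # best first: [1 2 0]
```

The result depends directly on `weights`, which is exactly the information a DM without a clear preference cannot supply.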
In addition to the above ranking techniques, data-driven decision making [23] is also an effective tool. Taboada and Coit [24] applied cluster analysis to the Pareto optimal set; this data-mining technique bases decisions on the cluster analysis of a multi-objective optimization dataset, and it successfully reduced the size of the Pareto optimal set before solutions were selected. Suwal et al. [25] used projection pursuit clustering (PPC) in the objective function space to rank the optimal solutions obtained with NSGA-II, with schemes of larger projection values ranked higher.
However, the above ranking methods consider the Pareto frontier in objective space (e.g., the value of power production) without considering the information contained in decision space (e.g., the process of reservoir regulation). To fully utilize the information in the Pareto optimal set, Dumedah [26] presented a clustering-based method for selecting solutions from the Pareto optimal set according to the solution distribution in both objective and decision spaces. Sato and Izui [27] applied clustering and association rule analysis in decision space to reduce the size of the Pareto optimal set, providing the DM with simplified but vital knowledge in a case study of a multi-objective topology optimization problem. Their results revealed that the information detected with the clustering method facilitates the discovery of particularly effective solutions.
In this study, solution selection based on clustering in both objective and decision spaces [26] was introduced into decision-making for multi-objective reservoir optimization. The clustering-based method for solution selection (CMSS) was first applied to the Pareto optimal set of a multi-purpose reservoir. To enhance its capacity to cluster time series [28] (i.e., the decision variables in reservoir operation), we improved the similarity measurement by developing the Mei–Wang fluctuation similarity measure (MWFSM), which is tailored to characterize the similarity of decision vectors in terms of both position and shape.
The remainder of this paper is organized as follows. In Section 2, the CMSS and MWFSM are presented, the adopted clustering algorithm is introduced, and a multi-objective reservoir operation optimization model is built. In Section 3, the background of the small- and medium-flood (SMF) utilization of the Three Gorges cascade reservoirs and the input data are provided. In Section 4, optimal results of multi-objective reservoir operation are generated with NSGA-II, clustering algorithms are employed to analyze the Pareto optimal results in objective space and decision space, respectively, and the effectiveness of the MWFSM and CMSS is examined. Section 5 summarizes our work and offers directions for future work.
2. Methodology
Reservoir operation processes (e.g., water releases, water levels) are taken as the decision variables. Quantifying their similarity is key to valid clustering and to the high performance of solution selection in multi-objective reservoir operation optimization. In this section, a new similarity measure, the clustering algorithm, the solution selection procedure, and the multi-objective reservoir operation model are presented.
2.1. Mei–Wang Fluctuation Similarity Measure
It is well acknowledged that a proper distance measure is vital in clustering [29]. Recently, a new index, the Mei–Wang fluctuation (MWF) [30], has been proposed to measure the fluctuation of a given process. Based on the standard deviation (SD) and the rotation angle, the MWF index outperforms other indices by characterizing fluctuation in terms of both quantitative variation and contour changes. This paper introduces the MWF index into the measurement of the similarity between two reservoir operation processes.
On the basis of the MWF index, the MWFSM was developed to identify similar reservoir operation processes. The MWFSM describes both the shape and spatial position of processes, which reflect the features of the reservoir operation.
Assume two reservoir operation processes of the same length $N$:
$$X = (x_1, x_2, \ldots, x_N), \qquad Y = (y_1, y_2, \ldots, y_N) \qquad (1)$$
The MWFSM between $X$ and $Y$ is calculated as follows.
The first component is the index describing the difference in quantitative variation between $X$ and $Y$. It is calculated with Equation (2), where $x_i$ is the ordinate of the $i$-th point of $X$, $y_i$ is the ordinate of the $i$-th point of $Y$, $i$ is the sequence number, $N$ is the length of the process, and $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$, respectively.
The second component is the index describing the difference between the two processes in terms of contour variations. It is calculated with Equations (3)–(5), which use the rotation angle at each point of $X$ and of $Y$, i.e., the angle between the two line segments meeting at that point. The rotation angle and line segments of a process are shown in Figure 1. Taking $X$ as an example, at the end points ($i = 1$ or $i = N$) the rotation angle is taken between the first (respectively, last) line segment and a horizontal line, and $k_i$, the slope of the $i$-th line segment, is defined by Equation (5).
The MWFSM between $X$ and $Y$ is then calculated with Equation (6), which gives the MWFSM index.
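Equations (2)–(6) are not reproduced in this text, so the following is only an illustrative sketch of the two ingredients described above, not the MWFSM itself. It assumes (hypothetically) that the quantitative-variation term is the absolute difference between the two standard deviations, that the contour term is the mean absolute difference between the rotation angles, that the time step is 1, and that the two terms are combined as a weighted sum with a hypothetical weight `w`.

```python
import numpy as np


def rotation_angles(x, dt=1.0):
    """Rotation angle (radians) at each of the N points of a process.

    Assumed geometry: at an interior point, the change in inclination between
    the two adjacent line segments; at the first/last point, the inclination of
    the first/last segment relative to the horizontal.
    """
    x = np.asarray(x, dtype=float)
    incl = np.arctan(np.diff(x) / dt)  # inclination of each segment, from its slope k_i
    angles = np.empty(len(x))
    angles[0] = abs(incl[0])
    angles[-1] = abs(incl[-1])
    angles[1:-1] = np.abs(np.diff(incl))
    return angles


def fluctuation_distance(x, y, w=0.5, dt=1.0):
    """Illustrative MWFSM-style distance; the form and the weight w are hypothetical."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or len(x) < 2:
        raise ValueError("processes must be 1-D, of equal length, with N >= 2")
    d_var = abs(np.std(x) - np.std(y))  # quantitative-variation term (SD-based)
    d_shape = np.mean(np.abs(rotation_angles(x, dt) - rotation_angles(y, dt)))  # contour term
    return float(w * d_var + (1.0 - w) * d_shape)
```

In this simplified form a vertically shifted copy of a process has zero distance, so it captures fluctuation and shape but not the positional information the MWFSM is described as capturing. The two terms also carry different units (metres and radians), so any real weighting would need to be calibrated.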
2.2. Clustering by Fast Search and Find of Density Peaks
Clustering by fast search and find of density peaks (DPC) [31] is a density- and distance-based clustering algorithm developed by Rodriguez and Laio in 2014. The DPC algorithm has been shown to determine density peaks quickly and to reduce the impact of isolated points, which makes it suitable for cluster analysis of large datasets [32]. In this paper, the DPC algorithm was adopted to cluster decision processes during solution selection.
In the DPC algorithm, three quantities are defined for each data point: the local density ρ and the distance δ to the nearest point of higher density describe the data point, and the decision value γ simplifies the selection of cluster centers.
For data point $i$, the local density $\rho_i$ is expressed by Equation (7):
$$\rho_i = \sum_{j \neq i} \chi\left(d_{ij} - d_c\right), \qquad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases} \qquad (7)$$
where $d_{ij}$ is the distance between points $i$ and $j$, calculated via the proposed MWFSM, and $d_c > 0$ is the cut-off distance. According to Equation (7), $\rho_i$ equals the number of points within a radius of $d_c$ around point $i$; hence, $\rho_i$ is sensitive to $d_c$. However, previous research [31] has demonstrated that the influence of $d_c$ on the clustering result diminishes for large datasets. Conventionally, the distances $d_{ij}$ are sorted in ascending order, and the value at the 2% position is assigned to $d_c$. In this paper, $d_c$ in the case study is determined via this approach.
In addition, $\delta_i$ is defined by Equation (8):
$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{ij} \qquad (8)$$
that is, $\delta_i$ is the minimum distance between point $i$ and any other point with a higher density. For the point with the highest $\rho_i$, $\delta_i$ is instead expressed as
$$\delta_i = \max_{j} d_{ij} \qquad (9)$$
The decision value $\gamma_i$ is calculated with Equation (10):
$$\gamma_i = \rho_i \, \delta_i \qquad (10)$$
Points with anomalously high $\gamma_i$ values are chosen as cluster centers. After the density peaks have been determined, the remaining points are allocated following the principle of proximity.
Suppose a dataset $S = \{s_1, s_2, \ldots, s_n\}$. DPC is applied to $S$ as follows:
Step 1: Calculate the distance matrix $D = (d_{ij})$ via the MWFSM.
Step 2: Sort the entries of $D$ in ascending order and assign the value at the 2% position to $d_c$.
Step 3: Calculate the local density $\rho_i$ of each data point in $S$ according to Equation (7).
Step 4: Calculate the distance $\delta_i$ from each data point in $S$ to the nearest point of higher density according to Equations (8) and (9).
Step 5: Calculate the decision value $\gamma_i$ of each data point in $S$ according to Equation (10).
Step 6: Sort $\gamma$ in ascending order and record the new order.
Step 7: Construct the decision value graph, in which the points are plotted as $(i, \gamma_i)$ following the ascending order of Step 6.
Step 8: Select the points with large $\gamma$ values in the decision value graph as cluster centers.
Step 9: Allocate the remaining points following the principle of proximity.
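The nine steps can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it takes any precomputed distance matrix `D` (filled with MWFSM, ED, or DTW values), uses the cut-off kernel of Equation (7), breaks density ties by point index, takes the number of centers `n_centers` as a parameter instead of reading it off the decision value graph, and implements "proximity" as in the original DPC, where each remaining point joins the cluster of its nearest higher-density neighbor.

```python
import numpy as np


def cutoff_distance(D, percent=2.0):
    """d_c: value at the `percent` position of the ascending-sorted pairwise distances."""
    n = D.shape[0]
    d = np.sort(D[np.triu_indices(n, k=1)])  # each pair counted once (requires n >= 2)
    pos = max(int(round(len(d) * percent / 100.0)), 1)  # 1-based position
    return d[pos - 1]


def dpc(D, n_centers, percent=2.0):
    """Density-peaks clustering on a precomputed distance matrix; returns (labels, centers)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if not 1 <= n_centers <= n:
        raise ValueError("n_centers must be between 1 and the number of points")
    d_c = cutoff_distance(D, percent)
    off_diag = ~np.eye(n, dtype=bool)
    rho = np.sum((D < d_c) & off_diag, axis=1)   # Eq. (7), cut-off kernel
    order = np.argsort(-rho, kind="stable")      # decreasing density, ties by index
    delta = np.empty(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()          # Eq. (9), densest point
    for pos in range(1, n):
        i = order[pos]
        denser = order[:pos]
        j = denser[np.argmin(D[i, denser])]
        delta[i] = D[i, j]                       # Eq. (8)
        nearest_higher[i] = j
    gamma = rho * delta                          # Eq. (10)
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # The densest point has the largest gamma, so it is always a center; walking
    # in decreasing density, each point inherits an already assigned label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

Because the function only consumes `D`, swapping the similarity measure (as in the ED-DPC, DTW-DPC, and MWFSM-DPC experiments) does not change the clustering code.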
2.3. Clustering-Based Solution Selection Method
The procedure of the CMSS is described below.
Suppose $P = \{p_1, p_2, \ldots, p_n\}$ represents the reservoir operation decision process vectors in the Pareto set calculated with an MOEA, where $n$ is the number of solutions, and $F = \{f_1, f_2, \ldots, f_n\}$ represents the corresponding objective value vectors of the Pareto frontier.
Step 1: Apply the DPC method to $P$ to obtain clusters of decision processes.
Step 2: Rank the clusters generated in Step 1 by size in descending order; denote the $i$-th decision cluster as $C_i^{\mathrm{d}}$, so that $C_1^{\mathrm{d}}$ is the decision cluster with the largest membership.
Step 3: Generate an operation pattern set consisting of the solutions corresponding to the centers of the decision clusters.
Step 4: Cluster $F$ with the k-means algorithm to obtain objective value clusters.
Step 5: Rank the objective value clusters by size in descending order; denote the $i$-th cluster as $C_i^{\mathrm{o}}$, so that $C_1^{\mathrm{o}}$ is the cluster with the largest membership.
Step 6: Compute the intersection of $C_1^{\mathrm{d}}$ and $C_1^{\mathrm{o}}$. If it is not empty, denote it as $I^*$. Otherwise, intersect $C_1^{\mathrm{d}}$ with the next objective cluster, and repeat until the intersection is no longer empty; this intersection is then denoted as $I^*$.
Step 7: Identify and recommend the decision process in $I^*$ with the minimum accumulated MWFSM distance to the other processes in $I^*$.
Step 8: Provide the selected solution and the operation pattern set to the DMs.
The core of this solution selection method is to identify solutions in high-density areas of both the decision processes and the objective values. The largest decision cluster is the area of maximum concentration of solutions in decision space, so a typical solution selected from this area can be considered a representative decision pattern. Analogously, suitable robustness is achieved if the solution belongs to a representative area in objective space, e.g., the largest objective cluster. Hence, a linkage is built between the information extracted from the objective values and the knowledge acquired from the decision processes. In addition to trading off the objectives, the selected solution is also a compromise between the decision processes and the objective values. Furthermore, the operation pattern set helps DMs better understand the reservoir operations underlying the multi-objective optimal results.
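Steps 2 and 5–7 can be sketched as below, assuming the cluster labels from Steps 1 and 4 are already available. The function name `cmss_select` is hypothetical, ties between equally sized clusters are broken by first occurrence, and "minimum accumulative similarity" is read as the smallest sum of distances to the other members of the intersection.

```python
import numpy as np
from collections import Counter


def cmss_select(D, decision_labels, objective_labels):
    """Return (selected index, intersection members) following CMSS Steps 2 and 5-7.

    D: (n, n) distance matrix between decision processes (e.g., MWFSM values).
    decision_labels / objective_labels: cluster label of each solution in
    decision space (e.g., from DPC) and objective space (e.g., from k-means).
    """
    decision_labels = np.asarray(decision_labels)
    objective_labels = np.asarray(objective_labels)
    # Step 2: decision cluster with the largest membership
    largest = Counter(decision_labels.tolist()).most_common(1)[0][0]
    in_largest = decision_labels == largest
    # Steps 5-6: objective clusters from largest to smallest until the
    # intersection with the largest decision cluster is non-empty
    for obj_label, _ in Counter(objective_labels.tolist()).most_common():
        members = np.flatnonzero(in_largest & (objective_labels == obj_label))
        if members.size > 0:
            break
    # Step 7: member with the minimum accumulated distance to the other members
    sub = np.asarray(D, dtype=float)[np.ix_(members, members)]
    return int(members[np.argmin(sub.sum(axis=1))]), members
```

The loop always terminates with a non-empty intersection, because the objective clusters partition all solutions and the largest decision cluster is non-empty.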
2.4. Multi-Objective Optimization Model
During the flood period, minimizing flood risk and minimizing ecological impacts are common objectives of multi-purpose reservoir operation. In this section, a model of a cascade reservoir system is built with these two objectives, subject to a variety of constraints (e.g., water balance and power output range). The decision variable of the model is the time series of water levels.
2.4.1. Objective Functions
The objective of minimizing the ecological influences is expressed as the eco-friendly objective, which is evaluated from the discharge of the cascade reservoir system during each $t$-th period and the ecological flow during the same period; the ecological flows of all periods form the ecological flow series. The definition of the ecological flow usually depends on the purposes of the specific case study. When the cascade discharge process is similar to the ecological flow process, the eco-friendly objective takes a relatively small value, and the eco-goal is better met.
The objective of flood control is to minimize the maximum flood control capacity used during the operation horizon while satisfying additional flood control constraints, which are specified in the case study. The flood control objective is thus the maximum flood control capacity used during the operation horizon, evaluated from the average capacity used in the $t$-th interval at the $j$-th cascade reservoir and the lower bound of the volume of the $j$-th cascade reservoir.
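The exact formulas of the two objectives are not reproduced above, so the following is only a sketch under stated assumptions: the eco-friendly objective is taken (hypothetically) as the sum of squared deviations between the cascade discharge and the ecological flow, and the flood control objective as the maximum, over all intervals, of the used capacity summed over the cascade reservoirs. Both function names are illustrative.

```python
import numpy as np


def eco_objective(Q, Q_eco):
    """Assumed eco-friendly objective: sum of squared deviations of the cascade
    discharge from the ecological flow series (smaller is better)."""
    Q = np.asarray(Q, dtype=float)
    Q_eco = np.asarray(Q_eco, dtype=float)
    return float(np.sum((Q - Q_eco) ** 2))


def flood_objective(V, V_min):
    """Assumed flood control objective: maximum flood control capacity used.

    V: (n_reservoirs, n_periods) average storage in each interval.
    V_min: (n_reservoirs,) lower bound of each reservoir's volume.
    The capacity used in an interval is summed over reservoirs (an assumption).
    """
    used = np.asarray(V, dtype=float) - np.asarray(V_min, dtype=float)[:, None]
    return float(used.sum(axis=0).max())


f_eco = eco_objective([3.0, 4.0], [1.0, 4.0])                      # 4.0
f_flood = flood_objective([[10.0, 12.0, 11.0], [5.0, 6.0, 5.0]], [10.0, 5.0])  # 3.0
```

Both objectives are minimized, which is the orientation assumed by the Pareto filter and the ranking sketches elsewhere in this paper's workflow.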
2.4.2. Constraints
The water balance links, for the $j$-th cascade reservoir during the $t$-th period, the average storage, the inflow, and the outflow rate over the period duration: the change in storage equals the difference between inflow and outflow multiplied by the duration.
The outflow constraint involves the inflow of the $j$-th cascade reservoir, which equals the sum of the outflow of the $(j-1)$-th cascade reservoir and the local inflow during the $t$-th period.
The power output constraint requires the average output power of the $j$-th plant during the $t$-th period to lie between the minimum and maximum output power levels of that plant for that period.
The storage volume constraint requires the water level of the $j$-th dam during the $t$-th period to lie between its lower and upper bounds for that period.
The boundary condition constrains the water levels of the $j$-th cascade reservoir during the first and last periods with respect to the initial water level of the $j$-th dam, which is given in the case study.
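Because the decision variable is the water level series, outflows follow from the water balance once storage is known. As a minimal sketch, assuming storage is evaluated at period boundaries rather than as a period average and using a hypothetical linear level–storage curve, the outflows, the hydraulic connection between reservoirs, and a bound check can be computed as follows. All function names and numbers are illustrative.

```python
import numpy as np


def outflow_from_levels(levels, inflow, level_to_storage, dt):
    """Back-calculate one reservoir's outflow in each period from its water levels.

    levels: (T + 1,) water levels at the period boundaries (m)
    inflow: (T,) average inflow in each period (m^3/s)
    level_to_storage: callable mapping water level (m) to storage (m^3)
    dt: duration of one period (s)
    Water balance: V[t+1] = V[t] + (I[t] - O[t]) * dt, solved for O[t].
    """
    storage = level_to_storage(np.asarray(levels, dtype=float))
    return np.asarray(inflow, dtype=float) - np.diff(storage) / dt


def within_bounds(values, lower, upper):
    """True if lower <= value <= upper holds in every period (bounds may be arrays)."""
    values = np.asarray(values, dtype=float)
    return bool(np.all((values >= lower) & (values <= upper)))


def linear_curve(level):
    """Hypothetical level-storage curve: 1e8 m^3 per metre above 145 m."""
    return 1e8 * (level - 145.0)


dt = 86400.0  # one-day periods
out_up = outflow_from_levels([145.0, 146.0, 146.0], [30000.0, 25000.0], linear_curve, dt)
inflow_down = out_up + np.array([1000.0, 1000.0])  # upstream outflow + local inflow
```

Raising the level by 1 m in the first day stores about 1157 m^3/s of the inflow, so the outflow drops accordingly; the same `within_bounds` check applies to power output and water level series.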
The procedure of the proposed approach, illustrated in Figure 2, is composed of three main parts. The first part is multi-objective problem modelling, in which the data are prepared. The second part is optimization via NSGA-II. In the third part, solution selection, the NSGA-II results are used as the input of the data-driven solution selection approach.
4. Results and Discussion
In this study, the optimization results of multi-objective reservoir operation were obtained via NSGA-II and are presented hereafter. Based on these results, the objective values were clustered with the k-means method, and the operation processes were separately clustered in decision space. The centers of the decision clusters were taken as representative patterns of reservoir operation. Solution selection was then conducted by jointly using both clustering results.
4.1. NSGA-II Output and Traditional Analysis of Pareto Set
NSGA-II was implemented in MATLAB R2014a. The population size was set to 200, and the maximum number of iterations, which served as the stopping criterion, was set to 500. The crossover probability of NSGA-II was set to 0.8, and the mutation probability was fixed according to the number of variables $n$ of each solution; in this case study, $n$ was 21, and the mutation probability was set to 0.1. The algorithm converged after 500 iterations and generated a non-dominated set of 200 feasible solutions satisfying all of the above model constraints. The output results of NSGA-II are shown in Figure 5.
According to Figure 5a, the trade-offs between the conflicting objectives indicate that the eco-goal cannot be improved without worsening the flood control target. The corresponding decision processes in decision space are shown in Figure 5b.
After a Pareto optimal set is generated, researchers commonly select representative solutions [35]. For comparison purposes, we selected three solutions in the same way: the solution that best satisfies the eco-friendly objective, the solution that best satisfies the flood control target, and a medium compromise solution that weights both objectives equally [36]. The medium compromise solution is the one with the smallest Euclidean distance (ED) to the utopia point [37] (also called the ideal point). These three solutions are shown in Figure 6: Figure 6a presents their distributions in objective space, and Figure 6b shows the corresponding operation processes.
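The three reference solutions can be picked as sketched below. This assumes both objectives are minimized and are min-max normalized before the Euclidean distance to the utopia point is taken (the normalization is an assumption, so that the two objectives carry equal importance); the column order (eco-goal first) is also illustrative.

```python
import numpy as np


def compromise_index(F):
    """Index of the solution closest (Euclidean) to the utopia point.

    F: (n_solutions, n_objectives), all minimized. After min-max normalization
    the utopia point (best value of every objective) is the origin.
    """
    F = np.asarray(F, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    Fn = (F - lo) / span
    return int(np.argmin(np.linalg.norm(Fn, axis=1)))


# Columns: eco-friendly objective, flood control objective (hypothetical values)
F = np.array([[0.0, 10.0], [3.0, 4.0], [6.0, 2.0], [10.0, 0.0]])
best_eco = int(np.argmin(F[:, 0]))    # best eco-friendly solution
best_flood = int(np.argmin(F[:, 1]))  # best flood control solution
compromise = compromise_index(F)      # medium compromise solution
```

Here the normalized distances are 1.0, 0.5, about 0.63, and 1.0, so the second solution is the compromise.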
4.2. Clustering of the Trade-Off Frontier
In this subsection, the objective values obtained via NSGA-II were clustered with the k-means algorithm [38]. Before clustering, the two objective values were normalized with the z-score method [39]. Because the number of clusters is the most critical choice in the k-means method, the Calinski–Harabasz (CH) index [40] was adopted to determine the optimal number of clusters, which was 10. To eliminate the impact of the initial centers on the k-means method, the computation was repeated with randomly chosen initial centroids until the results stabilized.
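The normalization, k-means, and CH-based choice of k can be sketched with scikit-learn as follows. This is a sketch, not the authors' code: the candidate range of k and the fixed number of random restarts (`n_init`) stand in for "repeat until the results stabilized", and the function name is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score


def cluster_objectives(F, k_range=range(2, 16), n_init=20, seed=0):
    """z-score the objective values, run k-means for each candidate k, and
    keep the k with the largest Calinski-Harabasz index."""
    F = np.asarray(F, dtype=float)
    std = F.std(axis=0)
    Z = (F - F.mean(axis=0)) / np.where(std > 0, std, 1.0)  # z-score normalization
    best_score, best_k, best_labels = -np.inf, None, None
    for k in k_range:
        # init="random" with n_init restarts: several runs from random initial
        # centroids, keeping the run with the lowest within-cluster sum of squares
        km = KMeans(n_clusters=k, init="random", n_init=n_init, random_state=seed)
        labels = km.fit_predict(Z)
        score = calinski_harabasz_score(Z, labels)
        if score > best_score:
            best_score, best_k, best_labels = score, k, labels
    return best_k, best_labels
```

The CH index rewards well-separated, compact clusters while penalizing extra clusters through its degrees-of-freedom terms, which is why it can pick k without a DM preference.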
Figure 7 shows the clustering result based on the objective values. The colored points indicate the distribution of the Pareto optimal solutions in objective space, and the stars indicate the cluster centroids. According to the clustering results in objective space, the solutions were partitioned into ten clusters.
By clustering the Pareto set in objective space, we narrowed the selection range (200 solutions in this case study) to ten clusters with distinct characteristics. In some studies [24,41], cluster analysis is adopted as a practical solution selection approach: the DM is offered a set of k clusters and must select one of them to make the final decision. In this case study, if DMs prioritize a low flood control risk over the eco-friendly goal, they can choose among the solutions in cluster 3, shown as green circles in Figure 7. If DMs have no preference, the focus will be on the cluster that contains the knee region. According to [42], cluster 9 contains the knee region, and a knee solution is marked in Figure 7.
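The knee definition of [42] is not reproduced here. As an illustration only, one common heuristic for a convex two-objective front (both minimized) takes the knee as the point farthest from the straight line joining the two extreme solutions, after min-max normalization; `knee_index` is a hypothetical name.

```python
import numpy as np


def knee_index(F):
    """Knee of a two-objective (minimization) front: the point with the largest
    perpendicular distance to the line through the two extreme solutions."""
    F = np.asarray(F, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    Fn = (F - lo) / np.where(hi > lo, hi - lo, 1.0)  # min-max normalization
    a = Fn[np.argmin(Fn[:, 0])]  # extreme solution, best in objective 1
    b = Fn[np.argmin(Fn[:, 1])]  # extreme solution, best in objective 2
    ab = b - a
    norm = np.linalg.norm(ab)
    if norm == 0:
        return 0
    # |cross product| / |ab| = perpendicular distance to the line through a and b
    cross = np.abs(ab[0] * (Fn[:, 1] - a[1]) - ab[1] * (Fn[:, 0] - a[0]))
    return int(np.argmax(cross / norm))


F = np.array([[0.0, 10.0], [1.0, 2.0], [4.0, 1.0], [10.0, 0.0]])
knee = knee_index(F)  # 1: the largest bulge towards the utopia point
```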
4.3. Clustering of the Reservoir Operation Processes
To discover more information in the Pareto optimal results, we clustered the operation processes in decision space to detect reservoir operation patterns, which facilitates practical water management. This subsection presents the clustering results of the reservoir operation processes obtained through DPC with the new similarity measure MWFSM (MWFSM-DPC). The validity of MWFSM was verified by comparison with DPC using ED (ED-DPC) and DPC using dynamic time warping (DTW-DPC): the MWFSM recognized more reservoir operation patterns in the high water level zone.
DPC was employed to cluster the reservoir operation processes in decision space. To validate the MWFSM, two common measures, ED and DTW [43], were adopted as controls. ED is a classical distance measure that is simple and intuitive to use. DTW is another well-known similarity measure that has been widely applied in the clustering of time-series data [44]. In all DPC applications, the value at the 2% position of the ascending-sorted distances was assigned to the cut-off distance $d_c$. Each experiment was run independently in the same computing environment.
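For readers comparing the two control measures, the following sketch shows classic DTW next to ED. The absolute-difference local cost and the absence of a warping window are assumptions, since the DTW settings used in the experiments are not stated.

```python
import numpy as np


def dtw_distance(x, y):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible warping steps
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def euclidean_distance(x, y):
    """Point-by-point Euclidean distance between two equal-length processes."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))


x = [0.0, 1.0, 2.0, 3.0, 3.0]
y = [0.0, 0.0, 1.0, 2.0, 3.0]  # the same rise, delayed by one period
```

DTW aligns the delayed rise and returns 0, whereas ED returns the square root of 3; neither measure by itself separates processes that share a position but differ in shape, which is the gap the MWFSM targets.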
In this case study, the true clustering was unknown. The Silhouette method [45], a widely used internal validity index [46], was therefore adopted as the clustering validation measure. The Silhouette index (Sil) is a normalized summation-type index whose value ranges from −1 to +1; the larger the value of Sil, the better the clustering result. Because internal validity indices such as Sil cannot compare clustering results generated with different similarity measures [47], Sil was used only to verify the validity of each clustering result [48], not to compare the experiments.
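Because each experiment already has its own distance matrix, Sil can be computed directly from that matrix. A minimal NumPy sketch follows; scikit-learn's `silhouette_score` with `metric="precomputed"` computes the same quantity. Singleton clusters are given a score of 0, following the usual convention.

```python
import numpy as np


def silhouette(D, labels):
    """Mean silhouette coefficient from a precomputed distance matrix D."""
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if not 2 <= len(clusters) <= len(labels) - 1:
        raise ValueError("need 2 <= number of clusters <= n - 1")
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean intra-cluster distance, excluding i
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(s.mean())


pts = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(pts[:, None] - pts[None, :])
sil = silhouette(D, [0, 0, 1, 1])  # close to +1: compact, well-separated clusters
```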
The clustering results and the corresponding Silhouette values are listed in Table 1.
According to the Sil values in Table 1, the results of all three experiments were reliable. MWFSM-DPC divided the reservoir operation processes into seven clusters, whereas DTW-DPC and ED-DPC each yielded five clusters. In the MWFSM-DPC experiment, the largest cluster contained 71 reservoir operation processes, whereas each of the other six clusters contained few processes. In the DTW-DPC and ED-DPC experiments, the largest clusters contained 53 and 97 processes, respectively. The similarity measures therefore influence the time-series clustering results differently, and an appropriate similarity measure in the clustering algorithm can provide more useful and pertinent information about reservoir operation.
The clustering results of DTW-DPC, ED-DPC, and MWFSM-DPC are visualized in Figure 8, Figure 9, and Figure 10, respectively. The cluster centers were selected as the representative solutions. These operation patterns were divided into three categories according to the highest water level: the high water level pattern (highest water level above 148 m), the medium water level pattern (between 146 and 148 m), and the low water level pattern (below 146 m). The results of each experiment are presented below.
In Figure 8, DTW-DPC found one high water level pattern, three medium water level patterns, and one low water level pattern. In Figure 9, ED-DPC yielded five clusters, comprising one high water level pattern, two medium water level patterns, and two low water level patterns. As shown in the red boxes of the first subfigures in Figure 8 and Figure 9, the local trends of the high water level patterns differed substantially: between the eighth and twelfth days, the pattern in DTW-DPC exhibited mono-growth, whereas the pattern in ED-DPC first increased and then decreased. In the DTW-DPC and ED-DPC experiments, the clusters were generated in a narrow area with similar positions in decision space; however, the shape dissimilarity between processes with similar positions was not captured.
The result of MWFSM-DPC is shown in Figure 10. MWFSM-DPC discovered three high water level patterns, three medium water level patterns, and one low water level pattern. The patterns discovered in the three experiments were similar except for the high water level patterns.
Compared with ED and DTW, the MWFSM distinguished more patterns in the high water level zone, recognizing both the mono-growth pattern and the pattern with fluctuation. The additional patterns allow the DM to better understand high water level operations.
These results show that the MWFSM has distinct advantages in clustering reservoir operation processes, where not only position similarity but also shape similarity matters.
4.4. Solution Selection
In this part, the CMSS results are presented and compared with the outcomes of the traditional recommendation method in Section 4.1 and the clustering method in Section 4.2, and the advantages and shortcomings of the CMSS are discussed.
Based on the cluster analysis results of the objective values and the operation processes, the intersections of the two clustering results were determined; the number of solutions in each intersection is listed in Table 2. The largest intersection was obtained between cluster 7 of MWFSM-DPC and cluster 10 of the Pareto cluster analysis. Because the size of an intersection describes the robustness level [26] (the larger the intersection, the higher the robustness), we took the largest intersection as the new selection range. The selection range was thus narrowed from 200 to 22 solutions, as illustrated in Figure 11.
The processes in the largest intersection and the intersection center are shown in Figure 11a, and the corresponding objective values are shown in Figure 11b. In Figure 11a, the blue curves represent the reservoir operation processes of the solutions in the intersection, and the black curve with red dots represents the center. For most solutions in the intersection, the water level fluctuated slightly during the early period and decreased to the dead water level before the main flood control operation. Another fluctuation occurred during the rising period on the 10th day, and the highest water level during the flood occurred around the 14th day. All solutions in this intersection shared the same trends except during the early period and on the 13th, 14th, and 15th days. The objective values of the solutions around the selected solution lay on or close to the Pareto frontier, which indicates the robustness of the selected solution.
Finally, the DM can be provided with the seven patterns discovered in the decision space and the selected solution.
With the traditional recommendation method in Section 4.1, three solutions selected from the large Pareto set are presented to the DM, and the medium compromise solution is most likely to be recommended for implementation. As a result, much of the valuable information contained in the large Pareto optimal set is left unused. With the trade-off cluster analysis in Section 4.2, the selection range is narrowed to 10 Pareto clusters; further selection requires a clear preference from the DM, and otherwise a representative solution from the knee region is recommended. Both approaches base the selection on information in objective space only; the reservoir operation processes in the Pareto optimal set are not analyzed.
In the proposed approach, we clustered the processes in decision space, assigning them to groups with similar position and shape, and identified seven typical operation patterns of the Pareto optimal set. The set intersection operation was then performed between the largest cluster in decision space and the clusters in objective space. The selection range was narrowed to the largest intersection, which contained 22 solutions, and the center of this intersection was recommended.
Compared with the other two selection methods, the proposed approach incorporates the information discovered in decision space, not only the objective values, and uncovers valuable information about the Pareto optimal reservoir operation processes during the calculation.
Although the proposed approach works without an articulated preference from the DM, robustness acts as an implicit preference. In addition, the clustering results of the operation processes show that the CMSS is sensitive to the similarity measure adopted in the clustering algorithm.
5. Conclusions
To select a solution with desirable properties from the numerous solutions in the Pareto optimal set of multi-objective reservoir operation models, this study introduced the CMSS, which exploits information obtained by clustering in both objective space and decision space. A new similarity measure, the MWFSM, was developed to capture the temporal nature of reservoir operation processes during clustering. Using the additional information extracted from decision space, the CMSS selects a solution from a large Pareto set. The CMSS was verified in a case study of the regulation of the TGR–GZB cascade reservoirs during the flood season, considering the eco-goal and the flood control target. The MWFSM successfully identified more patterns with different shapes, and the CMSS recommended a robust solution, verifying the feasibility of both the MWFSM and the CMSS. In many real-world reservoir optimization problems like this case study, the DM may not have a clear articulation of preference; the CMSS can handle this situation and select a robust solution.
In future research, forecast data will be analyzed instead of historical data, and a risk objective for reservoir flood operation will be incorporated, which may further narrow the gap between research and practice. Another advantage of the CMSS is that it selects a solution automatically, so it could be developed into an automated decision-making system for reservoir operation in the future.