In recent years, renewable energy generation has attracted attention because of growing concern over global warming. Fossil fuel power generation is a cause of increased greenhouse gas emissions [1]. Renewable energy sources are not fossil fuels and emit no greenhouse gases during generation, so countries worldwide aim to convert their existing power generation facilities to renewable energy [2]. However, both the amount and the timing of renewable generation are unstable, which creates uncertainty in the supply of electricity. In particular, the duck curve phenomenon, which arises when large amounts of photovoltaic (PV) generation are introduced, increases load fluctuations and leads to problems with the number of generator startups and shutdowns at certain times [3]. The introduction of renewable energy sources can therefore destabilize grid operation and make a stable power supply difficult to achieve.
Against this background, there has been growing interest in unit commitment (UC) for power system operation and scheduling [4]. UC addresses stable power system operation and the allocation of load demand among generators by determining optimal generator operation schedules [5]. Ref. [6] showed that optimal generator scheduling by UC can reduce the grid operation cost. Ref. [7] proposed a UC formulation that mitigates the duck curve through peak shifting, achieved by introducing demand response and energy storage systems (ESS). UC simulation can thus help address the grid instability that future increases in renewable generation may cause.
Various methods have been used to solve the UC problem; among them, mixed-integer linear programming (MILP) has been shown to yield high-quality optimal solutions in comparison with the other methods [8]. In addition, constraints are relatively easy to introduce in MILP, so simulations can be performed with a model that is close to reality [9]. However, a disadvantage of MILP is that the computation time grows as the number of constraints and variables increases and the model becomes larger [8,10]. In particular, binary variables make more realistic simulations possible, but the computation time increases exponentially with them [11]. As shown in Ref. [12], relaxing the binary variables does not necessarily reduce the computational load, so methods to shorten simulation time are required.
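The exponential growth associated with binary on/off variables can be illustrated with a minimal sketch that solves a toy UC problem by exhaustive enumeration instead of MILP. The two-unit data in `UNITS`, the four-hour `DEMAND`, and all function names are hypothetical and chosen only for illustration; they are not taken from the model used in this paper.

```python
from itertools import product

# Hypothetical units: (min output MW, max output MW, cost per MWh, startup cost)
UNITS = [(10, 50, 20.0, 100.0), (5, 30, 35.0, 20.0)]
DEMAND = [20, 45, 70, 30]  # hypothetical hourly demand in MW


def dispatch_cost(status, load):
    """Cheapest dispatch for one hour given on/off status; None if infeasible."""
    on = [u for u, s in zip(UNITS, status) if s]
    lo = sum(u[0] for u in on)
    hi = sum(u[1] for u in on)
    if not lo <= load <= hi:
        return None
    # Every committed unit runs at its minimum; the rest is filled cheapest first.
    cost = sum(u[0] * u[2] for u in on)
    remaining = load - lo
    for pmin, pmax, price, _ in sorted(on, key=lambda u: u[2]):
        extra = min(pmax - pmin, remaining)
        cost += extra * price
        remaining -= extra
    return cost


def solve_by_enumeration(demand):
    """Try every on/off schedule; all units are assumed off before the first hour."""
    hourly_states = list(product((0, 1), repeat=len(UNITS)))
    best_cost, best_plan, checked = float("inf"), None, 0
    for plan in product(hourly_states, repeat=len(demand)):
        checked += 1
        total, prev = 0.0, (0,) * len(UNITS)
        for status, load in zip(plan, demand):
            hour_cost = dispatch_cost(status, load)
            if hour_cost is None:
                break  # demand cannot be met with this commitment
            total += hour_cost
            # Startup cost for every unit switched from off to on
            total += sum(u[3] for u, s, p in zip(UNITS, status, prev) if s and not p)
            prev = status
        else:
            if total < best_cost:
                best_cost, best_plan = total, plan
    return best_cost, best_plan, checked


cost, plan, checked = solve_by_enumeration(DEMAND)
print(cost, checked)  # 3720.0 256
```

With 2 units and 4 hours there are 2^(2x4) = 256 candidate schedules; each additional unit-hour doubles that number. MILP solvers avoid full enumeration through branch-and-bound, but the worst case still grows in this way, which is why reducing the number of simulated days is attractive.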
Moreover, although the introduction and study of various demand response schemes is an important issue in UC [7], a one-day simulation is not sufficient to evaluate them. Because load demand varies greatly with seasonal weather and with people’s activities, simulations covering at least one year are necessary to evaluate a proposed method [13]. However, as mentioned above, solving a MILP over such a horizon takes a long time.
One of the fastest ways to obtain MILP solutions is to use representative load curves (RLCs), which reduce the number of load curves to be simulated by clustering the load demand of one year [14]. In the RLC approach, daily load demand is classified into several patterns according to some criterion, and a representative day is created for each pattern. By weighting the representative days, results almost equivalent to a one-year simulation can be obtained with a small number of simulations and a short computation time. However, the results can vary greatly depending on the clustering method, so it is important to use a method with high clustering accuracy.
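The weighting idea can be sketched as follows. The sketch assumes that a cluster label is already available for each day, that the representative day is the cluster mean, and that the weight is the number of days in the cluster; the function names and the toy curves are hypothetical.

```python
from collections import defaultdict


def representative_days(daily_curves, labels):
    """Return {label: (mean curve, weight)}; weight = number of days represented."""
    groups = defaultdict(list)
    for curve, label in zip(daily_curves, labels):
        groups[label].append(curve)
    reps = {}
    for label, curves in groups.items():
        mean_curve = [sum(values) / len(curves) for values in zip(*curves)]
        reps[label] = (mean_curve, len(curves))
    return reps


def weighted_total(reps, daily_metric):
    """Estimate a period total by evaluating the metric once per representative day."""
    return sum(weight * daily_metric(curve) for curve, weight in reps.values())


# Toy data: three "days" with two time steps each, grouped into two patterns
curves = [[1, 2], [3, 4], [10, 10]]
labels = [0, 0, 1]
reps = representative_days(curves, labels)
print(weighted_total(reps, sum))  # 30.0
```

For a linear metric such as daily energy, the weighted total reproduces the full-period sum exactly. For a nonlinear quantity such as the UC operating cost it is only an approximation, which is why the quality of the clustering matters.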
Clustering Method
Clustering methods fall broadly into supervised and unsupervised learning. As a representative supervised approach, artificial neural networks (ANNs) trained on past cases have been used as a standard AI technique [15]. This technique is effective for forecasting and outperformed other methods for PV and wind speed forecasting in Ref. [16]. However, supervised learning generally requires large amounts of high-quality training data. Therefore, unsupervised learning is used for the classification of historical data.
The k-means method is a typical unsupervised clustering method. It is easy to implement and provides accurate clustering [17]. The method begins by specifying the number of groups (clusters), k, then randomly assigns every data element to a cluster, and then repeatedly reassigns each element to the cluster whose center of gravity (the mean of its elements) is nearest, updating the centers after each pass. The most important aspect of this method is therefore how k is determined, since the results can vary greatly with the number of clusters k. Several studies have examined how to determine k for the k-means method. In Refs. [18,19], the number of clusters was determined using the elbow method. In this method, k is increased one at a time, from one up to the total number of elements, and the sum of squared errors (SSE) within the clusters is calculated and plotted against k; the value of k at which the curve bends (the elbow) is selected. In Ref. [20], based on the idea that consumers’ electricity usage differs by season and between weekdays, weekends, and holidays, the patterns to be classified are defined in advance, and their number is used as the number of clusters. However, these methods share two problems: the results and accuracy of the classification depend on the person performing the clustering, and k must be determined in advance. If an incorrect number of clusters is used, the accuracy of clustering load demand data is greatly reduced and complexity is increased [21].
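The elbow procedure can be sketched in a few lines. To keep the sketch deterministic, it initializes the centers by a farthest-point rule rather than by the random assignment described above, and it replaces the visual reading of the elbow with a simple threshold on the SSE improvement; both choices, the threshold value `frac`, and the toy data are assumptions made for illustration only.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sse(points, k, iters=50):
    """Run k-means and return the within-cluster sum of squared errors (SSE)."""
    # Deterministic farthest-point initialization (an assumption of this sketch)
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty
        new_centers = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def sse_curve(points, k_max):
    return {k: kmeans_sse(points, k) for k in range(1, k_max + 1)}


def pick_elbow(sse, frac=0.05):
    """Smallest k whose next SSE improvement is below frac times the SSE at k = 1."""
    ks = sorted(sse)
    for k in ks[:-1]:
        if sse[k] - sse[k + 1] < frac * sse[1]:
            return k
    return ks[-1]


# Hypothetical one-dimensional feature (e.g. daily peak load) with three groups
POINTS = [(0,), (1,), (2,), (10,), (11,), (12,), (20,), (21,), (22,)]
sse = sse_curve(POINTS, 5)
print(pick_elbow(sse))  # 3
```

The result depends on the threshold and on the range of k that is scanned, which illustrates the user dependence noted above.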
In addition, several references [22,23,24] use k-means to cluster load demand; because k-means can handle only one parameter, the clustering there is performed on load demand and renewable energy generation (PV and wind power). Although load demand and renewable energy generation vary with external factors (temperature, solar radiation, wind speed, etc.), these factors are not taken into account in the clustering.
Therefore, in this paper, we examine the application of DBSCAN (density-based spatial clustering of applications with noise), a clustering method that does not require the number of clusters to be specified in advance, to the pattern classification of load demand. DBSCAN classifies elements based on the distance between them and can exclude elements that are only weakly related to the others as noise [25]. However, DBSCAN uses two parameters, eps and MinPts, and the clustering accuracy can vary greatly depending on their settings [26]. In this paper, we present a new approach that automatically defines the eps and MinPts parameters of the DBSCAN algorithm.
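For reference, the standard DBSCAN procedure (not the modified version proposed in this paper) can be sketched in plain Python. The parameter names `eps` and `min_pts` correspond to eps and MinPts; the two-dimensional toy points and the convention that a point counts as its own neighbour are assumptions of this sketch.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE if it joins no cluster."""
    labels = [None] * len(points)

    def neighbours(i):
        # All points within eps of point i, including i itself
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


# Two dense groups and one isolated point (hypothetical two-feature data)
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is an output of the procedure, while `eps` and `min_pts` must still be supplied, which is the sensitivity discussed above.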
We improve DBSCAN in two cases to enable more detailed cluster classification and noise elimination than the conventional DBSCAN algorithm of Ref. [27], and thereby to increase clustering accuracy. Since DBSCAN can cluster data plotted on a two-dimensional plane, this paper also investigates which combinations of 12 data sets (temperature, humidity, wind speed, etc.) are effective for load demand classification.
In Case 1, the k-dist plot of Ref. [27] is improved so that more knees are determined automatically for clustering. In Case 2, clustering is performed by combining DBSCAN noise removal with the k-means center-of-gravity update. The results are compared with those of the DBSCAN algorithm of Ref. [27], the k-means method, and annual operation without clustering to demonstrate the effectiveness of the proposed algorithm.
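As background for Case 1, a generic k-dist plot and a single-knee heuristic can be sketched as follows. This is neither the procedure of Ref. [27] nor the improved multi-knee method of this paper; the chord-based knee rule, the function names, and the toy data are assumptions for illustration, and the sketch expects at least two points.

```python
from math import dist


def k_dist_curve(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted ascending."""
    curve = []
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        curve.append(distances[k - 1])
    return sorted(curve)


def knee_index(curve):
    """Index where the curve lies farthest below the chord joining its end points."""
    n = len(curve)
    first, last = curve[0], curve[-1]

    def gap(i):
        chord = first + (last - first) * i / (n - 1)
        return chord - curve[i]

    return max(range(n), key=gap)


# Two dense groups and one isolated point (hypothetical two-feature data)
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
curve = k_dist_curve(points, k=3)
eps_candidate = curve[knee_index(curve)]
print(round(eps_candidate, 3))  # 1.414
```

The k-dist value at the knee serves as a candidate for eps: points to the left of the knee lie in dense regions, while points to the right are likely noise. A single knee suits data with one density level; data with several density levels would show several knees.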
The remainder of this paper is organized as follows.
Section 2 introduces DBSCAN and the proposed method.
Section 3 presents the objective function for minimizing operating costs and the constraints considered in performing the optimization.
Section 4 presents the power system model assumed in this paper.
Section 5 presents and discusses the simulation results.
Section 6 concludes the paper.