*3.2. Methodology for Customer Targeting Based on Two-Stage Clustering Method in Efficient DR Operation*

The framework used to segment customers into groups based on their load profiles and to determine appropriate groups for participation in an incentive-based DR program is depicted in Figure 1. First, load data are collected for load profile clustering. The data are then preprocessed, which comprises data selection (i.e., excluding weekends, holidays, and event days) and cleansing (i.e., replacing missing data and deleting incomplete customer data). After preprocessing, a two-stage load profile clustering is performed to segment residential DR customers according to their electricity consumption characteristics and their load profiles.

**Figure 1.** Flowchart of the proposed two-stage clustering methodology and customer targeting strategy.
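
The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `max_missing` threshold, and the choice to replace missing readings with the customer's own mean are all assumptions, since the text does not state how missing data are replaced or when a customer counts as incomplete.

```python
from datetime import date

def select_days(days, holidays=frozenset(), event_days=frozenset()):
    """Data selection: keep weekdays that are neither holidays nor DR event days."""
    return [d for d in days
            if d.weekday() < 5            # Monday=0 ... Friday=4
            and d not in holidays
            and d not in event_days]

def cleanse(profiles, max_missing=2):
    """Data cleansing on a dict {customer_id: [reading or None, ...]}.

    Customers with more than `max_missing` gaps (assumed threshold) are
    deleted; remaining gaps are replaced by the customer's mean reading
    (assumed replacement rule).
    """
    cleaned = {}
    for customer_id, loads in profiles.items():
        observed = [x for x in loads if x is not None]
        n_missing = len(loads) - len(observed)
        if not observed or n_missing > max_missing:
            continue                      # incomplete customer: drop
        mean = sum(observed) / len(observed)
        cleaned[customer_id] = [mean if x is None else x for x in loads]
    return cleaned
```

Applying `select_days` before building the profiles and `cleanse` afterwards mirrors the order given in the text (selection, then cleansing).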

A load profile carries information such as peak time, peak duration, and electricity consumption, from which one can roughly estimate how much demand a customer can reduce. This information is therefore an important factor in determining which customers can reduce the most demand when the DR program is implemented. These characteristics should be extracted from the load profile and treated as variables in the clustering method. Accordingly, the characteristics (i.e., daily consumption, peak time) are used in the first stage of clustering, and the normalized load profile is the clustering variable in the second stage. Suitable DR participation groups are then derived by analyzing the segmentation results, from which the distributions of peak time, average consumption, and peak demand scale can be obtained. After the target groups are selected, a DR effect analysis is conducted to verify the effect of targeted enrollment. This analysis shows the demand reduction capacity per customer under targeted enrollment, and the results are compared with those obtained assuming opt-in enrollment in the DR program.

Considering many variables in a clustering method does not always produce reliable results, so the essential variables must be chosen strategically. When there are too many variables for customers to be segmented well, a method to deal with this problem is needed. In this study, we improve load profile clustering performance by applying our proposed methodology. Figure 2 explains the proposed two-stage load profile clustering algorithm.

**Figure 2.** Two-stage clustering methodology for load profile segmentation.

Before load profile segmentation, load characteristics are derived from the load profile by feature selection, that is, the process of selecting a subset of relevant features. The features used as clustering input variables are selected through correlation analysis. In deriving relevant features from the load profile, we consider factors that affect effective DR operation (i.e., daily consumption, peak time, and the difference between peak demand and minimum demand).
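
As an illustration of the three load characteristics named above, the sketch below computes them from a single daily profile and provides a Pearson correlation helper of the kind a correlation analysis would use. The function names and dictionary keys are hypothetical, and the profile is assumed to hold one energy reading per time interval. How the correlations are turned into a selection decision is not specified in the text and is not implemented here.

```python
import math

def load_features(profile):
    """Summarize one customer's daily load profile (one value per interval)."""
    peak = max(profile)
    return {
        "daily_consumption": sum(profile),     # total over the day
        "peak_time": profile.index(peak),      # first interval at peak demand
        "peak_min_diff": peak - min(profile),  # peak demand minus minimum demand
    }

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson` could be applied to two feature columns collected across all customers to check whether the features carry largely the same information.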

The next step is normalization: rather than using raw data, the load characteristics are normalized for the first-stage segmentation and the load profile for the second-stage segmentation. Normalization maps each value to the range from 0 to 1 and can improve clustering performance by putting the inputs on a common scale. The load characteristics were normalized per variable (across customers), whereas the load profile was normalized per customer (across time). Min–max normalization was used, as illustrated by Equation (4) for the case of load profile normalization:

$$\widetilde{d_{i,t}} = \frac{d_{i,t} - \min_{t \in T}(d_{i,t})}{\max_{t \in T}(d_{i,t}) - \min_{t \in T}(d_{i,t})} \tag{4}$$

where $i$, $t$, $d_{i,t}$, and $\widetilde{d_{i,t}}$ are the customer, the time, the demand of customer $i$ at time $t$, and the min–max normalization result, respectively.
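
The two normalization modes can be sketched in a few lines. This is a minimal illustration of Equation (4) and of its per-variable counterpart. The function names are hypothetical, and mapping a constant sequence to all zeros is an assumption made here to avoid division by zero; the text does not address that case.

```python
def minmax(values):
    """Min-max normalize a sequence to [0, 1], as in Equation (4)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # assumption: flat input maps to zeros
    return [(v - lo) / (hi - lo) for v in values]

def normalize_profiles(profiles):
    """Second-stage input: normalize each customer's profile over time (row-wise)."""
    return [minmax(row) for row in profiles]

def normalize_features(features):
    """First-stage input: normalize each variable over all customers (column-wise)."""
    columns = [minmax(col) for col in zip(*features)]
    return [list(row) for row in zip(*columns)]
```

Row-wise normalization removes the consumption scale from each profile so that only its shape remains, while column-wise normalization keeps the relative scale between customers for each characteristic.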

After the normalization process, segmentation based on load characteristics is performed with the k-means method before the load profile segmentation, as shown in Figure 2. This step separates customers by consumption scale and peak time; in other words, it is a coarse segmentation of the customers. These components are chosen because consumption scale indicates how much customers can reduce their demand, and whether a customer's peak time falls within an event indicates whether the customer is at home. The next step is customer segmentation based on load profiles, which is conducted separately for the members of each group obtained from the first-stage clustering. Compared with basic k-means, splitting the clustering into two stages allows the features to be better reflected in the result.
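
The two-stage procedure can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: `kmeans` is a plain Lloyd's-algorithm k-means with deterministic farthest-first seeding (the text does not say how centers are initialized), and the names `two_stage`, `k1`, and `k2` are hypothetical. Inputs are assumed to be already normalized.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Basic k-means; returns one cluster label per point."""
    # Farthest-first seeding (assumed): start from the first point, then
    # repeatedly add the point farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break                          # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                    # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def two_stage(features, profiles, k1, k2):
    """Stage 1 clusters load characteristics; stage 2 clusters the load
    profiles separately inside each stage-1 group.
    Returns a (stage1_label, stage2_label) pair per customer."""
    stage1 = kmeans(features, k1)
    result = [None] * len(profiles)
    for group in sorted(set(stage1)):
        idx = [i for i, lab in enumerate(stage1) if lab == group]
        sub = kmeans([profiles[i] for i in idx], min(k2, len(idx)))
        for i, s in zip(idx, sub):
            result[i] = (group, s)
    return result
```

Because stage 2 runs independently within each stage-1 group, the second label is only meaningful together with the first: customers labeled `(0, 1)` and `(1, 1)` belong to different final segments.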

The main goal of this analysis is to determine how to produce the most significant effect by enrolling suitable customers in the DR program. To achieve this goal, we propose two standards for selecting customer groups with high DR potential. If a customer's peak demand occurs during an event, the likelihood that the customer is at home is relatively high, and it may be argued that such customers tend to be able to reduce their demand effectively. However, this is not an absolute indicator. In some cases, for instance, the demand of some customers could be high although their peak demand time does not remain constant, or some customers may register an insignificant demand reduction although their peak demand time remains constant. Therefore, we stipulate the following criteria to determine the target groups:


After the load profile segmentation, the results are analyzed using boxplots in order to exclude customer groups that are unsuitable for DR program participation.
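
The quantities a boxplot displays for each group can be computed as sketched below. This is only an illustration: the function names are hypothetical, the quartile convention (`method="inclusive"`) is an assumption, and no exclusion rule is applied because the thresholds for excluding a group are not given in this section.

```python
from statistics import quantiles

def box_stats(values):
    """Five-number summary shown by a boxplot (needs at least two values)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}

def group_box_stats(labels, values):
    """Compute box_stats of `values` separately for each cluster label."""
    groups = {}
    for label, value in zip(labels, values):
        groups.setdefault(label, []).append(value)
    return {label: box_stats(vals) for label, vals in groups.items()}
```

Here `values` could be, for instance, each customer's peak time or daily consumption, with `labels` taken from the segmentation result.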
