4.4.2. Hierarchical Clustering

Hierarchical clustering algorithms build a hierarchy of groups either bottom-up or top-down, known as agglomerative and divisive hierarchical clustering, respectively [33]. In this study, we employed agglomerative hierarchical clustering to segment the load curves based on the preprocessed features. Agglomerative clustering starts with each object as its own cluster and repeatedly merges the two most similar clusters according to a distance metric [34], until all objects are merged into a single cluster. We used Ward linkage to compute the distance between a newly merged cluster and the remaining clusters, which minimizes the increase in within-cluster variance at each merge [35]. The Ward linkage criterion can be expressed as follows:

$$\Delta(X_i, X_j) = \frac{n_i n_j}{n_i + n_j} \left\| c(X_i) - c(X_j) \right\|^2 \tag{6}$$

where *c*(*X<sub>i</sub>*) is the centroid of cluster *i* and *n<sub>i</sub>* is the number of points in cluster *i*.
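As an illustration, the merge criterion can be evaluated directly and compared against SciPy's bottom-up Ward clustering. This is a minimal sketch under our own assumptions: `X` stands in for the preprocessed load-curve feature matrix, and the names are illustrative rather than taken from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy stand-in for the preprocessed load-curve features (n_samples x n_features).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))

def ward_delta(Xi, Xj):
    """Ward merge cost of Equation (6): ni*nj/(ni+nj) * ||c(Xi) - c(Xj)||^2."""
    ni, nj = len(Xi), len(Xj)
    return ni * nj / (ni + nj) * np.sum((Xi.mean(axis=0) - Xj.mean(axis=0)) ** 2)

# Agglomerative (bottom-up) clustering with Ward linkage; each row of Z
# records one merge and the linkage distance at which it happened.
Z = linkage(X, method="ward")

# Cut the hierarchy into K = 3 flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")

# Equation (6) evaluated for two of the resulting clusters.
print(ward_delta(X[labels == 1], X[labels == 2]))
```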

**Figure 8.** Elbow method estimated by distortion.

**Figure 9.** Elbow method estimated by Calinski–Harabasz.

Figure 10 depicts the truncated Ward-linkage dendrogram, a tree structure that visualizes the clusters and the number of objects belonging to each cluster. The numerical values in Figure 10 are the distances between cluster centers, calculated by Equation (6); the black dashed line marks the distance threshold of 50. In addition, we combined the Calinski–Harabasz index with the dendrogram to determine the optimal number of clusters (Table 1). The results confirm that the Calinski–Harabasz index increases sharply when *K* changes from 2 to 3 and only gradually thereafter. The Calinski–Harabasz index and the dendrogram therefore both indicate that three is the optimal value of *K*.

**Figure 10.** Agglomerative clustering dendrogram using Ward linkage.
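A truncated dendrogram in the style of Figure 10 can be drawn with SciPy. The sketch below continues from the linkage matrix `Z` above; the truncation depth `p=12` is our illustrative choice, while the threshold of 50 matches the dashed line described in the text.

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

fig, ax = plt.subplots(figsize=(8, 4))
# Show only the last 12 merges; contracted leaves report cluster sizes.
dendrogram(Z, truncate_mode="lastp", p=12, show_contracted=True,
           color_threshold=50, ax=ax)
ax.axhline(y=50, color="black", linestyle="--")  # distance threshold
ax.set_xlabel("Cluster size")
ax.set_ylabel("Ward linkage distance")
plt.show()
```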

**Table 1.** Calinski–Harabasz Index of agglomerative clustering.


4.4.3. Fuzzy c-Means Clustering

The fuzzy c-means (FCM) algorithm is a soft clustering algorithm, also known as "soft K-means," in which each data object can belong to multiple clusters. FCM has been widely used in many applications, such as consumer behavior analysis and market segmentation [36]. FCM aims to minimize the following objective function:

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} \mu_{ij}^{m} \left\| \mathbf{x}_i - \mathbf{c}_j \right\|^2 \tag{7}$$

where *m* is the fuzziness parameter in the range [1, +∞), *μ<sub>ij</sub>* is the degree of membership of *x<sub>i</sub>* in cluster *j*, and *c<sub>j</sub>* is the centroid of cluster *j*. The membership degrees and cluster centers are updated iteratively until the change in the membership matrix is smaller than the error tolerance *ε* (Step 5 below). The cluster center *c<sub>j</sub>* and membership degree *μ<sub>ij</sub>* are obtained as follows:

$$\mathbf{c}_j = \frac{\sum_{i=1}^{N} \mu_{ij}^{m} \mathbf{x}_i}{\sum_{i=1}^{N} \mu_{ij}^{m}} \tag{8}$$

$$\mu_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\|\mathbf{x}_i - \mathbf{c}_j\|}{\|\mathbf{x}_i - \mathbf{c}_k\|} \right)^{\frac{2}{m-1}}} \tag{9}$$

The algorithm comprises the following steps (a code sketch follows the list):

Step 1: Determine the number of clusters *C*, the fuzziness parameter *m*, and the error tolerance *ε*.

Step 2: Initialize the membership matrix *U*<sup>[0]</sup> such that $\sum_{j=1}^{C} \mu_{ij} = 1$ for each *x<sub>i</sub>*.

Step 3: At step *k*, compute the cluster centers *c<sub>j</sub>* with Equation (8).

Step 4: Update the membership matrix from *U*<sup>[k]</sup> to *U*<sup>[k+1]</sup> with Equation (9).

Step 5: If ‖*U*<sup>[k+1]</sup> − *U*<sup>[k]</sup>‖ < *ε*, stop; otherwise, return to Step 3.
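The following is a minimal NumPy sketch of Steps 1–5 (our own illustration, not the authors' implementation); `X` is again the feature matrix, and the parameter defaults follow the grid-search result reported below.

```python
import numpy as np

def fuzzy_c_means(X, C=3, m=1.25, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means following Steps 1-5 and Equations (7)-(9)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Step 2: random membership matrix with rows summing to 1.
    U = rng.random((N, C))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Step 3: cluster centers, Equation (8).
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Step 4: membership update, Equation (9).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 5: stop once the membership matrix has converged.
        if np.linalg.norm(U_new - U) < eps:
            U = U_new
            break
        U = U_new
    return centers, U

# Hard labels are obtained by taking the cluster of maximum membership:
# centers, U = fuzzy_c_means(X)
# labels = U.argmax(axis=1)
```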

The main advantages of FCM are its suitability for overlapping data, its scalability, and its simplicity; however, its time complexity is higher than that of k-means. In our study, we selected the fuzziness parameter and the error tolerance *ε* by grid search. The optimal fuzziness parameter was determined as *m* = 1.25, and the error as *ε* = 1 × 10<sup>−5</sup>. Figure 11 shows the clustering result based on three principal components. The points in Cluster 1 and Cluster 2 are relatively compact, whereas Cluster 3 is more dispersed.
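The paper does not state which criterion the grid search optimized; one plausible sketch scores each (*m*, *ε*) pair with the silhouette coefficient defined in Section 5, reusing the `fuzzy_c_means` function from the sketch above.

```python
from itertools import product
from sklearn.metrics import silhouette_score

# Candidate grids; the selection criterion is our assumption, since the
# paper does not specify which validity index the grid search optimized.
m_grid = [1.1, 1.25, 1.5, 2.0]
eps_grid = [1e-3, 1e-4, 1e-5]

best = None
for m, eps in product(m_grid, eps_grid):
    _, U = fuzzy_c_means(X, C=3, m=m, eps=eps)
    labels = U.argmax(axis=1)
    score = silhouette_score(X, labels)
    if best is None or score > best[0]:
        best = (score, m, eps)

print("best (score, m, eps):", best)
```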

**Figure 11.** Clustering result with FCM applied when c is set to 3.

#### **5. Experiment Results and Analysis**

In the clustering phase, we employed three different clustering algorithms to segment the daily load curves. Considering the diversity of clustering performance evaluations, we selected three internal clustering criteria for validation, namely the silhouette coefficient, the Calinski–Harabasz index, and the Davies–Bouldin index (DBI) [37]. The Calinski–Harabasz index was described in Section 4. The silhouette coefficient combines cohesion and separation: cohesion measures how similar a point is to the other points in its own cluster, whereas separation measures how dissimilar it is to the points in other clusters. Specifically, the silhouette coefficient is calculated as follows:

$$\text{SC} = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{10}$$

where *a(i)*, the cohesion term, is the mean distance between sample *i* and all other points in the same cluster, and *b(i)*, the separation term, is the minimum over the other clusters of the mean distance between sample *i* and all points in that cluster. The terms *a(i)* and *b(i)* are defined as follows:

$$a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j) \tag{11}$$

$$b(i) = \min_{k \neq i} \frac{1}{|C_k|} \sum_{j \in C_k} d(i, j) \tag{12}$$

The silhouette coefficient lies in the range [−1, 1]. A value close to 1 indicates that the model is suitable, whereas a negative value indicates incorrect clustering; higher values imply better clustering. The Davies–Bouldin index measures the average similarity between each cluster and its most similar cluster, where similarity compares the distance between clusters with the size of the clusters themselves. For a given set of clusters *C* = {*c*<sub>1</sub>, *c*<sub>2</sub>, ..., *c<sub>k</sub>*}, the Davies–Bouldin index is defined as follows:

$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{s_i + s_j}{d_{ij}} \tag{13}$$

where *k* is the number of clusters, *s<sub>i</sub>* is the average distance between all objects in cluster *i* and the centroid of cluster *i*, and *d<sub>ij</sub>* is the distance between the *i*th and *j*th cluster centroids. A smaller Davies–Bouldin index implies that the clusters are better separated.
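All three indices are available in scikit-learn; a short sketch, assuming `X` and `labels` come from one of the fitted clusterings above:

```python
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

# Higher is better for SC and CH; lower is better for DBI.
print("SC :", silhouette_score(X, labels))
print("CH :", calinski_harabasz_score(X, labels))
print("DBI:", davies_bouldin_score(X, labels))
```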

We compared our proposed method with the original clustering algorithms applied without dimensionality reduction. Table 2 compares the three clustering results by means of the cluster validity indices. Method names that include 'Original' denote clustering on the daily load data without dimensionality reduction, and *N* denotes that the daily load data were rescaled to the range [0, 1] by min–max normalization. In general, normalizing the data before clustering removes scale differences between variables. Equation (14) presents the min–max normalization formula:

$$\mathbf{x}' = \frac{\mathbf{x} - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})} \tag{14}$$
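As a sketch, Equation (14) applied feature-wise with NumPy (the per-column axis choice is our assumption):

```python
import numpy as np

def min_max_normalize(X):
    """Equation (14): rescale each column of X to the range [0, 1]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)
```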

According to the evaluation indices, our wavelet-based preprocessing slightly improves clustering performance compared to the original methods; in particular, wavelet-based hierarchical clustering outperforms hierarchical clustering without dimensionality reduction. Among the three wavelet-based clustering algorithms, k-means and FCM performed similarly, although the silhouette coefficient and Davies–Bouldin index of FCM were better than those of k-means. For hierarchical clustering, the silhouette coefficient is the best of the three, but the other two indices are worse than those of k-means and FCM. In addition, the proposed method significantly reduces computation time through dimensionality reduction.


**Table 2.** Clustering evaluation comparison results of the proposed methods.

Note: SC is Silhouette Coefficient, CH is Calinski–Harabasz Index, and DBI is Davies–Bouldin index. The larger the SC and CH values, the better. Conversely, the smaller the DBI values, the better.

Based on this comparison, we adopted the wavelet-based fuzzy c-means method. Of the three clusters, the first, second, and third represent 66.54%, 26.84%, and 6.62% of the daily load curves, respectively. Figure 12 shows the load patterns of the three clusters together with their daily load curves; Clusters 1 and 3 represent the lowest and highest power consumption, respectively. In each panel, the bold red line is the representative load pattern, while the other curves are the daily power usage profiles in the cluster. Cluster 1 contains 6297 daily load curves with stable power consumption; the average power usage and average peak power were 0.187 kW and 0.438 kW, respectively. Cluster 2 contains 2540 daily load curves; the average power usage and average peak power were 0.517 kW and 1.056 kW, respectively. Cluster 3 comprises 627 daily load curves and is the group with the highest power usage and the highest variability; its average power usage and average peak power were 1.212 kW and 2.209 kW, respectively.
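Statistics such as these can be reproduced along the following lines. This sketch assumes `curves` is an array of raw daily load curves (here 96 quarter-hourly readings, an illustrative resolution) aligned with the hard labels `labels`; all names are ours.

```python
import numpy as np

# curves: (n_days, 96) raw daily load curves in kW;
# labels: hard cluster assignment (0, 1, 2) for each daily curve.
for c in range(3):
    cluster_curves = curves[labels == c]
    avg_power = cluster_curves.mean()             # average power usage (kW)
    avg_peak = cluster_curves.max(axis=1).mean()  # average daily peak (kW)
    share = 100 * len(cluster_curves) / len(curves)
    print(f"Cluster {c + 1}: {len(cluster_curves)} curves ({share:.2f}%), "
          f"avg {avg_power:.3f} kW, avg peak {avg_peak:.3f} kW")
```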

Figure 13 illustrates box and whisker plots of the average daily power usage in the three clusters. A boxplot presents the data distribution through a five-number summary: minimum, first quartile, median, third quartile, and maximum. There are a few outliers (data points in Figure 13) in Cluster 2, while in Cluster 3 many outliers fall beyond the upper whisker. As power usage increases from Cluster 1 to Cluster 3, the variation in power also increases: the standard deviation is 0.0829 kW for Cluster 1, 0.1329 kW for Cluster 2, and 0.3193 kW for Cluster 3.

Figure 14 shows the average load patterns of the three clusters across the four seasons. The three clusters exhibit similar seasonal usage characteristics: the average power usage reaches its valley and peak at roughly the same times in every season, around 4 a.m. and 8 p.m., respectively. Moreover, households generally use air conditioners to control the indoor temperature during the summer, so electricity usage is higher in that season. Winter consumption in all three clusters is lower than summer consumption, suggesting that most apartments have heating systems that are not captured in the electricity data. Across all four seasons, electricity demand is stable between 8 a.m. and 2 p.m. in Cluster 1 (a) and Cluster 2 (b). Cluster 3 consumes the most power, and its demand increases over time during that same period.

**Figure 12.** Daily load curves and load patterns of each cluster, (**a**) low load consumption group, (**b**) middle load consumption group, and (**c**) high load consumption and instability group.

**Figure 13.** Box and whisker plot of average daily power (kW) usage in three clusters.

**Figure 14.** Seasonal average load patterns in three clusters, (**a**) low load consumption group, (**b**) middle load consumption group, and (**c**) high load consumption and instability group in four seasons.
