*3.1. k-means*

k-means is a popular cluster-analysis method in data mining and is commonly used to cluster electricity demand. It is a simple and robust algorithm that aims to partition *N* observations into *K* clusters [15,27]. Given a dataset *X* = {*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>N</sub>*} (with *x<sub>i</sub>* ∈ ℝ<sup>*n*</sup>) and *K* clusters *C* = {*C*<sub>1</sub>, *C*<sub>2</sub>, ..., *C<sub>K</sub>*}, each *x<sub>i</sub>* ∈ *X* is assigned to exactly one cluster *C<sub>k</sub>* ∈ *C*, which is characterized by a cluster centroid μ<sub>*k*</sub>. The classical k-means clustering method proceeds as follows. First, the integer *K*, the number of clusters, is chosen. Then, the initial set of cluster centroids μ<sub>1</sub>, μ<sub>2</sub>, ..., μ<sub>*K*</sub> is selected randomly. Each data point *x<sub>i</sub>* ∈ *X* is then assigned to the closest centroid μ<sub>*k*</sub> by comparing its Euclidean distance to each of μ<sub>1</sub>, μ<sub>2</sub>, ..., μ<sub>*K*</sub>. This assignment rule is given by Equation (1):

$$\operatorname{cluster}(\mathbf{x}\_i) = \underset{k \in \{1, \ldots, K\}}{\operatorname{argmin}} \left\| \mathbf{x}\_i - \mu\_k \right\|^2 \tag{1}$$
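
The assignment rule of Equation (1) can be sketched in a few lines of dependency-free Python. This is only an illustration: the function names `squared_distance` and `assign_clusters` and the toy data are assumptions made here, cluster indices run from 0 rather than 1, and ties are broken in favor of the lowest index, which the text does not specify.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def assign_clusters(points, centroids):
    """Return, for each point, the index k of its nearest centroid (Equation (1))."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in points
    ]


# Hypothetical toy data: four 2-D points and K = 2 centroids.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centroids = [(0.0, 0.5), (5.5, 5.0)]
labels = assign_clusters(points, centroids)
print(labels)
```

Minimizing the squared distance selects the same centroid as minimizing the distance itself, so no square root is needed.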

The algorithm aims to minimize the within-cluster sum of squares, which for a fixed dataset is equivalent to maximizing the between-cluster sum of squares. The cost function *J* minimized by k-means is therefore expressed by Equation (2):

$$J = \frac{1}{N} \sum\_{k=1}^{K} \sum\_{\mathbf{x}\_i \in C\_k} \left\| \mathbf{x}\_i - \mu\_k \right\|^2 \tag{2}$$
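
As a small worked example, the cost in Equation (2) can be evaluated directly once every point has a cluster label. The function name `cost`, the 0-based labels, and the toy data are assumptions for illustration only.

```python
def cost(points, labels, centroids):
    """Mean squared distance of each point to its assigned centroid (Equation (2))."""
    total = 0.0
    for x, k in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(x, centroids[k]))
    return total / len(points)


# Hypothetical toy data: every point lies 0.5 away from its centroid.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = [0, 0, 1, 1]
centroids = [(0.0, 0.5), (5.5, 5.0)]
J = cost(points, labels, centroids)
print(J)
```

Here each of the four points contributes a squared distance of 0.25, so the mean is also 0.25.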

The cluster centroids are then updated by taking the mean of the data points assigned to each cluster *C<sub>k</sub>*, as given by Equation (3):

$$\mu\_k = \frac{1}{|C\_k|} \sum\_{\mathbf{x}\_i \in C\_k} \mathbf{x}\_i \tag{3}$$
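
The centroid update of Equation (3) might be sketched as follows. The function name `update_centroids` is hypothetical. Equation (3) is undefined for an empty cluster and the text does not say how to handle that case, so raising an error here is an assumption of this sketch.

```python
def update_centroids(points, labels, num_clusters):
    """Return the mean of the points assigned to each cluster (Equation (3))."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for x, k in zip(points, labels):
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += x[d]
    centroids = []
    for k in range(num_clusters):
        if counts[k] == 0:
            # Assumption: the text does not cover empty clusters, so fail loudly.
            raise ValueError(f"cluster {k} is empty; Equation (3) is undefined")
        centroids.append(tuple(s / counts[k] for s in sums[k]))
    return centroids


# Hypothetical toy data with 0-based cluster labels.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = [0, 0, 1, 1]
new_centroids = update_centroids(points, labels, num_clusters=2)
print(new_centroids)
```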

The assignment and update steps are repeated until the assignment of data points to clusters no longer changes, that is, until the cluster centroids stop changing.
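
Putting the steps together, the whole procedure can be sketched as one short loop. Several details are assumptions rather than anything the text prescribes: the function name `kmeans`, the `max_iter` safety cap, the fixed `seed`, drawing the initial centroids as *K* distinct data points, and leaving the centroid of an empty cluster unchanged.

```python
import random


def kmeans(points, num_clusters, max_iter=100, seed=0):
    """Alternate the assignment and update steps until assignments stop changing."""
    rng = random.Random(seed)
    # Assumption: initial centroids are K distinct data points drawn at random.
    centroids = rng.sample(points, num_clusters)
    labels = None
    for _ in range(max_iter):
        # Assignment step (Equation (1)): nearest centroid by squared distance.
        new_labels = [
            min(
                range(num_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
            )
            for x in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are unchanged too
        labels = new_labels
        # Update step (Equation (3)): each centroid becomes the mean of its members.
        for k in range(num_clusters):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels, centroids = kmeans(points, num_clusters=2)
print(labels, centroids)
```

Because the initial centroids are random, which group receives index 0 depends on the seed, and on less clearly separated data different seeds can converge to different partitions.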
