
Formally, let us consider that *n* observations must be partitioned into *k* groups. Let $\mathbf{x}_i$ be the *i*-th observation, 1 ≤ *i* ≤ *n*, let $C_c$ be the set of observations assigned to group *c*, and let $\mu_c$ be the mean of group *c*, 1 ≤ *c* ≤ *k*. The goal of *k*-means is to minimize the sum of the squared error over all groups, denoted by *J*(*C*); thus, the objective function is stated as:

$$J(C) = \sum_{c=1}^{k} \sum_{\mathbf{x}_i \in C_c} \|\mathbf{x}_i - \mu_c\|_2^2. \tag{1}$$

Minimizing this objective function is an NP-hard problem, even for *k* = 2 [8]. Therefore, *k*-means is a greedy algorithm: it builds up a solution by choosing the best option at every step, so it can only be expected to converge to a local minimum. *k*-means starts with an initial partition into *k* groups and reassigns observations to groups so as to reduce the squared error. Since the squared error tends to decrease as the number of groups *k* increases (with *J*(*C*) = 0 when *k* = *n*), it is minimized for a fixed number of groups [9].
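To make the greedy iteration concrete, the following is a minimal NumPy sketch of the standard Lloyd iteration (the function name `kmeans` and its defaults are illustrative choices): each pass alternates an assignment step and an update step, and each step can only lower *J*(*C*).

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Greedy refinement of an initial partition into k groups.

    X: (n, d) array of observations. Returns the group means and
    the group label assigned to each observation.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the k means with k observations chosen at random.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: attach each observation to its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each mean from its current members.
        new_means = means.copy()
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # keep the old mean if a group empties out
                new_means[c] = members.mean(axis=0)
        if np.allclose(new_means, means):
            break  # converged to a local minimum of J(C)
        means = new_means
    return means, labels
```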

A *k*-means algorithm requires some user-specified parameters, such as the number of clusters: typically, *k*-means is run independently for different values of *k*, and the partition that appears most meaningful to the human expert is selected. Moreover, different initializations can lead to different final clusters because *k*-means only converges to a local minimum. Another user-specified parameter is the distance metric. The most common metric for computing the distance between points and cluster centers is the Euclidean distance, which limits *k*-means to linear cluster boundaries; other metrics, such as the Mahalanobis distance, have been used to detect hyperellipsoidal clusters [10]. Finally, *k*-means has a limitation regarding the number of observations per group, because it implicitly assumes that each group has roughly the same cardinality. A concrete illustration of selecting *k* is sketched below.
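As one way to carry out this selection, the sketch below (assuming scikit-learn is available; `make_blobs` merely stands in for a real dataset) runs *k*-means for several values of *k*, with ten random initializations each, and prints the squared error *J*(*C*). Since *J*(*C*) decreases monotonically in *k*, the expert looks for an elbow in the curve rather than the smallest value.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data: 500 points drawn from 4 Gaussian blobs.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(2, 8):
    # n_init=10 restarts k-means from 10 random initializations
    # and keeps the partition with the lowest squared error.
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)  # inertia_ is the within-cluster squared error J(C)
```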

Since *k*-means offers no assurance of reaching the globally best solution, it is typically run from multiple starting guesses and the best resulting partition is kept, as in the sketch above. Nevertheless, *k*-means remains widely used because it is broadly easy and fast to code and implement.

## *2.2. Gaussian Mixture Model*

A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. A GMM can be seen as a generalization of *k*-means that incorporates information about the covariance structure of the data, as well as the centers of the latent Gaussians [11]. A GMM attempts to find the mixture of multi-dimensional Gaussian probability distributions that best models the input dataset. The standard fitting procedure is expectation-maximization (EM).
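A minimal sketch of this EM iteration, assuming NumPy and SciPy (the function name `gmm_em`, the initialization scheme, and the regularization constant are illustrative choices), is:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(X, k, n_iters=100, seed=0):
    """Fit a k-component Gaussian mixture by expectation-maximization."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Initialize: uniform mixture weights, random means, identity covariances.
    weights = np.full(k, 1.0 / k)
    means = X[rng.choice(n, size=k, replace=False)]
    covs = np.array([np.eye(d) for _ in range(k)])
    for _ in range(n_iters):
        # E-step: responsibility of each component for each observation.
        resp = np.column_stack([
            weights[c] * multivariate_normal.pdf(X, means[c], covs[c])
            for c in range(k)
        ])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances.
        nk = resp.sum(axis=0)                      # effective group sizes
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for c in range(k):
            diff = X - means[c]
            covs[c] = (resp[:, c, None] * diff).T @ diff / nk[c]
            covs[c] += 1e-6 * np.eye(d)            # regularize for stability
    return weights, means, covs
```

Unlike *k*-means, which makes a hard assignment of each observation to one group, the E-step assigns each observation a soft responsibility under every component, and the M-step re-estimates full covariances rather than only centers.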
