3. End While

End

Formally, let *k* and *n* be the number of clusters and the total number of observations, respectively. Let *μ<sub>c</sub>*, *Σ<sub>c</sub>*, and *π<sub>c</sub>* be the mean, covariance, and mixing probability of cluster *c*, 1 ≤ *c* ≤ *k*. For GMM, *μ<sub>c</sub>* is the center of cluster *c*, *Σ<sub>c</sub>* represents its width, and *π<sub>c</sub>* defines how large or small the Gaussian component will be.

Then the probability that *x<sub>i</sub>*, 1 ≤ *i* ≤ *n*, is in cluster *c* is given by:

$$\gamma\_i^{c} = \frac{\pi\_c \, \mathcal{N}(\mathbf{x}\_i \mid \boldsymbol{\mu}\_c, \boldsymbol{\Sigma}\_c)}{\sum\_{j=1}^{k} \pi\_j \, \mathcal{N}(\mathbf{x}\_i \mid \boldsymbol{\mu}\_j, \boldsymbol{\Sigma}\_j)} \tag{2}$$

where *N*(**x** | *μ*, Σ) denotes the multivariate Gaussian density.

*γ<sub>i</sub><sup>c</sup>* gives the probability that *x<sub>i</sub>* is in cluster *c*, normalized by the sum of the corresponding probabilities over all clusters 1 ≤ *c* ≤ *k*; consequently, if *x<sub>i</sub>* is very close to Gaussian *c*, *γ<sub>i</sub><sup>c</sup>* will be high, and it will take relatively low values in any other case.
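To make this first step concrete, below is a minimal NumPy/SciPy sketch of Eq. (2); the function name `e_step` and the array shapes are assumptions made for this illustration rather than part of the original formulation:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, mus, sigmas):
    """Compute the responsibilities gamma[i, c] of Eq. (2).

    X      : (n, d) array of observations
    pis    : (k,) mixing probabilities pi_c
    mus    : (k, d) cluster means mu_c
    sigmas : (k, d, d) cluster covariances Sigma_c
    """
    n, k = X.shape[0], len(pis)
    gamma = np.empty((n, k))
    for c in range(k):
        # Numerator of Eq. (2): pi_c * N(x_i | mu_c, Sigma_c) for every point
        gamma[:, c] = pis[c] * multivariate_normal.pdf(X, mean=mus[c], cov=sigmas[c])
    # Denominator of Eq. (2): normalize so each row sums to one over the k clusters
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
```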

As a second step, for each cluster *c*, the total weight *m<sub>c</sub>* is calculated (which can be interpreted as the effective number of points assigned to cluster *c*), and *π<sub>c</sub>*, *μ<sub>c</sub>*, and *Σ<sub>c</sub>* are updated using *γ<sub>i</sub><sup>c</sup>* with:

$$m\_c = \sum\_{i=1}^{n} \gamma\_i^{c} \tag{3}$$

$$
\pi\_c = \frac{m\_c}{n} \tag{4}
$$

$$
\mu\_c = \frac{1}{m\_c} \sum\_{i=1}^{n} \gamma\_i^{c} \mathbf{x}\_i \tag{5}
$$

$$\Sigma\_c = \frac{1}{m\_c} \sum\_{i=1}^{n} \gamma\_i^{c} (\mathbf{x}\_i - \boldsymbol{\mu}\_c)(\mathbf{x}\_i - \boldsymbol{\mu}\_c)^T \tag{6}$$
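Continuing the sketch above, the updates of Eqs. (3)–(6) can be written as a single M-step function; again, the function name and shapes are illustrative assumptions:

```python
def m_step(X, gamma):
    """Update pi_c, mu_c, and Sigma_c using Eqs. (3)-(6)."""
    n, d = X.shape
    k = gamma.shape[1]
    m = gamma.sum(axis=0)                     # Eq. (3): total weight m_c per cluster
    pis = m / n                               # Eq. (4): updated mixing probabilities
    mus = (gamma.T @ X) / m[:, None]          # Eq. (5): responsibility-weighted means
    sigmas = np.empty((k, d, d))
    for c in range(k):
        diff = X - mus[c]                     # deviations of every point from mu_c
        # Eq. (6): responsibility-weighted outer products, normalized by m_c
        sigmas[c] = (gamma[:, c, None] * diff).T @ diff / m[c]
    return pis, mus, sigmas
```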

Finally, the first and second steps are repeated until convergence is reached [12]. The result is that each cluster is associated not with a hard-edged sphere, but with a smooth Gaussian model. Although GMM is categorized as a clustering algorithm, it is technically a generative probabilistic model that describes the distribution of the data. This property brings two important limitations. The first concerns computational complexity: because the distributions must be estimated at every iteration, the algorithm can fail when the dimensionality of the problem is too high. The second is that in many instances the number of groups is unknown, and it may be necessary to experiment with different numbers of groups in order to find the most suitable one.
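Putting the two steps together, a deliberately naive EM loop might look like the sketch below; the random initialization, the tolerance, and the function name `fit_gmm` are assumptions made for this example:

```python
def fit_gmm(X, k, max_iter=100, tol=1e-6, seed=0):
    """Alternate E- and M-steps until the log-likelihood stops improving."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Naive initialization: k random points as means, identity covariances,
    # uniform mixing probabilities (library implementations do better)
    mus = X[rng.choice(n, size=k, replace=False)]
    sigmas = np.tile(np.eye(d), (k, 1, 1))
    pis = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(max_iter):
        gamma = e_step(X, pis, mus, sigmas)           # first step, Eq. (2)
        pis, mus, sigmas = m_step(X, gamma)           # second step, Eqs. (3)-(6)
        # Log-likelihood of the data under the current mixture
        ll = np.sum(np.log(sum(pis[c] * multivariate_normal.pdf(X, mus[c], sigmas[c])
                               for c in range(k))))
        if abs(ll - prev_ll) < tol:                   # convergence check
            break
        prev_ll = ll
    return pis, mus, sigmas
```

In practice, off-the-shelf implementations such as scikit-learn's `GaussianMixture` add the initialization (typically via k-means) and the numerical safeguards that this sketch omits.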
