### 2.2.3. Algorithm

In empirical analysis, the following steps are the backbone of the calculation (Maurizio et al. 2007).


### 2.2.4. Assessment of Clustering Methods

The relevance of different clustering techniques can be tested in multiple ways. The most common metrics follow a regression-based logic. In this framework, the total variance is decomposed into two components: the within-cluster and the between-cluster variance. The explanatory power of a given clustering can therefore be described as the share of the total variance that is not attributable to dispersion within the clusters:

$$\frac{\sum_{i=1}^{k} \sum_{j=1}^{N_i} \left(X_{i,j} - \overline{X}\right)^2 - \sum_{i=1}^{k} \sum_{j=1}^{N_i} \left(X_{i,j} - \overline{X}_i\right)^2}{\sum_{i=1}^{k} \sum_{j=1}^{N_i} \left(X_{i,j} - \overline{X}\right)^2},\tag{12}$$

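where $k$ is the number of clusters, $N_i$ is the size of cluster $i$, $X_{i,j}$ is the $j$-th observation in cluster $i$, and $\overline{X}$ and $\overline{X}_i$ stand for the overall and the cluster-wise average, respectively (Zhao 2012). The formula penalizes dispersion within clusters: the more compact the clusters, the smaller the second term of the numerator, so dense clusters give a value close to 1. Moreover, comparing the ratio across different numbers of clusters can highlight the optimal number of clusters. Because adding clusters tends to increase the ratio (it equals 0 for a single cluster and reaches 1 when every observation forms its own cluster), the usual approach is to look for the point beyond which additional clusters add little explanatory power.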
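As a minimal sketch, the ratio in Eq. (12) can be computed directly from a set of observations and their cluster labels. The function names below (`explained_variance_ratio` and its helpers) are hypothetical, and the code assumes that multivariate observations are compared with the squared Euclidean distance, which reduces to the squared differences of Eq. (12) for one-dimensional data.

```python
from __future__ import annotations

from typing import Sequence

Point = Sequence[float]


def _mean(points: Sequence[Point]) -> list[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumption for multivariate data)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def explained_variance_ratio(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Return (total SS - within-cluster SS) / total SS, following Eq. (12)."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    # Total sum of squares: deviations from the overall average X-bar.
    overall_mean = _mean(points)
    total_ss = sum(_sq_dist(p, overall_mean) for p in points)
    if total_ss == 0:
        raise ValueError("total sum of squares is zero; the ratio is undefined")

    # Group observations by label, so cluster i holds its N_i points.
    clusters: dict[int, list[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    # Within-cluster sum of squares: deviations from each cluster average X-bar_i.
    within_ss = 0.0
    for members in clusters.values():
        centre = _mean(members)
        within_ss += sum(_sq_dist(p, centre) for p in members)

    return (total_ss - within_ss) / total_ss


if __name__ == "__main__":
    data = [(1.0,), (2.0,), (10.0,), (11.0,)]
    print(explained_variance_ratio(data, [0, 0, 1, 1]))  # 81/82, about 0.988
    print(explained_variance_ratio(data, [0, 1, 0, 1]))  # 1/82, about 0.012
```

On this toy data, grouping the nearby values together yields a ratio close to 1, while the interleaved labelling yields a ratio close to 0, in line with the interpretation of Eq. (12). Evaluating the function for clusterings with different $k$ gives the sequence of ratios on which the choice of the number of clusters can be based.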
