*2.2. Methodology*

Clustering can be used to compare data sets that are described by several features [84]. A taxonomic analysis was used to identify groups of countries that are similar in terms of the capacity of the electrical infrastructure and share in the GDP [85]. The groups were obtained with Ward's method, a hierarchical, agglomerative clustering method [86] that is regarded as one of the most effective at producing homogeneous clusters.

In this method, each observation vector is initially treated as a separate cluster. The squared Euclidean distance (SED), Equation (3), is then computed between all pairs of observation vectors of the form given in Equations (1) and (2). The results are collected in the distance matrix of Equation (4), which describes how similar the observations are.

$$a = [a_1, \dots, a_p] \tag{1}$$

$$b = [b_1, \dots, b_p] \tag{2}$$

where *a* and *b* are observation vectors.

$$d(a,b) = \sum_{i=1}^{p} \left(a_i - b_i\right)^2 \tag{3}$$

where *p* denotes the number of variables (vector length).

$$D = \begin{bmatrix} 0 & d_{12} & \cdots & d_{1n} \\ d_{21} & 0 & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \cdots & 0 \end{bmatrix} \tag{4}$$

where $d_{ij}$ is the distance between the *i*th and the *j*th observation and *n* is the number of observations.
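As an illustration of Equations (3) and (4), the following minimal sketch builds the matrix of pairwise squared Euclidean distances in plain Python. The function names and the three sample observations are hypothetical and are not taken from the study data.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors of equal length p."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length p")
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def distance_matrix(observations):
    """Symmetric n x n matrix of pairwise distances with a zero diagonal."""
    n = len(observations)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The distance is symmetric, so each pair is computed once.
            d[i][j] = d[j][i] = squared_euclidean(observations[i], observations[j])
    return d


# Hypothetical data: n = 3 observations described by p = 2 variables.
observations = [[1.0, 2.0], [4.0, 6.0], [1.0, 3.0]]
D = distance_matrix(observations)
```

Only the upper triangle is computed explicitly. The lower triangle follows from symmetry, and the diagonal stays at zero because every observation is at distance zero from itself.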

The distance matrix above is defined in physical space and is analogous to the topological distance matrices used for network structures [90,91]. Clusters (groups) are created by applying one of several available grouping methods to the distance matrix [92–94]. In Ward's method, the distance between clusters is estimated using an analysis of variance approach, and each cluster is assumed to be represented by its centroid, as shown in Figure 3.

**Figure 3.** Centroids of clusters.

At each stage of the agglomerative hierarchical grouping process, the two most similar clusters, e.g., A and B (Figure 4), are combined into a new cluster. These are the two clusters whose merger produces the smallest increase in the sum of squared errors (SSE):

$$d(A, B) = SSE_{A \cup B} - (SSE_A + SSE_B) \tag{5}$$

$$SSE_{A \cup B} = \sum_{i=1}^{n_{AB}} (y_i - \overline{y}_{AB})'(y_i - \overline{y}_{AB}) \tag{6}$$

$$SSE_{A} = \sum_{i=1}^{n_{A}} (a_i - \overline{a})'(a_i - \overline{a}) \tag{7}$$

$$SSE_{B} = \sum_{i=1}^{n_{B}} (b_i - \overline{b})'(b_i - \overline{b}) \tag{8}$$

where $a_i$ represents the *i*th observation vector in cluster A, $\overline{a}$ is the centroid of cluster A, $b_i$ represents the *i*th observation vector in cluster B, $\overline{b}$ is the centroid of cluster B, $y_i$ represents the *i*th observation vector in cluster AB, $\overline{y}_{AB}$ is the centroid of the newly formed cluster AB, and $n_A$, $n_B$, and $n_{AB}$ are the numbers of observations in the respective clusters.

**Figure 4.** Hierarchical clustering.

The function minimized in Ward's minimum variance method can also be written as:

$$d(A,B) = \frac{n_A n_B}{n_A + n_B} (\overline{a} - \overline{b})'(\overline{a} - \overline{b}) \tag{9}$$

where $\overline{a}$ and $\overline{b}$ represent the centroids of clusters A and B, respectively.
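The agreement between Equation (5) and Equation (9) can be checked numerically. The sketch below computes the increase in SSE in both ways for two small clusters. The helper names and the sample clusters are hypothetical and serve only as an illustration.

```python
def centroid(cluster):
    """Component-wise mean of the observation vectors in a cluster."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def sse(cluster):
    """Sum of squared deviations of the cluster members from their centroid."""
    c = centroid(cluster)
    return sum((x - m) ** 2 for obs in cluster for x, m in zip(obs, c))


def ward_increase_sse(A, B):
    """Increase in SSE caused by merging A and B, as in Equation (5)."""
    return sse(A + B) - (sse(A) + sse(B))


def ward_increase_centroid(A, B):
    """The same quantity computed from the centroids only, as in Equation (9)."""
    a_bar, b_bar = centroid(A), centroid(B)
    n_a, n_b = len(A), len(B)
    sq_dist = sum((x - y) ** 2 for x, y in zip(a_bar, b_bar))
    return n_a * n_b / (n_a + n_b) * sq_dist


# Hypothetical clusters in two dimensions.
A = [[0.0, 0.0], [2.0, 0.0]]
B = [[4.0, 3.0], [4.0, 5.0], [7.0, 4.0]]
via_sse = ward_increase_sse(A, B)
via_centroids = ward_increase_centroid(A, B)
```

For these clusters both expressions evaluate to 38.4, up to floating-point rounding. The centroid form is cheaper to evaluate because it does not revisit the individual observations.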

The process of determining the distance between clusters and joining them ends when all clusters are combined into one large cluster, e.g., ABCDEF in Figure 4.
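The agglomeration procedure described above can be sketched in plain Python as follows. This is a naive illustration that recomputes every pairwise cluster distance from Equation (9) at each step. It is not the implementation used in the study, and the function names and sample observations are hypothetical.

```python
def centroid(cluster):
    """Component-wise mean of the observation vectors in a cluster."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def ward_distance(A, B):
    """Increase in SSE when clusters A and B are merged (Equation (9))."""
    a_bar, b_bar = centroid(A), centroid(B)
    n_a, n_b = len(A), len(B)
    sq_dist = sum((x - y) ** 2 for x, y in zip(a_bar, b_bar))
    return n_a * n_b / (n_a + n_b) * sq_dist


def ward_clustering(observations):
    """Merge clusters until one remains; return the list of merges.

    Each merge is a tuple (indices_of_first, indices_of_second, distance).
    """
    # Start with every observation in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(observations))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                A = [observations[k] for k in clusters[i]]
                B = [observations[k] for k in clusters[j]]
                d = ward_distance(A, B)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical data: five observations described by two variables.
observations = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 7.0], [20.0, 20.0]]
merges = ward_clustering(observations)
```

The returned list records the order in which clusters were joined and the corresponding increase in SSE, which is the information a dendrogram such as the one in Figure 4 displays.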

Variables analyzed with Ward's method should have a coefficient of variation greater than 10% and should not be very strongly correlated with one another. However, variables that do not meet these criteria may be retained in the analysis if they are significant from the point of view of the studied phenomenon.
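A screening step of this kind could look like the sketch below. The text does not state which standard deviation estimator was used or what counts as "very strongly correlated", so the sample standard deviation and the threshold of 0.9 on the absolute Pearson correlation are assumptions made here for illustration. The variable names and values are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean, in percent."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / abs(mean) * 100.0


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither variable is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def screen_variables(columns, min_cv=10.0, max_abs_r=0.9):
    """Flag variables with low variation and pairs with strong correlation.

    min_cv follows the 10% criterion in the text; max_abs_r is an assumed
    threshold, not a value given in the text.
    """
    low_cv = [name for name, values in columns.items()
              if coefficient_of_variation(values) <= min_cv]
    names = list(columns)
    correlated = [(p, q)
                  for i, p in enumerate(names)
                  for q in names[i + 1:]
                  if abs(pearson(columns[p], columns[q])) > max_abs_r]
    return low_cv, correlated


# Hypothetical variables measured on four objects.
columns = {
    "x1": [10.0, 20.0, 30.0, 40.0],
    "x2": [100.0, 101.0, 99.0, 100.0],
    "x3": [11.0, 19.0, 31.0, 39.0],
}
low_cv, correlated = screen_variables(columns)
```

The function only flags candidates. Whether a flagged variable is dropped or kept remains a substantive decision, as noted above.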
