**3. Methods**

A wide group of methods used in multivariate comparative analysis consists of taxonomic methods, also commonly known as "cluster analysis" [61,62]. They deal with the rules and procedures for classifying various types of objects. Taxonomic methods have applications in a variety of fields. For example, in economics and finance, they are used to group countries based on sets of development indicators or to assess the level of regional development [63,64]. In the most general division, taxonomic methods are split into hierarchical methods (agglomerative and divisive) and grouping by the k-means method, in which objects are assigned to k clusters and the number of clusters is determined by the researcher [65]. Among the agglomerative algorithms, Ward's method is widely used [66–68].

The research method applied in this paper is cluster analysis, a method used in multivariate comparative analyses that breaks down a large group of objects into relatively homogeneous groups called clusters. In general, cluster analysis classifies n objects, each of which is described by k statistical features. The similarity or dissimilarity of objects is taken into account, and on this basis mutually exclusive groups of objects (clusters) are distinguished. The objects assigned to each cluster are similar to each other in terms of the values of all k variables.

The article uses Ward's method [69], because it is the most frequently used method in economic research [61,70,71]. Ward's method minimizes the within-group sum of squares. In the first stage of grouping, each object forms its own cluster. In the next steps, these clusters are merged into higher-level clusters based on the selected distance measure. In the last step, all statistical objects are combined into one cluster [61].
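The agglomeration steps described above can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes squared Euclidean distances between cluster centroids, uses a naive search over all cluster pairs, and runs on hypothetical toy data. The names `centroid`, `ward_cost`, and `ward_agglomerate` are introduced here for illustration only.

```python
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def ward_cost(a, b):
    """Increase in the within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_agglomerate(points):
    """Return the merge history as (members_a, members_b, cost) tuples."""
    clusters = [[i] for i in range(len(points))]  # each object starts alone
    history = []

    def cost(pair):
        a, b = pair
        return ward_cost([points[k] for k in clusters[a]],
                         [points[k] for k in clusters[b]])

    while len(clusters) > 1:
        # Merge the pair whose union raises the within-cluster sum of squares least.
        i, j = min(combinations(range(len(clusters)), 2), key=cost)
        history.append((clusters[i], clusters[j], cost((i, j))))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return history

# Hypothetical toy data: four objects described by two features.
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
history = ward_agglomerate(toy)
for left, right, c in history:
    print(left, right, round(c, 3))
```

On this toy data the two tight pairs are merged first at a low cost, and the final merge, which joins all objects into one cluster, is by far the most expensive. A jump of this kind in the merge cost is what suggests where to cut the hierarchy.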

The use of cluster analysis allowed us to separate homogeneous subsets of the population of objects, here the new member states, based on variables describing the examined countries, i.e., the values of energy produced from RES that are commonly used in Eurostat analyses. The main idea behind cluster analysis is to group objects (countries) in such a way that the objects included in the same group are highly similar to one another and, at the same time, differ as much as possible from objects in other groups. To do so, the Euclidean distance was used as a measure of distance, which is given by:

$$d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}, \tag{1}$$

where *x* = (*x*<sub>1</sub>, ..., *x*<sub>*p*</sub>) and *y* = (*y*<sub>1</sub>, ..., *y*<sub>*p*</sub>), and in this case *p* = 8, which is the number of variables that characterise a country. The greater the distance between two countries, the more they differ. As a result, a cluster includes countries that are close to each other and far away from the countries that form other clusters.
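As a worked illustration of Equation (1), the following sketch computes the Euclidean distance between two profiles with *p* = 8 variables and builds a small distance matrix. The numeric profiles and the names `euclidean` and `distance_matrix` are hypothetical and serve only as an example; they are not data from this study.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles, as in Equation (1)."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of variables")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def distance_matrix(profiles):
    """Symmetric matrix of pairwise Euclidean distances."""
    return [[euclidean(a, b) for b in profiles] for a in profiles]

# Hypothetical profiles with p = 8 variables each.
country_a = (1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0)
country_b = (0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0)

d_ab = euclidean(country_a, country_b)  # sqrt(1 + 4 + 4)
matrix = distance_matrix([country_a, country_b])
print(d_ab)
```

The matrix is symmetric with zeros on the diagonal, so only the upper triangle needs to be inspected in practice.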

Before the determination of distance matrices, the variables were standardized using the formula:

$$z_i = \frac{x_i - \overline{x}}{s_x}, \tag{2}$$

where *x̄* and *s*<sub>*x*</sub> are the sample mean and the sample standard deviation of the variable.
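Equation (2) can be sketched as follows for a single variable. The sketch assumes the sample standard deviation with the *n* − 1 denominator (Python's `statistics.stdev`); the text does not state which denominator was used, so this is an assumption. The input values and the name `standardize` are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores of one variable, as in Equation (2).

    Assumes the sample standard deviation (n - 1 denominator).
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - m) / s for v in values]

# Hypothetical raw values of one variable across four countries.
raw = [2.0, 4.0, 6.0, 8.0]
z = standardize(raw)
print([round(v, 4) for v in z])
```

After standardization each variable has zero mean and unit standard deviation, so no single variable dominates the Euclidean distance merely because of its measurement scale.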

The agglomerative hierarchical clustering algorithm was applied in the first step of the analysis. The agglomeration method was Ward's method, which minimizes the within-cluster sum of squares. This step produced a dendrogram, a diagram that illustrates the agglomeration pattern, and suggested the number of clusters to which the countries are to be assigned. In the second step of the analysis, k-means non-hierarchical clustering was used. The optimal number of clusters was determined with the use of the Silhouette index [72–74]:

$$S(u) = \frac{1}{n} \sum_{i=1}^{n} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}, \quad S(u) \in [-1, 1], \tag{3}$$

where *u* is the number of classes, *n* is the number of objects (countries), *a*(*i*) is the mean distance between the country with index *i* and the other countries belonging to its own class, and *b*(*i*) is the smallest mean distance between the country with index *i* and the countries belonging to any other class. The criterion based on the Silhouette index selects the number of classes *u* for which the index *S*(*u*) takes the maximum value.
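A minimal sketch of Equation (3) is given below. It assumes Euclidean distances and the common convention that an object in a single-element class contributes zero to the sum. The toy points, the two labelings, and the name `silhouette_index` are hypothetical and are used only to show how the index separates a good partition from a poor one.

```python
import math

def silhouette_index(points, labels):
    """Mean silhouette width over all objects, as in Equation (3)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        # Distances from object i to every other object, grouped by class label.
        by_class = {}
        for j in range(n):
            if j != i:
                by_class.setdefault(labels[j], []).append(math.dist(points[i], points[j]))
        own = by_class.pop(labels[i], None)
        if own is None or not by_class:
            continue  # singleton class or only one class: contribution is 0
        a = sum(own) / len(own)                               # own-class mean distance
        b = min(sum(d) / len(d) for d in by_class.values())   # nearest other class
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

# Hypothetical toy data: two well-separated pairs of objects.
toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = silhouette_index(toy, [0, 0, 1, 1])  # labels follow the visible pairs
poor = silhouette_index(toy, [0, 1, 1, 1])  # one object placed in the wrong class
print(good > poor)
```

In practice the index would be evaluated for each candidate number of classes *u*, and the partition with the highest value would be retained.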

The results obtained with Ward's method are most often presented in the form of a dendrogram. At the top of the dendrogram, all objects form one shared cluster. Moving to lower levels, successive clusters with fewer objects are distinguished, until at the lowest level each object forms a separate cluster [71,75,76].
