**3. Methods**

Data Envelopment Analysis (DEA) is a linear programming technique for studying the relative efficiency of decision-making units (DMUs). The development of the method was initiated by the publication of Charnes, Cooper, and Rhodes (1978) [145], which built on the earlier work of Farrell (1957) [146] and his concept of the 'best practice frontier' determined by the most efficient units in the analysed set. Since its development, DEA has become one of the most popular nonparametric benchmarking methods for measuring efficiency. The constantly expanding bibliography of the DEA method confirms its usefulness in analysing the efficiency of facilities of any complexity from almost all sectors of the economy.

The DEA method considers efficiency as the ability to produce maximum outputs at minimum cost. Inputs and outputs must be clearly specified for each unit *j* in the set *j* = {1, . . . , *j*<sub>0</sub>, . . . , *n*} as vectors of measurable attributes: *x<sub>j</sub>* = (*x*<sub>1*j*</sub>, . . . , *x<sub>mj</sub>*), *i* = 1, . . . , *m*, and *y<sub>j</sub>* = (*y*<sub>1*j*</sub>, . . . , *y<sub>sj</sub>*), *r* = 1, . . . , *s*. In this work, the variable returns to scale super-efficiency DEA (SE-BCC) model by Andersen and Petersen [147] was employed:

$$\begin{array}{c} \max \phi\_{j\_0} \\ \sum\_{j=1,\, j \ne j\_0}^{n} \lambda\_{j} x\_{ij} \le x\_{ij\_0}, \ i = 1, \dots, m \\ \sum\_{j=1,\, j \ne j\_0}^{n} \lambda\_{j} y\_{rj} \ge \phi y\_{rj\_0}, \ r = 1, \dots, s \\ \sum\_{j=1,\, j \ne j\_0}^{n} \lambda\_{j} = 1, \ \lambda\_{j} \ge 0 \end{array} \tag{1}$$

The efficiency score *φ* of unit *j*<sub>0</sub> is determined by finding a weighting vector *λ* = (*λ*<sub>1</sub>, . . . , *λ<sub>n</sub>*) that solves the linear programming problem. Decision-making units that achieve max *φ* ≥ 1 (max *φ* ≥ 100%) are efficient.
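Model (1) can be solved as a standard linear program, one unit at a time. The sketch below is a minimal illustration, assuming `scipy` is available; the single-input, single-output data for the four hypothetical units A–D are invented for demonstration and are not taken from the study:

```python
import numpy as np
from scipy.optimize import linprog

def super_efficiency_bcc(X, Y, j0):
    """Output-oriented VRS (BCC) super-efficiency score of unit j0.

    X: (m, n) array of inputs, Y: (s, n) array of outputs.
    Unit j0 is excluded from the reference set, as in model (1).
    Returns phi, or None when the excluded problem is infeasible
    (which can happen in super-efficiency models).
    """
    m, n = X.shape
    s = Y.shape[0]
    peers = [j for j in range(n) if j != j0]
    # Decision vector: [phi, lambda_1, ..., lambda_{n-1}]
    c = np.zeros(n)
    c[0] = -1.0                      # maximise phi -> minimise -phi
    A_ub, b_ub = [], []
    for i in range(m):               # sum_j lambda_j x_ij <= x_{i,j0}
        A_ub.append(np.concatenate(([0.0], X[i, peers])))
        b_ub.append(X[i, j0])
    for r in range(s):               # sum_j lambda_j y_rj >= phi * y_{r,j0}
        A_ub.append(np.concatenate(([Y[r, j0]], -Y[r, peers])))
        b_ub.append(0.0)
    A_eq = [np.concatenate(([0.0], np.ones(n - 1)))]   # VRS: sum lambda = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0.0, None)] * (n - 1))
    return res.x[0] if res.success else None

# Hypothetical units A, B, C, D with one input and one output
X = np.array([[2.0, 4.0, 6.0, 4.0]])
Y = np.array([[2.0, 5.0, 6.0, 3.0]])
print(round(super_efficiency_bcc(X, Y, 3), 3))  # D (inefficient): 1.667
print(round(super_efficiency_bcc(X, Y, 1), 3))  # B (efficient, excluded): 0.8
```

In this output-oriented form, the inefficient unit D needs its output expanded by the factor 5/3, while the frontier unit B, once excluded, obtains a score different from 1, which is what allows efficient units to be ranked among themselves.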

DEA not only determines efficiency but also indicates benchmarks: units whose linear combinations of input and output vectors form the pattern to follow. Moreover, in order to divide units into groups, the concept of technological competitors can be used. The term technology in the DEA method is used in the sense of the vectors of empirical inputs *x<sub>j</sub>* and outputs *y<sub>j</sub>*. Technological competitors in DEA should not be viewed as rivals for resources or outcomes, but rather as rivals for a position in the ranking. They may be identified by solving the DEA model formulated with the exclusion of efficient objects [148]. The idea is presented in Figure 1. In the standard DEA model, the frontier is formed by units A, B, and C, which are considered fully (100%) efficient. In the SE-DEA model, to assess the efficiency of B, for example, this unit is excluded from the constraints; the frontier then consists of A and C, so B can achieve an efficiency higher than 100%. Its competitors are A and C. The efficient units B and C are the benchmarks for E, but after their exclusion, D and F become the technological competitors. The concept of technological competitors thus allows grouping objects on the basis of similarity rather than a common target.

The main drawback of DEA is that its ability to discriminate between efficient and inefficient units decreases as the number of attributes grows. As a rule of thumb, the number of attributes should be three to five times smaller than the number of units [149]. Determining the inputs and outputs is therefore one of the most difficult and challenging stages of an efficiency analysis with DEA. The choice of attributes strongly affects the results, yet there are no formal rules that clearly define what should serve as inputs and outputs in DEA models. Their selection depends on the specificity of the decision-making units and their goals, data availability, and the researchers' intuition, experience, and subjective choices. Previous works suggest establishing the list of inputs and outputs by removing variables whose exclusion causes the least change in the efficiency scores; removing variables strongly correlated with those left in the model (those that do not significantly affect the information measured by conditional variances and partial correlations); or combining DEA with principal component analysis and replacing the original variables with principal components. Another approach is the Rough Sets concept of reducts to limit the number of attributes [150]. In this paper, factor analysis is applied, because the correlation coefficients between the variables are not very strong, whereas the principal components take negative values, which cannot be directly included in DEA.
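One of the selection strategies mentioned above, removing variables that are strongly correlated with those already retained, can be sketched as follows. The greedy rule, the 0.9 threshold, and the synthetic data are illustrative assumptions, not the procedure used in this study:

```python
import numpy as np

def drop_correlated(data, names, threshold=0.9):
    """Greedily keep variables in order, dropping any whose absolute
    Pearson correlation with an already-kept variable exceeds the
    threshold. data: (n_units, n_vars); the 0.9 cut-off is arbitrary."""
    corr = np.abs(np.corrcoef(data, rowvar=False))
    kept = []
    for j in range(data.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = a + 0.01 * rng.normal(size=100)   # nearly a duplicate of a
c = rng.normal(size=100)              # independent variable
data = np.column_stack([a, b, c])
print(drop_correlated(data, ["a", "b", "c"]))  # ['a', 'c'] -- b is dropped
```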

**Figure 1.** Frontiers in the standard and super-efficiency DEA models and the concept of technological competitors.

Factor analysis is a method for studying the structure of multivariate observations and identifying relationships between variables. By assuming that certain groups of variables represent the variability of latent factors, a large number of variables can be reduced to a smaller set. The factor analysis model for the standardised observed variable *y<sub>r</sub>* (*r* = 1, . . . , *s*), where *F<sub>k</sub>* (*k* = 1, . . . , *K*) denotes the factors, *a<sub>rk</sub>* the factor loadings, and *ε<sub>r</sub>* the unique factors, can be written as follows:

$$y\_r = a\_{r1}F\_1 + a\_{r2}F\_2 + \dots + a\_{rK}F\_K + \varepsilon\_r \tag{2}$$

In order to obtain the simplest interpretation of individual factors, the factor loadings matrix can be rotated. It is assumed that the variance of *yr* is the sum of common and unique variance:

$$Var(y\_r) = h\_r^2 + d\_r^2, \text{ where } h\_r^2 = a\_{r1}^2 + a\_{r2}^2 + \dots + a\_{rK}^2 \tag{3}$$
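As a sketch of Equations (2) and (3), the snippet below fits a two-factor model with varimax rotation to synthetic data and computes the communalities *h<sub>r</sub>*<sup>2</sup> from the estimated loadings. The six indicators, the loading pattern, and the noise level are assumptions for illustration only:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data: six indicators driven by two latent factors
rng = np.random.default_rng(1)
n = 500
F = rng.normal(size=(n, 2))                      # latent factors F_k
A_true = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                   [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
Y = F @ A_true.T + 0.3 * rng.normal(size=(n, 6))  # Equation (2) + noise

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(Y)
A = fa.components_.T                 # estimated loadings a_rk
h2 = (A ** 2).sum(axis=1)            # communalities h_r^2, Equation (3)
print(np.round(h2, 2))               # one communality per indicator
```

Note that for Equation (2) to hold exactly as stated, the observed variables should first be standardised; the sketch above omits that step for brevity.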

Cluster analysis was also used to discover the state of transition to a CE. The aim of cluster analysis is to classify objects into groups (which are not defined a priori) based on the density of, or distance between, objects. There are several types of clustering techniques; the K-means model using Euclidean distances was employed in this research. Mathematically, denoting *K* as the number of clusters, *n* as the number of objects, *y<sub>j</sub>* as the values of unit *j*, and *μ<sub>k</sub>* as the centroid of cluster *k*, the objective function is:

$$\min \sum\_{k=1}^{K} \sum\_{j \in C\_k} \left\| y\_j - \mu\_k \right\|^2 \tag{4}$$

where *C<sub>k</sub>* denotes the set of units assigned to cluster *k*.
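A minimal illustration of objective (4) with hypothetical unit scores (the eight two-dimensional feature vectors below, standing in for, e.g., an efficiency score and a factor score, are invented for demonstration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical scores for n = 8 units described by two features
y = np.array([[0.2, 0.1], [0.3, 0.2], [0.25, 0.15],   # low group
              [1.6, 1.5], [1.7, 1.4], [1.5, 1.6],     # high group
              [0.9, 0.8], [1.0, 0.9]])                # middle group
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(y)
print(km.labels_)            # cluster assignment of each unit
print(round(km.inertia_, 3)) # value of objective (4) at the solution
```

The `inertia_` attribute is exactly the minimised sum of squared Euclidean distances of each unit to its cluster centroid, i.e., the objective in Equation (4).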

Following the choice of the research methods and the set of CE-related indicators, a step-by-step research procedure was established. Figure 2 presents a flow chart of the conducted study.

**Figure 2.** Research process.
