#### *3.1. Data*

The research sample was compiled on the basis of data from the Czech Statistical Office, which reports that there were 175,894 enterprises in the manufacturing industry in the Czech Republic in 2005 [59].

From February 2018 to May 2019, approximately 2500 randomly selected enterprises were approached through their managers. This number was chosen to ensure that the 95% confidence requirement was met at a 5% margin of error, assuming a 15% questionnaire return rate. To achieve equal representation among these 2500 enterprises, we used stratified random sampling based on the size of the enterprises (about 50% micro and small enterprises; 50% medium-sized enterprises) and their technological intensity (about 50% high-tech and 50% low-tech enterprises). The composition of the sample corresponds to this distribution (Table 1).
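The text does not state which formula was used to arrive at roughly 2500 enterprises. As a minimal sketch, assuming Cochran's sample-size formula with a finite population correction and maximum variability (p = 0.5, an assumption not taken from the source), the stated requirements lead to a figure of similar magnitude. The function name `required_sample_size` and its parameters are illustrative only.

```python
def required_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction.

    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the most
    conservative assumption about the proportion being estimated.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2     # size for an infinite population
    return n0 / (1 + (n0 - 1) / population)       # finite population correction


population = 175_894       # manufacturing enterprises reported in the text
expected_return = 0.15     # anticipated questionnaire return rate

n_required = required_sample_size(population)
n_to_contact = n_required / expected_return       # enterprises to approach

print(f"completed questionnaires needed: {n_required:.1f}")
print(f"enterprises to contact:          {n_to_contact:.1f}")
```

Under these assumptions, a little under 400 completed questionnaires are needed, which at a 15% return rate translates into slightly more than 2500 enterprises to contact.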


**Table 1.** Characteristics of research sample for cluster analysis.

The actual rate of return of the questionnaires was 12.5%, and 314 questionnaires were completed (18 questionnaires were further excluded on the basis of two criteria: one year of existence on the market and completeness of the survey). A total of 90 questionnaires were filled in by large enterprises; these were not included in the cluster analysis for the purposes of this paper, but they were used for comparison with the small- and medium-sized enterprises.

The sample of small- and medium-sized enterprises therefore comprises a total of 186 enterprises. The questionnaire focused on the main characteristics of Industry 4.0, and its questions were defined in cooperation with the managers in the framework of quality research. The main areas of the questionnaire were 13 variables characterizing the various Industry 4.0 technologies used by enterprises. Table 1 summarizes the characteristics of the SMEs in terms of technological demands. The enterprises were classified by size according to their number of employees, as defined by the methodology of the European Commission [60].

#### *3.2. Industry 4.0 Index Methodology*

An Industry 4.0 index (VPi4) was used to determine the level of implementation of Industry 4.0 in the small- and medium-sized enterprises. The VPi4 index was developed on the basis of a survey that asked managers of enterprises whether they use different Industry 4.0 technologies and processes. The index is based on an exploratory factor analysis that identified 3 factors consisting of 13 variables (technologies), described in Reference [58]. These factors and variables were chosen to classify small- and medium-sized enterprises into groups according to their level of implementation of Industry 4.0.

The answers from the questionnaire survey of the sample of small- and medium-sized enterprises were transformed into VPi4 variables, using the factor scores as weights, and the levels of the VPi4 index were then calculated. The connection between the VPi4 and the questionnaire is described in a previous study [58], which also explains the procedure for creating the VPi4. The index consists of three successive levels (the factor scores of the variables are reported in brackets):


As factor analysis yields factors that do not correlate with each other, the three levels of VPi4 were appropriate for the cluster analysis. Principal Component Analysis (PCA) was performed to check the independence of the levels, which is also part of the results of the paper. In general, the aim of PCA is to reduce high-dimensional data to a few so-called "principal" dimensions, to reveal structure in the data, and so to facilitate their interpretation (for more, see Reference [61]). Consequently, the VPi4 levels are used for the division and categorization of the SMEs through cluster analysis. Specifically, the factor scores of the enterprises, standardized by *z*-score, are used. The standardization process consists of transforming the variables such that they have a mean of zero and a standard deviation of one.
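The *z*-score standardization described above can be sketched as follows. The input values are hypothetical and are not data from the study. The sample standard deviation (divisor *n* − 1) is assumed here, which is the convention of R's `scale()`; the text does not specify which divisor was used.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean zero and (sample) standard deviation one."""
    m = mean(values)
    s = stdev(values)          # sample SD with divisor n - 1 (assumption)
    return [(v - m) / s for v in values]


# Hypothetical factor scores of five enterprises on one VPi4 level.
factor_scores = [0.8, -1.2, 0.3, 1.9, -0.6]
standardized = z_scores(factor_scores)

print([round(z, 3) for z in standardized])
```

After the transformation, all three VPi4 levels are on the same scale, so no single level dominates the Euclidean distances used in the clustering.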

#### *3.3. K-Means Clustering*

Cluster analysis employs a measure of similarity or dissimilarity for assigning points in space to a cluster [62]. K-means clustering is one of the most popular unsupervised clustering techniques for partitioning a given dataset into a set of *k* groups (clusters). It is an iterative optimization method based on an initial division of the objects into *k* clusters. The division is determined by *k* centroids that form the centers of the clusters. The Euclidean distance of each object to each centroid is computed, and the object is assigned to the nearest centroid. A new centroid is then calculated for each cluster as an *m*-dimensional vector of the average values of each variable. The cycle of calculating distances and assigning objects to clusters is then repeated. The process continues as long as objects keep moving between clusters [63].
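The iteration described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the R implementation used in the study; the function `kmeans`, its random initialization from the data points, and the sample data are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Basic k-means: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # initial centroids: k of the objects
    assignment = None
    for _ in range(max_iter):
        # Assign each object to the nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:   # no object moved: converged
            break
        assignment = new_assignment
        # Recompute each centroid as the mean vector of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                    # keep the old centroid if empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids


# Hypothetical standardized scores on three levels for six enterprises.
points = [
    (0.0, 0.0, 0.0), (0.1, 0.2, 0.0), (0.2, 0.0, 0.1),
    (5.0, 5.0, 5.0), (5.1, 4.9, 5.0), (4.8, 5.2, 5.1),
]
assignment, centroids = kmeans(points, k=2)
print(assignment)
```

Because the result depends on the initial centroids, practical implementations usually repeat the procedure from several random starts and keep the solution with the smallest total within-cluster variation.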

The purpose of the cluster analysis was to discover a system for categorizing and grouping the enterprises based on the correlation obtained by evaluating the factor variables. Enterprises with a high positive correlation were grouped together and separated from those with a negative correlation. The cluster analysis calculations were performed in R, using the following packages: ggplot2 [64], factoextra [65], cluster [66], rgl [67], scatterplot3d [68], gridExtra [69], and dplyr [70].

The k-means algorithm can be summarized as follows [71]:

1. Specify the number of clusters (*k*) based on the Elbow method (graphical form) and the Average Silhouette method (Equation (1)). The silhouette coefficient (SC) compares the average distance from a data point, *x* ∈ *Ck*, to the points of the best-fitting neighboring cluster with its average distance to the other points of *Ck*, in order to assess the appropriateness of the cluster system [62]:

$$SC = \frac{1}{N} \sum_{i=1}^{N} \frac{b(\mathbf{x}_i) - a(\mathbf{x}_i)}{\max\{a(\mathbf{x}_i), b(\mathbf{x}_i)\}}, \tag{1}$$

where *xi* is a data object belonging to the cluster *Ck*, *N* is the number of data objects, *a(x)* is cohesion (the average distance of *x* to all other vectors in the same cluster), and *b(x)* is separation (the average distance of *x* to the vectors in another cluster, taking the minimum among the other clusters). A value of +1 indicates a perfect clustering choice, and a value below 0 indicates a bad clustering choice.


$$W(C_k) = \sum_{\mathbf{x}_i \in C_k} (\mathbf{x}_i - \mu_k)^2, \tag{2}$$

where *xi* is a data object belonging to the cluster *Ck*, and μ*k* is the mean value of the objects assigned to the cluster *Ck*.


$$\sum_{j=1}^{k} W(C_j) = \sum_{j=1}^{k} \sum_{\mathbf{x}_i \in C_j} (\mathbf{x}_i - \mu_j)^2. \tag{3}$$

The smaller the value of W(*Ck*), the better the clustering (*Ck*, *xi*). Although finding an optimal pair (*Ck*, *xi*) is a computationally intensive task, finding either the optimal S (the partition, with the centroids held fixed) or the optimal c (the centroids, with the partition held fixed) is fairly easy.

6. Cluster validation consists of measuring the goodness of the clustering results, using one-way Analysis of Variance (ANOVA). The ANOVA F-test evaluates whether there are any differences between the group means of the clusters. Further, Tukey's "Honest Significant Difference" method is applied for multiple pairwise comparisons of the cluster means in the analysis of variance (for more, see Reference [61]).
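The quantities in Equations (1)–(3) can be computed directly from a given partition. The following sketch uses only the Python standard library and hypothetical data; the function names are illustrative, and a silhouette value of 0 for single-member clusters is a common convention assumed here rather than something stated in the text.

```python
import math


def within_ss(cluster):
    """W(C_k), Equation (2): squared distances of the members to their mean."""
    mu = [sum(c) / len(cluster) for c in zip(*cluster)]
    return sum(math.dist(x, mu) ** 2 for x in cluster)


def total_within_ss(clusters):
    """Equation (3): sum of W(C_k) over all clusters."""
    return sum(within_ss(c) for c in clusters)


def silhouette_coefficient(clusters):
    """Equation (1): mean of (b - a) / max(a, b) over all objects."""
    scores = []
    for k, cluster in enumerate(clusters):
        for i, x in enumerate(cluster):
            if len(cluster) == 1:
                scores.append(0.0)         # convention for singleton clusters
                continue
            # a: average distance to the other members of the same cluster.
            a = sum(math.dist(x, y) for j, y in enumerate(cluster) if j != i)
            a /= len(cluster) - 1
            # b: smallest average distance to the members of another cluster.
            b = min(
                sum(math.dist(x, y) for y in other) / len(other)
                for m, other in enumerate(clusters) if m != k
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical partition of four objects into two clusters.
clusters = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(10.0, 0.0, 0.0), (10.0, 1.0, 0.0)],
]
wss = total_within_ss(clusters)            # 1.0 for this partition
sc = silhouette_coefficient(clusters)      # approximately 0.90
print(wss, round(sc, 3))
```

For the Elbow method, the total within-cluster sum of squares is plotted against *k* and the bend in the curve is sought; for the Average Silhouette method, the *k* with the highest SC is preferred.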

#### *3.4. Statistical Analysis and Hypotheses*

The research results were also tested by a statistical analysis. The aim of this analysis was to compare the results of the Industry 4.0 VPi4 index of the small- and medium-sized enterprises with those of the large enterprises. A comparison was performed at three VPi4 levels, including the overall index:

H1<sub>0</sub>: The VPi4 index of small- and medium-sized enterprises and the VPi4 index of large enterprises are identical populations.

H1<sub>A</sub>: The VPi4 index of small- and medium-sized enterprises and the VPi4 index of large enterprises are different populations.

Furthermore, the dependence between the subjective perception of the Industry 4.0 level and the VPi4 index was investigated by using the Pearson and Spearman correlation coefficients. The index was expected to correlate, to some extent, with the subjective perception of the situation in the enterprise. The working hypotheses, verified at the 5% significance level, are as follows:

H2<sub>0</sub>: There is no dependency between the perception of Industry 4.0 in enterprises and the VPi4 index of small- and medium-sized enterprises.

H2<sub>A</sub>: There is dependency between the perception of Industry 4.0 in enterprises and the VPi4 index of small- and medium-sized enterprises.
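The two correlation coefficients named for the second pair of hypotheses can be sketched as follows. This is an illustration in plain Python with hypothetical data, not the Statistica or R procedure used in the study; it computes the coefficients only and omits the significance test. The Spearman coefficient is obtained here as the Pearson coefficient of the ranks, with tied values receiving their average rank.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # mean of positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: perceived Industry 4.0 level (1-5) and VPi4 index.
perception = [1, 2, 2, 3, 4, 5]
vpi4 = [0.10, 0.25, 0.20, 0.40, 0.90, 0.70]

r = pearson(perception, vpi4)
rho = spearman(perception, vpi4)
print(round(r, 3), round(rho, 3))
```

The Spearman coefficient is the more natural choice when the perception is recorded on an ordinal scale, since it depends only on the ordering of the values.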

The statistical tests were evaluated using Statistica 12 and R.
