*3.1. Cluster Analysis—Ward Algorithm*

In the literature, data mining is generally defined as the extraction of knowledge from large databases. The available algorithms and techniques are well known and widely described. Examples of data mining techniques are [68–72]:


*Energies* **2020**, *13*, 2407


One of the described techniques is cluster analysis, also known as clustering [73]. The main aim of cluster analysis is to obtain homogeneous groups (clusters) of data, as defined by Witten et al. and Wu et al. in [74,75]. The homogeneity of a group is defined by the level of similarity or dissimilarity of the data within the same cluster. Many similarity/dissimilarity conditions can be selected. However, with respect to the grouping process, two basic dividing methods are known:


In this article, the hierarchical method is presented. Hierarchical approaches are either agglomerative or divisive techniques; this article presents the agglomerative approach. Agglomerative techniques start from a set of observations in which each observation is initially treated as a separate cluster. The data are then aggregated into progressively fewer clusters until a single cluster is established that represents all the data [73]. The possible methods for connecting data into clusters are [73,76]:


The hierarchical method is selected because the agglomerative sequence can be presented on a dendrogram. It is therefore possible to analyze whether the final classification is better achieved by connecting single observations or groups of similar data (obtained in a previous agglomeration step). The authors selected the Ward algorithm due to its features: at each step, it connects the data concentrated around a common average value, merging clusters until the data within each cluster have a similar value (range). The hierarchical cluster analysis algorithm using the Ward method of minimal variance is presented in Figure 1.
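The agglomerative loop described above (each observation starts as its own cluster, and the pair whose merge least increases the within-cluster variance is joined at each step) can be sketched in pure Python. The function names and 1-D sample data are illustrative assumptions, not the authors' implementation:

```python
# Sketch of agglomerative clustering with Ward's minimum-variance
# criterion, assuming 1-D measurements (illustrative only).

def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a, b merge:
    n_a * n_b / (n_a + n_b) * (mean_a - mean_b)^2 for 1-D data."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    return na * nb / (na + nb) * (ma - mb) ** 2

def ward_agglomerate(values):
    """Merge singletons until one cluster remains; return the merge history
    (the sequence a dendrogram would display)."""
    clusters = [[v] for v in values]   # each observation starts alone
    history = []
    while len(clusters) > 1:
        # pick the pair whose merge least increases the total variance
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        history.append((sorted(clusters[i]), sorted(clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history

history = ward_agglomerate([1.0, 1.1, 5.0, 5.2, 9.0])
# the earliest merges join the closest, lowest-variance pairs
```

Reading the history from first to last merge reproduces the bottom-up order in which a dendrogram is drawn.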

In this paper, the hierarchical Ward method and a non-hierarchical method based on the K-means algorithm are proposed for the power quality data analysis. The indicated step, "finding the pair of clusters with the smallest sum of squared distances between each object and the center of the cluster to which it belongs", is calculated as presented in Equation (1) [77].

$$D_{pr} = \frac{n_p + n_r}{n_p + n_q + n_r}\, d_{pr} + \frac{n_q + n_r}{n_p + n_q + n_r}\, d_{qr} - \frac{n_r}{n_p + n_q + n_r}\, d_{pq} \tag{1}$$

where:

Dpr—distance of the new cluster (formed by merging clusters "p" and "q") to cluster number "r",

r—index of any remaining cluster other than "p" and "q",

dpr—distance of primary cluster "p" from cluster "r",

dqr—distance of primary cluster "q" from cluster "r",

dpq—mutual distance of primary clusters "p" and "q",

np, nq, nr—number of single objects in clusters "p", "q", and "r", respectively.
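Equation (1) can be transcribed term by term; the function name and the numeric values below are illustrative assumptions:

```python
# Equation (1): distance of the merged cluster (p with q) to cluster r.
# Variable names mirror the symbols defined above.

def ward_update(d_pr, d_qr, d_pq, n_p, n_q, n_r):
    """Lance-Williams update for Ward's method, as in Equation (1)."""
    total = n_p + n_q + n_r
    return (
        (n_p + n_r) / total * d_pr    # first term
        + (n_q + n_r) / total * d_qr  # second term
        - n_r / total * d_pq          # third term (negative coefficient)
    )

# merging two singleton clusters p and q, each at distance 16.0 from r
# and 0.5 from each other (values chosen only to exercise the formula)
D_pr = ward_update(d_pr=16.0, d_qr=16.0, d_pq=0.5, n_p=1, n_q=1, n_r=1)
```

Only the mutual distance dpq enters with a negative coefficient, which is what keeps the updated distance consistent with the minimum-variance criterion.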

**Figure 1.** Cluster analysis using the Ward method of minimum variance [73,78].
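The non-hierarchical K-means method proposed alongside the Ward procedure alternates two steps: assign each observation to its nearest center, then move each center to the mean of its group, minimizing the within-cluster sum of squared distances. A minimal sketch, assuming 1-D data and fixed initial centers for illustration (not the authors' implementation):

```python
# Sketch of the K-means (Lloyd) alternation for 1-D data.

def kmeans(values, centers, iterations=20):
    """Alternate assignment and center-update steps to reduce the
    within-cluster sum of squared distances."""
    centers = list(centers)
    for _ in range(iterations):
        # assignment step: each value goes to its nearest center
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)),
                          key=lambda k: (v - centers[k]) ** 2)
            groups[nearest].append(v)
        # update step: each center moves to the mean of its group
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

centers, groups = kmeans([1.0, 1.2, 4.8, 5.0, 5.2], centers=[0.0, 6.0])
```

Unlike the agglomerative Ward method, the number of clusters here is fixed in advance by the number of initial centers.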

Additionally, the advantage of the Ward algorithm is that it can be stopped at any moment, so it can deliver a classification with the expected number of clusters. Thus, the final number of clusters should be selected in accordance with the aim of the classification. Many approaches supporting the choice of the final number of clusters have been described in the literature. The best known are [79]:
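One widely used criterion of this kind (an illustrative example only; reference [79] lists the specific approaches) tracks the total within-cluster sum of squares as the cluster count changes and looks for the "elbow" where further splits stop paying off. A sketch, assuming 1-D data and hand-made partitions:

```python
# Within-cluster sum of squares as a function of the cluster count
# (illustrative data and partitions, not from the cited study).

def within_ss(clusters):
    """Total within-cluster sum of squared deviations from cluster means."""
    total = 0.0
    for c in clusters:
        m = sum(c) / len(c)
        total += sum((v - m) ** 2 for v in c)
    return total

# the same observations partitioned into 1, 2, and 3 clusters
partitions = {
    1: [[1.0, 1.2, 5.0, 5.2, 9.0]],
    2: [[1.0, 1.2], [5.0, 5.2, 9.0]],
    3: [[1.0, 1.2], [5.0, 5.2], [9.0]],
}
curve = {k: within_ss(p) for k, p in partitions.items()}
# the drop from k=1 to k=2 is large; from k=2 to k=3 much smaller,
# suggesting the elbow where extra clusters stop improving the fit
```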

