*3.4. Discussion*

The data mining technique presented in this article is cluster analysis. The Ward algorithm was selected as an example of the hierarchical approach. During the data preparation stage, it was necessary to unify the data aggregation time (the choice between the flicker indices Plt and Pst) and to standardize the parameter values. Cluster analysis was then conducted on the prepared data set, which contained both the PQ parameters and the active power level.
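The preparation and clustering steps described above can be illustrated with a minimal sketch. It assumes SciPy's `linkage` function as the Ward implementation and z-score scaling as the standardization; neither choice is confirmed by the article, and the helper names `standardize` and `ward_tree`, as well as the synthetic measurement matrix, are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def standardize(data):
    """Z-score each column so that all parameters share a common scale."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0.0] = 1.0  # constant columns map to zero instead of dividing by zero
    return (data - mean) / std


def ward_tree(data):
    """Return the SciPy linkage matrix built with Ward's method on standardized data."""
    return linkage(standardize(data), method="ward")


# Synthetic stand-in for the measurement matrix (assumption, not the article's data):
# rows are aggregation intervals, columns are PQ parameters and active power.
rng = np.random.default_rng(0)
measurements = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 4)),
    rng.normal(8.0, 1.0, size=(100, 4)),
])

tree = ward_tree(measurements)
print(tree.shape)  # one row per merge performed by the agglomeration
```

Standardizing before clustering matters here because Ward's method works on Euclidean distances, so a parameter with a large numeric range would otherwise dominate the result.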

As a result of the cluster analysis, a dendrogram was obtained; because of the large amount of input data, it was illegible in the initial stages of agglomeration. This is an unquestionable disadvantage of the hierarchical approach. It is worth noting, however, that the dendrogram provides a division of the data regardless of the final number of clusters. Additionally, the final number of clusters can be selected directly from the dendrogram using methods indicated in the literature, e.g., Aggarwal [79].
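Both points can be sketched in a few lines. The example below contracts the lower, unreadable part of the tree and then picks the number of clusters from the largest jump between consecutive merge heights. The largest-gap rule is only one common heuristic and is an assumption here, not necessarily the method taken from [79]; the data and the helper `clusters_at_largest_gap` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic data with two well-separated operating regimes (assumption).
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(0.0, 1.0, size=(150, 3)),
    rng.normal(8.0, 1.0, size=(150, 3)),
])
tree = linkage(data, method="ward")

# Contract the tree to 12 leaves so that only the final merges are drawn.
# no_plot=True returns the layout without requiring matplotlib.
summary = dendrogram(tree, truncate_mode="lastp", p=12, no_plot=True)


def clusters_at_largest_gap(tree):
    """Choose the cluster count by cutting inside the largest merge-height gap."""
    heights = tree[:, 2]
    gaps = np.diff(heights)
    # Cutting after merge i leaves len(heights) - i clusters.
    return len(heights) - int(np.argmax(gaps))


n_clusters = clusters_at_largest_gap(tree)
labels = fcluster(tree, t=n_clusters, criterion="maxclust")
print(n_clusters, np.bincount(labels)[1:])
```

Because the full tree is computed once, a different number of clusters can be read off later with another `fcluster` call, without repeating the agglomeration.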

Another important element of the article was to indicate the conditions that influenced the data division. On the basis of knowledge about the object, the conditions related to the operation of distributed generation, network reconfiguration, and maintenance breaks were known. However, the obtained classification revealed a further condition, relevant in terms of the PQ level, that had not been known beforehand. It is worth highlighting that the Ward algorithm is sensitive to the impact of distributed generation on the technical conditions of the electrical power network, which confirms that the research aim was specified correctly.

The next element of the article was the analysis of the parameters that have the greatest impact on the data classification. The results indicated the importance of the active power level, as well as of the harmonic level and flicker. The voltage variations, voltage level, and frequency level had a small impact on the classification.
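One way such a ranking could be produced is sketched below. It scores each standardized parameter by its one-way ANOVA F-statistic across the clusters, so that parameters whose values differ most between clusters rank highest. This scoring rule is an assumption made for illustration and is not necessarily the procedure used in the article; the parameter names and the synthetic data are hypothetical and do not reproduce the article's measurements.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import f_oneway

# Hypothetical parameter names and synthetic data with two operating regimes.
names = ["active_power", "thd_u", "flicker", "frequency"]
rng = np.random.default_rng(2)
n = 120
regime = np.repeat([0, 1], n // 2)
data = rng.normal(0.0, 1.0, size=(n, 4))
data[:, 0] += 10.0 * regime  # strongly regime-dependent
data[:, 1] += 5.0 * regime   # moderately regime-dependent
data[:, 2] += 2.5 * regime   # weakly regime-dependent
# Column 3 carries no regime information.

z = (data - data.mean(axis=0)) / data.std(axis=0)
labels = fcluster(linkage(z, method="ward"), t=2, criterion="maxclust")


def rank_parameters(z, labels, names):
    """Rank parameters by the one-way ANOVA F-statistic across clusters."""
    scores = {}
    for j, name in enumerate(names):
        groups = [z[labels == c, j] for c in np.unique(labels)]
        scores[name] = float(f_oneway(*groups).statistic)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_parameters(z, labels, names)
for name, score in ranking:
    print(f"{name:>12s}  F = {score:.1f}")
```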

Then, after the importance ranking was obtained, the clusters were compared in terms of the selected PQ parameters. The results showed the impact of DG on the EPN, which was found to be positive with regard to PQ. The previously unknown working condition was characterized as a period with high total harmonic distortion in voltage. Thus, analyzing only this selected period may help to reduce the harmonic pollution problem.

The last part of the research concerned the possibility of reducing the input database without losing the information obtained from the clustering. The authors proposed replacing the three phase-to-phase values with one mean value. The classification obtained from the reduced input database was then compared with that obtained from the complete one. The classifications were similar: for divisions into more than two groups, around 95% of the data were assigned to the same clusters for both input databases. The presented approach decreased the size of the input database by 57% (from fourteen to six parameters) without losing any data features.
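The reduction and the comparison of the two classifications can be sketched as follows. The column layout (four three-phase quantities followed by two single-valued ones) is an assumption chosen only because it matches the stated counts of fourteen and six; the article does not confirm it. Cluster numbering is arbitrary, so the hypothetical helper `label_agreement` first matches the clusters of the two runs one-to-one (using SciPy's `linear_sum_assignment`) and then reports the share of samples placed in matching clusters. The data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import linear_sum_assignment

N_THREE_PHASE = 4  # assumed number of three-phase quantities (4 * 3 = 12 columns)
N_SINGLE = 2       # assumed number of single-valued quantities


def reduce_phases(full):
    """Replace each block of three phase columns with its row-wise mean."""
    n_phase_cols = 3 * N_THREE_PHASE
    blocks = full[:, :n_phase_cols].reshape(len(full), N_THREE_PHASE, 3)
    return np.hstack([blocks.mean(axis=2), full[:, n_phase_cols:]])


def ward_labels(data, n_clusters):
    """Standardize, run Ward's method, and cut the tree into n_clusters groups."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    return fcluster(linkage(z, method="ward"), t=n_clusters, criterion="maxclust")


def label_agreement(a, b):
    """Share of samples in matching clusters after optimal one-to-one matching."""
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    table = np.zeros((ia.max() + 1, ib.max() + 1), dtype=int)
    np.add.at(table, (ia, ib), 1)
    rows, cols = linear_sum_assignment(-table)  # maximize the matched counts
    return table[rows, cols].sum() / len(a)


# Synthetic, nearly symmetrical data with three operating regimes (assumption).
rng = np.random.default_rng(3)
n = 300
centers = np.repeat([0.0, 6.0, 12.0], n // 3)
base = centers[:, None] + rng.normal(0.0, 0.5, size=(n, N_THREE_PHASE + N_SINGLE))
phases = np.repeat(base[:, :N_THREE_PHASE], 3, axis=1)
phases += rng.normal(0.0, 0.1, size=phases.shape)  # small per-phase differences
full = np.hstack([phases, base[:, N_THREE_PHASE:]])

reduced = reduce_phases(full)
agreement = label_agreement(ward_labels(full, 3), ward_labels(reduced, 3))
size_reduction = 1.0 - reduced.shape[1] / full.shape[1]
print(full.shape[1], reduced.shape[1], f"{size_reduction:.0%}", f"{agreement:.0%}")
```

The size reduction follows directly from the parameter counts: 1 - 6/14 ≈ 0.57.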

The object presented in this article is a symmetrical network; however, the method may also be applied successfully to highly asymmetrical grids. If any one of the phase-to-phase values changes, the mean value of the corresponding parameter also changes. CA is sensitive to such differences, so this situation would also be indicated. The only disadvantage of this method is that it gives no information on which phase caused the situation; thus, an analysis of the raw data, limited to the indicated period of time, is desirable.
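A small numerical illustration of this trade-off is given below. The voltage values and the helper `most_deviating_phase` are hypothetical. A disturbance on a single phase shifts the mean, so it remains visible to the clustering, but the affected phase can only be identified from the raw per-phase values.

```python
import numpy as np

# Hypothetical phase-to-phase voltages in volts (assumption, not measured data).
nominal = np.array([400.0, 400.0, 400.0])
disturbed = nominal.copy()
disturbed[1] -= 30.0  # disturbance on the second phase-to-phase value only

# The mean reacts to the single-phase change, so the reduced database still sees it.
mean_shift = disturbed.mean() - nominal.mean()


def most_deviating_phase(values):
    """Return the index of the phase value furthest from the three-phase mean."""
    values = np.asarray(values, dtype=float)
    return int(np.argmax(np.abs(values - values.mean())))


# Identifying the affected phase requires the raw per-phase values.
affected = most_deviating_phase(disturbed)
print(mean_shift, affected)
```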
