3.3.4. Reduction of the Input Database Size—Case Study

The natural question is whether the input database can be reduced or restructured without losing the most important information. The first idea is simply to exclude some parameters. However, the proposed complete database includes all the important classical PQ parameters. Thus, excluding any of them would not be adequate from a technical point of view.

In this research, the objects are described by phase-to-phase values that are similar to one another. Therefore, each set of three phase-to-phase values was replaced with a single "new-multiphase" value. This value can be derived in different ways: the minimum, maximum, mean, or median of the three phase-to-phase values may be selected. In this research, the authors chose the mean value. Thus, for each 10-min record of:


the mean of the three phase-to-phase values was calculated.
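The averaging step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each 10-min record is stored as a dictionary mapping a parameter name to its three phase-to-phase values, and the record layout, the function names `reduce_record` and `reduce_database`, and the pass-through of any single-valued parameters are hypothetical. The alternative aggregations mentioned above (minimum, maximum, median) are included for comparison.

```python
from statistics import mean, median

# Aggregations listed in the text; the authors selected the mean.
AGGREGATORS = {"min": min, "max": max, "mean": mean, "median": median}


def reduce_record(record, how="mean"):
    """Collapse each triple of phase-to-phase values into one value.

    record: dict mapping a parameter name either to a sequence of its three
            phase-to-phase values (hypothetical layout) or to a single value.
    how:    key of AGGREGATORS selecting the aggregation.
    """
    agg = AGGREGATORS[how]
    reduced = {}
    for name, value in record.items():
        if isinstance(value, (list, tuple)):
            if len(value) != 3:
                raise ValueError(f"{name}: expected three phase-to-phase values")
            reduced[name] = agg(value)
        else:
            # Assumed single-valued parameter: copied unchanged.
            reduced[name] = value
    return reduced


def reduce_database(records, how="mean"):
    """Apply reduce_record to every 10-min record of the database."""
    return [reduce_record(r, how) for r in records]
```

Choosing the mean keeps information from all three phases, whereas the minimum or maximum would retain only the extreme phase in each record.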

After this reduction, from 14 input parameters (complete database) to six input parameters (reduced database) for each measurement point, clustering was conducted again. Table 3 compares the clustering results obtained with the six-parameter database to those obtained with the 14-parameter database. For more than two clusters, the reduction gives positive results: both databases indicate the same working conditions, and the classification is identical for at least 94.9% of the data. The only unsatisfactory result was obtained for two clusters: when divided into two clusters, the averaged data were not sensitive to the DG impact.


**Table 3.** Comparison of clustering results for the complete and reduced databases.

\* No impact of DG is observable; only the maintenance is noticeable.
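One way to quantify the agreement between two clusterings, such as the percentages compared in Table 3, is sketched below. Because cluster numbers are arbitrary, the clusters of one result are first mapped one-to-one onto those of the other so that the number of identically classified objects is maximized. The function name `agreement` and the brute-force matching are assumptions for illustration; the text does not state how the authors matched the clusters.

```python
from collections import Counter
from itertools import permutations


def agreement(labels_a, labels_b):
    """Share of objects classified identically by two clusterings.

    The clusters of labels_b are mapped one-to-one onto those of labels_a
    so that the number of matching objects is maximal. The search is brute
    force over all mappings, which is cheap for up to about six clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    if not labels_a:
        raise ValueError("no objects to compare")
    # (cluster in a, cluster in b) -> number of objects
    counts = Counter(zip(labels_a, labels_b))
    clusters_a = sorted(set(labels_a))
    clusters_b = sorted(set(labels_b))
    # Pad with None so that clusters of b may stay unmatched if b has more clusters.
    candidates = clusters_a + [None] * max(0, len(clusters_b) - len(clusters_a))
    best = 0
    for mapping in permutations(candidates, len(clusters_b)):
        matched = sum(counts[(a, b)] for a, b in zip(mapping, clusters_b) if a is not None)
        best = max(best, matched)
    return best / len(labels_a)
```

Here `agreement(complete_labels, reduced_labels)` returns a fraction in [0, 1]. For larger numbers of clusters the permutation search grows factorially, and `scipy.optimize.linear_sum_assignment` with `maximize=True` solves the same matching problem in polynomial time.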

Additionally, the predictor importance for six clusters was determined. Figure 11 presents the importance rate for both classifications: (a) reduced input database and (b) complete input database. Considering the 0.7 importance rate level (a noticeable importance rate), generally the same parameters were indicated in both cases:


The only parameter indicated in one classification but not in the other is the short-term flicker severity for transformer T2; however, its importance rate is close to 0.7.
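The threshold comparison above can be sketched as set operations on the importance rates. This minimal example assumes that both importance profiles are dictionaries keyed by the same parameter names; the function names and the `margin` used to flag rates close to 0.7 are hypothetical choices, not part of the original analysis.

```python
THRESHOLD = 0.7  # "noticeable importance rate" level used in the text


def important(importance, threshold=THRESHOLD):
    """Return the parameter names whose importance rate reaches the threshold."""
    return {name for name, rate in importance.items() if rate >= threshold}


def compare_importance(reduced, complete, threshold=THRESHOLD, margin=0.05):
    """Compare two importance profiles keyed by the same parameter names.

    Returns the parameters flagged in both classifications, those flagged in
    only one of them, and, among the latter, those whose rate in the other
    classification lies less than `margin` below the threshold.
    """
    flagged_r = important(reduced, threshold)
    flagged_c = important(complete, threshold)
    only_one = flagged_r ^ flagged_c
    # For a name flagged in only one profile, the smaller rate is the one below
    # the threshold; a name missing from a profile counts as rate 0.0.
    near = {
        name for name in only_one
        if threshold - margin <= min(reduced.get(name, 0.0), complete.get(name, 0.0)) < threshold
    }
    return {
        "common": flagged_r & flagged_c,
        "only_reduced": flagged_r - flagged_c,
        "only_complete": flagged_c - flagged_r,
        "near_threshold": near,
    }
```

A parameter that appears in `near_threshold` corresponds to the situation described above: it is indicated by only one classification, yet its rate in the other one is close to 0.7.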

In summary, the database was reduced from 14 to six parameters, and the clustering results obtained are generally similar to those for the complete database.

**Figure 11.** Importance rate for six clusters for (**a**) reduced input database; (**b**) complete input database.
