**3. Results of Power Quality Assessment Using Cluster Analysis and Global Power Quality Indices**

The idea of the combined analysis using CA and GPQIs is presented in Figure 1. In the first step, clustering is applied to classify the power quality data into clusters representing different features. The outcomes of the CA depend on how the PQ database is constructed, that is, on the set of PQ parameters under consideration, as well as on the standardization formula. These issues and their impact on the CA results were already investigated and presented in [9]. A novelty of this work is the application of GPQIs to the groups of PQ data identified by the CA. We propose using the GPQI levels that characterize particular clusters for a comparative analysis.

As already mentioned, some results of the cluster analysis were described in [9]. However, selected information about the investigated electrical power network is repeated here for clarity and to help in understanding the presented application of the global power quality indices. The input PQ data that form the database come from four weeks of multipoint power quality measurements in a 6 kV power network supplying the mining industry [51]. The measurement points include the secondary sides of the 110 kV/6 kV transformers (denoted as "T1", "T2", and "T3") and a 6 kV outgoing feeder supplying a welding machine (denoted as "WM") [9]. Distributed generation units (denoted as "DG") are installed inside the network; they are combined heat and power plants (CHP) with gas-steam turbines, denoted as "G1", "G2", and "G3", respectively. The analyzed EPN of the mining industry and the placement of the measurement points are presented in Figure 2.

**Figure 1.** General scheme of the algorithm that performs the cluster analysis and global power quality (PQ) calculation.

The proposed method was applied to real measurements collected at four measurement points: the three transformers T1, T2, and T3, which supplied the medium voltage (MV) industrial network, and a significant load (i.e., the welding machine WM). The changes in the power demand at measurement points T1, T2, T3, and WM during the four selected weeks of observation are presented in Figure 3a. Since the investigation aimed to evaluate the influence of the DGs installed inside the observed industrial network, Figure 3b presents the changes in active power generation of the particular DG units G1, G2, and G3. Generator G1 was permanently switched off during the experiment, so only G2 (connected to transformer T3, which also supplies the welding machine WM) and G3 (connected to transformer T2) were operating. As can be seen in Figure 3b, G2 and G3 were also switched off for a period due to a planned maintenance break. The power variations of the DGs serve only as additional information describing the operating conditions; the DG data are not part of the measurement database used in the investigation.

An analysis of voltage events in the PQ measurements was also conducted. The identified events were voltage dips, rapid voltage changes, swells, and interruptions. Detailed information about the events and the number of flagged data is given in Table 1 [52]. In accordance with the flagging concept introduced in the standard [7], the aggregated 10-min data containing such voltage events were excluded from the power quality analysis.
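
The exclusion of flagged 10-min data can be illustrated with a short sketch. It assumes that each detected event is available as a (start, end) pair in seconds and that each aggregated record is identified by the start time of its 10-min interval; the `Record` type and the helper names are hypothetical and belong neither to [7] nor to the recorders used in this study. An interval is flagged whenever any event overlaps it, which is a simplified reading of the flagging concept rather than a complete implementation of the standard.

```python
from dataclasses import dataclass, field

INTERVAL_S = 600  # length of one 10-min aggregation interval, in seconds


@dataclass
class Record:
    """Hypothetical container for one 10-min aggregated PQ record."""
    t_start: float  # interval start, seconds from the beginning of the measurement
    values: dict = field(default_factory=dict)  # aggregated PQ parameters


def is_flagged(record, events):
    """True if any event (start, end) overlaps the record's 10-min window."""
    t0, t1 = record.t_start, record.t_start + INTERVAL_S
    return any(start < t1 and end > t0 for start, end in events)


def split_flagged(records, events):
    """Separate records into (unflagged, flagged); only unflagged ones enter the CA."""
    unflagged, flagged = [], []
    for record in records:
        (flagged if is_flagged(record, events) else unflagged).append(record)
    return unflagged, flagged


# Example: four consecutive 10-min records and one 200-ms voltage dip at t = 650 s
records = [Record(t_start=i * INTERVAL_S, values={"U": 6.1}) for i in range(4)]
events = [(650.0, 650.2)]
unflagged, flagged = split_flagged(records, events)
print(len(unflagged), [r.t_start for r in flagged])  # 3 [600]
```
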
The research presented in [9] showed that the CA identified the different PQ conditions caused by the impact of the DGs best for the PQ databases denoted as C and CS. Database C consists of frequency variation (*f*), voltage variation (*U*), short-term flicker severity (*P*st), asymmetry (*ku*2), total harmonic distortion in voltage (*THDu*), and active power level (*P*). Database CS is the standardized version of database C, obtained by dividing each time series by its maximum value so that the data are expressed in the range 0–1. Therefore, database C and its standardized version CS were considered in the investigation presented in this paper.
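
The max-based standardization that turns database C into CS amounts to dividing each parameter's time series by its own maximum. A minimal sketch, assuming the database is held as a mapping from parameter name to a list of 10-min values (this layout and the sample numbers are hypothetical):

```python
def standardize_by_max(series):
    """Divide each value of a time series by the series maximum."""
    peak = max(series)
    if peak <= 0:
        raise ValueError("max-standardization needs a positive maximum")
    return [x / peak for x in series]


def standardize_database(db):
    """Build the standardized database (CS) from database C, column by column."""
    return {name: standardize_by_max(values) for name, values in db.items()}


# Hypothetical excerpt of database C: three 10-min records per parameter
database_c = {
    "f": [49.98, 50.01, 50.02],
    "U": [6.10, 6.25, 6.05],
    "Pst": [0.20, 0.40, 0.80],
    "ku2": [0.30, 0.45, 0.60],
    "THDu": [1.5, 2.0, 3.0],
    "P": [4.0, 8.0, 2.0],
}
database_cs = standardize_database(database_c)
print(database_cs["P"])  # [0.5, 1.0, 0.25]
```

For non-negative series such as these, every standardized value lies in the range 0–1 and the maximum of each standardized series equals 1.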

**Figure 2.** Analyzed mining industry electrical power network (EPN) and placement of distributed generation and PQ recorders [9]. T1—transformer 1; T2—transformer 2; T3—transformer 3; T4—transformer 4; G1—generator 1; G2—generator 2; G3—generator 3; WM—welding machine.

**Figure 3.** Analysis of the active power level in the electrical power network of the mining industry: (**a**) Active power at the observed measurement points of the investigated network, including the high voltage/medium voltage (HV/MV) transformers T1, T2, T3 and the connection point of the welding machine WM; (**b**) Active power of the distributed generators (DGs) during the investigated observation period (G2 is connected to transformer T3, G3 is connected to transformer T2).


**Table 1.** Indication of events and number of flagged data during the measurements.

#### *3.1. Cluster Analysis: Identification of the Power Quality Data Representing Different PQ Conditions Due to the Impact of DG*

In [9], clustering results were presented for different numbers of clusters (2, 3, 5, and 20). It was shown that increasing the number of clusters enabled not only the identification of data related to the impact of the DG (i.e., whether the DG was active or switched off), but also the extraction of data associated with other working conditions (i.e., working or non-working days, and periods of network reconfiguration). This article focuses on the influence of distributed generation on power quality in the industrial network. Therefore, building on the results presented in [9], the scope of the CA in this work was limited to classifying the data into three groups: cluster 1, DG active; cluster 2, DG switched off; cluster 3, other conditions. Based on the experience described in [9], the K-means algorithm with Euclidean distance was used.
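
For readers who want to experiment with the classification step, a minimal sketch of K-means with Euclidean distance (Lloyd's algorithm) is given below. It is not the implementation used in [9]: the deterministic farthest-point initialization is our own choice for reproducibility, and the two-feature sample records are hypothetical. In practice, each point would be one standardized record of database CS, and a library implementation such as scikit-learn's `KMeans` would typically be used.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest (Euclidean) from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [tuple(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with Euclidean distance; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical standardized records (two features) forming three groups
points = [
    (0.10, 0.10), (0.12, 0.09), (0.11, 0.13),
    (0.90, 0.90), (0.88, 0.92), (0.91, 0.87),
    (0.10, 0.90), (0.12, 0.88), (0.09, 0.91),
]
labels, centroids = kmeans(points, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Label numbers are arbitrary: which group becomes cluster 1, 2, or 3 has to be decided afterwards, for example by comparison with external information on the DG operating state.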

To visualize the association between the obtained clusters and the operating state of the distributed generation, Figure 4, which presents the clustering results, is supplemented with two additional, artificial clusters, denoted as cluster −1 and cluster 0. They were created on the basis of external information collected by the control and monitoring systems of the particular DGs and of the PQ monitoring system output on flagged data. Cluster −1 denotes the time periods when the DG was active, which allows an easy comparison of the CA outcomes with the actual working condition of the DGs. Cluster 0 contains the flagged data, which must be excluded from the main cluster analysis; as indicated previously, the databases comprise only unflagged data. The main clusters obtained from the CA are cluster 1, which represents data recorded while the DG was working, and cluster 2, which covers the time period when the DG was switched off. Comparing the clustering outcomes with the artificial informative cluster −1 shows that the applied technique correctly associates the clusters with the different working states of the DG. Figure 4 presents the outcomes of the clustering with Euclidean distance for an initial number of clusters equal to 3. Information from the external network dispatcher systems confirmed that the time period indicated by cluster 3 was related to a reconfiguration of the network topology. In this case, increasing the number of clusters yields a finer classification of the collected PQ data, in which a specific working condition of the EPN is indicated. These and other issues concerning the initial number of clusters and the construction of the database were studied in [9].
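
The comparison between the CA outcome and the artificial reference cluster −1 could be quantified, for example, by cross-tabulating the CA labels against the DG operating state of each unflagged 10-min record. The sketch below is a hypothetical illustration of such a check; the function names, the agreement measure, and the sample labels are our assumptions, not a procedure described in [9].

```python
from collections import Counter


def crosstab(ca_labels, dg_active):
    """Count how often each (CA cluster, DG active?) pair occurs."""
    return Counter(zip(ca_labels, dg_active))


def agreement(ca_labels, dg_active, dg_cluster=1):
    """Share of records for which 'assigned to dg_cluster' coincides with 'DG active'."""
    hits = sum((label == dg_cluster) == active for label, active in zip(ca_labels, dg_active))
    return hits / len(ca_labels)


# Hypothetical labels for eight unflagged 10-min records
ca_labels = [1, 1, 1, 2, 2, 3, 3, 1]
dg_active = [True, True, True, False, False, False, False, False]  # reference (cluster -1)
table = crosstab(ca_labels, dg_active)
print(table[(1, True)], agreement(ca_labels, dg_active))  # 3 0.875
```
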
However, it is important to note that clustering is only the first step of the multipoint long-term measurement analysis: it classifies the data into groups matched with specific conditions of the observed network. This ultimately enables a qualitative assessment of the data in each cluster, as well as a comparative analysis between the clusters. For this purpose, this paper proposes the use of global power quality indices.

**Figure 4.** Results of power quality data clustering using cluster analysis (CA) with K-means and Euclidean distance and three initial clusters. C1—the distributed generation (DG) was working; C2—the DG was switched off; C3—DG was switched off and with a different network topology configuration.

#### *3.2. Qualitative Assessment of the Determined Clusters Based on the Proposed Global Power Quality Indices*

As described in Section 2, the proposed aggregated data index (*ADI*) uses five components based on 10-min aggregated data and two components based on 200-ms data. The acceptance levels of the *ADI* components related to the aggregated power quality parameters are presented in Table 2. These values correspond to the requirements of the standard [6].

**Table 2.** The acceptance level of the components of the *ADI* related to 10-min aggregated power quality parameters in reference to [6].


In the presented results, the importance rates (weighting factors) *k*<sub>1</sub> to *k*<sub>7</sub> of the seven parameters comprising the *ADI* were all equal to 1/7, meaning that all parameters were treated as equally important. The 10-min *ADI* variation at the particular measurement points (T1, T2, T3, WM) in relation to the determined clusters of PQ data is presented in Figure 5. To link the *ADI* variation with the output of the CA, that is, with the time periods assigned to particular clusters, each cluster is marked with a colored background in Figure 5.
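
Since the *ADI* is a weighted sum of its component factors, evaluating it for one 10-min record reduces to a dot product between the weights and the components. The sketch below assumes that the component values *W*1 to *W*7 have already been computed according to the definitions in Section 2, which are not reproduced here; the order W3 = *P*st, W5 = *THDu*, W6 = *U*env, W7 = *THDU*max follows Table 4, the positions of W1, W2, and W4 are assumed, and the sample values are hypothetical.

```python
FULL_WEIGHTS = [1 / 7] * 7                  # k1 ... k7: all parameters equally important
REDUCED_WEIGHTS = [1 / 5] * 5 + [0.0, 0.0]  # 200-ms components W6, W7 neglected


def adi(components, weights=FULL_WEIGHTS):
    """ADI of one 10-min record as the weighted sum k1*W1 + ... + k7*W7."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component")
    return sum(k * w for k, w in zip(weights, components))


# Hypothetical component factors W1..W7 of a single 10-min record
w = [0.05, 0.10, 0.30, 0.15, 0.20, 0.40, 0.35]
print(round(adi(w), 4), round(adi(w, REDUCED_WEIGHTS), 4))  # 0.2214 0.16
```

Passing `REDUCED_WEIGHTS` reproduces the reduced definition without the 200-ms components that is examined in Section 3.3.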

**Figure 5.** *ADI* variation for particular measurement points (T1, T2, T3, and WM) with relation to determined clusters of the PQ data.

A missing background color indicates flagged data. The variability of the *ADI* across the different working conditions (represented by clusters) is visible but weak. Therefore, the results of the power quality assessment combining the CA with the global power quality indices are summarized by statistics of the *ADI* variation for particular clusters and measurement points. The results are collected in Table 3.
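
Summarizing the 10-min *ADI* values per cluster is a simple group-by operation. Which statistics are reported in Table 3 is not restated in this section, so the mean, median, and maximum used below are assumptions, as are the sample values; flagged records (cluster 0) are assumed to have been removed beforehand.

```python
from collections import defaultdict
from statistics import mean, median


def adi_stats_by_cluster(adi_values, labels):
    """Summarize the 10-min ADI values of one measurement point per cluster."""
    groups = defaultdict(list)
    for value, label in zip(adi_values, labels):
        groups[label].append(value)
    return {
        label: {"n": len(vals), "mean": mean(vals), "median": median(vals), "max": max(vals)}
        for label, vals in sorted(groups.items())
    }


# Hypothetical unflagged ADI values of one measurement point and their CA labels
adi_values = [0.10, 0.12, 0.11, 0.20, 0.22, 0.30]
labels = [1, 1, 1, 2, 2, 3]
stats = adi_stats_by_cluster(adi_values, labels)
print(stats[1]["median"], stats[2]["max"], stats[3]["n"])  # 0.11 0.22 1
```
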

**Table 3.** Results of the power quality assessment using the proposed global power quality indices *ADI* and *FDI* for the particular measurement points and clusters 1–3 when the full definition of the *ADI* is applied.


Comparative analysis of the *ADI* levels allows the formulation of the following remarks:


To sum up, combining the cluster analysis with the proposed global power quality index *ADI* provides a suitable tool for the identification and comparative assessment of different conditions of the observed network. For the observed transformers T2 and T3 and the connection point of the welding machine WM, the power quality was better in cluster 1, when the DG was active. The different *ADI* outcomes for transformer T1 could be caused by the fact that no DG is directly connected to T1. The highest *ADI* values were identified in the feeder supplying the welding machine.

The next global power quality index proposed in this work is the flagged data index (*FDI*), which is related to the number of aggregated data affected by events within the periods identified by the clusters. The *FDI* levels are compared in Table 3. In general, the *FDI* was noticeable for clusters 2 and 3. These higher values are probably connected with events caused by changes in the electrical power network topology.
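
The exact definition of the *FDI* is given in Section 2 and is not repeated here. Purely as an illustration, the sketch below assumes that the *FDI* of a cluster is the share of flagged 10-min records within the time periods assigned to that cluster; both this ratio form and the function name are assumptions, and the sample data are hypothetical.

```python
from collections import Counter


def fdi_by_cluster(period_labels, flagged):
    """Hypothetical FDI: share of flagged 10-min records in each cluster's time period."""
    totals, hits = Counter(), Counter()
    for label, is_flagged in zip(period_labels, flagged):
        totals[label] += 1
        hits[label] += is_flagged  # True counts as 1
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Hypothetical records: cluster period of each 10-min interval and its flag
period_labels = [1, 1, 1, 1, 2, 2, 2, 2]
flagged = [False, False, False, False, True, False, False, True]
print(fdi_by_cluster(period_labels, flagged))  # {1: 0.0, 2: 0.5}
```
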

Additionally, the correlation between each factor and the *ADI* value was calculated separately for each measurement point. The Pearson correlation coefficient was used, and its interpretation scale was defined as in [10]:


The correlations between the factors and the *ADI* are presented in Table 4. Noticeable correlations at every measurement point were found for *P*st (*W*3), *THDu* (*W*5), *U*env (*W*6), and *THDU*max (*W*7).
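
The per-point correlation analysis can be sketched as follows. The function computes the standard Pearson coefficient between each component series and the *ADI* series of one measurement point; the sample series are hypothetical. On Python 3.10 or newer, `statistics.correlation` returns the same coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined (ZeroDivisionError) for a constant series


def component_correlations(components, adi_values):
    """Correlate every component series (name -> values) with the ADI series of one point."""
    return {name: pearson(series, adi_values) for name, series in components.items()}


# Hypothetical series for one measurement point
adi_values = [0.2, 0.4, 0.6, 0.8]
components = {"W3": [1.0, 2.0, 3.0, 4.0], "W5": [4.0, 3.0, 2.0, 1.0]}
print(component_correlations(components, adi_values))
```
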

**Table 4.** Results of correlation analysis between each factor and the global index.


#### *3.3. Influence of the Factors Comprising the Proposed Global Power Quality Indices on the Sensitivity of the Assessment*

Because the global power quality index *ADI* is constructed as a weighted sum of component factors related to power quality parameters, the impact of the individual factors on the assessment results deserves discussion. Weighting coefficients can be selected so that they favor selected parameters and shift the center of gravity of the global assessment toward those parameters. An alternative approach is to enhance the sensitivity of the assessment by including additional parameters in the definition of the global indices. This work proposes constructing a global index from five basic 10-min power quality parameters (frequency, root mean square (RMS) voltage, asymmetry, voltage fluctuations, and total harmonic distortion in voltage) and extending the definition with two parameters based on 200-ms values (i.e., the envelope of voltage changes and the maximum value of the total harmonic distortion in voltage identified during the 10-min aggregation interval). The aim of extending the *ADI* definition with parameters related to 200-ms intervals is to enhance the sensitivity of the resulting global index. To investigate the impact of the proposed 200-ms parameters on the sensitivity of the assessment, a differential approach is proposed. The *ADI* values for particular clusters and measurement points obtained with the full definition are presented in Table 3. They represent a scenario in which all seven factors were included in the *ADI* calculation with equal weighting factors *k*<sub>1</sub> = … = *k*<sub>7</sub> = 1/7. Applying the full *ADI* definition led to the conclusion that for the observed transformers T2 and T3 and the connection point of the welding machine WM, the power quality was better in cluster 1, when the DG was active.
The obtained *ADI* values were generally smaller in cluster 1 than in cluster 2, and the *ADI* differences between clusters 1 and 2 were mostly negative. To compare the *ADI* obtained with the full definition against an *ADI* based on a reduced definition, new *ADI* values were calculated with the 200-ms parameters neglected (i.e., with weighting factors *k*<sub>1</sub> = … = *k*<sub>5</sub> = 1/5 and *k*<sub>6</sub> = *k*<sub>7</sub> = 0). The resulting *ADI* values calculated without the 200-ms parameters are presented in Table 5.


**Table 5.** Results of the power quality assessment using the proposed global power quality index *ADI* for the selected measurement points, with relation to the revealed clusters when 200-ms parameters are neglected in the *ADI* definition.

Instead of calculating direct differences between the *ADI* values obtained in both scenarios (which actually differ very slightly), we propose comparing the interpretations of the results. In other words, the sensitivity analysis was redirected to the question of whether neglecting the 200-ms parameters in the *ADI* definition changes the interpretation of the assessment. A change in interpretation is identified when the difference between the *ADI*s of two clusters has a different sign under the full and reduced definitions. For example, the *ADI* obtained using the full definition decreased when the DG was active (C1, cluster 1) and increased when the DG was switched off (C2, cluster 2). The difference of the *ADI*s between C1 and C2 was therefore negative, because the *ADI* values in C2 were higher. If reducing the *ADI* parameters changes the sign of the difference between two clusters, the interpretation is not coherent and depends on the *ADI* construction. Table 6 compares the assessment results of the *ADI* with 200-ms factors (*k*<sub>1</sub> = … = *k*<sub>7</sub> = 1/7) and without them (*k*<sub>1</sub> = … = *k*<sub>5</sub> = 1/5, *k*<sub>6</sub> = *k*<sub>7</sub> = 0). The table also introduces an interpretative logical comparative assessment index: a value of 1 means that the assessment and interpretation of the results are the same for the full and reduced definitions of the *ADI*, and a value of −1 means that the interpretations using the full and reduced definitions are not coherent.
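
The logical comparative index can be expressed as a sign comparison. The sketch below assumes that the cluster-level *ADI* values under both definitions are already available (for example, from Tables 3 and 5); the function names and sample values are hypothetical, and treating a zero difference as a sign of its own is our assumption, since the handling of ties is not specified.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def interpretation_index(full_a, full_b, reduced_a, reduced_b):
    """1 if the ADI difference between clusters a and b keeps its sign when the
    200-ms components are dropped (k1..k7 = 1/7 -> k1..k5 = 1/5, k6 = k7 = 0), else -1."""
    same = sign(full_a - full_b) == sign(reduced_a - reduced_b)
    return 1 if same else -1


# Hypothetical cluster-level ADI values (full, reduced) for two cluster pairs
print(interpretation_index(0.10, 0.14, 0.09, 0.12))  # 1: C1 < C2 under both definitions
print(interpretation_index(0.20, 0.21, 0.18, 0.17))  # -1: the sign of the difference flips
```
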


**Table 6.** Assessment of the influence of removing the two parameters related to 200-ms intervals from the *ADI* definition on the power quality assessment. Interpretation differences are denoted by −1.

Based on the analysis presented in Table 6, among the 36 cluster assessments, 3 differ in their interpretation after the *ADI* definition is reduced. In other words, reducing the *ADI* components changed the assessment in about 8% of the cases. Conversely, including the parameters associated with the 200-ms values in the *ADI* definition enhances the sensitivity of the assessment.
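
The 8% figure is the share of incoherent interpretations among all cluster assessments in Table 6:

```latex
\frac{\text{assessments with index } -1}{\text{all assessments}} = \frac{3}{36} \approx 0.083 \approx 8\%
```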

A closer comparison of the differences between the *ADI* values based on seven and five parameters for particular clusters yields additional observations. For clusters 1 and 2, the interpretation based on the *ADI*s was not sensitive to the reduction of *ADI* components, and the interpretation results were the same. This is due to the substantial difference between the power quality conditions in clusters 1 and 2, which is reflected in the *ADI* values. However, when clusters containing similar data are compared, reducing the *ADI* components may change the assessment. An example is the data associated with transformer T3 in cluster 2 (DG switched off) and cluster 3 (DG switched off, with network reconfiguration). In this case, although the DG has a significant impact at T3, it was switched off in both clusters; the power quality conditions were therefore similar, and the reduction of the *ADI* components changed the interpretation in Table 6, as denoted by the logical value −1. Another example is the comparison of clusters 1 and 3 for transformer T2. The network reconfiguration resulted in a higher load on transformer T1 and a lower load on transformer T2. For transformer T2, this was generally similar to the effect of the DG, which also reduced the load demand. As a result, the power quality conditions were similar in clusters 1 and 3, and the reduction of the *ADI* components introduced uncertainty into the assessment.

It can be concluded that reducing the number of parameters comprising the composite *ADI* influences the sensitivity of the assessment. In the presented investigation, this effect was more pronounced when the differences between the power quality conditions in the compared clusters were small.

#### **4. Discussion**

This work presents the possibility of combining CA and GPQIs. As the authors indicated in a previous work [9], PQ measurements are an appropriate input to cluster analysis. Note that the aim of CA is to divide data based on their features. The proposed method was applied to real measurements collected at four measurement points in an industrial network: the three transformers T1, T2, and T3, which supplied the MV industrial network, and a significant load (a welding machine, WM). The investigation aimed to evaluate the influence of the DGs installed inside the observed industrial network. However, the power variations of the DGs serve only as additional information describing the operating conditions; the DG data are not part of the measurement database used in the investigation. The same classification could, of course, be obtained from the time stamps of the different DG operating states, but the point of the method is to classify the PQ data automatically based on their features and then to find the reasons that explain the automatic classification. This approach becomes particularly important as the number of monitored points increases.

The input database consists of many different parameters, which leads to a multielement assessment. Therefore, in this work we proposed using global indices to simplify the process. The proposed indices consist of power quality parameters representing frequency, voltage, flicker, asymmetry factor, and voltage harmonics. In addition to the classical 10-min aggregated data, we proposed including the extreme values of voltage and harmonics, and we analyzed the impact of extending the global indices with such values. The results indicated that the composition of the *ADI* influences the sensitivity of the assessment. In the present investigation, this effect was more pronounced when the differences between the power quality conditions in the compared clusters were small. The *ADI* is composed of classical 10-min PQ parameters as well as 200-ms parameters, and a weighting factor is assigned to each parameter. To reveal the influence of the DGs, all weighting factors were set to the same value (1/7) in order to obtain maximum sensitivity of the analysis to every PQ parameter included in the *ADI*. However, the weighting factors make it possible to focus the analysis on particular PQ parameters and neglect others (i.e., to make the analysis more sensitive to selected PQ phenomena by choosing different values of the weighting factors).

The proposed combination of CA and GPQIs proved to be a suitable tool for the identification and comparative assessment of different conditions of the observed mining industry network. Among other things, it was revealed that for the observed transformers T2 and T3 and the connection point of the welding machine WM, the power quality was better in cluster 1, when the DG was active. The different *ADI* outcomes for transformer T1 could be caused by the fact that no DG is directly connected to T1. The highest *ADI* values were identified in the feeder supplying the welding machine, which is a highly variable load. It can therefore be concluded that the results obtained with the method are also technically reasonable.

We also proposed the flagged data index (*FDI*), which is related to the number of aggregated data affected by events, and used it to compare the clusters. The results showed that the *FDI* was higher in cluster 2 than in cluster 1, which can be attributed to the fact that fewer voltage events were detected in the period when the DG was active (cluster 1) than in the period when the DG was switched off (cluster 2). The *FDI* is a general indicator; a detailed analysis of particular voltage events requires a separate investigation.
