1. Introduction
In recent years, demographic growth has brought a noticeable increase in energy consumption around the globe, driving overproduction to cover end-user demand [1]. In this context, the conventional way of producing electric energy, converting fossil fuels into electricity, has caused negative environmental effects that accelerate climate change [2]. For that reason, several countries have decided to expand energy production through alternative means, better known as renewable generation systems (RGS), mainly wind power generation (WPG) [3] and solar photovoltaic generation (SPG) [4]. However, an important aspect to consider in both conventional and renewable generation systems is the quality of the energy produced, since it feeds the final loads such as electric machines, power devices, and electronic devices [5]. Good power quality is essential because the proper operation, functionality, and lifespan of the equipment connected to the electric grid strongly depend on it [6]. Therefore, power quality analysis is essential, and different methodologies have addressed this topic, such as heuristic techniques [7], model-based techniques [8], data-driven techniques [9], and deep-learning techniques [10]. However, the characteristics of evolving and emerging technologies add complexity to power quality analysis, since these technologies are not only affected by problems in the grid but also generate some of them; hence, the development of new strategies to address these problems becomes necessary [11]. Recently, other methodologies, such as approaches for novelty detection (ND), have proven effective in industrial applications not related to power quality, and it would therefore be interesting to demonstrate how they could contribute solutions to this issue.
In general, ND encompasses several techniques that can be grouped into the following general categories [12,13]:
Probabilistic-based. Here, the data distribution can be thresholded, which is useful for defining limits or boundaries of normality in the data space. With this information, it is possible to determine whether a single sample belongs to the same distribution. This task requires estimating probability density functions, as in the Gaussian mixture model (GMM) [14].
Distance-based. These approaches make use of well-defined distance metrics that help define the similarity between two samples (data points) in a data set, for instance, k-nearest neighbors (kNN) [15] and isolation forest (IF) [16].
Domain-based. These approaches require the generation or definition of boundaries according to the form of the training data set; that is, they describe the domain of the target classes while being insensitive to sampling and density, as in one-class support vector machines (OC-SVM) [17].
Reconstruction-based. These methodologies are widely used in applications requiring classification and regression. Normally, these techniques model the data automatically without supervision, and the estimation or prediction is used to obtain a performance metric: the difference between a test vector and the output vector, better known as the reconstruction error. This metric is then used as a novelty score. Examples include stacked auto-encoders (SAE) [18] and self-organizing maps (SOM) [19].
Information theory-based. For these approaches, the entire data set is used to compute specific metrics, such as entropies, energies, forms, and moments. Novelty then appears as an alteration of these values with respect to normal data sets, as in time–frequency ridge estimation (TFRE) [20,21], degree of cyclostationarity (DCS) demodulation [22], and entropy measure-based methods (EMBM) [23].
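As an illustration of the probabilistic-based category, the following minimal sketch thresholds the log-likelihood of a Gaussian mixture model to separate known samples from novelties; the synthetic data, component count, and percentile threshold are assumptions for demonstration, not settings used in this work.

```python
# Probabilistic-based novelty detection sketch: fit a GMM on "known"
# samples and flag test samples whose log-likelihood falls below a
# threshold derived from the training distribution.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
known = rng.normal(loc=0.0, scale=1.0, size=(500, 2))   # known-condition samples
novel = rng.normal(loc=5.0, scale=1.0, size=(50, 2))    # shifted, unseen samples

gmm = GaussianMixture(n_components=2, random_state=0).fit(known)

# Threshold on log-likelihood: anything below the 1st percentile of the
# training scores is flagged as novelty.
threshold = np.percentile(gmm.score_samples(known), 1)
is_novel = gmm.score_samples(novel) < threshold
```

The percentile-based threshold is one simple choice; in practice it would be tuned to trade off false alarms against missed novelties.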
Several approaches belong to the aforementioned categories; however, only one or two techniques per category, presented as examples, are taken for the analysis. The adopted techniques can be considered good representatives of each category according to the reviews reported in the literature [12,13]. Additionally, according to these reviews, the selected techniques are among the most frequently used, because their implementation is relatively easy and information about different applications is readily available. Therefore, Table 1 presents a brief analysis and discussion of such techniques as used in different applications.
Now, as mentioned, it is very important to monitor the power sources that feed industrial equipment because, in conventional networks and renewable generation systems alike, elements connected to the grid generate problems that affect power quality (PQ). It is worth mentioning that all equipment and devices connected to the power grid are affected by the events or anomalies present in it, and these devices are, to a great extent, also the cause of such problems [36]. These anomalies are better known as power quality disturbances (PQDs), defined in general terms as any deviation from a pure sinusoidal waveform with a specified amplitude, frequency, and phase [37]. The PQDs are catalogued according to features such as duration, frequency, amplitude variation, and behavior. For that reason, two well-known standards define and classify these PQDs. The IEEE-1159 standard [38] classifies the most common PQDs as transients, short-duration variations, long-duration variations, voltage imbalance, waveform distortion, voltage fluctuations, and power frequency variations, summarized in Table 2. The other important standard is IEC 61000-4-30 [39], which classifies the power quality parameters as power frequency, magnitude of the supply voltage, flicker, supply voltage dips and swells, voltage interruptions, transient voltages, supply voltage unbalance, voltage harmonics, voltage interharmonics, mains signaling voltage on the supply voltage, rapid voltage changes, and measurement of under-deviation and over-deviation parameters, summarized in Table 3. IEC 61000-4-30 also defines, according to the final application, the measurement classes advanced (A), for precise measurements and power quality assessments; surveys (S), for statistical surveys; and basic (B), for instruments with obsolete design. It is important to clarify that, of all possible PQDs, only some are adopted for the analysis in this work, because the real signals from the renewable systems present only some of these disturbances. Additionally, other anomalies outside the standards, such as climatic conditions, are included in the analysis because they are inherent to renewable generation systems.
Regarding approaches that address power quality analysis (PQA) through ND techniques, only a few works can be discussed. For instance, in [40], an online method for detecting low-quality events through phasor measurement units and IF is proposed; however, the authors focus on anomalies such as noise in the voltage signals and oscillations of growing magnitude. Some research directly tackles PQD classification by hybridizing kNN with a fully convolutional Siamese network for voltage signals with a small number of samples [41]. Meanwhile, [42] combined the two-dimensional Riesz transform (RT) for feature extraction with the multi-objective grey wolf optimizer (MOGWO) and kNN for selecting and learning features, thus classifying the PQDs. Other works have used the SVM for detecting and classifying electric disturbances. For example, in [43], the frequency components of the signal are first computed through the fast Fourier transform (FFT); then, a set of adaptive filters extracts mono-frequency components of the distorted signal; finally, six time-domain features are computed and fed to a multiclass SVM for classification. In another example [44], a combination of a histogram of oriented gradients (HOG), implemented in four steps (gradient computation, orientation binning, concatenation, and normalization), and the SVM is used for distinguishing PQDs. In the same field, a five-step method is presented in [45]: first, a set of PQDs is simulated; second, the variational mode decomposition (VMD) decomposes the signal into intrinsic mode functions (IMFs); third, time-domain features are extracted from the IMFs; fourth, a heuristic selection is made through the permutation entropy and Fisher score algorithms; and fifth, a multiclass SVM classifies the disturbances. Last but not least, the research presented in [46] describes a methodology for classifying PQDs in WPG implemented in five steps: generation of a synthetic database for training purposes, multi-domain feature estimation, feature reduction through genetic algorithms (GA) and principal component analysis, modeling of the power disturbances with SOM, and classification through SoftMax. It is worth mentioning that such work focuses on only one technique for ND, namely SOM, and on only one type of renewable generation system, WPG, although it optimizes the feature selection process.
According to the review of works reported in the literature, power quality assessment is generally addressed from the point of view of signal decomposition for further spectral analysis, feature extraction, and machine learning classifiers. However, little research has approached it from the point of view of novelty detection techniques in applications related to the power quality of renewable energy generation systems, mainly solar photovoltaic and wind power generation. Additionally, several challenges remain to be overcome, such as power disturbance variability, sudden changes, and intrinsic conditions. The existing works consider fault detection in the parts of mechanical systems; however, power quality is also an interesting field, since evolving technology causes complex problems in the power grid that affect all connected devices, and classical methodologies and measurement processes may find it difficult to provide a solution.
The contribution of this work is the development of a methodology that implements six ND techniques for detecting and classifying PQDs over datasets of real signals acquired from renewable energy generation systems (REGS), specifically SPG and WPG. The proposed approach considers PQDs under different scenarios based on the IEEE-1159 and IEC 61000-4-30 standards, such as sags, oscillatory transients, and flickers. In addition, combinations of these PQDs, as well as abnormal behaviors not deemed in the standards, such as meteorological conditions that affect the energy generation process, are considered. These scenarios include the appearance of the PQDs and their combinations in a dynamic that contemplates two situations: known conditions and novelty conditions. The implemented ND techniques comprise kNN, OC-SVM, GMM, SAE, IF, and SOM, and their algorithms are tested and evaluated on the proposed disturbances. The obtained results demonstrate the effectiveness of every technique, emphasizing its usefulness and potential according to the problem to be solved. Therefore, for each technique, the performance reached is provided, highlighting its adequacy for detecting the adopted power disturbances (sag, oscillatory transient, and flicker) and another abnormal condition (meteorological affectation), and indicating which techniques are limited in this task.
3. Proposed Methodology
In this section, the proposed methodology, which implements the six ND techniques for diagnosing and classifying electric disturbances considered in the IEEE-1159 and IEC 61000-4-30 standards, as well as outside them, is described. The signals with power disturbances analyzed correspond to two types of renewable energy generation systems, SPG and WPG. Figure 3 presents the general block diagram of the proposed methodology; as can be observed, three main blocks integrate the approach: (i) datasets of real signals, (ii) feature extraction, and (iii) novelty detection techniques. Each of these blocks is described in detail in the following lines.
In Figure 3, the first block refers to all the data used for evaluating the performance of the ND techniques. This block comprises two datasets: the first corresponds to an application of energy generation by means of solar photovoltaic panels (SPG_DS), obtained from a photovoltaic park, and the second is also related to energy generation but through wind turbine systems (WPG_DS), obtained from a wind farm. For each dataset, five case studies are considered for the evaluation, and each case divides the analyzed signals into two conditions: known condition and novelty condition. The known condition comprises the signals with which the ND technique is trained by learning the extracted features; that is, the technique learns the behavior of the signals defined as known. The novelty condition comprises the signals about which the ND technique has no previous information or training; in other words, these signals are fed to the technique to be flagged as novelty. To this end, the processed signals include the normal condition and the power disturbances addressed by the IEEE-1159 and IEC 61000-4-30 standards, such as sag, oscillatory transients (OT), and flicker; a condition outside the standards is also considered, namely a meteorological condition (MC). The MC belongs only to the signals of the SPG_DS, and it refers to climatic variations (a cloudy day) that generate incipient changes in the signal amplitude. Finally, the datasets are evaluated individually, and the signals are processed in the next block.
In the second block, feature extraction is performed on the signals of the datasets (SPG_DS and WPG_DS). For example, the signals from the SPG_DS with power disturbances are sent to this block, and then time-domain, frequency-domain, and time–frequency-domain features are computed. The time-domain features are extracted directly from the input signals, whereas the extraction of the frequency-domain features first requires the application of the FFT. Similarly, extracting the time–frequency-domain features first requires a space transformation through the empirical mode decomposition (EMD), using the first intrinsic mode function (IMF). A total of 20 statistical and non-statistical features are computed, comprising indicators such as the mean, maximum value, root mean square (RMS), square root mean (SMR), standard deviation, variance, RMS shape factor, SMR shape factor, crest factor, latitude factor, impulse factor, skewness, kurtosis, 5th moment, 6th moment, energy, entropy, range, form factor, and log-energy entropy. These features are selected because they can provide information not directly visible in the signals (even after a domain transformation); for instance, they can reveal central tendencies, dispersions, patterns, profiles, distributions, geometry, asymmetries, form, energy, and entropy, among others. In addition, they have demonstrated their effectiveness in other applications, such as monitoring systems [47], and they can be easily implemented and computed through the equations shown in Table A1 of Appendix A. It is worth mentioning that these features provided sufficient signal information for this work; however, if the final application is characterized by other types of signals, such as acoustic ones, features related to those signals can be added. In other words, the more features that can be obtained from the signals, the more information is available to the ND techniques, although caution is required, because much of the information may not be useful; its value depends on the application and the technique implemented. Therefore, considering the 3 domains (time, frequency, and time–frequency) and the 20 statistical and non-statistical features, a total of 60 features describe each signal with power disturbances from the SPG_DS. Additionally, these 60 features are extracted for each of the five case studies considered in the dataset. The abovementioned procedure is repeated for the signals from the WPG_DS. Thus, the features from the datasets, per case study, are fed to the ND technique block for further processing.
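A few of the 20 indicators listed above can be sketched as follows: the function below computes a subset of the time-domain features with NumPy and obtains frequency-domain versions by applying the same indicators to the FFT magnitude spectrum (the time–frequency path is omitted here, since the EMD step would require an external implementation). The helper name and the test signal are illustrative assumptions, not the authors' code.

```python
# Sketch of the feature-extraction block: a subset of the statistical
# indicators computed in the time and frequency domains.
import numpy as np

def extract_features(x):
    """Compute a subset of the statistical indicators used for each domain."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    smr = np.mean(np.sqrt(np.abs(x))) ** 2           # square root mean
    mu, sigma = np.mean(x), np.std(x)
    z = (x - mu) / sigma                             # standardized samples
    return {
        "mean": mu,
        "max": np.max(np.abs(x)),
        "rms": rms,
        "smr": smr,
        "std": sigma,
        "variance": sigma ** 2,
        "crest_factor": np.max(np.abs(x)) / rms,
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4),
        "energy": np.sum(x ** 2),
        "range": np.ptp(x),
    }

fs = 6000
t = np.arange(0, 0.2, 1 / fs)                        # 12 cycles at 60 Hz
signal = np.sin(2 * np.pi * 60 * t)

time_feats = extract_features(signal)                        # time domain
freq_feats = extract_features(np.abs(np.fft.rfft(signal)))   # frequency domain
```

Applying the same indicator set to each domain representation, as sketched here, is what yields the 60-element feature vector (20 indicators times 3 domains) described above.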
In the third and last block, a set of six different techniques comprises the novelty detection stage. The techniques implemented in this block are kNN, OC-SVM, GMM, SOM, IF, and SAE. The features extracted from the datasets are fed one by one, case study by case study, following the dynamic described next; to exemplify, the analysis of the signals from the SPG_DS is described. The analysis begins by feeding the 60 extracted features of the first case study, corresponding to the subset defined as the known condition, to the set of ND techniques, and with this information the six techniques are trained. It must be remembered that these features include some PQDs, so the six techniques learn the patterns of such anomalies. Subsequently, the six techniques are used to detect novel conditions as follows. Once the ND techniques are trained, they are fed with input vectors containing features of both types, known and novelty conditions, and the purpose is for the techniques to indicate whether the input features belong to the known conditions or to the novelty. In this way, the rest of the case studies are analyzed similarly, and the same procedure is repeated for the signals from the WPG_DS. It is worth mentioning that some case studies consider combinations of PQD and MC for the training of the techniques. Therefore, the outputs of the proposed structure of Figure 3 are the classification labels "known" and "novelty", which can be considered a semaphore indicating whether a signal behaves as usual or an unknown condition occurs. Finally, when an unknown condition happens, the ND techniques indicate it as novelty; later, each technique can learn this situation (through training), after which it is assumed to be known, and if a new unknown condition appears, it is again flagged as novelty. In this way, the performances of the six ND techniques can be compared.
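The train-on-known, flag-the-rest dynamic of this block can be sketched for two of the six techniques (OC-SVM and IF) with scikit-learn; the random 60-dimensional vectors stand in for the extracted features, and all model settings are illustrative assumptions rather than the configurations used in this work.

```python
# Novelty detection dynamic: train on known-condition features only,
# then classify test vectors as known (+1) or novelty (-1).
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
known_train = rng.normal(0.0, 1.0, size=(400, 60))   # known-condition features
known_test = rng.normal(0.0, 1.0, size=(100, 60))
novelty_test = rng.normal(4.0, 1.0, size=(100, 60))  # unseen condition

results = {}
for model in (OneClassSVM(nu=0.05), IsolationForest(random_state=0)):
    model.fit(known_train)                           # train on known only
    results[type(model).__name__] = {
        # predict() returns +1 for inliers ("known"), -1 for "novelty"
        "known_kept": (model.predict(known_test) == 1).mean(),
        "novelty_flagged": (model.predict(novelty_test) == -1).mean(),
    }
```

The two per-model fractions correspond directly to the known-condition and novelty-condition classification performances compared across techniques in the results section.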
6. Conclusions
This work describes a methodology that implements six techniques for novelty detection: stacked autoencoder, one-class support vector machines, k-nearest neighbors, Gaussian mixture model, self-organizing maps, and isolation forest. These techniques are implemented within the framework of power quality assessment, as deemed by the IEEE-1159 and IEC 61000-4-30 standards, for two important renewable energy systems: solar photovoltaic and wind power generation. In addition, the proposed scheme allows the six techniques to be tested through a set of features in three different domains (time, frequency, and time–frequency) for an equal comparison of performances, allowing two conditions to be discerned: known and novelty. The selected statistical and non-statistical features are sufficiently general, and they provide information contained in the signals from the datasets that is not directly visible, for example, data geometry, distribution, central tendency, asymmetries, form, patterns, and energy. In this way, these features put the implementations on an equal footing: no matter the internal structure, principle, or requirements of the novelty detection technique, it has appropriate information to operate on. On the other hand, the novelty detection techniques were implemented to learn the specific behavior of an electric event within the standards (power disturbance) or outside the standards (meteorological condition), and this constitutes the known condition. Thus, if a condition does not match a previously learned behavior, it must be considered novelty, and this capability could be used advantageously in an iterative process that learns the features of each new condition over time, advising when a sudden change happens.
These techniques have proven adequate for the proposed scenarios, but as further work they could be extended to analyze other failures, or be optimized or combined with other algorithms.
Now, based on the results obtained from the two datasets analyzed (solar photovoltaic and wind power), several conclusions can be drawn. First, it is worth mentioning that, of the six techniques, k-nearest neighbors was the most consistent algorithm for novelty detection of electric disturbances in both datasets, since it achieves overall performances above 94.25% in all cases. In addition, the consistency of the technique can be noticed in that it does not present high variations in the classification of the known and novelty conditions. In contrast, according to the results from Tables 5–14, in both datasets, the Gaussian mixture model and self-organizing maps can be considered the techniques with the lowest performances. In particular, the Gaussian mixture model, although achieving low performances in all cases, maintained low variations in its classification performances. Self-organizing maps performed well for the case studies where only one power disturbance was defined as novelty, or when the known condition alone was used for training the technique; however, if a combination of a normal signal with a power disturbance is used for training, its performance decreases significantly. Regarding the isolation forest algorithm, it can be concluded from the tables' summaries that this technique performs well in most cases (above 96.84%), except for those where training is conducted with the combination of a normal signal and a power disturbance, in which case it reaches performances between 72.17% and 89.5%. In relation to the stacked autoencoder, its performance is perfect when classifying novelty conditions, with values of 100%; however, its performance in the classification of the known condition varies between 82.33%, as the lowest value, and 95%, as the highest, which is a good range. Consequently, all the performances for the stacked autoencoder remain above 91.18%. Last but not least, one-class support vector machines provided excellent results for the case studies where the algorithm was trained with a normal signal, but showed drawbacks when the training was made with the combination of a normal signal and a power disturbance; this was reflected as a performance decrement in the classification of the known conditions, with values between 64.33% and 100%.
Finally, it is concluded that, in general, when a combination of power conditions is used for training, the worst performance values are obtained when the techniques attempt to classify the known conditions.