1. Introduction
Wind power has achieved large-scale development and utilization worldwide and has become the most widely used and fastest-growing renewable energy source [1]. As an important part of the wind turbine, the Supervisory Control And Data Acquisition (SCADA) system provides detailed data on the operating status of the wind turbine [2,3,4]. In recent years, the wind power industry has developed rapidly, and wind farms have accumulated large quantities of operational data. Such data are indispensable for wind turbine state assessment and wind power prediction. In real-time operation, they are also important for power system dispatch and for scheduling any wind farm curtailment or derating.
Generally, wind turbine power curves exhibit a degree of scatter, reflecting measurement uncertainty in both wind speed and power. In addition, the measured operational data of a wind farm usually contain a certain amount of abnormal data, which complicates tasks such as determining the turbine operating state or predicting wind power. Many factors affect the quality of operational data, such as the measurement error of the sensor itself, poor measurement accuracy caused by a harsh operating environment, data storage and transmission errors, wind turbine performance failures, and, importantly, operation of the turbine in a derated power state. It is useful to divide abnormal operational data into two categories: often extensive data generated by the turbine under derated control, and a generally smaller amount of outlier data that deviate from the main data distribution because of averaging over state transitions or other sporadic factors [5,6,7]. However, current SCADA systems provide no parameter indicating whether a turbine is derated or what its maximum power limit is. Therefore, identifying derated operation within turbine operational data and eliminating outliers are important research problems in the field of wind power.
Outlier detection is widely used in the field of wind power, and research institutions have carried out significant work in this area with useful results. Statistical methods for detecting outliers can be roughly divided into five categories: distribution-based, depth-based, cluster-based, distance-based, and density-based outlier detection [8,9,10,11,12].
Distribution-based statistical outlier detection fits a probability distribution to a given data set and identifies data that lie far from this distribution as outliers. This approach is widely used in the wind power field. In [13], a mathematical model based on the quartile algorithm was used to identify anomalous data. For small amounts of missing data and for continuous blocks of missing data, the wind farm output correlation and multi-point cubic spline interpolation, respectively, were used to reconstruct the missing values. In [14], the time series characteristics of bad data were identified and a segmentation judgment method was applied; abnormal data were then reconstructed based on the relationship between wind power output and the data characteristics of the wind farm itself. In [15], a joint probability model based on the Copula function was proposed. The Copula function captures complex nonlinear multivariate relationships between parameters from the univariate marginal distributions of the data set, and significant outliers are then eliminated by examining the derived joint probability model. In [16], an optimal intra-group variance algorithm for power curve analysis was proposed. Unlike traditional analysis methods, it does not depend on multi-dimensional data: it analyzes only wind speed and power and can identify the normal power generation status of the turbine. In [17], based on an analysis of the characteristics of abnormal wind speed-power data of wind turbines, the anomalous data were divided into four types: stacked anomalies at the bottom of the curve, stacked anomalies in the middle of the curve, stacked anomalies in the upper part of the curve, and dispersed anomalies around the curve. An identification and cleaning process combining the change point grouping method and the quartile method was proposed, which effectively identifies all four types of abnormal data with a good cleaning effect. These distribution-based methods find outliers quickly and efficiently when the distribution of the data set is known. However, they rely on the global distribution of the data set and are unsuitable for high-dimensional data sets or data sets whose distribution is unknown.
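The quartile idea used in [13] and [17] can be illustrated with a minimal sketch. It is not the procedure of either work: we assume equal-width wind speed bins and the conventional Tukey fences $Q_1 - 1.5\,\mathrm{IQR}$ and $Q_3 + 1.5\,\mathrm{IQR}$ applied to power within each bin, and the function name `iqr_outliers` and its parameters are hypothetical.

```python
import numpy as np

def iqr_outliers(wind_speed, power, bin_width=0.5, k=1.5):
    """Flag samples whose power lies outside [Q1 - k*IQR, Q3 + k*IQR]
    of the power values in the same equal-width wind speed bin."""
    wind_speed = np.asarray(wind_speed, dtype=float)
    power = np.asarray(power, dtype=float)
    bins = np.floor(wind_speed / bin_width).astype(int)  # bin index per sample
    is_outlier = np.zeros(power.shape, dtype=bool)
    for b in np.unique(bins):
        in_bin = bins == b
        q1, q3 = np.percentile(power[in_bin], [25, 75])
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        is_outlier[in_bin] = (power[in_bin] < lower) | (power[in_bin] > upper)
    return is_outlier

# One bin (5.0-5.5 m/s) containing a single low-power sample
flags = iqr_outliers([5.0, 5.1, 5.2, 5.3, 5.4], [100, 102, 98, 101, 10])
print(flags)
```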
In practice, most operational data do not fully conform to a specific distribution model [18,19]. The depth-based outlier detection method was developed to improve on the distribution-based approach. It assigns each data object a depth value and maps the data objects to corresponding layers of a 2D space according to these depth values; objects on shallow layers are more likely to be outliers than those on deep layers. However, in practical applications, existing depth-based outlier detection methods are only effective for data in two- and three-dimensional spaces [20].
Cluster-based outlier detection divides the data set into clusters according to data features and identifies data points that are far from every cluster as outliers. In [21], a dynamic outlier detection algorithm was proposed based on k-means clustering, data stream concept drift, and existing outlier detection algorithms. While the algorithm runs, the sliding window size is adjusted adaptively according to the result of concept drift detection on the data stream, so the cluster structure of the data set can be found effectively while outliers are identified. Cluster-based outlier detection can identify outliers quickly, but the result is very sensitive to the clustering algorithm used, so that algorithm must be selected with care.
To address these limitations, researchers have proposed distance-based outlier detection. This approach checks whether most data points in the data set lie farther from a target point than a user-defined distance threshold; if so, the target point is considered an outlier [22,23]. In [24], a fast distance-based outlier detection algorithm was proposed that processes the data stream with a sliding window model and uses the vector inner product inequality to prune the search. Distance-based outlier detection is widely used because of its simplicity and efficiency. However, because it uses a global threshold and ignores local density variation, it can detect only global outliers, not local ones.
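The distance-based criterion described above (a point is an outlier when at least a given fraction of the other points lie beyond a distance threshold) can be sketched as follows. The brute-force pairwise distance matrix is an assumption made for brevity: it needs O(N²) memory and does not reproduce the sliding-window pruning of [24]. The function name `db_outliers` is hypothetical.

```python
import numpy as np

def db_outliers(X, radius, frac):
    """Return True for points where at least `frac` of the other points lie
    farther than `radius` away (brute-force pairwise distances)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # (n, n) distances
    n_far = (d > radius).sum(axis=1)  # self-distance is 0, so it is never counted
    return n_far >= frac * (n - 1)

points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
flags = db_outliers(points, radius=2.0, frac=0.9)
print(flags)
```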
The density-based outlier detection method builds on the distance-based method and decides whether a point is an outlier from the density of its neighborhood, so it can accurately find outliers in unevenly distributed data. In [25], an outlier detection algorithm based on local density, LDBO, was introduced to improve the efficiency of existing density-based algorithms. It introduces the concepts of strong and weak k-nearest neighbors and, by analyzing the outlier correlation of adjacent data points, treats individual data points differently. A pre-judgment strategy for outlier data points was also proposed, which effectively improves the efficiency of outlier detection for data with anomalous distributions. Density-based methods handle local outlier detection well, but they still suffer from high computational complexity and difficult parameter selection.
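The LDBO algorithm of [25] is not reproduced here. As a hedged illustration of the general density-based idea only, the sketch below scores each point by the ratio of its neighbors' average k-nearest-neighbor density to its own, a simplified relative of the local outlier factor. The density definition (inverse mean kNN distance) and the function name are our assumptions.

```python
import numpy as np

def relative_density_scores(X, k):
    """Average kNN density of a point's neighbours divided by its own kNN density.
    Scores well above 1 mark points lying in sparser regions than their neighbours."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                   # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]             # indices of the k nearest neighbours
    knn_dist = np.take_along_axis(d, nn, axis=1)
    density = 1.0 / (knn_dist.mean(axis=1) + 1e-12)
    return density[nn].mean(axis=1) / density

points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
scores = relative_density_scores(points, k=2)
print(scores)
```

Because the score is a ratio of local densities, no single global threshold on distance is needed, which is the property the paragraph above attributes to density-based methods.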
Each of the above outlier detection methods has advantages and disadvantages, and their scopes of application differ. To overcome the limitations of a single method, current research mostly combines two or more methods. Some combine distribution-based and cluster-based detection. In [26], based on an analysis of the characteristics of wind power outlier data, a combined detection model using the quartile method and k-means clustering was proposed. The model does not require a normal data set for training, and it has strong automated processing capability and versatility, but choosing the value of k is difficult and strongly affects the results. Others combine distribution-, cluster- and density-based detection. In [27,28], two quartile algorithms were used to eliminate sparse outliers, and then the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm was used to eliminate stacked outliers. This method does not require the number of clusters as input and offers high accuracy and universality. However, when the spatial density of the clusters is not uniform and the spacing between clusters differs greatly, the clustering quality is poor, and data produced by derated power operation are simply eliminated as outliers.
Instead of removing power curtailment data as outliers, our goal is to identify derated power operation states and to eliminate outliers that clearly deviate from the main trends in wind turbine operating data containing several derated power operation states. The difficulty is that the data set contains points generated by turbines operating in several reduced power states, so not only outliers but also power curtailment data points lie far from the power curve. Because the operational data contain more than one derated power state, we cannot determine whether a specific point is an outlier simply from its distance to the power curve. Therefore, this paper proposes a method for detecting outliers in wind turbine operational data that include derated power operation.
The method converts the outlier detection problem for a wind turbine with derated power operation into a mixture probability distribution model by introducing reasonable assumptions. The k-means clustering algorithm is used to initialize the parameters of the mixture model. Then, the expectation maximization (EM) algorithm is used to derive update expressions for the model parameters, and the log-likelihood function is maximized iteratively to obtain the optimal parameters. Finally, outliers in the wind turbine data in each derated power state are identified by calculating the posterior probability of each sample. The proposed method can quickly and efficiently identify the degrees of derating in the operational data, distinguish normal operational data from several different degrees of derated power data, and eliminate the outliers within each data type to improve the quality of wind turbine operational data. The paper is organized in four sections. Section 1 has introduced the research background and the state of research on outlier detection for wind turbine derated power operation data. Section 2 introduces the outlier detection model for wind turbine derated power operation data. In Section 3, real operating data from wind turbines located in North China are used to verify the proposed method. Finally, Section 4 summarizes the conclusions.
2. Wind Turbine Derated Power Operation Data Outlier Detection Model
2.1. Modeling of Derated Power Operation Data Outlier Detection
The method first identifies the derated power levels contained in the operational data and then divides the data according to derated power level. Finally, the outliers are removed from the operational data corresponding to each derated power state. The notions of derated power levels and outliers are illustrated in Figure 1.
In keeping with statistical notation, the random variables $P$, $V$ and $Z$ represent the wind turbine output power, nacelle wind speed and derated power state, respectively. $P$ and $V$ are observable random variables, whereas $Z$ is a latent random variable that cannot be observed directly from the samples. From the SCADA system of the wind turbine, it is usually easy to obtain a sample set of output power and wind speed pairs $D = \{(p_i, v_i)\}_{i=1}^{N}$, where $p_i$ and $v_i$ are the $i$th output power and nacelle wind speed samples in the data set, respectively. From the sample set $D$, we need to establish the probability distribution of the output power and the nacelle wind speed, identify the derated power state of each sample $(p_i, v_i)$, and then remove from the sample set the outliers whose posterior probability is too low. To achieve this, we first establish a mixture probability density distribution with the derated power state $Z$ as a latent random variable.
2.2. Mixed Probability Density Distribution Model
To establish the mixture probability density distribution model $p(P, V)$, we introduce two assumptions: the power-limited state assumption and the power-limited state output assumption.
Power-limited state assumption: the derated power state of the turbine takes a finite number of values; that is, the derated power state $Z$ can take $K$ different values, $Z \in \{1, 2, \ldots, K\}$. These correspond to the normal operating state of the turbine and to derated operating states with different derated power levels.
Power-limited state output assumption: the output of the wind turbine in a derated power operation state equals the theoretical output power multiplied by the corresponding derating coefficient. That is, in the $k$th derated power state, the output power can be expressed as $p = \alpha_k f(v)$, where $f(\cdot)$ is the theoretical power curve of the wind turbine, $v$ is the incoming wind speed, and $\alpha_k \in (0, 1]$ is the power limit coefficient of the $k$th derated power state. The smaller $\alpha_k$ is, the more strongly the unit's output is limited; the closer $\alpha_k$ is to 1, the closer the unit is to normal power generation. The normal operating state can therefore be regarded as a special derated power state whose coefficient equals 1.
To simplify the modeling process, we discretize the wind speed data with the equal-width method, dividing the wind speed range into $J$ intervals $I_1, \ldots, I_J$ of equal width; the midpoint $v_j$ of each interval represents the wind speed of that interval. We further assume that the discretized wind speed obeys a multinoulli (categorical) distribution, $V \sim \mathrm{Multinoulli}(\boldsymbol{\phi})$, where the vector $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_J)$ is the distribution parameter, with $\phi_j = P(V = v_j) \ge 0$ and $\sum_{j=1}^{J} \phi_j = 1$. The probability of observing wind speed $v$ is then
$$P(V = v) = \prod_{j=1}^{J} \phi_j^{\mathbb{1}\{v = v_j\}},$$
where $\mathbb{1}\{\cdot\}$ is the indicator function, which equals 1 if the expression in the braces is true and 0 otherwise. Thus, $\mathbb{1}\{v_i = v_j\}$ indicates whether the wind speed of sample $i$ falls in the $j$th discretized interval $I_j$.
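The equal-width discretization and the multinoulli wind speed model can be sketched as follows. Under a multinoulli model, the maximum-likelihood estimate of $\phi_j$ is the relative frequency of interval $I_j$. The wind speed range, the number of intervals and the function names are illustrative assumptions, and wind speeds equal to (or beyond) the range limits are placed in the end intervals.

```python
import numpy as np

def discretize_wind_speed(v, v_min, v_max, n_bins):
    """Equal-width discretisation into n_bins intervals on [v_min, v_max].
    Returns the 0-based interval index of each sample and the interval midpoints."""
    v = np.asarray(v, dtype=float)
    edges = np.linspace(v_min, v_max, n_bins + 1)
    idx = np.clip(np.digitize(v, edges) - 1, 0, n_bins - 1)  # out-of-range -> end bins
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    return idx, midpoints

def estimate_phi(idx, n_bins):
    """Maximum-likelihood multinoulli parameter: relative frequency of each interval."""
    counts = np.bincount(idx, minlength=n_bins)
    return counts / counts.sum()

idx, mids = discretize_wind_speed([3.2, 3.9, 4.1, 7.5, 11.9, 12.0], 3.0, 12.0, 9)
phi = estimate_phi(idx, 9)
print(idx, mids, phi)
```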
The turbine's derated power state cannot be observed directly, so it is a latent random variable. We assume that it also obeys a multinoulli distribution, $Z \sim \mathrm{Multinoulli}(\boldsymbol{\pi})$, where the $k$th element of $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_K)$ satisfies $\pi_k = P(Z = k) \ge 0$, and $\sum_{k=1}^{K} \pi_k = 1$.
We further assume that, for a given wind speed and derated power state, the turbine output power obeys a Gaussian distribution, $P \mid (V = v_j, Z = k) \sim \mathcal{N}(\mu_{jk}, \sigma_{jk}^2)$, where $\mu_{jk}$ and $\sigma_{jk}$ are the mean and standard deviation of the Gaussian distribution at wind speed $v_j$ and derated power state $k$, respectively. According to the power-limited state output assumption, the mean can be expressed as $\mu_{jk} = \alpha_k f(v_j)$ and the standard deviation as $\sigma_{jk} = \alpha_k \sigma_j$, where $\alpha_k$ is the derated power coefficient of the $k$th derated power state, taking values in $(0, 1]$ as in the output assumption, and $\sigma_j$ is the standard deviation of the wind speed in interval $I_j$.
Furthermore, the wind speed and the derated power state are assumed to be independent of each other; thus, $P(V = v_j, Z = k) = P(V = v_j)\,P(Z = k) = \phi_j \pi_k$.
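According to the above assumptions, the conditional probability and total probability formulas give the mixture probability density distribution
$$p(p, v) = \sum_{k=1}^{K} P(Z = k)\, P(V = v)\, p(p \mid V = v, Z = k) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J} \left[\phi_j\, \mathcal{N}\!\left(p;\, \alpha_k f(v_j),\, \alpha_k^2 \sigma_j^2\right)\right]^{\mathbb{1}\{v = v_j\}}. \tag{1}$$
Here, $p(p \mid V = v_j, Z = k)$ is the conditional probability density of the unit output power $P$ given that the power-limited state $Z$ is $k$ and the wind speed random variable $V$ takes the value $v_j$; it is calculated by Equation (2):
$$p(p \mid V = v_j, Z = k) = \frac{1}{\sqrt{2\pi}\, \alpha_k \sigma_j} \exp\!\left(-\frac{\left(p - \alpha_k f(v_j)\right)^2}{2 \alpha_k^2 \sigma_j^2}\right). \tag{2}$$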
Figure 2 shows the joint probability distribution of wind speed and power for a wind turbine operating in derated power states. Once the distribution parameters $\theta$ (the quantities $\pi_k$, $\phi_j$, $\alpha_k$ and $\sigma_j$ defined above) are obtained, every term in Equation (1) can be evaluated. As shown in Equation (3), Bayes' formula then gives the posterior probability of each sample $(p_i, v_i)$ under each derated power state $k$:
$$P(Z = k \mid p_i, v_i) = \frac{\pi_k\, p(p_i \mid v_i, Z = k)}{\sum_{l=1}^{K} \pi_l\, p(p_i \mid v_i, Z = l)}, \tag{3}$$
and the power-limited state to which sample $i$ belongs is $\hat{z}_i = \arg\max_k P(Z = k \mid p_i, v_i)$. Finally, for each power-limited state $k$, the samples whose posterior probability is lower than the threshold $\varepsilon$ are marked as outliers, which achieves outlier detection for wind turbine derated power operation data.
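The posterior computation of Equation (3) and the state assignment can be sketched as follows, assuming the parameters are already known. The text does not pin down which quantity is compared with the threshold $\varepsilon$; because the normalized posterior of Equation (3) can stay close to 1 for a point far from every curve, this sketch compares the Gaussian density of each sample under its assigned state with $\varepsilon$. That rule is our assumption, not the paper's stated one, and all names are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (np.sqrt(2.0 * np.pi) * std)

def classify_and_flag(p, j, f_mid, alpha, sigma, pi, eps):
    """p: powers (N,); j: interval index per sample (N,); f_mid: f at interval
    midpoints (J,); alpha, pi: (K,); sigma: (J,).
    Returns posteriors (N, K), assigned states and an outlier mask."""
    p, j, f_mid, alpha, sigma, pi = map(np.asarray, (p, j, f_mid, alpha, sigma, pi))
    mean = alpha[None, :] * f_mid[j][:, None]        # mu_jk = alpha_k f(v_j)
    std = alpha[None, :] * sigma[j][:, None]         # sigma_jk = alpha_k sigma_j
    dens = gaussian_pdf(p[:, None], mean, std)       # p(p_i | v_i, Z = k)
    joint = pi[None, :] * dens                       # phi_j cancels in Eq. (3)
    post = joint / joint.sum(axis=1, keepdims=True)  # Bayes' rule
    state = post.argmax(axis=1)
    rows = np.arange(len(p))
    outlier = dens[rows, state] < eps                # assumed outlier rule
    return post, state, outlier

post, state, outlier = classify_and_flag(
    p=[1000.0, 500.0, 760.0], j=[0, 0, 0], f_mid=[1000.0],
    alpha=[1.0, 0.5], sigma=[50.0], pi=[0.5, 0.5], eps=1e-5)
```

With this reading, $\varepsilon$ has units of inverse power, so it would need to be chosen relative to the scale of the data.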
To estimate the model parameters, we write the log-likelihood function, as shown in Equation (4):
$$\ell(\theta) = \sum_{i=1}^{N} \log p(p_i, v_i; \theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} p(p_i, v_i, Z = k; \theta). \tag{4}$$
Because $Z$ is a latent random variable that cannot be observed directly, it is difficult to solve for the parameters by maximizing the log-likelihood function (4) directly. We therefore turn to the EM algorithm.
2.3. EM Algorithm Estimation Model Parameters
Following the EM algorithm, instead of maximizing the log-likelihood function directly, we construct a lower bound of the log-likelihood function (E-step) and then maximize that lower bound (M-step). Repeating the E-step and M-step iteratively yields model parameters that (locally) maximize the likelihood function. We first introduce Jensen's inequality.
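Theorem. Let $g$ be a convex function and let $X$ be a random variable. Then
$$\mathbb{E}[g(X)] \ge g(\mathbb{E}[X]).$$
Moreover, if $g$ is strictly convex, then $\mathbb{E}[g(X)] = g(\mathbb{E}[X])$ holds if and only if $X = \mathbb{E}[X]$ with probability 1 (i.e., if $X$ is a constant). For a concave function such as $\log$, the inequality is reversed.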
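According to Jensen's inequality applied to the concave $\log$ function, the lower bound of the log-likelihood function can be obtained:
$$\ell(\theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} Q_i(k) \frac{p(p_i, v_i, Z = k; \theta)}{Q_i(k)} \ge \sum_{i=1}^{N} \sum_{k=1}^{K} Q_i(k) \log \frac{p(p_i, v_i, Z = k; \theta)}{Q_i(k)}, \tag{5}$$
where $\ell(\theta)$ is the log-likelihood function of the mixture model and $Q_i$ is a distribution over the derated power states ($Q_i(k) \ge 0$, $\sum_{k} Q_i(k) = 1$). The inequality holds with equality when $p(p_i, v_i, Z = k; \theta)/Q_i(k)$ is the same constant for every $k$. Combining this condition with $\sum_{k} Q_i(k) = 1$ gives
$$Q_i(k) = \frac{p(p_i, v_i, Z = k; \theta)}{\sum_{l=1}^{K} p(p_i, v_i, Z = l; \theta)}.$$
Let $w_{ik} = Q_i(k)$. According to Bayes' formula,
$$w_{ik} = P(Z = k \mid p_i, v_i; \theta) = \frac{\pi_k\, \mathcal{N}(p_i; \mu_{j_i k}, \sigma_{j_i k}^2)}{\sum_{l=1}^{K} \pi_l\, \mathcal{N}(p_i; \mu_{j_i l}, \sigma_{j_i l}^2)},$$
where $j_i$ is the index of the wind speed interval containing $v_i$. Taking the right-hand side of inequality (5) as the lower bound $\mathcal{L}(\theta)$ of the log-likelihood function, we can write
$$\mathcal{L}(\theta) = \sum_{i=1}^{N} \sum_{k=1}^{K} w_{ik} \left[\log \pi_k + \sum_{j=1}^{J} \mathbb{1}\{v_i = v_j\} \left(\log \phi_j - \log\left(\sqrt{2\pi}\, \sigma_{jk}\right) - \frac{(p_i - \mu_{jk})^2}{2 \sigma_{jk}^2}\right) - \log w_{ik}\right],$$
where $\mu_{jk} = \alpha_k f(v_j)$ and $\sigma_{jk} = \alpha_k \sigma_j$.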
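A quick numerical check of the bound in (5) follows: for an arbitrary distribution $Q_i$ the right-hand side stays below the log-likelihood, and the two become equal when $Q_i(k)$ is the posterior. The joint probabilities are random numbers standing in for $p(p_i, v_i, Z = k; \theta)$, an assumption made for illustration only.

```python
import numpy as np

def log_likelihood(log_joint):
    """sum_i log sum_k p(x_i, Z=k), computed with the log-sum-exp trick."""
    m = log_joint.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))))

def lower_bound(log_joint, q):
    """Right-hand side of (5): sum_i sum_k q_ik * (log p(x_i, Z=k) - log q_ik)."""
    return float(np.sum(q * (log_joint - np.log(q))))

rng = np.random.default_rng(0)
log_joint = np.log(rng.uniform(0.01, 1.0, size=(5, 3)))  # stand-in joint probabilities
q_post = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
q_post /= q_post.sum(axis=1, keepdims=True)               # posterior Q_i(k)
q_unif = np.full((5, 3), 1.0 / 3.0)                       # an arbitrary other choice

print(log_likelihood(log_joint), lower_bound(log_joint, q_post), lower_bound(log_joint, q_unif))
```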
After obtaining the lower bound of the log-likelihood function, we take its partial derivatives with respect to the model parameters, set them to zero, and solve for the parameters that maximize the lower bound.
For each of the first two parameters, the partial derivative of the lower bound with respect to that parameter is set to zero, which gives its update formula. The remaining two parameters are multinoulli parameters whose elements must sum to one, so for each of them we use the Lagrange multiplier method: we take the partial derivatives of the Lagrangian with respect to the parameter and the multiplier, set both derivatives to zero, and combine the two resulting equations to obtain the update formula.
We have now derived the two main steps of the EM algorithm, the E-step and the M-step. In the E-step, the posterior weight of each sample under each derated power state is calculated from the current parameters according to Formula (8). In the M-step, the parameters are updated according to Equations (11), (13), (16) and (19). The two steps are repeated alternately until convergence, which yields the model parameters.
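The EM loop can be sketched as follows. Equations (11), (13), (16) and (19) are not reproduced in this text, so the M-step below is our own derivation from the model of Section 2.2 ($\mu_{jk} = \alpha_k f(v_j)$, $\sigma_{jk} = \alpha_k \sigma_j$) and should not be read as the paper's update formulas. With $w_{ik}$ denoting the posterior weight of sample $i$ under state $k$, setting the derivative of the lower bound with respect to $\alpha_k$ to zero gives the quadratic $A\alpha_k^2 + B\alpha_k - C = 0$ with $A = \sum_i w_{ik}$, $B = \sum_i w_{ik} f(v_{j_i}) p_i / \sigma_{j_i}^2$ and $C = \sum_i w_{ik} p_i^2 / \sigma_{j_i}^2$, whose positive root is used; $\pi_k = \frac{1}{N}\sum_i w_{ik}$ and $\sigma_j^2 = \frac{1}{N_j}\sum_{i:\, v_i \in I_j} \sum_k w_{ik} (p_i - \alpha_k f(v_j))^2 / \alpha_k^2$ follow in closed form. Parameters are updated one block at a time (an ECM-style variant), the first state is held fixed as the normal state, and $\phi_j$ is omitted because its estimate, the interval frequency, does not change between iterations. Function and argument names are hypothetical.

```python
import numpy as np

def em_derated(p, j, f_mid, alpha, sigma, pi, n_iter=200, tol=1e-8):
    """EM sketch for the derated-power mixture.
    p: powers (N,); j: interval index per sample (N,); f_mid: f at midpoints (J,);
    alpha, pi: initial values (K,); sigma: initial values (J,).
    alpha[0] is treated as the normal state and is never updated."""
    j = np.asarray(j)
    p = np.asarray(p, dtype=float)
    f = np.asarray(f_mid, dtype=float)[j]              # f(v_j) of each sample's interval
    alpha, sigma, pi = (np.array(x, dtype=float) for x in (alpha, sigma, pi))
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log of pi_k * N(p_i; alpha_k f_j, (alpha_k sigma_j)^2). The log phi_j
        # term is identical for every k, so it cancels in w and only shifts ll.
        s = sigma[j]
        std = alpha[None, :] * s[:, None]
        log_joint = (np.log(pi)[None, :] - np.log(np.sqrt(2.0 * np.pi) * std)
                     - 0.5 * ((p[:, None] - alpha[None, :] * f[:, None]) / std) ** 2)
        m = log_joint.max(axis=1, keepdims=True)
        log_norm = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
        w = np.exp(log_joint - log_norm[:, None])      # w_ik, each row sums to 1
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step, one parameter block at a time
        pi = w.mean(axis=0)
        for k in range(1, len(alpha)):
            A = w[:, k].sum()
            B = np.sum(w[:, k] * f * p / s ** 2)
            C = np.sum(w[:, k] * p ** 2 / s ** 2)
            alpha[k] = (-B + np.sqrt(B ** 2 + 4.0 * A * C)) / (2.0 * A)  # positive root
        scaled_sq = ((p[:, None] - alpha[None, :] * f[:, None]) / alpha[None, :]) ** 2
        num = np.bincount(j, weights=(w * scaled_sq).sum(axis=1), minlength=len(sigma))
        cnt = np.bincount(j, minlength=len(sigma))
        sigma = np.where(cnt > 0, np.sqrt(num / np.maximum(cnt, 1)), sigma)  # keep empty bins
    return alpha, sigma, pi, w, ll
```

Each block update maximizes the lower bound with the other blocks held fixed, so the log-likelihood should not decrease from one iteration to the next; in practice the initial values from the k-means-style procedure of Section 2.4 would be passed in.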
The EM algorithm guarantees that the log-likelihood does not decrease from one iteration to the next, so we do not repeat the convergence proof here. However, EM converges only to a local optimum, and the result depends strongly on the initial values. We therefore describe how to identify the optimal number of derated power levels and how to initialize the other model parameters so that the EM algorithm converges quickly and stably.
2.4. Optimal Derated Power Level Identification
The parameters of the proposed wind turbine derated power operation data outlier detection model are the number of derated power levels $K$, the number of discretized wind speed intervals $J$, the derated power coefficients $\alpha_k$, the derated power state distribution parameter $\boldsymbol{\pi}$, the discretized wind speed distribution parameter $\boldsymbol{\phi}$, and the mean parameters $\mu_{jk}$ and variance parameters $\sigma_{jk}^2$ of the Gaussian distributions. We first determine the optimal number of derated power levels $K$.
Inspired by the k-means clustering algorithm, we first fix the number of derated power levels $K$ and the number of discretized wind speed intervals $J$ and randomly initialize the derated power coefficients $\alpha_1, \alpha_2, \ldots, \alpha_K$. For each sample $(p_i, v_i)$, we calculate its distance to each of the $K$ derated power output curves $p = \alpha_k f(v)$ and take the derated power state whose curve is nearest as the power limit state of the sample, denoted $z_i$. The distance from the sample to its assigned curve is recorded as $d_i$. For the samples assigned to the same power state $k$, we fit the derated power operation curve by least squares and update the corresponding derated power coefficient $\alpha_k$.
Let $K$ take the values 2 to 8 in turn, and repeat the above steps several times for each value. For each run, we calculate the average distance (loss) of all samples to their assigned curves, and we then average this loss over the repeated runs. Plotting the mean loss (vertical axis) against $K$ (horizontal axis) gives an elbow curve, and the value of $K$ at which the average loss shows the largest decrease is taken as the optimal number of derated power levels.
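The level-number search described above can be sketched as follows. Several details are our assumptions: the sample-to-curve distance is taken as the vertical gap $|p_i - \alpha_k f(v_i)|$; initial coefficients are drawn uniformly from (0.1, 1); a curve whose cluster becomes empty is re-seeded at the worst-fitted sample (a common k-means safeguard); $K = 1$ is added as a baseline; and "largest decrease" is read as the sharpest bend of the elbow curve, i.e. the largest second difference of the mean loss. Samples are assumed to satisfy $f(v_i) > 0$, and the function names are hypothetical.

```python
import numpy as np

def fit_curves(p, fv, K, rng, n_iter=50):
    """k-means-style fit of K curves p = alpha_k * f(v); returns the mean distance."""
    alpha = np.sort(rng.uniform(0.1, 1.0, K))            # random initial coefficients
    for _ in range(n_iter):
        dist = np.abs(p[:, None] - alpha[None, :] * fv[:, None])
        labels = dist.argmin(axis=1)                      # nearest derated power curve
        for k in range(K):
            mask = labels == k
            if mask.any():                                # least-squares slope through origin
                alpha[k] = np.sum(p[mask] * fv[mask]) / np.sum(fv[mask] ** 2)
            else:                                         # empty curve: re-seed at worst sample
                worst = dist.min(axis=1).argmax()
                alpha[k] = p[worst] / fv[worst]
    dist = np.abs(p[:, None] - alpha[None, :] * fv[:, None])
    return dist.min(axis=1).mean()

def choose_K(p, fv, K_max=8, n_restarts=10, seed=0):
    """Mean loss over restarts for K = 1..K_max; pick the K at the sharpest bend."""
    p, fv = np.asarray(p, dtype=float), np.asarray(fv, dtype=float)
    rng = np.random.default_rng(seed)
    Ks = np.arange(1, K_max + 1)
    losses = np.array([np.mean([fit_curves(p, fv, K, rng) for _ in range(n_restarts)])
                       for K in Ks])
    bend = losses[:-2] - 2.0 * losses[1:-1] + losses[2:]  # second difference at K = 2..K_max-1
    return int(Ks[1:-1][np.argmax(bend)]), losses
```

The fitted coefficients and assignments from the chosen $K$ are what would seed the EM algorithm in the next subsection.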
2.5. Model Parameter Initialization Method
After obtaining the optimal number of derated power levels $K$ and the derated power class of each sample, we can directly write the initialization expressions of the remaining model parameters, as shown in Formulas (20) to (23).
Assigning each sample to a derated power level gives an initial division of the derated power data, but this result is not optimal, and the k-means-style procedure cannot eliminate the outliers within each derated power level. However, the parameters obtained in this process can be used as initial values for the EM algorithm, so that the final model parameters converge stably to a consistent local optimum.
Based on the above modeling process and parameter initialization process, the algorithm for the anomaly detection method of wind turbine derated power operation data proposed in this paper is as follows (Algorithm 1):
Algorithm 1 Wind turbine derated power operation data outlier detection
Require: the sample set $D = \{(p_i, v_i)\}_{i=1}^{N}$ of turbine output power and nacelle wind speed pairs, the number of discretized wind speed intervals $J$, the wind turbine theoretical power curve function $f(\cdot)$, and the probability threshold $\varepsilon$.
1. Initialize the model parameters to obtain the optimal number of derated power levels $K$ and the initial values of the mixture probability model parameters.
2. While the parameters have not converged, do
3. E-step: for each sample $i$ and each derated power state $k$, calculate the posterior weight of the sample under that state.
4. M-step: update the parameters.
5–8. Update each parameter according to Equations (11), (13), (16) and (19), respectively.
9. End while.
10. Calculate the posterior probability of each sample under each derated power state using the converged parameters.
11. Determine the derated power level to which each sample belongs.
12. For each derated power level, mark the samples whose posterior probability is lower than the threshold $\varepsilon$ as outliers.