1. Introduction
As a commercially viable and environmentally sustainable energy source, wind energy has attracted sustained attention due to its abundance and high social benefits. The number of installed onshore and offshore wind farms is increasing to satisfy the rapidly growing demand. At the end of 2017, the cumulative installed wind-power capacity of China was approximately 188,392 MW, followed by the USA, Germany, India, and Spain [1]. However, operation and maintenance (O&M) costs are still high due to the harsh environment and the early deterioration of critical components. Condition monitoring (CM) is an effective tool commonly employed to improve the reliability of wind turbines (WTs) and reduce O&M costs. It allows maintenance to be scheduled based on the condition of WT components [2].
As one of the critical components of WTs, the pitch system has the highest failure rate and downtime according to the results of the Reliawind Project [3]. A pitch-system fault could cause severe downtime events, such as blade fracture, which greatly limit the generating efficiency of WTs and increase O&M costs. Therefore, the CM of pitch systems is crucial for the early detection of pitch faults, so as to reduce costs and ensure the reliability of WTs.
CM techniques for WTs have been widely studied based on various signals, such as vibration, electrical, temperature, acoustic emission, lubrication-oil parameters, and supervisory control and data acquisition (SCADA) signals, and they have proved effective in detecting some specific faults of WT components [4,5]. However, some of them are difficult to apply to pitch systems because of unsuitability and complexity. For example, vibration analysis requires multiple sensors to be installed and large volumes of data to be collected, which substantially increases CM costs. Currently, large WTs are equipped with a SCADA system, which provides a great deal of information on WT operating performance. Thus, CM based on SCADA data is cost-effective, as no additional sensors are needed, and a number of CM approaches using SCADA data have been researched in recent years [6,7]. Therefore, SCADA signals are used for pitch-system CM in this paper to reduce expenditure.
CM approaches can generally be categorized as analytical-model-based, knowledge-based, and data-driven methods [8]. The analytical-model-based method requires constructing an accurate mathematical model [9]. Because the pitch system has many parameters and the relationships between its components are complicated, it is hard to construct an accurate mechanism model for it. Furthermore, there is a risk of model failure under the impact of noise and environmental changes. Consequently, the analytical-model-based method has rarely been used to detect pitch-system faults, while the other two approaches have been presented over the last few years.
Unlike the analytical-model-based method, the knowledge-based approach does not require a quantitative mathematical model for fault detection. For instance, Chen et al. [10] presented an a priori knowledge-based adaptive neuro-fuzzy inference system to detect significant pitch faults automatically based on 10-min averaged SCADA data. It offers good interpretability because it allows experts to introduce a priori knowledge into the system model. Ran Bi et al. [11] applied normal behavior models based on performance curves to detect pitch faults. The normal behavior models were obtained from the WT technical specification. Therefore, no model training is needed, but knowledge of WT operation and control is required to identify abnormal operating conditions. The accuracy of knowledge-based approaches thus depends heavily on professional knowledge and long-term accumulated experience, the completeness of which is difficult to ensure.
Compared with the knowledge-based method, the data-driven approach does not require much prior knowledge or experience. Data-driven models are established by mining information from historical data. The approach is applicable to complex systems because of its better adaptive ability.
Jamie L et al. [12] proposed a data-driven expert system to detect pitch faults using the 10-min averaged SCADA data. The RIPPER algorithm was used to generate the rules for the diagnosis of pitch-fault classes, including “no pitch fault”, “potential pitch fault”, and “pitch fault established”. A classification accuracy of 85.5% was achieved in this system. For accurate classification, large quantities of data are required, especially historical fault data.
Andrew Kusiak et al. [13] developed a data-mining-based two-class classifier to monitor blade pitch performance using 1-s SCADA data. After comparing five data-mining algorithms, the genetic-programming algorithm was selected to predict blade-angle implausibility faults, with the best classification accuracy in the range of 68.7–87.4% for 13 time stamps. The maximum prediction horizon is 10 min, at which the accuracy is 68.7%; this leaves room for improvement for condition-based maintenance.
B. Chen et al. [14] applied an approach for WT SCADA alarm processing and diagnosis using an artificial neural network (ANN). The trained ANN model was used to identify whether any pitch-system fault had occurred. However, the method was based only on the SCADA alarm signals, so other signals with valuable information were potentially ignored. Moreover, the ANN model is a black box that depends entirely on a large number of training samples and consequently offers little insight into new problems.
These data-driven methods place high demands on training data, because the effectiveness of the classification models depends on large quantities of labeled data, including both historical fault data and healthy data. This paper therefore aims to construct the model using only healthy historical data, which avoids the difficulty of obtaining large volumes of historical fault data. Furthermore, it provides a strategy for the diagnosis of new WTs. When constructing the model, the key to success is selecting effective parameters. Status labels are commonly used as the model output to represent system health status, while the model input is usually determined based on these labels or on results reported in the literature [12,13]. However, a model constructed using only healthy historical data cannot use labels to describe system health status, because fault status labels are lacking. Therefore, this paper evaluates the operating conditions of the pitch system using a suitable indicator, and the related input parameters are determined with an appropriate feature-selection algorithm in order to improve model efficiency and accuracy. In addition, a control chart is used to identify abnormal conditions. The effectiveness of the model is demonstrated on specific pitch-system faults.
The rest of the paper is organized as follows: Section 2 gives a brief description of the WT pitch system and related SCADA parameters. The model structure and methodology are introduced in Section 3. The comparative analysis of different data-driven models and monitoring results are discussed in Section 4, and conclusions are drawn in Section 5.
3. Methodology
Because fault samples are not contained in the training dataset, status labels are not available for representing system health status. As a result, a suitable status indicator was applied for condition evaluation in this paper. Temperature is a good condition indicator due to its thermal inertia and strong anti-interference capacity, which means that wind-speed uncertainty and uncontrollable noise hardly disturb it [17]. In addition, a large amount of temperature data can be obtained directly from the SCADA system. Several temperature parameters of critical WT components have been successfully applied to CM as deterioration indicators. For example, in Reference [18], a generator-temperature trend-analysis method was proposed to monitor the generator condition of WTs. In Reference [19], a generator-bearing-temperature model was built using neural network algorithms to analyze the generator bearing failures of WTs. In Reference [20], gearbox faults were predicted using SCADA oil temperature. This paper develops a pitch-motor temperature model to monitor the condition of WT pitch systems, as pitch-system operation is mainly driven by the pitch motor. That is, the pitch-motor temperature is used as the target variable for monitoring the WT pitch system.
3.1. Modeling Process
The pitch-motor-temperature model for pitch-system CM is presented in Figure 2, which mainly includes the following steps:
1. Data Preprocessing
The SCADA system collects both normal and abnormal data. The training set for modeling includes only the healthy data collected during normal operation, excluding downtime, failure, and maintenance. It is obtained by removing the abnormal data according to the SCADA state code and the maintenance record. Data collected during limited-power periods, which are identified from the power limit value in the SCADA system, are also excluded.
2. Feature Selection
In order to improve model performance, it is critical to select proper parameters to construct an efficient CM system for the pitch system of WTs. The related parameters used for the model input could be selected from Table 1 with a valid feature-selection algorithm.
3. Model Training and Test
Based on the selected features, the model could be established using healthy historical SCADA data, which represent the relationship between the features and the condition indicator. Therefore, the target variable value could be estimated with the trained model and be close to the real value if the pitch system is normal. Otherwise, there is a possible failure in the pitch system.
4. Residual Analysis
In the interest of a better interpretation for the model results, a proper residual-analysis method should be applied to identify the abnormal condition. This paper employs the control chart to get the bound. This means the pitch system might be abnormal when the residual error exceeds the bound.
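Step 1 above (data preprocessing) can be illustrated with a minimal sketch. The record fields, the `"RUN"` state code, and the rule that a power limit below rated power marks a limited-power period are all assumptions made for illustration; the source does not specify the SCADA schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ScadaRecord:
    """One 10-min SCADA record; every field name here is hypothetical."""
    state_code: str          # turbine state reported by the SCADA system
    in_maintenance: bool     # True if the maintenance record covers this timestamp
    power_limit_kw: float    # active power limit configured in SCADA
    rated_power_kw: float    # rated power of the turbine
    pitch_motor_temp: float  # condition indicator, in degrees Celsius


NORMAL_STATE = "RUN"  # assumed state code for normal power production


def is_healthy(rec: ScadaRecord) -> bool:
    """Keep a record only if it reflects normal, unrestricted operation."""
    if rec.state_code != NORMAL_STATE:
        return False  # downtime or failure state
    if rec.in_maintenance:
        return False  # covered by a maintenance record
    if rec.power_limit_kw < rec.rated_power_kw:
        return False  # limited-power period (assumed rule)
    return True


def select_training_set(records: Iterable[ScadaRecord]) -> List[ScadaRecord]:
    """Return the healthy subset used to train the normal-behavior model."""
    return [r for r in records if is_healthy(r)]
```

Keeping the three exclusion rules in one predicate makes it straightforward to apply the same filter to any new turbine once its state codes are known.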
3.2. Feature Selection
To construct an efficient data-driven model, it is crucial to select related parameters as the input of the pitch-motor-temperature model. A feature-selection algorithm can be applied to evaluate the importance of the initial parameters and reduce modeling complexity. As an important preprocessing technique, feature selection is widely used in data mining, and its methods can be divided into wrapper, embedded, and filter methods [21]. The wrapper model determines the optimal feature subset according to an objective function, which is usually defined as a performance index such as the mean square error (MSE), while the filter model ranks the features on the basis of relevance. In the embedded model, the features are selected automatically by the specific machine-learning model. In this paper, three parameter-selection algorithms were each applied to acquire the optimal feature subset. The SCADA data of a turbine collected over four months (from 1 September 2016 to 31 December 2016) are used for feature selection. During this period, there was no failure.
1. Sequential Forward Selection
As one of the wrapper methods, sequential forward selection requires a proper learning algorithm and evaluation criterion to determine the feature subset. Here, support vector regression (SVR) is applied for modeling, and the MSE between the model output and the monitored pitch-motor temperature is used to assess the importance of the parameters. The procedure begins with an empty set. At each step, the feature that yields the lowest MSE is added to the set, until the MSE no longer decreases. Finally, battery-cabinet temperature, blade pitch angle, hub temperature, ambient temperature, and pitch-motor current constitute the feature set that minimizes the MSE, as shown in Table 2.
2. Gradient-Boosting Regression Trees (GBRT)
As a flexible nonparametric machine-learning approach, GBRT can be used to train a regression model. This paper mainly takes advantage of the automatic feature selection of the GBRT algorithm as a typical embedded model. Unlike mutual information, this method also accounts for the correlation between the input variables. The relative importance of the features is given in Table 3. The MSE was found to be smallest when the first five parameters are selected as the model input: hub temperature, battery-cabinet temperature, ambient temperature, pitch-motor current, and blade pitch angle.
3. Mutual Information
As one of the filter models, this algorithm measures the mutual dependence between two variables. In this research, it was used to calculate the correlation between the pitch-motor temperature and the related parameters in Table 1. The results are shown in Table 4.
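To make the filter criterion concrete, the sketch below estimates the mutual information between two sampled variables with an equal-width histogram. The estimator, the bin count, and the function name are assumptions for illustration; the source does not state how its mutual-information values were computed.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence[float], y: Sequence[float], bins: int = 10) -> float:
    """Histogram estimate of I(X;Y) in nats for two equal-length sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")

    def discretize(values: Sequence[float]) -> List[int]:
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0] * len(values)  # a constant signal falls into one bin
        width = (hi - lo) / bins
        # Clamp so that the maximum value lands in the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)

    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log(p(i,j) / (p(i) * p(j))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[i] * py[j]))
    return mi
```

A candidate parameter that carries no information about the target scores zero, while a parameter that determines the target scores the entropy of the binned target, so ranking parameters by this value gives the relevance ordering used by a filter method.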
The first two algorithms give the same result: the pitch-motor-temperature model performs best with the following five parameters as the input variables: battery-cabinet temperature, hub temperature, ambient temperature, pitch-motor current, and blade pitch angle. In addition, these five parameters have a higher correlation with the pitch-motor temperature according to the mutual-information results. Input variables should be highly correlated with the output and weakly correlated with each other; pitch-inverter temperature is therefore excluded as a redundant variable, because it is highly correlated with the battery-cabinet temperature. These five parameters are thus selected to construct the regression model. In this case, the R-squared is 0.9056, the MSE is 0.5692, and the mean absolute error (MAE) is 0.5370.
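The sequential forward selection described above can be sketched as a greedy loop. In this sketch the learner is abstracted behind a hypothetical `mse_of` callback; in the paper's setting it would train an SVR on the given feature subset and return the validation MSE, which is not reproduced here.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def sequential_forward_selection(
    candidates: Sequence[str],
    mse_of: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedily add the feature that lowers the MSE most; stop when none does."""
    selected: List[str] = []
    remaining = list(candidates)
    best_mse = float("inf")
    while remaining:
        # Score every one-feature extension of the current subset.
        trial_mse, trial_feat = min(
            (mse_of(frozenset(selected + [f])), f) for f in remaining
        )
        if trial_mse >= best_mse:
            break  # no candidate reduces the MSE any further
        selected.append(trial_feat)
        remaining.remove(trial_feat)
        best_mse = trial_mse
    return selected, best_mse


# Toy scorer (hypothetical numbers): each useful feature removes a fixed
# share of the error, and every added feature costs a small penalty.
GAIN = {"hub_temp": 0.5, "ambient_temp": 0.3, "motor_current": 0.1}


def toy_mse(subset: FrozenSet[str]) -> float:
    return 1.0 - sum(GAIN.get(f, 0.0) for f in subset) + 0.01 * len(subset)


features, final_mse = sequential_forward_selection(
    ["hub_temp", "ambient_temp", "motor_current", "wind_speed"], toy_mse
)
```

With the toy scorer, the loop adds the three informative features in order of their gain and stops before adding `wind_speed`, which only incurs the penalty.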
3.3. Model Construction
With the five selected parameters as the input variables, a pitch-motor-temperature model is established using the healthy historical SCADA data. Multiple data-driven algorithms could be used to model the pitch-motor temperature, such as ridge regression (Ridge) [22], least absolute shrinkage and selection operator (Lasso) [23], k-Nearest Neighbors (kNN) [24], random forest (RF) [25], ANN [26], and SVR [27]. Because SVR copes well with nonlinearity and local minima when only finite samples are available [27], this paper applies SVR to establish the model.
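SVR principles are as follows [27].
Given a training dataset $\{(x_i, y_i)\}_{i=1}^{n}$, $x_i \in \mathbb{R}^{m}$ is the i-th m-dimensional input vector and $y_i$ is the corresponding target of $x_i$. The SVR model can be represented as Equation (1):
$$f(x) = w^{T}\varphi(x) + b \tag{1}$$
where $f(x)$ is the model output, $w$ is the normal vector, $b$ is the bias parameter, and $\varphi(x)$ denotes a fixed feature-space transformation, which transforms the nonlinear problems in the input space into the linear problems in the feature space.
SVR assumes the absolute difference between the target $y_i$ and the model output $f(x_i)$ is less than $\varepsilon$. Thus, the SVR optimization-objective function is given by Equation (2):
$$\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n} L_{\varepsilon}\big(f(x_i) - y_i\big) \tag{2}$$
where $C$ is the regularization parameter, and $L_{\varepsilon}(r) = \max(0, |r| - \varepsilon)$ is the $\varepsilon$-insensitive loss function. $\xi_i$ and $\xi_i^{*}$ are defined as two slack variables. Then, the optimization problem can be rewritten as Equation (3):
$$\min_{w,\,b,\,\xi,\,\xi^{*}}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\big(\xi_i + \xi_i^{*}\big) \tag{3}$$
subject to:
$$y_i - f(x_i) \le \varepsilon + \xi_i, \qquad f(x_i) - y_i \le \varepsilon + \xi_i^{*} \tag{4}$$
$$\xi_i \ge 0, \qquad \xi_i^{*} \ge 0, \qquad i = 1, \dots, n \tag{5}$$
This problem can be solved by introducing Lagrange multipliers and optimizing the Lagrange function. Therefore, the dual problem of Equation (3) can be obtained as Equation (6):
$$\max_{\alpha,\,\alpha^{*}}\ -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^{*}) \tag{6}$$
subject to:
$$\sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C \tag{7}$$
where $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers, and $K(x_i, x_j) = \varphi(x_i)^{T}\varphi(x_j)$ stands for the kernel function.
Eventually, the regression function obtained is shown in Equation (8) by solving the equations above:
$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*})K(x_i, x) + b \tag{8}$$
where the kernel function $K(x_i, x)$ is used to map the training set to a high-dimensional space. Thus, SVR is capable of solving both linear and nonlinear regression problems. In this paper, the SVR parameters are determined by grid search.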
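As a small illustration of the ε-insensitive loss and of evaluating the regression function in Equation (8), the sketch below assumes a Gaussian (RBF) kernel with a hypothetical width parameter `gamma`; the source does not state which kernel was used. It only evaluates an already trained model and does not solve the dual problem.

```python
import math
from typing import Sequence


def epsilon_insensitive_loss(residual: float, epsilon: float) -> float:
    """Zero inside the epsilon tube, linear in |residual| outside it."""
    return max(0.0, abs(residual) - epsilon)


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||u - v||^2); an assumed kernel choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_predict(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    dual_coefs: Sequence[float],
    bias: float,
    gamma: float,
) -> float:
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i* of support vector i.
    """
    return bias + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
```

Only training points with a nonzero coefficient difference contribute to the sum, which is why the trained model can be stored as a set of support vectors rather than the full training set.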
3.4. Residual Analysis
In order to determine whether a sustained change in the condition of the pitch system has occurred, an exponentially weighted moving average (EWMA) control chart was applied to identify the abnormal condition of the pitch system. The EWMA can be viewed as a weighted average of all past and current observations. It has a smoothing effect on the uncontrollable noise. In addition, it is very effective against small process shifts. Consequently, the EWMA control chart is typically applied with individual observations [28]. In this paper, an EWMA-based residual analysis method is adopted for the CM of the WT pitch system. The points of the EWMA control chart that exceed the control limits are viewed as out of control, which could indicate impending pitch-system failure.
The raw SCADA data were collected at 10-min intervals. In order to avoid the misidentification of fault events, the moving-average approach is used prior to the control chart to reduce the data noise. In this research, the window length is set to 6.
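The smoothing step can be sketched as follows. The source gives only the window length of 6 (one hour of 10-min samples); the choice of a trailing window that emits a value only once it is full is an assumption of this sketch.

```python
from collections import deque
from typing import Deque, Iterable, List


def moving_average(values: Iterable[float], window: int = 6) -> List[float]:
    """Trailing moving average; emits one value per full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    buf: Deque[float] = deque(maxlen=window)
    out: List[float] = []
    total = 0.0
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        if len(buf) == window:
            out.append(total / window)
    return out
```

For example, `moving_average([1, 2, 3, 4, 5, 6, 7])` averages the samples 1 to 6 and then 2 to 7, so a single noisy 10-min sample shifts each smoothed value by only one sixth of its deviation.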
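The EWMA statistic is defined as Equation (9):
$$z_i = \lambda e_i + (1 - \lambda) z_{i-1} \tag{9}$$
where $\lambda$ ($0 < \lambda \le 1$) is the smoothing parameter, and $e_i$ is the i-th residual error, defined as the deviation between the measured pitch-motor temperature and the model output value. The starting value $z_0$ is the process target $\mu_0$, which is set to the average of historical residual errors $\bar{e}$ in the normal operations. If the observations $e_i$ are independent random variables with variance $\sigma^{2}$, the variance of $z_i$ can be calculated as Equation (10):
$$\sigma_{z_i}^{2} = \sigma^{2}\,\frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2i}\right] \tag{10}$$
Thus, the EWMA control chart can be constructed by plotting $z_i$ against $i$. The center line and control limits can be represented as Equations (11)–(13):
$$UCL = \mu_0 + L\sigma\sqrt{\frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2i}\right]} \tag{11}$$
$$CL = \mu_0 \tag{12}$$
$$LCL = \mu_0 - L\sigma\sqrt{\frac{\lambda}{2 - \lambda}\left[1 - (1 - \lambda)^{2i}\right]} \tag{13}$$
where $L$ stands for the width of the control limits. The performance of the EWMA control chart mainly depends on the reasonable selection of design parameters, including $\lambda$ and $L$. It was found that 0.05, 0.1, and 0.2 were commonly used values of $\lambda$ in practical application, and $L = 3$ works reasonably well, which corresponds to the usual $3\sigma$ limits [28]. In this study, $\lambda$ and $L$ are set to 0.2 and 3, respectively.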
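A minimal sketch of the EWMA chart follows, using the standard recursion z_i = λ·e_i + (1 − λ)·z_{i−1} with time-varying limits μ0 ± L·σ·sqrt(λ/(2 − λ)·[1 − (1 − λ)^(2i)]). The function and argument names are hypothetical, and `mu0` and `sigma` are assumed to have been estimated beforehand from residuals in normal operation.

```python
import math
from typing import List, Sequence, Tuple


def ewma_chart(
    residuals: Sequence[float],
    mu0: float,
    sigma: float,
    lam: float = 0.2,
    width: float = 3.0,
) -> List[Tuple[float, float, float, bool]]:
    """Return (z_i, LCL_i, UCL_i, out_of_control) for i = 1..n."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    rows: List[Tuple[float, float, float, bool]] = []
    z = mu0  # the starting value z_0 is the process target
    for i, e in enumerate(residuals, start=1):
        z = lam * e + (1.0 - lam) * z
        # Half-width of the limits; it grows with i towards
        # width * sigma * sqrt(lam / (2 - lam)).
        half = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        )
        lcl, ucl = mu0 - half, mu0 + half
        rows.append((z, lcl, ucl, z < lcl or z > ucl))
    return rows


# Hypothetical example: a sustained residual shift of two standard deviations.
rows = ewma_chart([2.0] * 20, mu0=0.0, sigma=1.0)
flags = [out for (_, _, _, out) in rows]
```

With λ = 0.2 and L = 3, the steady-state half-width is 3σ·sqrt(0.2/1.8) = σ, so the limits settle at μ0 ± σ. In the example, the statistic first crosses the upper limit at the third sample, which illustrates how the chart accumulates evidence of a sustained shift rather than reacting to a single point.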
5. Conclusions
A data-driven approach for the CM of WT pitch systems using SCADA data has been presented. The pitch-motor temperature was used as the status indicator to monitor the condition of pitch systems. A regression model was then established to represent the relationship, in normal operation, between the pitch-motor temperature and the selected features: battery-cabinet temperature, hub temperature, ambient temperature, pitch-motor current, and blade pitch angle, which were determined based on three feature-selection algorithms. An SVR algorithm was employed to model the pitch-motor temperature and compared with five other data-driven algorithms: Ridge, Lasso, kNN, RF, and ANN. As a result, SVR was chosen to construct the data-driven model because of its excellent generalization ability. It was found that there is little difference between the model output and the measured value in normal operation, and that the residual errors are normally distributed. Therefore, an abnormal monitored temperature indicates a potential pitch-system fault. With the moving-average approach and the EWMA control chart, the abnormal condition could be identified clearly once five residual-based statistics exceed the control limits. The results demonstrate that the proposed approach detects pitch-system failures earlier than the SCADA alarm system. Moreover, it is more effective and applicable than the classification models.
Compared with the knowledge-based method, the proposed approach does not need much professional knowledge or experience. Compared with the analytical-model-based method, it also does not require prior knowledge of physical characteristics for modeling. The proposed approach can construct the regression model automatically from the healthy historical SCADA data, which provides a strategy for the CM of new WTs. In conclusion, the proposed method is suitable for industrial applications due to its great adaptive ability and low cost.
This paper focuses only on the CM of WT pitch systems, because only a small number of pitch-system fault samples were available. The SCADA system records various alarms and fault logs, but it is difficult to determine fault types accurately from the SCADA system alone. Therefore, an advanced fault-identification system should be developed. Further studies on fault isolation will be carried out in future work by collecting sufficient fault samples and investigating identification approaches.