1. Introduction
The share of wind energy among energy sources has been steadily growing in recent years [
1]. This growth is driven by the fact that wind energy is a renewable energy source, which helps lower dependency on fossil fuels [
2]. In cold climates, a challenge facing the use of wind energy involves the reduction or loss of wind-generated power due to icing on the blades of wind turbines during winter seasons [
3,
4,
Ice accretion on the blades causes rotational imbalance, which can damage the turbine structure [
6]. Also, ice accretion causes power reduction or downtimes and hence loss of revenue [
7].
Most of the works in the open literature on turbine icing using only SCADA data have focused on ice detection. Detection addresses the question “Is there ice now?” on a turbine. The purpose of our work is to answer the question “Will there be ice at a future time?” That is, we seek to predict the future occurrence of icing on a turbine. Predicting icing before it occurs would allow taking actions to minimize power reduction or loss. Examples of mitigating actions that can be taken once icing is predicted include changing the turbine control settings, turning on heaters for those turbines that are equipped with them, or applying anti-icing fluids to the blades. In addition, the prediction of icing would allow one to conduct a cost/benefit analysis by comparing the cost of a mitigating action with the revenue losses due to icing.
A number of articles have already appeared in the literature addressing the prediction of icing on wind turbines. The approaches presented can be grouped into two main categories. The first approach involves the installation or utilization of special-purpose sensors to detect icing on wind turbines. Examples of such sensors include thermal imaging cameras and ultrasonic sensors [
8,
9,
10]. The installation, operation, and maintenance costs associated with these sensors can pose a challenge for their use. The second approach involves the use of Supervisory Control and Data Acquisition (SCADA) data [
11] that are already available as part of a wind turbine infrastructure. In this approach, a data-driven method is used to develop a prediction model based on past values of the variables or features in SCADA data, as well as past values of meteorological data when available [
12]. In [
13], Kreutz et al. presented a method for predicting icing on wind turbines based on SCADA and meteorological data using a deep neural network. In [
14], Tao et al. discussed a method for icing prediction on wind turbines by combining a Convolutional Neural Network (CNN) with a Gated Recurrent Unit (GRU). Zhang et al. [
15] utilized Federated Learning (FL) to predict icing on wind turbines. The framework in this reference does not require the installation of any new sensors on the turbine.
The choice of variables or features from the SCADA data used by a prediction model can greatly impact the outcome of the prediction. A data preprocessing step, dubbed feature selection, is often carried out to select those variables or features that are informative for icing conditions. As discussed in [
16], the selection of informative features leads to lower training time as well as higher prediction accuracy. Furthermore, since, in practice, icing events occur less frequently than normal operation, there exists a data imbalance between the number of samples associated with icing and normal conditions. Such a data imbalance, if present during training, can adversely impact the prediction outcome [
17]. In [
18], Liu et al. studied the impact of data imbalance by examining different methods to establish balanced numbers of data samples for icing and normal conditions. In [
19], it was shown that this data preprocessing improved prediction accuracy.
The main contribution of this paper is the development of a data processing framework for the prediction of icing in wind turbines. This framework covers key aspects of SCADA data processing: the labeling of SCADA data for icing conditions when labels are not available, the selection of informative features or variables in SCADA data for icing prediction, and the use of a Temporal Convolutional Network (TCN), a deep neural network, for icing prediction.
Temporal Convolutional Networks (TCNs) are increasingly being used in various prediction tasks and have been shown, e.g., [
20,
21], to be more effective in capturing long-range dependencies in data sequences compared with RNNs (recurrent neural networks) and LSTMs (Long Short-Term Memory networks). This is due to their use of dilated convolutions, which enable them to capture a larger receptive field with fewer parameters as compared with the above predictive models. Furthermore, they exhibit more stable and consistent training compared with these models, which suffer from the vanishing gradient problem. Another predictor architecture superior to RNNs and LSTMs is the Bi-LSTM [
22]. This network architecture has recently been proposed by [
23,
24] to predict the flow past a wind turbine rotor.
It is worth restating that the main contribution of our paper is the development of a data-driven framework to predict icing conditions on a wind turbine based on historical SCADA data without installing any new sensors on the turbine. As far as the predictor part of this framework is concerned, we utilized a TCN in this paper, noting that in prior works [
20,
21], a better performance was shown for TCN compared with LSTM. It is very much possible to use other predictors such as Bi-LSTM, as discussed in [
23,
24].
The manuscript is organized as follows.
Section 2 describes the dataset used and the preprocessing steps.
Section 3 provides an overview of our TCN prediction model, followed by the icing prediction results and their discussion in
Section 4. The processes for conditioning the data and training the predictor, as well as the results, are summarized in
Section 5, which also describes potential future work.
2. Dataset and Data Preprocessing
The SCADA dataset considered, as well as its labeling of icing conditions, is described in this section. This dataset is from a wind turbine located in the northern part of the US. There are 11 SCADA data variables or features in the dataset, which are listed in
Table 1, covering the time period from 1 January 2023 to 1 July 2023. A binary feature denoting the operation state of the turbine is also listed in this table; it is used for data preprocessing. Consecutive samples in the dataset are 10 min apart. There are a total of 26,209 data samples in the dataset. Missing values are filled by linear interpolation between adjacent nonmissing data samples. In addition, weather data listed in
Table 2 for the same time period and location are obtained from the VisualCrossing weather database [
25]. The rated power of this wind turbine is 2 MW, with a cut-in wind speed of 4 m/s, a rated wind speed of 12 m/s, and a cut-out wind speed of 25 m/s.
Data samples in this dataset or other SCADA datasets may not be labeled for icing conditions. Three rules are considered here to label the data samples as “ice” or “normal” based on the icing conditions described in [
26,
27]. These rules are listed in
Table 3, which reflect temperature, relative humidity, and actual power. If all the conditions in these rules are met for a data sample, that data sample is labeled as “ice” and is represented by “1” as the output of our prediction model. Otherwise, it is labeled as “normal” and is represented by “0” as the output of our prediction model.
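As an illustration, the labeling step can be sketched as a simple conjunction of rule checks. The threshold values below are placeholders, not the actual values of Table 3, which depend on the turbine and site:

```python
# Illustrative sketch of the rule-based labeling step. The thresholds below
# (temperature, relative humidity, power deficit) are hypothetical placeholders,
# NOT the actual values from Table 3.
T_MAX_C = 0.0          # icing requires temperature at or below freezing (assumed)
RH_MIN_PCT = 85.0      # icing requires high relative humidity (assumed)
POWER_RATIO_MAX = 0.8  # icing requires actual power well below expected (assumed)

def label_sample(temp_c, rel_humidity, actual_power, expected_power):
    """Return 1 ("ice") if all rule conditions hold, else 0 ("normal")."""
    if expected_power <= 0:
        return 0
    cold = temp_c <= T_MAX_C
    humid = rel_humidity >= RH_MIN_PCT
    power_deficit = actual_power / expected_power <= POWER_RATIO_MAX
    return int(cold and humid and power_deficit)
```

A sample is labeled “ice” only when every rule fires; any single unmet condition yields the “normal” label.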
Different window sizes were examined, and the one with the best match between the cases of normal state (Oper_State, obtained from
Table 1) and labels created from
Table 3 (normal, represented by “0”) was selected. After the ice labeling process, the ratio of ice to normal data samples was 1 to 6. To balance the numbers of ice and normal samples, the normal data samples were downsampled within the time series, selecting the most representative samples in a deterministic way.
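One deterministic balancing scheme consistent with this description is to keep all ice samples and an evenly strided subset of the normal samples; the exact selection rule used in the paper may differ. A minimal numpy sketch:

```python
import numpy as np

def downsample_majority(labels):
    """Return sorted indices that keep all minority ("ice", 1) samples and an
    evenly strided, equal-sized subset of majority ("normal", 0) samples.
    A deterministic strided scheme; the paper's exact rule may differ."""
    labels = np.asarray(labels)
    ice_idx = np.flatnonzero(labels == 1)
    normal_idx = np.flatnonzero(labels == 0)
    # evenly spaced positions across the normal samples (deterministic)
    keep = np.linspace(0, len(normal_idx) - 1, num=len(ice_idx)).round().astype(int)
    return np.sort(np.concatenate([ice_idx, normal_idx[keep]]))
```

Because the kept normal samples are spread evenly across the time series, seasonal variation in the normal class is preserved rather than concentrated in one period.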
Feature selection is a data preprocessing step to reduce the number of features or variables used in a prediction model. A lower number of features reduces the complexity as well as the training time of the prediction model. This step involves removing redundant or noninformative features for icing detection and prediction. In this work, feature selection is performed by computing the Fisher Distance [
28], which is a separation measure between the distributions of a feature for the normal and icing conditions or classes as per Equation (
1):
FD = (μ_ice − μ_normal)² / (σ²_ice + σ²_normal)
where μ_ice and σ_ice denote the mean and standard deviation of the “ice” class distribution, and μ_normal and σ_normal denote the mean and standard deviation of the “normal” class distribution. Fisher Distance is used to rank order all the features from the most significant to the least significant in terms of their ability to distinguish between icing and normal conditions. There are a total of 13 features or variables: 11 from the SCADA dataset and 2 from the weather database. All 13 Fisher Distances are computed and ranked in descending order; see
Table 4. A feature with a higher Fisher Distance provides a higher separation between the “ice” and “normal” distributions or classes. As an example, the distributions for the feature or variable with the highest Fisher Distance, i.e., generated power, between “ice” and “normal” classes is shown in
Figure 1. Note the concentration of the icing case around the low-power region for this particular wind turbine. All samples shown for the “normal” and “ice” labels are flagged as “normal operating condition” (see
Table 1).
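The Fisher Distance ranking of Table 4 can be reproduced with a few lines of code. The sketch below assumes the features are given as columns of a matrix and uses population variances:

```python
import numpy as np

def fisher_distance(feature, labels):
    """Fisher Distance between the "ice" (label 1) and "normal" (label 0)
    distributions of one feature, per Equation (1)."""
    feature, labels = np.asarray(feature, float), np.asarray(labels)
    ice, normal = feature[labels == 1], feature[labels == 0]
    num = (ice.mean() - normal.mean()) ** 2
    den = ice.var() + normal.var()  # population variances (sigma^2)
    return num / den

def rank_features(X, labels, names):
    """Rank feature names by descending Fisher Distance (cf. Table 4)."""
    scores = [fisher_distance(X[:, j], labels) for j in range(X.shape[1])]
    return sorted(zip(names, scores), key=lambda t: -t[1])
```

A feature whose class means are well separated relative to the class spreads scores high and lands at the top of the ranking.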
In order to select the most informative features for icing, first the feature with the highest Fisher Distance in
Table 4, namely Power_Avg, is used as the only input to our prediction model discussed in
Section 3. Then, the two highest Fisher Distance features, that is, Power_Avg and Temp_Gear, are used as the inputs. The number of features is steadily increased according to their Fisher Distance ranking until all 13 features are used as the inputs to our prediction model. Each time, the model prediction accuracy (described later in
Section 4) is computed; see
Table 5. As seen from this table, the highest accuracy is obtained when the top 12 features are used as the inputs to our prediction model. The last feature does not provide any added benefit, and thus, it is not considered as an input to our prediction model. This feature selection strategy is general and can be applied to any other SCADA dataset.
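This incremental selection strategy can be sketched as follows, where `evaluate` stands in for training and scoring the prediction model (the TCN in our case) on a given feature subset:

```python
def select_top_k(ranked_names, evaluate):
    """Incrementally grow the input set in Fisher-Distance order (cf. Table 5)
    and keep the prefix of features with the highest accuracy.

    `evaluate` is any callable mapping a feature-name list to a validation
    accuracy; here it is a placeholder for training and testing the model."""
    best_k, best_acc = 1, -1.0
    for k in range(1, len(ranked_names) + 1):
        acc = evaluate(ranked_names[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked_names[:best_k], best_acc
```

With the accuracies of Table 5, this loop would stop growing the usable prefix once an added feature no longer improves accuracy, which is how the 13th feature is dropped.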
3. Prediction Model
An overview block diagram of our prediction framework is shown in
Figure 2. Our prediction framework includes three data processing blocks: Data Preprocessing, Prediction Model Components, and Prediction Evaluation. The steps of the Data Preprocessing block were described in the previous section. In this section, the Prediction Model Components and Prediction Evaluation blocks are described.
A Temporal Convolutional Network (TCN) is used in this work as our prediction model, noting that in other applications, this network has been shown to be more effective than other similar prediction models [
29,
30]. An illustration of the TCN architecture is shown in
Figure 3. This architecture consists of convolution layers, ReLU (Rectified Linear Unit) layers, and dropout layers. The convolution layer takes in a two-dimensional input tensor or matrix of size W by F, where W denotes the input window size and F denotes the number of features. To obtain the output corresponding to each feature sequence or time series, the dot product of the input sequence and a kernel vector of size 3 is computed. This kernel size was found to produce the highest accuracy compared with other kernel sizes. A zero padding of size 2 is performed to ensure that the output sequence for each feature has the same length as the input sequence. Prediction is conducted by considering the present and previous values of the features. The number of previous feature values is shown in
Figure 4, which illustrates both the input window size and the number of features used. The experiments reported in
Section 4 showed that an input window size of 21 data samples (corresponding to 3.5 h) achieved the highest accuracy. No major changes were seen beyond this input window size for the dataset considered. The output is a binary integer number, indicating the prediction outcome (1 if the output is “ice” and 0 if the output is “normal”). After the convolution layer, a ReLU layer is used as the activation function, which addresses the vanishing gradient problem [
31]. A dropout layer with a commonly used dropout probability of 0.2 is then applied for regularization to prevent overfitting during the training session [
32].
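To illustrate the convolution with kernel size 3 described above, the following minimal numpy sketch computes a dilated causal 1-D convolution for a single feature sequence. It is a sketch only; an actual TCN stacks such layers with increasing dilation, ReLU activations, and dropout:

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution: output[t] depends only on x[t] and
    earlier samples. For kernel size 3, left zero-padding of 2 * dilation
    keeps the output as long as the input (padding 2 when dilation == 1,
    as in the text)."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, float)])
    return np.array([sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])
```

Doubling the dilation at each layer is what lets a TCN cover a long input window with few parameters, since the receptive field grows exponentially with depth.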
The training session starts by ensuring that all 12 selected features have the same dynamic range; they are normalized using the standard min–max normalization approach expressed by Equation (
2):
x′ = (x − x_min) / (x_max − x_min)
where x denotes a feature value, and x_min and x_max denote its minimum and maximum values in the dataset, respectively. After this normalization, the values of all the features used for model training lie between 0 and 1.
During the training session, for each epoch, or one complete pass through the dataset, the TCN model takes in a batch of training input samples and generates an output label, which is then compared with the actual icing label. The error between the predicted and actual icing labels is computed using the Binary Cross-Entropy loss function, and the kernel weights are updated to minimize this error via the Adam optimizer with a learning rate of 0.001. The TCN model parameters used in the training session are shown in
Table 6.
There are 288 predictors, with prediction horizons ranging from 10 min ahead to 2 days ahead. For example, if one desires to predict icing one hour into the future (or 6 samples ahead), each training output sample needs to be assigned an icing label corresponding to 1 h ahead, as illustrated in
Figure 5. Our prediction framework is designed in such a way that different prediction horizons can be selected or specified by the user.
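The alignment of input windows with future icing labels can be sketched as follows; the default values correspond to the ones discussed in the text (a 21-sample window and a 6-sample horizon, i.e., 1 h ahead at a 10 min sampling period):

```python
import numpy as np

def make_training_pairs(features, labels, window=21, horizon=6):
    """Pair each input window of `window` past samples with the icing label
    `horizon` steps after the window's last sample (e.g. horizon=6 -> 1 h
    ahead at a 10 min sampling period), as illustrated in Figure 5."""
    features, labels = np.asarray(features), np.asarray(labels)
    X, y = [], []
    for end in range(window, len(features) - horizon + 1):
        X.append(features[end - window:end])   # past window ending at end-1
        y.append(labels[end - 1 + horizon])    # label `horizon` steps ahead
    return np.array(X), np.array(y)
```

Selecting a different prediction horizon only changes the label offset, which is how the framework supports user-specified horizons.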
4. Results and Discussion
In this section, two widely used measures derived from the so-called confusion matrix are used to evaluate the prediction model introduced in
Section 3. The measures are accuracy and F1 score, which are reported for different prediction horizons. The confusion matrix for the problem under consideration comprises two classes, “ice” and “normal”; see
Table 7. This matrix consists of four entries or metrics, True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN), and their definitions are stated in
Table 8. The confusion matrix assesses the performance of a prediction model more completely than accuracy alone. In our case, a binary prediction is performed by classifying each test sample as “ice” or “normal”. The test samples correspond to 20% of the dataset samples. They were randomly selected and have no overlap with the training samples, i.e., they represent fresh data used to evaluate prediction performance.
Based on TP, TN, FP, and FN, the two commonly used performance metrics of accuracy and F1 score are computed as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Accuracy indicates the proportion of correctly classified samples out of all the samples. Precision indicates the proportion of true positive predictions among all positive predictions; it reflects the ability of the model to avoid false positives. Recall indicates the proportion of true positive predictions among all actual positive samples; it reflects the ability of the model to identify all positive samples.
We use two metrics, accuracy and F1 score, as they are complementary. Accuracy indicates how often predictions are correct, as reflected in the diagonal elements of the confusion matrix. The F1 score provides a measure of incorrect predictions, or type I and type II errors, as reflected in the off-diagonal elements of the confusion matrix. Type I errors denote false positives, where normal samples are incorrectly classified as ice, and type II errors denote false negatives, where ice samples are incorrectly classified as normal. Since the F1 score is the harmonic mean of precision and recall, fewer incorrectly predicted samples result in higher F1 score values.
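Given the four confusion-matrix entries, the two metrics reduce to a few lines:

```python
def prediction_metrics(tp, tn, fp, fn):
    """Accuracy and F1 score from the confusion-matrix entries of
    Tables 7 and 8."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # ability to avoid false positives
    recall = tp / (tp + fn)      # ability to find all ice samples
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1
```
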
Our prediction framework was used to conduct icing prediction for up to 48 h or two days into the future, corresponding to 288 prediction units, with one unit denoting 10 min. The duration of the predicted icing time series covered the winter season time frame from January to April.
Figure 6 shows a plot of the accuracy and F1 scores obtained for all prediction horizons up to two days. The best accuracy (87.2%) and F1 score (0.67) are obtained for the 10 min prediction horizon, which is expected, since this is the smallest horizon. To provide a better idea of performance across several prediction horizons, note that the average accuracy for prediction horizons from 10 min (zero hour) to 1 day (dashed vertical line in
Figure 6) is 81.6%. The average accuracy drops to 77.6% when calculated across all prediction horizons up to 2 days. Similarly, the F1 score averages are 0.6 and 0.5 for up to 1 day and 2 days, respectively. Additional averages can be found in
Table 9, which shows the average accuracy and F1 scores for four prediction horizon ranges: 10 min (0 h) to 12 h, 12 h to 24 h, 24 h to 36 h, and 36 h to 48 h.
With a prediction horizon of ten minutes into the future (see Predictor1 in
Figure 6), the best prediction accuracy of 87.2% was obtained among all the cases. The confusion matrix for this case is shown in
Table 10. Also, to visualize the predictions as a function of time, a portion of the predicted icing time series for this case is shown in
Figure 7. In this figure, blue regions correspond to actual icing labels, while green regions correspond to predicted icing labels generated by our prediction model. Most of the icing durations were correctly predicted by our model, while some false alarms were also generated. Recall that the predicted time series corresponds to the one-step ahead (10 min) predictor.
It should be noted that since different datasets are used in different papers, and these datasets are not publicly available, it is not possible to compare the performance of our prediction model with previously published prediction models. Nevertheless, the prediction accuracies obtained in this work are comparable to those previously reported. Furthermore, it is worth pointing out that when the true ice labels are unknown, a portion of the prediction errors could be due to the ice labeling procedure and not to the prediction algorithm. What distinguishes our results from previously reported work is that we have explained all the steps involved in developing a framework for icing prediction from SCADA data.
It takes 0.33 milliseconds on a PC equipped with an Nvidia RTX A4000 GPU for the TCN predictor network to provide one output sample at a specified future time in response to a time window of past SCADA data measurements. This means that the prediction can be conducted in real time for SCADA data measurements captured every 10 min.