1. Introduction
Currently, the urgent pursuit of low-carbon economy and advances of wind power technologies are strongly driving the rapid sustainable transition in the energy sector as well as the wind power development across the world [
1,
2]. Due to the intermittent and stochastic nature, the power generation of wind farms needs to be accurately predicted at different time-scales (e.g., daily, hourly or even less) and timely reported to the dispatch center. Accurate short-term wind power forecasting can improve wind power utilization, increase system reliability, reduce operational cost nd allow flexible dispatch strategies [
3]. In particular, the ultra-short-term wind power prediction for a couple of hours ahead with the small prediction cycle (e.g., 15 min) can provide strong support for frequency modulation and spinning reserve optimization. However, such ultra-short-term prediction is often a non-trivial task that demands advanced algorithmic solutions and tools with sufficient accuracy and acceptable computational complexity.
In the literature, much research effort has been made to address the prediction issues from different aspects, e.g., electricity pricing [
4,
5] and power generation [
6], in energy systems. The available prediction tools and solutions of ultra-short-term prediction can be categorized into three classes: physical-based methods, statistical-based methods, and machine learning-based methods. The physical-based methods focus on the spatial and temporal factors in a full fluid-dynamics atmosphere model [
7], which can generally perform well in longer horizons. The statistical-based methods, e.g., autoregressive model (AR), moving-average (MA), auto-regressive integrated moving average (ARIMA) and Kalman filters, carry out statistical analysis based on the historical data to identify the internal regularity and the tendency of variations to deduce the prediction results. The machine learning-based methods include several supervised learning models, e.g., artificial neural networks (ANNs) [
8], support vector machines (SVMs) [
9], adaptive neuro-fuzzy inference systems (ANFISs) [
10] and Gaussian processes (GPs) [
11]. In addition, some combined or hybrid methods have been proposed aim to improve the prediction performance. For example, a hybrid intelligent method was proposed in [
3] using multiple support vector regression (SVR) models with the parameters estimated based on an enhanced harmony search (EHS) algorithm. In [
12], a hybrid forecasting model based on K-means clustering and an a priori algorithm was developed for short-term wind speed prediction and the prediction errors are corrected by associated rules.
In recent years, the feature selection and analysis of machine-learning based models for wind power prediction have received much attention. The prediction accuracy can be improved by exploring the information obtained from of the historical wind speed and generation time series data. In [
13], the features are firstly extracted from the historical power generation data, and then the dataset is split into subsets based on the stationary patterns. In [
14], a novel decomposition approach to fully consider the chaotic nature of wind power time series was proposed. The time series data were separated into different frequency characteristics using ensemble empirical mode decomposition (EMD) before carrying out the chaotic time series analysis and singular spectrum analysis (SSA). A forecasting model combining a support vector machine (SVM) optimized by a genetic algorithm and feature selection based on the phase space reconstruction was presented in [
15] for short-term wind speed prediction. In addition, numerical weather prediction (NWP) data (including wind speed, direction, temperature, humidity, atmospheric pressure, etc.) were adopted as the input variables for supervised models. In [
16], output data from different NWP models were used and the data with the minimum training error were selected to be used in both ANN and SVM models. Afterwards, the forecasting errors were corrected based on the model output statistics (MOS). The study in [
17] proposed a wind power prediction model based on the composite covariance function considering the joint effects among features of NWP data. In [
18], a data-driven feature extraction approach was developed to utilize unlabeled NWP data which can be used in the supervised forecasting models.
It should be noted that, as the dimensionality of input variable increases, irrelevant and redundant variables can deteriorate the prediction performance. Therefore, the selection of appropriate variables through dimensionality reduction approaches is needed [
19], and two dimensionality reduction techniques, feature selection (variable selection) and feature extraction (feature transform), are often used [
20]. The latter can produce a new feature space through mapping the original features into lower dimensional ones, e.g., singular value decomposition (SVD), principal component analysis (PCA) and locally linear embedding (LLE). However, such feature extraction methods may often lose physical properties of the original variables, and also difficult to be interpreted. In time series data analysis, the variable selection that filters out some meaningless attributes without any transformation can be more attractive than feature extraction. The variable selection methods can select the compact subset from the original dataset to improve the performance and interpretability of the prediction model.
In general, three types of feature selection methods, filter methods, wrapper methods and embedded methods, are considered [
21]. The filter method ranks the input variables with a correlation or mutual information (MI) criteria and selects the variable with the highest ranking. It is effectively a pre-processing step before the development of the predictive model [
21]. The wrapper method identifies and evaluates the subsets of input variables based on the accuracy contributed by the given output variable. Similarly, the embedded one builds the close-loop search into a classifier construction in the training process [
22]. The wrapper and embedded techniques can generally achieve better accuracy than the filter technique, but the filter method is less likely to lead to over-fitting and with less computational complexity [
23].
It should be highlighted that the existing solutions either have not been able to fully consider the available information (e.g., historical data and NWP) or select the appropriate variables for improving the prediction accuracy. To the author’s best knowledge, the technical challenge of ultra-short-term prediction of wind power generation remains and the hybrid approach based on data mining and machine learning techniques has not been thoroughly exploited.
This paper attempts to address the challenge of ultra-short-term power generation prediction in wind farms. The main technical contributions made in this work are summarized as follows: a novel hybrid algorithmic solution is presented which considers both historical generation data and NWP data, and selects the optimal combination of input features using a filter method for different prediction steps. The proposed algorithmic solution was extensively evaluated and validated through case studies of two realistic wind farms. The basic idea behind the proposed prediction solution is illustrated in
Figure 1. The long-term nonlinear dynamic characteristics of wind power time series data are extracted and recovered by using phase space reconstruction in C_C method. Afterwards, the most appropriate input variables are selected from the reconstructed phase and NWP features with respect to different forecasting steps based on the minimal redundancy and maximal relevance criterion. Finally, an adaptive neuro-fuzzy inference system based algorithmic solution with heuristically optimized parameters is adopted by using the selected input variables to produce the prediction results.
The rest of this paper is organized as follows:
Section 2 describes the input variable selection (IVS) solution based on (PSR) technique and mRMR criterion.
Section 3 presents the framework of the proposed hybrid intelligence prediction model.
Section 4 carries out the case studies and presents a set of key numerical results. Finally, the conclusions are given in
Section 5.
2. Input Variable Selection (IVS)
Due to the chaotic property of the weather system, the evolution of dynamic characteristics has initial sensitivity. The correlation between historical time series and future wind power generation will decay rapidly with the increase of forecasting time step, and even deteriorate the prediction performance. Thus, the adoption of both historical generation data and the numerical weather prediction (NWP) data as the input variables is required. NWP aims to predict the variation of weather through solving the process of thermodynamics and hydrodynamics equations based on the meteorological data of the system. However, it can only provide short-term surface wind and other weather characteristics prognoses roughly, which are not entirely adequate for specific local conditions [
24]. NWP data are often adopted to provide ancillary information, e.g., wind speed, wind direction, temperature, humidity and air pressure, for prediction. For different step-ahead prediction, these input variables can have different impacts on the forecasting targets. Therefore, a two-stage input variable selection is used in the proposed prediction solution. Firstly, the initial variables can be selected from the historical series data through the phase space reconstruction technique (PSRT), and the initial variables further can be combined with NWP information as the candidate inputs. Secondly, the optimal input variables are filtered based on mRMR criterion.
2.1. The Initial Input Variable Selection of Historical Series Using PSRT
The Lyapunov exponent can be used to prove that the wind power generation time series has chaotic characteristics. Therefore, the nonlinear dynamic characteristics of wind power time series can be extracted and recovered by using phase space reconstruction theory. In [
25], the time-delay technique was used to reconstruct a finite dimensional phase space of sampled system’s time evolution. In the time delay coordinate reconstruction, it is not only very important but also difficult to choose an appropriate time delay
τ and a good embedding dimension
m since real datasets are finite and noisy. There are currently two different viewpoints for the estimation of the aforementioned parameters. One holds that they are irrelevant and should be chosen independently. To choose time delay
τ, one can use methods including autocorrelation function, multiple-autocorrelation, mutual information, and so on. G-P algorithm or False Nearest Neighbor can be used to find the embedding dimension
m. However, it is suggested that the delay time and embedding dimension are dependent mutually. The delay time window
should be estimated for the choice of
m and
τ. The delay time window
can be estimated using C_C method [
26]. The C_C method was used to determine the optimal input variables form the historical generation with reduced computational complexity and enhanced efficiency [
27].
The phase space reconstruction is an efficient tool to analyze the dynamic pattern of a chaotic time series data. The delay-coordinate method was presented by Takens et al. to perform the phase space reconstruction. The time series
can be reconstructed into a multi-dimensional phase space
to represent the dynamic system, according to:
where
,
,
is the embedding dimension, and
is the delay time. In this study, the C_C method [
26] was constructed via two correlation integrals, developed to reconstruct the given time series
to simplify the candidate input forms.
As suggested in [
26], the correlation integral for the embedded time series is defined as:
where
,
represents the infinite norm,
denotes the index lag, and
is the Heaviside function,
. The correlation integral is a cumulative distribution function, which denotes the probability with the distance less than search radius
between any two points in the phase space. To study the nonlinear dependence and eliminate spurious temporal correlations, the given time series
must be divided into
disjoint sub-sequences. The disjoint time series can be expressed as (3):
where
is the length of subseries,
, and
denotes reserving integer of the value.
Let us construct a statistic function
as the serial correlation of a nonlinear time series, which is a dimensionless measure of nonlinear dependence. For general
t in the above disjoint time series expressed in Equation (3),
is defined as Equation (4):
Finally, when
, the following can be obtained:
If the time series data follows an independent and identical distribution,
is equal to zero constantly for fixed value
m,
t and
. However, as the real dataset is finite and the components of series are correlated,
is non-zero [
26]. The maximum deviation of
for all radius
r can be defined as (6):
Here,
N,
m and
r can be estimated based on the Brock–Dechert–Scheinkman (BDS) statistics as
,
,
, respectively, where
denotes the standard deviation of the time series. Then, Equations (7)–(9) can be obtained as:
The optimal delay time is determined when the value of first reaches zero or when reaches the first minimum point. The optimal embedding window corresponds to the global minimum point of . Furthermore, the embedding dimension m can be obtained by the following formula: .
Once the reconstruction parameters, delay time and embedding dimension m, are determined by C_C method, the initial variables related to the historical sequence are obtained as , where is the power generation values observed at current time t. The forecasting weather variables provided by NWP can be written as , where , , , and in turn represent wind speed, wind direction, temperature, humidity and air pressure at the predicted time . Therefore, candidate input variables set can be combined into , and the input set dimension is .
2.2. The Optimal Selection of Candidate Input Variables Using mRMR-Criterion Ranking
The input variables of historical generation and NWP are selected using the minimal redundancy maximal relevance (mRMR) criterion based on mutual information (MI) [
28]. As MI can measure both the linear and nonlinear dependency between variables, it has been applied for correlation measurement and variable selection [
29]. The basic idea of variable selection algorithm based on MI is to select the best subset
from the original dataset
by maximizing the joint MI between
and target output
, namely
. In the literature, many MI-based variable selection algorithms are available, e.g., mutual information feature selection (MIFS) [
30], mutual information feature selection under uniform information distribution (MIFS-U) [
31], the minimal redundancy maximal relevance (mRMR) [
28], and normalized mutual information feature selection (NMIFS) [
32]. In this work, the mutual-information-based mRMR criterion is adopted to find the compact and informative input space. The mRMR technique aims to find a subset of candidate variables with maximal dependency (with respect to the target to be predicted) as well as minimal redundancy (between the variables in the subset). The concept of MI is based on entropy that is described as follows.
The entropy of a random variable indicates the required average amount of information to describe the random variable [
33] which has been adopted in many studies [
34,
35]. The entropy of a discrete random variable
is denoted by
, where
refers to the possible values that
X can take for discrete variable or the possible value range for continuous variable.
is defined as:
where
is the probability mass function.
For any two random variables,
and
, the joint entropy is defined as:
where
is the joint probability mass function of
and
. The conditional entropy of
given
is defined as:
The conditional entropy is the amount of uncertainty left in
when a variable
is introduced, so it is less than or equal to the entropy of both variables. The conditional entropy is equal to the entropy if, and only if, the two variables are independent. Mutual Information (MI) is the amount of information that both variables share, and is defined as:
MI can be expressed as the amount of information provided by variable
, which reduces the uncertainty of variable
. MI is zero if the random variables are statistically independent. MI is symmetric, so:
The minimal-redundancy-maximal-relevance criterion (mRMR) aims to identify a compact subset of informative input variables by simultaneously considering the maximum relevance scheme and minimum redundancy. The simple combination of individually informative input variables does not necessarily achieve a good forecasting performance [
28]. Therefore, both the informativeness of individual input variables and redundancy between them should be considered. Thus, the informativeness score for individual variable
based mRMR criterion is given by:
where
is the total candidate variables,
is the selected input variables, and
is the number of variables. The mutual information
is used between the target
and the candidate input variable
to measure the strength of
relative to the forecasting process. The goal of second item is to optimally select those variables that reveal a minimum of resemblance or redundancy between them, thus making the selected set more representative or informative of the whole set.
In the implementation, a stepwise search, incremental forward selection (IFS) method, is used to select input variables according to Equation (15), in which greater
scores indicate more promising input variable
. In the first step, Max-Relevance score of all candidate input variable is calculated, where the variable with the maximum
score is determined as the first promising input variable:
The rest of the variables are selected step by step according to the criterion in Equation (15). In step
(
), it is supposed that an input variable subset
, composed of
promising variables,
, that has been selected from previous step (e.g., step
). The
promising variable can be selected from
at step
by optimizing the following condition:
As one input variable represents one step forward, the promising variables can be incrementally retrieved until step
where a total of input variables
are selected. The variables are also ranked in selection process and the informativeness score (InSc) in
step is given by:
where
is the most promising variable to be selected in
step according to Equation (17). Thus, the priority of candidate input variables can be ranked through the mRMR-based incremental forward selection (IFS) method. The cumulative amount of
, denoted as
, indicates the information contributed from the newly added variable. The final optimal number of input variables can be determined according to the change trend of
.
4. Performance Assessment and Numerical Result
To extensively verify the reliability of the proposed prediction solution, the performance assessment based on the data collected in two real wind farms with different locations and seasons are carried out: Anzishan wind farm (capacity of 45 MW, Henan, China, hub height of 70 m) in June 2017, and Xuqiao wind farm (capacity of 94 MW, Anhui, China, hub height of 90 m) in December 2017. The reference height of these two wind farms is 10 m. The power generation (directly measured) and NWP data with a 15-min interval (e.g., 96 observation values per day) are obtained from these two wind farms. The three-month data (about 8640 observation values) prior to the test month, March to May 2017 and September to November 2017, are used in the training process for Anzishan farm and Xuqiao farm, respectively. The proposed hybrid solution is implemented in the MATLAB (version 8.3, MathWorks, Natick, MA, USA) programming environment.
4.1. Input Variable Selection Process
The time series data of power generation in the previous month are used to determine the phase-space reconstruction parameters through the C_C method. With different spatial and temporal characteristics, the parameters of each wind farm will be determined, respectively. For Anzishan farm, it can be observed in
Figure 3 that
have the first local minimum when
t is equal to 26, thus the optimal delay time
is set to 26. The global minimum point of
corresponds to the optimal embedding window
as shown in the figure. Thus, the embedding dimension
m can be calculated as 3. As for Xuqiao farm, the optimal delay time
is also equal to 26 as shown in
Figure 4. The embedding window
is observed as 134, thus the optimal embedding dimension
m is 6.
The historical time series data can be selected as
m-dimension input variables using the phase-space reconstruction. For the current time
t, the initial input variables related to the historical data are
, where
is the power generation values observed at current time
t. The
m-dimensional input variables are in turn denoted as the set:
. In the same way, the forecasting weather variables provided by NWP data include wind speed, trigonometric wind direction, temperature, humidity, and atmospheric pressure.
is in turn denoted as the set:
. Based on the obtained reconstruction parameters by the C_C method, the candidate input variables set of two wind farms can be obtained as follows:
Afterwards, the candidate variables are sorted through mRMR criterion to rank the predictive strength of each input variable. By observing the variation trend of the cumulative amount of , that is, when the curve no longer increases or grows very slowly, the optimal number of input variables is selected. As the input variable selection is carried out adaptively in the prediction model, the selected input variables may vary for different prediction steps.
Based on proposed mRMR-criterion filter solution,
Figure 5 shows the changes of
curves of Anzishan and Xuqiao wind farms in different predicted steps (e.g., 1 h, 2 h, 3 h and 4 h ahead), respectively. The number of input variables is selected accordingly as shown in
Figure 5. In this study, when the
curve reaches the maxima, it is believed that adding the following variables at the back of extreme point will not add more useful information. Therefore, the candidate variables before the cumulative information maximum are regarded as the optimal or near optimal input variables to the prediction model. The detailed ranking of candidate variables and the number of input choices in multi-step ahead prediction for each farm are shown in
Table 1 and
Table 2, respectively, where the selected input variables are highlighted in shade. It can be seen that the variables ranking in the step of proximity is similar and asymptotic. For different step-ahead prediction, the proposed hybrid solution with ranking the predictive strength candidate variables can select a compact subset of informative input variables based on the max-relevance and min-redundancy, which can effectively reduce the input dimension and interference information.
4.2. Case Study and Numerical Result
For different prediction steps, the corresponding selected samples are used to train the aforementioned hybrid prediction solution in
Section 3. Here, the main parameters and settings for training the optimal ANFIS are summarized in
Table 3. After determining the ANFIS parameters by training samples, the multi-step ahead prediction results in the test month can be obtained.
Figure 6 presents the prediction result of 150 h from the test month for two wind farms, respectively.
To evaluate the effectiveness and accuracy of the prediction solution, the performance metric in terms of normalized root mean squared error (nRMSE) and normalized mean absolute error (nMAE) are adopted [
40], as given in Equations (27) and (28), respectively. In general, smaller values of these measures indicate that the corresponding solution has better prediction performance.
where
is the predicted power of time point
,
is the measured mean power of time point
,
N is the number of prediction samples, and
is the operating capacity of time point
.
To evaluate the proposed mRMR-criterion input variable selection (IVS) solution based prediction approach, mRMR-IVS model, a detailed comparative study was conducted for multi-step ahead wind power generation prediction. Two IVS based prediction approaches, the phase space reconstruction based (PSR-IVS) model and principal component analysis based (PCA-IVS) model, were selected as the comparative benchmark. For PSR-IVS based prediction model, the input variables included the NWP variables and the phase space reconstruction variables of the historical time series. The reconstruction was determined based on C_C method as well. In this model, the input variables were the ones which are candidates in proposed mRMR-IVS solution, without further exquisite selection. For PCA-IVS based model, the input variables were transformed from the combination of NWP variables and a time series of 2 h, which use the principal component analysis (PCA) technique [
41] to map the dataset from the original space to the principal component space. In this model, the original attribute variables are automatically reduced to appropriate input variables and the independent principal components can well maintain the key characteristics of the original variables. After selecting or extracting the input variables, all the IVS-based approaches used the hybrid model introduced in
Section 3 to carry out the predication.
Two error criteria, nRMSE and nMAE, were used to assess the performance of all considered prediction models.
Table 4 and
Table 5 show the comparison of multi-step ahead prediction performance of different models for two wind farms, respectively. As shown in
Table 4 and
Table 5, the proposed model demonstrates the smallest error over all 16 steps of prediction compared with both benchmark models. In addition, compared with the principal component analysis-based model (PCA-IVS), the phase space reconstruction based (PSR-IVS) model performs better in short step prediction, but has larger error in long prediction period. Due to the sophisticated and targeted input variable selection, the proposed model has a better performance than the benchmark models in the overall multi-step ahead prediction.
To present the comparison more intuitively,
Figure 7 and
Figure 8 show the broken line in two wind farms of three cases based on the values of nRMSE and nMAE of different prediction steps. The curve trend shows that the error level rises with the increase of prediction step, meeting the objective expectation. It is shown that there are fluctuations in the curve of two benchmark models, especially in the intermediate prediction period, in that the selection of input variables cannot adapt to each prediction step. In the proposed model, the error trend increased smoothly, indicating that the mRMR-IVS based model can automatically select the optimal or nearly optimal input variables for different prediction steps to reduce the error. It means that the proposed hybrid solution can select suitable input variables effectively in different geographical environments and seasons, showing better adaptability and robustness.
To further validate the integral time-scales of ultra-short-term wind power prediction, which is generally 0–4 h in the future with a resolution of 15 min, 100 integrated time series with consecutive 4-h prediction were randomly selected during the test-month forecasting process to calculate the performance metric in statistics. The nRMSE and nMAE indicators of each integrated time series were calculated. Then, the probability of different error levels could be obtained according to the statistics of appearing frequency in these 100 results. The average errors and the probability distribution of typical error levels of Anzishan and Xuqiao wind farms are shown in
Table 6 and
Table 7, respectively. For both cases, the proposed mRMR-IVS based solution demonstrates the minimum mean errors. The results given in
Table 6 and
Table 7 verify that the proposed model is more competitive for most probability distributions at different error levels. This clearly confirms that the proposed solution can provide improved prediction performance than the comparison benchmarks.