A Hybrid Neural Network Model for Short-Term Wind Speed Forecasting

Lv, Shengxiang; Wang, Lin; Wang, Sirui

doi:10.3390/en16041841


by Shengxiang Lv ¹, Lin Wang ² and Sirui Wang ²,*

¹ School of Business Administration, Guangdong University of Finance & Economics, Guangzhou 510320, China

² School of Management, Huazhong University of Science and Technology, Wuhan 430074, China

* Author to whom correspondence should be addressed.

Energies 2023, 16(4), 1841; https://doi.org/10.3390/en16041841

Submission received: 10 January 2023 / Revised: 5 February 2023 / Accepted: 6 February 2023 / Published: 13 February 2023

(This article belongs to the Section A3: Wind, Wave and Tidal Energy)


Abstract

This study proposes an effective wind speed forecasting model that combines a data processing strategy, a neural network predictor, and a parameter optimization method. (a) Variational mode decomposition (VMD) is adopted to decompose the wind speed data into multiple subseries, each containing unique local characteristics, and all the subseries are converted into two-dimensional samples. (b) A gated recurrent unit (GRU) network is trained on the obtained samples to predict future wind speed. (c) A grid search with rolling cross-validation (GSRCV) is designed to jointly optimize the key parameters of VMD and GRU. To evaluate the effectiveness of the proposed VMD-GRU-GSRCV model, comparative experiments are conducted on hourly wind speed data collected from the National Renewable Energy Laboratory. Numerical results show that the root mean square error, mean absolute error, mean absolute percentage error, and symmetric mean absolute percentage error of the proposed model reach 0.2047, 0.1435, 3.77%, and 3.74%, respectively, outperforming the benchmark predictions based on popular parameter optimization methods, data processing techniques, and hybrid neural network forecasting models.

Keywords:

wind speed forecasting; variational mode decomposition; gated recurrent unit; grid search; rolling cross-validation

1. Introduction

With the development of the economy and society, the demand for energy resources is increasing. Because traditional fossil energy sources are highly polluting and non-renewable, developing clean and renewable energy has become an inevitable trend. Being renewable and environmentally friendly, wind energy has been developed extensively around the world. For example, the Global Wind Energy Council (GWEC) reported that the cumulative installed capacity of global wind energy reached 837 GW in 2021, a year-on-year increase of 12.4%. Asia-Pacific had the largest share of newly installed wind power capacity in the world in 2021 (59%), followed by Europe (19%) and North America (14%) [1]. Wind speed is a key determinant of wind power generation, and the effective generation and management of wind power rely on high-quality wind speed forecasting. According to the forecast horizon, wind speed forecasting can be classified into ultra-short-term (a few seconds to 30 min), short-term (30 min–6 h), medium-term (6 h–1 day), and long-term (more than one day) forecasting. This study focuses on hourly short-term wind speed forecasting, whose findings can benefit wind power load dispatch planning and grid energy allocation.

1.1. Related Works

The main wind speed forecasting methods include physical models, spatiotemporal correlation (STC) models, statistical models, and machine learning models. Physical models predict wind speed using numerical weather prediction systems that consider multiple meteorological factors [2]. Physical models are more accurate for medium- and long-term wind speed forecasting but perform poorly on short-term tasks because of information latency [3]. STC models consider the cross-correlations among wind speeds at different sites [4]; they require massive amounts of data, and the corresponding computation cost is high. Statistical models fit linear relations between historical records and future wind speed but can perform poorly when the wind speed series exhibits strong nonlinear features [5]. Many studies have reported the effectiveness of machine learning models for predicting wind speed series with complex characteristics [6,7,8]. Among machine learning models, artificial neural networks (ANNs) are popular and have been widely studied [9,10,11]; commonly used ANN models include the back propagation neural network (BPNN) [12], extreme learning machine (ELM) [13], and autoencoder [14]. Among neural networks, recurrent neural networks (RNNs) are specifically designed for sequential data such as time series: their recurrent structure captures the temporal dependency among input variables step by step. The Elman neural network (ENN) is a classic RNN model that has been adopted in many wind speed forecasting studies [15,16]. Several extensions of RNN models have also been studied. For example, Nasiri and Ebadzadeh [17] designed a multi-functional recurrent fuzzy neural network to improve the forecasting performance on chaotic time series, and Liu et al. [18] used a stacked RNN model for wind power forecasting.
Among recent developments, long short-term memory (LSTM) and the gated recurrent unit (GRU) are two popular RNN variants that reduce the risk of vanishing gradients through gating mechanisms. López and Arboleya [19] combined linear regression, LSTM, and a dynamic neural network to forecast short-term wind speed. Memarzadeh and Keynia [20] also used LSTM to predict wind speed, tuning its structure with the crow search algorithm. Wu et al. [21] adopted a GRU model to predict future wind speed based on multiple meteorological factors. Sun et al. [22] combined LSTM, GRU, and a deep belief network to enhance short-term wind speed forecasting. Several advanced network structures have also been considered. Niu et al. [23] used an attention-based GRU (AGRU) model to enhance wind power forecasting, where the attention mechanism highlights the critical factors that influence the fluctuation of wind power. Tian et al. [24] also combined the attention mechanism with GRU for wind power forecasting. Joseph et al. [25] adopted a bidirectional network structure to improve real-time wind speed forecasting, and Yu et al. [26] used a bidirectional GRU (BiGRU) model to predict future wind power.

Because of the volatility of the weather system, many data processing techniques have been combined with neural networks to extract the local characteristics of wind speed and further improve prediction accuracy. Most of these hybrid models follow the “Decomposition-Forecasting-Integrating” (Dec-Fore-Int) paradigm: the original wind speed is decomposed into a set of subseries, each subseries is processed by an individual forecasting model, and the final forecast is formed by integrating all the subseries predictions. For example, Wang et al. [27] used empirical mode decomposition (EMD) to decompose wind speed data into multiple intrinsic mode functions (IMFs); each IMF was predicted with an ENN model, and the forecast of the original wind speed was the sum of the IMF forecasts. Santhosh et al. [28] used ensemble EMD (EEMD) to decompose wind speed series and processed each IMF with a deep Boltzmann machine. He et al. [29] used EEMD and a quantile regression neural network to realize the Dec-Fore-Int framework. Variational mode decomposition (VMD) is a more recent decomposition method that has been broadly applied in wind speed forecasting [30,31]. Li et al. [32] adopted an LSTM model to process the subseries obtained from VMD, and Hu et al. [33] also used VMD to extract the local features of wind speed data. Wang et al. [34] likewise adopted the Dec-Fore-Int paradigm, using VMD to decompose the wind speed and a modified Reformer model to predict each mode. Many scholars regard the optimization of hybrid neural network parameters as a key issue. For example, Zhang et al. [35] determined the number of VMD modes according to the center frequencies of the decomposed subseries and used the backtracking search algorithm to optimize the structure of an ELM model. Similarly, Wu et al. [36] used the residual energy ratio to select the VMD parameter and a differential evolution algorithm to tune the structure parameters of a Transformer model.

1.2. Motivations

The research reviewed above indicates that hybrid models combining data processing methods and neural networks have become mainstream techniques for wind speed forecasting, and many valuable ideas have been proposed in recent years. However, two issues remain whose solution could further enhance the effectiveness of current wind speed forecasting systems.

Most existing hybrid models process and predict each decomposed subseries individually and then add up the respective predictions to construct the final forecast. Although this decomposition-based ensemble paradigm has been reported to achieve satisfactory accuracy, it is resource-consuming, because a separate model must be fitted and used for forecasting for every subseries. A more efficient hybrid paradigm for wind speed forecasting therefore remains to be explored.

In addition, current studies often optimize the parameters of the data processing method and those of the forecasting model separately. For example, the number of decomposed subseries is often determined solely from the decomposition performance of the data processing technique, whereas the structure parameters of the neural network are determined without considering the influence of the decomposition parameters. Because the joint influence of these parameters on the whole hybrid model is ignored, the resulting performance may be suboptimal. An integrated optimization of the parameters of hybrid wind speed forecasting models thus remains to be explored.

1.3. Contributions

In this study, an effective combination of a data processing technique and neural network model is proposed for enhancing the forecasting accuracy of wind speed. The main works and contributions of this paper can be described as follows:

(a): A hybrid neural network model combining VMD and GRU is designed for short-term wind speed forecasting. In this hybrid model, the VMD method is used to extract the local characteristics of wind speed. All the obtained subseries are converted into two-dimensional samples for training the GRU neural network and obtaining the predictions of future wind speed.
(b): A grid search with rolling cross-validation (GSRCV) method is proposed to jointly search for the best structure of VMD and GRU. Three key parameters, namely the number of decomposed modes in VMD, the length of input time steps in GRU, and the number of neurons in the hidden layer of GRU, are optimized concurrently according to their joint influence on the accuracy of the final hybrid model.
(c): A comprehensive experiment and analysis based on real-world wind speed data are conducted to evaluate the performance of the proposed VMD-GRU-GSRCV model. The effectiveness of the GSRCV parameter optimization method and of the VMD data processing strategy is evaluated separately, and the overall superiority of the proposed model over popular hybrid neural network forecasting benchmarks is also verified.

The rest of this paper is organized as follows. Section 2 introduces the basic methodologies of VMD, GRU, and GSRCV, respectively. Section 3 presents the framework of the proposed model. The real-world wind speed experiments are presented in Section 4 in detail. Section 5 concludes the whole paper. Figure 1 presents the flowchart of the main contents of this study.

2. Methodology

This section introduces the basic methodologies of the three main modules of the proposed VMD-GRU-GSRCV wind speed forecasting model: VMD for wind speed data processing, GRU for wind speed forecasting, and GSRCV for jointly optimizing the key parameters of VMD and GRU.

2.1. Variational Mode Decomposition

VMD is a data analysis method that decomposes a non-stationary time series into M sub-modes μ_m (1 ≤ m ≤ M), each concentrated around a center frequency ω_m [32,33]. The goal of VMD is to minimize the sum of the bandwidths of all modes, subject to the constraint that the decomposed modes sum to the original signal f, as shown in Equation (1). In Equation (1), δ denotes the Dirac distribution, j is the imaginary unit, and * denotes convolution.

\begin{array}{l} \min_{\{μ_{m}\}, \{ω_{m}\}} \left\{ \sum_{m = 1}^{M} {\left\| \partial_{t} \left[ \left( δ (t) + \frac{j}{π t} \right) * μ_{m} (t) \right] e^{- j ω_{m} t} \right\|}_{2}^{2} \right\} \\ \text{s.t.} \; \sum_{m = 1}^{M} μ_{m} = f \end{array}

(1)

The quadratic penalty coefficient α and the Lagrangian multiplier λ(t) are then introduced to convert this constrained problem into the unconstrained problem shown in Equation (2):

\begin{array}{l} L (\{μ_{m}\}, \{ω_{m}\}, λ) = α \sum_{m = 1}^{M} {\left\| \partial_{t} \left[ \left( δ (t) + \frac{j}{π t} \right) * μ_{m} (t) \right] e^{- j ω_{m} t} \right\|}_{2}^{2} + {\left\| f (t) - \sum_{m = 1}^{M} μ_{m} (t) \right\|}_{2}^{2} \\ \quad + \left\langle λ (t), f (t) - \sum_{m = 1}^{M} μ_{m} (t) \right\rangle \end{array}

(2)

This unconstrained objective is then solved using the alternating direction method of multipliers (ADMM), which iteratively updates μ_m^{n+1}, ω_m^{n+1}, and λ^{n+1} to find the saddle point of the objective function in Equation (2).

Specifically, μ_m^{n+1} and ω_m^{n+1} are updated using Equations (3) and (4), respectively. In these equations, μ̂_m^{n+1}(ω), μ̂_i(ω), f̂(ω), and λ̂(ω) are the Fourier transforms of μ_m^{n+1}(t), μ_i(t), f(t), and λ(t), respectively; n is the iteration number; and α is set to 2000 in this study. Note that μ_m^{n+1}(t) is obtained as the real part of the inverse Fourier transform of μ̂_m^{n+1}(ω). Equation (5) updates λ̂^{n+1}(ω), where τ is the corresponding update parameter (the dual-ascent step size).

{\hat{μ}}_{m}^{n + 1} (ω) = \frac{\hat{f} (ω) - \sum_{i \neq m} {\hat{μ}}_{i} (ω) + \hat{λ} (ω) / 2}{1 + 2 α {(ω - ω_{m})}^{2}}

(3)

ω_{m}^{n + 1} = \frac{\int_{0}^{\infty} ω {|{\hat{μ}}_{m}^{n + 1} (ω)|}^{2} d ω}{\int_{0}^{\infty} {|{\hat{μ}}_{m}^{n + 1} (ω)|}^{2} d ω}

(4)

{\hat{λ}}^{n + 1} (ω) = {\hat{λ}}^{n} (ω) + τ (\hat{f} (ω) - \sum_{m = 1}^{M} {\hat{μ}}_{m}^{n + 1} (ω))

(5)

The termination condition of the iteration is given by Equation (6), where ε denotes the convergence tolerance. That is, ADMM iteratively applies Equations (3)–(5) to update μ_m^{n+1}, ω_m^{n+1}, and λ^{n+1}, respectively, until Equation (6) is satisfied.

\sum_{m} \frac{{‖μ_{m}^{n + 1} - μ_{m}^{n}‖}_{2}^{2}}{{‖μ_{m}^{n}‖}_{2}^{2}} < ε

(6)
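A minimal NumPy sketch of the ADMM loop in Equations (3)–(6) is given below. It is not the authors' implementation: to stay short, it skips the mirror extension used by common VMD codes, updates the full FFT spectrum with filters that are symmetric in frequency (so the modes stay real), measures frequency in cycles per sample, initializes the center frequencies uniformly, and defaults to τ = 0 and ε = 1e-7. The paper does not report τ or ε, so both defaults are assumptions; α = 2000 follows the text.

```python
import numpy as np


def vmd(f, M, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD following Equations (3)-(6).

    Assumptions (not from the paper): no mirror extension, full-spectrum
    updates with frequency-symmetric filters, frequencies in cycles/sample.
    """
    f = np.asarray(f, dtype=float)
    N = f.size
    freqs = np.fft.fftfreq(N)              # bin frequencies in [-0.5, 0.5)
    abs_freqs = np.abs(freqs)              # symmetric filters keep modes real
    pos = freqs >= 0                       # half spectrum used in Eq. (4)
    f_hat = np.fft.fft(f)

    u_hat = np.zeros((M, N), dtype=complex)   # mode spectra
    omega = 0.5 * np.arange(M) / M            # initial center frequencies
    lam_hat = np.zeros(N, dtype=complex)      # Lagrangian multiplier spectrum

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for m in range(M):
            # Eq. (3): Wiener-filter-like update using the latest other modes
            others = u_hat.sum(axis=0) - u_hat[m]
            u_hat[m] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (abs_freqs - omega[m]) ** 2)
            # Eq. (4): center of gravity of the mode's power spectrum
            power = np.abs(u_hat[m, pos]) ** 2
            omega[m] = np.sum(freqs[pos] * power) / (np.sum(power) + 1e-30)
        # Eq. (5): dual ascent on the reconstruction constraint
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        # Eq. (6): relative change of the modes (on spectra, equivalent by Parseval)
        change = np.sum(np.sum(np.abs(u_hat - u_prev) ** 2, axis=1)
                        / (np.sum(np.abs(u_prev) ** 2, axis=1) + 1e-30))
        if change < tol:
            break

    # real part of the inverse Fourier transform gives the time-domain modes
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega
```

For example, decomposing the sum of two sinusoids at 0.02 and 0.2 cycles per sample with M = 2 should return center frequencies close to those two values, with modes that add back up to the input.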

2.2. Gated Recurrent Unit

A typical GRU cell is shown in Figure 2. Given a two-dimensional input sequence X = {x₁, x₂, …, x_T} with T rows and M columns, where T is the length of input time steps and M is the feature dimension of x_t (1 ≤ t ≤ T), the forward propagation of GRU at time step t is computed as follows:

\begin{array}{l} \text{Reset gate: } r_{t} = σ (W_{rx} x_{t} + W_{rh} h_{t - 1} + b_{r}) \\ \text{Update gate: } z_{t} = σ (W_{zx} x_{t} + W_{zh} h_{t - 1} + b_{z}) \\ \text{Hidden activation: } h_{t} = (1 - z_{t}) \otimes h_{t - 1} + z_{t} \otimes \tilde{h}_{t} \\ \text{where } \tilde{h}_{t} = \tanh (W_{hh} (r_{t} \otimes h_{t - 1}) + W_{hx} x_{t} + b_{h}) \end{array}

(7)

In Equation (7), W_rx and W_rh are the weight matrices of the reset gate, W_zx and W_zh are the weight matrices of the update gate, W_hh and W_hx are the weight matrices used to compute the candidate hidden state h̃_t, and b_r, b_z, and b_h are the corresponding bias vectors. The operator ‘⊗’ denotes the Hadamard product, σ(∙) is the sigmoid function, and tanh(∙) is the hyperbolic tangent function.
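The recursion in Equation (7) can be sketched in NumPy as a plain forward pass over the T rows of X. The parameter names mirror Equation (7); the random initialization in the hypothetical helper `init_params`, its scale, and returning only the last hidden state are illustrative assumptions. A trained forecaster would also need an output layer that maps h_T to the wind speed prediction.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(M, H, seed=0):
    """Random small weights for an M-input, H-neuron GRU (illustrative only)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("r", "z", "h"):
        p[f"W_{g}x"] = rng.normal(scale=0.1, size=(H, M))   # input weights
        p[f"W_{g}h"] = rng.normal(scale=0.1, size=(H, H))   # recurrent weights
        p[f"b_{g}"] = np.zeros(H)
    return p


def gru_forward(X, p, h0=None):
    """Apply Equation (7) to each row x_t of X (shape T x M); return h_T."""
    H = p["b_r"].shape[0]
    h = np.zeros(H) if h0 is None else np.asarray(h0, dtype=float)
    for x_t in X:
        r = sigmoid(p["W_rx"] @ x_t + p["W_rh"] @ h + p["b_r"])    # reset gate
        z = sigmoid(p["W_zx"] @ x_t + p["W_zh"] @ h + p["b_z"])    # update gate
        h_tilde = np.tanh(p["W_hh"] @ (r * h) + p["W_hx"] @ x_t + p["b_h"])
        h = (1 - z) * h + z * h_tilde      # '*' is the Hadamard product
    return h
```

Because h_t is a convex combination of h_{t-1} and a tanh output, every hidden unit stays in (-1, 1) when starting from h_0 = 0.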

2.3. Grid Search with Rolling Cross-Validation

In this study, GSRCV is adopted to select the best value combination of three key parameters: the number of decomposed modes in VMD (M), the length of input time steps in GRU (T), and the number of neurons in the hidden layer of GRU (H). The candidate combinations are generated from the respective search ranges and search steps of the three parameters. Then, the rolling cross-validation procedure [37,38], illustrated in Figure 3, is implemented for each candidate combination. Specifically, the available dataset is divided sequentially into K equal, complementary subsets. In the kth (1 ≤ k ≤ K − 2) round of training and validation, the first k subsets are used as training samples for the VMD-GRU model with the candidate parameter values, while the (k + 2)th subset is used as validation samples for evaluating the trained model. In rolling cross-validation, the training samples always precede the paired validation samples, which avoids leakage of future information in time series data. Finally, the average prediction accuracy over all K − 2 validation subsets is calculated. GSRCV tries every candidate combination and selects the one with the best average validation accuracy.
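The rolling splits and the exhaustive search described above can be sketched as follows. The split rule follows the text literally (round k trains on subsets 1..k and validates on subset k + 2); `fit_and_score` is a hypothetical callback standing in for training the VMD-GRU model on the training indices and returning its error on the validation indices, and the grid values are whatever the user supplies.

```python
import itertools

import numpy as np


def rolling_splits(n_samples, K):
    """Yield the K-2 (train, validation) index pairs of Section 2.3.

    Round k (1-based, k = 1..K-2) trains on subsets 1..k and validates on
    subset k+2; subset k+1 is skipped, as the description states.
    """
    subsets = np.array_split(np.arange(n_samples), K)   # consecutive, near-equal
    for k in range(1, K - 1):
        train_idx = np.concatenate(subsets[:k])
        val_idx = subsets[k + 1]                         # (k+2)-th subset, 0-based k+1
        yield train_idx, val_idx


def gsrcv(n_samples, K, grid, fit_and_score):
    """Try every (M, T, H) combination; keep the lowest mean validation error."""
    best, best_err = None, np.inf
    for combo in itertools.product(grid["M"], grid["T"], grid["H"]):
        errors = [fit_and_score(combo, tr, va)
                  for tr, va in rolling_splits(n_samples, K)]
        avg = float(np.mean(errors))
        if avg < best_err:
            best, best_err = combo, avg
    return best, best_err
```

With 100 samples and K = 5, the first round trains on indices 0–19 and validates on 40–59, so every validation block lies strictly after its training data.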

3. Framework of Proposed Model

3.1. Model Input

In this study, the input of the GRU forecasting model consists of the historical values of the subseries obtained by VMD, as illustrated in Figure 4. As shown there, the input of GRU is a two-dimensional matrix with T rows and M columns, where T is the number of time steps in each sample and M is the number of VMD modes. Accordingly, the proposed model can be expressed as:

\begin{array}{l} {\hat{y}}_{p} = GRU \left( {[x_{p - T}^{\top}, x_{p - T + 1}^{\top}, \dots, x_{p - 1}^{\top}]}^{\top} \right) \\ \text{where } x_{p - t} = [μ_{1, p - t}, μ_{2, p - t}, \dots, μ_{M, p - t}], \; 1 \leq t \leq T \end{array}

(8)
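A sketch of the sample construction in Equation (8), assuming the VMD modes are stored as an M × N array and that the target of each sample is the (normalized) wind speed y_p at the next time step; the function name `make_samples` is illustrative.

```python
import numpy as np


def make_samples(modes, y, T):
    """Convert VMD modes (M x N) into GRU samples following Equation (8).

    Sample p is the T x M matrix whose rows are x_{p-T}, ..., x_{p-1},
    with x_{p-t} = [mu_{1,p-t}, ..., mu_{M,p-t}]; its target is y_p.
    """
    features = np.asarray(modes).T          # N x M: row t is x_t
    N = features.shape[0]
    X = np.stack([features[p - T:p] for p in range(T, N)])   # (N-T, T, M)
    targets = np.asarray(y)[T:N]                              # y_T, ..., y_{N-1}
    return X, targets
```

Each sample therefore carries all M subseries at once, which is what allows a single GRU to replace the per-subseries models of the Dec-Fore-Int paradigm.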

3.2. Overall Procedure

In general, the proposed wind speed forecasting model first uses VMD to decompose the original wind speed time series into multiple subseries, which are then converted into two-dimensional samples according to Equation (8) and Figure 4. The generated samples are then used to train the GRU model. Based on this VMD-GRU framework and the given data, GSRCV is used to determine the best combination of the key parameters of VMD and GRU.

Accordingly, the overall procedure of the proposed VMD-GRU-GSRCV model is presented in Figure 5. Unlike classical solutions, which predict each subseries separately and tune the decomposition and network parameters independently, the proposed model feeds all subseries into a single predictor and optimizes the parameters jointly. The detailed implementation process is as follows:

Step 1: Based on the designed VMD-GRU model framework, GSRCV is adopted to determine the suitable value combination of three parameters, including the number of modes in VMD (M), the length of input time steps in GRU (T), and the number of neurons in the hidden layer of GRU (H).

Step 2: Given the wind speed data, VMD is used to decompose the time series into M subseries.

Step 3: The M obtained subseries are converted into a set of two-dimensional samples, each containing T rows and M columns.

Step 4: The obtained samples are used to train the GRU model, whose hidden layer contains the optimized number H of neurons.

Step 5: The trained GRU model produces the final predictions from the provided input samples.

4. Experiments

4.1. Collected Data

In this study, hourly wind speed data from the National Renewable Energy Laboratory (NREL) [39] are used for the comparative experiments; a total of 8759 hourly data points from 1 January 2021 to 31 December 2021 are collected. Figure 6 shows the time series of these data points, and Table 1 presents the related statistics. The range and standard deviation indicate that this short-term wind speed fluctuates considerably. The positive skewness indicates that the wind speed distribution is right-skewed, and the kurtosis values are larger than zero; hence, the wind speed does not follow a normal distribution.

In this study, the last 1440 wind speed records are used as the test set for out-of-sample forecasting, while the remaining earlier records are used as training samples for model building. Before training the models, all wind speed data are linearly normalized to the interval [0, 1].
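The data split and scaling described above can be sketched as follows; `split_and_normalize` and `denormalize` are hypothetical helper names. Following the text, the min-max bounds are taken from all records. The inverse transform is included on the assumption that predictions are mapped back to the original scale before computing errors, which the text does not state explicitly.

```python
import numpy as np


def split_and_normalize(speed, n_test=1440):
    """Min-max scale the whole series to [0, 1], then hold out the last n_test points."""
    speed = np.asarray(speed, dtype=float)
    lo, hi = speed.min(), speed.max()
    scaled = (speed - lo) / (hi - lo)          # linear normalization to [0, 1]
    train, test = scaled[:-n_test], scaled[-n_test:]
    return train, test, (lo, hi)


def denormalize(scaled, bounds):
    """Map scaled values back to the original wind speed units."""
    lo, hi = bounds
    return np.asarray(scaled) * (hi - lo) + lo
```

With the 8759 hourly records described above, this yields 7319 training points and 1440 test points.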

4.2. Parameters Setting

Table 2 provides the basic information about the parameters used in this study.

4.3. Evaluation Metrics

The error indicators used to evaluate model performance in this study are the root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and symmetric mean absolute percentage error (SMAPE) [36,38], given by Equations (9)–(12), respectively. In Equations (9)–(12), N is the number of samples in the test set, y_t is the actual value of wind speed, and ŷ_t is the corresponding prediction of y_t.

RMSE = \sqrt{\frac{1}{N} \sum_{t = 1}^{N} {(y_{t} - {\hat{y}}_{t})}^{2}}

(9)

MAE = \frac{1}{N} \sum_{t = 1}^{N} |y_{t} - {\hat{y}}_{t}|

(10)

MAPE = \frac{100\%}{N} \sum_{t = 1}^{N} \left| \frac{y_{t} - {\hat{y}}_{t}}{y_{t}} \right|

(11)

SMAPE = \frac{100\%}{N} \sum_{t = 1}^{N} \frac{|y_{t} - {\hat{y}}_{t}|}{(|y_{t}| + |{\hat{y}}_{t}|) / 2}

(12)
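Equations (9)–(12) translate directly into NumPy, as in this sketch (the function name is illustrative). MAPE is undefined when any y_t is zero, which can happen for data min-max scaled to [0, 1], so the sketch assumes errors are computed on the original wind speed scale.

```python
import numpy as np


def forecast_errors(y, y_hat):
    """Return RMSE, MAE, MAPE (%) and SMAPE (%) as in Equations (9)-(12)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))                       # Eq. (9)
    mae = np.mean(np.abs(err))                              # Eq. (10)
    mape = 100.0 * np.mean(np.abs(err / y))                 # Eq. (11), needs y_t != 0
    smape = 100.0 * np.mean(np.abs(err) / ((np.abs(y) + np.abs(y_hat)) / 2))  # Eq. (12)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "SMAPE": smape}
```

For y = [1, 2, 4] and ŷ = [1, 3, 2], the function returns RMSE ≈ 1.291, MAE = 1.0, MAPE ≈ 33.33%, and SMAPE ≈ 35.56%.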

4.4. Results and Analysis

4.4.1. Effect of Parameter Optimization

The first experiment evaluates the effectiveness of the GSRCV parameter optimization method. The decentralized parameter optimization strategy is used as the benchmark because many studies have adopted it to determine the parameters of hybrid models [33,35]. In this benchmark, the number of VMD modes is determined from the center frequencies of the modes decomposed under different M values: once modes with similar center frequencies first appear at a given M, M − 1 is chosen as the optimal number of modes. The length of input time steps in GRU is determined by calculating the Bayesian information criterion (BIC) under different T values and selecting the T value with the best BIC. Given the optimized values of M and T, the typical hold-out evaluation [23] is adopted to determine the best number of neurons in the hidden layer of GRU.
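The two selection rules of the decentralized benchmark can be sketched independently. Both involve assumptions about details the text leaves open: `center_freqs_for` is a hypothetical callback that runs VMD for a given M and returns its center frequencies, the relative threshold `rel_tol` that defines "similar" frequencies is arbitrary, and the BIC is computed for a least-squares autoregressive model of order T fitted on a common sample range.

```python
import numpy as np


def select_modes_by_center_frequency(center_freqs_for, M_candidates, rel_tol=0.1):
    """Return M - 1 for the first M whose modes have near-duplicate center
    frequencies; fall back to the largest candidate if none is found."""
    for M in M_candidates:
        w = np.sort(np.asarray(center_freqs_for(M), dtype=float))
        # relative spacing between neighbouring center frequencies
        gaps = np.diff(w) / np.maximum(w[1:], 1e-12)
        if np.any(gaps < rel_tol):
            return M - 1
    return M_candidates[-1]


def select_lag_by_bic(series, max_lag):
    """Return the lag T (1..max_lag) whose AR(T) least-squares fit has the lowest BIC."""
    x = np.asarray(series, dtype=float)
    target = x[max_lag:]                    # same targets for every T
    n = len(target)
    best_T, best_bic = None, np.inf
    for T in range(1, max_lag + 1):
        # design row for time p: [x_{p-1}, ..., x_{p-T}, 1]
        A = np.array([np.r_[x[p - T:p][::-1], 1.0] for p in range(max_lag, len(x))])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        rss = np.sum((target - A @ coef) ** 2)
        bic = n * np.log(rss / n) + (T + 1) * np.log(n)
        if bic < best_bic:
            best_T, best_bic = T, bic
    return best_T
```

Using a common sample range keeps the BIC values comparable across T, since every candidate is scored on the same targets.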

Because of the inherent randomness of neural network training, all models are run 100 times, and their average performance is analyzed. Accordingly, for each benchmark model, the Wilcoxon rank sum test is conducted on the 100 SMAPE values to statistically verify its accuracy difference from the proposed model.
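The Wilcoxon rank sum test compares two sets of SMAPE values without assuming normality. The sketch below uses the large-sample normal approximation with average ranks for ties and no tie correction of the variance, which should match the statistic reported by `scipy.stats.ranksums`; the function name is hypothetical and the test is two-sided.

```python
import math

import numpy as np


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum test (normal approximation); returns (z, p)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled = np.concatenate([a, b])
    ranks = np.empty(len(pooled))
    ranks[pooled.argsort()] = np.arange(1, len(pooled) + 1)
    for v in np.unique(pooled):            # tied values share their average rank
        idx = pooled == v
        ranks[idx] = ranks[idx].mean()
    s = ranks[:n1].sum()                   # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (s - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p
```

A negative z means the first sample (for example, the proposed model's SMAPE values) tends to rank lower, i.e., to have smaller errors; p < 0.05 corresponds to the significance level used in the text.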

Table 3 shows the parameters selected by the decentralized parameter optimization strategy and the GSRCV method, respectively. The GSRCV method selects a larger number of decomposed modes and a longer input length.

Table 4 presents the average performance of the GSRCV method and decentralized parameter optimization method, respectively, and Figure 7 shows the boxplot of each model using the SMAPE results obtained from 100 model runs. Figure 8 shows the predictions of different parameter optimization strategies, where the actual and predicted time series of the last 200 points in the test set are plotted to more clearly present the prediction differences among different models.

Table 4 and Figures 7 and 8 show that the proposed model using the GSRCV method obtains better out-of-sample performance than the hybrid model using the decentralized parameter optimization strategy, which indicates that GSRCV helps find a better combination of the three key parameters. With GSRCV, the average MAPE decreases from 5.66% to 3.77%, and the average SMAPE decreases from 5.57% to 3.74%. Consistent error reductions are also observed in terms of RMSE and MAE. The p-value of the Wilcoxon rank sum test indicates that the accuracy difference between GSRCV and the decentralized strategy is significant at the 0.05 significance level. The model using GSRCV requires more computation time than the model using the decentralized strategy because the former has a more complex structure. The boxplot in Figure 7 shows that the parameters obtained by the decentralized strategy yield more stable but less accurate predictions. Overall, this comparison demonstrates that the proposed GSRCV strategy is better than the decentralized strategy for optimizing the key parameters of the hybrid neural network model.

4.4.2. Effect of Data Processing

The second experiment evaluates the effect of the VMD data processing method on forecasting performance. Three data processing benchmarks are selected for comparison with VMD. The first benchmark builds the GRU model on the original historical wind speed without any data decomposition. The second and third benchmarks decompose the wind speed using EMD and EEMD, respectively, since many related studies have reported that these two decomposition methods improve forecasting performance [27,28]. For EMD and EEMD, the number of decomposed subseries is determined adaptively from the length of the wind speed signal [40]. The length of input time steps of the three benchmarks is set to the same value as in the proposed model.

Figure 9 presents the first 10 subseries decomposed by VMD, using the first 400 observations for illustration. The original wind speed is decomposed into multiple subseries, each with unique local characteristics: the low-frequency subseries capture the long-term trend, and the high-frequency subseries represent the short-term fluctuations.

Table 5 presents the average performance of the proposed model using VMD and three benchmark data processing methods, respectively. Figure 10 presents the boxplot of each model based on the SMAPE results obtained from 100 model runs. Figure 11 shows the predictions of the proposed model using different data processing strategies.

Results in Table 5 and Figures 10 and 11 indicate that data decomposition can remarkably reduce the forecasting error. Taking SMAPE as an example, the three models using data decomposition methods (EMD, EEMD, and VMD) are all more accurate than the model directly using the original wind speed data. Among the three decomposition methods, VMD achieves the smallest SMAPE (3.74%), which is much better than EMD (20.00%) and EEMD (20.66%). Consistent accuracy improvements are also found in terms of RMSE, MAE, and MAPE, while EMD and EEMD yield similar accuracy. The superiority of VMD is supported by the Wilcoxon rank sum test, which shows a significant accuracy difference between VMD and the other two decomposition methods. Figure 10 shows that the proposed model using VMD generates more stable results than those using EMD and EEMD. The computation times show that the decomposition-based models are more time-consuming than the model without data decomposition. In summary, the proposed model using the VMD data processing method achieves better forecasting performance.

4.4.3. Comparison with Hybrid Neural Networks

In the third experiment, the overall effectiveness of the proposed model is evaluated. Four hybrid neural networks are selected as comparative models, using BPNN, ENN, BiGRU, and AGRU as the respective forecasting models. BPNN and ENN are two classical neural network predictors that have been studied for many years and applied in many wind speed forecasting studies [12,27]. Bidirectional networks and attention mechanisms, two advanced neural network structures, have attracted considerable attention in recent wind energy forecasting research [41,42]. In this experiment, all four benchmarks use VMD for data decomposition. In addition, the Dec-Fore-Int model, one of the most popular hybrid paradigms for wind speed forecasting [28,32,33,34], is considered as the fifth benchmark. In the Dec-Fore-Int benchmark, each VMD subseries is predicted individually by a GRU network, and the final forecast is obtained by adding up the predictions of all subseries. For a fair comparison, the key parameters of these five benchmarks are the same as those selected by GSRCV for the proposed model. The training settings of the different neural networks are also identical: the number of epochs is set to 100, and the batch size is set to 12.
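To make the structural difference concrete, the sketch below contrasts the Dec-Fore-Int benchmark (one model per subseries, predictions summed) with the single-model input of Equation (8). An ordinary least-squares regressor stands in for the GRU to keep the example short, predictions are evaluated in-sample, and all function names are hypothetical. The point is the number of fitted models: M for Dec-Fore-Int versus one for the proposed input.

```python
import numpy as np


def windows(arr, T):
    """Stack the T rows preceding each index p = T..N-1 of an (N, k) array."""
    return np.stack([arr[p - T:p] for p in range(T, arr.shape[0])])


def fit_linear(X, y):
    """Least-squares stand-in for a GRU; returns a predict function."""
    def design(Z):
        return np.c_[Z.reshape(len(Z), -1), np.ones(len(Z))]
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Z: design(Z) @ coef


def dec_fore_int(modes, T):
    """One model per mode (M fits); the final forecast sums the mode forecasts."""
    preds = []
    for mode in modes:
        X = windows(mode[:, None], T)          # (N-T, T, 1)
        model = fit_linear(X, mode[T:])
        preds.append(model(X))
    return np.sum(preds, axis=0), len(modes)


def integrated(modes, y, T):
    """A single model on T x M windows of all modes, as in Equation (8) (1 fit)."""
    X = windows(modes.T, T)                    # (N-T, T, M)
    model = fit_linear(X, y[T:])
    return model(X), 1
```

On two pure sinusoids both variants reproduce the summed signal, but Dec-Fore-Int needs one fit per mode; with a GRU in place of the linear stand-in, that multiplies the training cost by M.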

The average performance results of all the models are presented in Table 6, and the specific SMAPE results of 100 model runs are illustrated in Figure 12. Figure 13 shows the predictions of the proposed model and five comparative hybrid neural networks.

Results in Table 6 and Figures 12 and 13 show that all the considered hybrid neural networks achieve high out-of-sample accuracy, which demonstrates the effectiveness of hybrid neural network models for wind speed forecasting. The proposed model outperforms BPNN and ENN, reducing SMAPE by 17.80% and 8.11%, respectively. The corresponding p-values indicate that the proposed model is significantly better than BPNN and ENN at the 0.05 significance level. Compared with the two advanced benchmarks (BiGRU and AGRU), the proposed model obtains comparable accuracy: the Wilcoxon rank sum test shows no significant accuracy difference between the proposed model and these two benchmarks. Dec-Fore-Int also achieves accuracy comparable to the proposed model but requires much more training time; therefore, the proposed model is more efficient than Dec-Fore-Int. Figure 12 shows that BPNN, ENN, and BiGRU generate extreme errors more easily, whereas the predictions of AGRU, Dec-Fore-Int, and the proposed model are relatively stable.

5. Conclusions and Future Researches

This study aims to achieve accurate short-term wind speed forecasting by using the proposed VMD-GRU-GSRCV model. Experiments based on the NREL hourly wind speed records are implemented to evaluate the performance of the proposed method. The main conclusions can be summarized as follows.

The proposed model is an effective combination of a data decomposition technique and a neural network forecasting model. VMD first decomposes the original wind speed into a set of subseries, each with unique local characteristics, which are converted into two-dimensional samples for building the GRU predictor. The key parameters of VMD and GRU, namely the number of decomposed modes in VMD, the length of input time steps in GRU, and the number of neurons in the hidden layer of GRU, are jointly optimized by the GSRCV strategy.

The results of multiple comparative experiments demonstrate the effectiveness of the proposed model. Compared with a decentralized parameter optimization strategy, the designed GSRCV procedure finds a better combination of key model parameters. Compared with other popular data processing strategies, VMD better extracts the local characteristics of the original wind speed, which remarkably improves forecasting accuracy. Among hybrid forecasting models combining data processing strategies and neural networks, the proposed model outperforms the BPNN- and ENN-based hybrids, matches the accuracy of the BiGRU-, AGRU-, and Dec-Fore-Int-based hybrids, and requires less training time than Dec-Fore-Int. These findings are supported by multiple statistical accuracy measures, including MAE, RMSE, MAPE, SMAPE, and the p-value of the Wilcoxon rank sum test.

As a hybrid model based on data decomposition and a neural network, the proposed model uses only one data decomposition method, which may not fully exploit the predictive features hidden in complex wind speed time series. Another limitation is that, because the decomposed subseries are converted into two-dimensional input samples, all subseries must share the same length of input time steps. Since different subseries have distinct temporal features, their most suitable input time steps may differ.
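The shared-T limitation follows from the sample layout. A minimal sketch, assuming each sample is a T × M matrix (rows are time steps, columns are subseries) paired with the next-step wind speed; the function name `build_samples`, the one-step-ahead target, and this exact layout are illustrative assumptions, and the structure in Figure 4 may differ in detail:

```python
from typing import List, Sequence, Tuple

def build_samples(
    subseries: Sequence[Sequence[float]], target: Sequence[float], T: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Stack M subseries into 2D input samples of shape (T, M).

    Because each sample is one T x M matrix, every subseries contributes
    exactly T past values: a per-subseries lag cannot be expressed here.
    """
    M, N = len(subseries), len(target)
    if any(len(s) != N for s in subseries):
        raise ValueError("every subseries must have the same length as the target")
    X: List[List[List[float]]] = []
    y: List[float] = []
    for t in range(T, N):
        # rows: time steps t-T, ..., t-1; columns: subseries 1..M
        X.append([[subseries[m][k] for m in range(M)] for k in range(t - T, t)])
        y.append(target[t])  # one-step-ahead target (assumption)
    return X, y

# Toy usage: M = 2 made-up subseries of length 5 and T = 3
s1 = [1.0, 2.0, 3.0, 4.0, 5.0]
s2 = [0.1, 0.2, 0.3, 0.4, 0.5]
wind = [a + b for a, b in zip(s1, s2)]  # assume the modes add back up to the original series
X, y = build_samples([s1, s2], wind, T=3)
# N - T = 2 samples; X[0] == [[1.0, 0.1], [2.0, 0.2], [3.0, 0.3]]
```

Giving each subseries its own lag, as suggested for future work, would require the M columns to have windows of different lengths, which a single matrix per sample cannot hold.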

Based on these conclusions, several directions can be pursued in future work. First, since this study adopts only one data decomposition method, multiple data processing methods could be combined to further enhance decomposition performance. Second, it is worth designing a method that optimizes the lag order of each decomposed subseries individually, for which advanced swarm intelligence algorithms [43,44,45,46] could be used to fully extract their temporal features.

Author Contributions

Conceptualization, S.L. and L.W.; methodology, S.L. and S.W.; software, S.L. and S.W.; investigation, S.L. and S.W.; writing—original draft preparation, S.L. and S.W.; writing—review and editing, L.W.; funding acquisition, L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research is partially supported by the Fundamental Research Funds for the Central Universities (HUST: 2019kfyRCPY038).

Data Availability Statement

The datasets are available on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

Acronym	Meaning
ADMM	Alternating direction method of multipliers
AGRU	Attention-based gated recurrent unit
ANN	Artificial neural network
BiGRU	Bidirectional gated recurrent unit
BIC	Bayesian information criterion
BPNN	Back propagation neural network
Dec-Fore-Int	Decomposition-Forecasting-Integrating
EEMD	Ensemble empirical mode decomposition
ELM	Extreme learning machine
EMD	Empirical mode decomposition
ENN	Elman neural network
GRU	Gated recurrent unit
GSRCV	Grid search with rolling cross-validation
GWEC	Global Wind Energy Council
IMF	Intrinsic mode function
LSTM	Long short-term memory
MAE	Mean absolute error
MAPE	Mean absolute percentage error
NREL	National Renewable Energy Laboratory
RMSE	Root mean square error
RNN	Recurrent neural network
SMAPE	Symmetric mean absolute percentage error
STC	Spatio-temporal correlation
VMD	Variational mode decomposition

Variable/Symbol	Meaning
b_r, b_z, b_h	Biases in GRU
f	Original signal
h	Output of hidden state in GRU
r	Output of reset gate in GRU
tanh	Hyperbolic tangent function
x_t	Sample of X at time step t
y	Observed wind speed
ŷ	Predicted wind speed
z	Output of update gate in GRU
H	Number of neurons in the hidden layer of GRU
K	Fold number in the rolling cross-validation
M	Number of decomposed modes in VMD
T	Length of input time steps in X
W_rx, W_rh, W_zx, W_zh, W_hh, W_hx	Weight matrices in GRU
X	Input sequence of GRU
α	Penalty coefficient
λ	Lagrangian multipliers
τ	Updating parameter in ADMM
ε	Tolerance of convergence criterion in ADMM
δ	Dirac distribution
σ	Sigmoid function
u_m	The mth subseries
ω_m	Center frequency of the mth subseries
*	Convolution
⊗	Hadamard product
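The GRU symbols listed above fit together in one forward step. The sketch below uses one common convention for the gates, with a candidate state h̃ (not listed in the Nomenclature) and the update h_t = (1 − z) ⊗ h_{t−1} + z ⊗ h̃; some references swap the roles of z and 1 − z, so it may not match Figure 2 [23] exactly. It is plain Python written for readability, and the parameter-dictionary layout is an illustrative assumption.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]

def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product W v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(a: float) -> float:
    """Logistic sigmoid (sigma in the Nomenclature)."""
    return 1.0 / (1.0 + math.exp(-a))

def affine(W_x: Matrix, x: Sequence[float], W_h: Matrix, h: Sequence[float],
           b: Sequence[float]) -> List[float]:
    """W_x x + W_h h + b."""
    return [u + v + c for u, v, c in zip(matvec(W_x, x), matvec(W_h, h), b)]

def gru_step(x_t: Sequence[float], h_prev: Sequence[float], p: Dict[str, list]) -> List[float]:
    """One GRU step; p holds W_rx, W_rh, W_zx, W_zh, W_hx, W_hh and b_r, b_z, b_h."""
    r = [sigmoid(a) for a in affine(p["W_rx"], x_t, p["W_rh"], h_prev, p["b_r"])]  # reset gate
    z = [sigmoid(a) for a in affine(p["W_zx"], x_t, p["W_zh"], h_prev, p["b_z"])]  # update gate
    r_h = [ri * hi for ri, hi in zip(r, h_prev)]  # Hadamard product r ⊗ h_{t-1}
    h_cand = [math.tanh(a) for a in affine(p["W_hx"], x_t, p["W_hh"], r_h, p["b_h"])]
    # Assumed convention: h_t = (1 - z) ⊗ h_{t-1} + z ⊗ h_cand
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]

def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]

# Toy usage: H = 2 hidden neurons, 1-dimensional input, all-zero parameters
H, D = 2, 1
params = {
    "W_rx": zeros(H, D), "W_zx": zeros(H, D), "W_hx": zeros(H, D),
    "W_rh": zeros(H, H), "W_zh": zeros(H, H), "W_hh": zeros(H, H),
    "b_r": [0.0] * H, "b_z": [0.0] * H, "b_h": [0.0] * H,
}
h_t = gru_step([3.0], [1.0, -2.0], params)
# all-zero weights give z = 0.5 and a zero candidate, so h_t = 0.5 * h_prev = [0.5, -1.0]
```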

References

1. Global Wind Report 2021. Global Wind Energy Council. 2021. Available online: https://gwec.net/wp-content/uploads/2021/03/GWEC-Global-Wind-Report-2021.pdf (accessed on 23 November 2022).
2. Cassola, F.; Burlando, M. Wind speed and wind energy forecast through Kalman filtering of Numerical Weather Prediction model output. Appl. Energy 2012, 99, 154–166.
3. Liu, H.; Chen, C. Multi-objective data-ensemble wind speed forecasting model with stacked sparse autoencoder and adaptive decomposition-based error correction. Appl. Energy 2019, 254, 113686.
4. Chen, Y.; Zhang, S.; Zhang, W.; Peng, J.; Cai, Y. Multifactor spatio-temporal correlation model based on a combination of convolutional neural network and long short-term memory neural network for wind speed forecasting. Energy Convers. Manag. 2019, 185, 783–799.
5. Singh, S.N.; Mohapatra, A. Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renew. Energy 2019, 136, 758–768.
6. Rodríguez, F.; Alonso-Pérez, S.; Sánchez-Guardamino, I.; Galarza, A. Ensemble forecaster based on the combination of time-frequency analysis and machine learning strategies for very short-term wind speed prediction. Electr. Power Syst. Res. 2023, 214, 108863.
7. Zhao, E.; Sun, S.; Wang, S. New developments in wind energy forecasting with artificial intelligence and big data: A scientometric insight. Data Sci. Manag. 2022, 5, 84–95.
8. Liu, G.; Wang, C.; Qin, H.; Fu, J.; Shen, Q. A novel hybrid machine learning model for wind speed probabilistic forecasting. Energies 2022, 15, 6942.
9. Yang, Y.; Zhou, H.; Wu, J.; Ding, Z.; Wang, Y.G. Robustified extreme learning machine regression with applications in outlier-blended wind-speed forecasting. Appl. Soft Comput. 2022, 122, 108814.
10. Wang, J.; An, Y.; Li, Z.; Lu, H. A novel combined forecasting model based on neural networks, deep learning approaches, and multi-objective optimization for short-term wind speed forecasting. Energy 2022, 251, 123960.
11. Zhu, Q.; Che, J.; Li, Y.; Zuo, R. A new prediction NN framework design for individual stock based on the industry environment. Data Sci. Manag. 2022, 5, 199–211.
12. Sun, W.; Tan, B.; Wang, Q. Multi-step wind speed forecasting based on secondary decomposition algorithm and optimized back propagation neural network. Appl. Soft Comput. 2021, 113, 107894.
13. Dokur, E.; Erdogan, N.; Salari, M.E.; Karakuzu, C.; Murphy, J. Offshore wind speed short-term forecasting based on a hybrid method: Swarm decomposition and meta-extreme learning machine. Energy 2022, 248, 123595.
14. Lv, S.X.; Peng, L.; Wang, L. Stacked autoencoder with echo-state regression for tourism demand forecasting using search query data. Appl. Soft Comput. 2018, 73, 119–133.
15. Liu, H.; Tian, H.Q.; Liang, X.F.; Li, Y.F. Wind speed forecasting approach using secondary decomposition algorithm and Elman neural networks. Appl. Energy 2015, 157, 183–194.
16. Ding, L.; Bai, Y.; Liu, M.D.; Fan, M.H.; Yang, J. Predicting short wind speed with a hybrid model based on a piecewise error correction method and Elman neural network. Energy 2022, 244, 122630.
17. Nasiri, H.; Ebadzadeh, M.M. MFRFNN: Multi-functional recurrent fuzzy neural network for chaotic time series prediction. Neurocomputing 2022, 507, 292–310.
18. Liu, X.; Zhou, J.; Qian, H. Short-term wind power forecasting by stacked recurrent neural networks with parametric sine activation function. Electr. Power Syst. Res. 2021, 192, 107011.
19. López, G.; Arboleya, P. Short-term wind speed forecasting over complex terrain using linear regression models and multivariable LSTM and NARX networks in the Andes Mountains, Ecuador. Renew. Energy 2022, 183, 351–368.
20. Memarzadeh, G.; Keynia, F. A new short-term wind speed forecasting method based on fine-tuned LSTM neural network and optimal input sets. Energy Convers. Manag. 2020, 213, 112824.
21. Wu, J.; Li, N.; Zhao, Y.; Wang, J. Usage of correlation analysis and hypothesis test in optimizing the gated recurrent unit network for wind speed forecasting. Energy 2022, 242, 122960.
22. Sun, Z.; Zhao, M.; Zhao, G. Hybrid model based on VMD decomposition, clustering analysis, long short memory network, ensemble learning and error complementation for short-term wind speed forecasting assisted by Flink platform. Energy 2022, 261, 125248.
23. Niu, Z.; Yu, Z.; Tang, W.; Wu, Q.; Reformat, M. Wind power forecasting using attention-based gated recurrent unit network. Energy 2020, 196, 117081.
24. Tian, C.; Niu, T.; Wei, W. Developing a wind power forecasting system based on deep learning with attention mechanism. Energy 2022, 257, 124750.
25. Joseph, L.P.; Deo, R.C.; Prasad, R.; Salcedo-Sanz, S.; Raj, N.; Soar, J. Near real-time wind speed forecast model with bidirectional LSTM networks. Renew. Energy 2023, 204, 39–58.
26. Yu, M.; Niu, D.; Gao, T.; Wang, K.; Sun, L.; Li, M.; Xu, X. A novel framework for ultra-short-term interval wind power prediction based on RF-WOA-VMD and BiGRU optimized by the attention mechanism. Energy 2023, 269, 126738.
27. Wang, J.; Zhang, W.; Li, Y.; Wang, J.; Dang, Z. Forecasting wind speed using empirical mode decomposition and Elman neural network. Appl. Soft Comput. 2014, 23, 452–459.
28. Santhosh, M.; Venkaiah, C.; Kumar, D.V. Short-term wind speed forecasting approach using ensemble empirical mode decomposition and deep Boltzmann machine. Sustain. Energy Grids Netw. 2019, 19, 100242.
29. He, Y.; Wang, Y. Short-term wind power prediction based on EEMD–LASSO–QRNN model. Appl. Soft Comput. 2021, 105, 107288.
30. Nasiri, H.; Ebadzadeh, M.M. Multi-step-ahead stock price prediction using recurrent fuzzy neural network and variational mode decomposition. arXiv 2022, arXiv:2212.14687.
31. Qiao, B.; Liu, J.; Wu, P.; Teng, Y. Wind power forecasting based on variational mode decomposition and high-order fuzzy cognitive maps. Appl. Soft Comput. 2022, 129, 109586.
32. Li, J.; Song, Z.; Wang, X.; Wang, Y.; Jia, Y. A novel offshore wind farm typhoon wind speed prediction model based on PSO-Bi-LSTM improved by VMD. Energy 2022, 251, 123848.
33. Hu, H.; Wang, L.; Tao, R. Wind speed forecasting based on variational mode decomposition and improved echo state network. Renew. Energy 2021, 164, 729–751.
34. Wang, X.; Ren, H.; Zhai, J.; Xing, H.; Su, J. Adaptive support segment based short-term wind speed forecasting. Energy 2022, 249, 123644.
35. Zhang, C.; Zhou, J.; Li, C.; Fu, W.; Peng, T. A compound structure of ELM based on feature selection and parameter optimization using hybrid backtracking search algorithm for wind speed forecasting. Energy Convers. Manag. 2017, 143, 360–376.
36. Wu, B.; Wang, L.; Zeng, Y.R. Interpretable wind speed prediction with multivariate time series and temporal fusion transformers. Energy 2022, 252, 123990.
37. Wang, X.; Luo, D.; Zhao, X.; Sun, Z. Estimates of energy consumption in China using a self-adaptive multi-verse optimizer-based support vector machine with rolling cross-validation. Energy 2018, 152, 539–548.
38. Lv, S.X.; Peng, L.; Hu, H.; Wang, L. Effective machine learning model combination based on selective ensemble strategy for time series forecasting. Inf. Sci. 2022, 612, 994–1023.
39. NREL Data Catalog. Available online: https://data.nrel.gov/submissions/33 (accessed on 23 November 2022).
40. Wu, H.; Meng, K.; Fan, D.; Zhang, Z.; Liu, Q. Multistep short-term wind speed forecasting using transformer. Energy 2022, 261, 125231.
41. Zhang, C.; Peng, T.; Nazir, M.S. A novel integrated photovoltaic power forecasting model based on variational mode decomposition and CNN-BiGRU considering meteorological variables. Electr. Power Syst. Res. 2022, 213, 108796.
42. Chengqing, Y.; Guangxi, Y.; Chengming, Y.; Yu, Z.; Xiwei, M. A multi-factor driven spatiotemporal wind power prediction model based on ensemble deep graph attention reinforcement learning networks. Energy 2023, 263, 126034.
43. Peng, L.; Sun, C.; Wu, W. Effective arithmetic optimization algorithm with probabilistic search strategy for function optimization problems. Data Sci. Manag. 2022, 5, 163–174.
44. Zhang, C.; Ma, H.; Hua, L.; Sun, W.; Nazir, M.S.; Peng, T. An evolutionary deep learning model based on TVFEMD, improved sine cosine algorithm, CNN and BiLSTM for wind speed prediction. Energy 2022, 254, 124250.
45. Peng, L.; Wang, L.; Xia, D.; Gao, Q.L. Effective energy consumption forecasting using empirical wavelet transform and long short-term memory. Energy 2022, 238, 121756.
46. Xian, H.; Che, J. Unified whale optimization algorithm based multi-kernel SVR ensemble learning for wind speed forecasting. Appl. Soft Comput. 2022, 130, 109690.

Figure 1. Flowchart of the main contents of this study.

Figure 2. Inner structure of GRU cell [23].

Figure 3. Illustration of rolling cross-validation.

Figure 4. Structure of the model input.

Figure 5. Framework of proposed VMD-GRU-GSRCV model.

Figure 6. Time series of hourly wind speed data from NREL.

Figure 7. Accuracy results of different parameter optimization methods.

Figure 8. Predictions of different parameter optimization strategies.

Figure 9. Subseries decomposed by using VMD (S1–S10).

Figure 10. Accuracy results of different decomposition methods.

Figure 11. Predictions of different data processing methods.

Figure 12. Accuracy results of different hybrid neural network models.

Figure 13. Predictions of different hybrid neural network models.

Table 1. Statistical results of wind speed used in this study.

Sample	Number	Min	Max	Mean	Std	Skewness	Kurtosis
All	8759	0.3540	34.4508	4.4725	3.2083	2.3805	12.4166
Train	7319	0.3540	22.0702	4.1857	2.8162	2.0871	9.0252
Test	1440	0.3723	34.4508	5.9303	4.4461	2.1242	10.3104

Table 2. Basic information about the parameters used in this study.

Modules	Parameters	Ranges/Values
VMD	Search range of M	[2, 30]
VMD	Search step of M	3
GRU	Search range of T	[1, 12]
GRU	Search step of T	2
GRU	Search range of H	[10, 50]
GRU	Search step of H	10
GRU	Epochs	100
GRU	Batch size	12
GSRCV	Fold number K	10
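With the ranges and steps in Table 2, the joint grid contains 10 × 6 × 5 = 300 candidate (M, T, H) triples, each evaluated over K = 10 rolling folds (3000 fits in total). The sketch below enumerates that grid and pairs it with an expanding-window split scheme. The split scheme, the helper names, and the `score` callback (which would fit VMD and GRU on the training indices and return a validation error) are hypothetical; the scheme in Figure 3 may differ, for example by using a fixed-length window.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Params = Tuple[int, int, int]  # (M, T, H)

def gsrcv_grid() -> List[Params]:
    """Candidate (M, T, H) triples implied by Table 2."""
    M_values = range(2, 31, 3)    # 2, 5, ..., 29   (10 values)
    T_values = range(1, 13, 2)    # 1, 3, ..., 11   (6 values)
    H_values = range(10, 51, 10)  # 10, 20, ..., 50 (5 values)
    return list(product(M_values, T_values, H_values))

def rolling_splits(n: int, K: int) -> List[Tuple[range, range]]:
    """Expanding-window folds: cut n points into K + 1 consecutive blocks;
    fold k trains on blocks 0..k-1 and validates on block k."""
    bounds = [round(i * n / (K + 1)) for i in range(K + 2)]
    return [(range(0, bounds[k]), range(bounds[k], bounds[k + 1])) for k in range(1, K + 1)]

def grid_search(score: Callable[[Params, range, range], float], n: int,
                K: int = 10) -> Tuple[Optional[Params], float]:
    """Return the triple with the lowest validation error averaged over the K folds."""
    best, best_err = None, float("inf")
    splits = rolling_splits(n, K)
    for params in gsrcv_grid():
        mean_err = sum(score(params, tr, va) for tr, va in splits) / K
        if mean_err < best_err:
            best, best_err = params, mean_err
    return best, best_err

def toy_score(p: Params, train_idx: range, val_idx: range) -> float:
    """Stand-in for a real VMD + GRU fit; minimized at (23, 7, 50) by construction."""
    return abs(p[0] - 23) + abs(p[1] - 7) + abs(p[2] - 50)

print(len(gsrcv_grid()))              # 300
print(grid_search(toy_score, 7319))   # 7319 training points as in Table 1
```

The toy score only checks the search mechanics; Table 3's GSRCV selection (M = 23, T = 7, H = 50) does lie on this grid.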

Table 3. Parameter selection results of different optimization methods.

Methods	M	T	H
Decentralized	16	2	50
GSRCV	23	7	50

Table 4. Average accuracies of different parameter optimization methods.

Methods	RMSE	MAE	MAPE	SMAPE	p-Value	Time (s)
Decentralized	0.2797	0.2104	5.66%	5.57%	1.22 × 10⁻¹⁷ *	122.18
GSRCV	0.2047	0.1435	3.77%	3.74%	-	221.17

The best result in each performance metric is given in boldface. Symbol * denotes that the performance of the proposed model is significantly different from that of the benchmark model at the 0.05 significance level.
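Tables 4 to 6 report RMSE, MAE, MAPE, and SMAPE. A minimal sketch of these metrics, assuming the common textbook definitions with SMAPE normalized by the mean of |y| and |ŷ|; Section 4.3 gives the authoritative formulas. Table 1 shows that all observed speeds are strictly positive (minimum 0.3540), so the MAPE denominator never vanishes on this dataset.

```python
import math
from typing import Dict, Sequence

def forecast_metrics(y: Sequence[float], y_hat: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, MAPE (%) and SMAPE (%) of predictions y_hat against observations y."""
    if len(y) != len(y_hat) or len(y) == 0:
        raise ValueError("y and y_hat must be non-empty and of equal length")
    n = len(y)
    err = [a - b for a, b in zip(y, y_hat)]
    return {
        "RMSE": math.sqrt(sum(e * e for e in err) / n),
        "MAE": sum(abs(e) for e in err) / n,
        # MAPE divides by |y|, so it requires nonzero observations
        "MAPE": 100.0 / n * sum(abs(e) / abs(a) for e, a in zip(err, y)),
        # SMAPE divides by the mean of |y| and |y_hat|
        "SMAPE": 100.0 / n * sum(abs(e) / ((abs(a) + abs(b)) / 2.0)
                                 for e, a, b in zip(err, y, y_hat)),
    }

m = forecast_metrics([4.0, 5.0], [3.0, 5.0])
# RMSE = sqrt(0.5), MAE = 0.5, MAPE = 12.5, SMAPE = 100/7 (about 14.29)
```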

Table 5. Average accuracies of different decomposition methods.

Methods	RMSE	MAE	MAPE	SMAPE	p-Value	Time (s)
Original	2.2486	1.5375	34.68%	29.05%	3.90 × 10⁻¹⁸ *	82.64
EMD	1.3390	0.9490	22.34%	20.00%	3.90 × 10⁻¹⁸ *	188.50
EEMD	1.0498	0.7793	18.41%	20.66%	3.90 × 10⁻¹⁸ *	249.04
VMD	0.2047	0.1435	3.77%	3.74%	-	221.17

The best result in each performance metric is given in boldface. Symbol * denotes that the performance of the proposed model is significantly different from that of the benchmark model at the 0.05 significance level.

Table 6. Average accuracies of different hybrid neural network models.

Models	RMSE	MAE	MAPE	SMAPE	p-Value	Time (s)
BPNN	0.2825	0.1785	4.61%	4.55%	1.86 × 10⁻⁴ *	64.57
ENN	0.2208	0.1582	4.14%	4.07%	4.08 × 10⁻² *	142.82
BiGRU	0.2114	0.1483	3.82%	3.77%	2.77 × 10⁻¹	297.26
AGRU	0.2433	0.1529	3.63%	3.61%	1.04 × 10⁻¹	286.96
Dec-Fore-Int	0.2774	0.1577	3.91%	3.77%	2.00 × 10⁻¹	1791.92
Proposed	0.2047	0.1435	3.77%	3.74%	-	221.17

The best result in each performance metric is given in boldface. Symbol * denotes that the performance of the proposed model is significantly different from that of the benchmark model at the 0.05 significance level.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
