Article

Wind Speed Forecasting Using Attention-Based Causal Convolutional Network and Wind Energy Conversion

School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
*
Author to whom correspondence should be addressed.
Energies 2022, 15(8), 2881; https://doi.org/10.3390/en15082881
Submission received: 1 March 2022 / Revised: 6 April 2022 / Accepted: 11 April 2022 / Published: 14 April 2022

Abstract

As an effective renewable and sustainable energy source, wind energy has received growing attention. Accurate wind speed forecasting can pave the way to the goal of sustainable development. However, current methods often ignore the temporal characteristics of wind speed, which leads to inaccurate forecasting results. In this paper, we propose a novel SSA-CCN-ATT model to forecast wind speed. Specifically, singular spectrum analysis (SSA) is first applied to decompose the original wind speed series into several sub-signals. Secondly, we build a new deep learning CCN-ATT model that combines a causal convolutional network (CCN) and an attention mechanism (ATT). The causal convolutional network extracts the information in the wind speed time series, and the attention mechanism then focuses on the most important information. Finally, a fully connected neural network layer produces the wind speed forecasting results. Three experiments on four datasets show that the proposed model performs better than the comparative models: the maximum improvement percentage of MAPE reaches 26.279%, and the minimum is 5.7210%. Moreover, a wind energy conversion curve was established by simulating historical wind speed data.

1. Introduction

The utilization of renewable energy is the way to achieve sustainable development; therefore, renewable energy is drawing more and more attention from both industry and academia. Among renewable sources, wind power generation is one of the most promising and has been widely used in the past decades. It is estimated that the potential of wind power generation exceeds 400 million MW (megawatts), which greatly surpasses the roughly 18 million MW of primary energy supply [1]. However, basic scientific research on onshore wind power development lags behind industrial development [2]. Due to the lack of early evaluation of wind energy resources, the utilization rate of wind energy resources is low and the rate of curtailed generation is high, hindering the further development of wind energy [3]. Accurate wind speed prediction can provide adequate decision-making information for wind farm management and energy scheduling [4]. Therefore, wind speed prediction is crucial for the design and installation of large wind farms and essential for maintaining the reliability and safe operation of the power network [5]. However, accurate wind speed prediction is challenging due to the volatility and diversity of wind speed.

1.1. Existing Methods to Forecast Wind Speed

In order to obtain accurate forecasting results, a series of strategies has been proposed by researchers. These models can be divided into four categories: (1) physical models, (2) statistical models, (3) machine-learning-based models, and (4) combined and hybrid models [6]. The practical application of current physical models is limited by their complex implementation and massive computing requirements [7,8,9]. Statistical models make predictions based on the distribution of the observed wind speed samples; examples include the autoregressive (AR) model [10], the autoregressive moving average (ARMA) model [11,12], the autoregressive integrated moving average (ARIMA) model [13], Kalman filtering [14], the persistence method [15], and the Markov chain model [16]. These models are traditional, but many researchers still use them because the underlying techniques are mature. For instance, Moreno et al. [17] proposed a model that employs variational mode decomposition (VMD), singular spectrum analysis (SSA), and ARIMA to predict wind speed. Ding et al. [18] proposed a predictive structure based on ARIMA and a back propagation neural network (BPNN) that combines empirical mode decomposition (EMD) and SSA to extract the linear components of wind speed data.
With the development of machine learning and deep learning in recent years, researchers have applied these techniques to wind speed forecasting and achieved highly accurate results [19,20]. For example, Liu et al. [21] proposed a new support vector machine (SVM) model based on the Jaya algorithm. Aly [22] used various combinations of the recurrent Kalman filter (RKF), Fourier series (FS), wavelet neural network (WNN), and artificial neural network (ANN) to test 12 different hybrid models, among which the WNN and RKF models achieved the highest prediction accuracy. Xiao et al. [23] proposed a self-adaptive kernel extreme learning machine (KELM) for wind speed prediction; its novelty is that it can directly retain the training results from repeated data. Hong et al. [24] applied a CNN to obtain highly accurate wind speed forecasts. Liang et al. [25] presented a novel wind speed prediction strategy based on bidirectional long short-term memory (Bi-LSTM), MOOFADA, and transfer learning for centralized control centers. Xiang et al. [26] proposed a bidirectional gated recurrent unit (BiGRU), in which a second layer is added to connect two reverse and independent hidden layers to the same output layer.
These machine learning methods have some advantages, but many models based on machine learning are still inferior to physical and statistical models in predicting short-term wind speed, and it is not easy to design an efficient neural network structure. Therefore, recent wind speed prediction solutions are mostly hybrid and combined models [27]. Wang and Yang [28] proposed a new MWS-CE-ENN framework combining multi-objective optimization, data preprocessing technology, and an Elman neural network for wind speed forecasting. Liu et al. [29] integrated long short-term memory (LSTM), a deep Boltzmann network (DBN), and an echo state network (ESN) to predict wind speed series. Yan et al. [30] proposed the ISSD-LSTM-GOASVM model, in which the ISSD solves the problem of manually selecting the embedding dimension in the original SSD. Zhang and Liu [31] proposed the CCGRU network, which combines a causal convolution network (CCN), a GRU, and multiple decomposition methods. On the basis of the CCGRU model, Wei et al. [32] proposed CSNN, which integrates a spiking neural network (SNN) with the convolution layer and uses the network to predict the error sequence. Chen et al. [33] developed a new multi-step forecasting method for very short-term wind speed forecasts based on EEMD, the cuckoo search (CS) algorithm, and the incremental extreme learning machine (IELM). Zhou et al. [34] proposed the SSAWD-MOGAPSO-CM hybrid model, which employs five neural networks and a multi-objective optimization algorithm to combine the forecasting results. Hu et al. [35] proposed a novel noise reduction method and used the grasshopper optimization algorithm to optimize the forecasting neural network.
Some papers combine statistical models with deep learning models. For instance, Moreno et al. [36] presented a new ensemble learning method that integrates LSTM, an adaptive neural-fuzzy system (ANFIS), an ESN, support vector regression (SVR), and a Gaussian regression process (GRP). Duan et al. [37] proposed a new hybrid model combining improved CEEMDAN, a recurrent neural network (RNN), ARIMA, and error correction methods for short-term wind speed prediction. Neshat et al. [38] proposed a novel deep-learning-based evolutionary model for wind speed forecasting and tested it on an offshore wind turbine installed in a Swedish wind farm. Tian [39] used EMD to calculate the adaptive decomposition layer of VMD for ultra-short-term wind speed time series. Jiang et al. [40] presented a novel model based on statistical methods, artificial neural networks, and deep learning methods for short-term wind speed forecasting. Jaseena and Kovoor [41] used the empirical wavelet transform (EWT) to decompose the original signal into sub-signals and applied a BiDLSTM network to predict them. Memarzadeh and Keynia [42] established a hybrid wind speed prediction model including the cross search algorithm (CSA), wavelet analysis (WT), mutual information (MI), and LSTM. The combined and hybrid models reviewed in this paper are summarized in Table 1.

1.2. Our Contribution

Based on the above literature review, we can draw the following conclusions: (a) methods based on deep learning are effective for wind speed forecasting; (b) decomposition technology can improve the prediction performance of a model; (c) hybrid models perform better than individual models. Given these conclusions, we propose a novel wind speed prediction model. We first use SSA to decompose the wind speed time series into several components. Then we design a deep neural network based on a CCN and an attention mechanism to predict wind speed. The proposed network has two CCN layers and one attention layer. The CCN layers extract temporal information from the time series and eliminate the impact of future data, and the attention layer focuses on the information that is most important for wind speed forecasting. The main innovations and contributions of this paper are as follows:
  • The SSA decomposition method is used to decompose the wind speed value into several different sub-signals, and the forecasting accuracy of the prediction model is further improved by using the characteristics of each sub-signal.
  • A new model for short-term wind speed prediction is proposed, which uses CCN to extract features and employs the attention mechanism to make predictions from the extracted features.
  • In order to verify the performance of SSA for wind speed signal extraction, we adopt different decomposition techniques and feed the resulting sub-signals into the proposed model to evaluate the performance of each technique.
  • To verify the effectiveness of the proposed model, we use data from four different time periods and ten comparison models, and evaluate the performance of the related models over different prediction horizons.
The rest of this paper is organized as follows: Section 2 introduces the theory of SSA, CCN, and the attention mechanism. In Section 3, the proposed SSA-CCN-ATT model is presented. In Section 4, three experiments on four datasets are conducted. Section 5 presents the discussion of the comparative models. Section 6 summarizes the whole paper. Finally, the Nomenclature lists the abbreviations used in the paper.

2. Methodology

This section introduces the methods used in this paper, including SSA, CCN, and attention mechanism.

2.1. Singular Spectrum Analysis

SSA is a nonparametric spectrum estimation method that decomposes a time series into several meaningful components. SSA does not need any prior knowledge about the time series [43]. The method consists of two phases: decomposition and reconstruction. The specific steps are as follows.
  1. Embedding
Let a one-dimensional sequence of length $N$ be $X = [x_1, x_2, \ldots, x_N]$, and let the positive integer $L$ be the length of the sliding window, $1 < L < N$. The original sequence $X$ is constructed into $K$ lagged vectors by the embedding operation, as follows:
$$X_i = [x_i, x_{i+1}, \ldots, x_{i+L-1}]^T \in \mathbb{R}^L \qquad (1)$$
where $K = N - L + 1$ and $i = 1, 2, \ldots, K$. The result of the mapping forms the trajectory matrix $M$:
$$M = \begin{bmatrix} x_1 & x_2 & \cdots & x_K \\ x_2 & x_3 & \cdots & x_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_L & x_{L+1} & \cdots & x_N \end{bmatrix} \in \mathbb{R}^{L \times K} \qquad (2)$$
  2. Singular value decomposition
Singular value decomposition (SVD), a classical matrix factorization method, is used to decompose the trajectory matrix as follows:
$$M = \sum_{i=1}^{d} \lambda_i U_i V_i^T \qquad (3)$$
where $d$ is the number of non-zero singular values of $M$, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ are the singular values of $M$ in descending order, and $U_i$ and $V_i$ are the corresponding left and right singular vectors.
  3. Grouping
The purpose of grouping is to separate the additive components in the signal. If the aim is to de-noise the original signal, the grouping operation expresses the trajectory matrix $M$ constructed from the original sequence $X$ as the sum of a useful signal $S$ and noise $E$, namely $M = S + E$.
When SSA is used to analyze a time series with a latent structure, it is generally assumed that the first $r$ ($r < d$) large singular values reflect the main energy of the signal, while the last $d - r$ small singular values correspond to noise components. Thus, the grouping operation amounts to choosing an appropriate value of $r$ to achieve signal-noise separation.
  4. Diagonal averaging
The purpose of diagonal averaging is to transform each matrix obtained by grouping back into a sequence of length $N$. Let $Y \in \mathbb{R}^{L \times K}$ denote any matrix after grouping, with elements $y_{ij}$, $1 \le i \le L$, $1 \le j \le K$. The reconstructed time series $g_k$ is calculated with Equation (4), where $L^* = \min(L, K)$, $K^* = \max(L, K)$, and $N = L + K - 1$.
$$g_k = \begin{cases} \dfrac{1}{k+1} \displaystyle\sum_{m=1}^{k+1} y_{m,\,k-m+2}, & 0 \le k < L^* - 1 \\[2mm] \dfrac{1}{L^*} \displaystyle\sum_{m=1}^{L^*} y_{m,\,k-m+2}, & L^* - 1 \le k < K^* \\[2mm] \dfrac{1}{N-k} \displaystyle\sum_{m=k-K^*+2}^{N-K^*+1} y_{m,\,k-m+2}, & K^* \le k < N \end{cases} \qquad (4)$$
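For concreteness, the four steps above can be written as a compact numpy sketch; the window length L and the number r of retained components below are illustrative choices, not the settings used later in this paper.

```python
import numpy as np

def ssa_denoise(x, L=20, r=5):
    N = len(x)
    K = N - L + 1
    # 1. Embedding: build the L x K trajectory matrix M (Equations (1)-(2))
    M = np.column_stack([x[i:i + L] for i in range(K)])
    # 2. Singular value decomposition (Equation (3))
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # 3. Grouping: keep the r leading components as the "signal" part, M = S + E
    S = (U[:, :r] * s[:r]) @ Vt[:r, :]
    # 4. Diagonal averaging (Equation (4)): average each anti-diagonal of S
    return np.array([S[::-1, :].diagonal(k).mean() for k in range(-L + 1, K)])

x = np.sin(np.linspace(0, 20, 300)) + 0.3 * np.random.randn(300)
print(ssa_denoise(x).shape)   # (300,)
```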

2.2. Causal Convolution Network

Causal convolution was proposed to capture the information in time series effectively [44]. Unlike traditional one-dimensional convolution, as shown in Figure 1, causal convolution only considers the local context on the left (previous data samples), so information from future data samples cannot affect the analysis of a given time step [45].
For a sequence problem, the main abstraction is to predict $y_t$ from $x_1, x_2, \ldots, x_t$ and $y_1, y_2, \ldots, y_{t-1}$ so that $y_t$ is close to the actual value, where $x$ denotes the input features and $y$ the target value. The corresponding factorization of the joint distribution is:
$$p(x) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}) \qquad (5)$$
Another form of causal convolution, dilated causal convolution, can obtain a larger receptive field [46]. Only standard causal convolution is used in this work.
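The causality property can be checked directly in Keras, whose Conv1D layer offers a causal padding mode; the sketch below (with an illustrative filter count and kernel size) shows that perturbing a future sample leaves all earlier outputs unchanged.

```python
import numpy as np
import tensorflow as tf

conv = tf.keras.layers.Conv1D(filters=4, kernel_size=3, padding='causal')
x = np.random.rand(1, 20, 1).astype('float32')          # (batch, time steps, features)
y1 = conv(x).numpy()

x_future = x.copy()
x_future[0, 15, 0] += 10.0                              # perturb a "future" sample
y2 = conv(x_future).numpy()

# Outputs before the perturbed step are identical: no information leaks backwards in time.
print(np.allclose(y1[0, :15], y2[0, :15]))              # True
```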

2.3. Attention Mechanism

The basic idea of the attention mechanism is to regard the constituent elements of the source data (Source) as a series of $\langle Key, Value \rangle$ pairs. Given an element of the target (Query), the weight coefficient of the Value corresponding to each Key is obtained by computing the similarity or correlation between the Query and that Key; the weighted sum of the Values then gives the final attention value. The essential idea can be written as the following formula:
$$\mathrm{Attention}(Query, Source) = \sum_{i=1}^{L_x} \mathrm{Similarity}(Query, Key_i) \cdot Value_i \qquad (6)$$
The computation of the attention mechanism can be summarized in three stages. In the first stage, a weight coefficient is calculated from the Query and each $Key_i$; different similarity functions can be used, the most common being the vector dot product, the cosine similarity, or a small additional neural network. In this work, we use the vector dot product:
$$\mathrm{Similarity}(Query, Key_i) = Query \cdot Key_i \qquad (7)$$
In the second stage, the raw scores of the first stage are normalized with the SoftMax function. On the one hand, this converts the scores into a probability distribution in which all weights sum to 1; on the other hand, the inherent mechanism of SoftMax highlights the weights of the most important elements:
$$a_i = \mathrm{SoftMax}(Sim_i) = \frac{e^{Sim_i}}{\sum_{j=1}^{L_x} e^{Sim_j}} \qquad (8)$$
In the third stage, the Values are weighted by the coefficients $a_i$ to obtain the attention value:
$$\mathrm{Attention}(Query, Source) = \sum_{i=1}^{L_x} a_i \cdot Value_i \qquad (9)$$
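As a worked illustration, the three stages of Equations (6)–(9) can be transcribed directly into numpy; the shapes below are illustrative, and the implementation is a plain (unscaled) dot-product attention.

```python
import numpy as np

def dot_product_attention(query, keys, values):
    scores = keys @ query                                   # Equation (7): Query . Key_i
    weights = np.exp(scores) / np.sum(np.exp(scores))       # Equation (8): SoftMax
    return weights @ values                                 # Equation (9): weighted sum

L_x, d = 14, 8                        # illustrative sequence length and feature size
query = np.random.rand(d)
keys = np.random.rand(L_x, d)
values = np.random.rand(L_x, d)
print(dot_product_attention(query, keys, values).shape)    # (8,)
```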

3. The Proposed SSA-CCN-ATT Model

In this paper, we propose a new wind speed forecasting model that includes the SSA method for signal decomposition, a two-layer CCN for feature extraction from the sub-signals, an attention mechanism that assigns higher weights to the more important features, and a fully connected neural network that produces the final output. The model structure is shown in Figure 2. The design and specific steps are summarized as follows:
  • Data preprocessing. Considering the nonlinearity and volatility of wind speed data, we use SSA to process the original wind speed. SSA has a strict mathematical foundation and few parameters and can efficiently extract the trend, periodic, and quasi-periodic information of the signals.
  • Sample construction. The wind speed data are divided into two datasets: a training set and a testing set. The training set is used to train the CCN-ATT network, whereas the testing set is used to evaluate the proposed forecasting model.
  • CCN-ATT network forecasting. The de-noised wind speed time series is fed into the CCN-ATT network, which consists of two CCN layers, one attention layer, and one fully connected layer. The CCN is a highly noise-resistant model and extracts nonlinear features from the wind speed series, and the attention mechanism further increases the extraction efficiency. Finally, the fully connected layer produces the forecasting result (a minimal sketch of this network is given after this list).
  • Evaluation. To study the efficiency of the proposed model, a comprehensive evaluation module, which includes four evaluation metrics, the DM test, and an improvement-ratio analysis, is designed to analyze the forecasting results.
  • Wind energy conversion and uncertainty analysis. Based on the wind energy conversion curve and the forecast wind speed values, the electricity generation of wind turbines is calculated, and a forecasting-interval method is used to analyze the uncertainty of the wind energy conversion process.
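As referenced in the third step above, the following is a minimal Keras sketch of the CCN-ATT network. The kernel sizes (10, 12), the input window of 14, and the Adam/MSE training setup follow Section 4; the filter count and the use of Keras' built-in dot-product Attention layer as the ATT block are assumptions made for illustration, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_ccn_att(input_len=14, n_features=1):
    inp = layers.Input(shape=(input_len, n_features))
    # Two causal convolution layers: padding='causal' ensures each output step only
    # sees current and past inputs. Kernel sizes 10 and 12 follow Section 4.2.1;
    # the filter count (32) is an assumed, illustrative value.
    x = layers.Conv1D(32, kernel_size=10, padding='causal', activation='relu')(inp)
    x = layers.Conv1D(32, kernel_size=12, padding='causal', activation='relu')(x)
    # Dot-product attention over the extracted temporal features (stand-in for ATT).
    att = layers.Attention()([x, x])
    x = layers.Flatten()(att)
    out = layers.Dense(1)(x)                     # one-step wind speed forecast
    model = models.Model(inp, out)
    model.compile(optimizer='adam', loss='mse')  # Adam optimizer, as in Section 4.2.2
    return model

model = build_ccn_att()
model.summary()
```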

4. Experimental Results

In this section, the experimental design and result analysis are conducted. We first introduce the basic information of the datasets used in the experiments. Then, the evaluation criteria and parameters of the model are described. Finally, the prediction results are presented.

4.1. Dataset Information

The wind speed data used in this study are taken from the NWTC (National Wind Technology Center) of NREL (National Renewable Energy Laboratory). The data were collected every two seconds, and an average value was recorded every minute [4]. The wind speeds at six heights were measured and recorded, which were 2 m, 5 m, 10 m, 20 m, 50 m, and 80 m, respectively [47]. In this paper, the data at heights of 5 m, 20 m, 50 m, and 80 m were used for the experiment. Among them, the wind speed at heights of 50 m, 80 m, 20 m, and 5 m are dataset 1, dataset 2, dataset 3, and dataset 4, respectively, and their time frames are 28 January 2020 18:40–1 February 2020 10:59, 15 May 2020 15:00–19 May 2020 7:19, 20 August 2020 20:20–24 August 2020 12:39, and 22 October 2020 8:20–26 October 2020 0:39, respectively. As shown in Figure 3, the horizontal axis represents the time of samples, and the vertical axis represents the wind speed. The description and statistical information of the collected wind speed data sets are shown in Table 2.
In order to analyze time series with a latent structure, four decomposition methods are considered in the experiments, namely EMD, EEMD, EWT, and SSA. Through experimental comparison (see Section 4.3.2), we selected SSA as our decomposition method. In this paper, we use the pyts package [48] to implement SSA. When applying SSA to decompose the data, we set the number of sub-signals to 14; that is, there are 14 sub-signals in each decomposition chart. Figure 4 shows the SSA decomposition results of the four datasets. In Figure 4, the top-left panel is the original wind speed data, and the remaining panels show the sub-signals obtained after diagonal averaging.
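A hedged usage sketch of the SSA decomposition with the pyts package [48] is shown below; it assumes pyts' SingularSpectrumAnalysis interface (where window_size controls the number of components when no grouping is specified), and the wind speed array here is synthetic.

```python
import numpy as np
from pyts.decomposition import SingularSpectrumAnalysis

rng = np.random.default_rng(0)
wind_speed = 5 + np.sin(np.linspace(0, 60, 5300)) + 0.5 * rng.standard_normal(5300)

X = wind_speed.reshape(1, -1)                     # pyts expects (n_samples, n_timestamps)
ssa = SingularSpectrumAnalysis(window_size=14)    # 14 sub-signals, as used in this paper
components = ssa.fit_transform(X)
sub_signals = components.reshape(-1, X.shape[1])  # one sub-signal per row: (14, 5300)
print(sub_signals.shape)
# The decomposition is additive: the sub-signals should sum back to the original series.
print(np.allclose(sub_signals.sum(axis=0), wind_speed))
```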

4.2. Experimental Design

After data decomposition, we used the decomposed time series to train our model and used some evaluation criteria to evaluate our model. This section will introduce the evaluation criteria we used, the training process, and the experiment’s design.

4.2.1. Model Training

Before model training, we normalize the decomposed data with a linear function (Min-Max scaling), which converts the original data to the range [0, 1]. The normalization formula is as follows:
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \qquad (10)$$
where $X_{norm}$ is the normalized data, $X$ is the original data, and $X_{max}$ and $X_{min}$ are the maximum and minimum values of the original dataset, respectively.
After normalization, the CCN-ATT network is applied to predict the wind speed, and the above four datasets are used to verify the performance of the network. In the experiment, two CCN layers are used to extract the features, with convolution kernel sizes of 10 and 12, respectively. The extracted features are the input of the attention mechanism, and the output of the attention mechanism is passed into a fully connected neural network to obtain the predicted value.
The whole process of the model training can be described in the following three steps:
  • The one-dimensional wind speed is decomposed into 14 one-dimensional sub-signals by SSA to eliminate the randomness of the original data.
  • The first 4800 samples obtained in the first step are used as the training set, and 10% of the samples in the training set are used as the validation set. The last 500 samples are used as the testing set. Min-Max scaling is used to normalize the training set and the testing set, respectively.
  • The input length is 14. For one-step forecasting, the $i$-th to the $(i+14)$-th samples are used to predict the $(i+15)$-th sample; for two-step forecasting, they are used to predict the $(i+16)$-th sample; and for three-step forecasting, the $(i+17)$-th sample (a sketch of this sample construction is given after this list).
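The sample construction referenced in the last step can be sketched as follows; the 4800/500 split, the [0, 1] scaling, and the window length of 14 follow Section 4.2.1, while the variable names and the exact indexing convention are illustrative.

```python
import numpy as np

def min_max(x):
    # Equation (10): scale to [0, 1]
    return (x - x.min()) / (x.max() - x.min())

def make_samples(series, window=14, horizon=1):
    # X gathers `window` consecutive samples; y is the value `horizon` steps ahead.
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i:i + window])
        y.append(series[i + window + horizon - 1])
    return np.array(X)[..., np.newaxis], np.array(y)

sub_signal = np.random.rand(5300)                              # placeholder for one SSA sub-signal
train = min_max(sub_signal[:4800])                             # first 4800 samples for training
test = min_max(sub_signal[-500:])                              # last 500 samples for testing
X_train, y_train = make_samples(train, window=14, horizon=1)   # horizon = 2, 3 for multi-step
X_test, y_test = make_samples(test, window=14, horizon=1)
print(X_train.shape, y_train.shape)                            # (4786, 14, 1) (4786,)
```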

4.2.2. Experimental Setup

In order to verify the effectiveness of the proposed model, we conduct three experiments. The detailed information of the three experiments and the comparison models is shown in Table 3. Experiment I is designed to determine which algorithms are suitable for feature extraction; Experiment II compares the SSA-CCN-ATT model with models using different decomposition methods; Experiment III compares the SSA-CCN-ATT model with some classic individual models. All the models are implemented in Python using the Keras framework. The parameter settings of ANN, SVR, CCN, LSTM, and GRU are shown in Table 4.
From Table 3, we can see that there are ten comparison models: six hybrid models and four individual models. The four individual models do not use data decomposition; the original wind speed values are used directly, whereas the six hybrid models use data decomposition techniques. The wind speed data of all models are normalized by Min-Max normalization to the range [0, 1]. All models are trained with the Adam optimizer except SVR.

4.2.3. Evaluation Criteria

In order to evaluate the accuracy of the proposed model, this section introduces some commonly used evaluation indicators, namely the MAE, MSE, MAPE, and $R^2$. Their formulas are as follows:
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \qquad (11)$$
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \qquad (12)$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \qquad (13)$$
$$R^2 = 1 - \frac{\sum_i \left( \hat{y}_i - y_i \right)^2}{\sum_i \left( \bar{y} - y_i \right)^2} \qquad (14)$$
where $n$ is the number of samples, $y_i$ is the real target wind speed value, $\hat{y}_i$ is the predicted wind speed value, and $\bar{y}$ represents the average of the target values. For these evaluation criteria, the smaller the MAE, MAPE, and MSE and the larger the $R^2$, the better the model.
In addition, to compare the wind speed prediction performance of different models, this paper also introduces the improvement ratio of MAPE ($P_{MAPE}$), defined as follows:
$$P_{MAPE} = \frac{MAPE_1 - MAPE_2}{MAPE_1} \times 100\% \qquad (15)$$
where $MAPE_1$ is the MAPE of a comparison model and $MAPE_2$ is the MAPE of the proposed model. When $P_{MAPE}$ is positive, the forecasting performance of our model is better than that of the comparison model; otherwise, it is worse.
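The four metrics and the improvement ratio can be computed directly in numpy, as in the following illustrative sketch (y_true and y_pred are placeholders):

```python
import numpy as np

def metrics(y_true, y_pred):
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                                             # Equation (11)
    mape = 100.0 * np.mean(np.abs(err / y_true))                           # Equation (12)
    mse = np.mean(err ** 2)                                                # Equation (13)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((np.mean(y_true) - y_true) ** 2)  # Equation (14)
    return mae, mape, mse, r2

def p_mape(mape_comparison, mape_proposed):
    # Equation (15): positive values mean the proposed model improves on the comparison model.
    return (mape_comparison - mape_proposed) / mape_comparison * 100.0

y_true = np.array([5.2, 4.8, 6.1, 5.5])    # placeholder wind speeds
y_pred = np.array([5.0, 4.9, 6.3, 5.4])
print(metrics(y_true, y_pred))
print(p_mape(6.7700, 4.8370))              # SVR vs proposed, dataset 1, 1-step: ~28.55% (cf. Table 9)
```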

4.3. Result Analysis

In this section, three experiments are conducted to verify the proposed model from different aspects.

4.3.1. Result Analysis of Experiment I

To validate the accuracy and stability of the SSA-CCN-ATT model, its forecasting results are compared with those of three other attention-based models. This experiment is designed to analyze the impact of feature selection on forecasting performance and to select a more effective feature selection algorithm. The evaluation metrics of the three comparison models and the proposed model are given in Table 5.
  • For dataset 1, the proposed model has the best evaluation metric values among all models. Of the three comparison models, SSA-ANN-ATT is superior to the other two for all indices. To present the results more intuitively, Figure 5 plots the forecasting results of SSA-CCN-ATT and the other attention-based models.
  • For dataset 2, the best forecasting method differs for different evaluation metrics. For the MAPE, the proposed SSA-CCN-ATT model has the lowest value for every step of forecasting. For the MAE, the SSA-ANN-ATT model has the lowest value for two-step forecasting.
  • For dataset 3, the proposed SSA-CCN-ATT model is superior to the other three models for all indices. For the other three models, in one-step and two-step forecasting, the worst is the SSA-GRU-ATT model. In three-step forecasting, the worst is the SSA-LSTM-ATT model.
  • For dataset 4, SSA-CCN-ATT is the best forecasting method, with the lowest values in terms of the MAE, MAPE, and MSE, and the highest value of R 2 . For the other three models, each model can get its best results for different forecasting steps. For example, SSA-ANN-ATT gets the best MAE, MAPE, MSE, and R 2 in two-step forecasting, with the values of 0.1727, 6.2570, 0.0378, and 0.9644, respectively.
Remark. 
The performance of the attention-based model using CCN is better than that of the attention-based models using other feature extraction technologies. These results demonstrate that CCN is well suited to the proposed model.

4.3.2. Result Analysis of Experiment II

This experiment compares the effect of SSA with that of other decomposition methods, namely EMD, EEMD, and EWT. For this purpose, we keep the prediction model fixed and vary only the decomposition method. Figure 6 shows the forecasting results of the four models; the bar chart shows that the three error indices of our model are lower than those of the other three models. Table 6 lists the four indices for the proposed model and the other three models. From Table 6, we can draw the following conclusions.
  • Similar to experiment I, the proposed model can get the best evaluation metrics values for dataset 1, dataset 3, and dataset 4. The other three comparison models can get their best results for different evaluation metrics and different steps of forecasting.
  • For dataset 1, in three-step forecasting, the worst forecasting model is EMD-CCN-ATT.
  • For dataset 2, EMD-CCN-ATT gets the best MAE and MSE in one-step forecasting, and EEMD-CCN-ATT gets the best MSE and $R^2$ in two-step forecasting.
  • For dataset 3 and dataset 4, in three-step forecasting, the worst forecasting model is EMD-CCN-ATT.
Remark. 
The performance of the CCN-ATT model with SSA decomposition is better than that of the CCN-ATT model with EMD, EEMD, or EWT decomposition. The results demonstrate that SSA is more suitable for the proposed model, since it can efficiently extract the trend, periodic, and quasi-periodic information of the signals.

4.3.3. Result Analysis of Experiment III

In this experiment, we again use the four datasets and compare the SSA-CCN-ATT model with some classic individual models, namely ANN, SVR, CCN, and LSTM. Their parameters are the same as those in Experiment I. The difference from Experiments I and II is that the original, undecomposed data are used as the input of the models. A radar chart of three indices is drawn in Figure 7. From Table 7 and Figure 7, we can draw the following conclusions:
  • For dataset 1, the proposed model achieves the best results for every step of forecasting. Among the four comparison models, LSTM performs best because it has the best values in terms of the four indices.
  • For dataset 2, the best forecasting method differs for different step of forecasting. For one-step forecasting, the proposed SSA-CCN-ATT model has the best values for the four indices. For two-step forecasting, LSTM model has the best MSE and R 2 values. For three-step forecasting, LSTM model has the best MAE value, with an MAE of 0.1978.
  • For dataset 3, comparison with the four models shows that the proposed model has the most accurate forecasting results. Among the four individual models, LSTM is in general the best forecasting model in terms of the four indices, but in one-step forecasting, CCN has the best MSE and $R^2$ values.
  • Similar to dataset 3, for dataset 4 the performance of the SSA-CCN-ATT is better than that of the four individual models. From the comparison of CCN, SVR, ANN, and LSTM, it can be seen that LSTM always obtains the best values among the four for one-step and two-step forecasting.
Remark. 
Compared with the four individual models, the proposed SSA-CCN-ATT model can get the most accurate forecasting results. Among the four individual models, LSTM is relatively better than the other three individual models. As the number of forecasting steps increases, the forecasting performance of all models becomes worse.

5. Discussion

In this section, we will conduct the Diebold–Mariano (DM) test on ten comparison models and analyze the improvement ratio of our model relative to the comparison models.

5.1. Significance of the Proposed Model

To verify the forecasting performance of the proposed SSA-CCN-ATT model, we conduct the DM test. Given a significance level $\alpha$, the null hypothesis $H_0$ states that there is no significant difference in forecasting performance between the proposed model and the reference model, while $H_1$ rejects this hypothesis. The related hypotheses are as follows:
$$H_0: \; E[L(error_i^1)] = E[L(error_i^2)]$$
$$H_1: \; E[L(error_i^1)] \neq E[L(error_i^2)]$$
where $L$ is the loss function of the forecasting errors and $error_i^p$, $p = 1, 2$, are the forecasting errors of the two compared models.
Moreover, the DM test statistic is defined by:
$$DM = \frac{\frac{1}{n} \sum_{i=1}^{n} \left[ L(error_i^1) - L(error_i^2) \right]}{\sqrt{S^2 / n}}$$
where $S^2$ is an estimate of the variance of $d_i = L(error_i^1) - L(error_i^2)$. Given the significance level $\alpha$, the calculated DM value is compared with $-Z_{\alpha/2}$ and $Z_{\alpha/2}$, where $Z_{\alpha/2}$ is the upper (positive) $Z$-value from the standard normal table corresponding to half of the desired $\alpha$ level. $H_0$ is accepted if the DM statistic falls into the interval $[-Z_{\alpha/2}, Z_{\alpha/2}]$; otherwise, $H_0$ is rejected, indicating a significant difference between the forecasting performance of the proposed model and that of the comparison model.
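The DM statistic above can be computed as in the following sketch, assuming a squared-error loss and the sample variance of the loss differentials as the estimator $S^2$; the error arrays are placeholders.

```python
import numpy as np
from scipy.stats import norm

def dm_test(err1, err2, alpha=0.10):
    d = err1 ** 2 - err2 ** 2               # loss differentials d_i with squared-error loss
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)
    critical = norm.ppf(1 - alpha / 2)      # Z_{alpha/2}; about 1.645 for alpha = 0.10
    reject_h0 = abs(dm) > critical          # rejecting H0 implies a significant difference
    return dm, reject_h0

rng = np.random.default_rng(0)
e1 = 1.2 * rng.standard_normal(500)         # errors of a comparison model (illustrative)
e2 = rng.standard_normal(500)               # errors of the proposed model (illustrative)
print(dm_test(e1, e2))
```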
Table 8 shows the DM test results. From Table 8, we can see that EWT-CCN-ATT on dataset 1 in one-step forecasting has the smallest DM value, 1.8028, which is still greater than $Z_{0.1/2} = 1.645$. Consequently, the null hypothesis can be rejected at the 10% significance level for every comparison; in other words, the alternative hypothesis is accepted with at least 90% confidence. This confirms that the proposed SSA-CCN-ATT model performs significantly better than the ten comparison models.

5.2. Improvement Ratio Analysis

Table 9 shows the results of $P_{MAPE}$ for the four datasets. Based on the details provided in Table 9, the forecasting results of the proposed SSA-CCN-ATT model are better than those of every comparison model. Compared with the other attention-based models, the maximum improvement percentage of MAPE reaches 23.345% and the minimum is 5.7210%. The $P_{MAPE}$ values between the proposed SSA-CCN-ATT and SSA-LSTM-ATT for dataset 1 are 16.445% (one-step), 11.416% (two-step), and 14.470% (three-step). In comparison with the classic individual prediction models, the improvement percentage of MAPE has a maximum of 40.488% and a minimum of 6.7730%. Compared with the models using different decomposition methods, the maximum improvement percentage of MAPE reaches 26.279% and the minimum is 6.1830%. This proves that the proposed model has better prediction accuracy and stability than the comparison models.

6. Conclusions

In this paper, a novel wind speed forecasting model based on a CCN and an attention mechanism is proposed. Four wind speed datasets from different time periods and different heights are used to verify the performance of the proposed SSA-CCN-ATT model. The experimental results show that: (1) the attention-based model using CCN performs better than the attention-based models using ANN, LSTM, or GRU; (2) the proposed model with the SSA decomposition method is better than the models with other decomposition methods; (3) the proposed model is superior to the classic individual models in terms of forecasting accuracy and stability. The model uses only the original wind speed values; other weather factors that are highly correlated with wind speed, such as wind direction and temperature, are not considered. In future work, multivariate time series forecasting methods can be employed, and weather features such as temperature, humidity, and pressure can be input into the model to further improve the accuracy of the forecasting results.

Author Contributions

Conceptualization, Z.S. and Y.C.; Methodology, Q.W.; Software, B.Z.; Validation, M.X. and B.Z.; Formal Analysis, Z.S. and Y.C.; Investigation, Q.W.; Resources, B.Z.; Data Curation, Q.W.; Writing—Original Draft Preparation, Q.W.; Writing—Review & Editing, Z.S. and M.X.; Visualization, Z.S. and Y.C.; Supervision, Y.C.; Project Administration, Q.W.; Funding Acquisition, Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the China Postdoctoral Science Foundation, 2021M702943.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

AR: autoregressive
ARMA: autoregressive moving average
VMD: variational mode decomposition
SSA: singular spectrum analysis
RWT: repeated wavelet transform
BPNN: back propagation neural network
EMD: empirical mode decomposition
SVM: support vector machine
RKF: recurrent Kalman filter
FS: Fourier series
WNN: wavelet neural network
ANN: artificial neural network
KELM: kernel extreme learning machine
Bi-LSTM: bidirectional long short-term memory
LSTM: long short-term memory
DBN: deep Boltzmann network
EWT: empirical wavelet transform
ESN: echo state network
ISSD: improved singular spectrum decomposition
GOASVM: grasshopper optimization algorithm support vector machine
SSD: singular spectrum decomposition
CSA: cross search algorithm
WT: wavelet analysis
MI: mutual information
ED: evolutionary decomposition
BiGRU: bidirectional gated recurrent unit
CCGRU: causal convolution gated recurrent unit
CCN: causal convolution network
SNN: spiking neural network
ANFIS: adaptive neural-fuzzy system
SVR: support vector regression
GRP: Gaussian regression process
EEMD: ensemble empirical mode decomposition
CEEMDAN: complete EEMD with adaptive noise
RNN: recurrent neural network
CS: cuckoo search
SLFN: single hidden-layer feedforward network
IELM: incremental extreme learning machine
MCEEMDAN: modified CEEMDAN
SVD: singular value decomposition

References

  1. Barthelmie, R.J.; Pryor, S.C. Potential contribution of wind energy to climate change mitigation. Nat. Clim. Chang. 2014, 4, 684–688. [Google Scholar] [CrossRef]
  2. Lam, L.T.; Branstetter, L.; Azevedo, I.M.L. China’s wind electricity and cost of carbon mitigation are more expensive than anticipated. Environ. Res. Lett. 2016, 11, 84015. [Google Scholar] [CrossRef] [Green Version]
  3. Yao, X.; Liu, Y.; Qu, S. When will wind energy achieve grid parity in China?–Connecting technological learning and climate finance. Appl. Energy 2015, 160, 697–704. [Google Scholar] [CrossRef]
  4. He, Z.; Chen, Y.; Shang, Z.; Li, C.; Li, L.; Xu, M. A novel wind speed forecasting model based on moving window and multi-objective particle swarm optimization algorithm. Appl. Math. Model. 2019, 76, 717–740. [Google Scholar] [CrossRef]
  5. Singh, S.N.; Mohapatra, A. Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renew. Energy 2019, 136, 758–768. [Google Scholar]
  6. Okumus, I.; Dinler, A. Current status of wind energy forecasting and a hybrid method for hourly predictions. Energy Convers. Manag. 2016, 123, 362–371. [Google Scholar] [CrossRef]
  7. Sharma, P.; Bhatti, T.S. A review on electrochemical double-layer capacitors. Energy Convers. Manag. 2010, 51, 2901–2912. [Google Scholar] [CrossRef]
  8. Jiang, P.; Ma, X. A hybrid forecasting approach applied in the electrical power system based on data preprocessing, optimization and artificial intelligence algorithms. Appl. Math. Model. 2016, 40, 10631–10649. [Google Scholar] [CrossRef]
  9. Naik, J.; Satapathy, P.; Dash, P.K. Short-term wind speed and wind power prediction using hybrid empirical mode decomposition and kernel ridge regression. Appl. Soft Comput. 2018, 70, 1167–1188. [Google Scholar] [CrossRef]
  10. Poggi, P.; Muselli, M.; Notton, G.; Cristofari, C.; Louche, A. Forecasting and simulating wind speed in Corsica by using an autoregressive model. Energy Convers. Manag. 2003, 44, 3177–3196. [Google Scholar] [CrossRef]
  11. Kaur, D.; Lie, T.T.; Nair, N.K.C.; Vallès, B. Wind speed forecasting using hybrid wavelet transform—ARMA techniques. Aims Energy 2015, 3, 13–24. [Google Scholar] [CrossRef]
  12. Erdem, E.; Shi, J. ARMA based approaches for forecasting the tuple of wind speed and direction. Appl. Energy 2011, 88, 1405–1414. [Google Scholar] [CrossRef]
  13. Liu, H.; Tian, H.; Li, Y. An EMD-recursive ARIMA method to predict wind speed for railway strong wind warning system. J. Wind. Eng. Ind. Aerodyn. 2015, 141, 27–38. [Google Scholar] [CrossRef]
  14. Cao, L.; Qiao, D.; Chen, X. Laplace ℓ1 Huber based cubature Kalman filter for attitude estimation of small satellite. Acta Astronaut. 2018, 148, 48–56. [Google Scholar] [CrossRef]
  15. Bludszuweit, H.; Domínguez-Navarro, J.A.; Llombart, A. Statistical analysis of wind power forecast error. IEEE Trans. Power Syst. 2008, 23, 983–991. [Google Scholar] [CrossRef]
  16. Shamshad, A.; Bawadi, M.A.; Wanhussin, W.; Majid, T.A.; Sanusi, S. First and second order Markov chain models for synthetic generation of wind speed time series. Energy 2005, 30, 693–708. [Google Scholar] [CrossRef]
  17. Moreno, S.R.; Mariani, V.C.; dos Santos Coelho, L. Hybrid multi-stage decomposition with parametric model applied to wind speed forecasting in Brazilian Northeast. Renew. Energy 2021, 164, 1508–1526. [Google Scholar] [CrossRef]
  18. Ding, W.; Meng, F. Point and interval forecasting for wind speed based on linear component extraction. Appl. Soft Comput. 2020, 93, 106350. [Google Scholar] [CrossRef]
  19. Domínguez-Navarro, J.A.; Lopez-Garcia, T.B.; Valdivia-Bautista, S.M. Applying Wavelet Filters in Wind Forecasting Methods. Energies 2021, 14, 3181. [Google Scholar] [CrossRef]
  20. Ghaderpour, E. JUST: MATLAB and python software for change detection and time series analysis. GPS Solut. 2021, 25, 85. [Google Scholar] [CrossRef]
  21. Liu, M.; Cao, Z.; Zhang, J.; Wang, L.; Huang, C.; Luo, X. Short-term wind speed forecasting based on the Jaya-SVM model. Int. J. Electr. Power Energy Syst. 2020, 121, 106056. [Google Scholar] [CrossRef]
  22. Aly, H.H. A novel deep learning intelligent clustered hybrid models for wind speed and power forecasting. Energy 2020, 213, 118773. [Google Scholar] [CrossRef]
  23. Xiao, L.; Shao, W.; Jin, F.; Wu, Z. A self-adaptive kernel extreme learning machine for short-term wind speed forecasting. Appl. Soft Comput. 2021, 99, 106917. [Google Scholar] [CrossRef]
  24. Hong, Y.Y.; Satriani, T.R.A. Day-ahead spatiotemporal wind speed forecasting using robust design-based deep learning neural network. Energy 2020, 209, 118441. [Google Scholar] [CrossRef]
  25. Liang, T.; Zhao, Q.; Lv, Q.; Sun, H. A novel wind speed prediction strategy based on Bi-LSTM, MOOFADA and transfer learning for centralized control centers. Energy 2021, 230, 120904. [Google Scholar] [CrossRef]
  26. Xiang, L.; Li, J.; Hu, A.; Zhang, Y. Deterministic and probabilistic multi-step forecasting for short-term wind speed based on secondary decomposition and a deep learning method. Energy Convers. Manag. 2020, 220, 113098. [Google Scholar] [CrossRef]
  27. Niu, X.; Wang, J. A combined model based on data preprocessing strategy and multi-objective optimization algorithm for short-term wind speed forecasting. Appl. Energy 2019, 241, 519–539. [Google Scholar] [CrossRef]
  28. Wang, J.; Yang, Z. Ultra-short-term wind speed forecasting using an optimized artificial intelligence algorithm. Renew. Energy 2021, 171, 1418–1435. [Google Scholar] [CrossRef]
  29. Liu, H.; Yu, C.; Wu, H.; Duan, Z.; Yan, G. A new hybrid ensemble deep reinforcement learning model for wind speed short term forecasting. Energy 2020, 202, 117794. [Google Scholar] [CrossRef]
  30. Yan, X.; Liu, Y.; Xu, Y.; Jia, M. Multistep forecasting for diurnal wind speed based on hybrid deep learning model with improved singular spectrum decomposition. Energy Convers. Manag. 2020, 225, 113456. [Google Scholar] [CrossRef]
  31. Zhang, G.; Liu, D. Causal convolutional gated recurrent unit network with multiple decomposition methods for short-term wind speed forecasting. Energy Convers. Manag. 2020, 226, 113500. [Google Scholar] [CrossRef]
  32. Wei, D.; Wang, J.; Niu, X.; Li, Z. Wind speed forecasting system based on gated recurrent units and convolutional spiking neural networks. Appl. Energy 2021, 292, 116842. [Google Scholar] [CrossRef]
  33. Chen, X.J.; Zhao, J.; Jia, X.Z.; Li, Z.L. Multi-step wind speed forecast based on sample clustering and an optimized hybrid system. Renew. Energy 2021, 165, 595–611. [Google Scholar] [CrossRef]
  34. Zhou, Q.; Wang, C.; Zhang, G. A combined forecasting system based on modified multi-objective optimization and sub-model selection strategy for short-term wind speed. Appl. Soft Comput. 2020, 94, 106463. [Google Scholar] [CrossRef]
  35. Hu, J.; Heng, J.; Wen, J.; Zhao, W. Deterministic and probabilistic wind speed forecasting with de-noising-reconstruction strategy and quantile regression based algorithm. Renew. Energy 2020, 162, 1208–1226. [Google Scholar] [CrossRef]
  36. Moreno, S.R.; da Silva, R.G.; Mariani, V.C.; dos Santos Coelho, L. Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network. Energy Convers. Manag. 2020, 213, 112869. [Google Scholar] [CrossRef]
  37. Duan, J.; Zuo, H.; Bai, Y.; Duan, J.; Chang, M.; Chen, B. Short-term wind speed forecasting using recurrent neural networks with error correction. Energy 2021, 217, 119397. [Google Scholar] [CrossRef]
  38. Neshat, M.; Nezhad, M.M.; Abbasnejad, E.; Mirjalili, S.; Tjernberg, L.B.; Garcia, D.A.; Alexander, B.; Wagner, M. A deep learning-based evolutionary model for short-term wind speed forecasting, A case study of the Lillgrund offshore wind farm. Energy Convers. Manag. 2021, 236, 114002. [Google Scholar] [CrossRef]
  39. Tian, Z. Modes decomposition forecasting approach for ultra-short-term wind speed. Appl. Soft Comput. 2021, 105, 107303. [Google Scholar] [CrossRef]
  40. Jiang, P.; Liu, Z.; Niu, X.; Zhang, L. A combined forecasting system based on statistical method, artificial neural networks, and deep learning methods for short-term wind speed forecasting. Energy 2021, 217, 119361. [Google Scholar] [CrossRef]
  41. Jaseena, K.U.; Kovoor, B.C. Decomposition-based hybrid wind speed forecasting model using deep bidirectional LSTM networks. Energy Convers. Manag. 2021, 234, 113944. [Google Scholar] [CrossRef]
  42. Memarzadeh, G.; Keynia, F. A new short-term wind speed forecasting method based on fine-tuned LSTM neural network and optimal input sets. Energy Convers. Manag. 2020, 213, 112824. [Google Scholar] [CrossRef]
  43. Xu, S.; Hu, H.; Ji, L.; Wang, P. An adaptive graph spectral analysis method for feature extraction of an EEG signal. IEEE Sens. J. 2018, 19, 1884–1896. [Google Scholar] [CrossRef]
  44. Van Den Oord, A.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. WaveNet: A generative model for raw audio. SSW 2016, 125, 2. [Google Scholar]
  45. Mariani, S.; Rendu, Q.; Urbani, M.; Sbarufatti, C. Causal dilated convolutional neural networks for automatic inspection of ultrasonic signals in non-destructive evaluation and structural health monitoring. Mech. Syst. Signal Process. 2021, 157, 107748. [Google Scholar] [CrossRef]
  46. Jia, Z.; Yang, L.; Zhang, Z.; Liu, H.; Kong, F. Sequence to point learning based on bidirectional dilated residual network for non-intrusive load monitoring. Int. J. Electr. Power Energy Syst. 2021, 129, 106837. [Google Scholar] [CrossRef]
  47. Chen, Y.; He, Z.; Shang, Z.; Li, C.; Li, L.; Xu, M. A novel combined model based on echo state network for multi-step ahead wind speed forecasting: A case study of NREL. Energy Convers. Manag. 2019, 179, 13–29. [Google Scholar] [CrossRef]
  48. A Python Package for Time Series Classification. Available online: https://github.com/johannfaouzi/pyts (accessed on 10 April 2022).
Figure 1. The typical structure of standard convolution.
Figure 2. The flowchart of the proposed SSA-CCN-ATT model.
Figure 3. The original data of the four wind speed datasets.
Figure 4. Time series decomposition of wind speed data using SSA.
Figure 5. Forecasting results of models with different feature selection methods for dataset 1.
Figure 6. Forecasting results of models with different decomposition methods in four datasets.
Figure 7. Forecasting results of SSA-CCN-ATT and other individual models in four datasets.
Table 1. A brief summary of the reviewed wind speed forecasting models.

| Model | Data Preprocessing | Forecasting Model | Optimization |
| --- | --- | --- | --- |
| Combined model [27] | CEEMDAN | ARIMA, BPNN, ENN, ELM, GRNN | MOGOA |
| MWS-CE-ENN [28] | CEEMDAN | Elman neural network | MWS |
| Q-LSTM-DBN-ESN [29] | Empirical wavelet transform | LSTM, DBN, ESN | None |
| SSD-LSTM-GOASVM [30] | Improved singular spectrum decomposition | LSTM, DBN | Grasshopper optimization algorithm |
| CCGRU [31] | CCN | Gated recurrent unit | None |
| DTIWSFS [32] | Empirical wavelet transform | Gated recurrent unit, convolutional SNN | Grey Wolf Optimization |
| ECKIE [33] | Ensemble empirical mode decomposition | Incremental extreme learning machine | Cuckoo search (CS) algorithm |
| SSAWD-MOGAPSO-CM [34] | SSAWD secondary denoising algorithm | MLP-BP, NARNN, SVM, and ELM | Multi-objective optimization by modified PSO |
| Hybrid forecasting model [35] | Modified complete empirical mode decomposition with adaptive noise | Quantile regression-based model | Grasshopper optimization algorithm |
| VMD-SSA-LSTM [36] | Variational mode decomposition, singular spectral analysis | LSTM, ESN, ANFIS, SVR, GRP | None |
| ICEEMDAN-RNN-ICEEMDAN-ARIMA [37] | ICEEMDAN | ARIMA, RNN, BPNN | None |
| CMAES-LSTM [38] | Evolutionary decomposition | Bi-LSTM | Covariance matrix adaptation evolution strategy |
| Proposed approach [39] | Adaptive variational mode decomposition algorithm | ARIMA, SVM, improved LSTM | Improved PSO |
| PCFS [40] | Singular spectral analysis | ELM, BPNN, GRNN, ARIMA, ENN, DBN, LSTM | MMODA |
| EWT-based BiDLSTM [41] | WT, EMD, EEMD, EWT | Bidirectional LSTM | None |
| WT-FS-LSTM [42] | WT | LSTM | CSA |
Table 2. The statistical information of the four wind speed datasets.

| Datasets | Minimum | Median | Maximum | Mean | Std |
| --- | --- | --- | --- | --- | --- |
| Dataset 1 | 0.35 | 4.77 | 23.57 | 6.49 | 4.76 |
| Dataset 2 | 0.35 | 3.43 | 9.62 | 3.55 | 1.52 |
| Dataset 3 | 0.36 | 2.59 | 8.22 | 2.71 | 1.19 |
| Dataset 4 | 0.32 | 2.55 | 9.61 | 2.72 | 1.37 |
Table 3. The comparison models in the three experiments.

| Experiments | Comparison Models |
| --- | --- |
| Experiment I | SSA-LSTM-ATT, SSA-ANN-ATT, SSA-GRU-ATT |
| Experiment II | EMD-CCN-ATT, EEMD-CCN-ATT, EWT-CCN-ATT |
| Experiment III | ANN, SVR, CCN, LSTM |
Table 4. Configuration of the models used in this paper.

| Model | Parameters | Values |
| --- | --- | --- |
| ANN | Number of hidden layers | 4 |
|  | Number of neurons in hidden layers | (100, 70, 40, 10) |
| SVR | Kernel function | RBF kernel |
|  | Kernel coefficient | {0.01, 0.1, 1, 10, 100} |
|  | Regularization parameter | {0.01, 0.1, 1, 10, 100} |
| CCN | Number of hidden layers | 2 |
|  | Number of kernels in the CCN layer | (10, 12) |
| LSTM | Number of hidden layers | 2 |
|  | Number of neurons in the LSTM layer | (100, 50) |
| GRU | Number of hidden layers | 2 |
|  | Number of neurons in the GRU layer | (100, 50) |
Table 5. Error evaluation results of models with different feature selection methods in four datasets.

| Dataset | Model | MAE (1-step) | MAE (2-step) | MAE (3-step) | MAPE % (1-step) | MAPE % (2-step) | MAPE % (3-step) | MSE (1-step) | MSE (2-step) | MSE (3-step) | R² (1-step) | R² (2-step) | R² (3-step) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset 1 | SSA-LSTM-ATT | 0.0866 | 0.1034 | 0.1251 | 5.7890 | 6.1930 | 8.8250 | 0.0136 | 0.0153 | 0.0244 | 0.9624 | 0.9577 | 0.9326 |
| Dataset 1 | SSA-ANN-ATT | 0.0940 | 0.0882 | 0.1147 | 5.5780 | 6.1610 | 8.0060 | 0.0115 | 0.0112 | 0.0214 | 0.9684 | 0.9691 | 0.9408 |
| Dataset 1 | SSA-GRU-ATT | 0.0960 | 0.0975 | 0.1445 | 5.2620 | 6.6450 | 8.9320 | 0.0139 | 0.0127 | 0.0234 | 0.9616 | 0.9649 | 0.9353 |
| Dataset 1 | Proposed | 0.0719 | 0.0863 | 0.1241 | 4.8370 | 5.4860 | 7.5480 | 0.0073 | 0.0098 | 0.0202 | 0.9797 | 0.9731 | 0.9441 |
| Dataset 2 | SSA-LSTM-ATT | 0.1696 | 0.1960 | 0.2340 | 5.8170 | 6.9630 | 8.2430 | 0.0366 | 0.0445 | 0.0653 | 0.9756 | 0.9704 | 0.9565 |
| Dataset 2 | SSA-ANN-ATT | 0.1642 | 0.1700 | 0.2302 | 5.6480 | 6.3740 | 8.9750 | 0.0330 | 0.0310 | 0.0600 | 0.9781 | 0.9794 | 0.9600 |
| Dataset 2 | SSA-GRU-ATT | 0.1472 | 0.1820 | 0.2638 | 5.7780 | 6.9330 | 8.9720 | 0.0238 | 0.0359 | 0.0877 | 0.9842 | 0.9761 | 0.9416 |
| Dataset 2 | Proposed | 0.1425 | 0.1762 | 0.2047 | 4.6540 | 5.7290 | 7.1170 | 0.0271 | 0.0438 | 0.0510 | 0.9820 | 0.9709 | 0.9660 |
| Dataset 3 | SSA-LSTM-ATT | 0.8384 | 0.9588 | 1.4789 | 5.2870 | 6.0920 | 9.2410 | 0.7238 | 0.9423 | 2.2831 | 0.9504 | 0.9354 | 0.8434 |
| Dataset 3 | SSA-ANN-ATT | 0.8453 | 0.9692 | 1.3630 | 5.2930 | 6.1990 | 8.5190 | 0.7430 | 0.9684 | 1.9140 | 0.9490 | 0.9336 | 0.8687 |
| Dataset 3 | SSA-GRU-ATT | 0.9071 | 0.9780 | 1.3526 | 5.6330 | 6.2360 | 8.8130 | 0.8565 | 0.9767 | 1.9484 | 0.9412 | 0.9330 | 0.8663 |
| Dataset 3 | Proposed | 0.7442 | 0.8724 | 1.2333 | 4.7800 | 5.4440 | 7.6940 | 0.5622 | 0.7905 | 1.5744 | 0.9614 | 0.9458 | 0.8920 |
| Dataset 4 | SSA-LSTM-ATT | 0.1509 | 0.1760 | 0.2257 | 5.4550 | 6.9660 | 9.0030 | 0.0312 | 0.0493 | 0.0824 | 0.9706 | 0.9536 | 0.9224 |
| Dataset 4 | SSA-ANN-ATT | 0.1590 | 0.1727 | 0.2214 | 5.6280 | 6.2570 | 8.7370 | 0.0360 | 0.0378 | 0.0809 | 0.9661 | 0.9644 | 0.9238 |
| Dataset 4 | SSA-GRU-ATT | 0.1552 | 0.1894 | 0.2406 | 5.7570 | 6.2930 | 8.8430 | 0.0299 | 0.0522 | 0.0798 | 0.9718 | 0.9508 | 0.9248 |
| Dataset 4 | Proposed | 0.1181 | 0.1558 | 0.1928 | 4.4130 | 5.7590 | 7.1140 | 0.0189 | 0.0304 | 0.0489 | 0.9822 | 0.9714 | 0.9539 |
Table 6. Error evaluation results of models with different decomposition methods in four datasets.

| Dataset | Model | MAE (1-step) | MAE (2-step) | MAE (3-step) | MAPE % (1-step) | MAPE % (2-step) | MAPE % (3-step) | MSE (1-step) | MSE (2-step) | MSE (3-step) | R² (1-step) | R² (2-step) | R² (3-step) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset 1 | EMD-CCN-ATT | 0.0916 | 0.0903 | 0.1446 | 5.8900 | 6.3260 | 8.8440 | 0.0106 | 0.0153 | 0.0244 | 0.9624 | 0.9577 | 0.9326 |
| Dataset 1 | EEMD-CCN-ATT | 0.1071 | 0.1140 | 0.1262 | 6.0430 | 6.3340 | 8.3120 | 0.0156 | 0.0112 | 0.0214 | 0.9684 | 0.9691 | 0.9408 |
| Dataset 1 | EWT-CCN-ATT | 0.0942 | 0.0921 | 0.1379 | 5.6240 | 6.0750 | 8.2760 | 0.0102 | 0.0127 | 0.0234 | 0.9616 | 0.9649 | 0.9353 |
| Dataset 1 | Proposed | 0.0719 | 0.0863 | 0.1241 | 4.8370 | 5.4860 | 7.5480 | 0.0073 | 0.0098 | 0.0202 | 0.9797 | 0.9731 | 0.9441 |
| Dataset 2 | EMD-CCN-ATT | 0.1499 | 0.1851 | 0.2467 | 5.8860 | 6.2200 | 9.6540 | 0.0246 | 0.0445 | 0.0653 | 0.9756 | 0.9704 | 0.9565 |
| Dataset 2 | EEMD-CCN-ATT | 0.1699 | 0.1996 | 0.2402 | 5.9350 | 6.4650 | 9.2280 | 0.0339 | 0.0310 | 0.0600 | 0.9781 | 0.9794 | 0.9600 |
| Dataset 2 | EWT-CCN-ATT | 0.1641 | 0.1686 | 0.2505 | 5.8240 | 6.4080 | 8.9820 | 0.0317 | 0.0359 | 0.0877 | 0.9842 | 0.9761 | 0.9416 |
| Dataset 2 | Proposed | 0.1425 | 0.1762 | 0.2047 | 4.6540 | 5.7290 | 7.1170 | 0.0271 | 0.0438 | 0.0510 | 0.9820 | 0.9709 | 0.9660 |
| Dataset 3 | EMD-CCN-ATT | 0.8622 | 1.0417 | 1.5789 | 5.3870 | 6.4520 | 9.5580 | 0.7838 | 0.9423 | 2.2831 | 0.9504 | 0.9354 | 0.8434 |
| Dataset 3 | EEMD-CCN-ATT | 0.8120 | 0.9950 | 1.4721 | 5.0950 | 6.3070 | 8.9350 | 0.6801 | 0.9684 | 1.9140 | 0.9490 | 0.9336 | 0.8687 |
| Dataset 3 | EWT-CCN-ATT | 0.9273 | 0.9843 | 1.4288 | 5.8140 | 6.3950 | 9.2300 | 0.9178 | 0.9767 | 1.9484 | 0.9412 | 0.9330 | 0.8663 |
| Dataset 3 | Proposed | 0.7442 | 0.8724 | 1.2333 | 4.7800 | 5.4440 | 7.6940 | 0.5622 | 0.7905 | 1.5744 | 0.9614 | 0.9458 | 0.8920 |
| Dataset 4 | EMD-CCN-ATT | 0.1490 | 0.1819 | 0.2656 | 5.4930 | 6.9660 | 9.4750 | 0.0267 | 0.0493 | 0.0824 | 0.9706 | 0.9536 | 0.9224 |
| Dataset 4 | EEMD-CCN-ATT | 0.1507 | 0.1745 | 0.2490 | 5.7490 | 6.3260 | 8.9450 | 0.0265 | 0.0378 | 0.0809 | 0.9661 | 0.9644 | 0.9238 |
| Dataset 4 | EWT-CCN-ATT | 0.1614 | 0.1760 | 0.2344 | 5.8430 | 6.3700 | 8.9260 | 0.0296 | 0.0522 | 0.0798 | 0.9718 | 0.9508 | 0.9248 |
| Dataset 4 | Proposed | 0.1181 | 0.1558 | 0.1928 | 4.4130 | 5.7590 | 7.1140 | 0.0189 | 0.0304 | 0.0489 | 0.9822 | 0.9714 | 0.9539 |
Table 7. Error evaluation results of SSA-CCN-ATT and other individual models in four datasets.

| Dataset | Model | MAE (1-step) | MAE (2-step) | MAE (3-step) | MAPE % (1-step) | MAPE % (2-step) | MAPE % (3-step) | MSE (1-step) | MSE (2-step) | MSE (3-step) | R² (1-step) | R² (2-step) | R² (3-step) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset 1 | ANN | 0.1062 | 0.1346 | 0.1693 | 7.3840 | 7.8350 | 10.4760 | 0.0162 | 0.0297 | 0.0328 | 0.9553 | 0.9179 | 0.9094 |
| Dataset 1 | SVR | 0.1203 | 0.1578 | 0.1582 | 6.7700 | 8.5500 | 9.6150 | 0.0233 | 0.0364 | 0.0293 | 0.9356 | 0.8994 | 0.9192 |
| Dataset 1 | CCN | 0.1192 | 0.1257 | 0.1779 | 7.1220 | 7.7140 | 10.0040 | 0.0185 | 0.0216 | 0.0513 | 0.9488 | 0.9404 | 0.8584 |
| Dataset 1 | LSTM | 0.0942 | 0.1029 | 0.1405 | 6.6140 | 7.1530 | 9.4080 | 0.0129 | 0.0155 | 0.0240 | 0.9642 | 0.9571 | 0.9337 |
| Dataset 1 | Proposed | 0.0719 | 0.0863 | 0.1241 | 4.8370 | 5.4860 | 7.5480 | 0.0073 | 0.0098 | 0.0202 | 0.9797 | 0.9731 | 0.9441 |
| Dataset 2 | ANN | 0.1805 | 0.2277 | 0.3493 | 6.4510 | 7.1700 | 11.9590 | 0.0373 | 0.0737 | 0.1514 | 0.9752 | 0.9509 | 0.8992 |
| Dataset 2 | SVR | 0.1562 | 0.2229 | 0.3105 | 6.1370 | 7.8670 | 10.6160 | 0.0273 | 0.0600 | 0.1181 | 0.9818 | 0.9600 | 0.9214 |
| Dataset 2 | CCN | 0.1592 | 0.2087 | 0.2828 | 5.4560 | 7.1100 | 10.2400 | 0.0355 | 0.0559 | 0.1048 | 0.9764 | 0.9628 | 0.9303 |
| Dataset 2 | LSTM | 0.1827 | 0.1875 | 0.1978 | 5.8810 | 6.6100 | 8.0790 | 0.0507 | 0.0426 | 0.0557 | 0.9663 | 0.9717 | 0.9630 |
| Dataset 2 | Proposed | 0.1425 | 0.1762 | 0.2047 | 4.6540 | 5.7290 | 7.1170 | 0.0271 | 0.0438 | 0.0510 | 0.9820 | 0.9709 | 0.9660 |
| Dataset 3 | ANN | 1.1199 | 1.3021 | 1.4535 | 7.0160 | 8.2360 | 9.4780 | 1.3091 | 1.7431 | 2.2702 | 0.9102 | 0.8804 | 0.8443 |
| Dataset 3 | SVR | 1.1496 | 1.2352 | 1.7375 | 7.1630 | 7.7130 | 10.9670 | 1.3708 | 1.5776 | 3.0919 | 0.9060 | 0.8918 | 0.7879 |
| Dataset 3 | CCN | 1.0711 | 1.2326 | 1.4921 | 6.7240 | 7.7580 | 9.6060 | 1.1927 | 1.5805 | 2.2789 | 0.9182 | 0.8916 | 0.8437 |
| Dataset 3 | LSTM | 1.0641 | 1.1366 | 1.3326 | 6.6830 | 7.1720 | 8.2530 | 1.1950 | 1.3290 | 1.8636 | 0.9180 | 0.9088 | 0.8722 |
| Dataset 3 | Proposed | 0.7442 | 0.8724 | 1.2333 | 4.7800 | 5.4440 | 7.6940 | 0.5622 | 0.7905 | 1.5744 | 0.9614 | 0.9458 | 0.8920 |
| Dataset 4 | ANN | 0.1823 | 0.1923 | 0.2736 | 6.6580 | 7.1660 | 9.9420 | 0.0392 | 0.0500 | 0.0902 | 0.9631 | 0.9529 | 0.9150 |
| Dataset 4 | SVR | 0.1852 | 0.2191 | 0.2543 | 6.6700 | 7.9550 | 9.8670 | 0.0497 | 0.0609 | 0.1004 | 0.9532 | 0.9426 | 0.9054 |
| Dataset 4 | CCN | 0.2192 | 0.2143 | 0.3633 | 7.1970 | 7.8240 | 10.3590 | 0.0806 | 0.0534 | 0.2306 | 0.9240 | 0.9497 | 0.7827 |
| Dataset 4 | LSTM | 0.1717 | 0.859 | 0.2748 | 6.3470 | 7.0710 | 9.9150 | 0.0353 | 0.0449 | 0.0982 | 0.9667 | 0.9577 | 0.9075 |
| Dataset 4 | Proposed | 0.1181 | 0.1558 | 0.1928 | 4.4130 | 5.7590 | 7.1140 | 0.0189 | 0.0304 | 0.0489 | 0.9822 | 0.9714 | 0.9539 |
Table 8. DM test results of different models in the four datasets.

| Model | Dataset 1 (1-step) | Dataset 1 (2-step) | Dataset 1 (3-step) | Dataset 2 (1-step) | Dataset 2 (2-step) | Dataset 2 (3-step) | Dataset 3 (1-step) | Dataset 3 (2-step) | Dataset 3 (3-step) | Dataset 4 (1-step) | Dataset 4 (2-step) | Dataset 4 (3-step) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SSA-LSTM-ATT | 4.0359 | 4.1468 | 3.2596 | 4.4155 | 3.7395 | 4.3100 | 6.9371 | 6.4640 | 5.8669 | 5.0879 | 4.4626 | 4.8684 |
| SSA-ANN-ATT | 2.6796 | 2.7758 | 2.5898 | 3.1393 | 5.4915 | 4.1299 | 6.4450 | 4.8702 | 7.1683 | 5.0396 | 5.2263 | 4.2000 |
| SSA-GRU-ATT | 3.2327 | 2.6731 | 2.3547 | 3.2205 | 4.7229 | 6.3161 | 8.8089 | 13.2691 | 5.1547 | 5.4739 | 3.6886 | 5.0449 |
| EMD-CCN-ATT | 2.7588 | 1.8350 | 1.9807 | 3.0237 | 3.7056 | 2.7689 | 6.8340 | 6.7868 | 5.9259 | 7.5251 | 4.9170 | 5.7765 |
| EEMD-CCN-ATT | 4.3770 | 3.4384 | 2.7487 | 2.8046 | 2.7212 | 4.4261 | 5.9088 | 7.2490 | 8.6187 | 3.8240 | 3.6781 | 5.5065 |
| EWT-CCN-ATT | 1.8028 | 3.1977 | 1.9217 | 1.8531 | 1.9728 | 3.2618 | 5.8025 | 3.5395 | 6.1990 | 4.5648 | 1.9389 | 4.6737 |
| ANN | 4.2282 | 4.8446 | 3.4828 | 5.3323 | 5.0160 | 6.3535 | 10.7952 | 13.0939 | 6.1399 | 6.6524 | 5.1149 | 6.1361 |
| SVR | 3.1649 | 4.0294 | 3.4252 | 6.5651 | 7.0320 | 6.8778 | 11.3199 | 10.6736 | 12.3700 | 7.5924 | 5.4513 | 5.3851 |
| CCN | 5.0380 | 5.0494 | 3.8233 | 3.0310 | 3.6377 | 4.4585 | 12.1855 | 10.0671 | 10.0464 | 3.9994 | 5.4779 | 3.3949 |
| LSTM | 5.2074 | 3.7417 | 1.8054 | 3.5285 | 3.0930 | 2.5413 | 7.6600 | 12.6493 | 5.8276 | 5.4187 | 5.6419 | 8.8506 |
Table 9. Improvement ratio of MAPE generated by the SSA-CCN-ATT from the four datasets.

| Model | Dataset 1 (1-step) | Dataset 1 (2-step) | Dataset 1 (3-step) | Dataset 2 (1-step) | Dataset 2 (2-step) | Dataset 2 (3-step) | Dataset 3 (1-step) | Dataset 3 (2-step) | Dataset 3 (3-step) | Dataset 4 (1-step) | Dataset 4 (2-step) | Dataset 4 (3-step) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SSA-LSTM-ATT | 16.445% | 11.416% | 14.470% | 19.993% | 17.722% | 13.660% | 9.590% | 10.637% | 16.741% | 19.102% | 17.327% | 20.982% |
| SSA-ANN-ATT | 13.284% | 10.956% | 5.7210% | 17.599% | 10.119% | 20.702% | 9.692% | 12.179% | 9.6840% | 21.588% | 7.9590% | 18.576% |
| SSA-GRU-ATT | 8.0770% | 17.442% | 15.495% | 19.453% | 17.366% | 20.675% | 15.143% | 12.700% | 12.697% | 23.345% | 8.4860% | 19.552% |
| EMD-CCN-ATT | 17.878% | 13.279% | 14.654% | 20.931% | 7.8940% | 26.279% | 11.268% | 15.623% | 19.502% | 19.661% | 17.3275% | 24.918% |
| EEMD-CCN-ATT | 19.957% | 13.388% | 9.1920% | 21.584% | 11.384% | 22.876% | 6.1830% | 13.683% | 13.889% | 23.239% | 8.9630% | 20.470% |
| EWT-CCN-ATT | 13.994% | 9.6950% | 8.7970% | 20.089% | 10.596% | 20.764% | 17.785% | 14.871% | 16.641% | 24.474% | 9.5920% | 20.300% |
| ANN | 34.493% | 29.981% | 27.950% | 27.856% | 20.098% | 40.488% | 31.870% | 33.900% | 18.823% | 33.719% | 19.634% | 28.445% |
| SVR | 28.552% | 35.836% | 21.498% | 24.165% | 27.177% | 32.960% | 33.268% | 29.418% | 29.844% | 33.838% | 27.605% | 27.901% |
| CCN | 32.084% | 28.883% | 24.550% | 14.699% | 19.423% | 30.498% | 28.911% | 29.827% | 19.904% | 38.683% | 26.3935% | 31.325% |
| LSTM | 26.867% | 23.305% | 19.770% | 20.864% | 13.328% | 11.907% | 28.475% | 24.094% | 6.7730% | 30.471% | 18.555% | 28.250% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

