CEEMDAN-RIME–Bidirectional Long Short-Term Memory Short-Term Wind Speed Prediction for Wind Farms Incorporating Multi-Head Self-Attention Mechanism

Yang, Wenlu; Zhang, Zhanqiang; Meng, Keqilao; Wang, Kuo; Wang, Rui

doi:10.3390/app14188337

by Wenlu Yang ¹, Zhanqiang Zhang ¹,*, Keqilao Meng ², Kuo Wang ¹ and Rui Wang ¹

¹ College of Information Engineering, Inner Mongolia University of Technology, Hohhot 010080, China

² College of New Energy, Inner Mongolia University of Technology, Hohhot 010080, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(18), 8337; https://doi.org/10.3390/app14188337

Submission received: 5 August 2024 / Revised: 12 September 2024 / Accepted: 14 September 2024 / Published: 16 September 2024

(This article belongs to the Special Issue Advances and Challenges in Reliability and Maintenance Engineering)

Abstract: Accurate wind speed prediction is critical to the stable operation of power systems. To enhance the prediction accuracy, we propose a new approach that integrates bidirectional long short-term memory (BiLSTM) with complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), the RIME optimization algorithm (RIME), and a multi-head self-attention mechanism (MHSA). First, the historical wind speed data of a wind farm are decomposed via CEEMDAN into subsequences that capture the change patterns and features on different time scales. Then, the parameters of the BiLSTM model are optimized using the RIME algorithm, and each subsequence is input into the neural network model containing the MHSA for prediction. Finally, the predicted values of each component are weighted and reconstructed to obtain the predicted wind speed time series. The experimental results, including comparisons with different models, show that the method predicts the short-term wind speeds of wind farms more accurately.

Keywords: wind speed prediction; adaptive noise; modal decomposition; RIME optimization; multi-head self-attention mechanism; bidirectional long short-term memory network; deep learning

1. Introduction

In recent years, wind power generation has rapidly developed, leading to an increase in the capacity and scale of wind farms. As a result, the proportion of wind power generation in the electricity market is expanding, which has made improving the utilization efficiency of wind energy resources a crucial topic in clean energy research. Renewable energy represented by wind energy is a type of clean energy with abundant reserves and huge development potential. However, the stability of wind power generation is significantly affected by the randomness and volatility of wind speeds [1]. This uncertainty not only reduces the efficiency and stability of wind power generation but also increases the complexity of grid scheduling, which affects the stable operation and economic benefits of power systems. Therefore, in order to overcome these challenges and ensure the reliability and efficiency of wind power generation, the development of accurate wind speed prediction models is urgently needed.

To tackle the wind power generation challenges caused by wind speed volatility and unpredictability, accurate prediction models for wind speed variations are crucial. The accurate prediction of wind speed is essential for wind power generation, as it improves the power generation efficiency on wind farms and ensures the stable operation of power systems [2]. Specifically, the needs of wind speed prediction vary depending on the application scenarios. The wind speed has a direct impact on the power output of wind turbines in power prediction. Accurate short-term wind speed prediction can assist wind farm operators with optimizing their power generation plans, reducing their energy waste, lowering their production costs, and improving their wind resource utilization efficiency [3]. When evaluating the fatigue load, the high-frequency change information of the wind speed within a short period can be used to predict the transient loads and local structural stresses of the wind turbine. Through reducing the fatigue loads borne by the wind turbine tower, this prediction can effectively improve the wind robustness of the turbine and its durability [4]. In terms of power quality management, fluctuations in wind energy can lead to grid instability, affecting the frequency and voltage. Predicting and dealing with fluctuations caused by wind speed variations can improve the power quality and overall system stability [5]. Therefore, depending on the application scenarios, the requirements for wind speed prediction models are different. The aim of this study was to optimize the short-term wind speed prediction model for improved power prediction, providing more reliable decision support for wind farm operators, as well as ensuring stable power grid operation and efficient wind energy utilization.

Wind speed prediction can be divided into short-term and long-term prediction according to the time scale. Short-term prediction covers the next few hours and helps wind farm operators adjust the power scheduling in real time and optimize the power generation efficiency, while long-term prediction covers a few months to a year and provides key information for the siting of wind farms and the long-term planning of the power system [6]. In the existing studies, the common model-driven approaches for short-term wind speed prediction include probabilistic statistical models, machine learning models, deep learning models, and combined prediction [7].

Probabilistic statistical models usually rely on a statistical analysis of historical wind speed data, and the common methods include autoregressive (AR) models, autoregressive moving average (ARMA) models, autoregressive integrated moving average (ARIMA) models, and so on [8,9,10]. Chen et al. [11] decomposed a wind speed series and established an ARIMA model for the obtained low-frequency subsequence. Tian et al. [12] proposed a model based on the ARIMA model with an echo state network (ESN) to predict a short-term wind speed time series with linear features. While these models perform well in handling linear and simple time series, they face limitations when dealing with nonlinear relationships and complex data patterns. Their ability to capture intricate patterns is restricted, which limits their generalization capabilities [13].

Machine learning models, such as support vector machines, random forests, artificial neural networks, boosted regression trees, and extreme learning machines [14,15,16], analyze data, automatically identify patterns and regularities in them, and predict future trends. Support vector machines (SVMs) can effectively deal with nonlinear relationships and perform well on small-sample data. He et al. [17] introduced a least-squares support vector machine to address the ARIMA model's limitations with nonlinear and complex data and proposed an improved SVM model to correct prediction errors. You et al. [18] used a least-squares support vector machine as a prediction model and tuned its parameters with the particle swarm algorithm to improve the prediction performance. However, the training and prediction efficiencies of SVMs for large datasets are low, especially when the data dimension is high. To overcome this problem, the random forest model can be used, which improves the accuracy and stability of the prediction by combining multiple decision trees through an ensemble learning approach. Zhu et al. [19] proposed a random forest regression (RFR) model based on the improved fruit fly optimization algorithm, which improves the global search capability of the prediction model by incorporating a function mechanism. Zhang et al. [20] proposed an artificial neural network (ANN) model to achieve the accurate prediction of fluctuating, unstable sequences by introducing the cuckoo algorithm. Although machine learning models excel at handling nonlinear relationships and high-dimensional data, they are still deficient at capturing long time series dependencies and complex data structures.

Deep learning models, especially recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs), have also been widely used in wind speed prediction [21,22,23]. RNNs can effectively capture dependencies in time series data and are suitable for short-term time series prediction. Lin et al. [24] decomposed an original wind speed series into multiple subsequences and used an RNN on the subsequences to predict the wind speed and error series in the sub-model. However, RNNs suffer from the problem of vanishing or exploding gradients when handling long sequence data, which increases the training difficulty. To solve this problem, LSTM captures long-term dependencies by introducing a gating mechanism. Li et al. [25] used a vulture search algorithm to optimize the LSTM hyperparameters, combining its adaptability with the advantages of LSTM time series modeling to improve the prediction accuracy and stability. Yan et al. [26] used the seasonal ARIMA to extract the linear part of a wind speed series and EMD to decompose the nonlinear residuals, and they trained the components with LSTM, integrating the results for the final prediction. LSTM can better handle long series data, capture long-distance dependencies, keep the weight updates stable during the training process, and improve the training and generalization performance. To improve the prediction accuracy, BiLSTM further extends the capability of LSTM to capture bidirectional dependencies in sequences through bidirectional processing. Ding et al. [27] proposed a wind speed prediction model based on singular spectrum analysis and BiLSTM that incorporates a back-propagation layer to enhance the capture of bidirectional temporal information and accurately simulate time series dynamics. However, these methods rely excessively on selecting similar daily samples in the wind speed prediction modeling process, resulting in limited flexibility in the model construction and updating. Thus, meeting the accuracy requirements of the power grid is challenging [28].

Combined forecasting integrates multiple independent models, offering high practicality and flexibility, and leverages the strengths of each model while addressing their limitations. Through adjusting the weighting coefficients, combined forecasting can adapt to various prediction requirements [29]. The common combined prediction methods include error-based correction, multi-mode weighting, and data decomposition and optimization techniques. To improve the prediction accuracy, Fu et al. [30] proposed an ultra-short-term wind speed prediction method incorporating advanced techniques, including variational mode decomposition, phase space reconstruction, the improved Northern Eagle optimization algorithm, and a shared weight gated memory network. The method optimizes two hyperparameters to achieve the optimal parameter combinations for wind speed prediction, improving the performance and better adapting the model to wind speed data characteristics and variations. Although this method significantly optimizes the prediction performance and adapts the model to the characteristics of and variations in wind speed data, it also increases the computational cost and training time because of its complexity. Zhang et al. [31] proposed a wind speed prediction model integrating variational modal decomposition, an extreme learning machine, and LSTM. Decomposition, feature extraction, and residual modeling were combined to improve the accuracy of the short-term wind speed prediction. Although this approach integrates the advantages of multiple algorithms, it may still be limited by single-model inadequacy when processing complex, nonlinear wind speed data. Yi et al. [32] proposed an ultra-short-term wind speed prediction model that combines CEEMDAN with an attention-mechanism-enhanced BiLSTM model. The model predicts each decomposed modal component and significantly improves the prediction accuracy through the final reconstruction of the results. 
However, optimizing combined prediction models often requires adjusting various parameters, which typically relies on expert experience, and the hyperparameters set by experts are often not fully aligned with the optimal parameters required by the model. In addition, when predicting nonlinear wind speed series, the feature extraction may weaken as the model complexity increases, and the investment of more computational resources and training time is needed in order to obtain a good performance.

Although the existing studies have made some progress in wind speed prediction, the existing models still need to be improved in terms of their accuracy and applicability. To address this issue, we propose an improved bidirectional long short-term memory (BiLSTM) neural network model based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), the RIME optimization algorithm (RIME), and a multi-head self-attention mechanism (MHSA). The model is used to predict wind speed variations in wind farms, aiming to improve the prediction accuracy and reduce the impact of the wind speed uncertainty on the grid stability. In comparisons with other methods, the CEEMDAN-RIME-MHSA-BiLSTM method yields more accurate wind speed predictions. Simulation results show that the proposed model handles wind speed series fluctuations and complex dependencies well, which verifies its validity and superiority in terms of prediction accuracy and robustness. The novelties and main contributions of this study are as follows:

To address the large fluctuations and strong nonlinear characteristics of wind speed series, CEEMDAN is applied to decompose the complex wind speed time series signal into multiple simpler intrinsic mode functions. This effectively eliminates the noise and nonlinear perturbations in the signal, improves the robustness of the model to wind speed data noise, and enhances the expressive and generalization abilities of the model;
The RIME algorithm optimizes the hyperparameters of the BiLSTM model through iteratively updating and adjusting the weights and biases of the BiLSTM layer. This prevents the model from becoming stuck in local optimal solutions and allows it to quickly adjust to changes in the wind speed data, better fit the training data, and enhance its generalization ability;
A more accurate wind speed prediction model was designed by combining deep learning and machine learning. In order to improve the global feature extraction capability of the model, the BiLSTM model was improved by utilizing the multi-head attention mechanism. The MHSA assigns different attentional weights according to the correlation between the time steps and learns the local and global features in parallel in different representation subspaces. It extracts key information in long sequences and captures time–frequency information and complex dependencies at different scales in wind speed series, which enables the model to capture the dynamic change pattern of the wind speed more accurately, improves the ability to capture wind speed time series data, and enhances the model’s expressiveness.

The rest of this study is organized as follows: Section 2 provides the theory of the proposed prediction method. Section 3 describes the basic structure of the proposed model and the evaluation metrics. Section 4 presents the experimental data, analysis, and a discussion of the results, and the final conclusions are presented in Section 5.

2. Wind Speed Prediction Research Methodology

2.1. Forecasting Process

In this study, a combined wind speed prediction model for wind farms with CEEMDAN-RIME-MHSA-BiLSTM was constructed, and the prediction steps are shown in Figure 1.

The specific steps are as follows:

First, wind speed data are collected, followed by data preprocessing and outlier detection to remove anomalies in a timely manner, improving the data’s reliability and stability;
The dataset is split into training and validation sets. The training set is used to construct the wind speed prediction model by analyzing the historical wind speed data, while the validation set is used to evaluate the performance of the wind speed prediction model;
The CEEMDAN algorithm decomposes the wind speed time series into multiple intrinsic mode functions (IMFs). The wind speed prediction model is constructed based on the decomposed IMFs;
The BiLSTM model hyperparameters, such as the number of neurons in the hidden layer, the learning rate, and the number of training epochs, are automatically tuned via the RIME algorithm. The RIME algorithm searches the parameter space to quickly identify the optimal parameter combination for the BiLSTM model, which serves as the foundation for constructing prediction models for each component;
A multi-head self-attention mechanism is introduced based on the BiLSTM model to capture the complex relationship between different subsequence components. Each attention head learns the feature representation of different subspaces, and each subsequence component can be regarded as a different representation subspace;
The prediction of each modal component is performed using the trained model. By weighting and summing the predicted values of the modal components at different time scales, the overall wind speed prediction is finally obtained. Decomposition and then reconstruction allow the model to capture the trend of the wind speed changes more accurately.
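
The decompose, predict, and reconstruct flow of the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_decompose` and `persistence` are hypothetical stand-ins for CEEMDAN and for the trained MHSA-BiLSTM component models, and the reconstruction weights default to 1 because the text does not specify them.

```python
from typing import Callable, List, Optional, Sequence

Series = List[float]


def toy_decompose(series: Sequence[float], window: int = 3) -> List[Series]:
    """Hypothetical stand-in for CEEMDAN: a trailing-mean trend plus the remainder."""
    trend = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1):t + 1]
        trend.append(sum(chunk) / len(chunk))
    remainder = [x - m for x, m in zip(series, trend)]
    return [remainder, trend]


def persistence(component: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained component model: repeat the last value."""
    return component[-1]


def forecast_next(
    series: Sequence[float],
    decompose: Callable[[Sequence[float]], List[Series]],
    predict: Callable[[Sequence[float]], float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Predict each component separately, then form the weighted sum of the predictions."""
    components = decompose(series)
    if weights is None:
        weights = [1.0] * len(components)  # assumption: equal unit weights
    if len(weights) != len(components):
        raise ValueError("one weight per component is required")
    return sum(w * predict(c) for w, c in zip(weights, components))


wind_speed = [5.0, 6.0, 7.5, 7.0]  # made-up values in m/s
print(forecast_next(wind_speed, toy_decompose, persistence))
```

Because the component models and the decomposition are passed in as functions, the same skeleton would apply if real CEEMDAN and neural network components were substituted.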

2.2. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise

CEEMDAN, proposed by Torres et al., is a method used for signal processing and data analysis, and is an improvement on and optimization of the empirical mode decomposition (EMD) method. CEEMDAN is suitable for the decomposition of non-stationary and nonlinear signals, and it can better control the magnitude and distribution of noise [33]. Traditional EMD decomposes a complex signal into a series of intrinsic mode functions, thereby revealing various features and components in the signal. However, EMD is sensitive to noise, which leads to unstable decomposition results when there is noise in the signal [34]. The main idea of CEEMDAN is to introduce positive and negative noise estimation and adaptive parameter tuning based on EMD. This improvement eliminates redundant auxiliary noise, reduces the number of iterations, increases the randomness of the data to mitigate endpoint effects and mode mixing, and reduces the error after signal reconstruction [35]. The main steps of CEEMDAN are as follows:

Step 1: Let the original signal be $x(t)$. Gaussian white noise $G^{i}(t)$ is added to the original signal to form the noisy signals $x^{i}(t) = x(t) + G^{i}(t)$, where $i = 1, 2, \dots, n$ [36]. Each $x^{i}(t)$ is decomposed via EMD, and the resulting first modes are averaged to give the first intrinsic mode function (IMF) component $I_{IMF1}$ and the first residual component $r_{1}(t)$:

$$I_{IMF1}(t) = \frac{1}{n} \sum_{i=1}^{n} I_{IMF1}^{i}(t) \tag{1}$$

$$r_{1}(t) = x(t) - I_{IMF1}(t) \tag{2}$$

where $I_{IMF1}$ is the first component, $I_{IMF1}^{i}(t)$ is the first mode function obtained from the $i$th noisy signal, and $n$ is the number of noisy signals;

Step 2: A new signal $r_{1}^{i}(t) = r_{1}(t) + w^{i}(t)$ is obtained by adding standard normally distributed Gaussian white noise $w^{i}(t)$ to the first residual component $r_{1}(t)$. The second component is computed from the EMD of $r_{1}^{i}(t)$ as follows:

$$I_{IMF2}(t) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{EMD}\left(r_{1}(t) + w^{i}(t)\right) \tag{3}$$

$$r_{2}(t) = r_{1}(t) - I_{IMF2}(t) \tag{4}$$

where $\mathrm{EMD}(\cdot)$ denotes the first mode extracted by EMD;

Step 3: The decomposition is repeated until the residual signal has at most two extreme points. The components $I_{IMF1}$, $I_{IMF2}$, …, $I_{IMFz}$ and the corresponding residual components are thus obtained in turn. The original signal can be expressed as follows:

$$x(t) = \sum_{j=1}^{z} I_{IMFj}(t) + r_{z}(t) \tag{5}$$
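
The averaging and residual updates in Eqs. (1) to (5) can be illustrated with a short pure-Python sketch. It is not a real CEEMDAN implementation: `first_mode_stub` is a hypothetical stand-in for the EMD sifting step (it simply returns the detail left after 3-point smoothing), and the noise level, trial count, and mode limit are illustrative assumptions. What the sketch does show faithfully is the ensemble average over noisy copies, the successive residual subtraction, and the exact reconstruction of Eq. (5).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def first_mode_stub(signal: Sequence[float]) -> Signal:
    """Stand-in for EMD(.): the detail left after 3-point smoothing (not real sifting)."""
    n = len(signal)
    smooth = []
    for t in range(n):
        lo, hi = max(0, t - 1), min(n, t + 2)
        smooth.append(sum(signal[lo:hi]) / (hi - lo))
    return [x - s for x, s in zip(signal, smooth)]


def count_extrema(signal: Sequence[float]) -> int:
    """Count interior points where the slope changes sign."""
    return sum(
        1
        for t in range(1, len(signal) - 1)
        if (signal[t] - signal[t - 1]) * (signal[t + 1] - signal[t]) < 0
    )


def noise_assisted_decompose(
    x: Sequence[float],
    n_trials: int = 20,
    noise_std: float = 0.1,
    max_modes: int = 6,
    first_mode: Callable[[Sequence[float]], Signal] = first_mode_stub,
    seed: int = 0,
) -> Tuple[List[Signal], Signal]:
    rng = random.Random(seed)
    residual = list(x)
    imfs: List[Signal] = []
    # Step 3: stop once the residual has at most two extrema (or the mode limit is hit).
    while len(imfs) < max_modes and count_extrema(residual) > 2:
        acc = [0.0] * len(residual)
        for _ in range(n_trials):
            noisy = [r + rng.gauss(0.0, noise_std) for r in residual]
            acc = [a + m for a, m in zip(acc, first_mode(noisy))]
        imf = [a / n_trials for a in acc]                   # Eqs. (1) and (3)
        residual = [r - m for r, m in zip(residual, imf)]   # Eqs. (2) and (4)
        imfs.append(imf)
    return imfs, residual


signal = [math.sin(0.3 * t) + 0.5 * math.sin(2.1 * t) for t in range(64)]
imfs, residual = noise_assisted_decompose(signal)
rebuilt = [sum(col) + r for col, r in zip(zip(*imfs), residual)]  # Eq. (5)
print(len(imfs), max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

In practice, the stub would be replaced by a proper EMD sifting routine; the surrounding loop structure is the part that corresponds to the steps above.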

2.3. Bidirectional Long Short-Term Memory Neural Network

The classical LSTM neural network transmits information in one direction and uses only historical data for the prediction; it does not consider future information [37]. The BiLSTM model adds a reverse LSTM layer on top of this, so that both past and future information are considered. Specifically, a BiLSTM neural network is a network model with both forward and reverse LSTM. The forward LSTM layer captures the sequential information from the beginning to the end of the sequence, while the reverse LSTM layer captures the reverse-order information from the end to the beginning of the sequence [38]. This training process builds a prediction model that realizes the bidirectional training of the time series and improves the prediction performance. The LSTM neural network structure is shown in Figure 2.

The LSTM structure includes the input data ($x_{t}$), storage cell state ($C_{t}$), hidden state ($h_{t}$), forget gate ($f_{t}$), input gate ($i_{t}$), and output gate ($o_{t}$). The forget gate selectively discards weakly relevant information in the time series data, the input gate updates the state of the storage cell, and the output gate determines the output information [39]. The LSTM updates its internal cell as follows:

$$f_{t} = \sigma(\omega_{fh} h_{t-1} + \omega_{fx} x_{t} + b_{f}) \tag{6}$$

$$i_{t} = \sigma(\omega_{ih} h_{t-1} + \omega_{ix} x_{t} + b_{i}) \tag{7}$$

$$o_{t} = \sigma(\omega_{oh} h_{t-1} + \omega_{ox} x_{t} + b_{o}) \tag{8}$$

$$s_{t} = \tanh(\omega_{ch} h_{t-1} + \omega_{cx} x_{t} + b_{c}) \tag{9}$$

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{10}$$

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{11}$$

where $\omega$ denotes the weight matrices of the three gates and of the candidate cell state $s_{t}$, $b$ denotes the corresponding bias values, $\sigma$ is the sigmoid activation function, and tanh is used to update the internal cell and the cell output. The storage cell state ($C_{t}$) and the output ($h_{t}$) of the LSTM cell are updated as follows:

$$C_{t} = f_{t} \odot C_{t-1} + i_{t} \odot s_{t} \tag{12}$$

$$h_{t} = o_{t} \odot \tanh(C_{t}) \tag{13}$$

where $\odot$ denotes element-wise multiplication.
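
As a small worked illustration of Eqs. (6) to (13), the following Python sketch performs one LSTM update for the scalar case (one input feature, one hidden unit). The `GateParams` container and the function names are hypothetical, chosen only for this example; a real layer would use weight matrices rather than scalars.

```python
import math
from dataclasses import dataclass
from typing import Tuple


def sigmoid(x: float) -> float:
    """Eq. (10)."""
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class GateParams:
    w_h: float  # weight applied to the previous hidden state h_{t-1}
    w_x: float  # weight applied to the current input x_t
    b: float    # bias

    def pre_activation(self, x_t: float, h_prev: float) -> float:
        return self.w_h * h_prev + self.w_x * x_t + self.b


def lstm_step(
    x_t: float,
    h_prev: float,
    c_prev: float,
    forget: GateParams,
    inp: GateParams,
    out: GateParams,
    cand: GateParams,
) -> Tuple[float, float]:
    """One scalar LSTM update; returns (h_t, C_t)."""
    f_t = sigmoid(forget.pre_activation(x_t, h_prev))   # Eq. (6)
    i_t = sigmoid(inp.pre_activation(x_t, h_prev))      # Eq. (7)
    o_t = sigmoid(out.pre_activation(x_t, h_prev))      # Eq. (8)
    s_t = math.tanh(cand.pre_activation(x_t, h_prev))   # Eq. (9)
    c_t = f_t * c_prev + i_t * s_t                      # Eq. (12)
    h_t = o_t * math.tanh(c_t)                          # Eq. (13)
    return h_t, c_t


# With all weights and biases at zero, every gate equals 0.5 and s_t equals 0,
# so the cell state is simply halved.
zero = GateParams(0.0, 0.0, 0.0)
h_t, c_t = lstm_step(1.0, 0.3, 2.0, zero, zero, zero, zero)
print(h_t, c_t)
```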

In BiLSTM for wind speed series, the forward LSTM encodes the captured information into hidden states for the subsequent prediction and classification tasks while the backward LSTM learns from the opposite direction and can capture the reverse timing information of the sequence data [40]. In this study, BiLSTM was used to predict wind speeds on wind farms, integrating past and future information to fully capture the time series features in the wind speed series. The model was flexibly adjusted and optimized according to the actual situation to adapt it to the different prediction tasks and data characteristics to ensure the accuracy and reliability of the prediction results. The network structure of the BiLSTM is shown in Figure 3.

The input data comprise the most relevant extracted features, while the output is the predicted value. The forward and backward passes of the hidden-layer neurons are represented by the following equations:

$$h_{t} = \mathrm{LSTM}(x_{t}, h_{t-1}) \tag{14}$$

$$j_{t} = \mathrm{LSTM}(x_{t}, j_{t+1}) \tag{15}$$

$$Y_{t} = \omega_{0} h_{f} + b_{0} \tag{16}$$

where $h_{t}$ is the hidden-layer output of the forward pass; $j_{t}$ is the hidden-layer output of the backward pass; $Y_{t}$ is the predicted value of the wind speed; $h_{f}$ is the output of the fully connected layer; $\omega_{0}$ is the weight matrix; and $b_{0}$ is the bias.
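
The two recurrences in Eqs. (14) and (15) differ only in the direction in which the sequence is traversed. The sketch below makes this explicit; `run_bidirectional` is a hypothetical helper, and the step functions are placeholders for LSTM cells (any function of the current input and the previous state can be plugged in). A running sum is used in the usage example purely because it makes the direction of each pass easy to see.

```python
from typing import Callable, List, Sequence, Tuple

Step = Callable[[float, float], float]  # (x_t, previous state) -> new state


def run_bidirectional(
    xs: Sequence[float],
    forward_step: Step,
    backward_step: Step,
    h0: float = 0.0,
    j0: float = 0.0,
) -> List[Tuple[float, float]]:
    """Return the pair (h_t, j_t) for every time step t."""
    T = len(xs)
    h = [0.0] * T
    j = [0.0] * T

    state = h0
    for t in range(T):              # Eq. (14): h_t depends on h_{t-1}
        state = forward_step(xs[t], state)
        h[t] = state

    state = j0
    for t in reversed(range(T)):    # Eq. (15): j_t depends on j_{t+1}
        state = backward_step(xs[t], state)
        j[t] = state

    return list(zip(h, j))


def running_sum(x_t: float, prev: float) -> float:
    """Placeholder recurrence standing in for an LSTM cell."""
    return x_t + prev


print(run_bidirectional([1.0, 2.0, 3.0], running_sum, running_sum))
# -> [(1.0, 6.0), (3.0, 5.0), (6.0, 3.0)]
```

At each time step, the forward value summarizes the past and the backward value summarizes the future, which is the pair that a BiLSTM layer concatenates.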

2.4. RIME Optimization Algorithm

The RIME algorithm is a bionic optimization algorithm based on the formation of frost ice (rime ice) in nature. Its basic principle is to simulate the crystallization of ice on the surface of an object and find the optimal solution by continuously adjusting the individuals in the solution space [41]. Inspired by the diffusion-limited aggregation method of metal particle aggregation, the RIME algorithm simulates the motion by which individual frost ice particles aggregate into frost ice agents, and each generated frost ice agent takes the form of a strip crystal [42]. The RIME algorithm comprises four phases: the initialization of frost ice clusters, a soft-frost search strategy, a hard-frost puncture mechanism, and an improved greedy selection mechanism. The frost ice population ($R$) is composed of frost ice particles ($x_{ik}$) and is expressed as follows:

$$R = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1k} \\ x_{21} & x_{22} & \dots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{i1} & x_{i2} & \dots & x_{ik} \end{bmatrix} \tag{17}$$

where $i$ is the ordinal number of the frost ice agent and $k$ is the ordinal number of the frost ice particle.

The position of each frost ice particle is updated as follows:

$$R_{ik}^{n} = R_{b,k} + c_{1} \cdot \cos\theta \cdot \beta \cdot \left(h \cdot (Ud_{ik} - Ld_{ik}) + Ld_{ik}\right), \quad c_{2} < E \tag{18}$$

$$\theta = \pi \cdot \frac{t}{10 \cdot T} \tag{19}$$

$$\beta = 1 - \left[\frac{\omega \cdot t}{T}\right] / \omega \tag{20}$$

$$E = \sqrt{t / T} \tag{21}$$

where $R_{ik}^{n}$ is the new position of the particle after the update; the subscript $ik$ denotes the $k$th particle of the $i$th frost ice agent; $R_{b,k}$ is the $k$th particle of the best agent in the population ($R$); $c_{1}$ is a random number between −1 and 1, which, together with $\cos\theta$, controls the particle's direction of motion, which varies with the number of iterations; $\beta$ is the environmental factor, which is modeled as a step function of the iteration number, with $[\cdot]$ denoting rounding; $h$ is the degree of attachment, which is a random number between 0 and 1 and is used to control the distance between the centers of two frost ice particles; $Ud_{ik}$ and $Ld_{ik}$ are the upper and lower bounds of the escape space, respectively; $E$ is the attachment coefficient; $c_{2}$ is a random number between 0 and 1, which, together with $E$, controls whether the particles merge (i.e., whether the particle positions are updated); $t$ and $T$ are the numbers of the current and maximum iterations, respectively; and the default value of $\omega$ is 5, which controls the number of segments of the step function and thereby limits the effective region of the particle movement.
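
A minimal sketch of the soft-frost position update in Eqs. (18) to (21) is given below. Several points are assumptions rather than statements from the text: the environmental factor in Eq. (18) is taken to be the step function $\beta$ of Eq. (20), the bracket in Eq. (20) is read as rounding to the nearest integer (half up), and new positions are clipped to the bounds. The function names are hypothetical, and the hard-frost puncture and greedy selection phases are not included.

```python
import math
import random
from typing import List, Sequence


def theta(t: int, T: int) -> float:
    """Eq. (19)."""
    return math.pi * t / (10.0 * T)


def beta(t: int, T: int, omega: int = 5) -> float:
    """Eq. (20); the bracket is assumed to mean round-half-up."""
    return 1.0 - math.floor(omega * t / T + 0.5) / omega


def attach_coeff(t: int, T: int) -> float:
    """Eq. (21)."""
    return math.sqrt(t / T)


def soft_frost_update(
    agent: Sequence[float],
    best: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    t: int,
    T: int,
    rng: random.Random,
    omega: int = 5,
) -> List[float]:
    """Apply Eq. (18) to every particle k of one agent."""
    E = attach_coeff(t, T)
    factor = math.cos(theta(t, T)) * beta(t, T, omega)
    new_agent = list(agent)
    for k in range(len(agent)):
        c1 = rng.uniform(-1.0, 1.0)  # direction of motion
        c2 = rng.random()            # decides whether particle k is updated
        h = rng.random()             # degree of attachment
        if c2 < E:
            step = c1 * factor * (h * (upper[k] - lower[k]) + lower[k])
            candidate = best[k] + step
            # Assumption: keep the particle inside the search bounds.
            new_agent[k] = min(max(candidate, lower[k]), upper[k])
    return new_agent


rng = random.Random(42)
print(soft_frost_update([1.0, 2.0], [0.5, -0.5], [-5.0, -5.0], [5.0, 5.0], 30, 100, rng))
```

Early in the run, $E$ is small, so few particles move; as $t$ approaches $T$, almost every particle is updated, but $\beta$ shrinks toward zero, so the moves concentrate around the best agent.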

In the initialization phase of the frost ice cluster, the algorithm initializes a set of solutions, which are usually randomly generated. These solutions constitute the initial frost ice cluster (i.e., a set of candidate solutions in the solution space). In the soft-frost search phase, the algorithm searches for solutions in the solution space using the soft-frost search strategy. On top of the soft-frost search strategy, the algorithm introduces a hard-frost puncture mechanism to accelerate the search process and improve the efficiency of the search. The hard-frost puncture mechanism is analogous to puncturing a frozen frost ice layer: special operations are performed to search deeply for the local optimal solutions in the solution space. The last stage is an improved greedy selection mechanism for selecting the optimal solution from the solutions obtained from the search [43]. This mechanism usually selects the optimal solution based on the value of the objective function or those of other metrics and uses it as the output of the algorithm. The flowchart of the RIME algorithm is shown in Figure 4.

2.5. The Multi-Head Self-Attention Mechanism

The multi-head self-attention mechanism extends the attention mechanism that can learn the local and global features in parallel in different representation subspaces. The mechanism can capture the changing features of the input data at different time scales, enhancing the model’s ability to capture long-distance dependencies [44].

The multi-head self-attention mechanism maps the input query ($Q$), key ($K$), and value ($V$) vectors to multiple subspaces separately. Independent attention weights are computed in each subspace, and multiple attention outputs are computed in parallel [45]. The mechanism first transforms the input sequence $X$ into query, key, and value matrices via linear transformations:

$$Q_{i} = X W_{Q_{i}} \tag{22}$$

$$K_{i} = X W_{K_{i}} \tag{23}$$

$$V_{i} = X W_{V_{i}} \tag{24}$$

where $W_{Q_{i}}$, $W_{K_{i}}$, and $W_{V_{i}}$ are the weight matrices of the $i$th head.

Attention scores are calculated independently for each head, and different attention outputs are obtained:

$$M_{i} = \mathrm{softmax}\left(\frac{Q_{i} K_{i}^{T}}{\sqrt{d}}\right) V_{i} \tag{25}$$

where $d$ is the dimension of the key vectors. The attention outputs from all $h$ heads are concatenated and then transformed through a linear layer:

$$M = \mathrm{Concat}(M_{1}, M_{2}, \dots, M_{h}) \tag{26}$$
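
Eqs. (22) to (26) can be traced with a small pure-Python sketch that uses nested lists as matrices. The helper names are hypothetical, the weight matrices in the usage example are chosen by hand rather than learned, and the final linear layer mentioned in the text is omitted because its weights are not specified.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Head = Tuple[Matrix, Matrix, Matrix]  # (W_Q, W_K, W_V) for one head


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(X: Matrix, W_q: Matrix, W_k: Matrix, W_v: Matrix) -> Matrix:
    Q = matmul(X, W_q)  # Eq. (22)
    K = matmul(X, W_k)  # Eq. (23)
    V = matmul(X, W_v)  # Eq. (24)
    d = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V)  # Eq. (25)


def multi_head(X: Matrix, heads: Sequence[Head]) -> Matrix:
    outputs = [attention_head(X, *head) for head in heads]
    # Eq. (26): concatenate the head outputs along the feature axis.
    return [[v for out in outputs for v in out[t]] for t in range(len(X))]


X = [[1.0, 2.0], [3.0, 4.0]]          # two time steps, two features
zeros = [[0.0, 0.0], [0.0, 0.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
# Zero query/key weights give uniform attention, so each output row is the mean of X.
print(multi_head(X, [(zeros, zeros, eye), (zeros, zeros, eye)]))
```

With learned, non-zero query and key weights, each head would instead weight the time steps differently, which is what lets the heads attend to different patterns in parallel.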

3. Wind Speed Prediction Model Based on CEEMDAN-RIME-MHSA-BiLSTM

3.1. MHSA-BiLSTM Model Structure

When dealing with time series data, BiLSTM models are widely used because of their ability to capture the backward and forward dependencies of sequence data. However, when dealing with long sequence data, it may be difficult to effectively characterize long-distance temporal dependencies using BiLSTM, resulting in an insufficient ability to capture key information in the sequence. To overcome this problem, in this study, we adopted the multi-head self-attention mechanism to improve the BiLSTM model framework, aiming to enhance the model’s ability to characterize long sequence data and improve the prediction accuracy. The model framework mainly comprises an input layer, two BiLSTM layers, a multi-head self-attention mechanism, a fully connected layer, and an output layer. The model architecture is shown in Figure 5.

The decomposed IMFs are fed into the model, and the input layer receives the subseries data ($X = [X_{1}, X_{2}, \dots, X_{T}]$), which represent the feature values at the different time steps of the time series;
The input data are passed to the first BiLSTM layer, which has $n$ neural units in each of the forward and backward LSTMs. This layer processes the data from time step $t = 1$ to $t = T$; the output of the forward LSTM at time step $t$ is $Y_{t}^{f}$ and the output of the backward LSTM is $Y_{t}^{b}$. The outputs of the two directions are concatenated at each time step to form the output of the first layer (i.e., the $2n$-dimensional vector $Y = [Y_{t}^{f}, Y_{t}^{b}]$);
The output of the first BiLSTM layer is passed as input to the second BiLSTM layer, which has $m$ neural units. This layer further extracts the deeper features of the time series and generates a new output of dimension $2m$ ($Z = [Z_{t}^{f}, Z_{t}^{b}]$);
The generated feature matrix goes to the multi-head self-attention mechanism module. This mechanism highlights the most relevant parts of the current prediction by calculating the correlation between each time step and other time steps and assigning them different weights. Outputs of the multiple attention heads are concatenated and linearly combined to generate a weighted $T \times 2 m$ feature matrix ( $A = [A_{t}^{f}, A_{t}^{b}]$ ) that further captures the important dependencies between the time steps;
Eventually, the feature matrix processed by the self-attention mechanism is passed to the fully connected layer. The fully connected layer maps these high-dimensional features to a final output value ( $O_{T}$ ), such as a prediction of the future wind speed. This output value is the prediction made by the model for the input subsequence.
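The attention step in the list above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the toy sizes, the random projection matrices, and all function names (`attention_head`, `multi_head_self_attention`, `random_matrix`) are assumptions made for illustration, and the input `z` merely stands in for the $T \times 2m$ output of the second BiLSTM layer.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(row):
    """Numerically stable softmax of one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(z, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one head; z has shape T x 2m."""
    q, k, v = matmul(z, w_q), matmul(z, w_k), matmul(z, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # T x T similarity between time steps
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights

def multi_head_self_attention(z, heads, w_o):
    """Concatenate the head outputs per time step, then mix them with w_o."""
    outputs = [attention_head(z, *head)[0] for head in heads]
    concat = [[val for out in outputs for val in out[t]] for t in range(len(z))]
    return matmul(concat, w_o)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
T, two_m, n_heads = 5, 8, 2  # toy sizes, not the paper's settings
d_head = two_m // n_heads
z = random_matrix(T, two_m, rng)  # stand-in for the second BiLSTM layer output
heads = [tuple(random_matrix(two_m, d_head, rng) for _ in range(3))
         for _ in range(n_heads)]
w_o = random_matrix(n_heads * d_head, two_m, rng)
a = multi_head_self_attention(z, heads, w_o)
print(len(a), len(a[0]))  # T rows, each of width 2m
```

Each row of the attention weights sums to 1, so every output time step is a weighted mixture of all time steps; this is what lets the layer relate distant positions directly.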

In order to further improve the BiLSTM performance in dealing with long series data, CEEMDAN and the RIME algorithm are fused based on the original MHSA-BiLSTM model framework. This fusion method not only decomposes the time series but also optimizes the parameters of the BiLSTM, which significantly improves the prediction performance of the model. The input time series is first decomposed using CEEMDAN to obtain different components (IMFs). Next, a BiLSTM model containing a multi-head self-attention mechanism is defined, and its parameters are optimized via the RIME algorithm. Finally, the model generates final predictions through a training process. The CEEMDAN-RIME-MHSA-BiLSTM algorithm is shown in Algorithm 1:

Algorithm 1. CEEMDAN-BiLSTM with RIME Optimization and MHSA

1: Input: Time series data $X$, BiLSTM parameters, RIME optimizer parameters

2: Output: Predicted wind speed values $\hat{y}$

3: Decompose $X$ using CEEMDAN into intrinsic mode functions (IMFs):

4: $\{IMF_{1}, IMF_{2}, \dots, IMF_{n}\} \leftarrow \mathrm{CEEMDAN}(X)$

5: Initialize BiLSTM model parameters and RIME optimizer.

6: for each training epoch $e = 1$ to $E$ do

7: for each $IMF_{i}$ from $IMF_{1}$ to $IMF_{n}$ do

8: for each time step $t$ in $IMF_{i}$ do

9: $H_{1}^{f}, H_{1}^{b} \leftarrow \text{BiLSTM-1}(IMF_{i}(t))$

10: Pass the output of the first BiLSTM layer $H_{1}$ into the second BiLSTM layer:

11: $H_{2}^{f}, H_{2}^{b} \leftarrow \text{BiLSTM-2}(H_{1})$

12: Optimize BiLSTM parameters using RIME algorithm:

13: $\theta_{BiLSTM} \leftarrow \mathrm{RIME}(\theta_{BiLSTM})$

14: Apply MHSA (multi-head self-attention) on BiLSTM outputs:

15: $M_{i} = \mathrm{softmax}\left(\frac{Q_{i} K_{i}^{T}}{\sqrt{d}}\right) V_{i}$

16: Concatenate the attention outputs from all heads:

17: $M = \mathrm{Concat}(M_{1}, M_{2}, \dots, M_{h})$

18: end for

19: Pass the concatenated matrix $A_{t} = [A_{t}^{f}, A_{t}^{b}]$ into the fully connected layer:

20: $O_{T} \leftarrow \mathrm{Dense}(A_{t})$

21: for each batch $(X_{b}, y_{b})$ from training data do

22: Forward pass:

23: $\hat{y}_{b} \leftarrow O_{T}$

24: Backpropagation: Update network weights $\theta$

25: end for

26: end for

27: Check for convergence

28: If the convergence criteria are met, terminate training

29: for each $IMF_{i}$ from $IMF_{1}$ to $IMF_{n}$ do

30: Aggregate predictions:

31: $\hat{y} \leftarrow \sum_{i = 1}^{n} \mathrm{Weighted}(O_{i})$

32: end for
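The overall flow of Algorithm 1 (decompose, predict each component, aggregate) can be sketched as follows. This is only a structural illustration under loud assumptions: `toy_decompose` is a moving-average stand-in and is not CEEMDAN, `persistence` is a placeholder for the RIME-tuned MHSA-BiLSTM predictor, and the equal aggregation weights are hypothetical.

```python
from typing import List, Sequence

Series = List[float]

def moving_average(x: Sequence[float], window: int) -> Series:
    """Centered moving average with shrinking windows at the edges."""
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def toy_decompose(x: Sequence[float], windows=(3, 9)) -> List[Series]:
    """Stand-in for CEEMDAN (NOT the real algorithm): peel off detail layers
    and keep the final trend, so that the components sum back to x."""
    components, current = [], list(x)
    for w in windows:
        smooth = moving_average(current, w)
        components.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    components.append(current)  # residual trend
    return components

def persistence(history: Sequence[float]) -> float:
    """Placeholder one-step predictor; the paper uses MHSA-BiLSTM here."""
    return history[-1]

def forecast_next(x, decompose=toy_decompose, predict=persistence, weights=None):
    """Predict each component separately, then aggregate the predictions."""
    components = decompose(x)
    weights = weights or [1.0] * len(components)  # assumed equal weights
    return sum(w * predict(c) for w, c in zip(weights, components))

wind = [5.1, 5.6, 4.9, 6.2, 6.8, 6.1, 5.4, 5.9]  # made-up wind speeds
print(forecast_next(wind))
```

With the placeholder predictor and unit weights, the aggregated forecast simply reproduces the last observation, which confirms that the components reconstruct the input; swapping in a trained per-component model changes only the `predict` argument.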

3.2. Prediction Evaluation Indicators

Prediction evaluation indicators are decisive for assessing the predictive ability of a model, and selecting appropriate indicators is essential for a reliable assessment of the prediction results. In this study, five indicators, the root-mean-square error ( $K_{RMSE}$ ), mean absolute percentage error ( $K_{MAPE}$ ), mean absolute error ( $K_{MAE}$ ), coefficient of determination ( $R^{2}$ ), and Nash efficiency coefficient ( $NSE$ ), were chosen to objectively and comprehensively evaluate the accuracy of the wind speed prediction model. The first three indicators measure the degree of error between the predicted and actual values: the further the predicted values deviate from the actual values, the larger these indicators become [46]:

K_{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - f_{i})}^{2}}

(27)

K_{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - f_{i}|

(28)

K_{MAPE} = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - f_{i}}{y_{i}} \right| \times 100\%

(29)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - f_{i})}^{2}}{\sum_{i = 1}^{n} {\left(y_{i} - \frac{\sum_{i = 1}^{n} y_{i}}{n}\right)}^{2}}

(30)

NSE = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - f_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}, \quad \bar{y} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}

(31)

where $y_{i}$ is the real value of the wind speed, $f_{i}$ is the predicted value of the wind speed obtained by the prediction model, $\bar{y}$ is the mean of the real values, and $n$ is the number of samples. A decrease in the $K_{RMSE}$, $K_{MAPE}$, and $K_{MAE}$ values alongside an increase in the $R^{2}$ and $NSE$ values shows an improvement in the model’s prediction accuracy. $R^{2}$ and $NSE$ values close to 1 indicate that the model prediction ability is better.
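The five indicators can be computed in a few lines. The sketch below is an illustration rather than the authors' Matlab code; the function name `evaluate` and the sample values are assumptions. It uses the usual sum-of-squares forms, under which $R^{2}$ and $NSE$ reduce to the same expression.

```python
import math

def evaluate(y, f):
    """Return RMSE, MAE, MAPE (in percent), R2, and NSE for real values y
    and predictions f. Every y_i must be nonzero for the MAPE to be defined."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, f))  # squared prediction error
    sst = sum((yi - mean_y) ** 2 for yi in y)          # spread of the real values
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(yi - fi) for yi, fi in zip(y, f)) / n,
        "MAPE": 100.0 / n * sum(abs((yi - fi) / yi) for yi, fi in zip(y, f)),
        "R2": 1.0 - sse / sst,
        "NSE": 1.0 - sse / sst,  # same expression as R2 under these definitions
    }

# Made-up values, for illustration only.
scores = evaluate([4.0, 5.0, 6.0], [4.5, 5.0, 5.5])
print({name: round(value, 4) for name, value in scores.items()})
```

Because the MAPE divides by the real value, samples with wind speeds at or near zero can dominate it, which is one reason to report it alongside the RMSE and MAE.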

4. Experimental Results and Analysis

4.1. Raw Wind Speed Series

The spatial and temporal variations in wind speed are complex and show obvious seasonal characteristics. In this study, wind speed data from a wind farm in Inner Mongolia, recorded in different time periods across the four seasons of 2023, were selected as the sample dataset. The data were sampled from 0:00 to 24:00 local time at one-minute intervals, giving a total of 5760 samples. To ensure the validity of the model training and testing, outliers in the collected data were processed. The sample data were then divided into training and testing sets in a ratio of 7:3. In addition, wind speed data for a wind farm in the United States were obtained from the official website of Iowa State University to facilitate the comparative study. The following experimental simulations were completed on the Matlab R2021a platform. The raw data are highly volatile; the measured wind speed on 20 March 2023 is shown in Figure 6 as an example.
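As a quick check of the dataset arithmetic, one-minute sampling over four full days gives 4 × 1440 = 5760 samples, which matches the stated total if one day per season is used (an inference, since the text does not spell this out). The sketch below also shows a 7:3 split; the function name `chronological_split` is hypothetical, and splitting in time order without shuffling is an assumption, as the paper does not describe how the split was made.

```python
def chronological_split(series, train_parts=7, test_parts=3):
    """Split a series in time order; integer arithmetic avoids float rounding."""
    cut = len(series) * train_parts // (train_parts + test_parts)
    return series[:cut], series[cut:]

samples_per_day = 24 * 60                      # one sample per minute
series = list(range(4 * samples_per_day))      # placeholder indices, not real data
train, test = chronological_split(series)
print(len(series), len(train), len(test))      # 5760 4032 1728
```

Keeping the test block strictly after the training block avoids leaking future values into training, which matters for time series even when the split ratio is fixed.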

4.2. Wind Speed Decomposition

For the prediction of wind speed data, the nonlinear characteristics of the original sequence may lead to a limited applicability of the prediction model. CEEMDAN is used to decompose the original wind speed series into different modal functions, and each IMF has a clear frequency characteristic, which effectively eliminates the nonlinear characteristics in the original sequence.

In this study, the CEEMDAN algorithm was used to preprocess the original wind speed series. The noise weight was set to 0.2, and the number of noise additions was 100. The decomposition results are shown in Figure 7. CEEMDAN effectively decomposed the original sequence into components with distinct frequency characteristics, and there was no obvious oscillation of the original signal at different time scales. Starting from IMF4, the periodic variation in the wind speed series was captured.

The wind speed series obtained after reconstructing the IMF component of the original signal is shown in Figure 8. The reconstructed signal is smoother and more stable and is highly consistent with the original signal, verifying the effectiveness of the signal reconstruction. By using the CEEMDAN algorithm for decomposition, key periodic features in the signal are successfully retained, while fluctuations are reduced, and harmonic interference is removed. This noise reduction treatment significantly improves the prediction accuracy and overall stability of the model.

4.3. Validation of Wind Speed Prediction Algorithms

After the CEEMDAN and reconstruction of the wind speed data, the BiLSTM model was used to predict the processed wind speed series. The specific configuration and validation process of the BiLSTM model are described below. The BiLSTM model uses a hidden layer containing 25 neurons, followed by a ReLU activation layer and a regression layer. The model was trained with the mean square error (MSE) as the loss function, and the initial learning rate was set to 0.005. The number of training epochs was set to 20, and an L2 regularization coefficient of 0.01 was used to prevent overfitting. During the training process, the RIME algorithm was used to adjust the learning rate to optimize the model performance. The RIME optimization curve is shown in Figure 9; the algorithm finds the optimal value at the second iteration and converges rapidly. The trends of the parameters tuned by the RIME algorithm are shown in Figure 10; all the parameters stabilized after three iterations.
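The settings above can be collected in a small configuration object, together with a sketch of how an L2 coefficient enters an MSE objective. The names `TrainingConfig` and `regularized_mse` are hypothetical, and the penalty form used here (coefficient times the sum of squared weights) is one common convention; toolboxes may scale it differently, so this is not a reproduction of the Matlab training setup.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Values quoted in the text; field names are illustrative."""
    hidden_units: int = 25
    learning_rate: float = 0.005   # initial value, later adjusted by RIME
    epochs: int = 20
    l2_coefficient: float = 0.01

def regularized_mse(y_true, y_pred, weights, cfg):
    """MSE plus an L2 penalty on the weights (assumed penalty form)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    penalty = cfg.l2_coefficient * sum(w * w for w in weights)
    return mse + penalty

cfg = TrainingConfig()
loss = regularized_mse([1.0, 2.0], [1.0, 3.0], [1.0, 2.0], cfg)
print(round(loss, 4))  # MSE 0.5 plus penalty 0.01 * 5
```

The penalty grows with the weight magnitudes, so the optimizer trades a slightly higher fitting error for smaller weights, which is the overfitting control the text refers to.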

To further verify the contributions of the multi-head self-attention mechanism and the RIME optimization to the model accuracy, ablation experiments were performed. The model prediction results are shown in Figure 11. The predictions of the hybrid models are better than those of the single BiLSTM model. Among the hybrid models, the prediction curve of the CEEMDAN-RIME-MHSA-BiLSTM model fits the real values most closely, indicating that combining the MHSA-BiLSTM model with the time decomposition capability of CEEMDAN and with RIME optimization effectively captures the dynamic properties and dependencies of the wind speed.

The prediction results were compared using the evaluation indices, and the indices for each module of the proposed model are shown in Table 1. BiLSTM was taken as the baseline single model. After adding CEEMDAN, the $K_{RMSE}$ of the CEEMDAN-BiLSTM model decreased by 0.20 (from 0.92 to 0.72), the $K_{MAE}$ decreased by 0.17, and the $K_{MAPE}$ decreased by 3.64 percentage points. After the RIME optimization, the $K_{RMSE}$ of the RIME-BiLSTM model decreased by 0.44 relative to BiLSTM, the $K_{MAE}$ decreased by 0.36, and the $K_{MAPE}$ decreased by 6.25 percentage points. These figures are absolute differences between the Table 1 values. To minimize the error in the hybrid model, decomposition, optimization, and the multi-head self-attention mechanism were all incorporated into the BiLSTM model. The R² value of the proposed model is 0.10 and 0.03 higher than those of the CEEMDAN-BiLSTM and RIME-BiLSTM models, respectively. Additionally, the prediction curve of this model is closer to the true values, with the NSE being 0.08 and 0.03 higher. Based on these metrics, the proposed combined model has the smallest $K_{RMSE}$, $K_{MAE}$, and $K_{MAPE}$ values and provides accurate and effective prediction results. The results show that RIME optimization and CEEMDAN each have a significant effect on the prediction performance of the model, and that the best prediction results in these experiments were obtained when both were introduced into the model at the same time. CEEMDAN reduces the complexity of the data through multiscale decomposition, while RIME optimization improves the accuracy of the model parameter selection. Both are essential for improving the accuracy of the model over a wide range of predictions.
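The reductions quoted for the ablation models can be checked directly against Table 1. The sketch below is a worked calculation, with `table1` and `reduction` as illustrative names; it reports both the absolute difference and the relative change with respect to the BiLSTM baseline, and the figures quoted in the text correspond to the absolute differences.

```python
# RMSE, MAE, and MAPE (in percent) copied from Table 1.
table1 = {
    "BiLSTM": {"RMSE": 0.92, "MAE": 0.74, "MAPE": 11.97},
    "CEEMDAN-BiLSTM": {"RMSE": 0.72, "MAE": 0.57, "MAPE": 8.33},
    "RIME-BiLSTM": {"RMSE": 0.48, "MAE": 0.38, "MAPE": 5.72},
}

def reduction(baseline, model, metric):
    """Return (absolute difference, relative reduction in percent)."""
    before, after = table1[baseline][metric], table1[model][metric]
    absolute = before - after
    return round(absolute, 2), round(100.0 * absolute / before, 1)

for model in ("CEEMDAN-BiLSTM", "RIME-BiLSTM"):
    for metric in ("RMSE", "MAE", "MAPE"):
        print(model, metric, reduction("BiLSTM", model, metric))
```

For example, the RMSE of CEEMDAN-BiLSTM is 0.20 lower than that of BiLSTM, which is a relative reduction of about 21.7%.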

To further verify the prediction capability and robustness of the proposed CEEMDAN-RIME-MHSA-BiLSTM model under diverse conditions, two hybrid models (EMD-BiLSTM and CNN-BiLSTM) and a standalone GRU model were constructed for comparison. These models were tested on the same dataset, and the wind speed prediction metrics were compared across different time periods and geographic locations. Table 2 lists the evaluation results of the different models for wind speed prediction in the four seasons. Table 3 compares the performances of the models on a different wind farm. The results show that the proposed model can stably predict wind speeds across different time periods and scenarios, demonstrating its reliability. In the following paragraphs, we analyze the results for each scenario in detail.

According to the vertical comparison of the wind speed prediction metrics for the selected day in each season in Table 2, the proposed model exhibits high accuracy. Its R² values are 0.98, 0.99, 0.98, and 0.99, with errors consistently smaller than those of the other models. In the horizontal comparison of the metrics for a specific day across different seasons, all four models had good predictive performances. However, the proposed model had lower error metrics, with R² and NSE values closer to 1. The proposed model consistently outperformed the others in all seasons, displaying a stronger generalization ability and stability, which confirms the superior predictive performance of the CEEMDAN-RIME-MHSA-BiLSTM model. The other wind speed prediction methods (EMD-BiLSTM, CNN-BiLSTM, and GRU), despite their better performances in certain linear time series predictions, often suffer from weak feature capture, low accuracy, and poor generalization when dealing with highly nonlinear and volatile time series data such as wind speed data. The model proposed in this paper overcomes these problems through multiscale decomposition and the MHSA, which can discover potential patterns and dependencies in complex wind speed data, capturing local features while also attending to global information. The MHSA improves the model’s ability to adapt to different seasons and wind speed change patterns and enhances the accuracy and stability of the prediction, indicating that the method can effectively deal with the complexity and volatility of wind speed data.

Table 3 was calculated based on the wind speed data from the official website of Iowa State University. According to a comparison of the prediction results across different wind farms, the proposed model continued to exhibit a strong predictive performance at other wind farm locations. The prediction results are shown in Figure 12.

Based on the results for March 20 in Table 2 and the Iowa results in Table 3, the proposed model showed a superior performance in predicting the wind speeds at both the Inner Mongolia and Iowa wind farms, with R² values of 0.98 and 0.99 and NSE values of 0.97 and 0.99, respectively. By analyzing the daily wind speed data from both wind farms and comparing the results with the other models, it can be verified that this method remains applicable across different geographic locations, highlighting its strong adaptability in predicting data from different areas.

The prediction results for the different training set lengths, shown in Figure 13, reveal that the error fluctuates with the training set length. The evaluation metrics for the different training sets are provided in Table 4. With a training set length of 70%, the $K_{RMSE}$ of the CEEMDAN-RIME-MHSA-BiLSTM model is 0.04 and 0.10 lower than with training set lengths of 50% and 60%, respectively. Similarly, the $K_{MAE}$ is 0.06 and 0.17 lower, and the $K_{MAPE}$ is 0.60 and 2.45 percentage points lower. The experiment shows that the CEEMDAN-RIME-MHSA-BiLSTM short-term wind speed prediction model achieves a strong predictive accuracy and stability with an appropriately sized training set. In particular, when the training set length reaches 70%, the model’s prediction accuracy is higher, with NSE values closer to 1, showing a closer fit to the actual values.

5. Conclusions

Aimed at addressing the obvious discrepancies between the predicted and actual results when using existing wind speed prediction methods, in this study, we propose the MHSA-BiLSTM method, which incorporates multiscale decomposition and RIME optimization for short-term wind speed prediction. The MHSA-BiLSTM method aims to improve the power generation efficiency of wind farms as well as reduce the impact of the wind power output on the grid-connected stability. Through experimental analysis, the method proposed in this paper showed superior prediction performance on several experimental datasets. The main findings are summarized as follows:

MHSA-BiLSTM, as the core of CEEMDAN-RIME-MHSA-BiLSTM, is good at extracting key information in long series and capturing different time–frequency scale information and complex dependencies in wind speed series, which greatly improves the prediction accuracy of the model. The experimental results show that the $K_{RMSE}$, $K_{MAE}$, and $K_{MAPE}$ values of the CEEMDAN-RIME-MHSA-BiLSTM are 0.33, 0.17, and 4.12%, respectively; compared with the EMD-BiLSTM model, the RMSE is lower by 0.36, the MAE by 0.31, and the MAPE by 2.62 percentage points. Compared with the CNN-BiLSTM model, the RMSE is lower by 0.45, the MAE by 0.38, and the MAPE by 3.86 percentage points. Compared with the single-model GRU, the R² and NSE are higher by 0.28 and 0.36, respectively. The combination of the RIME algorithm and CEEMDAN with MHSA-BiLSTM effectively reduces the complexity and improves the performance of long-time wind speed series prediction, which demonstrates the superiority of the combined model in short-term wind speed prediction;
The original wind speed sequence is decomposed using CEEMDAN, which decomposes the complex wind speed signal into multiple modal components with different frequency characteristics. When the prediction performance of CEEMDAN-BiLSTM is compared with that of the single-model BiLSTM, the $R^{2}$ and $NSE$ values are higher by 0.06 and 0.08, and the CEEMDAN-BiLSTM exhibits better accuracy. This indicates that the decomposition strategy alone improves the prediction accuracy to a limited extent by reducing the noise and nonlinear perturbations in the original data, which also improves the robustness of the model;
The RIME algorithm optimizes the number of hidden-layer nodes, the number of training epochs, and the learning rate of the BiLSTM network, which addresses the limitations of the traditional BiLSTM model in parameter selection and significantly improves the prediction performance of the model. According to the experimental results, the BiLSTM model optimized by the RIME algorithm is better than the single prediction model in terms of the RMSE and MAE, which supports the hypothesis that the optimization algorithm can improve the prediction accuracy of the model;
CEEMDAN-RIME-MHSA-BiLSTM maintains a stable and highly accurate performance at different wind speeds in different geographic locations and at different time periods. The $K_{RMSE}$, $K_{MAE}$, and $K_{MAPE}$ values for the Iowa wind farm in the U.S. are 0.12, 0.08, and 1.09%, respectively, and the R² and NSE both reach 0.99. The predictive performance of the model is better than those of the other models on each of the tested days in the different seasons. As a result, the model proposed in this paper is highly applicable to wind speed prediction in scenarios with different geographical locations and time periods.

In conclusion, the CEEMDAN-RIME-MHSA-BiLSTM model proposed in this study showed superior predictive performance in the analysis and processing of wind farm data and achieved the objectives of this study. Despite its significant advantages, there are still some limitations regarding its applicability. Future research could consider introducing wind speed data from more geographic regions and could try to validate the model under different climatic conditions. In addition, exploring more efficient model training algorithms, for example by introducing lightweight network structures or distributed computing techniques, may help to improve the computational efficiency of the model.

Author Contributions

Z.Z.: conceptualization, methodology, formal analysis, survey, resources, data management, validation, writing—original draft, project management. W.Y.: conceptualization, methodology, formal analysis, investigation, data management, experimental validation, writing—original draft, writing—review and editing. K.M.: supervision, methodology, survey, resources, validation, project management. R.W.: software, formal analysis, visualization. K.W.: supervision, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Inner Mongolia Autonomous Region “Open Competition Mechanism to Select the Best Candidates” Project (grant number 2022JBGS0045), the Inner Mongolia Autonomous Region Science and Technology Major Special Program Project (grant number 2020ZD0016 and 2021ZD0032), and the Inner Mongolia Autonomous Region Science and Technology Program Project (grant number 2020GG0281).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in this paper can be obtained by contacting the authors of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhang, N.N.; Xiong, G.J.; Chen, J.L.; Yao, G. Unit commitment of the power system containing wind power via quantum discrete differential evolution. Power Syst. Clean Energy 2022, 38, 89–96.
2. Wang, D.F.; Zhang, Z.Y.; Gu, Z.Y.; Huang, Y. Wind speed prediction of wind farm based on hybrid copula function and whale optimization algorithm. Electr. Power Sci. Eng. 2022, 38, 33–40.
3. Sun, P.X.; Wang, J.; Yan, Z.G. Ultra-short-term wind speed prediction based on TCN-MCM-EKF. Energy Rep. 2024, 11, 2127–2140.
4. Feng, L.; Zhou, Y.; Luo, Q. Complex-valued artificial hummingbird algorithm for global optimization and short-term wind speed prediction. Expert Syst. Appl. 2024, 246, 123160.
5. Xiao, X.Z.; Zi, X.X.; Yu, W.; Gao, X.X.; Huang, X.Y.; Liu, R.Z.; Chen, Y.; Liu, H.X. Research on wind speed behavior prediction method based on multi-feature and multi-scale integrated learning. Energy 2023, 263, 125593.
6. Jiang, Z.Y.; Jia, Q.S. A Review of Multi-temporal-and-spatial-scale Wind Power Forecasting Method. Acta Autom. Sin. 2019, 45, 51–71.
7. Sun, R.; Li, Q.; Luo, H.F.; Dou, X.; Deng, Y.H. Wind power forecasting based on error correction using adaptive moving smoothing and time convolution network. J. Glob. Energy Interconnect. 2022, 5, 11–22.
8. Mark, C.; Liu, S. Distributionally robust model predictive control for wind farms. IFAC-PapersOnLine 2023, 56, 7680–7685.
9. Liu, M.D.; Ding, L.; Bai, Y.L. Application of hybrid model based on empirical mode decomposition, novel recurrent neural networks and the ARIMA to wind speed prediction. Energy Convers. Manag. 2021, 233, 113917.
10. Liu, X.; Lin, Z.; Feng, Z. Short-term offshore wind speed forecast by seasonal ARIMA - A comparison against GRU and LSTM. Energy 2021, 227, 120492.
11. Chen, H.F.; Wang, H.; Li, Y.; Xiong, M. Short-term wind speed prediction by combining two-step decomposition and ARIMA-LSTM. Acta Energiae Solaris Sin. 2024, 45, 164–171.
12. Tian, Z.D.; Li, J.S.; Wang, Y.H.; Gao, X.W. Short-term wind speed hybrid prediction model based on ARIMA and ESN. Acta Energiae Solaris Sin. 2019, 37, 1603–1610.
13. Chen, X.H.; Wang, Z.S.; Wu, C. Research on short-term integrated forecasting model of hour-based load in micro-grid based on long-short-term memory network. Chin. J. Manag. Sci. 2023, 1574, 1–12.
14. Li, Z.; Luo, X.R.; Liu, M.J.; Cao, X.; Du, S.H.; Sun, H.X. Wind power prediction based on EEMD-Tent-SSA-LS-SVM. Energy Rep. 2022, 8, 3234–3243.
15. Wang, J.; Niu, X.; Zhang, L.; Liu, Z.; Huang, X. A wind speed forecasting system for the construction of a smart grid with two-stage data processing based on improved ELM and deep learning strategies. Expert Syst. Appl. 2024, 241, 122487.
16. Chaka, M.D.; Semie, A.G.; Mekonned, Y.S.; Geffe, C.A.; Kebede, H.; Mersha, Y.; Anose, F.; Benti, N.E. Improving wind speed forecasting at Adama wind farm II in Ethiopia through deep learning algorithms. Case Stud. Chem. Environ. Eng. 2024, 9, 100594.
17. He, J.; Wang, X.F. Short-Term wind speed prediction based on ARIMA and LS-SVM composite model. Electr. Eng. Technol. 2023, 52, 30–33.
18. You, Q. Research on Pridiction of Short Term Wind Speed Based on PSO Optimizing LSSVM. Master’s Thesis, Harbin Institute of Technology, Harbin, China, 2022.
19. Zhu, C.S.; Li, S.H. Random forest regression model based on improved fruit fly optimization algorithm and its application in wind speed forecasting. J. Lanzhou Univ. Technol. 2021, 47, 83–90.
20. Zhang, J.P.; Yu, X.J.; Chen, D.; Ji, H.P. An improved RCSA-ANN model for the prediction of offshore short-term wind speed. Acta Aerodyn. Sin. 2022, 40, 110–116.
21. Jonkers, J.; Avendano, D.N.; Wallendael, G.V.; Hoecke, S.V. A novel day-ahead regional and probabilistic wind power forecasting framework using deep CNNs and conformalized regression forests. Appl. Energy 2024, 361, 122900.
22. Liu, Z.H.; Wang, C.T.; Wei, H.L.; Zeng, B.; Li, M.; Song, X.P. A wavelet-LSTM model for short-term wind power forecasting using wind farm SCADA data. Expert Syst. Appl. 2024, 247, 123237.
23. Zhao, Z.; Yun, S.; Jia, L.; Guo, J.; Meng, Y.; He, N.; Li, X.J.; Shi, J.R.; Ynag, L. Hybrid VMD-CNN-GRU-based model for short-term forecasting of wind power considering spatio-temporal features. Eng. Appl. Artif. Intell. 2023, 121, 105982.
24. Duan, J.; Chang, M.; Chen, X.; Wang, W.; Zuo, H.; Bai, Y.; Chen, B. A combined short-term wind speed forecasting model based on CNN–RNN and linear regression optimization considering error. Renew. Energy 2022, 200, 788–808.
25. Li, S.Y.; Liu, C. Short-term wind speed forecast based on LSTM by optimized bald eagle search algorithm. Ningxia Electr. Power 2023, 38, 19–24.
26. Yan, Y.; Wang, X.; Ren, F.; Shao, Z.; Tian, C. Wind speed prediction using a hybrid model of EEMD and LSTM considering seasonal features. Energy Rep. 2022, 8, 8965–8980.
27. Ding, R.Q.; Zhou, W.N.; Cheng, H.Y.; Liu, J.L. A novel method based on SSA-BiLSTM networks under deep learning framework for wind speed forecasting. Comput. Digit. Eng. 2020, 48, 45–50.
28. Liu, T.Y.; Chen, W.; Chang, L.; Gu, T.L. Research advances in the knowledge tracing based on deep learning. J. Comput. Res. Dev. 2022, 59, 81–104.
29. Zhi, L.H.; Zi, Y.; Xu, K. Combination prediction of wind speed based on variational mode decomposition and neural network. J. Hefei Univ. Technol. 2022, 45, 1505–1510, 1584.
30. Fu, W.L.; Zhang, X.R.; Zhang, H.R.; Fu, Y.C.; Liu, X.T. Ultra-short-term wind speed prediction based on INGO-SWGMN hybrid model. Acta Energiae Sol. Sin. 2022, 45, 133–143.
31. Zhang, Y.N.; Shi, J.R.; Li, J.; Yun, S.N. Short-term wind speed prediction based on residual and VDM-ELM-LSTM. Acta Energiae Solaris Sin. 2023, 44, 340–343.
32. Yi, Y.Y.; Pan, W.H.; Zhao, W.G.; Su, Z.P.; Han, Y. Ultra-short-term wind speed prediction method based on CEEMDAN and BiLSTM-AM. Electr. Meas. Instrum. 2024, 45, 1–9. Available online: http://kns.cnki.net/kcms/detail/23.1202.th.20240617.0943.002.html (accessed on 12 September 2024).
33. Phan, B.; Nguyen, T.T. Enhancing wind speed forecasting accuracy using a GWO-nested CEEMDAN-CNN-BiLSTM model. ICT Express 2024, 10, 485–490.
34. Xiong, Z.; Yao, J.; Huang, Y.; Yu, Z.; Liu, Y. A wind speed forecasting method based on EMD-MGM with switching QR loss function and novel subsequence superposition. Appl. Energy 2024, 353, 122248.
35. Wei, X.; Shi, Q.; Fu, W.X.; Chen, L. Short-term wind speed prediction with CEEMDAN sample entropy and SVR. Water Resour. Power 2020, 38, 207–210.
36. Song, K.; Yu, Y.; Zhang, T.; Li, X.; Lei, Z.; He, H.; Wang, Y.; Gao, S. Short-term load forecasting based on CEEMDAN and dendritic deep learning. Knowl.-Based Syst. 2024, 294, 111729.
37. Liu, F.; Liang, C. Short-term power load forecasting based on AC-BiLSTM model. Energy Rep. 2024, 11, 1570–1579.
38. Peng, S.; Zhu, J.; Wu, T.; Yuan, C.; Cang, J.; Zhang, K.; Pecht, M. Prediction of wind and PV power by fusing the multi-stage feature extraction and a PSO-BiLSTM model. Energy 2024, 298, 131345.
39. Malakouti, S.M.; Krimi, F.; Abdollahi, H.; Menhaj, M.B.; Suratgar, A.A.; Moradi, M.H. Advanced Techniques for Wind Energy Production Forecasting: Leveraging Multi-Layer Perceptron Bayesian Optimization, Ensemble Learning, and CNN-LSTM Models. Case Stud. Chem. Environ. Eng. 2024, 10, 100881.
40. Ma, Y.; Li, J.; Gao, J.; Chen, H. State of health prediction of lithium-ion batteries under early partial data based on IWOA-BiLSTM with single feature. Energy 2024, 295, 131085.
41. Ismalle, A.K.; Houssein, E.H.; Khafaga, D.S.; Aldakheel, E.A.; Said, M. Performance of rime-ice algorithm for estimating the PEM fuel cell parameters. Energy Rep. 2024, 11, 3641–3652.
42. Abdel-Salam, M.; Hu, G.; Celik, E.; Gharehchopogh, F.S.; El-Hasnony, I.M. Chaotic RIME optimization algorithm with adaptive mutualism for feature selection problems. Comput. Biol. Med. 2024, 179, 108803.
43. Pandy, S.B.; Kalita, K.; Jangir, P.; Cep, R.; Migdady, H.; Chohan, J.S.; Abualigah, L.; Mallik, S. Multi-objective RIME algorithm-based techno economic analysis for security constraints load dispatch and power flow including uncertainties model of hybrid power systems. Energy Rep. 2024, 11, 4423–4451.
44. Liu, W.; Bai, Y.; Yue, X.; Wang, R.; Song, Q. A wind speed forcasting model based on rime optimization based VMD and multi-headed self-attention-LSTM. Energy 2024, 294, 130726.
45. Pourdaryaei, A.; Mohammadi, M.; Mubarak, H.; Abdellatif, A.; Karimi, M.; Gryazina, E.; Terzija, V. A new framework for electricity price forecasting via multi-head self-attention and CNN-based techniques in the competitive electricity market. Expert Syst. Appl. 2024, 235, 121207.
46. Chen, Y.; Dono, Z.; Wang, Y.; Su, J.; Han, Z.; Zhou, D.; Zhang, K.; Zhao, Y.; Bao, Y. Short-term wind speed predicting framework based on EEMD-GA-LSTM method under large scaled wind history. Energy Convers. Manag. 2021, 227, 113559.

Figure 1. Steps of CEEMDAN-RIME-MHSA-BiLSTM wind speed combination prediction algorithm.

Figure 2. Structure of LSTM network.

Figure 3. Structure of BiLSTM network.

Figure 4. Flowchart of RIME optimization algorithm.

Figure 5. Diagram of MHSA-BiLSTM model framework.

Figure 6. Raw wind speed series.

Figure 7. Results of CEEMDAN of wind speed series.

Figure 8. Comparison of the original and reconstructed signals.

Figure 9. RIME adaptation curve.

Figure 10. Variation in RIME hyperparameters with the number of iterations.

Figure 11. (a) BiLSTM prediction model; (b) CEEMDAN-BiLSTM prediction model; (c) RIME-BiLSTM prediction model; (d) CEEMDAN-RIME-MHSA-BiLSTM prediction model.

Figure 12. Model predictions for the Iowa wind farm.

Figure 13. (a) Prediction with 50% training set length; (b) prediction with 60% training set length; (c) prediction with 70% training set length.

Table 1. Evaluation indices obtained in ablation experiments.

Predictive Model	RMSE	MAE	MAPE	$R^{2}$	NSE
CEEMDAN-RIME-MHSA-BiLSTM	0.33	0.17	4.12%	0.98	0.97
CEEMDAN-BiLSTM	0.72	0.57	8.33%	0.88	0.89
RIME-BiLSTM	0.48	0.38	5.72%	0.95	0.94
BiLSTM	0.92	0.74	11.97%	0.82	0.81

Table 2. Evaluation indicators for different models under the four seasons.

Date	Predictive Model	RMSE	MAE	MAPE	$R^{2}$	NSE
March 20th (Spring)	CEEMDAN-RIME-MHSA-BiLSTM	0.33	0.17	4.12%	0.98	0.97
	EMD-BiLSTM	0.69	0.48	6.74%	0.87	0.86
	CNN-BiLSTM	0.78	0.55	7.98%	0.85	0.83
	GRU	1.42	0.85	12.19%	0.70	0.61
July 9th (Summer)	CEEMDAN-RIME-MHSA-BiLSTM	0.19	0.12	2.10%	0.99	0.98
	EMD-BiLSTM	0.64	0.45	6.43%	0.90	0.88
	CNN-BiLSTM	0.74	0.50	7.12%	0.86	0.83
	GRU	1.21	0.78	11.12%	0.72	0.65
November 15th (Fall)	CEEMDAN-RIME-MHSA-BiLSTM	0.27	0.15	2.74%	0.98	0.97
	EMD-BiLSTM	0.71	0.51	6.92%	0.88	0.86
	CNN-BiLSTM	0.83	0.60	9.12%	0.84	0.82
	GRU	1.92	1.10	15.11%	0.60	0.55
December 25th (Winter)	CEEMDAN-RIME-MHSA-BiLSTM	0.21	0.14	2.71%	0.99	0.98
	EMD-BiLSTM	0.71	0.50	6.94%	0.88	0.86
	CNN-BiLSTM	0.75	0.55	8.13%	0.85	0.83
	GRU	1.33	0.94	12.58%	0.70	0.65

Table 3. Evaluation of single-day wind speed prediction for Iowa wind farm.

Predictive Model	RMSE	MAE	MAPE	$R^{2}$	NSE
CEEMDAN-RIME-MHSA-BiLSTM	0.12	0.08	1.09%	0.99	0.99
EMD-BiLSTM	0.54	0.32	6.17%	0.93	0.92
CNN-BiLSTM	0.69	0.48	8.32%	0.88	0.86
GRU	0.87	0.61	12.14%	0.80	0.78

Table 4. Prediction results of the model under different training sets.

Training Set Length	RMSE	MAE	MAPE	$R^{2}$	NSE
50%	0.37	0.23	4.72%	0.97	0.96
60%	0.43	0.34	6.57%	0.94	0.94
70%	0.33	0.17	4.12%	0.98	0.97

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yang, W.; Zhang, Z.; Meng, K.; Wang, K.; Wang, R. CEEMDAN-RIME–Bidirectional Long Short-Term Memory Short-Term Wind Speed Prediction for Wind Farms Incorporating Multi-Head Self-Attention Mechanism. Appl. Sci. 2024, 14, 8337. https://doi.org/10.3390/app14188337
