Article

Wind Power Generation Forecast Based on Multi-Step Informer Network

Guangxi Key Laboratory of Power System Optimization and Energy Technology, Guangxi University, Nanning 530004, China
*
Author to whom correspondence should be addressed.
Energies 2022, 15(18), 6642; https://doi.org/10.3390/en15186642
Submission received: 19 July 2022 / Revised: 22 August 2022 / Accepted: 7 September 2022 / Published: 11 September 2022
(This article belongs to the Special Issue Advances in Urban Power Distribution System)

Abstract

Accurate medium- and long-term wind power forecasts provide an important basis for power distribution plans, energy storage allocation plans, and medium- and long-term power generation plans after wind power integration. However, existing forecasts still suffer from low accuracy and a low degree of integration of the physical processes of wind power. In this study, the Multi-step Informer network is proposed to add meteorological parameters to the wind power generation forecast and to make the network interpretable. The Multi-step Informer network uses an Informer to obtain the initial forecast model from historical wind power generation data, introduces Informer models trained on the wind speed and air pressure that appear in the dynamic pressure model, and compares the result with the historical wind power generation data to correct the model, which further improves the forecast accuracy. The backpropagation through the pre-trained Informers is truncated so that they are not modified during training of the Multi-step Informer network, which also guarantees the interpretability of the network's results. The Multi-step Informer network thus corrects the error of the wind power generation forecast. In terms of root mean square error, the Multi-step Informer network improves forecast accuracy by 29% compared to the Informer network.

1. Introduction

With the development of society, a larger amount of electricity is required. However, the burning of fossil fuels leads to environmental pollution. The world is currently committed to reaching net-zero carbon emissions by 2050, which requires a sustainable energy transition. Wind power is becoming the backbone of renewable energy systems, and it is environmentally friendly, economical, and safe. Wind turbines can transform wind power into electricity for homes, schools, and businesses [1]. By using wind power, carbon emissions can be reduced in many areas such as manufacturing, industry, and chemical plants. The wind power industry has experienced a breakthrough, and the installed capacity has reached approximately 743 GW. However, wind power is stochastic and volatile, and it depends on meteorological conditions. The accuracy of wind power generation forecasting determines how efficiently wind power can be used. For example, low forecast accuracy restricts energy dispatch in the microgrid. Meanwhile, the use of wind energy can promote energy conservation and emission reduction, and the forecast of wind power generation is an important element of wind energy research. Therefore, accurate forecasts of wind power generation contribute to allocating energy storage, reducing financial risk in the market, aiding the dynamic balance of the power system, and realizing net-zero carbon emissions.
The forecast of wind power generation is classified into ultra-short term (hour), short-term (day), medium-term (week), and long-term (month) based on time [2]. Ultra-short and short-term forecasts are used for load dispatch, power market operational security, and energy storage equipment management [3]. Medium-term forecasts play a crucial role in maintenance planning and commitment agreements [4]. Long-term forecasts are always required for power market optimization and wind farm maintenance design [5]. For this reason, most of the existing studies incorporate factors such as user education, income, and average age into the problem of uncertainty of demand. The existing methods of wind power forecast mainly include physical methods, statistical methods, machine learning methods, and hybrid methods. The grouping of forecast methods is shown in Table 1. The physical method is based on weather conditions, and it forecasts wind speed to predict wind power generation. In the statistical and machine learning method, short-term forecasts require historical and measured values of the wind power generation forecast, and long-term forecasts require weather and meteorological data in addition to the historical and measured values. The physical method performs better for medium-term and long-term forecasting compared to the other methods [6]. However, the speed of calculation of the physical method cannot satisfy the requirements of the grid. Statistical methods deal with time series and count the nonlinear relationship between wind speed and electricity, and they include autoregressive moving average [7], autoregressive integrated moving average [8], and Kalman filter [9]. Artificial intelligence has been introduced to forecast wind power generation with remarkable results. 
Furthermore, artificial intelligence methods, such as extreme learning machines [10], fuzzy inference systems [11,12], support vector machines [13], and different forms of neural networks, have received increasing attention owing to their powerful nonlinear processing capabilities. The artificial intelligence method tends to exhibit higher accuracy and less error than the physical method and statistical method in wind power generation forecast. The hybrid method circumvents the shortcomings of a single model in wind power generation forecast. The hybrid method characterizes the fluctuations of wind power generation from different perspectives. Typically, the hybrid method combines a forecasting method with a data processing method [14,15,16,17,18]. Praveena proposed a hybrid method combining Fuzzy K-means clustering and Neural Network (NN), which reduced the interval time of collecting wind data and introduced a back propagation algorithm into the NN [19]. Recently, the combination of neural networks and physical models is also an important branch of hybrid methods. Famoso combines Multilayer Perceptron Neural Network and Wake Physical Model and a new forecasting method for wind power generation in wind power plant is proposed [20]. A common hybrid method is the decomposition-based hybrid method: the time series of wind speed and air pressure are divided into subseries, and models of the smooth subseries are generated to improve accuracy [21]. The hybrid method considers multiple time series of wind speed and wind power.
Owing to their strong learning capability, artificial neural networks are capable of handling wind features with dynamic, nonlinear, and complex time series [22]. Various neural networks have been reported for wind power generation forecasts [23], including deep belief networks [24], convolutional networks [25], deep learning networks [26], and long short-term memory (LSTM). Additionally, optimization-seeking algorithms are applied to find suitable parameters for artificial intelligence algorithms. The window size and number of neurons in the LSTM layers were optimized and a new genetic LSTM was proposed [27]. A hybrid machine learning algorithm was developed to predict electricity prices more accurately by combining an adaptive neuro-fuzzy inference system with a backtracking search algorithm in the learning process [28]. The combination of particle swarm optimization and the AdaBoost algorithm addresses the weak generalization of wind power generation forecasts [29].
Table 1. Grouping of prediction methods.
| Methods Classification | References | Training Algorithm | Data Type | Accuracy |
|---|---|---|---|---|
| Physical methods | J. Hu et al., 2020 [6] | MCEEMDAN-GOA-QRNN-II | DataA: 1 h; DataB: 2 h | DataA: RMSE = 0.8065, MSE = 0.6505; DataB: RMSE = 1.1079, MSE = 1.2274 |
| Statistical methods | Q. Han et al., 2017 [7] | ARMA, NP, AI/ML, HAN, HNA et al. | 1 h | MREMAX = 0.1396; RMSEMAX = 0.1367 |
| | F. Zhang et al., 2021 [8] | ARDA | Data I: 400 s; Data II: 1000 s | Data I: RMSE = 8.11–14.9 (KW); Data II: RMSE = 93.32–208.19 (KW) |
| | Z. Zheng et al., 2019 [9] | A Kalman filter-based bottom-up approach | 24 h | SMAPE = 0.151 |
| Artificial intelligence | C. Yildiz et al., 2020 [23] | An improved residual-based deep Convolutional Neural Network (CNN) | 1 year | RMSE = 0.0247–0.1362 |
| | Z. Lin et al., 2020 [24] | Based on high-frequency SCADA data and deep learning neural network | 1 month | RMSE = 545.28 (KW) |
| | F. Shahid et al., 2021 [25] | GLSTM | 12 h | MSE = 0.00924; MAE = 0.07271; MSE = 0.09615 |
| | G. An et al., 2021 [27] | PSO-ELM | 1 year | MSE = 4549 (KW); RMSE = 67.4460 (KW) |
| | Z. Sun et al., 2020 [28] | Based on VMD Decomposition, ConvLSTM Networks and Error Analysis | 7 months | RMSE15min = 1210.05 (KW); RMSE20min = 1889.2 (KW); RMSE30min = 2345.89 (KW) |
Recurrent neural networks have been applied to wind power forecasting, but they suffer from vanishing gradients and do not satisfactorily handle long sequence time-series forecasting. The LSTM is a recurrent network that adds a long-term memory to retain past information. It alleviates the vanishing gradient problem of the recurrent network to an extent, and it is widely applied in wind power generation forecasting. The LSTM is particularly sensitive to time series. Behera designed wind power generation forecast intervals with LSTM and optimized the model parameters [30]. Shabbir used a combination of Recurrent Neural Networks (RNN) and LSTM to solve the problem of short-term wind power generation forecasting in Estonia for the first time [31]. Wu extracted spatially and temporally correlated vectors with convolutional neural networks as input vectors for the LSTM, which was reconstructed over time [32]. The temporal feature is extracted with convolutional operations in the LSTM models, and the results showed that convolutional LSTM has high forecast accuracy [33]. Bidirectional Long Short-Term Memory (BidLSTM) was derived from LSTM, and it runs two hidden layers side by side: one layer remembers past information and the other remembers future information. Zhen applied BidLSTM to short-term wind power generation forecasting and, by comparing its forecast results with those of other neural networks such as CNN and LSTM-CNN, showed that BidLSTM achieved the highest accuracy and had a better forecast potential [34]. Recently, Convolutional Long Short-Term Memory (ConvLSTM), a variant of LSTM, has been gradually applied to wind power generation forecasting [35]; it uses convolution instead of matrix multiplication to capture spatial features from data. Researchers designed a ConvLSTM network to obtain a smoother forecast sequence for each sequence [36]. Hybrid models based on advanced LSTM models exhibit significant potential.
LSTM was optimized using crow search algorithm and wavelet transform, and the results were better than those obtained via a single LSTM model [37]. The authors improved the LSTM based on variational modal decomposition and singular spectrum analysis [38] and indicate that the improved model has better robustness in extracting trend information by comparing eight models.
The algorithms proposed in the aforementioned studies can be improved in the following aspects:
1. The LSTM learns sequentially rather than in parallel. Hence, there is still room for further improvement in learning speed.
2. The accuracy of the wind power generation forecast can be improved further as new neural networks continue to be proposed.
3. Few studies integrate physical processes into the wind power generation forecast.
Subsequently, a transformer network architecture based entirely on attention was proposed. The transformer network combines the encoder and decoder with attention to reduce training time [39]. Transformer networks are used in a large number of applications in various fields, including wind power generation forecasting [40]. However, the transformer network cannot solve the large memory usage of long input and output sequences or the quadratic complexity of the computation time. An Informer network that exploits the sparsity of self-attention was proposed [41]. It uses ProbSparse self-attention instead of canonical self-attention and performs a self-attention distilling operation to reduce the quadratic complexity of the computation time. The distilling operation shortens the input sequence at each layer, which allows the Informer network to accept long input sequences.
To solve the aforementioned problems, a Multi-step Informer network (MSIN) is proposed in this study. The physical processes of wind power generation are considered when forecasting wind power generation. A dynamic pressure model is introduced, which consists of physical quantities of wind pressure and wind power density. The air pressure and wind speed associated with wind pressure and wind power density are explored. Therefore, wind power generation is described as a physical quantity related to air pressure and wind speed. The MSIN obtains the initial training model based on the historical data of forecast, and trains the model using the wind speed and air pressure via the Informer network. The wind speed and air pressure are described by combining the physical processes of wind power generation and dynamic pressure model. The MSIN corrects the initial wind power generation forecast by inputting the results of the wind speed and air pressure. The main contributions of this study are as follows:
1. The MSIN improves forecast accuracy by 29% compared to the Informer network. The potential forecast relationship between the input sequence and the output sequence in long wind power time series is verified.
2. The MSIN is capable of parallel learning, which further accelerates learning.
3. The forecast results are corrected by the wind speed and air pressure, which are described by combining the physical processes of wind power generation with the dynamic pressure model. The wind speed and air pressure related to the dynamic pressure model can improve the prediction accuracy of wind power generation.
It is important to understand the reasons behind the forecast to ensure safety and satisfy the series of requirements [42]. Hence, interpretable machine learning is favored. Interpretable machine learning ensures that some aspects are secured because of interpretation instead of ensuring that an explicit goal is achieved. Major decisions and troubleshooting in engineering applications can partly rely on interpretable machine learning [43]. Once interpretable machine learning is added in particular examples, it aids engineers in finding solutions for existing or long-standing problems.

2. Dynamic Pressure Model

Wind is a natural phenomenon that occurs due to the flow of air, which is caused by temperature differences induced by solar radiation. Wind refers to the horizontal component of air movement, direction, and magnitude. Wind power generation is the process in which wind turbines convert wind power into mechanical power, which is subsequently converted into electrical power.
The wind speed and air pressure are deeply related to wind pressure and wind power density, which are explored in this paper. The magnitude of wind pressure is closely related to wind speed, and a strong correlation between wind power density with wind speed and air pressure is demonstrated. It is feasible to use the dynamic pressure model to correct the wind power generation forecast error, and it is feasible to use the wind speed and air pressure to improve the wind power generation prediction accuracy.

2.1. Dynamic Pressure Model-Wind Pressure

Wind pressure is perpendicular to the plane of the direction of airflow and it can be obtained from Bernoulli’s equation as follows:
$$W_P = \frac{1}{2} \rho V_s^2 \tag{1}$$
where $W_P$ denotes the wind pressure, $\rho$ denotes the air density, and $V_s$ denotes the wind speed. The relationship between air density $\rho$ and air weight rate $r$ can be expressed as $r = \rho g$. The wind pressure can be further expressed as (2) given that $\rho = r / g$.
$$W_P = \frac{1}{2} \frac{r}{g} V_s^2 \tag{2}$$
Equation (2) gives the standard wind pressure. In the standard state (air pressure of 1013.25 hPa and temperature of 15 °C), $r = 0.01225$ kN/m³ and $g = 9.8$ m/s², so (2) can be expressed as follows, with $W_P$ in kN/m² and $V_s$ in m/s:
$$W_P = \frac{V_s^2}{1600} \tag{3}$$
Equation (3) is a general equation for wind pressure in terms of wind speed. It should be noted that the air weight rate and the acceleration due to gravity vary with latitude and altitude. Typically, the air density is smaller on the plateau than on the plain, which implies that the wind pressure is smaller on the plateau than on the plain at the same wind speed and same temperature.
Equations (1)–(3) show that wind pressure is proportional to the square of the wind speed, so wind pressure increases as wind speed increases. It is therefore desirable to correct the wind power generation forecast using the forecast result of wind speed to improve the forecast accuracy.
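As a quick numerical check of Equations (2) and (3), the following minimal sketch evaluates both forms of the wind pressure with the standard-state constants quoted above. The function names and the sample wind speeds are illustrative assumptions, not part of the original study.

```python
# Standard-state constants quoted in the text.
R_STD = 0.01225   # air weight rate r, kN/m^3
G_STD = 9.8       # gravitational acceleration g, m/s^2


def wind_pressure(v_s, r=R_STD, g=G_STD):
    """Wind pressure W_P = (1/2) * (r / g) * V_s^2 (Equation (2)), in kN/m^2."""
    return 0.5 * (r / g) * v_s ** 2


def wind_pressure_standard(v_s):
    """Standard-state shortcut W_P = V_s^2 / 1600 (Equation (3)), in kN/m^2."""
    return v_s ** 2 / 1600.0


if __name__ == "__main__":
    # Illustrative wind speeds in m/s (hypothetical values).
    for v in (5.0, 10.0, 20.0):
        print(v, wind_pressure(v), wind_pressure_standard(v))
```

Both functions agree because 0.01225 / (2 × 9.8) = 1/1600, and doubling the wind speed quadruples the wind pressure.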

2.2. Dynamic Pressure Model-Wind Power Density

The wind power generation forecast curves are generally provided by wind motor manufacturers without considering the wind power generation process. Wind power density is an important parameter to describe wind power generation. It is the ability of wind turbines to convert wind power into electrical power and is introduced as one aspect of the dynamic pressure model in this paper.
(1) Air pressure
The air density is introduced firstly. Specifically, ρ is derived by extrapolation of the ideal gas law as follows [44]:
$$\rho = \frac{P}{G T_W} \tag{4}$$
Air density is expressed in terms of the pressure ($P$), temperature ($T_W$), and atmospheric gas constant ($G$) at the height of the hub. Only dry air is considered in this paper to reduce the complexity and facilitate the calculation. The atmospheric gas constant is 287.058 J/(kg·K). Air density and atmospheric pressure exhibit a strong correlation. In this case, the temperature is one of the variables, and air density is inversely proportional to temperature. However, the altitude at the hub can be determined to exclude altitude-induced temperature changes in this paper.
(2) Wind power density
Wind power density is the most convenient and valuable quantity for wind power generation. It is defined as the mean annual power available per square meter of swept area of a turbine. Wind power density is a quantitative measure of wind energy available at any location. The output power of the wind turbine is modeled as a function [45,46] of the rotor blade area, air density, and wind speed as follows:
$$P_W = \frac{1}{2} \rho S V_s^3 C_{P,\max} \tag{5}$$
Substituting the air density from (4) into (5) gives (6):
$$P_W = \frac{P S V_s^3 C_{P,\max}}{2 G T_W} \tag{6}$$
where $V_s$ denotes the wind speed, which is an uncertain quantity. The rotor area ($S$) is a property of the wind turbine. The Betz power coefficient ($C_{P,\max}$) is the maximum efficiency of a wind turbine in converting wind power into mechanical power, and this paper takes the value 16/27. The wind power density is proportional to the third power of the wind speed and to the first power of the air pressure.
Equations (4)–(6) prove that wind pressure and wind power density are strongly correlated with wind speed and air pressure in wind power generation by dynamic pressure model. It is feasible that the uncertain quantity that combines the dynamic pressure model is added to the input of the network to correct the forecast results via multiple steps. Therefore, it has great significance to consider the wind physical characteristics in the forecast of wind power generation and use the dynamic pressure model to modify the forecast results.
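The dependence of the output power on air pressure and wind speed in Equations (4)–(6) can be illustrated with a short sketch. It assumes SI units (pressure in Pa, temperature in K, rotor area in m², wind speed in m/s); the function names and any rotor area or weather values passed in are hypothetical and not taken from the Sotavento data.

```python
GAS_CONSTANT = 287.058     # G for dry air, J/(kg*K), as stated in the text
BETZ_LIMIT = 16.0 / 27.0   # C_P,max


def air_density(pressure_pa, temperature_k):
    """Air density from the ideal gas law, rho = P / (G * T_W), Equation (4)."""
    return pressure_pa / (GAS_CONSTANT * temperature_k)


def wind_power(pressure_pa, temperature_k, wind_speed, rotor_area):
    """Turbine output P_W = P * S * V_s^3 * C_P,max / (2 * G * T_W), Equation (6), in W."""
    rho = air_density(pressure_pa, temperature_k)
    return 0.5 * rho * rotor_area * wind_speed ** 3 * BETZ_LIMIT


if __name__ == "__main__":
    # Standard atmosphere: 101325 Pa and 288.15 K give roughly 1.225 kg/m^3.
    print(air_density(101325.0, 288.15))
    # Hypothetical rotor area of 1000 m^2 at two wind speeds.
    print(wind_power(101325.0, 288.15, 8.0, 1000.0))
    print(wind_power(101325.0, 288.15, 16.0, 1000.0))
```

Doubling the wind speed multiplies the output by eight, while doubling the air pressure only doubles it, which is why wind speed dominates the correction.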

3. Multi-Step Informer Network

Long sequence time-series forecasting has been utilized in a variety of fields, including financial stock market forecasting, traffic flow detection, multi-smart grid management, and disease transmission analysis. However, long input sequences and long output sequences are difficult to handle, which makes long-term forecasts poorer than short-term ones. The growing length of the time series therefore becomes the main obstacle to long sequence time-series forecasting.
The transformer network proposed in 2017 minimizes the propagation path and avoids the recursive structure in the recurrent network [41]. It gains the ability to capture long-time dependencies and receive better results in long sequence time-series forecasts. However, the transformer network speed is slow, the time complexity is large, and the processing time is long. A new method for long sequence time-series forecast is proposed: Informer network [43]. It handles long time input sequences and long time output sequences intensively and realizes highly desirable results.

3.1. Informer Network

The Informer network is composed of an encoder and a decoder. The encoder handles long input sequences and reduces the overall time complexity via ProbSparse self-attention. A distilling operation is added to the self-attention to reduce the time dimension of the long input sequences. The generative decoder addresses the slow forecast speed: its results are generated in one step rather than step by step. The overall structure is shown in Figure 1.
(1) ProbSparse self-attention
The canonical self-attention consists of query vectors, key vectors, and value vectors. The dot product of the query vectors and key vectors is calculated and divided by $\sqrt{d}$, where $d$ denotes the length of the vector $k_i$. A softmax normalization is then applied to obtain a normalized matrix. The normalized matrix is multiplied with the value vectors, so that each dot product determines the weight of the corresponding value vector in the output matrix.
The query vectors ($Q$), key vectors ($K$), and value vectors ($V$) are composed of $q_i$, $k_i$, and $v_i$, respectively; the input sequence $f(x)$ is mapped to $a_i$; and $q_i$, $k_i$, and $v_i$ are equal to $a_i$ multiplied with $W_q$, $W_k$, and $W_v$, respectively. Furthermore, multi-head self-attention is proposed in the transformer network. In the process of multi-head self-attention, $q_i$, $k_i$, and $v_i$ are split into $h$ parts according to the number of heads. Then the combined results $Q_i$, $K_i$, and $V_i$ are obtained for each head. The normalized results are obtained for each head via self-attention as follows:
$$\mathrm{Atten}(Q_i, K_i, V_i) = \mathrm{softmax}\left(\frac{Q_i K_i^T}{\sqrt{d}}\right) V_i$$
The heads are concatenated, and the final output matrix is obtained via fusion.
$$\mathrm{MHSA} = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)$$
The canonical self-attention in the transformer network requires memory and computation that grow quadratically with the sequence length because of the pairwise dot products. ProbSparse self-attention is used in the Informer network to replace the traditional self-attention. A previous study demonstrated that the self-attention probability distribution is sparse and follows a long-tailed distribution [43]. In ProbSparse self-attention, only a few dot products contribute to the main attention, while the others can be ignored. The query vectors contain active queries and lazy queries. The dot products of the active queries are calculated, while the outputs of the lazy queries are replaced by the average of the value vectors, which reduces the computational cost. The new query matrix $\bar{Q}_i$ is composed of the active queries, and the ProbSparse self-attention is as follows:
$$\mathrm{Atten}(Q_i, K_i, V_i) = \mathrm{softmax}\left(\frac{\bar{Q}_i K_i^T}{\sqrt{d}}\right) V_i \tag{10}$$
where $\bar{Q}_i$ contains only the active queries. Each query is scored by a distance, calculated with the KL divergence between its attention probability distribution and the uniform distribution, and the Top-$u$ queries with the largest distance are selected as active. According to the sampling factor $c$, $u = c \ln L_Q$ is defined. Equation (10) reduces the memory to $O(L^2)$ when the dot product is calculated in ProbSparse self-attention. However, the time complexity still corresponds to $O(L^2)$. Because the distance based on the KL divergence is an approximation, only a subset of the dot products needs to be selected for the distance comparison, and the rest are set to zero under the long-tailed distribution. Typically, $L_Q = L_K = L$, and the time complexity of the self-attention becomes $O(L \ln L)$.
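The selection of active and lazy queries described above can be sketched in plain Python. This is a minimal single-head illustration under stated assumptions, not the authors' implementation: the per-query score is the max-minus-mean approximation of the KL-based distance used in the Informer paper, every dot product is computed (no key sampling), and all function names are hypothetical. It therefore shows the selection logic rather than the complexity savings.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_scores(Q, K):
    """Scaled dot products q_i . k_j / sqrt(d) for every query/key pair."""
    d = len(K[0])
    return [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K] for q in Q]


def weighted_values(weights, V):
    """Weighted sum of the value vectors for one query."""
    return [sum(w * v[col] for w, v in zip(weights, V)) for col in range(len(V[0]))]


def canonical_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V, computed row by row."""
    return [weighted_values(softmax(row), V) for row in scaled_scores(Q, K)]


def probsparse_attention(Q, K, V, c=1.0):
    """Attend with the Top-u active queries; lazy queries receive mean(V)."""
    S = scaled_scores(Q, K)
    n_queries = len(Q)
    # u = c * ln(L_Q), rounded up and clamped to [1, L_Q] (rounding is an assumption).
    u = min(n_queries, max(1, math.ceil(c * math.log(n_queries))))
    # Max-minus-mean score: large when a query's attention is far from uniform.
    sparsity = [max(row) - sum(row) / len(row) for row in S]
    ranked = sorted(range(n_queries), key=lambda i: sparsity[i], reverse=True)
    active = sorted(ranked[:u])
    mean_v = [sum(v[col] for v in V) / len(V) for col in range(len(V[0]))]
    out = [list(mean_v) for _ in range(n_queries)]
    for i in active:
        out[i] = weighted_values(softmax(S[i]), V)
    return out, active
```

A query whose scores are all equal produces uniform attention weights, so replacing its output with the mean of the value vectors loses nothing; this is the intuition behind skipping lazy queries.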
(2) Encoder stack
The output of the self-attention in the Informer network contains redundant combinations of value vectors. The long input sequences are handled by the distilling operation, which is inspired by dilated convolution, to reduce the input time dimension. The sequences with dominant features are prioritized, and a self-attention feature map is generated for the next layer. The forward process of distilling from layer $i$ to layer $i + 1$ is as follows:
$$F_{i+1} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left([F_i]_{PSA}\right)\right)\right)$$
where $[F_i]_{PSA}$ denotes the basic operation of the multi-head ProbSparse self-attention. The input sequences are halved in each layer by convolution and maximum pooling such that the feature layer is reduced step by step to realize the reduction in input sequences. The memory usage is reduced to $O((2 - \varepsilon) L \log L)$.
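The halving effect of the distilling step can be seen in a minimal single-channel sketch. The kernel size of 3, the smoothing weights, and the pooling stride of 2 are assumptions chosen for illustration (the text only states that convolution and max pooling halve the sequence), and the function names are hypothetical.

```python
import math


def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output keeps the input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]


def elu(x, alpha=1.0):
    """ELU activation applied element-wise."""
    return [v if v > 0 else alpha * (math.exp(v) - 1.0) for v in x]


def maxpool_stride2(x, size=3):
    """Max pooling with window `size`, stride 2 and padding; halves the length."""
    pad = size // 2
    padded = [-math.inf] * pad + list(x) + [-math.inf] * pad
    return [max(padded[i:i + size]) for i in range(0, len(x), 2)]


def distill(x, kernel=(0.25, 0.5, 0.25)):
    """One distilling step: MaxPool(ELU(Conv1d(x)))."""
    return maxpool_stride2(elu(conv1d_same(x, kernel)))


if __name__ == "__main__":
    sequence = [float(i % 7) for i in range(96)]   # hypothetical 96-step feature
    once = distill(sequence)
    twice = distill(once)
    print(len(sequence), len(once), len(twice))
```

Stacking the step repeatedly shrinks the time dimension geometrically, which is where the memory saving of the encoder comes from.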
The encoder stack includes multiple coding layers and convolutional layers, and the sub-layers connections are sequential. Figure 2 shows the main stack.
The second stack is half of the main stack, and the coding and convolution layers are each reduced by one layer to enhance the robustness of the distilling operation. Furthermore, each stack has the same output dimension. Finally, the results of all stacks are stitched together to obtain the final hidden output of the encoder.
(3) Decoder
A standard decoder structure is used. It consists of two multi-head attention layers, wherein the first layer applies the ProbSparse self-attention and the second layer applies the canonical attention. The input of the decoder consists of the outputs of the encoder and the input sequences after embedded projection. The input sequences are divided into two parts as follows:
$$X_{fdec} = \mathrm{Concat}(X_{token}, X_{phol}) \in \mathbb{R}^{(L_{token} + L_y) \times d_{model}}$$
where $X_{fdec}$ denotes the input sequence to the decoder, $X_{token}$ denotes the start token, and $X_{phol}$ denotes the target placeholder. The placeholder for the predicted sequence is padded with zeros to keep the dimensionality consistent at the time of input. Masked multi-head self-attention is applied to mask future information, so that no position attends to later positions, which avoids auto-regression.
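The construction of the decoder input can be sketched as follows. The sketch assumes that the start token is the most recent slice of the known history and that the placeholder rows are all zeros; the function and parameter names (`build_decoder_input`, `label_len`, `pred_len`) are hypothetical.

```python
def build_decoder_input(history, label_len, pred_len):
    """Concatenate the start token with zero placeholders for the forecast horizon.

    history   : list of feature rows (each of length d_model) known at forecast time
    label_len : number of trailing history rows used as the start token (L_token)
    pred_len  : number of future steps to predict (L_y)
    """
    if not 0 < label_len <= len(history):
        raise ValueError("label_len must be between 1 and len(history)")
    d_model = len(history[0])
    x_token = [list(row) for row in history[-label_len:]]   # start token
    x_phol = [[0.0] * d_model for _ in range(pred_len)]      # target placeholder
    return x_token + x_phol


if __name__ == "__main__":
    # Hypothetical history of 6 time steps with 2 features each.
    hist = [[float(t), float(t) * 0.1] for t in range(6)]
    x_fdec = build_decoder_input(hist, label_len=3, pred_len=4)
    print(len(x_fdec), len(x_fdec[0]))
```

Because the whole horizon is present in the input as placeholders, the decoder can fill every future step in a single forward pass.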

3.2. Multi-Step Informer Network

The Multi-step Informer network (MSIN) is proposed in this study to incorporate meteorological parameters into the forecast of wind power generation and to make the network interpretable. As shown in Figure 3, wind power generation, wind speed, and air pressure are first predicted individually by Informer networks. Then, the forecast results of wind power generation are corrected by the forecast results of wind speed and air pressure through a convolution layer. A multi-step process is built for MSIN so that the network can be interpreted properly. First, the wind speed and air pressure Informers are pre-trained. Then, the pre-trained wind speed and air pressure Informers are applied to the MSIN, and the MSIN is trained with historical wind speed, air pressure, and wind power generation data. The back-propagation through the pre-trained Informers is truncated so that training the MSIN does not alter them, which guarantees the interpretability of the operating results of the network.

4. Case Study

In this section, the performance of the MSIN is examined. DNN, a common method for forecasting wind power generation, is used as a comparison. LSTM, which is often used for long sequence time-series forecasting, is also used as a comparison because this paper addresses long-term wind power generation forecasting. Informer, as the network being improved, is the third comparison. In this study, the Sotavento wind farm is chosen as the validation object. The Sotavento wind farm is located in Galicia, Spain. It consists of 24 wind turbines with an installed capacity of 17.56 MW. Furthermore, the local historical weather of Sotavento can be used for verification. Wind power generation and historical weather data for the Sotavento wind farm are available at http://www.sotaventogalicia.com/en/situation (accessed on 7 December 2021) and http://www.meteogalicia.gal/web/modelos/threddsIndex.action (accessed on 7 December 2021), respectively. Missing data are filled in by linear interpolation so that the time scales of the two data sets correspond. The experiments were conducted using Torch and Python 3 on a system with an Intel(R) Core(TM) i9-11900H CPU, 32 GB of RAM, and an NVIDIA RTX3070 graphics card with 8 GB of video memory.
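The linear interpolation of missing samples mentioned above can be sketched as follows. This is an illustration only: missing samples are assumed to be marked with `None`, gaps at the start or end of the series are filled with the nearest observed value (an assumption, since the text does not say how edge gaps are treated), and the function name is hypothetical.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between the nearest known samples."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    filled = list(values)
    # Edge gaps: hold the nearest observed value (assumption).
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: straight line between the neighbouring observations.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


if __name__ == "__main__":
    # Hypothetical 10-min power readings with two missing samples.
    print(interpolate_missing([1.0, None, None, 4.0]))
```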
One day in each of the four quarters is selected to ensure that the experimental results are representative of the entire year. The four days are 8 January 2021, 8 April 2021, 12 July 2021, and 29 October 2021. Figure 4, Figure 5, Figure 6 and Figure 7 each cover 24 h with a time step of 10 min. Meanwhile, DNN, LSTM, and Informer network are added as comparison algorithms to verify the performance of MSIN.

4.1. Hyperparameter Regulation

The MSIN uses the grid search method for hyper-parameter tuning. For Informer, the number of encoder layers is chosen among {6, 3, 4, 2} and the number of decoder layers is set to 2. The dimension of the multi-head attention output is 512, and the number of heads is chosen among {8, 10, 16}. Informer in MSIN has the same structure, containing a three-layer stack in the decoder and a two-layer stack in the two-layer decoder. The MSIN is optimized with the Adam optimizer, whose learning rate starts from $1 \times 10^{-5}$ and decays by a factor of 10 every two epochs. The total number of epochs is set to 10. In the experiment, the start token of the decoder is truncated from the end of the encoder input sequence, so the length of the encoder input is greater than the length of the decoder start token.
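The search space and the learning-rate schedule described above can be written down in a few lines. This is a sketch under assumptions: the dictionary keys are hypothetical names, the initial learning rate is taken as 1e-5, and the decay is applied as a step function on a 0-based epoch index.

```python
from itertools import product

ENCODER_LAYERS = (6, 3, 4, 2)    # candidate encoder depths from the text
ATTENTION_HEADS = (8, 10, 16)    # candidate head counts from the text


def search_grid():
    """Enumerate every encoder-depth / head-count combination for grid search."""
    return [
        {"encoder_layers": e, "n_heads": h, "decoder_layers": 2, "d_model": 512}
        for e, h in product(ENCODER_LAYERS, ATTENTION_HEADS)
    ]


def learning_rate(epoch, initial=1e-5, factor=0.1, every=2):
    """Step decay: multiply the rate by `factor` after each block of `every` epochs."""
    return initial * factor ** (epoch // every)


if __name__ == "__main__":
    print(len(search_grid()))
    for epoch in range(10):          # 10 epochs in total, as in the text
        print(epoch, learning_rate(epoch))
```

With these settings the grid contains twelve configurations, and the learning rate falls by four orders of magnitude over the ten epochs.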

4.2. Analysis of Forecast Results

Figure 4, Figure 5, Figure 6 and Figure 7 show that the forecast values always fluctuated around the actual values, and the accuracy is high at the beginning of the phase in forecasting and decreases at the end. However, DNN does not have good forecast accuracy at the beginning of the phase in forecasting. It happens to achieve an equal value in a few cases. Among the compared algorithms, DNN exhibited the weakest ability to track the trend of power generation because DNN has no memory unit. Figure 5 shows that the error of the DNN can be up to 1000 KWh in wind power generation forecast, and these large errors may incorrectly tell the energy supplier to use other alternative energy sources instead of wind power. In addition, it may increase the cost of suppliers and customers, and the abandonment of wind power is serious. The LSTM performs well for short-term forecast of wind power generation, and Figure 5 and Figure 6 show high training ability at the beginning. However, after 720 min, the LSTM accuracy decreases significantly and is accompanied by the forecast delay characteristic, and this pattern is most evident in Figure 7. The LSTM can only perform sequential learning that leads to a gradual accumulation of errors. The forecast values of LSTM are desirable from 0 to 720 min, however, the forecast values are generally not used. after 1100 min. The ability of tracking trends in wind power generation is strong in the Informer network, and the Informer network maintains high forecast accuracy in long sequence time-series forecast from Figure 4 and Figure 5. Informer reduces the forecast errors to 300 KWh, it has a high forecast capability of approximately 0 to 1200 min. The MSIN further improves the forecast accuracy of the Informer network. Additionally, The MSIN further strengthens the Informer’s ability to accurately track trends with a real database. 
Figure 5 and Figure 6 show that the MSIN forecast stays close to the actual value, with a maximum error of 100 kWh after 1200 min. Compared with the forecast curve of the DNN, the forecast curve of the MSIN meets the requirement of high forecast accuracy.
Root mean square error (RMSE), mean absolute error (MAE) and mean square error (MSE) are selected as the evaluation criteria to quantify the performance of MSIN. They are calculated as follows:
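$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$
where $\hat{y}_i - y_i$ denotes the difference between the forecast value and the true value, and $n$ is the total number of forecast values.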
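As a minimal sketch, the three criteria can be computed from paired forecast and true values as shown here. The function names and the sample numbers are illustrative only and are not taken from the paper's data.

```python
import math


def _errors(forecast, actual):
    """Return the per-sample differences (forecast - actual)."""
    if len(forecast) != len(actual) or len(forecast) == 0:
        raise ValueError("forecast and actual must be non-empty and of equal length")
    return [f - a for f, a in zip(forecast, actual)]


def mse(forecast, actual):
    """Mean square error: average of the squared differences."""
    errs = _errors(forecast, actual)
    return sum(e * e for e in errs) / len(errs)


def rmse(forecast, actual):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(forecast, actual))


def mae(forecast, actual):
    """Mean absolute error: average of the absolute differences."""
    errs = _errors(forecast, actual)
    return sum(abs(e) for e in errs) / len(errs)


if __name__ == "__main__":
    # Illustrative values only.
    forecast = [2.0, 4.0, 6.0]
    actual = [1.0, 4.0, 8.0]
    print(mse(forecast, actual), rmse(forecast, actual), mae(forecast, actual))
```

Because RMSE and MSE square the differences, they penalize occasional large errors more heavily than MAE does, which is one reason to report all three.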
The calculation results are shown in Table 2, Table 3 and Table 4.
The calculation results of Table 2 can be visually represented by Figure 8.
From the RMSE results (Figure 8), it can be observed that the Multi-step Informer network improves forecast accuracy by 29% compared to the Informer network. The results in Table 3 and Table 4 are consistent with the expected prediction accuracy of each network. The DNN performs the worst because it suffers from exploding and vanishing gradients as the number of layers increases, which seriously affects the forecast accuracy. The LSTM performs better than the DNN on average because it mitigates exploding and vanishing gradients, but it can only learn sequentially, so errors tend to accumulate. The Informer network learns in parallel, is not affected by the accumulation of previous forecast errors, and improves the forecast accuracy considerably. The MSIN introduces the dynamic pressure model on top of the Informer network and corrects the forecast error using wind speed and air pressure; it achieves the lowest average error in Table 2, Table 3 and Table 4 and exhibits higher forecast accuracy than the other networks.
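The 29% figure presumably follows from the average RMSE values reported in Table 2. The sketch below reproduces that arithmetic as a relative error reduction; the function name is hypothetical, and the MAE and MSE averages from Table 3 and Table 4 are included only for comparison.

```python
# Average-row values as reported in Tables 2-4 (Informer vs. MSIN).
AVERAGES = {
    "RMSE": {"Informer": 436.576, "MSIN": 307.553},
    "MAE": {"Informer": 61.600, "MSIN": 41.701},
    "MSE": {"Informer": 212698.953, "MSIN": 110374.238},
}


def relative_improvement(baseline, improved):
    """Fraction by which the error drops relative to the baseline error."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    for metric, row in AVERAGES.items():
        gain = relative_improvement(row["Informer"], row["MSIN"])
        print(f"{metric}: {gain:.1%}")
```

For RMSE this gives (436.576 - 307.553) / 436.576, a reduction of between 29% and 30%, in line with the figure quoted in the text.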
Table 2, Table 3 and Table 4 also show the RMSE, MAE and MSE of MSIN for the four seasons. According to these tables and Figure 8, the forecast is worst in winter and best in summer. Similarly, the summer forecast curve (Figure 6) has the best fit and the lowest error, whereas the winter curve (Figure 4) has the worst fit and the highest error. In winter, under the influence of cold air, the temperature difference between morning and evening produces a large temperature gradient. Meanwhile, the contraction of cooling air leads to large changes in air pressure, so the correction by the dynamic pressure model is less effective. In summer, the temperature difference is small and the air conditions are relatively stable, so the dynamic pressure model corrects better and the forecast errors are small. In general, the forecast results of MSIN are realistic and highly accurate.
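The seasonal ranking can be checked directly against the MSIN column of Table 2, Table 3 and Table 4. The short sketch below uses the seasonal values as reported there; the helper name is hypothetical.

```python
# MSIN column of Tables 2-4 as reported in the paper (seasonal rows only).
MSIN_ERRORS = {
    "RMSE": {"Winter": 459.397, "Spring": 288.022, "Summer": 182.564, "Autumn": 301.674},
    "MAE": {"Winter": 86.125, "Spring": 53.361, "Summer": 16.869, "Autumn": 30.43},
    "MSE": {"Winter": 211170.611, "Spring": 35347.708, "Summer": 33485.438, "Autumn": 161493.194},
}


def best_and_worst(errors_by_season):
    """Return the seasons with the lowest and the highest error."""
    best = min(errors_by_season, key=errors_by_season.get)
    worst = max(errors_by_season, key=errors_by_season.get)
    return best, worst


if __name__ == "__main__":
    for metric, by_season in MSIN_ERRORS.items():
        print(metric, best_and_worst(by_season))
```

All three metrics rank summer as the best season and winter as the worst for MSIN.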

5. Discussion

As mentioned above, with the wide application of machine learning, research on its interpretability is gradually deepening. The inner workings of machine learning models still cannot be fully explained. Therefore, the underlying purpose of this paper is to pursue interpretable machine learning. With MSIN, wind power generation forecasting is no longer an uncontrolled process, and the results no longer rely solely on historical wind power generation data.
The multi-step design of the MSIN proposed in this study requires the Informer network to be pre-trained on wind speed and air pressure. This is done mainly to obtain an interpretable network. Wind power generation forecasting is an engineering application, so the process by which networks produce the forecast should be understood, or at least partially understood; otherwise it raises safety and ethical concerns [47]. The non-pre-trained MSIN is also tested in this study to confirm that it is able to predict. However, its hyperparameters must be adjusted continuously to obtain an accurate forecast. As training continued, intermediate data showed that the Informer network no longer forecast wind speed and air pressure in the predetermined manner. Such false intermediate results raise concerns about interference attacks.

6. Conclusions

In this paper, the MSIN is proposed to forecast wind power generation for the Sotavento wind farm in Galicia, Spain. It takes advantage of Informer to make the forecast results better. The advantages of MSIN obtained via theoretical analysis and arithmetic verification are as follows:
1. The Multi-step Informer network improves forecast accuracy by 29% compared to the Informer network and forecasts wind power generation with high accuracy.
2. A dynamic pressure model is introduced in MSIN to correct the wind power generation forecast, so that the forecast incorporates highly correlated physical characteristics.
3. The multi-step process built into MSIN makes its process interpretable and benefits the risk resistance and security of the network.
The paper compares recent learning methods (DNN, LSTM, Informer) with MSIN and demonstrates the effectiveness of MSIN. The comparison shows that MSIN differs significantly from Informer, which supports the novelty of the model.
However, wind power generation forecast is still a challenge, and how to reduce the training time with high prediction accuracy is the focus of further research. The limitations and future research of this paper are reflected in the following three areas:
1. The model proposed in this paper considers only the effects of wind speed and air pressure on the wind power generation forecast. Surface temperature, relative humidity and other meteorological factors cannot be ignored, and the coupling between multiple wind turbines also needs to be investigated.
2. The interpretability proposed in this paper is reflected only in the multi-step process of MSIN and in explaining why the prediction accuracy of MSIN improved. An in-depth study of the internal structure of interpretable machine learning is still lacking.
3. Methods for wind power generation forecasting still need innovation, and the accuracy of the results can be improved further. Considering the non-trivial correlations of meteorological variables, rather than relying on a single historical data series, is also a way forward.

Author Contributions

Conceptualization, A.J. and X.H.; methodology, X.H.; software, X.H.; validation, A.J., X.H.; formal analysis, X.H.; investigation, X.H.; resources, X.H.; data curation, X.H.; writing—original draft preparation, X.H.; writing—review and editing, X.H.; visualization, X.H.; supervision, A.J.; project administration, A.J.; funding acquisition, A.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
Atten: Attention
BidLSTM: Bidirectional Long Short-Term Memory
ConvLSTM: Convolution Long Short-Term Memory
LSTM: Long Short-Term Memory
MSIN: Multi-step Informer network
MAE: Mean absolute error
MSE: Mean square error
MHSA: Multi-head self-attention
RMSE: Root mean square error
$a_i$: map of input sequence
$C_{P,\max}$: Betz power coefficient
$c$: sampling factor
$d$: the length of $k_i$
$[F_i]_{PSA}$: the basic operation of the multi-head ProbSparse self-attention
$G$: atmospheric gas constant
$g$: gravitational acceleration
$h$: the head number of multi-head attention
$k_i$: a single key vector
$K$: key vectors
$L_Q$: length of query vectors
$L_K$: length of key vectors
$P$: pressure
$P_W$: wind power
$\rho$: air density
$Q$: query vectors
$\overline{Q}$: a sparse matrix of the same size as $q_i$
$q_i$: a single query vector
$r$: standard state
$S$: rotor area
$T_W$: temperature
$V_s$: wind speed
$V$: value vectors
$v_i$: a single value vector
$W_P$: wind pressure
$W_q$: matrix of $q$
$W_k$: matrix of $k$
$W_v$: matrix of $v$
$X_{\text{token}}$: start flag
$X_{\text{placeholder}}$: target placeholder

References

  1. Shahid, F.; Zameer, A.; Mehmood, A.; Raja, M.A.Z. A novel wavenets long short term memory paradigm for wind power prediction. Appl. Energy 2020, 269, 115098.
  2. Gazafroudi, A.S. Assessing the Impact of Load and Renewable Energies’ Uncertainty on a Hybrid System. Int. J. Energy Power Eng. 2016, 5, 1.
  3. Jiang, P.; Wang, Y.; Wang, J. Short-term wind speed forecasting using a hybrid model. Energy 2017, 119, 561–577.
  4. Karakuş, O.; Kuruoğlu, E.E.; Altınkaya, M.A. One-day ahead wind speed/power prediction based on polynomial autoregressive model. IET Renew. Power Gener. Res. 2017, 11, 1430–1439.
  5. Barbounis, T.G.; Theocharis, J.B.; Alexiadis, M.C.; Dokopoulos, P.S. Long-term wind speed and power forecasting using local recurrent neural network models. IEEE Trans. Energy Convers. 2006, 21, 273–284.
  6. Hu, J.; Heng, J.; Wen, J.; Zhao, W. Deterministic and probabilistic wind speed forecasting with de-noising-reconstruction strategy and quantile regression based algorithm. Renew. Energy 2020, 162, 1208–1226.
  7. Han, Q.; Meng, F.; Hu, T.; Chu, F. Non-parametric hybrid models for wind speed forecasting. Energy Convers. Manag. 2017, 148, 554–568.
  8. Zhang, F.; Li, P.C.; Gao, L.; Liu, Y.Q.; Ren, X.Y. Application of autoregressive dynamic adaptive (ARDA) model in real-time wind power forecasting. Renew. Energy 2021, 169, 129–143.
  9. Zheng, Z.; Chen, H.; Luo, X. A Kalman filter-based bottom-up approach for household short-term load forecast. Appl. Energy 2019, 250, 882–894.
  10. Rayi, V.K.; Mishra, S.P.; Naik, J.; Dash, P.K. Adaptive VMD based optimized deep learning mixed kernel ELM autoencoder for single and multistep wind power forecasting. Energy 2022, 224, 122585.
  11. Sharifian, A.; Ghadi, M.J.; Ghavidel, S.; Li, L.; Zhang, J. A new method based on Type-2 fuzzy neural network for accurate wind power forecasting under uncertain data. Renew. Energy 2018, 120, 220–230.
  12. Li, L.; Yin, X.L.; Jia, X.C.; Sobhani, B. Day ahead powerful probabilistic wind power forecast using combined intelligent structure and fuzzy clustering algorithm. Energy 2020, 192, 116498.
  13. Liu, M.; Cao, Z.; Zhang, J.; Wang, L.; Huang, C.; Luo, X. Short-term wind speed forecasting based on the Jaya-SVM model. Int. J. Electr. Power Energy Syst. 2020, 121, 106056.
  14. Wang, D.; Luo, H.; Grunder, O.; Lin, Y. Multi-step ahead wind speed forecasting using an improved wavelet neural network combining variational mode decomposition and phase space reconstruction. Renew. Energy 2017, 113, 1345–1358.
  15. Wang, S.; Zhang, N.; Wu, L.; Wang, Y. Wind speed forecasting based on the hybrid ensemble empirical mode decomposition and GA-BP neural network method. Renew. Energy 2016, 94, 629–636.
  16. Zhang, C.; Wei, H.; Zhao, J.; Liu, T.; Zhu, T.; Zhang, K. Short-term wind speed forecasting using empirical mode decomposition and feature selection. Renew. Energy 2016, 99, 727–737.
  17. Du, P.; Wang, J.; Guo, Z.; Yang, W. Research and application of a novel hybrid forecasting system based on multi-objective optimization for wind speed forecasting. Energy Convers. Manag. 2017, 27, 90–107.
  18. Deo, R.C.; Ghorbani, M.A.; Samadianfard, S.; Maraseni, T.; Bilgili, M.; Biazar, M. Multi-layer perceptron hybrid model integrated with the firefly optimizer algorithm for windspeed prediction of target site using a limited set of neighboring reference station data. Renew. Energy 2018, 116, 309–323.
  19. Praveena, R.; Dhanalakshmi, K. Wind Power Forecasting in Short-Term using Fuzzy K-Means Clustering and Neural Network. In Proceedings of the I2C2SW IEEE International Conference on Intelligent Computing and Communication for Smart World, Erode, India, 14–15 December 2018; pp. 336–339.
  20. Famoso, F.; Brusca, S.; Galvagno, A.; Messina, M.; Lanzafame, R. On the wake effect in wind farm power forecasting: A new data-driven approach. E3S Web Conf. 2020, 197, 08016.
  21. Qian, Z.; Pei, Y.; Zareipour, H.; Chen, N. A review and discussion of decomposition-based hybrid models for wind energy forecasting applications. Appl. Energy 2018, 235, 939–953.
  22. Zhang, Y.; Li, R.; Zhang, J. Optimization scheme of wind energy prediction based on artificial intelligence. Environ. Sci. Pollut. Res. 2021, 28, 39966–39981.
  23. Wang, Y.; Zou, R.; Liu, F.; Zhang, L.; Liu, Q. A review of wind speed and wind power forecasting with deep neural networks. Appl. Energy 2021, 304, 117766.
  24. Wang, H.Z.; Wang, G.B.; Li, G.Q.; Peng, J.C.; Liu, Y.T. Deep belief network based deterministic and probabilistic wind speed forecasting approach. Appl. Energy 2016, 182, 80–93.
  25. Yildiz, C.; Acikgoz, H.; Korkmaz, D.; Budak, U. An improved residual-based convolutional neural network for very short-term wind power forecasting. Energy Convers. Manag. 2021, 228, 113731.
  26. Lin, Z.; Liu, X. Wind power forecasting of an offshore wind turbine based on high-frequency SCADA data and deep learning neural network. Energy 2020, 201, 117693.
  27. Shahid, F.; Zameer, A.; Muneeb, M. A novel genetic LSTM model for wind power forecast. Energy 2021, 223, 120069.
  28. Pourdaryaei, A.; Mokhlis, H.; Illias, H.A.; Kaboli, S.H.A.; Ahmad, S. Short-Term Electricity Price Forecasting via Hybrid Backtracking Search Algorithm and ANFIS Approach. IEEE Access 2019, 7, 77674–77691.
  29. An, G.; Jiang, Z.; Cao, X.; Liang, Y.; Zhao, Y.; Li, Z. Short-Term Wind Power Prediction Based on Particle Swarm Optimization-Extreme Learning Machine Model Combined with Adaboost Algorithm. IEEE Access 2021, 9, 94040–94052.
  30. Banik, A.; Behera, C.; Sarathkumar, T.V.; Goswami, A.K. Uncertain wind power forecasting using LSTM-based prediction interval. IET Renew. Power Gener. 2020, 14, 2657–2667.
  31. Shabbir, N.; Kütt, L.; Jawad, M.; Husev, O.; Ur Rehman, A.; Gardezi, A.A.; Shafiq, M.; Choi, J.-G. Short-Term Wind Energy Forecasting Using Deep Learning-Based Predictive Analytics. Comput. Mater. Contin. 2022, 72, 1017–1033.
  32. Wu, Q.; Guan, F.; Lv, C.; Huang, Y. Ultra-short-term multi-step wind power forecasting based on CNN-LSTM. IET Renew. Power Gener. 2021, 15, 1019–1029.
  33. Sun, Z.; Zhao, M. Short-Term Wind Power Forecasting Based on VMD Decomposition, ConvLSTM Networks and Error Analysis. IEEE Access 2020, 8, 134422–134434.
  34. Zhen, H.; Niu, D.; Yu, M.; Wang, K.; Liang, Y.; Xu, X. A hybrid deep learning model and comparison for wind power forecasting considering temporal-spatial feature extraction. Sustainability 2020, 12, 9490.
  35. Agga, A.; Abbou, A.; Labbadi, M.; El Houm, Y. Short-term self consumption PV plant power production forecasts based on hybrid CNN-LSTM, ConvLSTM models. Renew. Energy 2021, 177, 101–112.
  36. Chen, G.; Li, L.; Zhang, Z.; Li, S. Short-term wind speed forecasting with principle-subordinate predictor based on Conv-LSTM and improved BPNN. IEEE Access 2020, 8, 67955–67973.
  37. Memarzadeh, G.; Keynia, F. A new short-term wind speed forecasting method based on fine-tuned LSTM neural network and optimal input sets. Energy Convers. Manag. 2020, 213, 112824.
  38. Liu, H.; Mi, X.; Li, Y. Smart multi-step deep learning model for wind speed forecasting based on variational mode decomposition, singular spectrum analysis, LSTM network and ELM. Energy Convers. Manag. 2018, 159, 54–64.
  39. Furfari(tony), F.A. Attention Is All You Need. IEEE Ind. Appl. Manag. 2002, 8, 8–15.
  40. Fu, X.; Gao, F.; Wu, J.; Wei, X.; Duan, F. Spatiotemporal attention networks for wind power forecasting. In 2019 International Conference on Data Mining Workshops (ICDMW), Beijing, China, 8–11 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 149–154.
  41. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021), Virtually, 2–9 February 2020; Volume 12B, pp. 11106–11115.
  42. Naser, M.Z. An engineer’s guide to eXplainable Artificial Intelligence and Interpretable Machine Learning: Navigating causality, forced goodness, and the false perception of inference. Autom. Constr. 2021, 129, 103821.
  43. Rudin, C.; Chen, C.; Chen, Z.; Huang, H.; Semenova, L.; Zhong, C. Interpretable machine learning: Fundamental principles and 10 grand challenges. Stat. Surv. 2022, 16, 1–85.
  44. Jung, C.; Schindler, D.; Laible, J. National and global wind resource assessment under six wind turbine installation scenarios. Energy Convers. Manag. 2018, 156, 403–415.
  45. Zhang, H.; Yu, Y.J.; Liu, Z.Y. Study on the Maximum Entropy Principle applied to the annual wind speed probability distribution: A case study for observations of intertidal zone anemometer towers of Rudong in East China Sea. Appl. Energy 2013, 114, 931–938.
  46. Shu, Z.R.; Li, Q.S.; Chan, P.W. Investigation of offshore wind energy potential in Hong Kong based on Weibull distribution function. Appl. Energy 2015, 156, 362–373.
  47. Gilpin, L.H.; Bau, D.; Yuan, B.Z.; Bajwa, A.; Specter, M.; Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, 1–3 October 2018; pp. 80–89.
Figure 1. Informer model.
Figure 2. The main stack encoder.
Figure 3. Multi-Step Informer Network.
Figure 4. Wind power generation forecast on 8 January 2021.
Figure 5. Wind power generation forecast on 8 April 2021.
Figure 6. Wind power generation forecast on 12 July 2021.
Figure 7. Wind power generation forecast on 29 October 2021.
Figure 8. The forecast results of different networks.
Table 2. Comparison of RMSE results for different networks.
| Season | DNN | LSTM | Informer | MSIN |
| --- | --- | --- | --- | --- |
| Winter | 836.067 | 1043.567 | 602.245 | 459.397 |
| Spring | 936.623 | 478.078 | 288.463 | 288.022 |
| Summer | 667.646 | 482.656 | 291.008 | 182.564 |
| Autumn | 1041.964 | 908.245 | 565.065 | 301.674 |
| Average | 870.645 | 727.754 | 436.576 | 307.553 |
Table 3. Comparison of MAE results for different networks.
| Season | DNN | LSTM | Informer | MSIN |
| --- | --- | --- | --- | --- |
| Winter | 215.854 | 93.326 | 77.979 | 86.125 |
| Spring | 106.542 | 162.979 | 78.472 | 53.361 |
| Summer | 145.292 | 106.5 | 58.9375 | 16.869 |
| Autumn | 315.569 | 185.271 | 31.01 | 30.43 |
| Average | 195.814 | 137.019 | 61.600 | 41.701 |
Table 4. Comparison of MSE results for different networks.
| Season | DNN | LSTM | Informer | MSIN |
| --- | --- | --- | --- | --- |
| Winter | 699,736.410 | 1,087,969.965 | 362,914.479 | 211,170.611 |
| Spring | 877,188.375 | 228,561.965 | 83,442.903 | 35,347.708 |
| Summer | 481,942 | 232,731.292 | 85,198.674 | 33,485.438 |
| Autumn | 1,085,352.875 | 826,097.854 | 319,239.757 | 161,493.194 |
| Average | 786,054.915 | 593,840.269 | 212,698.953 | 110,374.238 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Huang, X.; Jiang, A. Wind Power Generation Forecast Based on Multi-Step Informer Network. Energies 2022, 15, 6642. https://doi.org/10.3390/en15186642

AMA Style

Huang X, Jiang A. Wind Power Generation Forecast Based on Multi-Step Informer Network. Energies. 2022; 15(18):6642. https://doi.org/10.3390/en15186642

Chicago/Turabian Style

Huang, Xiaohan, and Aihua Jiang. 2022. "Wind Power Generation Forecast Based on Multi-Step Informer Network" Energies 15, no. 18: 6642. https://doi.org/10.3390/en15186642
