1. Introduction
The sharp decline in fossil energy reserves has triggered a new energy revolution around the world, and there is now broad consensus that renewable energy must be adopted quickly. Among renewable sources for electricity generation, wind power is deployed on a large scale because it is pollution-free and low-carbon. Global installed wind power capacity reached 837 GW by the end of 2021, and wind power is transforming from a standby power source into a main power source [1]. However, wind power is intermittent, and its volatility and intermittency adversely affect the safe and stable operation of power systems. Accurate wind power prediction is the basis on which power systems formulate a reasonable dispatch plan; it improves the safety and economy of power system operation and promotes the consumption of wind power [2].
In terms of spatial scale, existing research at the level of individual wind farms is extensive, with the output power of a single wind farm used as the prediction target. From the perspective of grid dispatch, however, power prediction for individual wind farms alone is not sufficient. On the one hand, when arranging the operation mode and spinning reserve, dispatchers pay more attention to the amount of uncertain power in the entire power system than to single wind farms. On the other hand, the rising penetration of wind power makes real-time dispatch control and tie-line power exchange more difficult. Power prediction for wind power clusters is therefore more conducive to formulating a dispatch plan [3].
The methods for power prediction of wind farm clusters [4] mainly include superposition [5], spatial resource matching, and statistical scaling [6]. The superposition method directly sums the power prediction results of all wind farms involved. It is simple and economical and is typically applied to small, sparsely distributed wind farms, but its prediction accuracy is low [7]. The spatial resource matching method extrapolates from a large amount of similar historical data [8]. It first divides the wind farm cluster into several sub-regions according to the correlation of wind resources and the grid topology [9], and then, for every sub-region, analyzes the similarity between numerical weather prediction data [10] and historical meteorological data [11]. The historical wind speed dataset most similar to the predicted wind speed is selected as the analysis object, and a moving average model is established [12]. Wind power is then predicted from historical data using kernel functions [13], regression, and related methods [14], and the total predicted power is obtained by summing the predicted values of all sub-regions. The advantage of the spatial resource matching method is that it does not need to select a reference wind farm [6], because it considers only the correlation of wind speed. However, the correlation of wind direction and temperature among wind farms is neglected. In addition, the method depends heavily on historical data, so the predicted result may contain errors [15]. To further improve prediction accuracy, a statistical scaling method based on statistical techniques and spatiotemporal correlation has been proposed for regional wind power forecasting [16]. In this approach, the entire wind farm cluster is divided into several sub-regions. Wind farms with strong correlation and high power prediction accuracy are selected as representatives of each sub-region, and the weighting coefficient of each sub-region is calculated using statistical methods. The power of the total region is then predicted by superposition [17]. However, this method does not analyze or exploit the factors that relate wind power across sub-regions, and the weighting coefficients struggle to objectively reflect changes in operating conditions, so its prediction accuracy is limited.
In addition, Liu et al. [18] propose a regional wind power forecasting method based on adaptive subareas and long-term and short-term matching. The power of the total region in every time frame is obtained by adding the predicted power of every subregion. The prediction accuracy of this method depends on the quality of historical data because the correlation among the wind farms of a cluster is not considered. Lobo et al. [19] propose a single-feature relationship matching method to construct the total power. Qu et al. [20] propose extracting the correlations of different features from the data of several wind power clusters, expressing and standardizing them with a vine genealogy, and combining this with a segmented cloud to build the target model. The robustness of the algorithm compensates for its complex implementation process. Damousis et al. [21] propose a genetic fuzzy model that uses a set of weather stations surrounding a cluster of wind farms, with a radius exceeding 15 km, to predict the change in wind speed over different future time windows. Jursa et al. [22] propose a technique that automatically selects the input parameters and internal model parameters using particle swarm optimization; the technique is assessed with data from 30 wind farms in an extended area. Kumari et al. [23] propose a hybrid deep learning approach to predict solar irradiance, in which a spatial feature matrix is generated using a single correlation analysis method, long short-term memory (LSTM) neural networks extract features, and convolutional neural networks (CNNs) predict the solar irradiance.
Previous studies have examined cluster prediction extensively, especially the improvement of prediction models and algorithms. However, the following research questions remain open:
The current prediction methods for wind power clusters are mostly optimized for small and medium-sized clusters, while prediction methods for large-scale wind power clusters have not been fully developed yet. In fact, the power prediction results of large-scale wind farms are of greater significance for dispatch in regional grids.
Unlike power prediction for a single wind farm, the input to wind power cluster prediction is the set of data provided by the individual wind farms. The divergence among these data makes it difficult to express the power features of the wind power cluster directly, so the patterns that characterize the cluster output must be further extracted from the data sets of the single wind farms.
Modeling efficiency decreases as the number of wind farms in a large-scale cluster increases. How to improve the efficiency of these models is therefore an urgent problem.
Based on the above analysis, this paper proposes an ultra-short-term prediction method for wind power of massive wind power clusters based on feature mining of spatiotemporal correlations. The main contributions of this method are as follows:
Statistical features are constructed to describe the fluctuation characteristics of wind farms, and a fuzzy clustering algorithm divides the 159 wind farms in the region into several wind power clusters, so that models are built per cluster and the number of models is reduced.
The data of each wind power cluster are reduced in dimensionality using the kernel principal component analysis algorithm and combined with an average sequence that reflects the meteorological attributes of the cluster to form a spatiotemporal feature representation matrix, which accelerates model convergence in the training stage.
Under the Seq2Seq framework, a spatiotemporal attention neural network is constructed in which key features dynamically receive higher attention during model training, which improves the training accuracy.
The remainder of this paper is organized as follows:
Section 2 introduces the ultra-short-term forecasting method for large-scale wind power clusters,
Section 3 describes our experimental analysis, and
Section 4 presents the conclusions of this paper.
3. Experimental Analysis
An experimental analysis was carried out on a large-scale wind power cluster of 159 wind farms in Inner Mongolia. The data comprise measured power and numerical weather predictions from 1 March 2019 to July 2019, with a resolution of 15 min. The data do not include periods of curtailment, when wind power output is reduced. Power samples below zero are set to zero.
In order to achieve uniform dimensions and improve the calculation efficiency, the maximum–minimum normalization method is used to normalize the input and output data. Taking a certain feature $x$ as an example, the principle of normalization is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{16}$$
In Equation (16), $x'$ is the feature vector after normalization. $x_{\min}$ and $x_{\max}$ represent the minimum and maximum of $x$, respectively.
In the prediction stage, the predicted results are restored to the original power interval according to the denormalization formula shown in Equation (17):
$$x = x'\,(x_{\max} - x_{\min}) + x_{\min} \tag{17}$$
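The preprocessing described above (zeroing negative power samples, min-max normalization, and denormalization of the predictions) can be sketched in a few lines. This is a minimal illustration using NumPy, not the authors' code; the function names and the sample values are hypothetical.

```python
import numpy as np

def clip_negative_power(power):
    """Set negative power samples to zero."""
    return np.maximum(np.asarray(power, dtype=float), 0.0)

def minmax_normalize(x):
    """Scale x to [0, 1]; also return (x_min, x_max) for later denormalization."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:  # constant feature: avoid division by zero
        return np.zeros_like(x), (x_min, x_max)
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

def minmax_denormalize(x_scaled, bounds):
    """Map normalized values back to the original interval."""
    x_min, x_max = bounds
    return np.asarray(x_scaled, dtype=float) * (x_max - x_min) + x_min

# Hypothetical measured power samples in MW (not data from the paper).
power_mw = clip_negative_power([-0.4, 12.0, 30.5, 48.0])
scaled, bounds = minmax_normalize(power_mw)
restored = minmax_denormalize(scaled, bounds)
```

In practice the bounds would be computed on the training set only and reused for the test set, so that no information from the test period leaks into the scaling.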
The fluctuation characteristics are taken as input, and the fuzzy clustering algorithm is used as the classifier to divide the wind farms into wind power clusters. The results of the cluster division are shown in Table 1.
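The text does not specify which fuzzy clustering algorithm is used in this excerpt. As an illustration only, the sketch below assumes the widely used fuzzy c-means algorithm with fuzzifier m = 2; the two fluctuation features and all sample values are hypothetical.

```python
import numpy as np

def fuzzy_c_means(features, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means: returns cluster centers and an (n_samples, n_clusters) membership matrix."""
    rng = np.random.default_rng(seed)
    x = np.asarray(features, dtype=float)
    # Random initial memberships; each row sums to 1.
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        # Centers are membership-weighted means of the samples.
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against zero distance
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u

# Hypothetical fluctuation features for 10 wind farms (two features per farm).
rng = np.random.default_rng(1)
farm_features = np.vstack([
    rng.normal(0.2, 0.02, size=(5, 2)),
    rng.normal(0.8, 0.02, size=(5, 2)),
])
centers, membership = fuzzy_c_means(farm_features, n_clusters=2)
labels = membership.argmax(axis=1)  # hard assignment: cluster with highest membership
```

Unlike hard clustering, each farm receives a degree of membership in every cluster; taking the largest membership gives the final cluster division.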
Taking cluster No. 1 as an example, principal component analysis was used to reduce the dimensionality of three features (wind speed, temperature, and humidity) of the 32 wind farms in the cluster. The number of principal components was set to 10. The contribution rate of each principal component of each feature is shown in Figure 4. The contribution rates of the first and second principal components of wind speed were relatively high and their sum exceeded 70%, so the first and second principal components of wind speed were taken as modeling inputs. The contribution rate of the first principal component of temperature exceeded 90%, so it was taken as a modeling input. The contribution rate of the first principal component of humidity exceeded 70%, so it was also taken as a modeling input. Principal component analysis thus reduces the number of spatial features for wind speed, temperature, and humidity from 96 (3 features × 32 wind farms) to 4 (two for wind speed and one each for temperature and humidity), which greatly reduces the data dimension, retains the consistency of the spatial features, and promotes the convergence of the algorithm.
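The selection of principal components by contribution rate can be illustrated as follows. This is a minimal sketch using ordinary linear PCA computed via singular value decomposition, which is an assumption; it does not reproduce the kernel variant mentioned in the introduction. The synthetic wind speed matrix and the helper names are hypothetical, while the 70% threshold mirrors the wind speed criterion in the text.

```python
import numpy as np

def pca_contribution(data):
    """data: (n_samples, n_farms). Returns contribution rates and component scores."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)              # center each farm's series
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    variances = s ** 2                  # proportional to the eigenvalues of the covariance
    ratios = variances / variances.sum()
    scores = x @ vt.T                   # projections onto the principal axes
    return ratios, scores

def n_components_for(ratios, threshold):
    """Smallest number of leading components whose cumulative rate reaches the threshold."""
    k = int(np.searchsorted(np.cumsum(ratios), threshold)) + 1
    return min(k, len(ratios))

# Synthetic wind speed for 32 farms driven by one shared regional signal plus noise.
rng = np.random.default_rng(0)
regional = rng.normal(size=(500, 1))
wind_speed = regional @ np.ones((1, 32)) + 0.3 * rng.normal(size=(500, 32))

ratios, scores = pca_contribution(wind_speed)
k = n_components_for(ratios, 0.70)
wind_speed_inputs = scores[:, :k]       # reduced spatial features used as model inputs
```

Because neighbouring farms share most of their weather, a few components usually carry most of the variance, which is why 32 series per feature can be compressed to one or two.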
Table 2 shows the training parameters of the data-driven model. To reduce the risk of overfitting, the dropout rate was set to 0.2.
The attention weights on the training set of the No. 1 wind power cluster are shown in Figure 5. For the 16-step power extrapolation model, the spatial attention weights change little across prediction steps. Within any given prediction step, however, historical power has the highest attention weight, which shows that the autocorrelation of the power series plays a leading role in ultra-short-term power prediction modeling. The first principal component of wind speed and the spatial average wind speed rank next, so the wind speed provided by numerical weather prediction also contributes to the model. Strictly speaking, there is a causal relationship between wind speed and wind power, so the attention weight of wind speed should be high. However, this causal relationship is diluted in the attention weights because of the error of numerical weather prediction.
Compared with wind speed and historical power, temperature, humidity, and pressure contribute little to the model.
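To make the notion of per-feature attention weights concrete, the sketch below computes softmax-normalized weights over input features with a generic additive attention score. It is not the STAN architecture of this paper, whose details are given in Section 2; the feature list, the dimensions, and the randomly initialized parameters are all hypothetical.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

def feature_attention(window, hidden, w_h, u_x, v):
    """Additive attention over input features.

    window: (T, n_features) input series, hidden: (d_h,) encoder state,
    w_h: (d_a, d_h), u_x: (d_a, T), v: (d_a,). Returns (n_features,) weights.
    """
    projected = u_x @ window                    # (d_a, n_features), one column per feature
    state = (w_h @ hidden)[:, None]             # (d_a, 1), shared across features
    scores = v @ np.tanh(projected + state)     # one score per feature
    return softmax(scores)

# Hypothetical input features for one cluster.
feature_names = [
    "historical power", "wind speed PC1", "wind speed PC2",
    "spatial mean wind speed", "temperature PC1", "humidity PC1",
]
rng = np.random.default_rng(0)
T, d_h, d_a = 16, 8, 4                          # assumed window length and layer sizes
window = rng.normal(size=(T, len(feature_names)))
hidden = rng.normal(size=d_h)
w_h = rng.normal(size=(d_a, d_h))
u_x = rng.normal(size=(d_a, T))
v = rng.normal(size=d_a)

weights = feature_attention(window, hidden, w_h, u_x, v)
weighted_window = window * weights              # features rescaled by their attention weight
```

In a trained model the parameters are learned, so a feature such as historical power ends up with a larger weight when it is more informative; with the random parameters used here the weights carry no meaning.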
The prediction curves of the 16th step for every wind power cluster and for the whole wind power group are shown in Figure 6. The prediction curve of every cluster tracks the actual power curve closely, with small error. However, it is worth pointing out that the STAN model, like other multi-step prediction models, exhibits a time delay in its prediction results. The delay becomes more obvious as the prediction step increases.
Table 3, Table 4, Table 5, Table 6 and Table 7 report the prediction performance of each wind power cluster.
The prediction indicators of the STAN model are shown in Figure 7. The qualified rate is almost stable at 100%, and the absolute error of most sample predictions is less than 25% of the installed capacity. The correlation coefficient and the R² coefficient show a downward trend as the prediction horizon increases, but they still maintain high values when extrapolated to the 16th step. This indicates that the predicted and actual power curves remain highly similar and that peaks and troughs are fitted effectively. The two error indicators, MAE and RMSE, show only a slight upward trend, which indicates that the error on the test set remains low as the horizon grows. The RMSE remains within 10% of the installed capacity for the prediction of the 4th hour. The power prediction performance of the STAN model for large-scale wind power clusters is therefore stable.
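The indicators discussed above can be computed as in the following sketch. The exact definitions used by the paper are not given in this excerpt, so the formulas here are common conventions and should be read as assumptions: errors are normalized by installed capacity, and a sample counts as qualified when its absolute error is below 25% of capacity. The sample values are hypothetical.

```python
import numpy as np

def evaluate(actual_mw, predicted_mw, capacity_mw, qualified_threshold=0.25):
    """Capacity-normalized MAE and RMSE, correlation, R^2, and qualified rate."""
    actual = np.asarray(actual_mw, dtype=float)
    predicted = np.asarray(predicted_mw, dtype=float)
    error = (predicted - actual) / capacity_mw          # normalized by installed capacity
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return {
        "mae": float(np.mean(np.abs(error))),
        "rmse": float(np.sqrt(np.mean(error ** 2))),
        "corr": float(np.corrcoef(actual, predicted)[0, 1]),
        "r2": float(1.0 - ss_res / ss_tot),
        # Assumed definition: share of samples with |error| below the threshold.
        "qualified_rate": float(np.mean(np.abs(error) < qualified_threshold)),
    }

# Hypothetical values in MW for a cluster with 1000 MW installed capacity.
metrics = evaluate(
    actual_mw=[100, 200, 300, 400],
    predicted_mw=[110, 190, 310, 390],
    capacity_mw=1000,
)
```

Normalizing by installed capacity makes the error indicators comparable across clusters of different size.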
Extreme error events have the greatest impact on the power system. Extreme positive errors indicate that the predicted value is far lower than the actual value; because the actual wind power output is higher than planned, the system is prone to wind curtailment. Extreme negative errors indicate that the predicted value is far higher than the actual value; because the actual wind power output is lower than planned, load shedding events are prone to occur. The extreme error statistics of the 16-step prediction of the STAN model are shown in Figure 8. According to the literature [28,29], an absolute error that exceeds 40% of the installed capacity is recognized as an extreme error event. By this criterion, and compared with forecasting on a single-farm basis, extreme error events are rare for large-scale wind power clusters. As the prediction step increases, the absolute value of the extreme error tends to increase. For the 16th step, the extreme positive error is 25.278% of the installed capacity and the extreme negative error is 25.719% of the installed capacity. Both are below the 40% threshold, which supports the stability of our STAN model.
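The extreme error statistics can be sketched as follows. The sign convention follows the text (a positive error means the prediction is lower than the actual value) and the 40% threshold is the one cited above; the function name and sample values are hypothetical.

```python
import numpy as np

def extreme_error_stats(actual_mw, predicted_mw, capacity_mw, threshold=0.40):
    """Largest positive and negative errors as a fraction of capacity, plus event count."""
    actual = np.asarray(actual_mw, dtype=float)
    predicted = np.asarray(predicted_mw, dtype=float)
    # Positive error: prediction below actual (under-forecast).
    error = (actual - predicted) / capacity_mw
    return {
        "max_positive": float(max(error.max(), 0.0)),
        "max_negative": float(min(error.min(), 0.0)),
        "n_extreme": int(np.sum(np.abs(error) > threshold)),
    }

# Hypothetical samples in MW for a cluster with 1000 MW installed capacity.
stats = extreme_error_stats(
    actual_mw=[500, 300, 800],
    predicted_mw=[300, 450, 350],
    capacity_mw=1000,
)
```

For this toy input the errors are 0.20, -0.15, and 0.45 of capacity, so only the last sample counts as an extreme error event.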
The performance of the STAN model proposed in this paper is compared with that of a temporal convolutional network (TCN), a long short-term memory network (LSTM), a bidirectional long short-term memory network (BiLSTM), and an LSTM cyclic autoencoder. As shown in Figure 9, the comparison index is the average RMSE of the predictions over the next 4 h. The STAN algorithm has the lowest error and the TCN algorithm has the highest. The difference between the highest and the lowest RMSE is only 0.00734 in normalized terms. However, because the total installed capacity of the wind power clusters is 16,887 MW, the error gap between TCN and STAN amounts to 124 MW, which is equivalent to the installed capacity of a small or medium-sized wind farm.
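The conversion from the normalized RMSE gap to megawatts is a single multiplication, shown here for transparency. Both input numbers are taken from the text; the variable names are illustrative.

```python
rmse_gap = 0.00734            # normalized RMSE difference between TCN and STAN
total_capacity_mw = 16_887    # total installed capacity of the wind power clusters

# A normalized error scales linearly with the installed capacity.
gap_mw = rmse_gap * total_capacity_mw
print(f"RMSE gap: {gap_mw:.2f} MW")
```

A difference that looks negligible on the normalized scale is therefore substantial in absolute terms for a cluster of this size.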
Three prediction modes are also compared: the cluster–subcluster superposition proposed in this paper, the superposition of single-farm predictions, and the direct prediction of the total cluster. The comparison index is again the average RMSE of the 4 h predictions. The average RMSE is 0.0522 for the prediction mode in this paper, 0.0618 for the superposition of single-farm predictions, and 0.0639 for the direct prediction of the total cluster. The prediction mode proposed in this paper has the lowest error and contributes more to ensuring the power supply capacity of the area.
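The superposition step and the relative size of the reported differences can be illustrated as below. The three RMSE values come from the text; the sub-cluster forecasts are hypothetical, and the relative reductions are derived here by simple arithmetic rather than reported by the authors.

```python
import numpy as np

def superimpose(cluster_forecasts_mw):
    """Sum per-cluster forecasts (one row per cluster) into a total forecast."""
    return np.sum(np.asarray(cluster_forecasts_mw, dtype=float), axis=0)

def relative_reduction(baseline_rmse, proposed_rmse):
    """Fraction by which the proposed mode lowers the baseline RMSE."""
    return (baseline_rmse - proposed_rmse) / baseline_rmse

# Hypothetical forecasts for three sub-clusters over four time steps (MW).
cluster_forecasts_mw = [
    [120, 130, 125, 140],
    [300, 310, 290, 305],
    [80, 75, 90, 85],
]
total_forecast_mw = superimpose(cluster_forecasts_mw)

# Average RMSE values reported in the text.
rmse_proposed, rmse_single_farm, rmse_direct = 0.0522, 0.0618, 0.0639
vs_single_farm = relative_reduction(rmse_single_farm, rmse_proposed)
vs_direct = relative_reduction(rmse_direct, rmse_proposed)
```

Under these definitions the cluster-level superposition lowers the RMSE by roughly 16% relative to single-farm superposition and roughly 18% relative to direct prediction of the total cluster.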
In addition, STAN is compared with four other models: vanishing gradient mitigation with a recurrent neural network (VGM–RNN); sequential floating forward selection feature selection with bidirectional long short-term memory (SFFS–BLSTM); complementary ensemble empirical mode decomposition with multi-scale permutation entropy (CEEMD–MPE); and a combination of complementary ensemble empirical mode decomposition, multi-scale permutation entropy, least squares support vector machine, and particle swarm optimization (CEEMDAN–MPE–LSSVM–PSO). The comparison index is the RMSE of the predictions over the next 4 h. As shown in Figure 10, the STAN model has the smallest error, and its prediction accuracy is higher than that of the other four models by 24.1%, 20.1%, 16.2%, and 10.5%, respectively.