Article

Ultra-Short-Term Power Prediction of Large Offshore Wind Farms Based on Spatiotemporal Adaptation of Wind Turbines

1 School of Electrical Power, South China University of Technology, Guangzhou 510641, China
2 System Analysis Department, Electric Dispatching and Control Center, Guangdong Power Grid Co., Ltd., Guangzhou 510000, China
* Author to whom correspondence should be addressed.
Processes 2024, 12(4), 696; https://doi.org/10.3390/pr12040696
Submission received: 22 February 2024 / Revised: 20 March 2024 / Accepted: 24 March 2024 / Published: 29 March 2024
(This article belongs to the Section Energy Systems)

Abstract

Accurately predicting the active power output of offshore wind power is of great significance for reducing the uncertainty in new power systems. By utilizing the spatiotemporal correlation characteristics among wind turbine unit outputs, this paper embeds the Diffusion Convolutional Neural Network (DCNN) into the Gated Recurrent Unit (GRU) for the feature extraction of spatiotemporal correlations in wind turbine unit outputs. It also combines graph structure learning to propose a sequence-to-sequence model for ultra-short-term power prediction in large offshore wind farms. Firstly, the electrical connection graph within the wind farm is used to preliminarily determine the reference adjacency matrix for the wind turbine units within the farm, injecting prior knowledge of the adjacency matrix into the model. Secondly, a convolutional neural network is utilized to convolve the historical curves of units within the farm along the time dimension, outputting a unit connection probability vector. The Gumbel–softmax reparameterization method is then used to make the probability vector differentiable, thereby generating an optimal adjacency matrix for the prediction task based on the probability vector. At the same time, the difference between the two adjacency matrices is added as a regularization term to the loss function to reduce model overfitting. The simulation of actual cases shows that the proposed model has good predictive performance in ultra-short-term power prediction for large offshore wind farms.

1. Introduction

The proposal of carbon peaking and carbon neutrality goals has led to a rise in the number and capacity of wind farms in recent years [1]. However, the strong intermittency and variability in wind power pose significant challenges to the stable operation of the power system. Accurately predicting the active power output of offshore wind is extremely important for reducing the uncertainty in new power systems, optimizing power scheduling, peak shaving and frequency regulation, as well as for power market clearing. Traditional wind power generation forecasting techniques are primarily categorized into physical modeling methods [2,3] and statistical modeling methods [4,5]. Physical modeling methods focus on simulating the physical factors affecting wind power output. This approach does not rely on extensive historical data, but its modeling process is complex, and it has limited resistance to interference. On the other hand, statistical modeling methods, such as the commonly used time series models [6,7], primarily predict wind power output by analyzing historical time series data. However, because of their limitations in effectively predicting nonlinear factors, these methods result in suboptimal forecasting accuracy.
Recently, Deep Learning (DL) has been applied to wind power forecasting and has achieved promising results. Reference [8] first applies wavelet decomposition to time series data to extract features in both the time and frequency domains, then incorporates an attention mechanism and constructs a model based on Wavelet Decomposition and a Bi-directional Long Short-Term Memory Network, aimed at ultra-short-term prediction of wind power output. The authors of [9] employed cross-attention to reconstruct the relationship between wind power and meteorological factors. Although the aforementioned studies achieve favorable predictive results, they are sensitive to meteorological forecasting data. Given the current state of micro-meteorological forecasting techniques for offshore wind power, their accuracy and timeliness may not fully meet the requirements of ultra-short-term wind power prediction.
In ultra-short-term power prediction, due to the inertia of wind turbines, the historical power output from the wind farm in preceding time steps is often used for making more accurate predictions for subsequent time steps’ power output. Additionally, offshore wind farms are frequently characterized by centralized layouts, resulting in wake effects among individual units and the presence of strong spatial correlations [10,11]. As a result, spatiotemporal correlation information is considered in wind power forecasting. Ref. [12] studied the impact of integrating spatiotemporal wind data on the performance of wind forecast neural networks, highlighting the significant influence of regional and seasonal wind conditions on the predictive model’s performance. Ref. [13] proposed a wind power prediction method that considers spatiotemporal correlations by capturing spatial features with convolutional neural networks and temporal features with MLP. Ref. [14] calculated a spatiotemporal correlation matrix that describes the correlations between neighboring wind farms and introduced an ultra-short-term power prediction method for wind farm clusters based on dynamic spatiotemporal correlations and a hierarchical directed graph structure. Ref. [15] proposed a Spatiotemporal Graph Cross-Attention Autoencoder Network (STGCAN) for wind power prediction. Ref. [16] utilized a spatiotemporal multi-clustering algorithm and a hybrid neural network method for regional wind farm power prediction to learn the potential spatiotemporal dependencies of regional wind farms. Ref. [17] used three correlation coefficients to characterize the spatiotemporal correlation properties within wind clusters. Ref. [18] dynamically analyzed the spatial correlation between a specific wind farm and other wind farms in a region using correlation coefficients. Regarding DL, Recurrent Neural Networks (RNNs) can capture temporal dynamics in sequences, and their variant Gated Recurrent Units (GRUs) can effectively address the shortcomings of recurrent neural networks, resulting in more significant performance in temporal feature extraction. Graph Neural Networks (GNNs) can effectively aggregate spatial dependencies in graph data based on adjacency matrices. Ref. [19] introduced an innovative approach for ultra-short-term power prediction using a GNN combined with an enhanced Bootstrap method. The implementation of the Diffusion Convolutional Recurrent Neural Network (DCRNN) merges the advantages of both GNNs and RNNs, efficiently identifying the spatial and temporal relationships among the input data [20]. This method offers fresh perspectives on elucidating the spatial-temporal interactions within wind turbine units.
However, when applying GNNs and their variants, wind farms are abstracted as graph data: each wind turbine is treated as a graph node, and the adjacency matrix characterizes the spatiotemporal correlation between wind turbines. A natural idea is to use the electrical connection diagram between wind turbines, but because the electrical connection diagram is fixed, it cannot reflect the spatiotemporal correlation induced by wind arriving at the turbines from different directions. Therefore, it is crucial to determine an adjacency matrix input to the network that effectively reflects the spatiotemporal correlation between nodes. The introduction of graph structure learning methods transforms the adjacency matrix into a variable amenable to optimization and enables its generation under the guidance of downstream tasks [21]. This technique has found extensive applications in natural language processing [22,23,24] and computer vision [25,26,27]; yet, it remains unexplored for representing the spatiotemporal correlations among wind turbines within a wind farm.
Inspired by the above analysis, this paper proposes an ultra-short-term wind farm power prediction model that relies solely on historical measured data from wind farms. The model is based on the DCGRU architecture, which improves its ability to capture the spatiotemporal correlations among wind turbine units. To optimize the extraction of spatiotemporal correlations between wind turbine units, the model adaptively adjusts the input adjacency matrix, refining its guidance of the prediction task. Additionally, a graph regularization technique is introduced, infusing the model with prior knowledge. This method allows the model to grasp the spatiotemporal relationships present in wind farms more effectively, thus enhancing the precision of ultra-short-term power prediction.
The paper is structured as follows: Section 2 provides a detailed introduction to the proposed model, including the workings and improvements of the adaptive adjacency matrix and the improved graph regularization method. Section 3 describes data preprocessing and the evaluation metrics. Section 4 presents the experimental results. Finally, Section 5 provides a brief summary of the work.

2. Improved GA-DCGRU-GR Model

2.1. Wind Turbine Node Connection Probability Vector Generation and Parameterization of the Adjacency Matrix

To seek the optimal representation of the adjacency matrix that can better characterize the spatiotemporal correlations within a wind farm, it is necessary to parameterize it and iteratively train it to ultimately obtain an optimal adjacency matrix. This paper utilizes a one-dimensional Convolutional Neural Network (1DCNN) to convolve the historical data of wind turbine units along the time dimension. Through the activation layer and fully connected layer, the connection probability vector between every two wind turbines is finally obtained. The specific process is shown in Figure 1.
To establish an adaptive learning mechanism for the adjacency matrix, it is necessary to parameterize and learn the adjacency matrix A of the graph. When parameterizing the graph's adjacency matrix $A \in \{0,1\}^{n \times n}$, a differentiable function is required that outputs values of 0 or 1. Suppose A is a random matrix whose entries follow Bernoulli distributions parameterized by $\theta \in [0,1]^{n \times n}$; then the entries $A_{ij} \sim \mathrm{Ber}(\theta_{ij})$ are mutually independent, where $\theta_{ij}$ is the success probability ($A_{ij} = 1$) of the Bernoulli distribution.
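As an illustration of this pairwise parameterization, the following PyTorch sketch encodes each turbine's power history with a 1DCNN along the time dimension and scores every turbine pair with two-class connection logits. The layer sizes, module names, and pairing scheme are assumptions made for exposition, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PairConnectionScorer(nn.Module):
    """Illustrative 1D-CNN that maps per-turbine history to pairwise
    connection logits (assumed layer sizes, not the paper's exact ones)."""
    def __init__(self, emb_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1),  # convolve along the time axis
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                     # collapse the time dimension
            nn.Flatten(),
            nn.Linear(16, emb_dim),
        )
        self.head = nn.Linear(2 * emb_dim, 2)            # logits for (connected, not connected)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_turbines, seq_len) historical power of each unit
        h = self.encoder(x.unsqueeze(1))                 # (n, emb_dim) per-turbine embedding
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)], dim=-1
        )                                                # (n, n, 2*emb_dim) embedding of each pair
        return self.head(pairs)                          # (n, n, 2) connection logits
```

The resulting logits per turbine pair play the role of the connection probability vector described above; the Gumbel-softmax step below turns them into a differentiable, near-binary adjacency matrix.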
θ is further parameterized as θ(ω), and since backpropagation in neural networks requires derivative calculations to update the parameters, this parameterization of θ as θ(ω) must be differentiable. Ref. [28] uses the Gumbel-softmax reparameterization technique, which is defined as follows:
Let z be a categorical variable with class probabilities p1, p2, …, pk. To extract a sample z from a distribution with class probabilities p, and by adding Gumbel noise, the following can be obtained:
$$z = \arg\max_i \left( g_i + \log p_i \right)$$
where the $g_i$ are independent and identically distributed random variables drawn from the standard Gumbel distribution, $g_i = -\log(-\log u_i)$ with $u_i \sim U(0,1)$. Considering that argmax is non-differentiable, it is replaced with softmax, resulting in:
$$z_i = \frac{\exp\left(\left(\log p_i + g_i\right)/\tau\right)}{\sum_{j=1}^{k} \exp\left(\left(\log p_j + g_j\right)/\tau\right)}, \quad \text{for } i = 1, \ldots, k$$
where τ is the temperature parameter. The smaller τ is, the closer the softmax output is to the one-hot argmax result.
By applying the Gumbel reparameterization technique to the adjacency relationships, we obtain:
$$A_{ij} = \frac{\exp\left(\left(\log \theta_{ij} + g_{ij}\right)/\tau\right)}{\sum_{j=1}^{k} \exp\left(\left(\log \theta_{ij} + g_{ij}\right)/\tau\right)}, \quad \text{for } i = 1, \ldots, k$$
where $A_{ij}$ represents the probability of connection between node i and node j after the Gumbel–softmax operation; the smaller τ is, the closer $A_{ij}$ approaches a binary value (0 or 1); $\theta_{ij}$ is the larger value in the connection probability vector between node i and node j after adaptive feature extraction; and $g_{ij}$ represents an independent and identically distributed random variable drawn from the standard Gumbel distribution.
Applying the Gumbel–softmax reparameterization technique thus ensures that the probability vector obtained from feature extraction remains differentiable.
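A minimal sketch of this step: the pairwise connection logits from the previous sketch are passed through PyTorch's built-in Gumbel-softmax, which adds Gumbel noise and applies the temperature-scaled softmax described above. The choice of the hard (straight-through) setting and the default temperature are assumptions, not prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def sample_adjacency(pair_logits: torch.Tensor, tau: float = 0.5, hard: bool = True) -> torch.Tensor:
    """Draw a differentiable, near-binary adjacency matrix from pairwise
    connection logits via the Gumbel-softmax reparameterization.

    pair_logits: (n, n, 2) logits for the classes (connected, not connected).
    Returns A: (n, n) with entries close to {0, 1}; gradients flow back to the logits.
    """
    # F.gumbel_softmax adds Gumbel noise and applies a temperature-scaled softmax;
    # with hard=True it returns one-hot samples through the straight-through estimator.
    y = F.gumbel_softmax(pair_logits, tau=tau, hard=hard, dim=-1)  # (n, n, 2)
    return y[..., 0]  # mass assigned to the "connected" class
```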
To ensure the credibility of the adjacency matrix generated by the model during iterations, it is necessary to inject certain prior knowledge into the model to ensure that it learns the adjacency matrix in the correct trend. Therefore, a graph regularization term is proposed to be added to the model’s loss function. As we already know the electrical connection diagram of the wind farm, although it cannot fully reflect the spatiotemporal correlation between wind turbines in the wind farm (for instance, two geographically close wind turbines may not be connected in the electrical connection diagram), it does reflect, to some extent, the spatial connectivity between the wind turbines. Graph adaptive learning, based on the electrical connection diagram of the wind farm, explores a more optimal adjacency matrix, and its results are credible.

2.2. Spatiotemporal Correlation Extraction Model

(1)
Gated Recurrent Unit (GRU)
In dealing with time series data, traditional RNNs and Long Short-Term Memory (LSTM) Networks face challenges due to their high computational complexity and practical application difficulties. The GRU [29] network introduces a gating mechanism, effectively resolving the issues of vanishing and exploding gradients present in traditional RNN and LSTM networks. This makes it more efficient in processing long time series data. The GRU network was proposed in 2014 to capture dependencies across different time scales. It includes an update gate and a reset gate internally. The reset gate decides the method of blending new input with previous memories, and the update gate establishes the proportion of prior memories to maintain at the present moment. The model is shown in Figure 2.
As shown in Figure 2, the output of the Gated Recurrent Unit is:
$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
where $h_t$ represents the hidden state at time t, which is a linear combination of the hidden state $h_{t-1}$ at time t − 1 and the candidate state $\tilde{h}_t$; $z_t$ is the update gate, activated by the sigmoid function.
$$z_t = \mathrm{sigmoid}\left(\omega_z \left[ h_{t-1} \,\|\, x_t \right] + b_z\right)$$
where $\omega_z$ represents the weight parameters of the update gate; "$\|$" represents the concatenation operation; $x_t$ represents the input features; and $b_z$ represents the bias value.
$$\tilde{h}_t = \tanh\left(\omega_h \left[ \left(r_t \odot h_{t-1}\right) \,\|\, x_t \right] + b_h\right)$$
where "tanh" represents the hyperbolic tangent function; $\omega_h$ represents the weight parameters for the candidate state; $b_h$ represents the bias value; "⊙" indicates element-wise multiplication of matrices; and $r_t$ is the reset gate, which is also activated by the sigmoid function.
$$r_t = \mathrm{sigmoid}\left(\omega_r \left[ h_{t-1} \,\|\, x_t \right] + b_r\right)$$
where $\omega_r$ represents the weight parameters of the reset gate; "$\|$" represents the concatenation operation; and $b_r$ represents the bias value.
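For concreteness, the four GRU equations above can be transcribed directly into a small PyTorch cell; the dimensions are illustrative, and in practice torch.nn.GRUCell offers an optimized built-in alternative.

```python
import torch
import torch.nn as nn

class PlainGRUCell(nn.Module):
    """Direct transcription of the GRU equations above: update gate z_t,
    reset gate r_t, candidate state, and hidden-state blend (illustrative sizes)."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.Wz = nn.Linear(in_dim + hid_dim, hid_dim)  # update gate weights
        self.Wr = nn.Linear(in_dim + hid_dim, hid_dim)  # reset gate weights
        self.Wh = nn.Linear(in_dim + hid_dim, hid_dim)  # candidate-state weights

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        z_t = torch.sigmoid(self.Wz(torch.cat([h_prev, x_t], dim=-1)))
        r_t = torch.sigmoid(self.Wr(torch.cat([h_prev, x_t], dim=-1)))
        h_cand = torch.tanh(self.Wh(torch.cat([r_t * h_prev, x_t], dim=-1)))
        return (1 - z_t) * h_prev + z_t * h_cand        # blend of previous and candidate states
```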
(2)
Diffusion Convolutional Neural Network (DCNN)
This section adopts the Diffusion Convolutional Neural Network proposed in [15], which extracts spatial correlation features by modeling the dynamics as a stochastic diffusion process over the graph. The model is shown in Figure 3.
The random walk matrix is first defined for diffusion convolution:
$$P = \left(D^{-1} A\right)^{k}$$
where D represents the degree matrix of the graph, with $D_{ii} = \sum_j A_{ij}$, and A is the adjacency matrix of the graph; k represents the number of diffusion steps, which is used to truncate the diffusion process. P is the random walk transition matrix on the graph, where $P_{ij}$ represents the transition probability from node i to node j and $P_{ji}$ represents the transition probability from node j to node i, with $P_{ij}$ and $P_{ji}$ not necessarily being equal.
Diffusion convolution is defined as:
$$X' = \omega \star_G X = \sum_{k=0}^{K} \omega_k \left(D_O^{-1} A\right)^{k} X$$
where X′ represents the new node feature representation after the diffusion convolution operation; $\star_G$ denotes the diffusion convolution operation on the graph; X represents the node features; and $\omega_k$ are the trainable network weights.
Diffusion convolution can capture the random characteristics of wind flow over a wind farm and reflect them in the random walk matrix, effectively capturing the spatial correlations among different wind turbine units at the same time.
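A minimal sketch of this truncated K-step diffusion convolution, using dense matrices for clarity; the class name, default diffusion depth, and use of per-step linear weights are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DiffusionConv(nn.Module):
    """Truncated K-step diffusion convolution: sum_k w_k (D^-1 A)^k X,
    written with a dense adjacency matrix for readability."""
    def __init__(self, in_dim: int, out_dim: int, K: int = 2):
        super().__init__()
        self.K = K
        self.weights = nn.ModuleList([nn.Linear(in_dim, out_dim, bias=False) for _ in range(K + 1)])

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n_nodes, in_dim) node features; adj: (n_nodes, n_nodes) adjacency matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1e-6)
        P = adj / deg                        # random-walk transition matrix D^-1 A
        out, h = self.weights[0](x), x
        for k in range(1, self.K + 1):
            h = P @ h                        # k-step diffusion of node features
            out = out + self.weights[k](h)   # weighted sum over diffusion steps
        return out
```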
(3)
Diffusion Convolutional Gated Recurrent Unit (DCGRU)
We constructed the DCGRU model by replacing the RNN with the GRU, following the model presented in reference [15]. The DCGRU combines the diffusion convolution model with the Gated Recurrent Unit, as shown in Figure 4. In this model, the hidden state ht−1 from the GRU at time t − 1 is concatenated with the features xt, and then, the resulting feature matrix undergoes diffusion convolution to obtain a new feature representation, as shown in the following equation:
$$X' = \omega \star_G \left[ h_{t-1} \,\|\, x_t \right] = \sum_{k=0}^{K} \omega_k \left(D_O^{-1} A\right)^{k} \left[ h_{t-1} \,\|\, x_t \right]$$
Therefore, the computation formula for the DCGRU model is:
$$\begin{aligned} z_t &= \mathrm{sigmoid}\left(\omega_z \star_G \left[ h_{t-1} \,\|\, x_t \right] + b_z\right) \\ r_t &= \mathrm{sigmoid}\left(\omega_r \star_G \left[ h_{t-1} \,\|\, x_t \right] + b_r\right) \\ \tilde{h}_t &= \tanh\left(\omega_h \star_G \left[ \left(r_t \odot h_{t-1}\right) \,\|\, x_t \right] + b_h\right) \\ h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \end{aligned}$$
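Continuing the sketches above (and reusing the DiffusionConv class from the previous block), a DCGRU cell simply swaps the GRU's linear gate transforms for diffusion convolutions over the (adaptive) adjacency matrix; the class layout is illustrative rather than the authors' exact code.

```python
import torch
import torch.nn as nn
# DiffusionConv is the sketch defined in the previous code block.

class DCGRUCell(nn.Module):
    """GRU cell whose gate transforms are diffusion convolutions over the
    adjacency matrix, following the DCGRU equations above (illustrative sizes)."""
    def __init__(self, in_dim: int, hid_dim: int, K: int = 2):
        super().__init__()
        self.conv_z = DiffusionConv(in_dim + hid_dim, hid_dim, K)  # update gate
        self.conv_r = DiffusionConv(in_dim + hid_dim, hid_dim, K)  # reset gate
        self.conv_h = DiffusionConv(in_dim + hid_dim, hid_dim, K)  # candidate state

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        gate_in = torch.cat([h_prev, x_t], dim=-1)
        z_t = torch.sigmoid(self.conv_z(gate_in, adj))
        r_t = torch.sigmoid(self.conv_r(gate_in, adj))
        h_cand = torch.tanh(self.conv_h(torch.cat([r_t * h_prev, x_t], dim=-1), adj))
        return (1 - z_t) * h_prev + z_t * h_cand
```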

2.3. Graph Regularization

In order to inject certain prior graph knowledge into the model, aiding the model in better exploring the optimal adjacency matrix, we use the electrical connection graph of the wind farm as the model’s initial adjacency matrix. Assuming the adjacency matrix of the wind farm’s electrical connection diagram is Aij, and the adjacency matrix generated iteratively is θij, the cross-entropy loss between the two is used as a regularization term in the loss function. The calculation of the regularization term is as follows:
$$l_{reg} = -\sum_{i}\sum_{j}\left[ A_{ij} \log \theta_{ij} + \left(1 - A_{ij}\right) \log \left(1 - \theta_{ij}\right) \right]$$
In the formula, $A_{ij}$ denotes the entry in the i-th row and j-th column of the adjacency matrix corresponding to the electrical connection diagram, where $A_{ij} \in \{0,1\}$ indicates whether there is a direct electrical connection between the corresponding wind turbines. $\theta_{ij}$ denotes the entry in the i-th row and j-th column of the adjacency matrix obtained after graph structure learning, with $\theta_{ij} \in [0,1]$. This implies that $\theta_{ij}$ is a learned probability or strength of connection, as opposed to the binary connection in A.
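A minimal sketch of this regularization term as a binary cross-entropy between the prior (electrical-connection) adjacency matrix and the learned connection probabilities; the clamping epsilon and the summed (rather than averaged) reduction are assumptions.

```python
import torch

def graph_regularization(A_prior: torch.Tensor, theta: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Binary cross-entropy between the electrical-connection adjacency A (entries in {0, 1})
    and the learned connection probabilities theta (entries in [0, 1])."""
    theta = theta.clamp(eps, 1.0 - eps)  # avoid log(0)
    return -(A_prior * theta.log() + (1.0 - A_prior) * (1.0 - theta).log()).sum()
```

In training, this term would be added to the prediction loss with a weighting coefficient; that coefficient is a hypothetical hyperparameter here, analogous in role to the λ used for L2 regularization in Section 4.2.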

2.4. The DCGRU Model with Graph Adaptive Learning and Graph Regularization (GA-DCGRU-GR)

As discussed above, to enhance the predictive performance of the DCGRU in ultra-short-term wind farm power prediction, the graph adjacency matrix is optimized so that it better represents the spatiotemporal correlations between wind turbine units and guides the prediction task; the resulting model is shown in Figure 5.

2.5. The Process of the GA-DCGRU-GR Model

In this paper, the DCGRU model is applied to ultra-short-term power prediction in wind farms. By combining graph structure learning, the adjacency matrix fed into the diffusion convolution is adapted iteratively so that it better represents the spatiotemporal correlations among wind turbine units and guides the prediction task. At the same time, a graph regularization method is proposed to prevent the model from learning the adjacency matrix blindly, thereby enhancing the model's predictive performance in ultra-short-term power prediction for wind farms. The prediction process is illustrated in Figure 6.
As shown in Figure 6, different colors represent different steps. The prediction process is as follows (a minimal end-to-end sketch of steps (2)-(5) is given after the list):
(1)
Interpolate, denoise, and standardize the historical data from each wind turbine unit in the wind farm.
(2)
Use a CNN to perform convolutions along the time dimension of the wind farm’s historical data and ultimately obtain the connection probability vectors for each wind turbine unit.
(3)
Apply the Gumbel–softmax algorithm to the probability vectors for a differentiable transformation and obtain the adaptive adjacency matrix.
(4)
Input the adaptive adjacency matrix and the historical observed data from the wind turbine units into the DCGRU.
(5)
Calculate the model loss using a loss function that incorporates graph regularization.
(6)
Determine if the model has converged; if not, continue iterative training. If it has converged, end the training, input the test set, and output the model’s prediction values.
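The following sketch wires steps (2)-(5) into a single training step, reusing PairConnectionScorer, sample_adjacency, and graph_regularization from the earlier sketches. Here dcgru_forecaster stands in for a hypothetical DCGRU-based sequence-to-sequence forecaster, and lam and tau are assumed hyperparameters, so this illustrates the flow rather than the authors' implementation.

```python
import torch

def train_step(scorer, dcgru_forecaster, optimizer, x_hist, y_true, A_prior, lam=0.1, tau=0.5):
    """One illustrative training step of the GA-DCGRU-GR workflow.

    scorer: PairConnectionScorer (1D-CNN pairwise logits, step 2)
    dcgru_forecaster: hypothetical DCGRU seq2seq model mapping (history, adjacency) -> forecast
    x_hist: (n_turbines, seq_len) historical power; y_true: matching prediction target
    A_prior: (n_turbines, n_turbines) electrical-connection adjacency (prior knowledge)
    """
    optimizer.zero_grad()
    pair_logits = scorer(x_hist)                          # step (2): convolve history along time
    theta = torch.softmax(pair_logits, dim=-1)[..., 0]    # learned connection probabilities
    adj = sample_adjacency(pair_logits, tau=tau)          # step (3): Gumbel-softmax adjacency
    y_pred = dcgru_forecaster(x_hist, adj)                # step (4): DCGRU-based prediction
    loss = torch.mean((y_pred - y_true) ** 2) \
           + lam * graph_regularization(A_prior, theta)   # step (5): loss with graph regularization
    loss.backward()
    optimizer.step()
    return loss.item()
```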

3. Data Preprocessing and Evaluation Metrics

3.1. Data Preprocessing and Sample Generation

(1)
In order to improve model convergence and performance and to mitigate the impact of extreme data on the model, the actual power data of each wind turbine generator are standardized using the following formula:
$$x^{*} = \frac{x - \bar{x}}{\sqrt{\dfrac{1}{N}\sum_{n=1}^{N}\left(x_n - \bar{x}\right)^{2}}}$$
In the equation, x* represents the standardized feature data value, x represents the data value of the input variable, $\bar{x}$ is the mean of the input variable, and N is the number of data points of the input variable.
(2)
A sliding window is moved along the time axis of the dataset with a step size of 1. Within each window, the first twelve time points are used as feature data and the subsequent twelve time points are used as label data (see the sketch after this list).
(3)
The entire generated dataset is shuffled and divided into a ratio of 7:1:2 for the training set, validation set, and test set, respectively.
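A brief sketch of the standardization and sliding-window sample generation described in steps (1) and (2), assuming the per-turbine power history is stored as a NumPy array of shape (n_turbines, T); the function name and default window length are illustrative.

```python
import numpy as np

def make_samples(power: np.ndarray, window: int = 12):
    """Standardize per-turbine power and slide a window along time to build
    (feature, label) pairs of 12 input steps and 12 output steps."""
    mean = power.mean(axis=1, keepdims=True)
    std = power.std(axis=1, keepdims=True) + 1e-8
    z = (power - mean) / std                           # standardized series per turbine

    X, Y = [], []
    for t in range(z.shape[1] - 2 * window + 1):       # step size of 1 along the time axis
        X.append(z[:, t : t + window])                 # first 12 points: features
        Y.append(z[:, t + window : t + 2 * window])    # next 12 points: labels
    return np.stack(X), np.stack(Y)
```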

3.2. Evaluation Metrics

The study employs Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) as the evaluation metrics for forecasting models. The definitions and formulas for these metrics are as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right|$$
Here, n represents the number of data points, $y_i$ denotes the actual values, and $\hat{y}_i$ represents the predicted values. RMSE is the square root of the mean squared deviation between the predicted and actual values. The smaller these two indicators are, the better the predictive performance of the model; both indicators are calculated using the normalized true values.
In summary, these two metrics are used to evaluate the predictive performance of the models, where smaller RMSE and MAE values are preferred.
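For reference, the two metrics can be computed directly from the normalized true and predicted series; this is a plain NumPy transcription of the formulas above.

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root Mean Square Error between actual and predicted values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Absolute Error between actual and predicted values."""
    return float(np.mean(np.abs(y_true - y_pred)))
```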

4. Case Study

To demonstrate the effectiveness of the proposed method in predicting ultra-short-term power output in large offshore wind farms, this study employs actual data from two wind farms in southern China for model validation. The first wind farm consists of 36 wind turbines, each with a maximum power output of 5.5 MW, and the second wind farm comprises 73 wind turbines, each also with a maximum power output of 5.5 MW. The electrical connection diagrams for the first and second wind farms are shown in Figure 7 and Figure 8, respectively, where different colors represent different regions. To avoid incorrect spatiotemporal relationships due to excessively long collection lines, this study divides the wind farms into different regions based on their turbine connection diagrams. Within each region, wind turbines are interconnected in the adjacency matrix, while turbines from different regions are not connected. This approach establishes the initial adjacency matrix for the wind farms. The data for the first case cover the year 2022, and the data for the second case span 1 January 2021 to 31 December 2023, both with a time granularity of 15 min. The prediction range for both cases is 12 time points, representing the power output for the next 3 h. To ensure the reliability of the experimental results, all results in the case analysis are presented as the average of ten experiments to avoid any accidental variations.

4.1. Adaptive Learning from Graphs Improves Model Predictive Performance

To validate the improvement in model predictive performance due to graph adaptive learning, we first compare the GA-DCGRU with graph adaptive learning to the DCGRU that uses an electrical connection diagram of a wind farm as the adjacency matrix. The results are shown in Table 1.
As shown in Table 1, after implementing graph adaptive learning, in case 1, the model’s prediction metrics, RMSE and MAE, were reduced from 13.25 MW and 8.78 MW to 9.39 MW and 6.72 MW, respectively, improving by 29.13% and 23.46%. In case 2, the model’s prediction metrics, RMSE and MAE, were reduced from 31.45 MW and 17.19 MW to 24.31 MW and 14.52 MW, respectively, improving by 22.7% and 15.53%. This improvement is attributed to learning the elements of the adjacency matrix as parameters of the model, with the sole objective of minimizing the prediction loss. Under this objective, the values of the elements in the adjacency matrix continuously change, and the structural changes in the adjacency matrix more effectively guide the ultra-short-term power prediction task of the wind farm, enhancing the model’s predictive performance. However, it is worth noting that in this part, we did not apply graph regularization methods, hence the learned adjacency matrix does not have the injection of prior knowledge; the model is merely repeating feature extraction through 1DCNN and generating the adjacency matrix based on the probability vector.

4.2. The Impact of Graph Regularization on Model Performance

This section examines the effects of graph regularization methods on model performance, comparing them with the L2 regularization method. Notably, all models in this section utilize adaptive adjacency matrices. As illustrated in Table 2, for case 1, the model without graph regularization (GA-DCGRU) achieved results of 6.72 MW and 9.39 MW for MAE and RMSE metrics, respectively. In contrast, the model proposed in this paper (GA-DCGRU-GR) attained 4.29 MW and 6.91 MW for the MAE and RMSE metrics, respectively, marking improvements of 36.16% and 26.41%. The DCGRU model applying L2 regularization (GA-DCGRU-L2) scored 6.11 MW and 8.94 MW in MAE and RMSE, respectively, surpassing the DCGRU model without graph regularization, as L2 regularization mitigates overfitting on the training set. In case 2, the GA-DCGRU model’s MAE and RMSE metrics reached 17.49 MW and 24.31 MW, respectively, while the GA-DCGRU-GR model achieved 4.82 MW and 7.33 MW in MAE and RMSE metrics, respectively, with improvements of 72.44% and 69.85%, showing that the proposed method is more effective in scenarios with more nodes, yielding greater performance enhancements. Overall, models applying graph regularization methods outperform those using L2 regularization, with models not utilizing any graph regularization showing the poorest performance. The reason why the L2 regularization method performs worse than the graph regularization method is that L2 regularization aims to minimize the model’s parameters. In this model, the elements in the graph adjacency matrix are also optimizable parameters of the model, and blindly reducing the values in the adjacency matrix is obviously counterproductive for feature extraction. Therefore, the L2 regularization method performs worse than the graph regularization method. If λ is set to a larger value, the predictive performance of the model using L2 regularization could even be worse than that of the model without any graph regularization.
Comparing Table 1 and Table 2, since the graph regularization methods are based on adaptive adjacency matrices, the model in Table 1 using the wind farm electrical connection graph as the adjacency matrix (DCGRU) is inferior to the model in Table 2 that does not apply graph regularization but uses an adaptive adjacency matrix (GA-DCGRU). The graph regularization method injects certain prior knowledge into the model's adaptive adjacency matrix and also reduces overfitting on the training set. Therefore, the proposed GA-DCGRU-GR model outperforms the GA-DCGRU model without graph regularization, the GA-DCGRU-L2 model applying L2 regularization, and the DCGRU model without an adaptive adjacency matrix. Figure 9 shows the changes in the adjacency matrix, relative to the prior knowledge of the electrical connection graph of the wind farm, after graph adaptive learning in case 1; Figure 10 shows the corresponding changes in case 2. The number in each circle represents the wind turbine serial number. The added interconnections between wind turbines are obtained from the probability vector (θij, 1 − θij) after passing through the softmax function: if the probability vector after softmax is (0,1), there is no connection between the wind turbines; if it is (1,0), there is a connection.

4.3. Different Model Prediction Results Comparison

In this section, a comparison is made with LSTM-Seq2Seq, GRU-Seq2Seq, and the improved CNN-LSTM and DCGRU, discussing and demonstrating the superiority of the model proposed in this paper in ultra-short-term power prediction for large offshore wind farms. The prediction results comparison is shown in Table 3. The specific prediction results in case 1 and case 2 are shown in Figure 11 and Figure 12.
As shown in Table 3, sequence-to-sequence prediction models composed solely of LSTM or GRU networks, focusing only on temporal features, exhibit the weakest prediction performance due to the lack of spatial feature consideration. Meanwhile, the improved CNN-LSTM network, which utilizes CNN for spatial feature extraction and LSTM for temporal feature extraction, somewhat enhances the prediction performance. However, the spatial feature extraction capability of CNN is evidently not as strong as that based on GNN. Therefore, the method proposed in this paper, both in terms of spatial and temporal feature extraction, surpasses traditional methods, demonstrating superior prediction performance.

5. Conclusions

This paper utilizes the DCGRU to capture spatial and temporal correlations, proposing a sequence-to-sequence model for ultra-short-term power prediction in offshore wind farms. We optimized the adjacency matrix input into the diffusion convolution, enabling it to better reflect the spatiotemporal correlations between wind turbine units and effectively guide downstream tasks. Additionally, a regularization method is proposed to inject prior knowledge into the network, enhancing the model’s prediction accuracy. This paper concludes with the following:
(1)
The spatiotemporal correlation between wind turbine units can effectively guide the ultra-short-term power prediction of wind farms.
(2)
Effective prior knowledge can guide the correct optimization of the adjacency matrix, preventing the model from inaccurately learning. Comparing the two cases, it is known that effective prior knowledge is more important in models with more nodes. In case 1 of Table 2, graph regularization improved the model’s predictive performance by 36.16% and 26.41% on the MAE and RMSE metrics, respectively. In case 2, graph regularization improved the model’s predictive performance by 72.44% and 69.85% on the MAE and RMSE metrics, respectively, which is a significant improvement.
(3)
Compared to fixed geographical relationships, such as the electrical connection graph of wind turbine units, optimized adjacency relationships can enhance the feature aggregation capability of graph learning models (such as diffusion convolution), reflecting the potential spatiotemporal correlations between wind turbine units and thus better guiding downstream prediction tasks. In case 1, the proposed method (GA-DCGRU-GR) achieved an MAE of 4.29 MW and an RMSE of 6.91 MW, showing improvement over other methods.
In future research, for ultra-short-term power prediction in wind farms, feature extraction can be performed on historical power curves to increase input features and reduce model latency; consideration can also be given to correcting and denoising historical wind power output curves to improve data quality, conducting error correction research, and further enhancing the model’s prediction effects.

Author Contributions

Conceptualization, Y.A.; methodology, Y.Z. and Y.A.; software, J.L. and Y.Y.; validation, W.F. and Z.C.; writing—original draft preparation, Y.A., Y.Z., J.L., Y.Y., W.F. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Key R&D Program Projects of Guangdong Province (2021B0101230001) and the Southern Power Grid Corporation Technology Project ([036000KK52222013 (GDKJXM20222142)]).

Data Availability Statement

The data in this article are sourced from two actual wind farms in southern China; due to privacy restrictions, they cannot be made publicly available.

Conflicts of Interest

The authors declare that this study received funding from Southern Power Grid Corporation. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

References

1. Li, Q.; Zhang, Y.; Ji, T.; Lin, X.; Cai, Z. Volt/Var Control for Power Grids with Connections of Large-Scale Wind Farms: A Review. IEEE Access 2018, 6, 26675–26692.
2. Cheng, W.Y.; Liu, Y.; Bourgeois, A.J.; Wu, Y.; Haupt, S.E. Short term wind forecast of a data assimilation/weather forecasting system with wind turbine anemometer measurement assimilation. Renew. Energy 2017, 107, 340–351.
3. Ma, T.; Yang, H.; Lu, L. Solar photovoltaic system modeling and performance prediction. Renew. Sustain. Energy Rev. 2014, 36, 304–315.
4. Masseran, N.; Razali, A.M.; Ibrahim, K.; Latif, M.T. Fitting a mixture of von Mises distributions in order to model data on wind direction in Peninsular Malaysia. Energy Convers. Manag. 2013, 72, 94–102.
5. Zhou, J.; Shi, J.; Li, G. Fine tuning support vector machines for short-term wind speed forecasting. Energy Convers. Manag. 2011, 52, 1990–1998.
6. Dong, Y.; Ma, S.; Zhang, H.; Yang, G. Wind Power Prediction Based on Multi-class Autoregressive Moving Average Model with Logistic Function. J. Mod. Power Syst. Clean Energy 2022, 10, 1184–1193.
7. Kavasseri, R.G.; Seetharaman, K. Day-ahead wind speed forecasting using f-ARIMA models. Renew. Energy 2009, 34, 1388–1393.
8. Xie, X.Y.; Zhou, J.H.; Zhang, Y.J.; Wang, J.; Su, J.Y. W-BiLSTM Based Ultra-short-term Generation Power Prediction Method of Renewable Energy. Autom. Electr. Power Syst. 2021, 45, 175–184.
9. Zhang, Y.; Zang, H.Y.; Cheng, L.L.; Liu, H.X.; Wei, Z.N.; Sun, G.Q. Ultra-short-term wind power forecasting based on adaptive time series representation and multi-level attention. Electr. Power Autom. Equip. 2024, 44, 117–125.
10. Ge, W.C.; Li, J.R.; Teng, Y.; Li, J.J.; Zhang, T.; Hui, Q. Wind farm speed output vector optimization based on wake wind velocity field calculation. Acta Energiae Solaris Sin. 2019, 40, 641–648.
11. Zhao, S.; Jin, T.; Li, Z.; Liu, J.; Li, Y. Wind power scenario generation for multiple wind farms considering temporal and spatial correlations. Power Syst. Technol. 2019, 43, 3997–4004.
12. Shin, H.; Rüttgers, M.; Lee, S. Effects of spatiotemporal correlations in wind data on neural network-based wind predictions. Energy 2023, 279, 128068.
13. Zhu, Q.; Chen, J.; Zhu, L.; Duan, X.; Liu, Y. Wind speed prediction with spatio–temporal correlation: A deep learning approach. Energies 2018, 11, 705.
14. Wang, F.; Chen, P.; Zhen, Z.; Yin, R.; Cao, C.; Zhang, Y.; Duić, N. Dynamic spatio-temporal correlation and hierarchical directed graph structure based ultra-short-term wind farm cluster power forecasting method. Appl. Energy 2022, 323, 119579.
15. Yu, R.; Sun, Y.; He, D.; Gao, J.; Liu, Z.; Yu, M. Spatio-temporal graph cross-correlation auto-encoding network for wind power prediction. Int. J. Mach. Learn. Cybern. 2024, 15, 51–63.
16. Yu, G.; Liu, C.; Tang, B.; Chen, R.; Lu, L.; Cui, C.; Hu, Y.; Shen, L.; Muyeen, S.M. Short term wind power prediction for regional wind farms based on spatial-temporal characteristic distribution. Renew. Energy 2022, 199, 599–612.
17. Zhang, J.; Liu, D.; Li, Z.; Han, X.; Liu, H.; Dong, C.; Wang, J.; Liu, C.; Xia, Y. Power prediction of a wind farm cluster based on spatiotemporal correlations. Appl. Energy 2021, 302, 117568.
18. Pei, M.; Ye, L.; Li, Y.; Luo, Y.; Song, X.; Yu, Y.; Zhao, Y. Short-term regional wind power forecasting based on spatial–temporal correlation and dynamic clustering model. Energy Rep. 2022, 8, 10786–10802.
19. Liao, W.; Wang, S.; Bak-Jensen, B.; Pillai, J.R.; Yang, Z.; Liu, K. Ultra-short-term Interval Prediction of Wind Power Based on Graph Neural Network and Improved Bootstrap Technique. J. Mod. Power Syst. Clean Energy 2023, 11, 1100–1114.
20. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926.
21. Zhu, Y.; Xu, W.; Zhang, J.; Liu, Q.; Wu, S.; Wang, L. Deep Graph Structure Learning for Robust Representations: A Survey. arXiv 2021, arXiv:2103.03036.
22. Yu, X.; Xu, W.; Cui, Z.; Wu, S.; Wang, L. Graph-based Hierarchical Relevance Matching Signals for Ad-hoc Retrieval. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021.
23. Tian, Y.; Chen, G.; Song, Y.; Wan, X. Dependency-driven Relation Extraction with Attentive Graph Convolutional Networks. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021.
24. Yasunaga, M.; Ren, H.; Bosselut, A.; Liang, P.; Leskovec, J. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. In Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Online, 6–11 June 2021.
25. Suhail, M.; Mittal, A.; Siddiquie, B.; Broaddus, C.; Eledath, J.; Medioni, G.; Sigal, L. Energy-based Learning for Scene Graph Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021.
26. Liu, D.; Bober, M.; Kittler, J. Constrained Structure Learning for Scene Graph Generation. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11588–11599.
27. Kan, X.; Cui, H.; Lukemire, J.; Guo, Y.; Yang, C. FBNetGen: Task-aware GNN-based fMRI Analysis via Functional Brain Network Generation. In Proceedings of the International Conference on Medical Imaging with Deep Learning, Zurich, Switzerland, 6–8 July 2022.
28. Jang, E.; Gu, S.; Poole, B. Categorical Reparameterization with Gumbel-Softmax. arXiv 2016, arXiv:1611.01144.
29. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
Figure 1. 1DCNN feature extraction process.
Figure 2. Gated Recurrent Unit.
Figure 3. Diffusion Convolutional Neural Network.
Figure 4. Diffusion Convolutional Gated Recurrent Unit.
Figure 5. GA-DCGRU-GR.
Figure 6. GA-DCGRU-GR ultra-short-term power prediction process for wind farms.
Figure 7. Wind Farm 1 Electrical Connection Diagram.
Figure 8. Wind Farm 2 Electrical Connection Diagram.
Figure 9. Visualization of adjacency matrix changes in case 1.
Figure 10. Visualization of adjacency matrix changes in case 2.
Figure 11. Prediction results comparison in case 1.
Figure 12. Prediction results comparison in case 2.
Table 1. The impact of graph adaptive learning on model prediction performance.

Case   | Method of Determining the Adjacency Matrix             | RMSE/MW | MAE/MW
Case 1 | Wind Farm Electrical Connection Diagram                | 13.25   | 8.78
Case 1 | Graph adaptive learning (without graph regularization) | 9.39    | 6.72
Case 2 | Wind Farm Electrical Connection Diagram                | 31.45   | 17.49
Case 2 | Graph adaptive learning (without graph regularization) | 24.31   | 14.52
Table 2. The impact of graph regularization on model prediction performance.

Case   | Regularization Method                      | RMSE/MW | MAE/MW
Case 1 | Without graph regularization (GA-DCGRU)    | 9.39    | 6.72
Case 1 | With L2 regularization (GA-DCGRU-L2)       | 8.94    | 6.11
Case 1 | With graph regularization (proposed model) | 6.91    | 4.29
Case 2 | Without graph regularization (GA-DCGRU)    | 24.31   | 17.49
Case 2 | With L2 regularization (GA-DCGRU-L2)       | 22.56   | 16.43
Case 2 | With graph regularization (proposed model) | 7.33    | 4.82
Table 3. Prediction results comparison.

Case   | Model             | RMSE/MW | MAE/MW
Case 1 | GA-DCGRU-GR       | 6.91    | 4.29
Case 1 | LSTM-Seq2Seq      | 24.24   | 16.49
Case 1 | GRU-Seq2Seq       | 23.25   | 15.27
Case 1 | Improved CNN-LSTM | 15.05   | 11.64
Case 2 | GA-DCGRU-GR       | 7.33    | 4.82
Case 2 | LSTM-Seq2Seq      | 34.72   | 24.64
Case 2 | GRU-Seq2Seq       | 31.69   | 22.56
Case 2 | Improved CNN-LSTM | 17.65   | 14.97