Probabilistic Forecasting of Provincial Regional Wind Power Considering Spatio-Temporal Features

Li, Gang; Lin, Chen; Li, Yupeng

doi:10.3390/en18030652


by Gang Li *, Chen Lin and Yupeng Li

Institute of Hydropower and Hydroinformatics, Dalian University of Technology, Dalian 116024, China

* Author to whom correspondence should be addressed.

Energies 2025, 18(3), 652; https://doi.org/10.3390/en18030652

Submission received: 25 December 2024 / Revised: 28 January 2025 / Accepted: 28 January 2025 / Published: 30 January 2025

(This article belongs to the Section A3: Wind, Wave and Tidal Energy)


Abstract

Accurate prediction of regional wind power generation intervals effectively supports the economic and stable operation of provincial power grids. However, such prediction involves a large amount of high-dimensional meteorological and historical power generation information related to the many wind power stations in a province. In this paper, a lightweight model is developed to directly obtain probabilistic predictions in the form of intervals. First, the input features are formed through a fused image generation method that combines geographic and meteorological information, together with a power aggregation strategy, which avoids the extensive and tedious data processing required before modeling in traditional approaches. Then, to effectively consider the spatial meteorological distribution characteristics of regional power stations and the temporal characteristics of historical power, a parallel prediction network architecture of a convolutional neural network (CNN) and long short-term memory (LSTM) is designed. Meanwhile, an efficient channel attention (ECA) mechanism and an improved quantile regression-based loss function are introduced in training to directly generate prediction intervals. The case study shows that, relative to the benchmark models, the proposed model improves interval prediction performance by at least 12.3% and reduces the root mean square error (RMSE) of deterministic prediction by at least 19.4%.

Keywords:

provincial regional wind power; interval forecast; feature images; spatial meteorological distribution; improved quantile regression

1. Introduction

As a renewable energy resource, wind power plays an important role in accelerating the green and low-carbon transformation of the power system, owing to its advantages of zero pollution and low cost [1,2]. Global installed wind turbine capacity increased from 181 GW in 2010 to 1021 GW in 2023, and the total installed wind turbine capacity in China reached 441 GW by the end of 2023, with an average annual growth rate of about 20% since 2012. However, as wind power penetration increases, the inherent uncertainty and intermittency of wind power generation pose growing challenges to power system operation, including generation scheduling, spinning reserve, demand response and economic dispatch [3,4].

For provincial power grids, system operators pay more attention to the total output of regional wind power in order to ensure the safe and stable operation of the power system and to fully absorb clean energy. For this reason, regional wind power forecasting models have received increasing research attention. Among them, probabilistic prediction provides more comprehensive uncertainty information and has become an effective means of quantifying the randomness and intermittency of wind power [5]. Probabilistic predictions in the form of intervals [6,7,8] are more intuitive and have been widely applied in power systems [9,10,11]. Therefore, research on interval forecasting of provincial regional wind power (IFPRWP) is imperative.

At present, there are two challenges in constructing an efficient IFPRWP model. The first is the challenge of modeling massive objects. Region-level forecasting usually involves historical power generation information from multiple power stations and the corresponding high-dimensional meteorological information. Collecting historical power data from all power stations is difficult and time-consuming, possibly owing to factors such as confidentiality agreements. Moreover, with multi-dimensional meteorological variables for dozens or even hundreds of wind farms, it is difficult to collect and process the preliminary data and to construct, train and validate so many individual wind farm forecasting models. The second is the challenge of generating prediction intervals that balance accuracy and sharpness, which requires efficiently capturing the complex, high-dimensional, non-linear mapping between regional meteorological information, historical output and total regional wind power.

For the massive-object modeling challenge, the main existing methods are Direct Aggregation (DA) and Statistical Uplifting (SU). DA predicts each power station in the region individually and then directly sums the predictions. It is rarely used because of its high computational cost, and its accuracy is affected by individual power plants in the region. SU usually selects one or more representative wind farms based on their correlation with the regional total power and extrapolates the regional total power prediction from the predictions for these representative wind farms. Its extrapolation methods include installed-capacity proportionality [12,13] and deep learning techniques [14,15]. Although SU only needs to model representative power stations, it is essentially consistent with DA in that both are bottom-up approaches, i.e., they extrapolate gradually from individual power station predictions to the overall regional prediction. Both require considerable work in data collection, processing and validation, and the required data may not even be obtainable at the collection stage because of confidentiality agreements and similar constraints.

In addition to DA and SU, some studies have attempted to build models that directly predict the total regional power generation [16,17]. The idea is to take the meteorological and power information from all power stations as an input vector and directly predict the total power output of the region. However, as the region and the number of power stations expand, the number of relevant features grows dramatically, and the heavy data burden remains unavoidable. In recent years, with the wide application of convolutional neural networks (CNNs) in image recognition, using image-form features as inputs to prediction models has gradually become an efficient approach [16,18,19]. Yildiz et al. [20] rearranged decomposed features into an RGB image as input to a modified residual convolutional neural network for ultra-short-term wind power prediction, and Xu et al. [21] constructed a hybrid LSTM-InformerStack model for fine-grained multi-step irradiance forecasting based on all-sky images. Wang et al. [22] concatenated features of different time scales as input to a convolution kernel and then used multilayer convolution for probabilistic power system load forecasting. However, each of these studies targets a single object, and if image features were generated by their methods for IFPRWP, the spatial correlation of wind power in the region, which is critical for IFPRWP, could not be considered. Therefore, simpler and more efficient mechanisms are needed to generate images that capture the spatial features of the power stations, together with new regional prediction frameworks based on holistic modeling.

For the challenge of generating probabilistic prediction intervals that balance accuracy and sharpness, current methods are mainly categorized as parametric or nonparametric. Parametric methods are usually based on probability distribution assumptions, such as Gaussian [23] and beta [24,25] distributions. Parametric methods usually involve multiple stages, and errors arising from the distributional assumptions are inevitable. Therefore, many scholars have developed nonparametric prediction frameworks that directly obtain interval prediction results. Wan et al. [26] converted the mapping between historical power and predicted power quantiles into a linear optimization model based on the Extreme Learning Machine to directly generate different quantiles. On this basis, Zhang et al. [11] constructed an Extreme Learning Machine-based multi-objective optimization problem to directly generate day-ahead tariff intervals that balance reliability and sharpness requirements. However, these studies only considered temporal features and relied on the specific single-layer structure of the Extreme Learning Machine, so the strategy cannot be efficiently extended to other deep learning frameworks. Huang et al. [27] first classified the 48 PV power plants in the region into different regional weather patterns and then performed quantile regression based on these patterns to predict seasonal power generation at the regional level. However, that study only focuses on the spatial meteorological distribution characteristics of PV power plants and ignores the temporal characteristics of regional power.

Based on the above discussion, this paper proposes a lightweight probabilistic forecasting method for wind power in provincial areas considering spatio-temporal features. The main contributions of the work in this paper are as follows:

(1): Based on a fusion mechanism of geographic and meteorological information, redundant power station meteorological features are eliminated and meteorological feature images are generated, which helps the forecasting model focus on relevant features. Meanwhile, considering the smoothing effect, the aggregated historical power is used as a model input together with the generated meteorological images.
(2): To consider both the temporal features of regional wind power generation and the spatial meteorological features of the distributed power stations in the region, this paper designs a prediction network architecture with CNN and LSTM in parallel. The upper layer of this architecture extracts the spatial meteorological features of the image through a CNN module incorporating the ECA attention mechanism, and the lower layer extracts the time-series features of the historical power through LSTM.
(3): Building on the original quantile loss function, a new loss function is constructed by adding penalty coefficients for interval prediction; this function can be flexibly combined with various complex network architectures to effectively improve the prediction performance of the model.

The rest of the paper is organized as follows: Section 2 focuses on the proposed forecasting framework, and Section 3 is the case study section. Section 4 is the results discussion section, which validates the effectiveness of the proposed methodology. Finally, Section 5 is the conclusion.

2. Proposed Method

The overall forecasting framework of this paper is shown in Figure 1. The provincial regional meteorological variables, downloaded from the European Centre for Medium-Range Weather Forecasts (ECMWF, https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview accessed on 30 January 2025), are reconstructed into feature images by an image generation method and input into the upper-layer CNN network. The hourly historical series of regional wind power obtained from the grid dispatch agency is input into the lower-layer LSTM network as a one-dimensional vector. Then, an improved quantile loss function is used in training to directly predict the two quantile values that form the interval bounds.

The architecture consists of 3 convolutional layers, 2 pooling layers, 2 fully connected layers, 1 feature fusion layer, and an ECA layer. The convolutional layers use the ReLU activation function, and the fully connected layers use the Sigmoid activation function. The ECA module is placed after the convolutional layer; its allocation of channel attention effectively enhances the learning ability of the network.

2.1. Feature Image Generation Based on Geographic and Meteorological Information

The process of feature image generation is shown in Figure 2. Based on the original spatial distribution of power stations in the provincial region, the image features are generated by quickly scanning and filling the regional source meteorological files, which effectively amplifies the features relevant to the power stations in the region.

Since weather information is gridded and is consistent over a small area, all wind power stations are first clustered by nearest-neighbor partitioning, i.e., each station is assigned to its nearest meteorological grid point. The clustering process is as follows.

For each wind power station $\mathrm{wind}_{i}$ in the region, the Euclidean distance to each meteorological grid point $g_{j}$ is calculated:

L_{p}(\mathrm{wind}_{i}, g_{j}) = \left[(\mathrm{wind}_{i,\mathrm{long}} - g_{j,\mathrm{long}})^{2} + (\mathrm{wind}_{i,\mathrm{lat}} - g_{j,\mathrm{lat}})^{2}\right]^{\frac{1}{2}}

(1)

where $\mathrm{wind}_{i,\mathrm{long}}$ and $\mathrm{wind}_{i,\mathrm{lat}}$ denote the longitude and latitude of wind power station $\mathrm{wind}_{i}$, and $g_{j,\mathrm{long}}$ and $g_{j,\mathrm{lat}}$ denote the longitude and latitude of meteorological grid point $g_{j}$, respectively.

Based on this calculation, the distances between the wind power station and all grid points in the meteorological grid point set $G$ are obtained, and the following formula determines the grid point to which the wind power station belongs:

g^{*}(\mathrm{wind}_{i}) = \underset{g_{j} \in G}{\arg\min}\, L_{p}(\mathrm{wind}_{i}, g_{j})

(2)
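The nearest-grid-point assignment of Equations (1) and (2) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `assign_to_grid` and the (longitude, latitude) array layout are hypothetical, and distances are planar Euclidean distances in degrees, exactly as Equation (1) states.

```python
import numpy as np


def assign_to_grid(stations, grid_points):
    """Assign each station to its nearest meteorological grid point (Eqs. (1)-(2)).

    stations:    (S, 2) array of (longitude, latitude) in degrees.
    grid_points: (G, 2) array of (longitude, latitude) in degrees.
    Returns an (S,) array with the index of the nearest grid point per station.
    """
    # Pairwise coordinate differences, shape (S, G, 2)
    diff = stations[:, None, :] - grid_points[None, :, :]
    # Planar Euclidean distance in degrees, as in Eq. (1)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Eq. (2): the closest grid point wins
    return dist.argmin(axis=1)
```

For the study-area grid (103° E–110° E, 24° N–29° N at 0.25°), the grid points can be built with `np.meshgrid` and flattened row by row, so that index `r * 29 + c` refers to latitude row `r` and longitude column `c`.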

After all power stations have been assigned to their corresponding grid points, the spatial distribution matrix $D^{m \times n}$ can be obtained:

D^{m \times n} = \begin{pmatrix} D_{m1} & \cdots & D_{mn} \\ \vdots & \ddots & \vdots \\ D_{11} & \cdots & D_{1n} \end{pmatrix}

(3)

where $m$ and $n$ are the numbers of latitude and longitude grid points included, respectively. For example, the study area spans longitudes 103° E–110° E and latitudes 24° N–29° N with a resolution of 0.25°; to ensure that the meteorological grid points cover the study area, $m = 21$ and $n = 29$. Note that the elements of $D^{m \times n}$ correspond one-to-one to the meteorological grid points. The value of each element is determined by Equation (2): if at least one wind power station is assigned by Equation (2) to the corresponding meteorological grid point, the element is 1; otherwise, it is 0.
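Because the meteorological grid is regular, the argmin of Equation (2) reduces to rounding each coordinate to the nearest grid index, which makes building $D^{m \times n}$ cheap. The sketch below assumes station coordinates in degrees and a grid anchored at its south-west corner (103° E, 24° N for the study area); `distribution_matrix` is a hypothetical helper name. Row 0 is the southernmost latitude, so `np.flipud(D)` gives the north-up layout of Equation (3).

```python
import numpy as np


def distribution_matrix(stations, lon_min, lat_min, res, m, n):
    """Binary spatial distribution matrix D of Eq. (3), shape (m, n).

    Row r corresponds to latitude lat_min + r*res and column c to longitude
    lon_min + c*res. D[r, c] = 1 if at least one station is assigned to that
    grid point, otherwise 0.
    """
    D = np.zeros((m, n), dtype=np.int8)
    for lon, lat in stations:
        # On a regular rectangular grid, the nearest point in Euclidean
        # distance is found by rounding each coordinate independently
        # (clipped so that stations just outside the area map to the edge).
        c = int(np.clip(np.rint((lon - lon_min) / res), 0, n - 1))
        r = int(np.clip(np.rint((lat - lat_min) / res), 0, m - 1))
        D[r, c] = 1
    return D
```

Several stations falling on the same grid point still produce a single 1, matching the binary definition of $D^{m \times n}$.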

Next, the corresponding regional meteorological data are downloaded from the ECMWF. Taking wind speed as an example, the data take the following form:

M_{u100}^{m \times n} = \begin{pmatrix} M_{m1} & \cdots & M_{mn} \\ \vdots & \ddots & \vdots \\ M_{11} & \cdots & M_{1n} \end{pmatrix}

(4)

The final image is obtained by taking the Hadamard product of the spatial distribution matrix $D^{m \times n}$ and the wind speed matrix $M_{u100}^{m \times n}$:

Γ^{m \times n} = D^{m \times n} \odot M_{u100}^{m \times n} = \begin{pmatrix} D_{m1} \times M_{m1} & \cdots & D_{mn} \times M_{mn} \\ \vdots & \ddots & \vdots \\ D_{11} \times M_{11} & \cdots & D_{1n} \times M_{1n} \end{pmatrix} = \begin{pmatrix} Γ_{m1} & \cdots & Γ_{mn} \\ \vdots & \ddots & \vdots \\ Γ_{11} & \cdots & Γ_{1n} \end{pmatrix}

(5)

This operation keeps meteorological values only at pixels that contain wind power stations and sets all other pixels to zero, which can essentially be interpreted as amplifying the features of interest. The above process is repeated for each meteorological variable to obtain its meteorological image, and the images are stacked into a multi-channel meteorological image that serves as the input to the convolutional layers.
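Equation (5) and the channel stacking amount to an element-wise mask followed by a stack. A minimal NumPy sketch, assuming a channels-last layout (consistent with the 21 × 29 × 12 input size reported in Section 3.2) and a hypothetical helper name `feature_image`:

```python
import numpy as np


def feature_image(D, met_fields):
    """Mask meteorological fields with D (Eq. (5)) and stack them as channels.

    D:          (m, n) binary spatial distribution matrix.
    met_fields: sequence of (m, n) arrays, one per meteorological variable.
    Returns an (m, n, V) image in which pixels without stations are zero.
    """
    # Hadamard product for every variable, stacked along a trailing channel axis
    return np.stack([D * M for M in met_fields], axis=-1)
```

With 12 meteorological variables on the 21 × 29 grid, the result has shape (21, 29, 12).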

2.2. Extraction of Spatial Distribution Feature

A CNN extracts effective image features with sparse connectivity and shared weights by combining convolutional and pooling layers. The convolution process can be described as follows: $Q$ convolution kernels scan the input weather image with a fixed stride to perform the convolution operation, the corresponding bias vector $b^{conv}$ is added, the result passes through the ReLU activation function, and $Q$ feature maps are output. To ensure that edge information is not lost, edge padding is generally adopted. The output of the convolutional layer can be expressed as follows:

a^{conv} = \mathrm{ReLU}\left(b^{conv} + \sum_{i \in M} m_{i} * q\right)

(6)

where $q$ is the convolution kernel, whose number of channels equals that of the input image, and $m_{i}$ is the local region of the original input image with the same size as the convolution kernel.
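To make Equation (6) concrete, the following naive loop computes a zero-padded ("same") multi-channel convolution plus bias and ReLU. It is an illustrative sketch only: like deep learning frameworks, it computes cross-correlation (the kernel is not flipped), the stride and the helper name `conv2d_relu` are assumptions, and a real model would use an optimized framework layer instead.

```python
import numpy as np


def conv2d_relu(image, kernels, bias, stride=1):
    """Naive 'same'-padded multi-channel convolution followed by ReLU (Eq. (6)).

    image:   (H, W, C_in) input feature image.
    kernels: (Q, kh, kw, C_in) Q kernels, each spanning all input channels.
    bias:    (Q,) bias vector b^conv.
    Returns (H_out, W_out, Q) feature maps.
    """
    Q, kh, kw, _ = kernels.shape
    ph, pw = kh // 2, kw // 2
    # Zero padding so that edge pixels are still covered by the kernel
    padded = np.pad(image, ((ph, ph), (pw, pw), (0, 0)))
    H, W = image.shape[:2]
    out_h, out_w = (H - 1) // stride + 1, (W - 1) // stride + 1
    out = np.empty((out_h, out_w, Q))
    for r in range(out_h):
        for c in range(out_w):
            # Local patch m_i with the same size as the kernel
            patch = padded[r * stride:r * stride + kh, c * stride:c * stride + kw, :]
            # Sum of elementwise products over height, width and channels, per kernel
            out[r, c] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return np.maximum(out, 0.0)  # ReLU
```

With three 3 × 3 kernels of ones on an all-ones two-channel image, an interior pixel sums 18 values, while a corner pixel sees only 8 because of the zero padding.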

After the convolutional layer, its output is down-sampled by the pooling layer:

a^{pool} = f_{down}(a^{conv}, m^{pool})

(7)

Here, $f_{down}(\cdot)$ is the pooling function, which is usually average pooling or maximum pooling, and $m^{pool}$ denotes the pooling block.
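Equation (7) with non-overlapping blocks can be written with a single reshape, as sketched below. The 2 × 2 block size and the helper name `pool2d` are assumptions for illustration; the text does not state the pooling size.

```python
import numpy as np


def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling f_down over (H, W, C) feature maps (Eq. (7)).

    Trailing rows/columns that do not fill a whole block are dropped,
    as in 'valid' pooling.
    """
    H, W, C = fmap.shape
    h, w = H // size, W // size
    # Split the cropped map into (h, size, w, size, C) blocks
    blocks = fmap[:h * size, :w * size].reshape(h, size, w, size, C)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'avg'")
```

With 2 × 2 blocks, a 21 × 29 map becomes 10 × 14, since the odd last row and column are discarded.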

However, not all meteorological information benefits wind power prediction, and redundant features may degrade model performance while increasing the computational burden. To this end, we introduce the ECA mechanism [28], which efficiently realizes information interaction between channels, resolves the trade-off between performance and complexity, and involves only a small number of parameters while bringing significant performance gains. The structure of ECA is shown in Figure 3.

The ECA module is efficient in two main ways. First, it only considers the interaction between channel $l_{i}$ and its $k$ neighboring channels, and its weights are calculated as follows:

ω_{i} = σ\left(\sum_{j = 1}^{k} w_{i}^{j} l_{i}^{j}\right), \quad l_{i}^{j} \in Ω_{i}^{k}

(8)

where $σ$ is the sigmoid function and $Ω_{i}^{k}$ denotes the set of $k$ neighboring channels of $l_{i}$.

Second, it makes all channels share the same learning parameters $w$ through a fast 1D convolution with kernel size $k$, which greatly improves efficiency and can be expressed as follows:

ω = σ\left(\mathrm{C1D}_{k}(y)\right)

(9)

where $\mathrm{C1D}$ denotes 1D convolution and $y$ is the channel descriptor obtained by global average pooling of the feature map. The module involves only $k$ parameters.
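A forward pass of the ECA module (Equation (9)) can be sketched in NumPy as follows: global average pooling yields the channel descriptor $y$, a zero-padded 1D convolution of kernel size $k$ mixes each channel with its neighbours, and a sigmoid produces the weights $ω$ that rescale the channels. The weights `w` are supplied by hand here (in a trained network they are learned), and the helper name `eca` is hypothetical.

```python
import numpy as np


def eca(fmap, w):
    """Efficient channel attention on an (H, W, C) feature map (Eq. (9)).

    w: (k,) weights of the 1D convolution shared by all channels, k odd.
    Returns the rescaled feature map and the channel weights omega.
    """
    k = w.shape[0]
    # Channel descriptor y: global average pooling over the spatial dimensions
    y = fmap.mean(axis=(0, 1))                        # shape (C,)
    # 1D convolution along the channel axis with zero padding, so each
    # channel only interacts with its k neighbours (itself included)
    y_pad = np.pad(y, (k // 2, k // 2))
    z = np.array([y_pad[i:i + k] @ w for i in range(y.shape[0])])
    omega = 1.0 / (1.0 + np.exp(-z))                  # sigmoid
    # Rescale every channel by its attention weight
    return fmap * omega, omega
```

Only the `k` entries of `w` are parameters, regardless of the number of channels, which is the efficiency argument made above.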

One problem remains: the range of cross-channel interaction (i.e., the kernel size $k$ of the 1D convolution) needs to be determined.

It is reasonable for the coverage of interaction to be proportional to the channel dimension $C$. For this purpose, a non-linear mapping is introduced:

C = φ(k) = 2^{(γ k - b)}

(10)

Then, given the channel dimension $C$, the kernel size $k$ can be adaptively determined by the following equation:

k = ψ(C) = \left|\frac{\log_{2}(C)}{γ} + \frac{b}{γ}\right|_{odd}

(11)

where $|t|_{odd}$ denotes the nearest odd number to $t$, and $γ$ and $b$ are set to 2 and 1, respectively.
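Equation (11) is a one-line computation. The sketch below follows the rounding used in the publicly released ECA-Net code as we understand it (truncate to an integer, then bump even values to the next odd number), which agrees with "nearest odd" for the channel counts that occur here; the helper name is hypothetical.

```python
import math


def eca_kernel_size(C, gamma=2, b=1):
    """Adaptive 1D kernel size k for channel dimension C (Eq. (11))."""
    t = int(abs((math.log2(C) + b) / gamma))
    return t if t % 2 else t + 1
```

For the 32-filter convolutional layers used in Section 3.2, (log₂ 32 + 1)/2 = 3, so k = 3; for the 12-channel input image, k is also 3.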

2.3. Extraction of Temporal Feature

In addition to meteorological factors, historical wind power generation is also a key factor influencing the power to be predicted. Taking regional smoothing effects into account, we aggregate the historical output of the power stations in the region into a single series. This series is usually readily available from dispatch institutions, thus avoiding the cumbersome process of collecting data from individual power stations.

LSTM effectively addresses the long-term dependency problem of recurrent neural networks during training [29,30] and is commonly used to extract features from time series. Here, we use LSTM to extract the temporal features of the aggregated historical power.
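The aggregation and the windowing that feeds the LSTM can be sketched as follows. The per-station input array is hypothetical (dispatch centres usually supply the regional total directly), the 96-step look-back is taken from Section 3.2, and the text does not specify how each window is aligned with its day-ahead target, so target construction is left out of this sketch.

```python
import numpy as np


def aggregate_and_window(station_power, lookback=96):
    """Sum station outputs into one regional series and cut LSTM input windows.

    station_power: (T, S) hourly output of S stations (hypothetical input).
    Returns an array of shape (T - lookback + 1, lookback, 1); each sample
    holds `lookback` consecutive hours of regional power.
    """
    regional = station_power.sum(axis=1)              # regional aggregation
    windows = np.lib.stride_tricks.sliding_window_view(regional, lookback)
    return windows[..., np.newaxis]                   # (samples, time steps, features)
```

The trailing feature axis matches the (samples, time steps, features) layout that recurrent layers in common frameworks expect.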

2.4. Loss Function

The essence of the neural network regression problem is an optimization problem on a training set, where the decision variables are the parameters of the neural network, and for deterministic prediction, the loss function can take the following form:

\min \frac{1}{N} \sum_{i = 1}^{N} (y_{p,i} - y_{i})^{2}

(12)

where $y_{p,i}$ is the predicted power value and $y_{i}$ is the observed actual power value of sample $i$. For probabilistic interval prediction, the interval $[q_{i}^{\underline{α}}, q_{i}^{\bar{α}}]$ can be constructed from the quantile predictions $q_{i}^{\underline{α}}$ and $q_{i}^{\bar{α}}$, which satisfy the following relationship:

P (y_{i} \in [q_{i}^{\underline{α}}, q_{i}^{\bar{α}}]) = 1 - β (β \in [0, 1])

(13)

\underline{α} = 1 - \bar{α} = \frac{β}{2}

(14)

where $\bar{α}$ and $\underline{α}$ are the quantile levels of the upper and lower bounds of the prediction interval, respectively, and $(1 - β)$ denotes the confidence level of the interval. The corresponding optimization objective is as follows:

\min \sum_{i = 1}^{N} \left[ρ_{\bar{α}}(y_{i} - q_{i}^{\bar{α}}) + ρ_{\underline{α}}(y_{i} - q_{i}^{\underline{α}})\right]

(15)

where $ρ$ denotes the pinball loss function, defined as follows:

ρ_{α}(τ) = \begin{cases} α τ, & τ \geq 0 \\ (α - 1) τ, & τ < 0 \end{cases}

(16)
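Equations (14)–(16) translate directly into a few NumPy functions. The sketch below is for illustration only and the function names are hypothetical; in training, the same expressions would be written with the deep learning framework's tensor operations so that gradients can flow.

```python
import numpy as np


def quantile_levels(beta):
    """Lower and upper quantile levels for a (1 - beta) central interval, Eq. (14)."""
    return beta / 2, 1 - beta / 2


def pinball(tau, alpha):
    """Pinball loss rho_alpha(tau) of Eq. (16), with tau = y - q."""
    return np.where(tau >= 0, alpha * tau, (alpha - 1) * tau)


def interval_pinball_loss(y, q_low, q_up, beta):
    """Original quantile-regression objective for one interval, Eq. (15)."""
    a_low, a_up = quantile_levels(beta)
    return np.sum(pinball(y - q_up, a_up) + pinball(y - q_low, a_low))
```

For a 90% PINC, `quantile_levels(0.1)` gives the [0.05, 0.95] pair used in Section 4.1.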

Based on the above two equations, Equation (15) can then be expanded as follows:

\begin{cases} \min \sum_{i = 1}^{N} \left[(1 - \bar{α}) |y_{i} - q_{i}^{\bar{α}}| + \underline{α} |y_{i} - q_{i}^{\underline{α}}|\right], & q_{i}^{\underline{α}} < y_{i} < q_{i}^{\bar{α}} \\ \min \sum_{i = 1}^{N} \left[\bar{α} |y_{i} - q_{i}^{\bar{α}}| + \underline{α} |y_{i} - q_{i}^{\underline{α}}|\right], & y_{i} > q_{i}^{\bar{α}} \\ \min \sum_{i = 1}^{N} \left[(1 - \bar{α}) |y_{i} - q_{i}^{\bar{α}}| + (1 - \underline{α}) |y_{i} - q_{i}^{\underline{α}}|\right], & y_{i} < q_{i}^{\underline{α}} \\ \min \sum_{i = 1}^{N} \left[\bar{α} |y_{i} - q_{i}^{\bar{α}}| + (1 - \underline{α}) |y_{i} - q_{i}^{\underline{α}}|\right], & q_{i}^{\bar{α}} < y_{i} < q_{i}^{\underline{α}} \end{cases}

(17)

Typically, the lower bound corresponds to a quantile level $\underline{α}$ less than 0.5 and the upper bound to a quantile level $\bar{α}$ greater than 0.5, which means that $1 - \underline{α} > \underline{α}$ and $1 - \bar{α} < \bar{α}$. Since the objective is minimized, the optimization is driven toward the case with smaller weights, i.e., the objective function tends toward the first case.

As stated above, the true value should satisfy $q_{i}^{\underline{α}} < y_{i} < q_{i}^{\bar{α}}$. In practice, however, the observed values do not always fall within the predicted intervals, so we introduce a penalty coefficient to widen the gap between the weights. The improved objective function is expanded as follows:

\begin{cases} \min \sum_{i = 1}^{N} \left[(1 - \bar{α}) |y_{i} - q_{i}^{\bar{α}}| + \underline{α} |y_{i} - q_{i}^{\underline{α}}|\right], & q_{i}^{\underline{α}} < y_{i} < q_{i}^{\bar{α}} \\ \min \sum_{i = 1}^{N} \left[λ \bar{α} |y_{i} - q_{i}^{\bar{α}}| + \underline{α} |y_{i} - q_{i}^{\underline{α}}|\right], & y_{i} > q_{i}^{\bar{α}} \\ \min \sum_{i = 1}^{N} \left[(1 - \bar{α}) |y_{i} - q_{i}^{\bar{α}}| + λ (1 - \underline{α}) |y_{i} - q_{i}^{\underline{α}}|\right], & y_{i} < q_{i}^{\underline{α}} \\ \min \sum_{i = 1}^{N} \left[λ \bar{α} |y_{i} - q_{i}^{\bar{α}}| + λ (1 - \underline{α}) |y_{i} - q_{i}^{\underline{α}}|\right], & q_{i}^{\bar{α}} < y_{i} < q_{i}^{\underline{α}} \end{cases}

(18)

The penalty coefficient $λ$ significantly increases the value of the objective function in the second, third and fourth cases, which forces the optimization toward $q_{i}^{\underline{α}} < y_{i} < q_{i}^{\bar{α}}$.
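Equation (18) can be implemented compactly by noting that $λ$ multiplies the upper-bound term only when $y_{i} > q_{i}^{\bar{α}}$ and the lower-bound term only when $y_{i} < q_{i}^{\underline{α}}$; all four cases then follow automatically. A NumPy sketch under that reading (the helper name is hypothetical):

```python
import numpy as np


def improved_interval_loss(y, q_low, q_up, beta, lam):
    """Penalised quantile objective of Eq. (18).

    The upper-bound term is multiplied by lam when y > q_up and the
    lower-bound term when y < q_low; inside the interval the loss is Eq. (15).
    """
    a_low, a_up = beta / 2, 1 - beta / 2
    tau_up, tau_low = y - q_up, y - q_low
    # Pinball weights written with absolute values, as in Eqs. (17)-(18)
    up = np.where(tau_up > 0, lam * a_up, 1 - a_up) * np.abs(tau_up)
    low = np.where(tau_low < 0, lam * (1 - a_low), a_low) * np.abs(tau_low)
    return np.sum(up + low)
```

With `lam=1` the function reduces to the original objective of Equation (15), and observations inside the interval are never penalized.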

3. Case Study

3.1. Study Area

This paper investigates wind power prediction for a provincial grid in southern China that contains a total of 75 wind farms. As shown in Figure 4, meteorological data were obtained from publicly available ECMWF datasets. Following related studies [31,32,33], 12 meteorological variables were downloaded for IFPRWP for the whole year of 2020. Figure 5 shows the meteorological images generated at a given time point.

3.2. Model Setting

We use 80% of the original data as the training set and 20% as the test set. Normalization scales features of different magnitudes to between 0 and 1, which is necessary for network training. In this paper, min–max normalization is adopted, with the following formula:

Z = \frac{X - X_{\min}}{X_{\max} - X_{\min}}

(19)

where $X$, $X_{\max}$ and $X_{\min}$ are the original feature value and the maximum and minimum of the original feature values, respectively.
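The split and Equation (19) can be sketched as below. Two assumptions are made that the text does not state: the 80/20 split is chronological, and the per-feature minimum and maximum are computed from the training part only, so that no test information leaks into scaling (test values may then fall slightly outside [0, 1]). The helper name is hypothetical.

```python
import numpy as np


def train_test_minmax(X, train_frac=0.8):
    """Chronological split followed by min-max scaling (Eq. (19)).

    X: (T, F) array with time along the first axis.
    Returns scaled train and test parts and the (min, max) used.
    """
    n_train = int(len(X) * train_frac)
    train, test = X[:n_train], X[n_train:]
    x_min, x_max = train.min(axis=0), train.max(axis=0)
    # Guard against constant features to avoid division by zero
    scale = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (train - x_min) / scale, (test - x_min) / scale, (x_min, x_max)
```

Keeping `(x_min, x_max)` allows model outputs in normalized units to be mapped back to power values.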

In the upper layer of the designed network, the constructed image features of size 21 × 29 × 12 are normalized and fed into the convolutional layers. There are three convolutional layers, each containing 32 filters with a kernel size of 3 × 3. The ECA attention module embedded in the convolutional layers acts on the channel dimension of the input image and can be loosely understood as a selection process for weather features. In the lower layer of the designed network, the historical power is fed into the LSTM network as a vector with a time step parameter of 96, which means that the historical power of the previous four days is taken into account for each sample. The features extracted by the upper and lower layers are concatenated into a one-dimensional vector and fed into the fully connected layers, which consist of two layers with output dimensions of 128 × 1 and 64 × 1. A dropout layer is added after each fully connected layer to prevent overfitting. The Adam optimizer is used for training together with the loss function proposed in Equation (18).
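The layer sizes in this paragraph can be sanity-checked with a small shape trace. The ordering conv, pool, conv, pool, conv, ECA, the "same" padding, the 2 × 2 pooling, the 64 LSTM units and the two-value output are assumptions chosen for illustration (Section 2 lists three convolutional layers, two pooling layers and one ECA layer but not their order); this is not the authors' configuration, and `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(h=21, w=29, channels=12, filters=32, lstm_units=64,
                 order=("conv", "pool", "conv", "pool", "conv", "eca")):
    """Trace tensor shapes through one plausible CNN-LSTM layout (hypothetical).

    Assumes 'same' padding (3 x 3 convolutions keep h and w), 2 x 2 pooling
    that drops odd remainders, and an ECA block that only rescales channels.
    lstm_units is a placeholder: the text does not report the LSTM size.
    """
    shapes = [("input image", (h, w, channels))]
    for layer in order:
        if layer == "conv":
            channels = filters
        elif layer == "pool":
            h, w = h // 2, w // 2
        shapes.append((layer, (h, w, channels)))
    fused = h * w * channels + lstm_units   # flattened CNN features + LSTM output
    shapes += [("concat", (fused,)), ("dense", (128,)), ("dense", (64,)),
               ("output", (2,))]            # lower and upper quantile
    return shapes


for name, shape in trace_shapes():
    print(f"{name:12s} {shape}")
```

Under these assumptions the CNN branch flattens to 5 × 7 × 32 = 1120 values before fusion; dropout layers do not change shapes and are omitted.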

The hyperparameters of the above neural network are determined by a simple and practical grid search. The experiments are implemented with TensorFlow 2.10.0 and Python 3.9, and computations were run on the NVIDIA GeForce RTX 4050 Laptop GPU of a personal laptop.

3.3. Evaluation Metrics

The interval prediction evaluation metrics used are the prediction interval coverage probability (PICP), the prediction interval average width (PIAW) and the Winkler score (WS). PICP and PIAW are defined as follows:

\mathrm{PICP} = 100 \times \frac{1}{N} \sum_{i = 1}^{N} C_{i}

(20)

C_{i} = \begin{cases} 1, & y_{i} \in [q_{i}^{\underline{α}}, q_{i}^{\bar{α}}] \\ 0, & y_{i} \notin [q_{i}^{\underline{α}}, q_{i}^{\bar{α}}] \end{cases}

(21)

η_{i}^{α} = q_{i}^{\bar{α}} - q_{i}^{\underline{α}}

(22)

\mathrm{PIAW} = \frac{1}{N} \sum_{i = 1}^{N} η_{i}^{α}

(23)

where $N$ is the total number of samples in the test set, $C_{i}$ indicates whether the true value falls within the prediction interval and takes the value 1 or 0 as determined by Equation (21), and $η_{i}^{α}$ denotes the interval width of sample $i$.
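Equations (20)–(23) can be sketched in NumPy as follows; the helper names are hypothetical, and the interval is treated as closed, as in Equation (21).

```python
import numpy as np


def picp(y, q_low, q_up):
    """Prediction interval coverage probability in percent, Eqs. (20)-(21)."""
    covered = (y >= q_low) & (y <= q_up)   # C_i for a closed interval
    return 100.0 * covered.mean()


def piaw(q_low, q_up):
    """Prediction interval average width, Eqs. (22)-(23)."""
    return np.mean(q_up - q_low)
```

PIAW is expressed in the same units as the power values, so it should be compared on the same (normalized or de-normalized) scale across models.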

The WS is calculated as follows:

\mathrm{WS}_{i}^{α} = -2 α η_{i}^{α} - 4\left[q_{i}^{\underline{α}} - y_{i}\right] \times I\{y_{i} < q_{i}^{\underline{α}}\} - 4\left[y_{i} - q_{i}^{\bar{α}}\right] \times I\{y_{i} > q_{i}^{\bar{α}}\}

(24)

\mathrm{WS}^{α} = \frac{1}{N} \sum_{i = 1}^{N} \mathrm{WS}_{i}^{α}

(25)

where $I\{\cdot\}$ is the indicator function, which equals 1 when the condition in braces holds and 0 otherwise, and $η_{i}^{α}$ is the interval width defined in Equation (22). For WS, a larger value indicates better prediction performance.
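Equation (24) has the form of the standard interval score multiplied by $-2α$, in which $α$ is the nominal miscoverage rate. The sketch below therefore assumes that $α$ in Equations (24) and (25) equals $β$ of Equation (13), e.g., 0.1 for a 90% PINC; the helper name is hypothetical.

```python
import numpy as np


def winkler_score(y, q_low, q_up, alpha):
    """Mean Winkler score of Eqs. (24)-(25); larger (closer to 0) is better.

    alpha is assumed to be the nominal miscoverage rate (0.1 for a 90% PINC).
    """
    width = q_up - q_low                                  # eta_i
    ws = (-2 * alpha * width
          - 4 * (q_low - y) * (y < q_low)                 # penalty below the interval
          - 4 * (y - q_up) * (y > q_up))                  # penalty above the interval
    return ws.mean()
```

The score rewards narrow intervals through the first term and penalizes each miss in proportion to how far the observation lies outside the bounds.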

The deterministic prediction evaluation metrics include the root mean square error (RMSE), the mean absolute error (MAE) and the coefficient of determination R².
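The text does not spell out these deterministic metrics, so the sketch below uses their standard definitions (an assumption); `deterministic_metrics` is a hypothetical helper.

```python
import numpy as np


def deterministic_metrics(y, y_hat):
    """RMSE, MAE and coefficient of determination R^2 (standard definitions)."""
    err = y_hat - y
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return rmse, mae, r2
```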

4. Discussion

In this section, the proposed model is first compared with the benchmark model to demonstrate its excellent performance in both interval prediction and deterministic prediction. In addition, experiments on the effectiveness analysis of the loss function and the sensitivity analysis of the penalty coefficients are conducted and the effectiveness of the ECA module is verified.

4.1. Interval Prediction Results

To verify the performance of the proposed interval prediction model, it is compared with other models. The comparison models include the commonly used time-series models TCN [34], GRU [35] and ANN [36]; the parametric interval prediction model BELM [26]; and CNN, LSTM [37] and QR-LIFF. BELM, i.e., an ELM based on the bootstrap method, generates prediction intervals under the assumption of a normal distribution. The inputs to the benchmark models are all in vector form. QR-LIFF, the lightweight interval forecasting framework based on quantile regression, adopts the original interval prediction loss function, whereas IQR-LIFF is the proposed lightweight interval forecasting framework with improved quantile regression. Because the recursive strategy leads to error accumulation while the direct prediction method is more stable and performs better [38], the direct prediction method is used for day-ahead prediction. A prediction interval nominal confidence (PINC) of 90% is constructed from the quantile interval [0.05, 0.95]; similarly, PINCs of 80% and 70% are constructed from the quantile intervals [0.1, 0.9] and [0.15, 0.85], respectively. The comparison results are shown in Table 1.

Table 1 shows that the interval generation method proposed in this paper performs best. ANN, TCN, GRU and LSTM perform similarly, and the TCN model has a PICP of 85.749% at a nominal coverage of 90%, which is higher than that of the proposed IQR-LIFF model. However, the PIAW of TCN is 33.2% higher than that of IQR-LIFF, as is evident in Figure 6. TCN sacrifices sharpness for reliability, and it is still worse than the IQR-LIFF model on the comprehensive WS metric.

As can be seen in Figure 6, the intervals of the parametric BELM method lack a clear separation between the upper and lower bounds. Its intervals are more aggressive and lack reliability. The CNN model is the most aggressive, with the smallest PIAW. The CNN and QR-LIFF models have similar sharpness, but QR-LIFF improves coverage by about 30% without changing the sharpness.

The IQR-LIFF model proposed in this paper demonstrates good coverage and high sharpness in Figure 6. This indicates that the designed parallel framework of CNN and LSTM effectively extracts spatial meteorological features and temporal features, which in turn improves the accuracy of interval prediction.

4.2. Validation of the Effectiveness of the Loss Function

Table 1 shows that, although the QR-LIFF model performs relatively well on the composite WS metric, its PICP deviates from the PINC at all levels. Figure 7 also shows that the true values fall outside the prediction intervals of the QR-LIFF model during power ramps and sharp drops, whereas the intervals predicted by the IQR-LIFF model better cover the fluctuation range of the true values in all periods. The proposed model also performs well in a prolonged low-wind-power scenario from 5 October to 9 October. These comparisons show that, after training with the proposed loss function, the interval widths nearly double at PINCs of 90%, 80% and 70%. However, the coverage and performance metrics improve substantially, which demonstrates the effectiveness of the proposed loss function.

4.3. Deterministic Prediction Results

To show that the designed network also performs well for deterministic prediction, the interval loss function is replaced by the loss function corresponding to deterministic prediction. In addition to the models used above for interval prediction, the persistence method is added as a baseline; it assumes that the predicted value equals the most recent observed value. The day-ahead prediction errors of each model are listed in Table 2 below, and the prediction curves are shown in Figure 8.
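
The persistence baseline described above can be sketched in a few lines: every step of the forecast horizon repeats the last observed value. The function name and the toy history are hypothetical.

```python
from typing import List, Sequence


def persistence_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Naive persistence: every future step equals the latest observation."""
    if not history:
        raise ValueError("persistence needs at least one past observation")
    return [history[-1]] * horizon


print(persistence_forecast([0.20, 0.35, 0.40], horizon=4))  # [0.4, 0.4, 0.4, 0.4]
```

Because the forecast is flat, every ramp within the horizon appears directly as error, which is why this baseline degrades as the horizon extends to a full day.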

The proposed model has the highest accuracy in terms of both RMSE and MAE. The persistence method performs worst; as Figure 8 shows, it deviates significantly from the observed values, suggesting that it is of limited use on a short-term scale. The GRU, ELM, ANN and LSTM models have comparable accuracy, while CNN performs best among the benchmarks. This indicates a strong correlation between the meteorological feature images and the total regional power, so that the mapping between the two learned by the CNN alone already yields good predictions. When historical power is also taken into account (i.e., the framework proposed in this paper), the RMSE is reduced by a further 19.4% and the MAE by 22.8% relative to CNN.
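
The deterministic metrics in Table 2 follow the usual definitions sketched below, with R² expressed in percent to match the magnitudes in the table. This section does not state whether power is normalized by installed capacity before computing RMSE and MAE, so the functions simply accept values on any scale. The last two lines recompute the relative reductions quoted above from the Table 2 entries for CNN and the proposed model.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def r2_percent(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Coefficient of determination, expressed in percent."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 100.0 * (1.0 - ss_res / ss_tot)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` lowers an error metric vs. `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


print(round(relative_reduction(0.072, 0.058), 1))  # RMSE, CNN -> proposed: 19.4
print(round(relative_reduction(0.057, 0.044), 1))  # MAE, CNN -> proposed: 22.8
```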

4.4. Sensitivity Analysis of Penalty Coefficients

In principle, the penalty coefficient can be made arbitrarily large, but in practice we found that an overly large penalty reduces computational efficiency and can even cause overfitting, so the optimum must be sought within a suitable range. Taking one-hour-ahead interval prediction as an example, all other model parameters were held fixed and only the penalty coefficient was varied; its value was then chosen according to performance on the test set. The other model parameters were selected in a similar way.
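
Why a growing penalty coefficient widens the intervals can be made concrete under a simplifying assumption. Suppose, hypothetically (the paper's own loss is given in Section 2.4), that the lower bound L is trained with the pinball loss at level τ_l plus a linear penalty p on misses below it, and the upper bound U likewise at level τ_u, with τ_l = (1 − PINC)/2 and τ_u = (1 + PINC)/2. For constant bounds and a continuous target distribution F, setting the derivative of the expected loss to zero gives:

```latex
\begin{aligned}
\ell_l(y,L) &= \tau_l\,(y-L)^{+} + (1-\tau_l+p)\,(L-y)^{+}, \\
\frac{\partial\,\mathbb{E}[\ell_l]}{\partial L} &= -\tau_l\,\bigl(1-F(L)\bigr) + (1-\tau_l+p)\,F(L) = 0
  \;\Rightarrow\; F(L) = \frac{\tau_l}{1+p}, \\
\ell_u(y,U) &= (\tau_u+p)\,(y-U)^{+} + (1-\tau_u)\,(U-y)^{+}
  \;\Rightarrow\; F(U) = \frac{\tau_u+p}{1+p}, \\
F(U)-F(L) &= \frac{\tau_u-\tau_l+p}{1+p} = \frac{\mathrm{PINC}+p}{1+p}
  \;\xrightarrow{\;p\to\infty\;}\; 1 .
\end{aligned}
```

In this idealized setting the target coverage rises monotonically toward 1 and the interval can only widen as p grows. Test-set PICP additionally depends on how well the network generalizes, which is where the overfitting noted above comes in.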

As Figure 9 shows, as the penalty coefficient p increases, PICP first shows an overall upward trend and then levels off over the upper half of the range. For the 90%, 80% and 70% prediction intervals, PICP reaches a local maximum at p = 2.5, 3.0 and 4.0, respectively, while the corresponding interval widths keep increasing; for a PINC of 90%, the width surges in the upper half of the range, which in fact reflects overfitting. Further increasing the penalty coefficient may therefore lead to overfitting, and, taking the sharpness of the model into account, the value at which PICP reaches its local maximum is taken as optimal.
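
The selection procedure described above (vary only the penalty coefficient, record PICP and PIAW, and take the value at which PICP reaches a local maximum) can be sketched as a one-dimensional sweep. `evaluate` is a hypothetical user-supplied callable that trains the model with a given coefficient and returns (PICP, PIAW); choosing the first local maximum is our reading of the text rather than a documented algorithm.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sweep_penalty(evaluate: Callable[[float], Tuple[float, float]],
                  candidates: Sequence[float]) -> Tuple[float, Dict[float, Tuple[float, float]]]:
    """Sweep the penalty coefficient with all other settings fixed.

    `evaluate(p)` (hypothetical) trains with coefficient p and returns
    (PICP, PIAW). Candidates must be sorted in increasing order. Returns the
    first p at which PICP has a local maximum, plus all (PICP, PIAW) results
    so that the widths can be inspected for signs of overfitting.
    """
    if not candidates:
        raise ValueError("need at least one candidate coefficient")
    results = {p: evaluate(p) for p in candidates}
    picps: List[float] = [results[p][0] for p in candidates]
    for i, p in enumerate(candidates):
        left = picps[i - 1] if i > 0 else float("-inf")
        right = picps[i + 1] if i + 1 < len(picps) else float("-inf")
        if picps[i] >= left and picps[i] > right:  # end of a rise or plateau
            return p, results
    return candidates[-1], results  # only reached if PICP values are NaN
```

If PICP keeps rising over the whole grid, the largest candidate is returned, which is a cue to check the recorded PIAW values before extending the grid further.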

4.5. Effectiveness Analysis of the ECA Module

To validate the effectiveness of the ECA module, comparative experiments were conducted at different prediction horizons on the interval prediction model with a 90% prediction interval. The experimental setups are as follows:

(1): Without any attention module, denoted as Model 1.
(2): With the Squeeze-and-Excitation (SE) channel attention mechanism [39], which captures the dependencies among all channels, denoted as Model 2.
(3): With the ECA mechanism, denoted as Model 3.

All other model parameters are kept identical, and the performance comparison is shown in Table 3. At a prediction horizon of 1 h, Model 3 has a Winkler score of −0.009, better than both Model 1 without an attention mechanism and Model 2 with the SE attention mechanism, and it also achieves the best WS at the 8 h, 16 h and 24 h horizons. In terms of computation time, Model 3 runs faster than Model 2, and the gap widens as the prediction horizon increases (46.407 versus 48.094 at 1 h, 87.783 versus 109.040 at 24 h). This suggests that capturing the dependencies among all channels, as SE does, adds computational cost without improving accuracy in this application.
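
For reference, the ECA block of ECA-Net (Wang et al.) replaces the two fully connected layers of SE [39] with a single 1-D convolution across the channel descriptors, whose kernel size k is chosen adaptively from the channel count C. The pure-Python sketch below shows that forward pass; the kernel weights that a real network would learn are passed in as arguments, and γ = 2, b = 1 are the defaults from the ECA-Net paper, not settings reported here. How the block is wired into the CNN-LSTM network follows Section 2 and Figure 1 and is not reproduced. The helper `attention_params` (hypothetical) compares weight counts, k for ECA versus 2C²/r for SE with biases ignored, which is one reason ECA is cheaper.

```python
import math
from typing import List, Sequence


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1-D kernel size from ECA-Net, rounded up to an odd value."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_forward(x: Sequence[Sequence[float]], weights: Sequence[float]) -> List[List[float]]:
    """ECA on a feature map given as C channels, each a flat list of H*W values.

    1) global average pooling -> one descriptor per channel
    2) 1-D convolution across neighbouring channels (zero padding, no bias)
    3) sigmoid -> per-channel weight in (0, 1)
    4) rescale each channel by its weight
    `weights` (odd length k) would be learned in a real network.
    """
    k = len(weights)
    pad = k // 2
    desc = [sum(ch) / len(ch) for ch in x]                   # step 1
    c = len(desc)
    out = []
    for i, ch in enumerate(x):
        z = sum(weights[j] * desc[i + j - pad]                # step 2
                for j in range(k) if 0 <= i + j - pad < c)
        a = 1.0 / (1.0 + math.exp(-z))                        # step 3
        out.append([a * v for v in ch])                       # step 4
    return out


def attention_params(channels: int, reduction: int = 16) -> dict:
    """Weight counts (biases ignored): SE has two FC layers, ECA one kernel."""
    return {"SE": 2 * channels * channels // reduction, "ECA": eca_kernel_size(channels)}


print(eca_kernel_size(64), eca_kernel_size(256))  # 3 5
print(attention_params(64))                        # {'SE': 512, 'ECA': 3}
```

Because the convolution only mixes each channel with its k nearest neighbours, ECA avoids the dimensionality reduction and dense channel-to-channel weights of SE, which is consistent with the lower run times of Model 3 in Table 3.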

5. Conclusions

In this paper, we construct a lightweight IFPRWP model based on meteorological feature images and improved quantile regression to forecast provincial regional wind power fluctuation intervals. First, the spatial meteorological distribution and temporal feature inputs of the model are constructed through image generation and power aggregation. On this basis, a parallel CNN-LSTM prediction architecture is designed that effectively extracts both temporal features and spatial meteorological distribution features. An ECA module is then introduced, and the network is trained with an improved quantile loss function to generate prediction intervals directly. The model is validated with actual data from a region containing 75 wind power stations in Guizhou Province, Southwest China. The results show that, compared with the benchmark models, the proposed model improves the interval prediction performance by at least 12.3%, reduces the deterministic prediction RMSE by at least 19.4% and reduces the MAE by at least 22.8%. The comparative analysis also verifies that the proposed improved loss function effectively improves the quality of the prediction intervals.

The proposed model could potentially be extended to other renewable energy applications and spatial scales. However, several aspects still need improvement: first, how to update the model rapidly when new power stations are added; second, the model should be tested on datasets with lower data quality or longer time horizons and optimized accordingly to enhance its robustness.

Author Contributions

G.L.: writing – review and editing, validation, resources and funding acquisition. C.L.: conceptualization, methodology, investigation, data curation and writing – original draft. Y.L.: data curation and visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (No. 51879030).

Data Availability Statement

All the data can be obtained from the authors upon reasonable request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:

IFPRWP	Interval forecasting of provincial regional wind power
CNN	Convolutional neural network
LSTM	Long short-term memory
ECA	Efficient channel attention
DA	Direct Aggregation
SU	Statistical Uplifting
QR-LIFF	Lightweight interval forecasting framework based on quantile regression
IQR-LIFF	Lightweight interval forecasting framework with improved quantile regression

References

1. Yang, M.; Che, R.; Yu, X.; Su, X. Dual NWP wind speed correction based on trend fusion and fluctuation clustering and its application in short-term wind power prediction. Energy 2024, 302, 131802.
2. Bouche, D.; Flamary, R.; D’alché-Buc, F.; Plougonven, R.; Clausel, M.; Badosa, J.; Drobinski, P. Wind power predictions from nowcasts to 4-hour forecasts: A learning approach with variable selection. Renew. Energy 2023, 211, 938–947.
3. Yu, Y.; Han, X.; Yang, M.; Yang, J. Probabilistic Prediction of Regional Wind Power Based on Spatiotemporal Quantile Regression. IEEE Trans. Ind. Appl. 2020, 56, 6117–6127.
4. Petersen, C.; Reguant, M.; Segura, L. Measuring the impact of wind power and intermittency. Energy Econ. 2024, 129, 107200.
5. de Azevedo Takara, L.; Teixeira, A.C.; Yazdanpanah, H.; Mariani, V.C.; Dos Santos Coelho, L. Optimizing multi-step wind power forecasting: Integrating advanced deep neural networks with stacking-based probabilistic learning. Appl. Energy 2024, 369, 123487.
6. Liu, Z.-F.; Liu, Y.-Y.; Chen, X.-R.; Zhang, S.-R.; Luo, X.-F.; Li, L.-L.; Yang, Y.-Z.; You, G.-D. A novel deep learning-based evolutionary model with potential attention and memory decay-enhancement strategy for short-term wind power point-interval forecasting. Appl. Energy 2024, 360, 122785.
7. Khodayar, M.; Wang, J.; Manthouri, M. Interval Deep Generative Neural Network for Wind Speed Forecasting. IEEE Trans. Smart Grid 2019, 10, 3974–3989.
8. Zhang, C.; Fu, Y. Probabilistic Electricity Price Forecast with Optimal Prediction Interval. IEEE Trans. Power Syst. 2024, 39, 442–452.
9. Attarha, A.; Amjady, N.; Dehghan, S.; Vatani, B. Adaptive Robust Self-Scheduling for a Wind Producer With Compressed Air Energy Storage. IEEE Trans. Sustain. Energy 2018, 9, 1659–1671.
10. Qiu, H.; Gu, W.; Xu, Y.; Wu, Z.; Zhou, S.; Wang, J. Interval-Partitioned Uncertainty Constrained Robust Dispatch for AC/DC Hybrid Microgrids with Uncontrollable Renewable Generators. IEEE Trans. Smart Grid 2019, 10, 4603–4614.
11. Zhang, Y.; Wen, H.; Wu, Q. A Contextual Bandit Approach for Value-oriented Prediction Interval Forecasting. IEEE Trans. Smart Grid 2024, 15, 2271–2281.
12. Pierro, M.; De Felice, M.; Maggioni, E.; Moser, D.; Perotto, A.; Spada, F.; Cornaro, C. Data-driven upscaling methods for regional photovoltaic power estimation and forecast using satellite and numerical weather prediction data. Sol. Energy 2017, 158, 1026–1038.
13. Lai, W.; Zhen, Z.; Wang, F.; Fu, W.; Wang, J.; Zhang, X.; Ren, H. Sub-region division based short-term regional distributed PV power forecasting method considering spatio-temporal correlations. Energy 2024, 288, 129716.
14. Li, G.; Guo, S.; Li, X.; Cheng, C. Short-term Forecasting Approach Based on bidirectional long short-term memory and convolutional neural network for Regional Photovoltaic Power Plants. Sustain. Energy Grids Netw. 2023, 34, 101019.
15. Pierro, M.; Gentili, D.; Liolli, F.R.; Cornaro, C.; Moser, D.; Betti, A.; Moschella, M.; Collino, E.; Ronzio, D.; van der Meer, D. Progress in regional PV power forecasting: A sensitivity analysis on the Italian case study. Renew. Energy 2022, 189, 983–996.
16. Zhang, J.; Liu, D.; Li, Z.; Han, X.; Liu, H.; Dong, C.; Wang, J.; Liu, C.; Xia, Y. Power prediction of a wind farm cluster based on spatiotemporal correlations. Appl. Energy 2021, 302, 117568.
17. Chen, W.; Zhou, H.; Cheng, L.; Xia, M. Prediction of regional wind power generation using a multi-objective optimized deep learning model with temporal pattern attention. Energy 2023, 278, 127942.
18. Chen, Y.; Wang, Y.; Dong, Z.; Su, J.; Han, Z.; Zhou, D.; Zhao, Y.; Bao, Y. 2-D regional short-term wind speed forecast based on CNN-LSTM deep learning model. Energy Convers. Manag. 2021, 244, 114451.
19. Zhu, X.; Liu, R.; Chen, Y.; Gao, X.; Wang, Y.; Xu, Z. Wind speed behaviors feather analysis and its utilization on wind speed prediction using 3D-CNN. Energy 2021, 236, 121523.
20. Yildiz, C.; Acikgoz, H.; Korkmaz, D.; Budak, U. An improved residual-based convolutional neural network for very short-term wind power forecasting. Energy Convers. Manag. 2021, 228, 113731.
21. Xu, S.; Liu, J.; Huang, X.; Li, C.; Chen, Z.; Tai, Y. Minutely multi-step irradiance forecasting based on all-sky images using LSTM-InformerStack hybrid model with dual feature enhancement. Renew. Energy 2024, 224, 120135.
22. Wang, J.; Wang, K.; Li, Z.; Lu, H.; Jiang, H.; Xing, Q. A Multitask Integrated Deep-Learning Probabilistic Prediction for Load Forecasting. IEEE Trans. Power Syst. 2024, 39, 1240–1250.
23. Tahmasebifar, R.; Moghaddam, M.P.; Sheikh-El-Eslami, M.K.; Kheirollahi, R. A new hybrid model for point and probabilistic forecasting of wind power. Energy 2020, 211, 119016.
24. Zhang, H.; Liu, Y.; Yan, J.; Han, S.; Li, L.; Long, Q. Improved Deep Mixture Density Network for Regional Wind Power Probabilistic Forecasting. IEEE Trans. Power Syst. 2020, 35, 2549–2560.
25. Fernandez-Jimenez, L.A.; Monteiro, C.; Ramirez-Rosado, I.J. Short-term probabilistic forecasting models using Beta distributions for photovoltaic plants. Energy Rep. 2023, 9, 495–502.
26. Wan, C.; Xu, Z.; Pinson, P.; Dong, Z.Y.; Wong, K.P. Probabilistic Forecasting of Wind Power Generation Using Extreme Learning Machine. IEEE Trans. Power Syst. 2014, 29, 1033–1044.
27. Huang, H.; Huang, Y. Probabilistic forecasting of regional solar power incorporating weather pattern diversity. Energy Rep. 2024, 11, 1711–1722.
28. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
29. Memarzadeh, G.; Keynia, F. A new short-term wind speed forecasting method based on fine-tuned LSTM neural network and optimal input sets. Energy Convers. Manag. 2020, 213, 112824.
30. Rubasinghe, O.; Zhang, X.; Chau, T.K.; Chow, Y.; Fernando, T.; Iu, H.H. A Novel Sequence to Sequence Data Modelling Based CNN-LSTM Algorithm for Three Years Ahead Monthly Peak Load Forecasting. IEEE Trans. Power Syst. 2024, 39, 1–15.
31. Zhang, J.; Cheng, C.; Yu, S. Recognizing the mapping relationship between wind power output and meteorological information at a province level by coupling GIS and CNN technologies. Appl. Energy 2024, 360, 122791.
32. Baggio, R.; Muzy, J. Improving probabilistic wind speed forecasting using M-Rice distribution and spatial data integration. Appl. Energy 2024, 360, 122840.
33. Lu, P.; Ye, L.; Pei, M.; Zhao, Y.; Dai, B.; Li, Z. Short-term wind power forecasting based on meteorological feature extraction and optimization strategy. Renew. Energy 2022, 184, 642–661.
34. Gong, M.; Yan, C.; Xu, W.; Zhao, Z.; Li, W.; Liu, Y.; Li, S. Short-term wind power forecasting model based on temporal convolutional network and Informer. Energy 2023, 283, 129171.
35. Mellit, A.; Pavan, A.M.; Lughi, V. Deep learning neural networks for short-term photovoltaic power forecasting. Renew. Energy 2021, 172, 276–288.
36. Zhang, W.; Quan, H.; Gandhi, O.; Rajagopal, R.; Tan, C.; Srinivasan, D. Improving Probabilistic Load Forecasting Using Quantile Regression NN with Skip Connections. IEEE Trans. Smart Grid 2020, 11, 5442–5450.
37. Tan, M.; Yuan, S.; Li, S.; Su, Y.; Li, H.; He, F. Ultra-Short-Term Industrial Power Demand Forecasting Using LSTM Based Hybrid Ensemble Learning. IEEE Trans. Power Syst. 2020, 35, 2937–2948.
38. Yaghoubirad, M.; Azizi, N.; Farajollahi, M.; Ahmadi, A. Deep learning-based multistep ahead wind speed and power generation forecasting using direct method. Energy Convers. Manag. 2023, 281, 116760.
39. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.

Figure 1. Structure of the forecasting model.

Figure 2. Mechanism of image generation.

Figure 3. The structure of ECA.

Figure 4. Study area.

Figure 5. Various meteorological images generated at a given moment. (Darker colors indicate larger values of the corresponding meteorological variable.)

Figure 6. Distribution of predicted values between intervals.

Figure 7. Comparison of continuous predictions from the QR-LIFF and IQR-LIFF models.

Figure 8. Deterministic prediction results.

Figure 9. Sensitivity analysis of penalty coefficients.

Table 1. Performance comparison of interval prediction.

	PINC = 90%			PINC = 80%			PINC = 70%
	PICP	PIAW	WS	PICP	PIAW	WS	PICP	PIAW	WS
Units	%	MW	-	%	MW	-	%	MW	-
ANN	63.708	910.665	−0.168	50.725	660.062	−0.201	41.667	505.215	−0.220
TCN	85.749	1409.952	−0.081	73.671	1060.800	−0.125	62.077	843.245	−0.160
GRU	79.710	1330.917	−0.186	68.297	1010.449	−0.214	58.998	805.475	−0.228
BELM	57.230	695.074	−0.169	45.037	539.098	−0.198	35.907	435.163	−0.184
CNN	43.720	533.900	−0.182	33.575	400.578	−0.198	26.993	321.317	−0.206
LSTM	78.321	1450.466	−0.108	68.539	1108.649	−0.140	59.783	885.219	−0.162
QR-LIFF	58.092	545.270	−0.119	46.860	409.309	−0.137	38.949	325.702	−0.136
IQR-LIFF	81.159	941.462	−0.071	75.242	838.046	−0.086	72.162	784.275	−0.093

Table 2. Comparison of deterministic prediction performance.

	RMSE	MAE	R² (%)
Persistence	0.111	0.080	21.872
ANN	0.092	0.067	46.193
TCN	0.083	0.061	56.123
GRU	0.077	0.057	61.981
ELM	0.086	0.068	52.999
CNN	0.072	0.057	66.709
LSTM	0.081	0.060	58.128
Proposed	0.058	0.044	78.621

Table 3. Analysis of the effectiveness of attentional mechanisms.

	1 h		8 h		16 h		24 h
	WS	Run Time	WS	Run Time	WS	Run Time	WS	Run Time
Model 1	−0.012	45.159	−0.094	46.326	−0.091	47.473	−0.087	45.454
Model 2	−0.011	48.094	−0.080	87.117	−0.090	101.025	−0.084	109.040
Model 3	−0.009	46.407	−0.063	75.088	−0.079	86.808	−0.082	87.783

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, G.; Lin, C.; Li, Y. Probabilistic Forecasting of Provincial Regional Wind Power Considering Spatio-Temporal Features. Energies 2025, 18, 652. https://doi.org/10.3390/en18030652

