1. Introduction
As a renewable energy resource, wind power plays an important role in accelerating the green, low-carbon transformation of the power system, owing to its zero pollution and low cost [1,2]. Global wind turbine capacity increased from 181 GW in 2010 to 1021 GW in 2023, and the total wind turbine capacity in China reached 441 GW by the end of 2023, with an average annual growth rate of about 20% since 2012. However, as wind power penetration increases, the inherent uncertainty and intermittency of wind power generation pose growing challenges to power system operation, including generation scheduling, spinning reserve, demand response and economic dispatch [3,4].
For a provincial power grid, system operators pay more attention to the total output of regional wind power, in order to ensure the safe and stable operation of the power system and to fully absorb clean energy. For this reason, models for forecasting regional wind power have received growing research attention. Among them, probabilistic prediction provides more comprehensive uncertainty information and has become an effective means of quantifying the random intermittency of wind power [5]. Probabilistic prediction results expressed as prediction intervals [6,7,8] are more intuitive and have been widely applied in power systems [9,10,11]. Therefore, research on interval forecasting of provincial regional wind power (IFPRWP) is imperative.
At present, there are two challenges in constructing an efficient IFPRWP model. The first is modeling a massive number of objects. Region-level forecasting usually involves historical power generation from multiple power stations and the corresponding high-dimensional meteorological information. Collecting historical power from all power stations is difficult and time-consuming, and may be blocked by factors such as confidentiality agreements. Coupled with the multi-dimensional meteorological variables of dozens or even hundreds of wind farms, this makes it difficult to collect and preprocess the data and to construct, train and validate so many individual wind farm forecasting models. The second is generating prediction intervals that balance accuracy and sharpness. This requires efficiently capturing the complex, high-dimensional, non-linear mapping between regional meteorological information, historical output and total regional wind power.
For the massive object modeling challenge, the main existing methods are Direct Aggregation (DA) and Statistical Uplifting (SU). DA predicts each power station in the region individually and then sums the predictions. It is rarely used because of its high computational cost and because its accuracy is affected by individual power plants in the region. SU usually selects one or more representative wind farms based on their correlation with the regional total power and extrapolates the regional total from the predictions for those representative wind farms. Its extrapolation methods include installed-capacity proportionality relationships [12,13] and deep learning techniques [14,15]. Although SU only needs to model representative power stations, it is essentially the same as DA in that both are bottom-up approaches, i.e., they extrapolate step by step from individual power station predictions to the overall regional prediction. Both require a lot of work in data collection, processing and validation, and the data may not even be obtainable at the collection stage for reasons such as confidentiality agreements.
In addition to DA and SU, some studies have attempted to build models that directly predict the total regional power generation [16,17]. The idea is to take the meteorological and power information from all the power stations and input it as a vector to directly predict the total power output of the region. However, when the region and the number of power stations grow, the number of relevant features climbs dramatically, and the heavy data burden remains unavoidable. In recent years, with the wide application of convolutional neural networks (CNNs) in image recognition, using features in the form of images as inputs to the prediction model has gradually become an efficient approach [16,18,19]. Yildiz et al. [20] rearranged decomposed features into an RGB image as the input to a modified residual-based convolutional neural network for ultra-short-term wind power prediction, and Xu et al. [21] constructed a hybrid LSTM-InformerStack model for fine multi-step irradiance forecasting based on all-sky images. Wang et al. [22] spliced features of different time scales as inputs to convolutional layers and then applied multilayer convolution for probabilistic power system load forecasting. However, each of the above studies targets a single object, and if image features were generated by their methods and applied to IFPRWP, the spatial correlation of wind power within the region, which is critical for IFPRWP, could not be considered. Therefore, there is a need to design simpler and more efficient mechanisms for generating images of the spatial features of the power stations considered, as well as to develop new regional prediction frameworks based on holistic modeling.
For the challenge of generating prediction intervals that balance accuracy and sharpness, current research methods are mainly categorized into parametric and nonparametric methods. Parametric methods are usually based on probability distribution assumptions, such as Gaussian [23] and beta [24,25] distributions. Parametric methods are usually multi-stage, and errors from the distributional assumptions are inevitable. Therefore, many scholars have developed nonparametric prediction frameworks that directly produce interval prediction results. Wan et al. [26] converted the mapping between historical power and predicted power quantiles into a linear optimization model based on the Extreme Learning Machine to directly generate different quantiles. On this basis, Zhang et al. [11] constructed an Extreme Learning Machine-based multi-objective optimization problem to directly generate day-ahead tariff intervals that balance reliability and sharpness requirements. However, the above studies only considered temporal features and relied on the specific single-layer structure of the Extreme Learning Machine, so the strategy cannot be efficiently extended to other deep learning frameworks. Huang et al. [27] first classified the 48 PV power plants in a region into different regional weather patterns and then performed quantile regression analysis based on these patterns to predict seasonal power generation at the regional level. However, this study only focuses on the spatial meteorological distribution characteristics of PV power plants and ignores the temporal characteristics of regional power.
Based on the above discussion, this paper proposes a lightweight probabilistic forecasting method for wind power in provincial areas considering spatio-temporal features. The main contributions of the work in this paper are as follows:
- (1) Based on the fusion mechanism of geographic and meteorological information, redundant power station meteorological features are eliminated and meteorological feature images are generated, which improves the attention of the forecasting model. Meanwhile, considering the smoothing effect, the aggregated historical power is used as model input together with the generated meteorological images.
- (2) In order to consider the temporal features of regional wind power generation and the spatial meteorological features of distributed power stations in the region, this paper designs a prediction network architecture with CNN and LSTM in parallel. The upper layer of this architecture extracts the spatial meteorological features of the image through a CNN module incorporating the ECA attention mechanism, and the lower layer extracts the time-series features of the historical power through LSTM.
- (3) Based on the original quantile loss function, a new loss function is constructed by adding penalty coefficients for interval prediction, and the function can be flexibly combined with various complex network architectures to improve the prediction performance of the model effectively.
The rest of the paper is organized as follows: Section 2 presents the proposed forecasting framework, Section 3 describes the case study, and Section 4 discusses the results and validates the effectiveness of the proposed methodology. Finally, Section 5 concludes the paper.
2. Proposed Method
The overall forecasting framework of this paper is shown in Figure 1. The provincial regional meteorological variables, downloaded from the European Centre for Medium-Range Weather Forecasts (ECMWF, https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview, accessed on 30 January 2025), are reconstructed into feature images by an image generation method and input into the upper-layer CNN. The hourly historical series of regional wind power, obtained from the grid dispatch agency, are input into the lower-layer LSTM network as one-dimensional vectors. Then, an improved quantile loss function is used for training to directly predict the two quantile values that bound each interval.
The architecture consists of 3 convolutional layers, 2 pooling layers, 2 fully connected layers, 1 feature fusion layer, and 1 ECA layer. The convolutional layers use the ReLU activation function and the fully connected layers use the Sigmoid activation function. The ECA module is placed after the convolutional layer, and its allocation of channel attention enhances the learning ability of the network.
2.1. Feature Image Generation Based on Geographic and Meteorological Information
The process of feature image generation is shown in Figure 2. The method is based on the original spatial distribution of power stations in the provincial region, and the image features are generated by quickly scanning and filling the regional source meteorological files, which effectively amplifies the relevant features of the power stations in the region.
Since weather information is gridded and is consistent over a narrow range, all wind power stations are first clustered by nearest neighbor partitioning, i.e., each station is assigned to its nearest meteorological grid point. The clustering process is as follows.
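For each wind power station $i$ in the region, the Euclidean distance to each meteorological grid point $j$ is calculated:
$$d_{ij} = \sqrt{\left(\varphi_i^{s} - \varphi_j^{g}\right)^2 + \left(\lambda_i^{s} - \lambda_j^{g}\right)^2} \qquad (1)$$
where $(\varphi_i^{s}, \lambda_i^{s})$ denote the latitude and longitude of the location of wind power station $i$, and $(\varphi_j^{g}, \lambda_j^{g})$ denote the latitude and longitude of meteorological grid point $j$, respectively.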
Based on the above calculation, the distances $d_{ij}$ between wind power station $i$ and all grid points $j$ in the meteorological grid point set can be obtained, and the following condition determines the grid point $j^{*}$ to which the wind power station belongs:
$$d_{ij^{*}} = \min_{j} d_{ij} \qquad (2)$$
The spatial distribution matrix $M$ can be obtained after all the power stations have been assigned to their corresponding grid points:
$$M = \begin{bmatrix} m_{11} & \cdots & m_{1V} \\ \vdots & \ddots & \vdots \\ m_{U1} & \cdots & m_{UV} \end{bmatrix} \qquad (3)$$
where $U$ and $V$ represent the numbers of latitude and longitude grid points included, respectively (the grid extent is chosen at the resolution of the meteorological data so that the meteorological grid points cover the study area). It is worth noting that the elements of the matrix $M$ and the meteorological grid points correspond to each other one to one. The values of the elements of $M$ are determined from Equation (2): if Equation (2) holds for the meteorological grid point corresponding to $m_{uv}$, then its value is 1; otherwise, it is 0.
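Next, the corresponding regional meteorological data are downloaded from the ECMWF. Taking wind speed as an example, the data take the following form:
$$W = \begin{bmatrix} w_{11} & \cdots & w_{1V} \\ \vdots & \ddots & \vdots \\ w_{U1} & \cdots & w_{UV} \end{bmatrix} \qquad (4)$$
where $w_{uv}$ is the wind speed at the grid point in row $u$ and column $v$ of the $U \times V$ meteorological grid. The final generated image $I$ is obtained as the Hadamard product of the spatial distribution matrix $M$ with the wind speed meteorological matrix $W$:
$$I = M \odot W \qquad (5)$$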
This assignment keeps values only at pixels that contain wind power stations, which can essentially be interpreted as amplifying the features of interest. For each meteorological variable, we repeat the above process to obtain its corresponding meteorological image, and stack the resulting images to form a multi-channel meteorological image as the input to the convolutional layers.
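The image generation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names, the toy grid coordinates, the station locations and the wind speed values are all hypothetical, and latitude and longitude are treated as plane coordinates, as in the Euclidean distance described above.

```python
import math

def nearest_grid_index(station, grid_lats, grid_lons):
    """Return (row, col) of the grid point closest to a (lat, lon) station."""
    lat, lon = station
    best = None
    for u, g_lat in enumerate(grid_lats):
        for v, g_lon in enumerate(grid_lons):
            d = math.hypot(lat - g_lat, lon - g_lon)  # Euclidean distance
            if best is None or d < best[0]:
                best = (d, u, v)
    return best[1], best[2]

def spatial_mask(stations, grid_lats, grid_lons):
    """Binary matrix: 1 where at least one station maps to the grid point."""
    mask = [[0] * len(grid_lons) for _ in grid_lats]
    for station in stations:
        u, v = nearest_grid_index(station, grid_lats, grid_lons)
        mask[u][v] = 1
    return mask

def feature_image(mask, field):
    """Element-wise (Hadamard) product of the mask and a gridded field."""
    return [[m * w for m, w in zip(m_row, w_row)]
            for m_row, w_row in zip(mask, field)]

# Hypothetical 3 x 3 grid and three stations (two share a grid point).
grid_lats = [27.0, 26.75, 26.5]
grid_lons = [106.0, 106.25, 106.5]
stations = [(26.98, 106.02), (26.52, 106.45), (26.55, 106.49)]
wind_speed = [[5.0, 6.0, 7.0],
              [4.0, 3.0, 2.0],
              [8.0, 9.0, 1.0]]

mask = spatial_mask(stations, grid_lats, grid_lons)
image = feature_image(mask, wind_speed)
```

Repeating `feature_image` for each meteorological variable and stacking the results would give the multi-channel input; stations that fall on the same grid point simply share one pixel.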
2.2. Extraction of Spatial Distribution Feature
A CNN obtains effective image feature information with sparse connectivity and shared weights through the cooperation of convolutional and pooling layers. The convolution process can be described as follows: $Q$ convolution kernels scan the input weather image with a fixed step size to carry out the convolution operation, the corresponding bias $b_q$ is added, and the ReLU activation function is applied, finally yielding $Q$ feature maps. To ensure that edge information is not lost, edge padding is generally adopted. The output of the convolutional layer can be expressed by the following equation:
$$F_q = \mathrm{ReLU}\left(K_q * X + b_q\right), \quad q = 1, \dots, Q \qquad (6)$$
where $K_q$ is the convolution kernel, whose number of channels is consistent with the number of channels in the input image, and $X$ is the local image patch of the original input image corresponding to the size of the convolution kernel.
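After the convolutional layer, its output is down-sampled by the pooling layer:
$$P = \mathrm{pool}\left(F_{R}\right) \qquad (7)$$
Here, $\mathrm{pool}(\cdot)$ is the pooling function, which is usually average pooling or maximum pooling, and $F_{R}$ denotes the pooling block, i.e., the patch of the convolutional output covered by the pooling window.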
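As a rough illustration of the convolution, activation and pooling steps described above, the following pure-Python sketch applies a single kernel to a single-channel image with zero padding, followed by ReLU and non-overlapping max pooling. It is a simplification under stated assumptions: a real layer sums over all input channels and uses Q kernels, and the function names and toy values here are hypothetical.

```python
def conv2d_same_relu(image, kernel, bias=0.0):
    """Single-channel sliding-window convolution with zero padding and ReLU.

    As in most deep learning frameworks, the kernel is not flipped
    (cross-correlation). The output has the same size as the input.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            s = bias
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < h and 0 <= cc < w:  # zero padding outside
                        s += kernel[i][j] * image[rr][cc]
            out[r][c] = max(0.0, s)  # ReLU
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size blocks."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, w - size + 1, size)]
            for r in range(0, h - size + 1, size)]

# Toy input: a 4 x 4 image of ones and a 3 x 3 kernel of ones.
toy_image = [[1.0] * 4 for _ in range(4)]
toy_kernel = [[1.0] * 3 for _ in range(3)]
feature_map = conv2d_same_relu(toy_image, toy_kernel)
pooled = max_pool(feature_map, size=2)
```

With this toy input, corner pixels see four valid neighbours and interior pixels see nine, which shows how the zero padding keeps edge positions in the output.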
However, not all meteorological information benefits the prediction of wind power, and redundant features may degrade the performance of the model while increasing the computational burden. To this end, we introduce the ECA mechanism [28], which realizes information interaction between channels, eases the trade-off between performance and complexity, and brings significant performance gains while involving only a small number of parameters. The structure of ECA is shown in Figure 3.
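The ECA module is efficient in two main ways. First, it only considers the interaction between channel $y_i$ and its $k$ neighboring channels, and its weights are calculated as follows:
$$\omega_i = \sigma\left(\sum_{j=1}^{k} w_i^{j} y_i^{j}\right), \quad y_i^{j} \in \Omega_i^{k} \qquad (8)$$
where $\sigma$ is the Sigmoid function and $\Omega_i^{k}$ denotes the set of $k$ neighboring channels of $y_i$.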
Second, it makes all channels share the same learning parameters $w^{j}$ through a fast 1D convolution with a kernel of size $k$, which greatly improves the efficiency and can be expressed as follows:
$$\omega = \sigma\left(\mathrm{C1D}_{k}(y)\right) \qquad (9)$$
where $\mathrm{C1D}$ denotes 1D convolution. The module involves only $k$ parameters.
A remaining problem is that the range of cross-channel interaction (i.e., the kernel size $k$ of the 1D convolution) needs to be determined.
It is reasonable for the coverage of the interaction to be proportional to the channel dimension $C$. For this purpose, a non-linear mapping is introduced:
$$C = \phi(k) = 2^{(\gamma k - b)} \qquad (10)$$
Then, given the channel dimension $C$, the kernel size $k$ can be adaptively determined by the following equation:
$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}} \qquad (11)$$
where $|t|_{\mathrm{odd}}$ denotes the nearest odd number to $t$, and $\gamma$ and $b$ are set to 2 and 1, respectively.
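The adaptive kernel size and the shared-weight channel attention described above can be sketched as follows. This is an illustrative sketch rather than the network used in the paper: the rounding rule (truncate, then move up to the next odd integer) is one common way to realize the nearest-odd operation and is an assumption here, and the function names, kernel weights and channel descriptors are hypothetical.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Kernel size k derived from the channel dimension C.

    Assumption: the value is truncated to an integer and, if even,
    increased by one so that k is odd.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def eca_weights(channel_descriptors, kernel):
    """Channel attention weights from a shared 1D kernel.

    channel_descriptors: one scalar per channel (e.g., from global
    average pooling). The same kernel slides over all channels with
    zero padding, and a Sigmoid maps each response to (0, 1).
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(channel_descriptors) + [0.0] * pad
    weights = []
    for i in range(len(channel_descriptors)):
        s = sum(kernel[j] * padded[i + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-s)))
    return weights

# Hypothetical example: 64 channels give a kernel of size 3.
k = eca_kernel_size(64)
example_weights = eca_weights([0.2, 1.0, -0.5, 0.0], [0.5, 1.0, 0.5])
```

Each feature map would then be multiplied by its weight. Only the `k` kernel values are learned, which is why the module adds so few parameters compared with attention schemes that model all channel pairs.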
2.3. Extraction of Temporal Feature
In addition to meteorological factors, historical wind power generation is also a key factor influencing the daily power to be predicted. Taking regional smoothing effects into account, we aggregate the historical output of the power stations in the region into a single series. This aggregated series is usually readily available from dispatch institutions, which avoids the cumbersome process of collecting data from individual power stations.
LSTM alleviates the long-term dependency problem that recurrent neural networks face during training [29,30] and is commonly used to extract features from time series. Here, we use LSTM to extract the temporal features of the aggregated historical power.
2.4. Loss Function
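The essence of the neural network regression problem is an optimization problem on a training set, where the decision variables are the parameters of the neural network. For deterministic prediction, the loss function can take the following form:
$$L = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2 \qquad (12)$$
where $\hat{y}_i$ is the predicted power value, $y_i$ is the observed actual power value, and $N$ is the number of training samples. For probability interval prediction, the interval $\left[\hat{q}_i^{\alpha_l}, \hat{q}_i^{\alpha_u}\right]$ can be constructed using the quantile prediction values $\hat{q}_i^{\alpha_l}$ and $\hat{q}_i^{\alpha_u}$, which satisfy the following relationship:
$$P\left(y_i \le \hat{q}_i^{\alpha_l}\right) = \alpha_l \qquad (13)$$
$$P\left(y_i \le \hat{q}_i^{\alpha_u}\right) = \alpha_u \qquad (14)$$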
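where $\alpha_u$ and $\alpha_l$ are the upper and lower quantile levels of the prediction interval, respectively, and $\alpha_u - \alpha_l$ denotes the confidence level of the corresponding interval. The corresponding objective function to be optimized is as follows:
$$\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \left[ \rho_{\alpha_l}\left(y_i - \hat{q}_i^{\alpha_l}\right) + \rho_{\alpha_u}\left(y_i - \hat{q}_i^{\alpha_u}\right) \right] \qquad (15)$$
where $\theta$ denotes the network parameters and $\rho_{\alpha}(\cdot)$ denotes the pinball loss function. With $u$ the difference between the observation $y_i$ and the predicted quantile, the pinball loss function is:
$$\rho_{\alpha}(u) = \begin{cases} \alpha u, & u \ge 0 \\ (\alpha - 1) u, & u < 0 \end{cases} \qquad (16)$$
Based on the above two equations, the per-sample term $L_i$ of Equation (15) can then be expanded as follows:
$$L_i = \begin{cases}
\alpha_l \left(y_i - \hat{q}_i^{\alpha_l}\right) + (1 - \alpha_u)\left(\hat{q}_i^{\alpha_u} - y_i\right), & \hat{q}_i^{\alpha_l} \le y_i \le \hat{q}_i^{\alpha_u} \\
(1 - \alpha_l)\left(\hat{q}_i^{\alpha_l} - y_i\right) + (1 - \alpha_u)\left(\hat{q}_i^{\alpha_u} - y_i\right), & y_i < \hat{q}_i^{\alpha_l},\ y_i \le \hat{q}_i^{\alpha_u} \\
\alpha_l \left(y_i - \hat{q}_i^{\alpha_l}\right) + \alpha_u \left(y_i - \hat{q}_i^{\alpha_u}\right), & y_i \ge \hat{q}_i^{\alpha_l},\ y_i > \hat{q}_i^{\alpha_u} \\
(1 - \alpha_l)\left(\hat{q}_i^{\alpha_l} - y_i\right) + \alpha_u \left(y_i - \hat{q}_i^{\alpha_u}\right), & \hat{q}_i^{\alpha_u} < y_i < \hat{q}_i^{\alpha_l}
\end{cases} \qquad (17)$$
Typically, the lower bound corresponds to a quantile level less than 0.5 and the upper bound corresponds to a quantile level greater than 0.5, which means that $\alpha_l < 1 - \alpha_l$ and $1 - \alpha_u < \alpha_u$. Because the optimization problem is a minimization, it favors the case with the smaller weights, i.e., the objective function tends to the first case.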
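The pinball loss and the two-quantile interval objective described above can be sketched in plain Python. This is an illustrative sketch, not the training code of the paper: the function names are hypothetical, and the quantile levels 0.05 and 0.95 are assumed example values for a lower and an upper bound.

```python
def pinball(y, q, alpha):
    """Pinball (quantile) loss for one observation y and predicted quantile q."""
    u = y - q
    return alpha * u if u >= 0 else (alpha - 1.0) * u

def interval_loss(y_true, q_low, q_up, alpha_l=0.05, alpha_u=0.95):
    """Mean of the lower-quantile and upper-quantile pinball losses."""
    total = 0.0
    for y, ql, qu in zip(y_true, q_low, q_up):
        total += pinball(y, ql, alpha_l) + pinball(y, qu, alpha_u)
    return total / len(y_true)

# One observation inside the interval [4, 7] and one below it.
loss_inside = interval_loss([5.0], [4.0], [7.0])
loss_below = interval_loss([3.0], [4.0], [7.0])
```

An observation inside the interval is weighted by the small factors (0.05 on each side in this example), whereas an observation below the lower bound picks up the large factor 0.95, so the loss is much larger in the second call than in the first.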
As stated above, the true value should satisfy the inequality $\hat{q}_i^{\alpha_l} \le y_i \le \hat{q}_i^{\alpha_u}$. However, in practice the observed values do not always fall within the predicted intervals, so we introduce penalty coefficients to widen the gap between the weights. The improved objective function is expanded in the following form:
The addition of the penalty coefficient significantly increases the value of the objective function in the second, third and fourth cases, which forces the optimization towards the first case, $\hat{q}_i^{\alpha_l} \le y_i \le \hat{q}_i^{\alpha_u}$.
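One plausible reading of the penalized objective is sketched below. It is an assumption, not the exact formulation of the paper: a single hypothetical coefficient `p` (at least 1) multiplies the per-sample quantile loss whenever the observation falls outside the predicted interval, and leaves it unchanged otherwise. The function names and the example values are likewise hypothetical.

```python
def pinball(y, q, alpha):
    """Pinball (quantile) loss for one observation y and predicted quantile q."""
    u = y - q
    return alpha * u if u >= 0 else (alpha - 1.0) * u

def penalized_interval_loss(y_true, q_low, q_up, alpha_l, alpha_u, p):
    """Interval quantile loss with an assumed multiplicative penalty p.

    Samples with q_low <= y <= q_up keep the original loss; all other
    samples (below, above, or crossed bounds) are scaled by p.
    """
    total = 0.0
    for y, ql, qu in zip(y_true, q_low, q_up):
        base = pinball(y, ql, alpha_l) + pinball(y, qu, alpha_u)
        inside = ql <= y <= qu
        total += base if inside else p * base
    return total / len(y_true)

# Assumed example: 90% nominal interval and penalty coefficient p = 3.
covered = penalized_interval_loss([5.0], [4.0], [7.0], 0.05, 0.95, p=3.0)
missed = penalized_interval_loss([3.0], [4.0], [7.0], 0.05, 0.95, p=3.0)
```

Because the function only needs the two predicted quantiles and the observation, it can be attached to any network that outputs a lower and an upper bound, which is the flexibility the text attributes to the improved loss.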
4. Discussion
In this section, the proposed model is first compared with benchmark models to demonstrate its performance in both interval prediction and deterministic prediction. In addition, the effectiveness of the loss function is analyzed, a sensitivity analysis of the penalty coefficients is conducted, and the effectiveness of the ECA module is verified.
4.1. Interval Prediction Results
To verify the performance of the proposed interval prediction model, it is compared with other models. The comparison models include the commonly used time series models TCN [34], GRU [35] and ANN [36], the parametric interval prediction model BELM [26], as well as CNN, LSTM [37] and QR-LIFF. BELM, i.e., ELM based on the bootstrap method, generates prediction intervals under the assumption of a normal distribution. The inputs to the benchmark models are all in vector form. QR-LIFF, the lightweight interval forecasting framework based on quantile regression, adopts the original interval prediction loss function, and IQR-LIFF is the proposed lightweight interval forecasting framework with improved quantile regression. Considering that the recursive strategy leads to error accumulation and that the direct prediction method is more stable and performs better [38], the direct prediction method is used for day-ahead prediction. A prediction interval with a nominal confidence (PINC) of 90% is constructed from the corresponding pair of lower and upper quantiles, and the intervals with PINCs of 80% and 70% are constructed in the same way. The comparison results are shown in the table below.
From Table 1, it is clear that the interval generation method proposed in this paper has the best overall performance. ANN, TCN, GRU, and LSTM perform similarly. The TCN model has a PICP of 85.749% at a nominal coverage of 90%, which is higher than that of the proposed IQR-LIFF model. However, the PIAW of TCN is 33.2% higher than that of IQR-LIFF, as is evident in Figure 6. TCN thus sacrifices sharpness for reliability, and it is still worse than the IQR-LIFF model in terms of the comprehensive index WS.
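For readers who want to reproduce this kind of comparison, the coverage and width metrics can be computed as in the sketch below. It assumes the usual definitions, PICP as the percentage of observations inside the interval and PIAW as the mean interval width; the exact definitions used in the case study, including any normalization and the form of WS, may differ. The function names and toy numbers are hypothetical.

```python
def picp(y_true, lower, upper):
    """Prediction interval coverage probability, in percent."""
    hits = sum(1 for y, lo, up in zip(y_true, lower, upper) if lo <= y <= up)
    return 100.0 * hits / len(y_true)

def piaw(lower, upper):
    """Prediction interval average width (same unit as the power data)."""
    return sum(up - lo for lo, up in zip(lower, upper)) / len(lower)

# Toy example with four time steps.
observed = [1.0, 2.0, 3.0, 4.0]
lower_bounds = [0.0, 2.5, 2.0, 3.0]
upper_bounds = [2.0, 3.5, 4.0, 3.5]

coverage = picp(observed, lower_bounds, upper_bounds)
width = piaw(lower_bounds, upper_bounds)
```

A model can raise its PICP simply by widening its intervals, which is why coverage and width need to be read together, as in the comparison between TCN and IQR-LIFF above.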
As can be seen in Figure 6, the parametric method BELM lacks a clear boundary between the upper and lower bounds of the prediction interval. Its intervals are more aggressive and lack reliability. The CNN model is the most aggressive, having the smallest PIAW. The sharpness of the CNN model and that of the QR-LIFF model are similar, but the QR-LIFF model improves the coverage by about 30% without changing the sharpness.
The IQR-LIFF model proposed in this paper demonstrates good coverage and high sharpness in Figure 6. This indicates that the designed parallel framework of CNN and LSTM effectively extracts spatial meteorological features and temporal features, which in turn improves the accuracy of interval prediction.
4.2. Validation of the Loss Function
Combined with Table 1, it can be seen that, despite the relatively good performance of the QR-LIFF model in terms of the composite indicator WS, its PICP deviates from the PINC at all levels. It can also be seen from Figure 7 that the true values fall outside the prediction intervals of the QR-LIFF model when the power rises or falls sharply, whereas the intervals predicted by the IQR-LIFF model cover the fluctuation range of the true values better in all time periods. The proposed model still performs well during the long low-wind-power period from 5 October to 9 October. These comparisons show that, after training with the loss function proposed in this paper, the interval widths nearly double at PINCs of 90%, 80%, and 70%. However, the coverage and performance metrics improve substantially, which demonstrates the effectiveness of the proposed loss function.
4.3. Deterministic Prediction Results
To demonstrate that the designed network also performs well for deterministic prediction, the loss function is replaced with the one corresponding to deterministic prediction. In addition to the above interval prediction models, the persistence method, which assumes that the predicted value equals the most recent actual observation, is added as a baseline predictor. The day-ahead prediction errors of each model are shown in Table 2, and the prediction curves of each model are shown in Figure 8.
The model in this paper has the highest accuracy in terms of both RMSE and MAE. The persistence approach is the worst: as shown in Figure 8, it deviates significantly from the true value, suggesting that it has limited applicability on a short-term scale. The GRU, ELM, ANN and LSTM models have comparable accuracies, and CNN performs well, which indicates that there is a strong correlation between the meteorological feature images and the total regional power and that the mapping between the two learned by the CNN yields good predictions. Furthermore, after the historical power is also considered (i.e., the framework proposed in this paper), the RMSE is reduced by 19.4% and the MAE is reduced by 22.8%.
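The persistence baseline and the two error metrics can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names, the toy series and the one-step horizon are hypothetical, and the persistence forecast for a horizon of `h` steps is taken to be the value observed `h` steps earlier.

```python
import math

def persistence_forecast(series, horizon):
    """Return (forecasts, targets): each target is predicted by the value
    observed `horizon` steps earlier."""
    return series[:-horizon], series[horizon:]

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Toy regional power series (arbitrary units) and a one-step horizon.
power = [1.0, 2.0, 4.0, 7.0, 11.0]
forecasts, targets = persistence_forecast(power, horizon=1)
baseline_rmse = rmse(targets, forecasts)
baseline_mae = mae(targets, forecasts)
```

`relative_reduction` is the kind of calculation behind statements such as an RMSE reduction of 19.4%: it compares the error of one model against that of a reference model on the same test set.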
4.4. Sensitivity Analysis of Penalty Coefficients
Theoretically, the penalty coefficients can tend to infinity, but in practice we found that an overly large penalty coefficient reduces computational efficiency and can even cause overfitting, so the optimum must be searched for within a suitable range. Taking one-hour-ahead interval prediction as an example, the other parameters of the model are kept unchanged and only the penalty coefficients are varied, and the appropriate values are chosen based on their performance on the test set. The other model parameters were selected in a similar way.
As can be seen in Figure 9, as the value of the penalty coefficient p increases, the PICP shows an overall increasing trend in the first half of the range and levels off in the second half. The 90%, 80% and 70% confidence intervals reach PICP maxima at p = 2.5, 3.0 and 4.0, respectively. The corresponding interval widths tend to increase, and for a PINC of 90% the width surges in the second half of the range, which in fact indicates overfitting. Therefore, continuing to increase the penalty coefficients may lead to overfitting, and, taking the sharpness of the model into account, the value at which the PICP reaches its maximum is considered optimal.
4.5. Effectiveness Analysis of the ECA Module
To validate the effectiveness of the ECA module, we conducted comparative experiments at different prediction time steps on an interval prediction model with 90% confidence intervals. The experimental setups are as follows:
- (1) Without any attention mechanism module, denoted as Model 1.
- (2) With the Squeeze-and-Excitation (SE) Network [39] channel attention mechanism, which captures the dependencies between all channels, denoted as Model 2.
- (3) With the ECA mechanism, denoted as Model 3.
All other parameters of the models are kept consistent, and the performance comparison of the models is shown in Table 3. Model 3 has a Winkler score of −0.009 at a prediction step size of t, which is better than Model 1 without an attention mechanism and Model 2 with the SE attention mechanism. In terms of computation time, Model 3 is faster than Model 2, and the gap becomes more apparent as the prediction time step increases. This also suggests that capturing the dependencies between all channels, as SE does, is inefficient and unnecessary.
5. Conclusions
In this paper, we construct a lightweight IFPRWP model based on meteorological feature images and improved quantile regression to forecast provincial regional wind power fluctuation intervals. Firstly, the inputs of spatial meteorological distribution and temporal features of the model are constructed through image generation and power aggregation. On this basis, a parallel CNN-LSTM prediction architecture is designed, which can effectively extract temporal features and spatial meteorological distribution features. Then, an efficient ECA mechanism module is introduced and trained with an improved quantile loss function to directly generate prediction intervals. The effectiveness of this model is validated with actual data from a region containing 75 wind stations in Guizhou province, Southwest China. The results show that the model proposed in this paper improves the interval prediction performance by at least 12.3%, reduces the deterministic prediction RMSE by at least 19.4%, and reduces the MAE by 26.7% compared to the benchmark model. Meanwhile, the effectiveness of the proposed improved loss function is also verified in the comparative analysis, and it can effectively improve the quality of the prediction interval.
The proposed model has the potential to be extended to other renewable energy applications and scales. However, the model still needs improvement in two respects. First, a way to update the model rapidly when new power stations are added is needed. Second, experiments on datasets with low data quality or longer time horizons need to be carried out to optimize the model and enhance its robustness.