Article

Short-Term Photovoltaic Power Probabilistic Forecasting Based on Temporal Decomposition and Vine Copula

by Xinghua Wang, Zilv Li *, Chenyang Fu, Xixian Liu, Weikang Yang, Xiangyuan Huang, Longfa Yang, Jianhui Wu and Zhuoli Zhao
Department of Electrical Engineering, School of Automation, Guangdong University of Technology, Guangzhou 510006, China
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(19), 8542; https://doi.org/10.3390/su16198542
Submission received: 20 August 2024 / Revised: 22 September 2024 / Accepted: 27 September 2024 / Published: 30 September 2024
(This article belongs to the Special Issue Advances in Sustainable Energy Technologies and Energy Systems)

Abstract
With the large-scale development of solar power generation, highly uncertain photovoltaic (PV) power output has an increasing impact on distribution networks. PV power generation is subject to complex correlations with various weather factors, and its time series embodies multiple temporal characteristics. To quantify the uncertainty of PV power generation more accurately, this paper proposes a short-term PV power probabilistic forecasting method that combines decomposition-based prediction with multidimensional variable dependency modeling. First, a PV time series feature decomposition model based on seasonal-trend decomposition using Loess (STL) is constructed to obtain periodic, trend, and residual components representing different characteristics. For these components, this paper develops a TimeMixer-based periodic component prediction model with multi-scale temporal feature mixing, a long short-term memory (LSTM)-based trend component extraction and prediction model, and a multidimensional PV residual probability density prediction model based on a Vine Copula optimized with Q-Learning. Together, these models form a short-term PV probabilistic forecasting method that accounts for both temporal features and multidimensional variable correlations. Experiments with data from the Desert Knowledge Australia Solar Centre (DKASC) demonstrate that the proposed method reduces root mean square error (RMSE) and mean absolute percentage error (MAPE) by at least 14.8% and 22%, respectively, compared to recent benchmark models. In probability interval prediction, at the 95% confidence level, accuracy improves by 4% while the interval width decreases by 19%. The results show that the proposed approach has stronger adaptability and higher accuracy, providing more valuable references for power grid planning and decision support.

1. Introduction

Due to the energy transition and decarbonization of the energy sector, the extensive development and utilization of solar energy has become a priority in global energy research. In this context, PV generation, as the primary method of solar energy utilization, has moved into the large-scale development stage [1,2,3]. However, PV power is highly variable depending on weather conditions, posing major challenges to the security and stable operation of distribution networks [4]. The randomness and uncertainty of PV power hinder the pace of its grid connection. Consequently, the development of PV power forecasting technologies has garnered significant attention worldwide.
As a time series, PV power data also present intricate characteristics such as non-stationarity and volatility due to the complex influence of multiple real-world factors. Especially over long time spans, the deep mixing of multiple variation features, such as rising levels, regular fluctuations, and random mutations, poses severe challenges to the forecasting task. To address this issue, a widely accepted paradigm is to use methods like moving average decomposition [5], empirical mode decomposition (EMD) [6], and wavelet decomposition [7] to decompose the complex time series into sub-series that have independent features and are more predictable. However, these methods are sensitive to outliers and require the artificial selection of high- and low-frequency components as different features. Finding matching models for different component characteristics to extract feature information separately is much more accurate than using a single model to predict the complete sequence [8]. Therefore, a decomposition method that gives the components clear physical meaning and higher reliability must be adopted so that the models can better learn the features of specific components.
PV power forecasting methods can be divided into deterministic forecasting and probabilistic forecasting based on the type of prediction [9]. With the increasing scale of grid-connected PV and the growing requirements for power supply quality, the reliability of deterministic PV forecast models, such as artificial neural networks (ANN), auto-regressive moving average (ARMA) [10], and physical models [11], is decreasing. A comprehensive overview of mainstream deterministic forecast models is provided by R. Ahmed et al. [12], which offers an elaborate introduction to the categorization of PV power forecasting techniques as well as the principles behind state-of-the-art forecast models. Furthermore, that study observes that PV time series show different time-varying characteristics at different observation scales, and future changes are a blend of characteristics at multiple scales. For example, at fine sampling intervals (e.g., hours, days), the time series exhibits detail-rich fluctuations and short-term cyclical changes, while at coarser intervals (e.g., weeks, months), macro trends and long-term cyclical fluctuations are more apparent. The difficulty in making accurate time series forecasts lies in accounting for this multi-scale temporal variation. Among more advanced paradigms for capturing multi-scale temporal features, models such as the temporal convolutional network (TCN) [13] and the Transformer [14] are widely recognized for their predictive performance. However, the former suffers from dimensionality limitations that make it difficult to mine long-term dependencies, and the latter introduces a self-attention mechanism that requires a large amount of computational resources, which restricts their application in power systems.
In [15], a multilayer perceptron (MLP) is used to establish the relationship between long-term meteorological data and multi-scale temporal data, but the model complexity is too low to deal with high-dimensional variable nonlinearity. The authors in [16] proposed a TimeMixer multi-scale mixing architecture, which achieves bidirectional coarse- and fine-scale mixing in time series characterization and integrates multi-scale historical information through multiple predictors in the forecasting stage. TimeMixer is able to capture the characteristics of the changes in time series under different scales of observation, and it is entirely based on the MLP-based architecture, which achieves high efficiency comparable to linear models and state-of-the-art performance.
In a recent review of probabilistic forecasting by Meer, D.W. et al. [17], probabilistic forecasts are divided into parametric and nonparametric methods. Constructing prediction intervals by fitting a known density function, such as a Gaussian [18] or beta [19] distribution, to the forecast errors is a common parametric method. Salinas, D. et al. [20] constructed a negative log-likelihood function to find the parameters of the probability distribution, but the model's prediction accuracy is poor. Assuming a distribution beforehand leads to a significant reduction in the accuracy of parametric methods. Nonparametric methods derive distribution information directly from the characteristics of the original data and therefore have stronger generalization and performance. Quantile regression (QR) [21] is the most common nonparametric method; others include kernel density estimation (KDE) [22], bootstrap [23], Gaussian processes (GP) [24], and so on. More recently, nonparametric methods have been combined with ANNs. F. Lin et al. [25] derived the closed analytic form of the fractional integral of the continuous ranked probability score to use as a loss function for training. However, due to the low interpretability of machine learning and the fact that each quantile is predicted independently, such methods can only reflect partial probability information; they also suffer from quantile-crossing problems that violate the monotonicity property [26]. Zhang et al. [27] introduced a decomposition strategy, combining predictions of other variables with KDE-modeled residual component distributions to obtain probability intervals. However, the difficulty of selecting an optimal bandwidth factor limits its ability to accurately reflect the sample's true distribution.
Probabilistic forecasting results need to reflect the true probability distribution of a random variable to accurately quantify uncertainty. Y. Sun et al. [28] used a Copula to establish a joint probability distribution between two PV plants, which provides ideas for portraying the correlation between two variables. Z.L. Li et al. [29] proposed a combined Copula function to establish the joint solar/wind PDF in each time slot. Müller, A. et al. [30] proposed a Copula-based time series model that describes the dependence between hourly and daily time series. However, ordinary Copula functions fit multidimensional variables poorly. In this context, a Vine Copula approach is proposed by Schinke-Nendza, A. et al. [31] to describe the spatial dependence of PV power forecast errors from physical models across multiple grid nodes. R. Zhang et al. [32] used fuzzy C-means to cluster data under different climate conditions and then established D-Vine Copula models for three typical climates. All of the above studies directly model the multivariate dependence structure of the complete PV series and related variables. Although this better reflects the original probability distribution information, the results cannot meet the error requirements of short-term forecasting under fine-scale observation. In this paper, the main accuracy improvement task is assigned to deep learning models, while uncertainty quantification is performed mainly through the Vine Copula, within a framework that decomposes the time series and models the components separately.
Vine Copula uses the "Vine tree" from graph theory to represent the correlation between different variables, with edges connecting variable nodes. Commonly used modeling approaches [31,32,33,34] use Kendall's rank correlation coefficients to measure variable dependence and then apply a maximum spanning tree (MST) algorithm to select the Vine trees with the highest coefficient sum. However, MST often yields locally optimal, suboptimal dependency structures. This paper therefore proposes a novel Vine Copula structure optimization method that uses Q-Learning to select the variable connections of the Vine trees.
To sum up, this paper proposes a probabilistic forecasting method for PV power that considers different component characteristics under time series decomposition for constructing prediction models and probability modeling. The innovation lies in developing a framework that combines deep learning prediction models with multidimensional variable uncertainty quantification models based on time series feature decomposition, where different features of the time series are represented as periodic, trend, and residual components. The objectives include: (1) accurately predicting the periodic component of PV power via TimeMixer; (2) efficiently predicting the trend component using LSTM’s long-term memory capability; (3) constructing a multidimensional joint distribution based on Q-Learning optimized Vine Copula to quantify its stochastic process and obtain probability distribution results for the forecast time using QR. The detailed implementation process of this method is as follows. Firstly, STL is used to decompose the PV time series into periodic, trend, and residual components. A TimeMixer-based periodic component prediction model and an LSTM-based trend component prediction model are constructed. The mutual information (MI) method and variance inflation factor (VIF) method are used to screen highly correlated influencing factor variables for the PV residual component for subsequent Vine Copula modeling. Q-Learning is used to construct the variable connection relationships of the screened variables. Then, the maximum likelihood estimation (MLE) and Bayesian information criterion (BIC) are used to determine the parameters of the Vine tree, forming an optimal Vine Copula structure to generate a joint probability distribution of PV residuals and multiple factors. Finally, QR is performed based on the derived conditional distribution of PV residuals to obtain interval results. 
Probabilistic forecasts are obtained by combining deterministic predictions of periodic and trend components with uncertainty models of residual components. Simulation verifications are conducted using PV output information collected from the DKASC Alice Springs PV station in Australia. Results demonstrate that compared with Transformer, TCN, CNN-GRU, LSTM, XGBoost, and traditional Vine Copula methods, the proposed method exhibits more powerful performance in probabilistic forecasting. Meanwhile, each model utilized in this method can effectively handle the corresponding PV component information, outperforming several mentioned state-of-the-art models. The main contributions of this paper are as follows.
(1) This study proposes a probabilistic forecasting method that combines deterministic prediction models with uncertainty models under time series feature decomposition. Selecting appropriate prediction models according to the characteristics of each decomposed component, including the STL, TimeMixer, LSTM, and Q-Learning optimized Vine Copula models, improves prediction accuracy.
(2) The experiments involve multi-faceted comparisons, verifying the effectiveness of probabilistic forecasting under the decomposition framework, the advanced nature of the TimeMixer model and LSTM model in predicting respective components, and the effectiveness of Q-Learning optimized Vine Copula in uncertainty modeling.
(3) This study verifies that the TimeMixer model performs better in multi-scale time series feature extraction and prediction compared to some state-of-the-art models and that improvements in deterministic prediction efficacy are beneficial to probabilistic forecasting results.
The remaining parts of the article are arranged as follows: various models, methods, and frameworks involved are presented in Section 2; data introduction and preprocessing, modeling effects, prediction results, comparisons, and discussions are given in Section 3; Section 4 provides detailed conclusions and suggestions for future research.

2. Methodology

2.1. Framework for Proposed Methods

The general framework of this paper for probabilistic PV power forecasting is shown in Figure 1.
1. Data preprocessing. After obtaining the original data, the first step is to resample the data to the interval required for short-term prediction, remove abnormal records, and fill in missing values after normalization. This process is applied to all of the original data. STL decomposition is then performed on the PV series to obtain the periodic, trend, and residual components, and finally the training and test sets are divided from all the data.
2. Deterministic forecasting. Deterministic forecasting covers the PV periodic component and trend component; taking these two as targets, TimeMixer and LSTM forecasting models are constructed, respectively, with historical data and meteorological data as their inputs.
3. Uncertainty modeling. MI and VIF are used to screen variables strongly correlated with the PV residual component and to define the number of variable dimensions, and Q-Learning is used to select the variable connections that determine the initial structure of the Vine tree. The parameters and types of the Copula functions in the Vine tree are calculated according to MLE and BIC to obtain the complete Vine Copula model. The conditional distribution function of the PV residual component in the multidimensional joint distribution is derived, and QR is used to obtain confidence intervals at the predicted moments as the uncertainty quantification results. Finally, these are combined with the deterministic prediction results above to form the complete probabilistic forecasting results.
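As an illustration, step 1 above can be sketched in a few lines (a minimal pandas-based sketch; the column name `power`, the resampling interval, and the cleaning rules are our assumptions, not the exact pipeline used in the paper):

```python
import numpy as np
import pandas as pd

def preprocess(raw: pd.DataFrame, freq: str = "1h") -> pd.DataFrame:
    """Resample, clean, and normalize a raw PV frame with a DatetimeIndex.

    'power' is a hypothetical column name for PV output.
    """
    df = raw.resample(freq).mean()                  # resample to the forecast resolution
    df.loc[df["power"] < 0, "power"] = np.nan       # treat negative output as abnormal
    df = df.interpolate(limit_direction="both")     # fill missing values
    return (df - df.min()) / (df.max() - df.min())  # min-max normalization to [0, 1]

# toy example: 30-min raw data with one outlier and one gap
idx = pd.date_range("2024-01-01", periods=8, freq="30min")
raw = pd.DataFrame({"power": [0.0, 1.0, 2.0, -1.0, 4.0, np.nan, 6.0, 7.0]}, index=idx)
clean = preprocess(raw)
```

The same routine would be applied to every raw series before the STL decomposition and the train/test split.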

2.2. STL Decomposition

For the PV time series, the variation trend is influenced by multiple factors. On the one hand, it includes the effects of solar activity patterns and seasonal changes, leading to periodic fluctuation characteristics in the time series. On the other hand, it is affected by economic development needs, such as the more intuitive changes in installed capacity, resulting in a relatively smooth trend in overall power output characteristics. Finally, there are random characteristics that only appear when observed at a fine scale. An original sequence with these mixed characteristics is very difficult for a model to learn. If predictions are made for specific characteristics separately, the model can focus on specific change patterns unaffected by random fluctuations.
Therefore, based on the idea of decomposition prediction, this paper uses STL decomposition to decompose the PV sequence into periodic, trend, and residual components. The decomposition formula can be expressed as:
$y_t = S_t + T_t + R_t$ (1)
where $S_t$, $T_t$, and $R_t$ represent the periodic component, trend component, and residual component of the sequence at time $t$, respectively. The periodic component represents the regular part of PV output, which can be observed as obvious and similar peaks in the curve. The trend component represents the smooth trend line in the time series with a small fluctuation amplitude. The residual part represents the fine changes caused by uncertain external factors. STL is a time series decomposition method that uses locally weighted regression as a smoothing approach. It uses locally estimated scatterplot smoothing (LOESS) to extract smooth estimates of components, achieving separation of different components [35]. The algorithm is based on a double-layer structure of inner and outer loops. The inner loop obtains periodic and trend components through LOESS smoothing operations on the detrended sequence, while the outer loop adjusts the robustness weights of the algorithm to reduce the impact of outliers on LOESS regression. The algorithm flow is shown in Figure 2.
In the outer loop, STL decomposition calculates robust weights based on the residual term to update the neighborhood weights in the LOESS regression of the inner loop smoothing operation. This separates the noise points in the data into the residual term, improving the robustness of the algorithm. The robust weights are calculated using the Bisquare function, specifically as follows:
$B(\mu) = \begin{cases} (1-\mu^2)^2, & 0 \le \mu < 1 \\ 0, & \mu \ge 1 \end{cases}$ (2)

$\rho_t = B\!\left(\dfrac{|R_t|}{6\,\mathrm{median}(|R_t|)}\right)$ (3)
where $B(\mu)$ is the Bisquare function, $R_t$ is the residual component, and $\mathrm{median}(\cdot)$ is the median calculation function.
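The outer-loop robustness weighting above can be reproduced directly (a minimal NumPy sketch; the function names are ours):

```python
import numpy as np

def bisquare(u):
    """Bisquare function B(u): (1 - u^2)^2 on [0, 1), else 0."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1, (1.0 - u**2) ** 2, 0.0)

def robust_weights(residual):
    """Outer-loop robustness weights rho_t = B(|R_t| / (6 * median|R_t|))."""
    r = np.abs(np.asarray(residual, dtype=float))
    return bisquare(r / (6.0 * np.median(r)))

# A point far from the bulk of the residuals gets weight zero, so it barely
# influences the next inner-loop LOESS fit.
w = robust_weights([0.1, -0.2, 0.15, 5.0])
```

In practice, the full STL algorithm is available off the shelf, e.g. `statsmodels.tsa.seasonal.STL(series, period=..., robust=True)`, where `robust=True` enables exactly this outer loop.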

2.3. TimeMixer Model

Through observation, it is found that time series exhibit different information at different sampling scales. For example, a PV sequence recorded hourly shows power output changes at different times within a day, while this is not observable in a daily recorded sequence, which instead reveals daily power output level changes. Coarse and fine scales can reflect macro and micro information, respectively, and future deterministic information is jointly determined by changes at multiple scales.
Addressing the issue of multi-scale feature fusion for time series, TimeMixer proposes an MLP-based multi-scale feature mixing architecture. This architecture extracts different scale temporal features from past changes through a past-decomposable-mixing (PDM) module and then integrates the extracted multi-scale past information to predict future sequences through a future-multipredictor-mixing (FMM) module [16]. Benefiting from the MLP-based analysis of different feature components of multi-scale sequences and the realization of complementary prediction capabilities, TimeMixer achieves state-of-the-art performance in short-term and long-term forecasting with excellent computational efficiency. The overall architecture of TimeMixer is shown in Figure 3.
Specifically, to unravel complex changes, TimeMixer first applies average pooling to the original sequence $x_0 \in \mathbb{R}^{P \times C}$ to generate $M$ sub-sequences of different scales, resulting in the multi-scale time series $\mathcal{X} = \{x_0, \dots, x_M\}$, where $x_m \in \mathbb{R}^{\lfloor P/2^m \rfloor \times C}$, $m \in \{0, \dots, M\}$, $C$ is the number of variables, and $P$ is the length of the original sequence. Sub-sequences of different scales are generated from $x_0$ through downsampling at different time steps. The original sequence $x_0$ contains the most subtle change information, while the highest level $x_M$ represents the most distant macro information to be extracted. The multi-scale sequences are then projected into deep features $X^0$ through an embedding layer, which can be represented as $X^0 = \mathrm{Embed}(\mathcal{X})$, thus obtaining a multi-scale representation of the input.
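The multi-scale input construction can be sketched as non-overlapping average pooling with a window of 2, so that scale $m$ has length $\lfloor P/2^m \rfloor$ (a NumPy sketch; the pooling window and variable names are our assumptions):

```python
import numpy as np

def multiscale(x, M):
    """Average-pool a series x of shape (P, C) into M+1 scales.

    Scale 0 is the original series; scale m has length floor(P / 2**m).
    """
    scales = [x]
    cur = x
    for _ in range(M):
        P = cur.shape[0] - cur.shape[0] % 2                # trim to an even length
        cur = cur[:P].reshape(P // 2, 2, -1).mean(axis=1)  # non-overlapping mean pooling
        scales.append(cur)
    return scales

P, C = 96, 3                       # e.g., one day at 15-min resolution, 3 variables
X = multiscale(np.random.default_rng(0).normal(size=(P, C)), M=3)
lengths = [s.shape[0] for s in X]  # lengths halve at each scale
```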
Even the coarsest scale sequence contains mixed feature changes. To enable the model to learn different characteristics, TimeMixer decomposes each scale sequence and then performs multi-scale mixing. In the $l$-th PDM module, the decomposition block [36] first decomposes $X^l$ into trend terms $T^l = \{t_0^l, \dots, t_M^l\}$ and periodic terms $S^l = \{s_0^l, \dots, s_M^l\}$, then performs feature mixing on the seasonal and trend terms separately, achieving multi-scale interaction of the same feature, as shown in the following equations:
$s_m^l,\, t_m^l = \mathrm{Decomp}(x_m^l), \quad l \in \{1, \dots, L\},\ x_m^l \in \mathbb{R}^{\lfloor P/2^m \rfloor \times d}$ (4)
$X^l = X^{l-1} + \mathrm{FeedForward}\big(\mathrm{S\_Mix}(\{s_m^l\}_{m=0}^{M}) + \mathrm{T\_Mix}(\{t_m^l\}_{m=M}^{0})\big)$ (5)
where $L$ is the total number of layers; $\mathrm{Decomp}(\cdot)$ represents the decomposition block, which obtains the trend term through moving average processing, treats the remainder as the periodic term, and keeps the sequence length unchanged through padding [37]; $x_m^l$ is the deep feature representation of the different time scales with $d$ channels; $\mathrm{FeedForward}(\cdot)$ consists of two linear layers, exchanging information between channels through the GELU activation function; $\mathrm{S\_Mix}(\cdot)$ and $\mathrm{T\_Mix}(\cdot)$ represent the mixing operations for periodic and trend information. As seen from Equation (5), TimeMixer uses stacked PDM modules to mix past information from different scales, allowing each layer to extract and mix multi-scale information from the output of the previous layer, which helps to capture complex patterns and high-level features in the data layer by layer, enhancing the model's representational ability.
TimeMixer adopts different mixing modes according to the different change characteristics of different features. For periodic terms, large-scale periodic information can be seen as a collection of small-scale periods, corresponding to macro and micro information, respectively, so a bottom-up mixing method is adopted. For example, observing the periodic component of PV output, fusing information upward from the daily-scale periodic sequence can form a coarser monthly-scale periodic sequence. In the technical implementation of the model, residual connections [38] are used to achieve the interaction of the multi-scale periodic term information $S^l = \{s_0^l, \dots, s_M^l\}$, as intuitively shown in Figure 4a, and can be formulated as:
$s_m^l \leftarrow \mathrm{S\_Mix}_{\mathrm{BU}}(s_{m-1}^l) + s_m^l, \quad m: 1 \to M$ (6)
where $\mathrm{S\_Mix}_{\mathrm{BU}}(\cdot)$ (the bottom-up mixing) consists of two linear layers in the time dimension, with input and output dimensions of $\lfloor P/2^{m-1} \rfloor$ and $\lfloor P/2^m \rfloor$, respectively. Through residual connections, scale information from the previous layer can be passed directly to later layers, allowing the network to fit the residual mapping $H(x) - s_m^l$ as shown in Figure 4a while not losing early-layer information, thus making the network focus only on the information interaction between the current scale and the next scale, avoiding network degradation.
For trend terms, fine-scale changes introduce noise into macro trend information. Taking the PV trend sequence as an example, the coarse-scale trend component better exhibits the clear overall PV output level for the entire study period. Therefore, a top-down direction is adopted to mix multi-scale trend information, using the macro trend to guide the micro trend direction of fine scales. This is intuitively understood as shown in Figure 4b and can be formulated as:
$t_m^l \leftarrow \mathrm{T\_Mix}_{\mathrm{TD}}(t_{m+1}^l) + t_m^l, \quad m: M-1 \to 0$ (7)
where $\mathrm{T\_Mix}_{\mathrm{TD}}(\cdot)$ (the top-down mixing) also consists of two linear layers, but the input and output dimensions become $\lfloor P/2^{m+1} \rfloor$ and $\lfloor P/2^m \rfloor$. The PDM module differs from approaches that directly mix multi-scale feature sequences: it aggregates micro and macro information based on the periodic and trend terms of the subsequences, respectively, ultimately achieving multi-scale mixing in past information extraction.
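The two mixing directions can be illustrated with a toy data-flow sketch, where untrained random linear maps stand in for the learned mixing layers (the shapes and update order are the point here, not the values):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(n_in, n_out):
    """A single random linear map standing in for a trained mixing layer."""
    W = rng.normal(scale=0.1, size=(n_out, n_in))
    return lambda v: W @ v

P, M = 32, 2
lens = [P // 2**m for m in range(M + 1)]   # per-scale lengths: [32, 16, 8]
s = [rng.normal(size=n) for n in lens]     # seasonal terms, one per scale
t = [rng.normal(size=n) for n in lens]     # trend terms, one per scale

# Seasonal terms: bottom-up, fine scale m-1 feeds coarse scale m.
for m in range(1, M + 1):
    s[m] = linear(lens[m - 1], lens[m])(s[m - 1]) + s[m]

# Trend terms: top-down, coarse scale m+1 guides fine scale m.
for m in range(M - 1, -1, -1):
    t[m] = linear(lens[m + 1], lens[m])(t[m + 1]) + t[m]
```

Note that each update keeps the length of scale $m$ unchanged; only information flows across scales.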
After passing through the aforementioned $L$ PDM modules, the model obtains rich and complete multi-scale past information $X^L = \{x_0^L, \dots, x_M^L\}$. Since past information at different scales exhibits varying predictive capability, to fully utilize the multi-scale information, TimeMixer employs an FMM module to achieve multi-scale complementary prediction. In this module, the past information from each scale $x_m^L$ is input into a corresponding scale-specific predictor, and the predictions from the multi-scale sequences are aggregated:
$\hat{x}_m = \mathrm{pred}_m(x_m^L)$ (8)

$\hat{x} = \sum_{m=0}^{M} \hat{x}_m$ (9)
where $\hat{x}_m \in \mathbb{R}^{F \times C}$ represents the prediction of the future based on the $m$-th scale sequence; $F$ denotes the future length to be predicted; $\hat{x} \in \mathbb{R}^{F \times C}$ is the final output; $\mathrm{pred}_m(\cdot)$ represents the predictor for the $m$-th scale sequence. It first uses a single linear layer to regress past information of length $\lfloor P/2^m \rfloor$ onto a future of length $F$, then projects the regressed deep representation back to the $C$ target variables, and finally the multi-scale results are aggregated to obtain the prediction, as shown in Figure 4c.
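The multipredictor aggregation can be sketched in the same spirit (random weights stand in for trained predictors; only the shapes and the summation over scales are meaningful):

```python
import numpy as np

rng = np.random.default_rng(1)
F, C, d = 4, 2, 8               # forecast length, target variables, channels
lens = [24, 12, 6]              # lengths of the multi-scale deep features

def predictor(n_in):
    """Scale-specific predictor: time regression, then channel projection."""
    Wt = rng.normal(scale=0.1, size=(F, n_in))  # length [P/2^m] -> F
    Wc = rng.normal(scale=0.1, size=(C, d))     # d channels -> C targets
    return lambda x: (Wt @ x) @ Wc.T            # (n_in, d) -> (F, C)

feats = [rng.normal(size=(n, d)) for n in lens]   # x_m^L from the PDM stack
preds = [predictor(n)(x) for n, x in zip(lens, feats)]
x_hat = np.sum(preds, axis=0)                     # sum the per-scale predictions
```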

2.4. Long Short-Term Memory Network

The LSTM model is a variant of the recurrent neural network (RNN) that effectively addresses gradient vanishing while preserving temporal correlations. It adds a "gate" mechanism to control whether data are retained or discarded, thereby enabling the network to learn long- and short-term dependencies in time series. The structural diagram of the LSTM network is shown in Figure 5 [9].
The basic units of LSTM networks are the forget gate, input gate, and output gate. In the forget gate, the input $x_t$, together with the state memory cell $S_{t-1}$ and the intermediate output $h_{t-1}$, determines which part of the state is forgotten. In the input gate, the sigmoid $\sigma$ and tanh activation functions jointly determine which part of the input vector is retained in the state memory cell. The intermediate output $h_t$ is determined by the updated $S_t$ and the output gate $o_t$. The calculation formulas are as follows [39]:
$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f)$ (10)

$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i)$ (11)

$g_t = \phi(W_{gx} x_t + W_{gh} h_{t-1} + b_g)$ (12)

$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o)$ (13)

$S_t = g_t \odot i_t + S_{t-1} \odot f_t$ (14)

$h_t = \phi(S_t) \odot o_t$ (15)
where $f_t$, $i_t$, $g_t$, $o_t$, $h_t$, and $S_t$ are the states of the forget gate, input gate, input node, output gate, intermediate output, and state unit, respectively; $W_{fx}$, $W_{fh}$, $W_{ix}$, $W_{ih}$, $W_{gx}$, $W_{gh}$, $W_{ox}$, and $W_{oh}$ are the weight matrices applied to the input $x_t$ and the intermediate output $h_{t-1}$ in the corresponding gates; $b_f$, $b_i$, $b_g$, and $b_o$ are the biases of the corresponding gates; $\odot$ denotes element-wise multiplication; and $\sigma$ and $\phi$ denote the sigmoid and hyperbolic tangent activation functions.
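A single step of the gate equations above can be written out directly (a minimal NumPy sketch with toy dimensions; the weight initialization is arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, S_prev, W, b):
    """One LSTM step; W and b hold the per-gate weights and biases."""
    f = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])  # forget gate
    i = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])  # input gate
    g = np.tanh(W["gx"] @ x_t + W["gh"] @ h_prev + b["g"])  # input node
    o = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])  # output gate
    S = g * i + S_prev * f                                  # state update
    h = np.tanh(S) * o                                      # intermediate output
    return h, S

# toy dimensions: 3 input features, 5 hidden units
rng = np.random.default_rng(0)
n_in, n_h = 3, 5
W = {k: rng.normal(scale=0.1, size=(n_h, n_in if k.endswith("x") else n_h))
     for k in ["fx", "fh", "ix", "ih", "gx", "gh", "ox", "oh"]}
b = {k: np.zeros(n_h) for k in "figo"}
h, S = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.normal(size=(10, n_in)):  # run over a short input sequence
    h, S = lstm_step(x_t, h, S, W, b)
```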

2.5. Deterministic Forecasting Model Construction with Periodic and Trend Components

STL decomposition is based on a preset data periodic length. To ensure that the obtained trend component reflects the smooth developmental pattern in long-term PV changes, especially highlighting changes in overall power output levels due to seasonal variations, the periodic length cannot be set too short. Consequently, the periodic component will contain complex and extremely detailed change patterns, while the trend component will be relatively smooth. TimeMixer can capture information at different time scales, detecting both the most minute changes and global features. LSTM, on the other hand, excels at coupling and memorizing different moments within long-term time patterns. Therefore, a TimeMixer-based periodic component prediction model and an LSTM-based trend component prediction model are constructed.

2.5.1. Periodic Component Forecasting Model Based on TimeMixer

The key to the PV component forecasting model based on TimeMixer lies in mining periodic characteristics and the correlation between time series data and multidimensional external factors, which requires establishing a clear data structure for input. The PV periodic component exhibits certain autocorrelation characteristics with its own changes. Furthermore, since it is decomposed from the original output sequence, the component also has a certain cross-correlation with its historical output. The initial input data for the PV component prediction can be expressed as:
$x_0^s = \begin{bmatrix} s_1 & m_1 & x_1 & h_1 & u_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ s_n & m_n & x_n & h_n & u_n \end{bmatrix}$ (16)
where $n$ represents the number of samples; $s_i$, $i \in \{1, \dots, n\}$, represents the periodic component of PV generation; $m_i$ denotes meteorological information for the region, including multiple input variables; $x_i$ is the complete PV output; $h_i$ is the hourly label feature, reflecting the intra-day variation pattern; and $u_i$ is the monthly label feature, reflecting the variation of output levels across different months of the year, which determines the range of periodic component changes. In the prediction process, the time label information for the moment to be predicted is completely known, while the meteorological data come from numerical weather prediction (NWP), which carries some uncertainty itself. Considering that precise NWP information for the day to be predicted is difficult to obtain in actual use of the model, this paper adopts the minimum, maximum, and average values of temperature, humidity, and solar irradiance on the prediction day as auxiliary input features, extending them at a daily resolution to keep the time scale consistent with the PV output. Therefore, the final training process can be expressed as:
$x_{L,K}^{s} = F(x_{0,L}^{s}, x_L, h_K, u_K, a_K; \theta)$ (17)
where $x_{L,K}^{s}$ represents the predicted future PV periodic component with a step length of $K$ obtained from historical data of length $L$; $x_{0,L}^{s}$ denotes the historical initial data of length $L$; $x_L$ represents the historical PV output of length $L$; $h_K$ denotes the hourly labels for the next $K$ steps; $u_K$ represents the monthly labels for the next $K$ steps; $a_K$ is the auxiliary features on the prediction day; $\theta$ is the parameter vector for model training; and $F(\cdot)$ is the mapping function used in the training. Figure 6a illustrates the model prediction process in the form of a sliding window.
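The sliding-window scheme of Figure 6 can be sketched generically (function and variable names are ours; the real inputs are the multivariate matrices defined above, replaced here by a univariate toy series):

```python
import numpy as np

def sliding_windows(series, L, K):
    """Build (history, target) pairs: length-L input windows, length-K targets."""
    X, Y = [], []
    for start in range(len(series) - L - K + 1):
        X.append(series[start:start + L])           # history of length L
        Y.append(series[start + L:start + L + K])   # next K steps to predict
    return np.array(X), np.array(Y)

s = np.arange(20.0)                 # stand-in for the periodic component
X, Y = sliding_windows(s, L=8, K=2)  # 11 overlapping windows
```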

2.5.2. Trend Component Forecasting Model Based on LSTM

The changes in the PV trend component are mainly influenced by total solar radiation: lower overall radiation during a period results in a lower overall level of the trend component, and observations on an annual scale reveal its fluctuations. As with the inputs for the periodic component, the initial input data and training process of the LSTM can be expressed as:
$$x_0^t = \begin{bmatrix} t_1 & m_1 & x_1 & q_1 & u_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ t_n & m_n & x_n & q_n & u_n \end{bmatrix}$$
$$x_{t,L,K} = F\left(x_{0,L}^{t}, x_L, q_K, u_K, a_K; \theta\right)$$
where $t_i$ is the trend component of PV generation; $q_i$ denotes the quarter label feature, reflecting the level changes of the trend component with seasonal transitions; $x_{t,L,K}$ is the predicted future PV trend component with step length $K$, obtained from historical data of length $L$; and $q_K$ and $u_K$ are the quarterly and monthly labels for the next $K$ steps. Unlike the training process for the periodic component, the auxiliary temporal label features for the trend component include monthly and quarterly labels on a longer time scale. Similarly, Figure 6b shows the sliding-window processing for the trend component.
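A minimal sketch of an LSTM trend forecaster of this kind, assuming PyTorch and illustrative layer sizes (the paper's exact architecture is given in Table 1, so everything here is an assumption for illustration):

```python
import torch
import torch.nn as nn

class TrendLSTM(nn.Module):
    """Sketch: map a history window of length L with d_in features per
    step to the next K trend values via an LSTM and a linear head."""
    def __init__(self, d_in, hidden=64, K=45):
        super().__init__()
        self.lstm = nn.LSTM(d_in, hidden, batch_first=True)
        self.head = nn.Linear(hidden, K)

    def forward(self, x):             # x: (batch, L, d_in)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict K steps from the last hidden state

model = TrendLSTM(d_in=5, K=45)
hist = torch.randn(8, 90, 5)          # a batch of 8 windows, L = 90
pred = model(hist)                    # shape (8, 45)
```

Training would pair this with the Adam optimizer and an MSE loss over the $K$-step targets, as is standard for multi-step regression.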

2.6. Uncertainty Modeling Based on Vine Copula Optimized by Q-Learning

2.6.1. The Theory of Vine Copula

The foundation of Copula theory is Sklar's theorem [40]. Essentially, the theorem decomposes a complex joint distribution into multiple marginal distributions connected through a "linking" function, defined as a joint distribution function that couples uniform one-dimensional marginal distributions on [0, 1]. According to this definition, the multivariate joint CDF is [30]:
$$F(x_1, x_2, \dots, x_m) = C\left(F_1(x_1), F_2(x_2), \dots, F_m(x_m)\right)$$
where $F(\cdot)$ is the joint CDF of the $m$ variables, $F_i$ is the marginal cumulative distribution function (CDF) of the $i$-th variable, and $C(\cdot)$ is the Copula function.
Vine Copula decomposes the multivariate joint distribution, in a cascading manner, into a series of conditional two-dimensional Copulas, primarily in two forms: C-vine and D-vine [41]. When a dominant variable exerts a strong influence on the others, the C-vine structure is chosen, ensuring that each Vine Tree carries a dominant factor. Conversely, when the relationships between variables are relatively balanced, the variables in the Vine Tree are arranged in a row, forming a D-vine. Figure 7 illustrates a 5-dimensional C-vine structure: a $d$-dimensional Vine structure is composed of $d-1$ Vine Trees. The first Vine Tree consists of the $d$ variables as nodes connected by edges, and for $i = 2, \dots, d-1$, the nodes of the $i$-th Vine Tree are the edges of the $(i-1)$-th. The arrangement of variables in the first tree is of great importance, as it determines the composition of nodes and edges in all subsequent Vine Trees [34].

2.6.2. Variables Screening Based on MI and VIF

Vine Copula captures the dependence structure between strongly correlated variables with greater significance, while introducing weakly correlated variables into the uncertainty model only increases its complexity. It is therefore necessary to select suitable variables from the multiple meteorological factors.
(1) MI
MI measures the degree of mutual dependence between two random variables. Unlike the correlation coefficient, MI is not limited to real-valued variables and expresses the similarity between the joint distribution and the product of the marginal distributions. For two continuous random variables, MI is defined as [42]:
$$I(X;Y) = \int_Y \int_X p(x,y)\, \log\!\left(\frac{p(x,y)}{p(x)\,p(y)}\right) \mathrm{d}x\, \mathrm{d}y$$
where $p(x,y)$ is the joint probability density function of $x$ and $y$, and $p(x)$ and $p(y)$ are their marginal probability density functions. MI effectively describes the nonlinear relationship between two variables, making it particularly suitable for constructing an appropriate dependence structure in the Vine Copula model.
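MI-based screening of this kind can be sketched with scikit-learn's `mutual_info_regression` on synthetic data (the variable names and data are illustrative, not the paper's dataset):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 2000
radiation = rng.normal(size=n)                          # hypothetical factor
residual = 0.8 * radiation + 0.2 * rng.normal(size=n)   # strongly dependent target
noise = rng.normal(size=n)                              # independent factor

X = np.column_stack([radiation, noise])
mi = mutual_info_regression(X, residual, random_state=0)
ranked = np.argsort(mi)[::-1]   # candidate factors, most informative first
```

Here the radiation-like factor carries far more information about the residual than the independent noise column, so it would survive the MI screen while the noise would be dropped.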
(2) VIF
VIF is a method for assessing the linear relationships among a model's input variables and is an essential tool for identifying and addressing multicollinearity. The calculation treats each feature in turn as the dependent variable and fits a linear regression on the remaining features; the VIF is then computed from the coefficient of determination of that regression, as follows [43]:
$$VIF_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the coefficient of determination obtained by regressing the $i$-th variable on the remaining variables. In Vine Copula modeling, the construction of a multivariate joint distribution relies on multiple correlation relationships between the influencing variables and the target variable. Therefore, after selecting highly dependent variables with MI, VIF is used to filter out those exhibiting multicollinearity with the PV residuals.
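A minimal implementation of the formula above (the `vif` helper is our own, not from the paper), using a nearly collinear pair of columns to show what the filter flags:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing
    column i of X on all remaining columns."""
    X = np.asarray(X, dtype=float)
    out = []
    for i in range(X.shape[1]):
        others = np.delete(X, i, axis=1)
        r2 = LinearRegression().fit(others, X[:, i]).score(others, X[:, i])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = a + 0.01 * rng.normal(size=500)   # nearly collinear with a
c = rng.normal(size=500)              # independent
vifs = vif(np.column_stack([a, b, c]))
```

Columns `a` and `b` receive very large VIFs (severe multicollinearity), while the independent column `c` stays near 1; a common rule of thumb drops features with VIF above 5–10.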

2.6.3. Improved High-Dimensional Vine Copula Modeling Based on Q-Learning

In Vine Copula, each edge carries information about the dependence between two variables. To find the Vine Tree with the highest overall dependency, this paper uses a Q-Learning strategy to determine how the variables are connected.
Q-Learning is a value-based reinforcement learning algorithm that uses the Q-function to represent the reward of taking an action in the current state; the objective is to maximize the cumulative Q-value over the whole process of interaction between the agent and the environment. Q-Learning involves a state space $S$, an action space $A$, a state transition function $P$, a reward function $R$, and a discount factor $\gamma$. In temporal terms, $s_t$ is the state of the agent at time $t$; $a_t$ is the action taken at time $t$; $P(s_{t+1} \mid s_t, a_t)$ is the probability of transitioning to state $s_{t+1}$ after taking action $a_t$ in state $s_t$; $r_t$ is the reward obtained after taking action $a_t$ in state $s_t$; and $0 \le \gamma \le 1$ balances the influence of immediate and future returns on decision-making [44]. The construction of a Vine Tree can be represented by the connections between variables, so this paper defines the state as the variable most recently connected and the action as the variable selected next. The state $s_t$ can be defined as:
$$s_t = [v_1, v_2, \dots, v_N]$$
where $v_i$ is a variable node and $N$ is the number of variables to be connected. To avoid duplicate connections, the variable selected by each action is recorded, and the variables chosen from time steps 1 to $t-1$ are removed from the action space at time $t$. The initial state connects variables starting from the PV residual $P$. The algorithm adopts an ε-greedy strategy to choose actions, where ε controls the agent's level of exploration: with probability ε a random action is taken to explore the environment in search of a potential global optimum, and with probability 1 − ε the action with the highest Q-value in the current Q-table is taken, emphasizing immediate reward. Each row of the Q-table corresponds to a state and each column to an available action; after each action, the Q-table is updated accordingly [45]:
$$Q_{t+1}(S,A) = Q_t(S,A) + l\left[R_{t+1} + \gamma \max_{A'} Q_t(s_{t+1}, A') - Q_t(S,A)\right]$$
where $Q_{t+1}(S,A)$ is the updated Q-value and $Q_t(S,A)$ is the current value in the Q-table; $l$ is the learning rate; $R_{t+1}$ is the immediate reward obtained after taking the action; and $\max_{A'} Q_t(s_{t+1}, A')$ is the maximum value in the Q-table row queried for state $s_{t+1}$. The reward function $R$ thus affects both the Q-table update and the agent's action selection. This paper uses the correlation coefficient between variables as the immediate reward, defining the reward function as follows:
$$R_t = C(s_{t-1}, a_t)$$
where $C$ is the correlation coefficient matrix of all variables. The formula indicates that, after action $a_t$ is taken, the correlation coefficient between the current state variable $s_{t-1}$ and the selected action variable $a_t$ is given as the reward.
Because Vine Copula focuses on the collaborative relationships between variables, Kendall's rank correlation coefficient, which assesses the consistency of relative orderings, is well suited here; the variable correlation magnitude quantified by Kendall's coefficient is therefore used as the immediate reward in Q-Learning. Figure 8 visualizes how Q-Learning selects variables to form the Vine Tree.
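The tree-construction procedure above can be sketched as follows; the hyperparameters, the |Kendall's τ| reward, and the toy data are illustrative assumptions, and the chain always starts from variable 0 (the PV residual $P$):

```python
import numpy as np
from scipy.stats import kendalltau

def order_variables(data, episodes=200, eps=0.3, lr=0.5, gamma=0.9, seed=0):
    """Q-Learning sketch for ordering the first Vine Tree: states and
    actions are variable indices; the immediate reward is |Kendall tau|
    between the last connected variable and the candidate."""
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    tau = np.array([[abs(kendalltau(data[:, i], data[:, j])[0])
                     for j in range(n)] for i in range(n)])
    Q = np.zeros((n, n))
    for _ in range(episodes):
        state, remaining = 0, set(range(1, n))      # start from P (index 0)
        while remaining:
            acts = sorted(remaining)
            a = (int(rng.choice(acts)) if rng.random() < eps     # explore
                 else max(acts, key=lambda j: Q[state, j]))      # exploit
            nxt = remaining - {a}
            target = tau[state, a] + (gamma * max(Q[a, j] for j in nxt) if nxt else 0.0)
            Q[state, a] += lr * (target - Q[state, a])
            state, remaining = a, nxt
    # greedy rollout of the learned Q-table gives the final ordering
    order, state, remaining = [0], 0, set(range(1, n))
    while remaining:
        a = max(sorted(remaining), key=lambda j: Q[state, j])
        order.append(a)
        state, remaining = a, remaining - {a}
    return order

# toy data: variable 1 strongly depends on the residual (0), variable 2 does not
rng = np.random.default_rng(2)
base = rng.normal(size=400)
toy = np.column_stack([base, base + 0.1 * rng.normal(size=400), rng.normal(size=400)])
order = order_variables(toy)
```

With this data the learned ordering connects the strongly dependent variable immediately after the residual, which is exactly the behavior the reward function is designed to encourage.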
The formation of a Vine Tree in Vine Copula modeling is given above; the complete modeling procedure consists of the following three steps.
(1) Probability integral transformation (PIT). This paper employs a rank-based transformation method for PIT. This method transforms the data into pseudo-observational samples with dependency. The transformation formula is as follows:
$$u_j = \frac{R_j}{n+1}$$
where $u_j$ is the pseudo-observation of the $j$-th variable, $R_j$ is the rank of the $j$-th variable in the sample, and $n$ is the sample size.
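The rank-based transform $u_j = R_j/(n+1)$ is a one-liner with SciPy (`pit` is a hypothetical helper name):

```python
import numpy as np
from scipy.stats import rankdata

def pit(X):
    """Rank-based probability integral transform u_j = R_j / (n + 1):
    maps each column to pseudo-observations in (0, 1) while preserving
    the dependence structure across columns."""
    X = np.asarray(X, dtype=float)
    return np.apply_along_axis(rankdata, 0, X) / (X.shape[0] + 1)

rng = np.random.default_rng(5)
u = pit(np.column_stack([rng.normal(size=99), rng.normal(size=99)]))
```

Dividing by $n+1$ rather than $n$ keeps all pseudo-observations strictly inside (0, 1), which the subsequent Copula fitting requires.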
(2) Copula function selection and parameter estimation. The initial Vine Tree is formed as in Figure 8; then the Copula parameters between each pair of variables are estimated, and the type of fitting function is selected. MLE estimates the parameters of a known distribution family by maximizing the likelihood of the data, allowing the Copula parameters to be obtained directly from the following formula [46]:
$$\theta = \arg\max_{\theta} \sum_{t=1}^{T} \ln C\left(F_j(x_j^t), F_k(x_k^t); \theta\right)$$
where $\arg\max_{\theta}(\cdot)$ returns the parameter values $\theta$ that maximize the function output. This yields a set of Copula function parameters for each edge in the Vine Tree. Subsequently, BIC is applied to select the function type that best fits each edge; derived from a Bayesian correction of the model's prior probability, its formula is as follows [47]:
$$BIC = k\ln(n) - 2\ln(L)$$
where $k$ is the number of model parameters, $n$ is the sample size, and $L$ is the likelihood. Because BIC accounts for the sample size, it helps prevent overfitting caused by excessive model complexity.
(3) Calculation of the conditional CDF. The Vine Copula is constructed tree by tree, and Figure 7 shows that from the second Vine Tree onward, conditional variables appear. According to Joe [48], for variables $j$ and $k$ under a given set of conditioning variables $D$, the Copula can be denoted $C_{j,k|D}$, and their conditional CDF can be expressed using the h-function:
$$F(x_j \mid x_{k \cup D}) = \frac{\partial C_{j,k|D}\left(F(x_j \mid x_D), F(x_k \mid x_D)\right)}{\partial F(x_k \mid x_D)}$$
$$h_{x_j \mid x_{k \cup D}}\left(F(x_j \mid x_D), F(x_k \mid x_D)\right) = F(x_j \mid x_{k \cup D})$$
In the above equations, $x_D$ is the set $x_{k \cup D}$ with variable $k$ excluded, i.e., the conditioning variables jointly connected to $j$ and $k$ in the Vine Tree. The h-function $h_{x_j \mid x_{k \cup D}}(\cdot)$ provides a concise way to compute the CDF. Except for the first Vine Tree, the variable inputs of every Vine Tree are the conditional CDF values from the previous one, and iterating these steps completes the Vine Copula model.

2.6.4. Probability Interval Results Based on PV Residual’s Quantile Regression

Since the analytical expression of the established high-dimensional joint distribution of PV residuals is highly complicated, this paper solves the CDF using the h-inverse function, which inverts the conditional distribution for the first variable and is the inverse form of the h-function [49]:
$$h^{-1}_{x_j \mid x_{k \cup D}}\left(F(x_j \mid x_{k \cup D}), F(x_k \mid x_D)\right) = F(x_j \mid x_D)$$
The above equation indicates that the h-inverse eliminates variable $x_k$ from the high-dimensional CDF, yielding $F(x_j \mid x_D)$. Taking a five-dimensional C-vine structure as an example, the derivation is as follows:
$$\begin{aligned} x_1 &= w_1 \\ x_2 &= h^{-1}_{2|1}\left(w_{2|1}, w_1\right) \\ x_3 &= h^{-1}_{3|1}\left(h^{-1}_{3|12}\left(w_{3|12}, w_{2|1}\right), w_1\right) \\ x_4 &= h^{-1}_{4|1}\left(h^{-1}_{4|12}\left(h^{-1}_{4|123}\left(w_{4|123}, w_{3|12}\right), w_{2|1}\right), w_1\right) \\ x_5 &= h^{-1}_{5|1}\left(h^{-1}_{5|12}\left(h^{-1}_{5|123}\left(h^{-1}_{5|1234}\left(w_{5|1234}, w_{4|123}\right), w_{3|12}\right), w_{2|1}\right), w_1\right) \end{aligned}$$
where $x_i$ is the $i$-th-dimensional variable to be solved and $w_{i|1,\dots,i-1} = h(x_i \mid x_1, \dots, x_{i-1})$. The solution proceeds dynamically: the value obtained at a higher dimension is exactly what is needed at the next lower dimension. The innermost conditional probability $w_{5|1234}$ is the input, also known as the quantile $a$, $a \in (0,1)$; different quantiles of the PV residuals are obtained by inputting different values of $a$.
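To make the recursion concrete, the following sketch samples a three-dimensional C-vine using Gaussian pair-copulas, whose h and h-inverse functions have closed forms (the Gaussian choice and the parameter values are illustrative assumptions, not the fitted model from the paper):

```python
import numpy as np
from scipy.stats import norm

def h(u, v, rho):
    """Gaussian-copula h-function: conditional CDF F(u | v)."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho**2))

def h_inv(w, v, rho):
    """Inverse h-function: recovers u such that h(u | v; rho) = w."""
    return norm.cdf(np.sqrt(1 - rho**2) * norm.ppf(w) + rho * norm.ppf(v))

# sample a 3-dimensional C-vine with Gaussian pair-copulas
rng = np.random.default_rng(3)
rho12, rho13, rho23_1 = 0.7, 0.5, 0.3       # illustrative parameters
w = rng.uniform(size=(1000, 3))             # independent uniforms w_i
x1 = w[:, 0]
x2 = h_inv(w[:, 1], x1, rho12)              # peel off the conditioning on x1
x3 = h_inv(h_inv(w[:, 2], h(x2, x1, rho12), rho23_1), x1, rho13)
sample = np.column_stack([x1, x2, x3])
```

Each `h_inv` call removes one conditioning variable, exactly as in the five-dimensional recursion above; feeding a fixed quantile instead of a random `w` for the target column yields conditional quantiles rather than random samples.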

3. Results and Discussion

3.1. Background of Data and Preprocessing

The data used in this paper are from the DKASC Alice Springs PV system datasets [50] (http://dkasolarcentre.com.au/source/alice-springs/dka-m11-3-phase, accessed on 19 August 2024) in Australia, covering 1 June 2014 to 1 June 2016. The data include PV output (kW), wind speed (m/s), temperature (°C), relative humidity (%rh), wind direction (°), global horizontal radiation, daily average precipitation (mm), etc. The rated power of the PV array is 26.5 kWp. Based on the studies in [51,52] of the key factors affecting PV module performance, PV power, global horizontal radiation, temperature, relative humidity, wind speed, and daily average precipitation were initially selected, with a temporal granularity of 5 min across the full day. Data preprocessing used the DBSCAN method from [53] to detect outliers, fill missing values, and perform related operations. Given the PV output characteristics of this region and the requirements of short-term prediction, the data were resampled to the daily window from 7:00 to 18:00 with the time step adjusted to 15 min, yielding 45 sampling points per day and 32,940 data points in total. The PV power data are decomposed into periodic, trend, and residual components using the STL method described in Section 2.2. Multiple experiments showed that setting the STL period to 315 samples (one week, i.e., 45 points/day × 7 days) yields trend features that reflect the PV output level; because PV output is strongly periodic, setting the period to a multiple of the daily sampling count generally produces periodic components with pronounced features. Figure 9 shows the STL decomposition results of the PV time series together with selected weather features.
It can be observed that the trend component can present output levels that vary with seasons over a long time range and can show overall lower output levels within certain time windows. The periodic component demonstrates the daily variation pattern of PV output, with the sequence being stable. The distribution of the residual component is generally uniform but influenced by meteorological factors. In time ranges where weather features fluctuate greatly, the proportion of residual values tends to increase. Finally, data from 1 June 2014 to 31 May 2015 were selected as the training set, 1 June 2015 to 31 December 2015 were used as the validation set to adjust the model hyperparameters, and the remaining days were used as the testing set to verify the model’s generalization.

3.2. Performance Indicators and Model Architecture

This paper uses RMSE and MAPE to measure the accuracy of deterministic forecasting [54] and uses prediction interval coverage percentage (PICP) and the prediction interval normalized average width (PINAW) to measure the accuracy of probabilistic forecasting [55]. Among them, PICP measures the proportion of the predicted interval covering the true value, and PINAW calculates the width of the interval. Their formulas are as follows:
$$\gamma_{RMSE} = \sqrt{\frac{1}{F}\sum_{i=1}^{F}\left(y_i - \hat{y}_i\right)^2}$$
$$\gamma_{MAPE} = \frac{100\%}{F}\sum_{i=1}^{F}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
$$PICP = \frac{1}{n}\sum_{i=1}^{n} c_i$$
$$\delta_a = \frac{1}{n}\sum_{i=1}^{n}\left(U_a(x_i) - L_a(x_i)\right)$$
where $y_i$ and $\hat{y}_i$ are the $i$-th actual and predicted values, and $F$ is the length of the prediction; $c_i$ is a Boolean indicating whether the true value $y_i$ falls within the interval: $c_i = 1$ when $y_i \in [L_i, U_i]$ and $c_i = 0$ otherwise, with $L_i$ and $U_i$ the lower and upper interval bounds for the $i$-th sample; $U_a(x_i)$ and $L_a(x_i)$ are the upper and lower bounds of the prediction interval for sample $x_i$ at a confidence level of $100(1-a)\%$, where $a$ is the quantile.
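The four indicators can be implemented directly from their definitions (the helper names are our own):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error over the F predicted points."""
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error (in %); assumes y has no zeros."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def picp(y, lower, upper):
    """Prediction interval coverage: share of y inside [lower, upper]."""
    return np.mean((np.asarray(y) >= lower) & (np.asarray(y) <= upper))

def avg_width(lower, upper):
    """Average interval width delta_a; divide by the output range to normalize."""
    return np.mean(np.asarray(upper) - np.asarray(lower))

y = np.array([1.0, 2.0, 4.0])
y_hat = np.array([1.0, 2.5, 3.0])
```

Note that the MAPE definition requires the true values to be nonzero, which is why the data are restricted to the 7:00–18:00 daylight window.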
The architecture of all models proposed in this paper is shown in Table 1. The TimeMixer and LSTM models use the Adam optimizer and were implemented in Python 3.11.5 with PyTorch 2.1.1 (CPU); the hardware comprises an Intel i5-13400 CPU (4.6 GHz, 10 cores; Intel, Santa Clara, CA, USA) and 32 GB of RAM, with 128 training batches per round. The Vine Copula was modeled in Matlab R2023a, where "W-P" denotes the variable connection, followed by the Copula function and its parameter values in parentheses. Here, "P" is the PV residual component, "H" is relative humidity, "R" is radiance, "T" is temperature, and "W" is wind speed.

3.3. Deterministic Forecasting Results of Periodic and Trend Components

In order to compare the prediction performance of the same component, this paper introduces Transformer [14], TCN [13], CNN-GRU, and XGBoost to predict two components and cross-verifies the TimeMixer and LSTM models proposed in this paper for different components. Figure 10 and Figure 11 show some window results of deterministic prediction of periodic and trend components using different prediction methods.
From Figure 10 and Figure 11, it can be seen that the most suitable forecasting model was selected for each component. For the strongly volatile periodic component, TimeMixer effectively extracts further characteristics from the component and thereby learns the features of future changes; its predictions are closest to the actual values and show the best performance, whereas other models, such as the attention-based Transformer, only capture similar shapes of future output. The trend component is nearly a straight line within a short-term window, and all models capture the corresponding features, but LSTM delivers excellent performance while maintaining low computational cost. TimeMixer and Transformer are also relatively accurate thanks to their more advanced paradigms, though these bring some overfitting problems. CNN-based models lack long-term memory capabilities and perform poorly, while XGBoost, built on tree-based learners, is better suited to classification problems and performs worst in temporal prediction.
To further explore the effectiveness of TimeMixer in extracting past information, a portion of the RMSE (kW) results of TimeMixer and Transformer for the periodic component under different hyperparameter combinations was extracted, as shown in Table 2. Table 3 shows the RMSE, MAPE, and time consumption of all predictions for the two components by the different models.
From Table 2, it can be seen that the number of layers in the PDM module contributes greatly to predictive performance. The downsampling scale determines the number of scales at which the model views the time series: the more scales, the higher the model's complexity and the better it learns complex features. The improvement from predicting STL components mainly stems from the model target having been smoothed and feature-separated by STL before entering the PDM module; the decomposed component is further smoothed and split into features distinct from the original ones, and through multi-scale feature interaction, the model learns their specific changes more effectively. Different prediction lengths correspond to different tasks; for example, a prediction length of 90 means forecasting the periodic component for the next two days, while the input length is the length of the historical data fed to the model. Table 3 shows that the TimeMixer model exhibits the best predictive performance. Moreover, because TimeMixer is built on an all-MLP architecture, its runtime is much shorter than that of TCN and CNN-GRU, which require multiple convolutional layers. The Transformer, based entirely on the attention mechanism, passes through multiple encoder and decoder layers, each containing several residual blocks that require extensive parameter optimization, and therefore has the longest training time.

3.4. Analysis of Probabilistic Forecasting Results

This paper quantifies uncertainty by constructing a Vine Copula model for the PV residuals, using the MST-based Vine Copula as a comparison. Figure 12a shows the sampling results of the Q-Learning-optimized Vine Copula model, and Figure 12b shows the CDFs of the real PV residuals, the residuals generated by the proposed method, and the residuals generated by the MST-based Vine Copula. The data distribution obtained through the Q-Learning-based Vine Copula is highly similar to the original data, with good fitting accuracy. From the CDF results, the PV residual CDF obtained by the Q-Learning-based Vine Copula is significantly closer to reality, thanks to Q-Learning's optimization of the variable structure, which yields a better-fitting joint distribution.
By combining the deterministic predictions of the periodic and trend components with the QR interval estimates of the Vine Copula, a complete PV power probabilistic forecast is obtained. It is compared with the Transformer model with dropout (d-Transformer) and the QR-based LSTM (QR-LSTM): the former approximates model uncertainty by setting a dropout layer with a fixed deactivation ratio and computing the variance of the feedforward layer's random outputs, while the latter uses the pinball loss to output different quantiles. From the test samples, 1 January 2016 (rainy), 10 February 2016 (sunny), 23 March 2016 (cloudy), and 18 May 2016 (cloudy) were selected to validate the probabilistic prediction method, with prediction intervals at the 95%, 80%, and 70% confidence levels shown in Figure 13. Table 4 presents the probabilistic forecasting performance indicators of the different methods at each confidence level.
According to Figure 13 and Table 4, it is clear that the prediction intervals provided by this paper’s method can cover the true values at corresponding confidence levels, and their PICPs are consistently higher than those of other methods. While maintaining the accuracy of probabilistic predictions, it provides relatively narrow interval widths. As the confidence level decreases, the interval width narrows, resulting in a certain decrease in PICP. When the confidence level of the d-Transformer drops to 70%, its PICP fails to meet requirements, whereas the proposed method consistently maintains a higher coverage rate than the corresponding confidence level. This is due to more accurate deterministic predictions of components based on time series decomposition and high-fidelity modeling of uncertain components. At a 95% confidence level, this method improves accuracy by 4% compared to d-Transformer while reducing interval width by 19%; the most significant accuracy improvement is 12% at the lowest confidence level of 70%, with PINAW still 0.02 kW less than d-Transformer.
In summary, across the experimental comparisons, the proposed method maintains state-of-the-art performance in every respect. In deterministic prediction, the TimeMixer model reduces RMSE by 14.8% and MAPE by 22% compared with the strong Transformer baseline. For practical power system applications, using LSTM for trend component prediction is justified: state-of-the-art models improve accuracy by 9.3% but consume 62.5% more time, which is unfavorable for the orderly optimization of solar resource scheduling. The final probabilistic results show that the method effectively improves the accuracy of quantifying short-term PV output uncertainty, and the resulting prediction intervals reflect real output variations, complementing the limitations of deterministic prediction. In practice, this method can help grid enterprises and electricity retailers better understand renewable output patterns under different weather conditions and time periods, and formulate optimization strategies in advance to improve the stability and reliability of other generation and load consumption. The interval characteristics at different confidence levels must be chosen carefully to quantify benefits and risks effectively, thereby enhancing the operational efficiency of the power system.

4. Conclusions

Faced with a new power system with continuously increasing renewable penetration, this paper proposes a novel probabilistic forecasting method that combines time series decomposition with Vine Copula uncertainty modeling to quantify the uncertainty of PV output. The approach builds models on the periodic, trend, and residual components obtained from STL decomposition to produce probabilistic forecasts of PV power, and rich experimental comparisons were conducted on the model for each component to demonstrate its advancement. Using the proposed method, various PV output scenarios can be anticipated in advance, supporting operational strategies that promote the safe and stable operation of the power grid and improve the economic benefits of electricity sellers. The main contributions are as follows.
(1) Compared with the probabilistic forecasting of single models (d-Transformer, QR-LSTM), the proposed method fully utilizes various temporal features of the PV output time series and combines the advantages of different models to provide the most advanced prediction results. Firstly, modeling different features of the time series can effectively improve the learning ability of the model, greatly enhancing the accuracy of deterministic predictions. Considering the strong volatility and the laws at different scales of the periodic component, using TimeMixer to further decompose the past information of the periodic component and extract multi-scale temporal features can further improve the prediction performance. For the randomly unpredictable residual component, using the Vine Copula model for multi-factor correlation modeling can obtain conditional probability results that better reflect the real information. The proposed optimization method using Q-Learning for the Vine Tree can improve the model’s fitting power.
(2) In the deterministic forecasting of periodic components, the proposed TimeMixer model outperforms the Transformer, TCN, CNN-GRU, LSTM, and XGBoost models with lower RMSE and MAPE. Across all test samples, its RMSE was lower than the other models' by 0.1404 kW, 0.18 kW, 0.5304 kW, 0.6886 kW, and 1.4821 kW, respectively. Its MAPE was the lowest at 0.9167%, against 1.1755%, 1.4022%, 1.8814%, 1.9549%, and 3.3816% for the others. Compared to the second-best Transformer, TimeMixer required 66 min of training, 24 min faster, and its total time was only 15 min slower than that of the fastest-training LSTM.
(3) The probabilistic prediction results demonstrate that the method combining deterministic prediction models with uncertainty modeling under the time series decomposition proposed in this paper exhibits the best performance at all confidence levels and across all test samples.
The method proposed in this paper does not consider the correlation between different PV power stations. In the future, the possibility of applying this method to short-term probabilistic forecasting of distributed PV power generation will be explored.

Author Contributions

Conceptualization, X.W. and Z.L.; methodology, Z.L. and C.F.; software, Z.L. and X.L.; validation X.L. and W.Y.; formal analysis, W.Y. and X.H.; investigation, X.W. and L.Y.; resources, Z.L.; data curation, C.F.; writing—original draft preparation, Z.L.; writing—review and editing, X.W. and Z.Z.; visualization, J.W.; supervision, Z.Z.; project administration, Z.Z. and X.W.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62273104.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We thank the editors and reviewers for their helpful suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jafarizadeh, H.; Yamini, E.; Zolfaghari, S.M.; Esmaeilion, F.; Assad, M.E.H.; Soltani, M. Navigating Challenges in Large-scale Renewable Energy Storage: Barriers, Solutions, and Innovations. Energy Rep. 2024, 12, 2179–2192. [Google Scholar] [CrossRef]
  2. Xin, B.; Shan, B.; Li, Q.; Yan, H.; Wang, C. Rethinking of the “Three Elements of Energy” Toward Carbon Peak and Carbon Neutrality. Proc. CSEE 2022, 42, 3117–3126. [Google Scholar] [CrossRef]
  3. Fang, G.; Zhou, H.; Meng, A.; Tian, L. How to Crack the Impossible Triangle of New Energy Coupled System—Evidence from China. Appl. Energy 2024, 374, 124065. [Google Scholar] [CrossRef]
  4. Liu, Z.; Du, Y. Evolution Towards Dispatchable PV using Forecasting, storage, and curtailment: A review. Electr. Power Syst. Res. 2023, 223, 109554. [Google Scholar] [CrossRef]
  5. Sun, X.; Tian, Z. A Novel Air Quality Index Prediction Model based on Variational Mode Decomposition and SARIMA-GA-TCN. Process Saf. Environ. Prot. 2024, 184, 961–992. [Google Scholar] [CrossRef]
  6. Guo, Z.; Wei, F.; Qi, W.; Han, Q.; Liu, H.; Feng, X.; Zhang, M. A Time Series Prediction Model for Wind Power Based on the Empirical Mode Decomposition–Convolutional Neural Network–Three-Dimensional Gated Neural Network. Sustainability 2024, 16, 3474. [Google Scholar] [CrossRef]
  7. Yu, C.; Li, Y.; Chen, Q.; Lai, X.; Zhao, L. Matrix-based Wavelet Transformation Embedded in Recurrent Neural Networks for Wind Speed Prediction. Appl. Energy 2022, 324, 119692. [Google Scholar] [CrossRef]
  8. Zhang, D.; Wang, S.; Liang, Y.; Du, Z. A Novel Combined Model for Probabilistic Load Forecasting based on Deep Learning and Improved Optimizer. Energy 2023, 264, 126172. [Google Scholar] [CrossRef]
  9. Hong, T.; Pinson, P.; Weron, R.; Yang, D.; Zareipour, H. Energy Forecasting: A Review and Outlook. IEEE Open Access J. Power Energy 2020, 7, 376–388. [Google Scholar] [CrossRef]
  10. Zhang, Q.; Chen, J.; Xiao, G.; He, S.; Deng, K. TransformGraph: A Novel Short-term Electricity Net Load Forecasting Model. Energy Rep. 2023, 9, 2705–2717. [Google Scholar] [CrossRef]
  11. Dolara, A.; Leva, S.; Manzolini, G. Comparison of Different Physical Models for PV Power Output Prediction. Sol. Energy 2015, 119, 83–99. [Google Scholar] [CrossRef]
  12. Ahmed, R.; Sreeram, V.; Mishra, Y.; Arif, M.D. A Review and Evaluation of the State-of-the-Art in PV Solar Power Forecasting: Techniques and optimization. Renew. Sustain. Energy Rev. 2020, 124, 109792. [Google Scholar] [CrossRef]
  13. Zhao, Y.; Wang, H.; Kang, L.; Zhang, Z. Temporal Convolution Network-based Short-term Electrical Load Forecasting. Trans. China Electrotech. Soc. 2020, 39, 1242–1251. [Google Scholar] [CrossRef]
  14. Zhao, H.; Wu, Y.; Wen, K.; Sun, C.; Xue, Y. Short-Term Load Forecasting for Multiple Customers in a Station Area based on Spatial-Temporal Attention Mechanism. Trans. China Electrotech. Soc. 2024, 39, 2104–2115. [Google Scholar] [CrossRef]
  15. Mayer, M.J.; Biró, B.; Szücs, B.; Aszódi, A. Probabilistic Modeling of Future Electricity Systems with High Renewable Energy Penetration using Machine Learning. Appl. Energy 2023, 336, 120801. [Google Scholar] [CrossRef]
  16. Wang, S.; Wu, H.; Shi, X.; Hu, T.; Luo, H.; Ma, L.; Zhang, J.Y.; Zhuo, J. TIMEMIXER: Decomposable Multiscale Mixing for Time Series Forecasting. arXiv 2024, arXiv:2405.14616. [Google Scholar]
  17. Meer, D.W.; Widén, J.; Munkhammar, J. Review on Probabilistic Forecasting of Photovoltaic Power Production and Electricity Consumption. Renew. Sustain. Energy Rev. 2018, 81, 1484–1512. [Google Scholar] [CrossRef]
  18. David, M.; Ramahatana, F.; Trombe, P.J.; Lauret, P. Probabilistic Forecasting of the Solar Irradiance with Recursive ARMA and GARCH Models. Sol. Energy 2016, 133, 55–72. [Google Scholar] [CrossRef]
  19. Fernandez-Jimenez, L.A.; Monteiro, C.; Ramirez-Rosado, I.J. Short-term Probabilistic Forecasting Models using Beta Distributions for Photovoltaic plants. Energy Rep. 2023, 9, 495–502. [Google Scholar] [CrossRef]
  20. Salinas, D.; Flunkert, V.; Gasthaus, J.; Januschowski, T. Probabilistic forecasting with autoregressive recurrent networks. Int. J. Forecast. 2020, 36, 1181–1191. [Google Scholar] [CrossRef]
  21. Bacher, P.; Madsen, H.; Nielsen, H.A. Online short-term solar power forecasting. Sol. Energy 2009, 83, 1772–1783. [Google Scholar] [CrossRef]
  22. Chai, S.; Niu, M.; Xu, Z.; Lai, L.; Wong, K.P. Nonparametric Conditional Interval Forecasts for PV Power Generation Considering the Temporal Dependence. In Proceedings of the IEEE Power Energy Society General Meeting, Boston, MA, USA, 17–21 July 2016; pp. 1–5. [Google Scholar] [CrossRef]
  23. Mitrentsis, G.; Lens, H.; Boland, J. An Interpretable Probabilistic Model for Short-term Solar Power Forecasting using Natural Gradient Boosting. Appl. Energy 2022, 309, 118473. [Google Scholar] [CrossRef]
  24. Meer, D.W.; Shepero, M.; Svensson, A.; Widén, J.; Munkhammar, J. Probabilistic Forecasting of Electricity Consumption, Photovoltaic Power Generation and Net Demand of An Individual Building using Gaussian Processes. Appl. Energy 2018, 213, 195–207. [Google Scholar] [CrossRef]
  25. Lin, F.; Zhang, Y.; Dong, Q.; Cui, G.; Wang, J.; Zhu, M. Probability Prediction of Photovoltaic Output Based on Quantile Interpolation and Deep Autoregressive Network. Autom. Electr. Power Syst. 2023, 47, 79–87. [Google Scholar]
  26. Ben, S.B.; Huser, R.; Hyndman, R.J.; Genton, M.G. Forecasting Uncertainty in Electricity Smart Meter Data by Boosting Additive Quantile Regression. IEEE Trans. Smart Grid 2016, 7, 2448–2455. [Google Scholar] [CrossRef]
  27. Zhang, K.; Cai, S.; Zhang, T.; Pan, Y.; Wang, S.; Lin, Z. Medium- and Long-term Industry Load Forecasting Method Considering Multi-dimensional Temporal Features. Autom. Electr. Power Syst. 2023, 47, 104–114. [Google Scholar] [CrossRef]
  28. Sun, Y.; Cheng, K.; Xu, Q.; Li, D.; Li, Y. Identification of Weak Link for Active Distribution Network Considering Correlation of Photovoltaic Output. Autom. Electr. Power Syst. 2022, 46, 96–103. [Google Scholar]
  29. Li, Z.; Li, P.; Yuan, Z.; Xia, J.; Tian, D. Optimized Utilization of Distributed Renewable Energies for Island Microgrid Clusters Considering Solar-wind Correlation. Electr. Power System. Res. 2022, 206, 107822. [Google Scholar] [CrossRef]
  30. Müller, A.; Reuber, M. A Copula-based Time Series Model for Global Horizontal Irradiation. Int. J. Forecast. 2023, 39, 869–883. [Google Scholar] [CrossRef]
  31. Schinke-Nendza, A.; von Loeper, F.; Osinski, P.; Schaumann, P.; Schmidt, V.; Weber, C. Probabilistic Forecasting of Photovoltaic Power Supply—a Hybrid Approach using D-vine Copulas to Model Spatial Dependencies. Appl. Energy 2021, 304, 117599. [Google Scholar] [CrossRef]
  32. Zhang, R.; Li, J.; Yang, Z. Prediction of Photovoltaic Power Generation Based on D-vine Copula Model in Typical Climates. In Proceedings of the IEEE 11th Data Driven Control and Learning Systems Conference (DDCLS), Chengdu, China, 3–5 August 2022. [Google Scholar] [CrossRef]
  33. Wang, Z.; Wang, W.; Liu, C.; Wang, Z.; Hou, Y. Probabilistic Forecast for Multiple Wind Farms Based on Regular Vine Copulas. IEEE Trans. Power Syst. 2018, 33, 578–589. [Google Scholar] [CrossRef]
  34. Von Loeper, F.; Kirstein, T.; Idlbi, B.; Ruf, H.; Heilscher, G.; Schmidt, V. Probabilistic Analysis of Solar Power Supply Using D-Vine Copulas Based on Meteorological Variables. Mathematical Modeling, Simulation and Optimization for Power Engineering and Management; Springer: Cham, Switzerland, 2021; pp. 51–68. [Google Scholar] [CrossRef]
  35. Tebong, N.K.; Simo, T.; Takougang, A.N.; Ntanguen, P.H. STL-decomposition Ensemble Deep Learning Models for Daily Reservoir Inflow Forecast for Hydroelectricity Production. Heliyon 2023, 9, e16456. [Google Scholar] [CrossRef]
  36. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition Transformers with Auto-Correlation for Long-term Series Forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar] [CrossRef]
  37. Wang, H.; Yan, S.; Ju, D.; Ma, N.; Fang, S.; Wang, S.; Li, H.; Zhang, T.; Xie, Y.; Wang, J. Short-Term Photovoltaic Power Forecasting Based on a Feature Rise-Dimensional Two-Layer Ensemble Learning Model. Sustainability 2023, 15, 15594. [Google Scholar] [CrossRef]
  38. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  39. Kong, W.; Dong, Y.Z.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network. IEEE Trans. Smart Grid 2019, 10, 841–851. [Google Scholar] [CrossRef]
  40. Nelsen, R.B. An Introduction to Copulas; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar] [CrossRef]
  41. Czado, C.; Nagler, T. Vine Copula based Modeling. Annu. Rev. Stat. Its Appl. 2022, 9, 453–477. [Google Scholar] [CrossRef]
  42. You, H.; Wang, P.; Li, Z. Feature selection for label distribution learning based on the statistical distribution of data and fuzzy mutual information. Inf. Sci. 2024, 679, 121085. [Google Scholar] [CrossRef]
  43. Cheng, J.; Sun, J.; Yao, K.; Cao, X. A Variable Selection Method based on Mutual Information and Variance Inflation Factor. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 268, 120652. [Google Scholar] [CrossRef]
  44. Shi, D.; Xu, H.; Wang, S.; Hu, J.; Chen, L.; Yin, C. Deep Reinforcement Learning based Adaptive Energy Management for Plug-in Hybrid Electric Vehicle with Double Deep Q-network. Energy 2024, 305, 132402. [Google Scholar] [CrossRef]
  45. Iqbal, M.; Namoun, A.; Tufail, A.; Kim, K.H. Deep Learning Methods Utilization in Electric Power Systems. Energy Rep. 2023, 10, 2138–2151. [Google Scholar] [CrossRef]
  46. Krishna, A.B.; Abhyankar, A.R. Time-coupled Day-ahead Wind Power Scenario Generation: A Combined Regular Vine Copula and Variance Reduction Method. Energy 2023, 265, 126173. [Google Scholar] [CrossRef]
  47. Wang, X.; Ahn, S.H. Real-time Prediction and Anomaly Detection of Electrical Load in a Residential Community. Appl. Energy 2020, 259, 114145. [Google Scholar] [CrossRef]
  48. Joe, H. Dependence Modeling with Copulas; CRC Press: Boca Raton, FL, USA, 2014. [Google Scholar]
  49. Zhao, W.; Wang, W.; Liu, C.; Wang, B. Forecasted Scenarios of Regional Wind Farms Based on Regular Vine Copulas. J. Mod. Power Syst. Clean Energy 2020, 8, 77–85. [Google Scholar] [CrossRef]
  50. Dka Solar Center. 26.5kW, eco-Kinetics. Available online: https://dkasolarcentre.com.au/source/alice-springs/dka-m11-3-phase (accessed on 1 April 2024).
  51. Saeed, A.; Rehman, S.; Al-Sulaiman, F.A. Study of a grid-connected floating photovoltaic power plant of 1.0 MW installed capacity in Saudi Arabia. Heliyon 2024, 10, e35180. [Google Scholar] [CrossRef] [PubMed]
  52. Zayed, M.E.; Kabeel, A.E.; Shboul, B.; Ashraf, W.M.; Ghazy, M.; Irshad, K.; Rehman, S.; Zayed, A.A.A. Performance augmentation and machine learning-based modeling of wavy corrugated solar air collector embedded with thermal energy storage: Support vector machine combined with Monte Carlo simulation. J. Energy Storage 2023, 74, 109533. [Google Scholar] [CrossRef]
  53. Deng, Y. Research on Regional Photovoltaic Power Prediction Based on Spatiotemporal Correlation; Zhejiang University: Zhejiang, China, 2021. [Google Scholar] [CrossRef]
  54. Chang, W.; Chen, X.; He, Z.; Zhou, S. A Prediction Hybrid Framework for Air Quality Integrated with W-BiLSTM(PSO)-GRU and XGBoost Methods. Sustainability 2023, 15, 16064. [Google Scholar] [CrossRef]
  55. Sansine, V.; Ortega, P.; Hissel, D.; Hopuare, M. Solar Irradiance Probabilistic Forecasting Using Machine Learning, Metaheuristic Models and Numerical Weather Predictions. Sustainability 2022, 14, 15260. [Google Scholar] [CrossRef]
Figure 1. Framework of the proposed methodology.
Figure 2. Flowchart of the STL decomposition process.
Figure 3. The overall framework of TimeMixer.
Figure 4. Network layer connection methods in (a) seasonal mixing, (b) trend mixing, and (c) complementary prediction.
Figure 5. The unit structure of the LSTM network.
Figure 6. Sliding-window processing for deterministic prediction of the periodic and trend components: (a) periodic component prediction, (b) trend component prediction.
Figure 7. Example of a five-dimensional C-Vine Copula.
Figure 8. Process by which Q-Learning selects variables to form a vine tree.
Figure 9. STL decomposition results of PV time series and extracted weather feature series.
Figure 10. Prediction results for the periodic component using different methods, each taking one day of historical data as input to predict the following day.
Figure 11. Prediction results for the trend component using different methods, each taking one day of historical data as input to predict the following day.
Figure 12. Demonstration of the modeling effect of the Vine Copula model: (a) Comparison of scatter plots and distribution plots of sampled data and real data. (b) Comparison between the PV residual CDF obtained by the MST-based Vine Copula, Q-Learning-based Vine Copula, and the true value.
Figure 13. The probabilistic forecasting results of different methods at 95%, 80%, and 70% confidence levels. (a) 1 January 2016 (rainy day), (b) 10 February 2016 (sunny day), (c) 23 March 2016 (cloudy day), (d) 18 May 2016 (cloudy day).
Table 1. The model parameters established in this paper.

TimeMixer: input length = 45; prediction length = 45; layers of PDM = 3; downsampling scales = 3; number of encoder layers = 10; channels = 128.
LSTM: sliding window width = 90; number of layers = 2; hidden layer 1 size = 128; hidden layer 2 size = 256.
Vine Copula: dimension = 5; variables = P, H, R, T, W; type = C-Vine.
Vine Tree 1: W-P: Clayton (0.7162), W-H: survival Gumbel (1.5680), W-R: t([−0.7330, 1.63 × 10^7]), W-T: t([0.5197, 1.0374 × 10^7])
Vine Tree 2: R-P|W: Clayton (0.2157), R-H|W: Ali–Mikhail–Haq (−0.6675), R-T|W: Gaussian (−0.0803)
Vine Tree 3: T-P|RW: Ali–Mikhail–Haq (−0.0924), T-H|RW: Gaussian (−0.1094)
Vine Tree 4: H-P|RTW: Plackett (0.6425)
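Table 1 reports the fitted pair-copula families and parameters for each vine tree. As an illustration of how such a fitted pair copula is used, the following is a minimal sketch of sampling from a bivariate Clayton copula by conditional inversion, using the W-P parameter θ = 0.7162 from Table 1. This is a generic two-dimensional sketch, not the paper's full Q-Learning-optimized C-vine sampler.

```python
import numpy as np

def sample_clayton(theta: float, n: int, seed=None):
    """Draw n pairs (u1, u2) from a bivariate Clayton copula.

    Uses conditional inversion: draw u1 and w uniformly, then solve
    w = C(u2 | u1) for u2 in closed form.
    """
    rng = np.random.default_rng(seed)
    u1 = rng.uniform(size=n)
    w = rng.uniform(size=n)  # target conditional CDF value C(u2 | u1)
    u2 = (u1 ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u1, u2

# theta = 0.7162 is the W-P Clayton parameter from Table 1;
# Kendall's tau for Clayton is theta / (theta + 2), about 0.26 here.
u1, u2 = sample_clayton(0.7162, 5000, seed=0)
```

A positive Clayton parameter produces lower-tail dependence, which is the qualitative behavior one expects between correlated weather and power variables at low output.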
Table 2. RMSE (kW) of TimeMixer and Transformer's prediction results for periodic components under different parameter combinations.

TimeMixer columns are (downsampling scales, layers of PDM) = (1, 1), (1, 2), (2, 3), (3, 3), (4, 4); Transformer columns are (num of encoders, num of attention) = (2, 4), (2, 8), (3, 8).

Input 45, prediction 45: TimeMixer 1.7544, 1.1770, 0.8511, 0.8086, 0.8119; Transformer 2.2412, 1.1871, 0.9490
Input 45, prediction 90: TimeMixer 2.0667, 1.4502, 1.1014, 0.9947, 0.9170; Transformer 3.2595, 1.9234, 1.2202
Input 45, prediction 135: TimeMixer 2.4509, 1.6910, 1.3319, 1.1173, 1.1021; Transformer 3.7513, 2.5110, 1.4711
Input 90, prediction 45: TimeMixer 1.5228, 1.4317, 0.8540, 0.8092, 0.8123; Transformer 1.8042, 1.1140, 0.8912
Input 90, prediction 90: TimeMixer 1.8944, 1.7990, 0.9011, 0.8117, 0.8345; Transformer 2.1759, 1.5228, 0.9405
Input 135, prediction 45: TimeMixer 1.4011, 1.3390, 1.0490, 0.8312, 0.8574; Transformer 1.7205, 1.1124, 0.9109
Input 135, prediction 90: TimeMixer 1.7703, 1.6272, 1.2556, 1.1470, 1.1553; Transformer 1.8968, 1.2627, 1.1242
Input 135, prediction 135: TimeMixer 2.3019, 2.1145, 1.5711, 1.3725, 1.1349; Transformer 2.7710, 1.3921, 1.1771
Table 3. Prediction performance of different models for periodic and trend components.

Models, in order: TimeMixer, LSTM, Transformer, TCN, CNN-GRU, XGBoost.

Periodic component:
RMSE (kW): 0.8086, 1.4972, 0.9490, 0.9886, 1.3390, 2.2907
MAPE (%): 0.9167, 1.9549, 1.1755, 1.4022, 1.8814, 3.3816
Cost time (min): 42, 27, 66, 61, 52, 29

Trend component:
RMSE (kW): 0.0798, 0.0880, 0.0892, 0.1128, 0.1232, 0.2031
MAPE (%): 0.1022, 0.1137, 0.1159, 0.1680, 0.2139, 0.4410
Cost time (min): 39, 24, 61, 43, 45, 28
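Table 3 ranks the component models by RMSE and MAPE. A minimal sketch of these two deterministic error metrics is given below; the `scale` argument is an assumption on our part (PV output is zero at night, so a raw percentage error is undefined there, and PV studies commonly normalize by a reference such as installed capacity instead). The paper's exact normalization is not specified in this excerpt.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred, scale=None):
    """Mean absolute percentage error, in percent.

    If `scale` is given (e.g., installed capacity), errors are divided by
    it instead of by y_true, avoiding division by zero at zero output.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = np.full_like(y_true, scale) if scale is not None else y_true
    return float(np.mean(np.abs((y_true - y_pred) / denom)) * 100.0)
```

With these definitions, a lower value is better for both metrics, matching the ordering of models in the table.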
Table 4. Performance metrics of probabilistic forecasting results generated by different methods at 95%, 80%, and 70% confidence levels.

95% confidence level: Proposed method PICP 1.00, PINAW 1.12 kW; d-Transformer PICP 0.96, PINAW 1.39 kW; QR-LSTM PICP 0.89, PINAW 1.81 kW.
80% confidence level: Proposed method PICP 0.92, PINAW 1.03 kW; d-Transformer PICP 0.84, PINAW 1.18 kW; QR-LSTM PICP 0.77, PINAW 1.49 kW.
70% confidence level: Proposed method PICP 0.81, PINAW 0.88 kW; d-Transformer PICP 0.69, PINAW 0.90 kW; QR-LSTM PICP 0.62, PINAW 1.07 kW.
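Table 4 scores the prediction intervals by coverage (PICP) and width (PINAW). The sketch below shows one common way to compute them; note that since the table reports PINAW in kW, we assume it is the average interval width, with the dimensionless normalized variant available via the `norm` argument.

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction Interval Coverage Probability: the share of
    observations that fall inside [lower, upper]."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(lower, upper, norm=1.0):
    """Average prediction interval width. Pass norm = max(y) - min(y)
    for the normalized (dimensionless) variant of the metric."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    return float(np.mean(upper - lower) / norm)
```

A well-calibrated 95% interval should give PICP close to 0.95 with PINAW as small as possible; the trade-off between the two is exactly what Table 4 summarizes.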
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
