1. Introduction
Due to the global energy transition and the decarbonization of the energy sector, large-scale development and utilization of solar energy has become a priority in energy research. In this context, photovoltaic (PV) generation, as the primary method of solar energy utilization, has entered the stage of large-scale deployment [1,2,3]. However, PV power is highly variable with weather conditions, posing major challenges to the secure and stable operation of distribution networks [4]. The randomness and uncertainty of PV power slow its grid integration. Consequently, the development of PV power forecasting technologies has garnered significant attention worldwide.
As a time series, PV power exhibits intricate characteristics such as non-stationarity and volatility due to the complex influence of multiple real-world factors. Especially over long time spans, the deep mixing of multiple variation features, such as rising output levels, regular fluctuations, and random mutations, poses severe challenges to the forecasting task. To address this issue, a widely accepted paradigm is to use methods such as moving average decomposition [5], empirical mode decomposition (EMD) [6], and wavelet decomposition [7] to decompose the complex time series into sub-series with independent features and higher predictability. However, these methods are sensitive to outliers and require the artificial selection of high- and low-frequency components as different features. Matching models to different component characteristics to extract feature information separately is much more accurate than using a single model to predict the complete sequence [8]. Therefore, a decomposition method that gives each component a clear physical meaning and higher reliability should be adopted, so that the model can better learn the features of specific components.
PV power forecasting methods can be divided into deterministic and probabilistic forecasting according to the type of prediction [9]. With the increasing scale of grid-connected PV and growing power supply quality requirements, the reliability of PV deterministic forecast models, such as the artificial neural network (ANN), auto-regressive moving average (ARMA) [10], and physical models [11], is decreasing. A comprehensive overview of mainstream deterministic forecast models is provided by R. Ahmed et al. [12], which elaborates the categorization of PV power forecasting techniques and the principles behind state-of-the-art forecast models. Furthermore, PV time series show different time-varying characteristics when observed at different scales, and future changes are a blend of characteristics across multiple scales. For example, at fine sampling intervals (e.g., hourly or daily), the time series demonstrates detail-rich fluctuations and short-term cyclical changes, while at coarser time scales (e.g., weekly or monthly), macro trends and long-term cyclical fluctuations are more apparent. The difficulty in making accurate time series forecasts lies in accounting for this multi-scale temporal variation. Among the more advanced paradigms for capturing multi-scale temporal features, models such as the temporal convolutional network (TCN) [13] and Transformer [14] are widely recognized for their predictive performance. However, the former has difficulty mining long-term dependencies, and the latter's self-attention mechanism demands large computational resources, which restricts its application in power systems. In [15], a multilayer perceptron (MLP) is used to establish the relationship between long-term meteorological data and multi-scale temporal data, but the model complexity is too low to handle high-dimensional nonlinear variables. The authors in [16] proposed the TimeMixer multi-scale mixing architecture, which achieves bidirectional coarse- and fine-scale mixing in time series representation and integrates multi-scale historical information through multiple predictors in the forecasting stage. TimeMixer captures how time series change under different scales of observation and, being entirely MLP-based, achieves efficiency comparable to linear models together with state-of-the-art performance.
In a recent review of probabilistic forecasts by Meer, D.W. et al. [17], probabilistic forecasts are divided into parametric and nonparametric methods. Constructing prediction intervals by fitting a known density function, such as a Gaussian [18] or beta [19] distribution, to the forecast errors is a common parametric method. Salinas D et al. [20] constructed a negative log-likelihood function to find the parameters of the probability distribution, but the model's prediction accuracy is poor. Assuming a distribution beforehand significantly reduces the accuracy of parametric methods. Nonparametric methods derive distribution information directly from the characteristics of the original data and therefore offer stronger generalization and performance. Quantile regression (QR) [21] is the most common nonparametric method; others include kernel density estimation (KDE) [22], bootstrap [23], and Gaussian processes (GP) [24]. More recently, nonparametric methods have been combined with ANNs. F. Lin et al. [25] derive the closed analytic form of the fractional integral of the continuous ranked probability score to be used as a loss function for training. However, due to the low interpretability of machine learning and the fact that each quantile is predicted independently, such methods can only reflect partial probability information; they also suffer from quantile crossing, which violates the monotonicity property [26]. Zhang et al. [27] introduce a decomposition strategy, combining predictions of other variables with KDE-modeled residual component distributions to obtain probability intervals. However, the challenge of selecting optimal bandwidth factors limits KDE's ability to accurately reflect the sample's true distribution.
Probabilistic forecasting results need to reflect the true probability distribution of a random variable to accurately quantify uncertainty. Y. Sun et al. [28] used a Copula to establish a joint probability distribution between two PV plants, which provides ideas for portraying the correlation between two variables. Z.L. Li et al. [29] proposed a combined Copula function to establish the joint solar/wind probability density function (PDF) in each time slot. Müller, A. et al. [30] proposed a Copula-based time series model that describes the dependence between hourly and daily time series. However, the plain Copula fits multidimensional variables poorly. In this context, Schinke-Nendza A. et al. [31] proposed a Vine Copula approach to describe the spatial dependence of physical-model PV power forecast errors across multiple grid nodes. R. Zhang et al. [32] used fuzzy C-means to cluster data under different climate conditions and then established D-Vine Copula models for three typical climates. All of the above studies directly model the multivariate dependence structure of the complete PV series and its covariates. Although this better reflects the original probability distribution information, the results cannot meet the standard error requirements for short-term forecasting at fine-scale observation. In this paper, the main accuracy improvement task is assigned to deep learning models, while uncertainty quantification is performed mainly through a Vine Copula; the framework is implemented by decomposing the time series and modeling the components separately.
Vine Copula uses the "vine tree" from graph theory to represent the correlation between different variables, expressed through the connections of variable nodes. Commonly used modeling approaches [31,32,33,34] use Kendall's rank correlation coefficient to measure variable dependence and then apply a maximum spanning tree (MST) to select the Vine Trees with the highest coefficient sum. However, MST often yields locally optimal, suboptimal dependency structures. This paper proposes a novel Vine Copula structure optimization method that uses Q-Learning to select the variable connections of the Vine Trees.
To sum up, this paper proposes a probabilistic forecasting method for PV power that considers the different component characteristics under time series decomposition when constructing prediction models and probability models. The innovation lies in developing a framework that combines deep learning prediction models with a multidimensional variable uncertainty quantification model based on time series feature decomposition, where the different features of the time series are represented as periodic, trend, and residual components. The objectives include: (1) accurately predicting the periodic component of PV power via TimeMixer; (2) efficiently predicting the trend component using LSTM's long-term memory capability; (3) constructing a multidimensional joint distribution based on a Q-Learning optimized Vine Copula to quantify the stochastic residual component and obtaining the probability distribution for the forecast time using QR. The detailed implementation process is as follows. First, seasonal-trend decomposition using LOESS (STL) is used to decompose the PV time series into periodic, trend, and residual components. A TimeMixer-based periodic component prediction model and an LSTM-based trend component prediction model are constructed. The mutual information (MI) and variance inflation factor (VIF) methods are used to screen highly correlated influencing-factor variables for the PV residual component for subsequent Vine Copula modeling. Q-Learning is used to construct the variable connection relationships of the screened variables. Then, maximum likelihood estimation (MLE) and the Bayesian information criterion (BIC) are used to determine the parameters of the Vine Tree, forming an optimal Vine Copula structure that generates a joint probability distribution of the PV residuals and multiple factors. Finally, QR is performed on the derived conditional distribution of the PV residuals to obtain the interval results.
Probabilistic forecasts are obtained by combining the deterministic predictions of the periodic and trend components with the uncertainty model of the residual component. Simulations are conducted using PV output data collected from the DKASC Alice Springs PV station in Australia. The results demonstrate that, compared with Transformer, TCN, CNN-GRU, LSTM, XGBoost, and traditional Vine Copula methods, the proposed method achieves stronger probabilistic forecasting performance. Meanwhile, each model utilized in this method effectively handles the corresponding PV component information, outperforming several of the aforementioned state-of-the-art models. The main contributions of this paper are as follows.
(1) This study proposes a probabilistic forecasting method that combines deterministic prediction models with uncertainty models under time series feature decomposition. Selecting appropriate prediction models according to the characteristics of each decomposed component, including the STL, TimeMixer, LSTM, and Q-Learning optimized Vine Copula models, improves prediction accuracy.
(2) The experiments involve multi-faceted comparisons, verifying the effectiveness of probabilistic forecasting under the decomposition framework, the superiority of the TimeMixer and LSTM models in predicting their respective components, and the effectiveness of the Q-Learning optimized Vine Copula in uncertainty modeling.
(3) This study verifies that the TimeMixer model performs better in multi-scale time series feature extraction and prediction compared to some state-of-the-art models and that improvements in deterministic prediction efficacy are beneficial to probabilistic forecasting results.
The remaining parts of the article are arranged as follows: various models, methods, and frameworks involved are presented in
Section 2; data introduction and preprocessing, modeling effects, prediction results, comparisons, and discussions are given in
Section 3;
Section 4 provides detailed conclusions and suggestions for future research.
2. Methodology
2.1. Framework for Proposed Methods
The general framework of this paper for probabilistic PV power forecasting is shown in
Figure 1.
1. Data preprocessing. After obtaining the original data, the first step is to resample the data to the interval required for short-term prediction, remove abnormal data, and fill in missing values after normalization. This process is applied to all of the obtained original data. STL decomposition is then performed on the PV series to obtain the periodic, trend, and residual components, and finally the training set is divided for all the data.
2. Deterministic forecasting. Deterministic forecasting includes the PV periodic component and trend component, with these two as the targets to construct TimeMixer and LSTM forecasting models, respectively. Historical data and meteorological data are their inputs.
3. Uncertainty modeling. MI and VIF are used to screen the variables that are strongly correlated with the PV residual component and to define the number of variable dimensions, and Q-Learning is used to select the variable connections that determine the initial structure of the Vine Tree. The parameters and types of the Copula functions in the Vine Tree are calculated according to MLE and BIC to obtain the complete Vine Copula model. The conditional distribution function of the PV residual component in the multidimensional joint distribution is derived, and QR is used to obtain the confidence intervals at the predicted moments as the uncertainty quantification results. Finally, these are combined with the deterministic prediction results above to form the complete probabilistic forecasting results.
2.2. STL Decomposition
For the PV time series, the variation trend is influenced by multiple factors. On the one hand, solar activity patterns and seasonal changes lead to periodic fluctuation characteristics in the time series. On the other hand, economic development needs, such as the more intuitive changes in installed capacity, result in a relatively smooth trend in the overall power output characteristics. Finally, there are random characteristics that appear only when observed at a fine scale. A raw sequence with such mixed characteristics is very difficult for a model to learn. If predictions could be made for specific characteristics, the model could focus on specific change patterns unaffected by random fluctuations.
Therefore, based on the idea of decomposition prediction, this paper uses STL decomposition to decompose the PV sequence into periodic, trend, and residual components. The decomposition can be expressed as:

$$Y_t = S_t + T_t + R_t$$

where $S_t$, $T_t$, and $R_t$ represent the periodic component, trend component, and residual component of the sequence at time $t$, respectively. The periodic component represents the regular part of PV output, which can be observed as obvious and similar peaks in the curve. The trend component represents the smooth trend line in the time series with a small fluctuation amplitude. The residual part represents the fine changes caused by uncertain external factors. STL is a time series decomposition method that uses locally weighted regression as a smoothing approach. It uses locally estimated scatterplot smoothing (LOESS) to extract smooth estimates of the components, achieving separation of the different components [
35]. The algorithm is based on a double-layer structure of inner and outer loops. The inner loop obtains periodic and trend components through LOESS smoothing operations on the detrended sequence, while the outer loop adjusts the robustness weights of the algorithm to reduce the impact of outliers on LOESS regression. The algorithm flow is shown in
Figure 2.
In the outer loop, STL decomposition calculates robustness weights based on the residual term to update the neighborhood weights in the LOESS regression of the inner-loop smoothing operation. This separates the noise points in the data into the residual term, improving the robustness of the algorithm. The robustness weights are calculated using the Bisquare function, specifically as follows:

$$\rho_t = B\!\left( \frac{|R_t|}{6\,\mathrm{median}(|R|)} \right), \qquad B(u) = \begin{cases} (1-u^2)^2, & 0 \le u < 1 \\ 0, & u \ge 1 \end{cases}$$

where $B(\cdot)$ is the Bisquare function, $R_t$ is the residual component, and $\mathrm{median}(\cdot)$ is the median calculation function.
2.3. TimeMixer Model
Through observation, it is found that time series exhibit different information at different sampling scales. For example, a PV sequence recorded hourly shows power output changes at different times within a day, while this is not observable in a daily recorded sequence, which instead reveals daily power output level changes. Coarse and fine scales can reflect macro and micro information, respectively, and future deterministic information is jointly determined by changes at multiple scales.
Addressing the issue of multi-scale feature fusion for time series, TimeMixer proposes an MLP-based multi-scale feature mixing architecture. This architecture extracts different scale temporal features from past changes through a past-decomposable-mixing (PDM) module and then integrates the extracted multi-scale past information to predict future sequences through a future-multipredictor-mixing (FMM) module [
16]. Benefiting from the MLP-based analysis of different feature components of multi-scale sequences and the realization of complementary prediction capabilities, TimeMixer achieves state-of-the-art performance in short-term and long-term forecasting with excellent computational efficiency. The overall architecture of TimeMixer is shown in
Figure 3.
Specifically, to unravel complex changes, TimeMixer first applies average pooling to the original sequence to generate $M$ sub-sequences of different scales, resulting in the multi-scale time series set $\mathcal{X} = \{x_0, \dots, x_M\}$, where $x_m \in \mathbb{R}^{\lfloor P/2^m \rfloor \times C}$, $C$ is the number of variables, and $P$ is the length of the original sequence. Sub-sequences of different scales are generated from $x_0$ through downsampling at different time steps. The original sequence $x_0$ contains the most subtle change information, while the highest level $x_M$ represents the most distant macro information to be extracted. The multi-scale sequences are then projected into deep features through an embedding layer, which can be represented as $\mathcal{X}^0 = \mathrm{Embed}(\mathcal{X})$, thus obtaining a multi-scale representation of the input.
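The multi-scale sub-sequence generation is plain average pooling with stride 2; a minimal NumPy sketch (the helper name is our own, not from TimeMixer's code) produces the set $\{x_0, \dots, x_M\}$ with lengths $\lfloor P/2^m \rfloor$:

```python
import numpy as np

def avg_pool_downsample(x, M):
    """Average-pool a (P, C) series over windows of 2 along time, M times,
    yielding the multi-scale set {x_0, ..., x_M}."""
    scales = [x]
    for _ in range(M):
        p = scales[-1]
        p = p[: (p.shape[0] // 2) * 2]  # drop a trailing step if length is odd
        scales.append(p.reshape(-1, 2, p.shape[1]).mean(axis=1))
    return scales

P, C, M = 96, 3, 3
scales = avg_pool_downsample(np.random.default_rng(1).standard_normal((P, C)), M)
```

With $P = 96$ and $M = 3$, the scale lengths are 96, 48, 24, and 12.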
Even the coarsest scale sequence contains mixed feature changes. To enable the model to learn different characteristics, TimeMixer decomposes each scale sequence and then performs multi-scale mixing. In the $l$-th PDM module, the decomposition block [36] first decomposes $x_m^{l-1}$ into a trend term $t_m^{l-1}$ and a periodic term $s_m^{l-1}$, then performs feature mixing on the seasonal and trend terms separately, achieving multi-scale interaction of the same feature, as shown in the following equations:

$$s_m^{l-1},\; t_m^{l-1} = \mathrm{Decompose}\left(x_m^{l-1}\right), \quad m \in \{0, \dots, M\}$$

$$\mathcal{X}^{l} = \mathcal{X}^{l-1} + \mathrm{FeedForward}\!\left( \mathrm{S\text{-}Mix}\left(\{s_m^{l-1}\}_{m=0}^{M}\right) + \mathrm{T\text{-}Mix}\left(\{t_m^{l-1}\}_{m=0}^{M}\right) \right), \quad l \in \{1, \dots, L\}$$

where $L$ is the total number of layers; $\mathrm{Decompose}(\cdot)$ represents the decomposition block, which obtains the trend term through moving-average processing, treats the remainder as the periodic term, and keeps the sequence length unchanged through padding [37]; $x_m^{l} \in \mathbb{R}^{\lfloor P/2^m \rfloor \times d}$ is the deep feature representation at the different time scales with $d$ channels; $\mathrm{FeedForward}(\cdot)$ consists of two linear layers that exchange information between channels through the GELU activation function; and $\mathrm{S\text{-}Mix}(\cdot)$ and $\mathrm{T\text{-}Mix}(\cdot)$ represent the mixing operations for the periodic and trend information, respectively. As the second equation shows, TimeMixer uses stacked PDM modules to mix past information from different scales, allowing each layer to extract and mix multi-scale information from the output of the previous layer, which helps to capture complex patterns and high-level features in the data layer by layer, enhancing the model's representational ability.
TimeMixer adopts different mixing modes according to the change characteristics of the different features. For periodic terms, large-scale periodic information can be seen as a collection of small-scale periods, corresponding to macro and micro information, respectively, so a bottom-up mixing method is adopted. For example, when observing the periodic component of PV power, fusing information upward from the daily-scale periodic sequence can form a coarser monthly-scale periodic sequence. In the technical implementation of the model, residual connections [38] are used to achieve the interaction of multi-scale periodic term information, as intuitively shown in Figure 4a, and can be formulated as:

$$s_m^{l} = s_m^{l} + \mathrm{Bottom\text{-}Up\text{-}Mixing}\left(s_{m-1}^{l}\right), \quad m = 1, \dots, M$$

where $\mathrm{Bottom\text{-}Up\text{-}Mixing}(\cdot)$ consists of two linear layers acting on the time dimension, with input and output dimensions of $\lfloor P/2^{m-1} \rfloor$ and $\lfloor P/2^{m} \rfloor$, respectively. Through residual connections, scale information from the previous layer can be passed directly to later layers, allowing the network to fit the residual mapping shown in Figure 4a without losing early-layer information; the network thus focuses only on the information interaction between the current scale and the next scale, avoiding network degradation.
For trend terms, fine-scale changes introduce noise into macro trend information. Taking the PV trend sequence as an example, the coarse-scale trend component better exhibits the clear overall PV output level for the entire study period. Therefore, a top-down direction is adopted to mix multi-scale trend information, using the macro trend to guide the micro trend direction of the fine scales. This is intuitively shown in Figure 4b and can be formulated as:

$$t_m^{l} = t_m^{l} + \mathrm{Top\text{-}Down\text{-}Mixing}\left(t_{m+1}^{l}\right), \quad m = M-1, \dots, 0$$

where $\mathrm{Top\text{-}Down\text{-}Mixing}(\cdot)$ also consists of two linear layers, but the input and output dimensions become $\lfloor P/2^{m+1} \rfloor$ and $\lfloor P/2^{m} \rfloor$, respectively. The PDM module differs from approaches that directly mix multi-scale feature sequences: it aggregates micro and macro information based on the periodic and trend terms of the sub-sequences, respectively, ultimately achieving multi-scale mixing in past information extraction.
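The two mixing directions can be illustrated with untrained linear maps in NumPy. This is a structural sketch only (real TimeMixer layers are trained and include GELU feed-forwards); the per-scale lengths and channel dimension are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
lengths = [96, 48, 24]  # per-scale sequence lengths, i.e. floor(P / 2**m)
d = 8                   # channel (feature) dimension
seas = [rng.standard_normal((L, d)) for L in lengths]
trend = [rng.standard_normal((L, d)) for L in lengths]

def linear_time(x, L_out, rng):
    """Untrained linear map acting on the time dimension: (L_in, d) -> (L_out, d)."""
    W = rng.standard_normal((L_out, x.shape[0])) / np.sqrt(x.shape[0])
    return W @ x

# Bottom-up: fine-scale seasonal detail is mixed upward into coarser scales.
for m in range(1, len(seas)):
    seas[m] = seas[m] + linear_time(seas[m - 1], lengths[m], rng)

# Top-down: the coarse macro trend guides the finer scales.
for m in range(len(trend) - 2, -1, -1):
    trend[m] = trend[m] + linear_time(trend[m + 1], lengths[m], rng)
```

Both loops are residual updates, so each per-scale representation keeps its own length and channel count.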
After passing through the aforementioned $L$ PDM modules, the model obtains rich and complete multi-scale past information $\{x_m^{L}\}_{m=0}^{M}$. Since past information at different scales exhibits varying predictive capability, TimeMixer employs an FMM module to fully utilize the multi-scale information and achieve complementary multi-scale prediction. In this module, the past information from each scale $x_m^{L}$ is input into a corresponding scale-specific predictor, and the predictions from the multi-scale sequences are aggregated:

$$\hat{x}_m = \mathrm{Predictor}_m\!\left(x_m^{L}\right), \quad m \in \{0, \dots, M\}, \qquad \hat{x} = \sum_{m=0}^{M} \hat{x}_m$$

where $\hat{x}_m \in \mathbb{R}^{F \times C}$ represents the prediction of the future based on the $m$-th scale sequence; $F$ denotes the future length to be predicted; $\hat{x}$ is the final output; and $\mathrm{Predictor}_m(\cdot)$ represents the predictor for the $m$-th scale sequence. It first uses a single linear layer to directly regress the past information of length $\lfloor P/2^{m} \rfloor$ to a future of length $F$, then projects the regressed deep representation back to the $C$ target variables, and finally the multi-scale results are aggregated to obtain the prediction, as shown in Figure 4c.
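The FMM aggregation reduces to one time-regression and one channel-projection per scale, summed over scales. A minimal NumPy sketch with untrained (randomly initialized) predictors, under illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
lengths, d, C, F = [96, 48, 24], 8, 3, 24  # scale lengths, channels, targets, horizon
feats = [rng.standard_normal((L, d)) for L in lengths]  # stand-ins for x_m^L

pred = np.zeros((F, C))
for x_m in feats:
    # Per-scale predictor: linear map over time (L_m -> F), then channel projection d -> C.
    W_t = rng.standard_normal((F, x_m.shape[0])) / np.sqrt(x_m.shape[0])
    W_c = rng.standard_normal((d, C)) / np.sqrt(d)
    pred += (W_t @ x_m) @ W_c  # each scale contributes its own forecast
```

Each scale contributes an $F \times C$ forecast, and the ensemble is their sum.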
2.4. Long Short-Term Memory Network
The LSTM model is a variant of the recurrent neural network (RNN) that effectively addresses gradient vanishing while preserving temporal correlations. It adds a gating mechanism to determine whether information is retained or discarded, thereby enabling the network to learn both long- and short-term dependencies in time series. The structural diagram of the LSTM network is shown in
Figure 5 [
9].
The basic units of LSTM networks include forget gates, input gates, and output gates. The input $x_t$ in the forget gate, along with the state memory unit $c_{t-1}$ and the intermediate output $h_{t-1}$, jointly determines the forgotten part. The input gate value $i_t$ is jointly determined by the sigmoid and tanh activation functions to retain the vector in the state memory unit. The intermediate output $h_t$ is determined by the updated state unit $c_t$ and the output gate $o_t$, and the calculation formulas are shown as follows [39]:

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$
$$g_t = \tanh\left(W_g x_t + U_g h_{t-1} + b_g\right)$$
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $f_t$, $i_t$, $g_t$, $o_t$, $h_t$, and $c_t$ are the states of the forget gate, input gate, input node, output gate, intermediate output, and state unit, respectively; $W_\ast$ and $U_\ast$ represent the matrix weights of the corresponding gates multiplied by the input $x_t$ and the intermediate output $h_{t-1}$; $b_\ast$ are the biases of the corresponding gates; $\odot$ represents element-wise multiplication of matrices; and $\sigma$ and $\tanh$ represent the sigmoid and hyperbolic tangent activation functions.
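The gate equations can be checked with a bare NumPy implementation of a single LSTM step; the dimensions and the stacked-weight layout below are illustrative choices, not the paper's configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: f_t, i_t, o_t via sigmoid, g_t via tanh,
    c_t = f_t * c_{t-1} + i_t * g_t, and h_t = o_t * tanh(c_t)."""
    z = W @ x + U @ h_prev + b  # stacked pre-activations, shape (4H,)
    H = h_prev.size
    f = sigmoid(z[0:H]); i = sigmoid(z[H:2 * H])
    g = np.tanh(z[2 * H:3 * H]); o = sigmoid(z[3 * H:4 * H])
    c = f * c_prev + i * g      # gated state-unit update
    h = o * np.tanh(c)          # gated intermediate output
    return h, c

rng = np.random.default_rng(4)
n_in, n_hid = 5, 4
W = rng.standard_normal((4 * n_hid, n_in))
U = rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h = np.zeros(n_hid); c = np.zeros(n_hid)
for _ in range(10):  # run a short random input sequence
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
```

Since $o_t \in (0,1)$ and $\tanh(c_t) \in (-1,1)$, the intermediate output is always bounded in $(-1, 1)$.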
2.5. Deterministic Forecasting Model Construction with Periodic and Trend Components
STL decomposition is based on a preset data periodic length. To ensure that the obtained trend component reflects the smooth developmental pattern in long-term PV changes, especially highlighting changes in overall power output levels due to seasonal variations, the periodic length cannot be set too short. Consequently, the periodic component will contain complex and extremely detailed change patterns, while the trend component will be relatively smooth. TimeMixer can capture information at different time scales, detecting both the most minute changes and global features. LSTM, on the other hand, excels at coupling and memorizing different moments within long-term time patterns. Therefore, a TimeMixer-based periodic component prediction model and an LSTM-based trend component prediction model are constructed.
2.5.1. Periodic Component Forecasting Model Based on TimeMixer
The key to the PV component forecasting model based on TimeMixer lies in mining periodic characteristics and the correlation between the time series data and multidimensional external factors, which requires establishing a clear input data structure. The PV periodic component exhibits autocorrelation with its own past changes. Furthermore, since it is decomposed from the original output sequence, the component also has a certain cross-correlation with the historical output. The initial input data for the PV periodic component prediction can be expressed as:

$$X_i = \left[\, S_i,\; W_i,\; P_i,\; D^{\mathrm{h}}_i,\; D^{\mathrm{m}}_i \,\right], \quad i = 1, \dots, n$$

where $n$ represents the number of samples; $S$ represents the periodic component of PV generation; $W$ denotes the meteorological information for the region, including multiple input variables; $P$ is the complete PV output; $D^{\mathrm{h}}$ is the hourly label feature, reflecting the intra-day variation pattern; and $D^{\mathrm{m}}$ is the monthly label feature, reflecting the variation of output levels across different months of the year, which determines the range of periodic component changes. In the prediction process, the time label information of the moment to be predicted is completely known, while the meteorological data come from numerical weather prediction (NWP), which carries uncertainty of its own. Considering that precise NWP information for the prediction day is difficult to obtain in actual use of the model, this paper adopts the minimum, maximum, and average values of temperature, humidity, and solar irradiance on the prediction day as auxiliary input features, extended to keep the time scale consistent with the PV output at a daily resolution. Therefore, the final training process can be expressed as:

$$\hat{S}_{t+1:t+K} = F_{\mathrm{TM}}\!\left( X_{t-L+1:t},\; P_{t-L+1:t},\; D^{\mathrm{h}}_{t+1:t+K},\; D^{\mathrm{m}}_{t+1:t+K},\; A;\; \theta \right)$$

where $\hat{S}_{t+1:t+K}$ represents the predicted future PV periodic component with a step length of $K$ obtained from historical data of length $L$; $X_{t-L+1:t}$ denotes the historical initial data of length $L$; $P_{t-L+1:t}$ represents the historical PV output of length $L$; $D^{\mathrm{h}}_{t+1:t+K}$ denotes the hourly labels for the next $K$ steps; $D^{\mathrm{m}}_{t+1:t+K}$ represents the monthly labels for the next $K$ steps; $A$ is the auxiliary features on the prediction day; $\theta$ is the parameter vector for model training; and $F_{\mathrm{TM}}(\cdot)$ is the mapping function used in the training.
Figure 6a illustrates the model prediction process in the form of a sliding window.
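The sliding-window processing of Figure 6 can be sketched generically. The window lengths below ($L = 96$ past steps, $K = 24$ future steps) and the univariate stand-in series are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def make_windows(series, L, K):
    """Slide a window over the series: L past steps as input, the next K as target."""
    X, y = [], []
    for s in range(len(series) - L - K + 1):
        X.append(series[s:s + L])
        y.append(series[s + L:s + L + K])
    return np.stack(X), np.stack(y)

series = np.arange(200, dtype=float)  # stand-in for a decomposed PV component
X, y = make_windows(series, L=96, K=24)
```

Each sample pairs a length-$L$ history with the length-$K$ horizon that immediately follows it; multivariate inputs extend this by stacking the covariate columns alongside the component.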
2.5.2. Trend Component Forecasting Model Based on LSTM
The changes in the PV trend component are mainly influenced by the total solar radiation: lower overall radiation during a certain period results in a lower overall level of the trend component, and observation on an annual scale reveals the fluctuation of the trend component. Similarly to the inputs for the periodic component prediction, the initial input data and the training process of the LSTM can be expressed as:

$$X'_i = \left[\, T_i,\; W_i,\; P_i,\; D^{\mathrm{q}}_i,\; D^{\mathrm{m}}_i \,\right], \quad i = 1, \dots, n$$

$$\hat{T}_{t+1:t+K} = F_{\mathrm{LSTM}}\!\left( X'_{t-L+1:t},\; D^{\mathrm{q}}_{t+1:t+K},\; D^{\mathrm{m}}_{t+1:t+K};\; \theta' \right)$$

where $T$ represents the trend component of PV generation; $D^{\mathrm{q}}$ denotes the quarter label feature, reflecting the level changes of the trend component with seasonal transitions; $\hat{T}_{t+1:t+K}$ is the predicted future PV trend component with a step length of $K$ obtained from historical data of length $L$; and $D^{\mathrm{q}}_{t+1:t+K}$ and $D^{\mathrm{m}}_{t+1:t+K}$ represent the quarterly and monthly labels for the next $K$ steps. Unlike the training process for the periodic component, the auxiliary temporal label features input for the trend component include monthly and seasonal labels on a longer time scale. Similarly,
Figure 6b demonstrates the sliding window processing for the trend component.
2.6. Uncertainty Modeling Based on Vine Copula Optimized by Q-Learning
2.6.1. The Theory of Vine Copula
The foundation of Copula theory is laid by Sklar's theorem [40]. Essentially, the theorem decomposes a complex joint distribution into multiple marginal distributions connected through a "linking" function. This function is defined as a joint distribution function coupling uniform one-dimensional marginal distributions on $[0, 1]$. According to the definition, the multivariate joint PDF is [30]:

$$f(x_1, \dots, x_d) = c\left( F_1(x_1), \dots, F_d(x_d) \right) \prod_{i=1}^{d} f_i(x_i)$$

where $f(x_1, \dots, x_d)$ represents the joint PDF containing $d$-dimensional marginal distributions, $F_i(x_i)$ is the marginal cumulative distribution function (CDF) of the $i$-th variable, $f_i(x_i)$ is the corresponding marginal PDF, and $c(\cdot)$ is the Copula density function.
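Sklar's factorization can be verified numerically for a bivariate Gaussian, whose copula density has a closed form. A small check assuming SciPy is available (the correlation value and evaluation point are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
x, y = 0.3, -0.7

# Left-hand side: the joint PDF evaluated directly.
joint = multivariate_normal(mean=[0, 0], cov=cov).pdf([x, y])

# Right-hand side: Gaussian copula density at (u, v) times the marginal PDFs.
u, v = norm.cdf(x), norm.cdf(y)
a, b = norm.ppf(u), norm.ppf(v)
cop = (1.0 / np.sqrt(1 - rho**2)) * np.exp(
    -(rho**2 * (a**2 + b**2) - 2 * rho * a * b) / (2 * (1 - rho**2)))
sklar = cop * norm.pdf(x) * norm.pdf(y)
```

The two sides agree to numerical precision, i.e., the copula density carries exactly the dependence left over after the marginals are factored out.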
Vine Copula decomposes the multivariate joint distribution in a cascading manner into a series of conditional two-dimensional Copulas, primarily in two forms: C-vine and D-vine [41]. When a dominant variable exerts a strong influence on the other variables, the C-vine Copula structure is chosen, ensuring each Vine Tree carries a dominant factor. Conversely, when the relationships between variables are relatively balanced, the variables in the Vine Tree are arranged in a row, forming the D-vine Copula.
Figure 7 illustrates a 5-dimensional C-vine Copula structure, and it can be observed that a $d$-dimensional vine structure is composed of $d-1$ Vine Trees. The first Vine Tree consists of $d$ variables as nodes connected by edges, and the nodes of the $i$-th Vine Tree are the edges of the $(i-1)$-th Vine Tree, where $i = 2, \dots, d-1$. The arrangement of variables in the first tree is of great importance, as it determines the composition of nodes and edges in the subsequent Vine Trees [34].
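The C-vine edge set is fully determined once the root ordering of the first tree is fixed; a small enumeration sketch (our own helper, with variables labeled 1..d) makes the cascading structure concrete:

```python
def cvine_edges(d):
    """Edge set of a C-vine on variables 1..d: in tree t, the root variable t
    pairs with every j > t, conditioned on the earlier roots 1..t-1."""
    edges = []
    for t in range(1, d):
        for j in range(t + 1, d + 1):
            edges.append((t, j, tuple(range(1, t))))  # (pair, conditioning set)
    return edges

edges = cvine_edges(5)  # the 5-dimensional case of Figure 7
```

For $d = 5$ this yields the expected $d(d-1)/2 = 10$ pair-copula terms, from the unconditional edge $(1, 2)$ in the first tree to $(4, 5 \mid 1, 2, 3)$ in the last.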
2.6.2. Variables Screening Based on MI and VIF
Vine Copula captures the dependence structure between strongly correlated variables with greater significance, and introducing weakly correlated variables into the uncertainty model only increases its complexity. Therefore, it is necessary to select suitable variables from the multiple meteorological factors.
(1) MI
MI is used to measure the degree of mutual dependence between two random variables. Unlike the correlation coefficient, MI is not limited to real-valued variables and expresses the similarity between the joint distribution and the product of the marginal distributions. For two continuous random variables, MI is defined as [42]:

$$I(X; Y) = \int \!\! \int p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \, dx \, dy$$

where $p(x, y)$ is the joint probability density function of $X$ and $Y$, and $p(x)$ and $p(y)$ are the marginal probability density functions of $X$ and $Y$, respectively. MI can effectively describe the nonlinear relationship between two variables, making it particularly suitable for constructing an appropriate dependence structure for the Vine Copula model.
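In practice, MI is estimated from samples; a simple plug-in estimate discretizes the integral with a 2-D histogram (the bin count and synthetic data below are illustrative assumptions):

```python
import numpy as np

def mutual_info_hist(x, y, bins=16):
    """Plug-in MI estimate (in nats) from a 2-D histogram of the samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                       # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)         # marginal of x
    py = pxy.sum(axis=0, keepdims=True)         # marginal of y
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(5)
x = rng.standard_normal(20000)
mi_dep = mutual_info_hist(x, x + 0.1 * rng.standard_normal(20000))  # strong dependence
mi_ind = mutual_info_hist(x, rng.standard_normal(20000))            # independence
```

Because the estimate is a Kullback-Leibler divergence, it is nonnegative, and strongly dependent pairs score well above independent ones.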
(2) VIF
VIF is a method used to assess the linear relationships among the input variables of a model. The basic idea is to determine how strongly a feature variable is explained by the other feature variables through an auxiliary regression, making VIF an essential tool for identifying and addressing multicollinearity. The calculation of VIF involves treating each feature as the dependent variable and the other features as independent variables to fit a linear regression model; the VIF is then computed from the resulting coefficient of determination as follows [43]:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

where $R_i^2$ is the coefficient of determination obtained by fitting a linear regression model between the $i$-th independent variable and the other variables. In the context of Vine Copula modeling, the construction of a multivariate joint distribution requires multiple correlation relationships between the influencing variables and the target variable. Therefore, after selecting multiple highly dependent variables using MI, VIF is used to filter out variables that exhibit multicollinearity with the PV residuals.
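The formula above reduces to one least-squares fit per feature; a NumPy sketch on synthetic data (where the third column is deliberately a near-copy of the first) shows how collinearity inflates the factor:

```python
import numpy as np

def vif(X):
    """VIF_i = 1 / (1 - R_i^2), with R_i^2 from regressing column i on the others."""
    out = []
    for i in range(X.shape[1]):
        y = X[:, i]
        A = np.column_stack([np.delete(X, i, axis=1), np.ones(len(X))])  # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(6)
a = rng.standard_normal(500)
b = rng.standard_normal(500)
X = np.column_stack([a, b, a + 0.01 * rng.standard_normal(500)])  # col 2 ~ col 0
v = vif(X)
```

The independent column keeps a VIF near 1, while the two nearly collinear columns blow up far beyond common screening thresholds (e.g., 5 or 10).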
2.6.3. Improved High-Dimensional Vine Copula Modeling Based on Q-Learning
In a Vine Copula, each edge carries information about the dependence between two variables. To find the Vine Tree with the highest overall dependency, this paper uses a Q-Learning strategy to determine how the variables are connected.
Q-Learning is a value-based reinforcement learning algorithm that uses the Q-function to represent the reward of taking an action in the current state. The objective is to maximize the cumulative Q-value over the whole process through interactions between the agent and the environment. Q-Learning involves a state space $S$, an action space $A$, a state transition function $P$, a reward function $R$, and a discount factor $\gamma$. At time step $t$: $s_t \in S$ is the state of the agent; $a_t \in A$ is the action taken by the agent; $P(s_{t+1} \mid s_t, a_t)$ is the probability of transferring to state $s_{t+1}$ after taking action $a_t$ in state $s_t$; $R(s_t, a_t)$ is the reward obtained after taking action $a_t$ in state $s_t$; and $\gamma \in [0,1]$ balances the impact of immediate and future returns on the decision-making process [44]. The construction of the Vine Tree can be represented by the connections between variables. This paper defines the state as the variable currently connected and the action as the variable selected next. The state $s_t$ can be defined as:

$$ s_t = v_i, \quad i = 1, 2, \dots, n $$

where $v_i$ is a variable node and $n$ is the number of variables to be connected. To avoid duplicate connections, the variable selected at each action must be recorded, and the variables selected from time steps $1$ to $t$ are removed from the action space $A$. The initial state is defined as starting the connection from the PV residual variable. The algorithm adopts an ε-greedy strategy to choose actions, where ε represents the agent's level of exploration: with probability ε, the agent selects a random action to explore the environment in search of a potential global optimum, and with probability 1 − ε, it selects the action with the highest Q-value in the current Q-table, emphasizing the acquisition of immediate rewards. Each row of the Q-table corresponds to a current state, and each column represents an available action. After each action is taken, the Q-table is updated accordingly [45]:

$$ Q'(s_t, a_t) = Q(s_t, a_t) + \alpha\left[ R(s_t, a_t) + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] $$
where $Q'(s_t, a_t)$ is the updated Q-value and $Q(s_t, a_t)$ is the current value in the Q-table; $\alpha$ is the learning rate; $R(s_t, a_t)$ denotes the immediate reward obtained by the agent after taking the action; and $\max_{a} Q(s_{t+1}, a)$ is the maximum value in the Q-table row queried for state $s_{t+1}$. It can be seen that the reward function $R$ not only affects the update of the Q-table but also determines the agent's action selection. This paper uses the correlation coefficient value between variables as the immediate reward, with the reward function defined as follows:

$$ R(s_t, a_t) = \mathbf{M}(s_t, a_t) $$

where $\mathbf{M}$ is the correlation coefficient matrix of all variables. The formula indicates that after taking action $a_t$, the correlation coefficient between the current state variable $s_t$ and the selected action variable $a_t$ is given as the reward.
Because Vine Copula focuses on the collaborative relationships between variables, Kendall's rank correlation coefficient can effectively assess whether the relative relationships are consistent. Therefore, the variable correlation magnitude quantified by Kendall's rank correlation coefficient is used as the immediate reward in Q-Learning.
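The variable-ordering search described above can be sketched as follows. The reward matrix `M` stands for a hypothetical |Kendall's tau| matrix (not the paper's data), node 0 represents the PV residual (the initial state), and the hyperparameters are arbitrary illustrative choices; the state is approximated by the most recently connected node.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical |Kendall tau| matrix between 4 variables; node 0 = PV residual.
M = np.array([[0.0, 0.6, 0.4, 0.2],
              [0.6, 0.0, 0.5, 0.1],
              [0.4, 0.5, 0.0, 0.3],
              [0.2, 0.1, 0.3, 0.0]])
n = M.shape[0]
Q = np.zeros((n, n))                       # rows: current state; cols: actions
alpha, gamma, eps, episodes = 0.1, 0.9, 0.3, 3000

for _ in range(episodes):
    state, visited = 0, {0}                # start connecting from PV residual
    while len(visited) < n:
        actions = [a for a in range(n) if a not in visited]
        if rng.random() < eps:             # explore with probability eps
            a = rng.choice(actions)
        else:                              # exploit: best remaining Q-value
            a = max(actions, key=lambda j: Q[state, j])
        reward = M[state, a]               # Kendall tau as immediate reward
        remaining = [j for j in range(n) if j not in visited | {a}]
        future = max(Q[a, j] for j in remaining) if remaining else 0.0
        Q[state, a] += alpha * (reward + gamma * future - Q[state, a])
        state = a
        visited.add(a)

# Greedy rollout of the learned connection order for the first Vine Tree
state, visited, order = 0, {0}, [0]
while len(visited) < n:
    a = max((j for j in range(n) if j not in visited), key=lambda j: Q[state, j])
    order.append(int(a)); visited.add(a); state = a
```

For this toy matrix the learned order chains the most strongly dependent pairs starting from the PV residual, which is exactly the structure the Vine Tree construction rewards.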
Figure 8 is a visualization of the Q-Learning selection of variables to form the Vine Tree.
The method of forming a Vine Tree in Vine Copula modeling has been given above; the complete modeling procedure is explained below and consists of the following three steps.
(1) Probability integral transformation (PIT). This paper employs a rank-based transformation for PIT, which converts the data into pseudo-observation samples that preserve the dependency structure. The transformation formula is as follows:

$$ u_{i,j} = \frac{r_{i,j}}{n + 1} $$

where $\mathbf{U} = (u_{i,j})$ is the pseudo-observation matrix, $r_{i,j}$ is the rank of the $i$-th sample of the $j$-th variable, and $n$ is the sample size.
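Step (1) can be sketched directly from the formula $u = r/(n+1)$; the gamma-distributed toy data below is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.stats import rankdata, kstest

def pit(X):
    """Rank-based probability integral transform: u = rank / (n + 1)."""
    n = X.shape[0]
    return rankdata(X, axis=0) / (n + 1)   # pseudo-observations in (0, 1)

rng = np.random.default_rng(3)
X = rng.gamma(shape=2.0, scale=1.5, size=(2000, 3))   # skewed raw data
U = pit(X)
```

Each column of `U` is approximately uniform on (0, 1) regardless of the original marginal, which is exactly what the Copula fitting step requires.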
(2) Selection of the Copula function and parameter estimation. The initial Vine Tree is formed according to Figure 8; the Copula parameters between each pair of variables are then estimated, and the type of fitting function is selected. MLE estimates the parameters of an assumed distribution by maximizing the likelihood of the observed data, allowing the Copula parameters to be estimated directly as [46]:

$$ \hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \ln c\left(u_i, v_i; \theta\right) $$

where $\arg\max_{\theta}$ returns the parameter value $\hat{\theta}$ that maximizes the log-likelihood. This yields a set of Copula function parameters for each edge in the Vine Tree. Subsequently, BIC is applied to select the function type that best fits each edge. BIC corrects the prior probability of an event using the Bayesian formula, as follows [47]:

$$ \mathrm{BIC} = k \ln n - 2 \ln L $$

where $k$ is the number of model parameters, $n$ is the sample size, and $L$ is the likelihood function. Because BIC takes the sample size into account, it helps prevent overfitting caused by excessive model complexity.
(3) Calculation of the conditional CDF. The Vine Copula is constructed tree by tree, and from Figure 7 it can be observed that conditional variables appear from the second Vine Tree onward. According to Joe [48], for variables $j$ and $k$ under a given set of conditioning variables $D$, the Copula can be denoted as $C_{j,k|D}$, and their conditional CDF can be expressed using the h function:

$$ h\!\left(u_j \mid u_k\right) = F\!\left(x_j \mid x_k, \boldsymbol{x}_D\right) = \frac{\partial\, C_{j,k|D}\!\left(F(x_j \mid \boldsymbol{x}_D),\, F(x_k \mid \boldsymbol{x}_D)\right)}{\partial\, F(x_k \mid \boldsymbol{x}_D)} $$

In the above equation, $D$ denotes the conditioning set, i.e., the set of variables jointly connected to variables $j$ and $k$ in the Vine Tree, and $\boldsymbol{x}_D$ is the corresponding vector of variables. The h function provides a concise representation for computing conditional CDFs. Except for the first Vine Tree, the variable inputs of every Vine Tree come from the CDF values of the previous one, and iterating these steps completes the Vine Copula modeling.
2.6.4. Probability Interval Results Based on PV Residual’s Quantile Regression
Since the mathematical analytical expression for the established high-dimensional joint distribution of PV residuals is highly complicated, this paper solves the CDF using the h-inverse function, i.e., the inverse form of the h function [49]:

$$ u_j = h^{-1}\!\left( F\!\left(x_j \mid x_k, \boldsymbol{x}_D\right) \mid u_k \right) $$

The above equation indicates that the h-inverse eliminates the conditioning variable $u_k$ from the high-dimensional CDF, yielding $F(x_j \mid \boldsymbol{x}_D)$. Taking a five-dimensional C-vine structure as an example, the quantile of the target variable is obtained by applying the h-inverse recursively, where $x_i$ is the $i$-th-dimension variable to be solved. The solution is dynamic: the value obtained at a higher dimension is the input needed to solve the next lower dimension. The highest-dimensional conditional variable is the input conditional probability, also known as the quantile level. Different quantiles of the PV residuals are obtained by inputting different quantile levels.
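For a Gaussian pair-copula, the h and h-inverse functions have the closed forms below. These are standard results, used here only to illustrate how feeding a quantile level into the h-inverse removes one conditioning variable; other Copula families have different h functions, and the parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def h(u, v, rho):
    """Gaussian-copula h function: conditional CDF F(u | v)."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho**2))

def h_inv(w, v, rho):
    """Inverse h function: recovers u such that h(u, v, rho) = w."""
    return norm.cdf(norm.ppf(w) * np.sqrt(1 - rho**2) + rho * norm.ppf(v))

rho, v = 0.6, 0.35                        # illustrative parameter and condition
quantiles = [h_inv(q, v, rho) for q in (0.05, 0.5, 0.95)]
```

In the full quantile-regression step, the h-inverse is applied once per Vine Tree level, each application peeling off one conditioning variable until the marginal quantile of the PV residual is reached.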
4. Conclusions
Faced with a new power system with continuously increasing penetration of renewable energy, this paper proposes a novel probabilistic forecasting method that combines time series decomposition with Vine Copula uncertainty modeling to quantify the uncertainty of PV output. Dedicated models are established for the periodic, trend, and residual components obtained from STL decomposition to produce probabilistic forecasts of PV power, and extensive experimental comparisons are conducted for the model of each component to demonstrate its advantages. Using the proposed PV forecasting method, various PV output scenarios can be anticipated in advance so that operational strategies can be prepared, promoting the safe and stable operation of the power grid and improving the economic benefits of electricity sellers. The main contributions are as follows.
(1) Compared with probabilistic forecasting using single models (d-Transformer, QR-LSTM), the proposed method fully exploits the various temporal features of the PV output time series and combines the advantages of different models to deliver state-of-the-art prediction results. First, modeling the different features of the time series separately effectively improves the learning ability of each model, greatly enhancing the accuracy of the deterministic predictions. Considering the strong volatility of the periodic component and its patterns at different scales, using TimeMixer to further decompose the past information of the periodic component and extract multi-scale temporal features further improves prediction performance. For the randomly fluctuating, hard-to-predict residual component, using the Vine Copula model for multi-factor correlation modeling yields conditional probability results that better reflect the real information. The proposed Q-Learning-based optimization of the Vine Tree improves the model's fitting power.
(2) In the deterministic forecasting of the periodic component, the proposed TimeMixer model outperforms the Transformer, TCN, CNN-GRU, LSTM, and XGBoost models with lower RMSE and MAPE. Across all test samples, the RMSE of the TimeMixer model was reduced by 0.1404 kW, 0.18 kW, 0.5304 kW, 0.6886 kW, and 1.4821 kW compared to these models, respectively. Its MAPE was the lowest at 0.9167%, while the others were 1.1755%, 1.4022%, 1.8814%, 1.9549%, and 3.3816%, respectively. Compared to the second-best-performing Transformer, TimeMixer required 66 min of training time, a speed-up of 24 min, and its total time was only 15 min slower than that of the fastest-training LSTM.
(3) The probabilistic prediction results demonstrate that the method combining deterministic prediction models with uncertainty modeling under the time series decomposition proposed in this paper exhibits the best performance at all confidence levels and across all test samples.
The method proposed in this paper does not consider the correlation between different PV power stations. In the future, the possibility of applying this method to short-term probabilistic forecasting of distributed PV power generation will be explored.