4.1. Results of Point Prediction
For point prediction, the performance of six methods, namely Ridge, RF, GRU, NGB, GBRT-Mean, and GBRT-Med, is compared. For each prediction model, the choice of hyperparameters is crucial to its performance. We use grid search with cross validation on the validation set to evaluate model performance and determine the optimal hyperparameters of each model, which are listed in
Table 1. The order of importance of the RF model variables is illustrated in
Figure 4.
It can be observed from
Figure 4 that global horizontal radiation ranks first in importance among all independent variables, accounting for more than 50% of the total variable importance, followed by t-15 and t-30.
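The grid search with cross validation and the impurity-based variable-importance ranking described above can be sketched as follows with scikit-learn; the data here are synthetic stand-ins for the actual features (global horizontal radiation, t-15, t-30, etc.), and the parameter grid is illustrative rather than the one used in the study.

```python
# Sketch of hyperparameter search plus variable-importance ranking for
# a random forest, on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))              # stand-ins for GHI, t-15, t-30, ...
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Grid search with cross validation over a small illustrative grid.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
search.fit(X, y)

# Impurity-based importances, normalised to sum to one.
importances = search.best_estimator_.feature_importances_
ranking = np.argsort(importances)[::-1]    # indices, most important first
```

With this construction the first feature dominates the ranking, mirroring how global horizontal radiation dominates in Figure 4.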
To ensure a faster convergence rate and a better learning effect, we standardize the data before establishing the GRU model. As a mainstream optimization algorithm, Adam [
46] is chosen as the optimizer. The number of iterations is set to 1000, the learning rate to 0.01, the number of hidden layers to 4, the number of hidden nodes per layer to 60, and the regularization parameter to 0.0001.
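The standardization step applied before training the GRU can be sketched as a simple z-score transform (the GRU itself is omitted); the feature matrix below is a hypothetical stand-in for the actual inputs.

```python
# Z-score standardization: each column is shifted to zero mean and
# scaled to unit variance before being fed to the network.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 3))

mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma   # each column now has zero mean, unit variance
```

At prediction time the same `mu` and `sigma` from the training set would be reused, so that the model never sees statistics computed from test data.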
In the NGB model, we choose the Gaussian distribution, and the learning rate is set to 0.01, the number of iterations is 532, and the percent subsample of rows to use in each boosting iteration is 0.4. The importance ranking of location parameter variables is depicted in the left panel of
Figure 5. As in the RF results, global horizontal radiation ranks first in importance, followed by t-15, and month_cos is more important than month_sin. The third most important variable differs slightly, however: it is the diffuse horizontal radiation.
In the GBRT-Mean method, we set the maximum depth of the tree to 5, the number of boosting stages to perform is 400, the minimum number of samples required to split an internal node is 10, the minimum number of samples required to be at a leaf node is 15, and the learning rate is 0.05. For the median method, GBRT-Med, we set the maximum depth of the tree to 15, the number of boosting stages to perform is 400, the minimum number of samples required to split an internal node is 15, the minimum number of samples required to be at a leaf node is 10, and the learning rate is 0.15.
To measure the closeness between the predicted and actual values, the relevant evaluation indexes are calculated from the prediction results of the six models described above to evaluate their prediction performance; the results are listed in
Table 2. The best results under each evaluation index are displayed in bold according to the description in
Section 3.2.1.
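For reference, the four point-prediction indexes can be written down directly in numpy. These are the common definitions; the paper's exact formulas are given in Section 3.2.1, and the SMAPE convention in particular varies across the literature, so this is one standard variant.

```python
# Standard definitions of MAE, RMSE, MAPE, and SMAPE.
import numpy as np

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    # Percentage error relative to the actual values (undefined at y=0).
    return 100.0 * np.mean(np.abs((y - yhat) / y))

def smape(y, yhat):
    # Symmetric variant: error relative to the mean magnitude of both.
    return 100.0 * np.mean(2.0 * np.abs(y - yhat) / (np.abs(y) + np.abs(yhat)))

y = np.array([1.0, 2.0, 4.0])
yhat = np.array([1.5, 2.0, 3.0])
# mae -> 0.5, mape -> 25.0
```

All four indexes are negatively oriented: smaller values indicate a closer fit between predictions and observations.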
Table 2 reveals that, compared to the other models, the MAE, RMSE, MAPE, and SMAPE calculated from the prediction results of the RF model are the smallest and its R²
is the largest, which indicates that the RF model has the best performance. However, the differences between RF and both GBRT-Mean and GBRT-Med are negligible for most indicators, indicating that their performances are similar, followed by GRU and NGB. The worst is Ridge regression, whose accuracy is far from that of the other methods.
4.2. Results of Interval Prediction
In this study, we have compared the performance and prediction interval quality of twelve recently proposed interval prediction methods. Then, we have used six ensemble methods to combine the prediction intervals of a subset of these methods to obtain better prediction intervals.
Specifically, the following methods are chosen: the KDE methods, including GRU-KDE, RF-KDE, Ridge-KDE, GBRT-Mean-KDE, and GBRT-Med-KDE, built on the corresponding point prediction residuals; the J+ab methods, including J+ab-Ridge, J+ab-MLP, and J+ab-RF; RF-OOB, QRF, and SC-RF, based on random forest; and NGB.
For the KDE method, we train the point prediction models, including GRU, RF, Ridge, GBRT-Mean, and GBRT-Med (see
Section 4.1 for the training process and hyperparameter adjustment). Then, the residuals between the predicted and actual values of the five models are calculated. Next, we estimate the kernel density bandwidth of the residuals by cross validation, with candidate values from 0.005 to 0.15; the bandwidths obtained for the five models are 0.040, 0.016, 0.072, 0.034, and 0.026, respectively. We then compute the cumulative distribution function and the corresponding quantiles. At confidence levels of 95%, 90%, 85%, and 80%, the upper and lower quantiles corresponding to the five methods are demonstrated in
Table 3.
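The KDE step above can be sketched as follows: fit a Gaussian kernel density to the point-prediction residuals, selecting the bandwidth by cross validation over the stated range, then read the interval bounds off the numerical CDF of the fitted density. The residuals here are synthetic stand-ins, and the grid-based CDF inversion is one simple way to obtain the quantiles, not necessarily the paper's exact implementation.

```python
# KDE of residuals with cross-validated bandwidth, then 95% quantiles.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(3)
residuals = rng.normal(scale=0.05, size=500).reshape(-1, 1)

# Bandwidth selection by cross validation over 0.005..0.15, as in the text.
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    param_grid={"bandwidth": np.linspace(0.005, 0.15, 30)},
    cv=5,
)
search.fit(residuals)
kde = search.best_estimator_

# Numerical CDF of the fitted density on a fine grid, then quantiles.
grid = np.linspace(-0.5, 0.5, 2001).reshape(-1, 1)
pdf = np.exp(kde.score_samples(grid))
cdf = np.cumsum(pdf)
cdf /= cdf[-1]
lo = grid[np.searchsorted(cdf, 0.025), 0]   # lower bound of 95% interval
hi = grid[np.searchsorted(cdf, 0.975), 0]   # upper bound of 95% interval
```

Adding `lo` and `hi` to each point forecast then yields the KDE prediction interval for that model.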
For the J+ab method, the multi-layer perceptron in J+ab-MLP uses the Adam optimizer, the maximum number of iterations is set to 8000, the activation function is tanh, the sizes of the three hidden layers are 50, 40, and 30, and the regularization parameter is 0.001. The parameter settings of the J+ab-Ridge and J+ab-RF base learners are the same as those in
Section 4.1.
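The J+ab construction can be sketched compactly, here with Ridge as the base learner. This follows the standard jackknife+-after-bootstrap recipe (bootstrap ensembles plus leave-one-out aggregation of predictions and residuals); the paper's exact implementation and quantile conventions may differ, and all data below are synthetic.

```python
# Compact jackknife+-after-bootstrap (J+ab) sketch with a Ridge base learner.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
n, B, alpha = 200, 30, 0.05
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.1, size=n)
x_test = rng.normal(size=(1, 3))

models, in_bag = [], []
for _ in range(B):
    idx = rng.integers(0, n, size=n)           # bootstrap resample
    models.append(Ridge(alpha=1.0).fit(X[idx], y[idx]))
    in_bag.append(np.bincount(idx, minlength=n) > 0)
in_bag = np.array(in_bag)                      # (B, n) membership mask

lo_vals, hi_vals = [], []
for i in range(n):
    oob = ~in_bag[:, i]                        # models not trained on point i
    if not oob.any():
        continue
    mu_i = np.mean([m.predict(X[i:i + 1])[0] for m, u in zip(models, oob) if u])
    mu_t = np.mean([m.predict(x_test)[0] for m, u in zip(models, oob) if u])
    r = abs(y[i] - mu_i)                       # leave-one-out residual
    lo_vals.append(mu_t - r)
    hi_vals.append(mu_t + r)

lower = np.quantile(lo_vals, alpha)            # J+ab interval bounds
upper = np.quantile(hi_vals, 1 - alpha)
```

Swapping the Ridge learner for an MLP or a random forest gives the J+ab-MLP and J+ab-RF variants, respectively.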
For the NGB method, the specific modeling process and parameters are described in
Section 4.1. For interval prediction with NGB, the scale parameter is the more important one. The order of importance of the variables affecting the scale parameter is depicted in the right panel of
Figure 5. It can be observed that the three most important factors affecting the scale parameters are the global horizontal radiation, t-15, and diffuse horizontal radiation.
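Because NGB outputs a full Gaussian distribution per point, a prediction interval follows directly from the predicted location and scale via normal quantiles. The loc/scale values below are illustrative stand-ins, not NGB outputs.

```python
# Central (1 - alpha) interval from a predicted Gaussian loc and scale.
import numpy as np
from scipy.stats import norm

loc = np.array([1.2, 0.8])        # predicted means (illustrative)
scale = np.array([0.1, 0.3])      # predicted standard deviations (illustrative)
alpha = 0.05

z = norm.ppf(1 - alpha / 2)       # ~1.96 for a 95% interval
lower = loc - z * scale
upper = loc + z * scale
```

This is why the scale parameter drives the interval width: a larger predicted scale widens the interval proportionally.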
For the prediction intervals established by the above method, we draw the resulting graph of active power from 1 September 2015 to 4 September 2015, as illustrated in
Figure 6. In addition, the corresponding weather conditions including wind speed, temperature (Celsius), and relative humidity are depicted in
Figure 7. It can be observed from
Figure 6 that, in general, the prediction intervals obtained by the twelve methods follow a variation trend similar to that of the actual values and contain most of them. Among the methods, the intervals obtained by J+ab-Ridge, J+ab-MLP, and J+ab-RF are relatively wide, followed by Ridge-KDE, while those of the other methods are relatively narrow.
In addition, we calculate PICP, PINAW, Winkler score, CWC, and MPICD from the prediction interval of each method and the results are depicted in
Figure 8. The specific values at the 95% confidence level are listed in
Table 4 (see
Table A1,
Table A2 and
Table A3 for the results at other confidence levels).
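Three of these interval-quality indexes can be written down concisely: PICP is the empirical coverage, PINAW is the average width normalised by the target range, and for the Winkler score the standard (positively valued, lower-is-better) definition is shown below, whereas the paper reports a variant on a negative scale.

```python
# Common definitions of PICP, PINAW, and the Winkler score.
import numpy as np

def picp(y, lower, upper):
    # Fraction of actual values covered by the interval.
    return np.mean((y >= lower) & (y <= upper))

def pinaw(y, lower, upper):
    # Mean interval width, normalised by the range of the targets.
    return np.mean(upper - lower) / (y.max() - y.min())

def winkler(y, lower, upper, alpha):
    # Width plus a 2/alpha penalty for each violation of the bounds.
    width = upper - lower
    below = (lower - y) * (y < lower)
    above = (y - upper) * (y > upper)
    return np.mean(width + (2.0 / alpha) * (below + above))

y = np.array([0.0, 1.0, 2.0, 3.0])
lower = y - 0.5
upper = y + 0.5
# picp -> 1.0, pinaw -> 1/3
```

A good interval therefore needs PICP at or above the nominal level while keeping PINAW, and hence the Winkler penalty, small.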
According to the results in
Figure 8, the models with PICP close to the nominal level are J+ab-Ridge, RF-OOB, SC-RF, NGB, GRU-KDE, and Ridge-KDE for different indexes under each confidence level. The models with narrow PINAW and high Winkler score are RF-OOB, SC-RF, QRF, NGB, GRU-KDE, RF-KDE, GBRT-Mean-KDE and GBRT-Med-KDE, which are less than 0.12 and greater than −1.1, respectively. The models with smaller CWC are RF-OOB, SC-RF, QRF, NGB, GRU-KDE, Ridge-KDE, and GBRT-Mean-KDE, all of which are less than 0.23. The models with smaller MPICD are RF-OOB, SC-RF, QRF, NGB, GRU-KDE, RF-KDE, Ridge-KDE, GBRT-Mean-KDE, and GBRT-Med-KDE, all of which are less than 0.13.
More specifically, for the performance of each method: although the PICP of the three J+ab methods is generally close to the confidence level, their PINAW is too large and their other indexes are also not ideal. The PICP of QRF and GBRT-Mean-KDE sometimes falls short of the confidence level, although it remains relatively close, and their other indexes are relatively ideal. The PICP of RF-KDE and GBRT-Med-KDE is far from the given confidence level; compared with the other methods, their CWC is larger, while their other indexes are ideal. The PICP of Ridge-KDE reaches the nominal level, but its PINAW is slightly higher, resulting in a relatively wide interval. In contrast, NGB, GRU-KDE, RF-OOB, and SC-RF perform well on all indexes.
In addition, we also compare the total computational time of each method for obtaining the prediction interval under four confidence levels, as depicted in
Figure 9. The training was completed by a personal computer with AMD R7-5800h CPU, 3.20 GHz processor and 16 GB memory. It can be observed from
Figure 9 that the methods with the shortest time are RF-OOB, NGB, and SC-RF, all of which take less than 40 s, followed by QRF, J+ab-Ridge, J+ab-MLP, and RF-KDE, all under 200 s. Meanwhile, the most time-consuming methods are J+ab-RF, GBRT-Mean-KDE, GBRT-Med-KDE, GRU-KDE, and Ridge-KDE, all of which take more than 200 s. In general, the KDE methods take longer than the others, probably owing to the time-consuming cross validation used to select the bandwidth.
To ensure that the results are more stable, we have repeated the tests ten times. As illustrated in
Figure 10, the results obtained by the J+ab methods are unstable, with values fluctuating by more than 0.1 for PICP and 0.05 for PINAW, while the results obtained by the other methods overlap and lie almost on a straight line, indicating much more stable behavior; for readability, these are not shown in the graph. Therefore, in the previous results we reported only the test corresponding to the fifth PICP value of each method among the ten tests at the 95% confidence level.
Based on the above results, the J+ab methods are not considered in the subsequent ensemble with the other methods, because J+ab is itself already an ensemble method, it performs poorly on various indicators, and its prediction intervals contain a number of outliers.
As for the ensemble part, we have implemented six methods, including Ensemble-Mean, Ensemble-Med, Ensemble-En, Ensemble-TE, Ensemble-TI, and Ensemble-PM to combine the prediction intervals obtained by nine methods, namely RF-OOB, SC-RF, QRF, NGB, GRU-KDE, RF-KDE, Ridge-KDE, GBRT-Mean-KDE, and GBRT-Med-KDE to obtain the ensemble prediction intervals. Then, we compare the results from the ensemble methods with those of the previously implemented models with best performance, such as RF-OOB, SC-RF, NGB, and GRU-KDE, and calculate the five indexes, namely PICP, PINAW, Winkler score, CWC, and MPICD. The results are reflected in
Figure 11, in which the method with the best performance on the basis of four indexes except PICP is marked in black, and the corresponding confidence levels are indicated by dotted lines. Moreover, we have listed the index comparison results under 90% confidence level in
Table 5, in which the method with the best performance based on the four indexes except PICP is marked in bold (see
Table A4,
Table A5 and
Table A6 for the results at other confidence levels). In addition, we have also compared the data with an interval of 5 min according to the same method. The corresponding results are illustrated in
Figure 12 and
Table 6 (see
Table A7,
Table A8 and
Table A9 for the results at other confidence levels).
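The two simplest ensemble schemes named above, Ensemble-Mean and Ensemble-Med, combine the member intervals by taking the mean (or median) of the nine methods' lower and upper bounds at each time step. A minimal sketch, with illustrative stand-in bounds rather than the actual member intervals:

```python
# Ensemble-Mean and Ensemble-Med: pointwise mean/median of the member
# methods' interval bounds.
import numpy as np

rng = np.random.default_rng(5)
n_methods, n_points = 9, 50
lowers = rng.normal(loc=-1.0, scale=0.1, size=(n_methods, n_points))
uppers = rng.normal(loc=1.0, scale=0.1, size=(n_methods, n_points))

ens_mean_lower = lowers.mean(axis=0)
ens_mean_upper = uppers.mean(axis=0)
ens_med_lower = np.median(lowers, axis=0)
ens_med_upper = np.median(uppers, axis=0)
```

The other ensemble variants (e.g. envelope- or trimming-based schemes) replace the mean/median with different aggregation rules over the same stacked bounds.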
It can be observed from
Figure 11 and
Figure 12,
Table 5 and
Table 6 that the PICP of all ensemble methods reaches the confidence level at both the fifteen-minute and five-minute time intervals. Ensemble-TE is the optimal method at all confidence levels: its PINAW and CWC are significantly smaller than those of the other methods, and its Winkler score is higher. Moreover, in most cases the Ensemble-TE method also has the smallest MPICD; the exceptions are the 80% confidence level with a fifteen-minute interval, where Ensemble-Mean has the smallest MPICD, and the 95% confidence level with a five-minute interval, where RF-OOB does. In these two cases, however, only a slight difference exists between the results of the Ensemble-TE method and those of the optimal method. Therefore, Ensemble-TE is considered the best interval prediction method among all those compared.
Finally, the numerical results of five interval quality indicators at different confidence levels are described in
Table 7 and
Table 8, respectively.
In addition, the indexes calculated from the interval prediction results in this paper also outperform those reported in other studies. To be more specific, when using the A-GRU-KDE [
15] method on the same data set with fifteen-minute and five-minute intervals, the PINAW values of the prediction intervals obtained are 0.258 and 0.195, respectively, at a confidence level of 95%. In the same setting, however, the PINAW values of our method are 0.066 and 0.063, respectively, indicating that the relative average width of the interval is reduced by nearly three quarters with guaranteed coverage. Meanwhile, the Winkler scores of the previously proposed method are −2.39 and −1.88, respectively, at a confidence level of 95%. In contrast, the Winkler scores of our method are −0.608 and −0.591, an improvement of about 70%, which indicates a high-quality prediction interval. Similar conclusions can be obtained at the other confidence levels, including 90%, 85%, and 80%. Thus, the proposed method exhibits increased accuracy and a higher-quality prediction interval on the same DKASC data set.