1. Introduction
Around the world, the full deployment of solar energy is being facilitated by several factors including, but not limited to, the reduced price of solar panels; environmental, political and social concerns; and solar energy undercutting utility prices, inter alia. According to [
1] global installed capacity will double every two years; however, significant factors have been identified which impede the speed at which solar dominance can be achieved: (i) lack of investments in efficiency, (ii) insufficient government incentives, and (iii) regulatory constraints. Small-scale Photovoltaic (PV) installations such those in the residential sector benefit from self-consumption by shifting a load from hours when electricity prices are high to hours when the PV energy is being generated, thereby achieving electricity bill savings. Going one step further, the aggregation and coordination of several PV installations in the shape of a Virtual Power Plant (VPP) with the accurate forecasting of global production facilitates its integration into the network [
2]. Consequently, the increasing PV penetration can lead to the increasing aggregation of PV systems into VPPs. However, these new business models are difficult to implement due to the previously mentioned regulatory constraints.
Power forecasting along with load demand and energy prices, for different time horizons and resolutions, are factored into the equation. For VPPs, spatial horizons should also be considered. Forecasting methods can be classified according to different factors, such as: the forecasted parameter (irradiance or power), the time horizon and resolution, the lead time, the model approach, and the nature of the forecasting statistic. Regarding the forecasted parameter, two different alternatives exist: direct [
3] and indirect [
4]. The direct method predicts the solar power through historical datasets of PV power generation and weather conditions. Indirect forecasting differs from the direct method in that it firstly predicts the solar irradiance and then, the solar power is calculated by using a performance model of the PV plant. As far as the time horizon is concerned, four categories can be found [
5,
6]: nowcasting (from 1 min to several minutes) which is used for real-time optimization in Energy Management Systems (EMSs); short-term forecast (from 1 h to several hours) used for intra-day market participation and for day-ahead operation optimization; medium term forecast (from 1 month to 1 year); and long-term forecast (up to several years). Time resolutions may range from 1 minute for real-time market operations, and 15-minute periods for load-shifting strategies and for optimizing Battery Energy Storage Systems (BESSs), to 1 hour for longer time horizons used by consumption monitoring, and a 1-week resolution for 1-year time horizons which can be used to identify consumption trends [
7]. The lead time can be defined as the time difference between the instant when the forecast is launched and the occurrence of the forecasted value, considering the forecast horizon as the maximum forecast lead time. Forecast errors increase with forecast lead time due to the atmospheric motion. As for the model, the optimal method for solar irradiance prediction depends on the forecast lead time [
8]. In this regard, four approaches have been widely used [
9]: (a) time-series-based statistical models whose aim is to identify patterns between historical datasets and the output parameters; (b) machine learning (ML) models mainly based on artificial neural networks (ANNs), which use historical datasets to learn the dependency between the past and the future; (c) physical strategies which utilize Numerical Weather Prediction (NWP) and PV models for solar power forecasting; and (d) hybrid models which explore different algorithm combinations with the aim of improving forecast accuracy and reducing the computational burden of online forecasting applications [
5]. The objective of all the models is to improve forecasting accuracy by minimizing some quality metrics, usually the sum of squared errors. The existence of different models raises the question of whether one method is better than the others. This is particularly true for statistical and ML models. Some studies conclude that statistical models outperform ML models [
10] while others state the opposite [
11,
12]. However, this interpretation may appear to be fairly simplistic without taking into account the dataset size [
13], the variable being forecast [
14], the time horizon [
15], or the computational load [
16]. Although historically, the forecasts have been dominated by statistical methods, over the last decade there has been a significant shift toward ML strategies [
17]. This comparative study is beyond the scope of the paper.
Regardless of the method used, the existence of forecasting errors poses a major challenge in optimizing the PV plant operation. While minor forecasting errors may not adversely affect the PV plant operation, larger errors can produce negative effects in the optimization models. Uncertainties hinder the performance in terms of accurately assessing the variables during the PV plant scheduling and operation. Forecast uncertainty quantification is, therefore, crucial. For this reason, considering the prediction intervals, which account for the uncertainty, provides additional accurate information about the expected values in terms of the range of plausible values and the probability assigned to each of them [
17,
18]. Another solution to the problem involves the aggregation of several PV sites for a unique forecasting strategy, since the error is significantly reduced as the number of installations increases. To prove this, in [
19] the authors present an approach to forecast the PV power from irradiance prediction maps, obtaining the power forecast of 200 sites located in Germany. Results show that the error is reduced from a Root Mean Square Error (RMSE) of
for single sites, to
for an area of 220 km × 220 km with multiple sites. The distance among sites is also an important factor which influences accuracy, since the error is significantly reduced when the distance between facilities increases. This strategy provides a powerful solution in the context of VPPs, since multiple systems or nodes are controlled, managing Distributed Generation (DG) units, Energy Storage Systems (ESSs), flexible loads and Information and Communication Technologies (ICTs) [
20]. Regarding the types of DG units, PV systems can be considered as the easiest and most cost-effective Renewable Energy Sources (RESs) to exploit, mainly for households, where it is possible to turn PV installations into flexible VPP nodes [
21].
Finally, as stated above, for indirect forecasting approaches, performance models of PV systems are required to obtain the prediction of solar power generation. To this end, a strategy that works under arbitrary conditions of irradiance and temperature must be adopted. Methods that exhibit these key characteristics are the Osterwald’s method [
22], which stands out by its simplicity, or similar studies from the literature that improve the performance of the Osterwald’s method by adjusting the results under low irradiance levels [
23,
24]. When the operating point of the PV panels is known, alternative methods, such as those reported in [
25,
26], can improve accuracy, while other research uses parametrization models to simplify the process [
27]. Sometimes, the irradiance of the site is measured on a horizontal plane, obtaining the Global Horizontal Irradiance (GHI). However, the panels are on a different plane. This is typical in satellite measurements but can also be the case in installations with multiple Maximum Power Point Trackers (MPPT) or PV panels with axis trackers. To solve this problem, a conversion process is needed, using: (i) different expressions to tackle the problem step-by-step by separating the global components into direct irradiance, diffuse irradiance, and albedo, modifying the angle of these components to obtain the global irradiance on the plane of the panel, estimating its losses to obtain the effective irradiance, or (ii) an approach that simplifies the process [
28]. In this regard, it becomes crucial to reduce the complexity and the computational burden placed on the forecasting algorithms. With this in mind, this work makes use of the Osterwald’s method to calculate the PV power, since low irradiance values (
W/m
2) are barely existent in the dataset and a generalization of the algorithm for VPP environments leads to better results. Satellite data are also required in this work since they offer information on the GHI, which is converted into irradiance on the tilted plane by following the steps stated above.
The forecasting strategy developed in this paper, uses long short-term memory recurrent neural networks (LSTM-RNNs) and is based on an indirect approach in which the irradiance is forecasted first and the output power is calculated by using the PV model. LSTM-RNNs have been used in several works, achieving satisfactory results on account of their recurrent architecture, which includes memory units [
16]. These allow the ANN to identify temporal patterns from the historical data of the forecast variable, thereby reducing the forecast error in comparison to other alternatives. The authors in [
29] propose a PV power forecasting strategy based on LSTM-RNN which is compared with other methods without memory units, showing their limitations in terms of not being able to model the dynamics of the PV output power data. In [
30] a LSTM-RNN with only exogenous inputs, e.g., dry bulb and wet bulb temperatures, and relative humidity, is used to forecast the day-ahead solar irradiance.
The main contributions of this paper are summarized as follows: (i) the PV forecasting method is applied to a VPP environment to reduce the forecasting error, which is modelled as a function of two well-defined parameters called lead time and launch time; (ii) prediction intervals are used to model the forecast uncertainty as a function of not only the lead time and the launch time, but also the Cloud Cover Factor (CCF), which allows the type of day to be identified; (iii) the input data for the forecasting strategy are derived from free-of-charge open-access data sources, offering a viable and cost-effective solution; and (iv) a trade-off between accuracy and computational burden facilitates the application of multiple PV power forecasts at different locations, within the context of a VPP.
The remainder of this paper is organized as follows:
Section 2 introduces the framework for the intra-day power forecasting strategy; the experimental results are presented in
Section 3; and finally, some conclusions are drawn in
Section 4.
3. Results
This section presents the results obtained by the proposed intra-day forecasting strategy for VPP, which is divided into different steps: (a) GHI forecasting for a real VPP node and for an emulated VPP; (b) PV power estimation from the GHI forecasting output; (c) the quantitative assessment of prediction intervals; and (d) VPP scheduling. Firstly, the results are validated for a real PV installation, which plays the role of a VPP node. The PV installation is located in the Polytechnic School, at the University of Alcala (Madrid). Secondly, the strategy is developed for an emulated VPP, by using several ground-based meteorological stations uniformly spread over the Community of Madrid [
33]. In order to evaluate the effectiveness of the model, a performance comparison in terms of accuracy/error, with respect to other methods proposed in literature, is also performed.
3.1. LSTM-RNN-Based GHI Forecasting for a Real VPP Node
The LSTM-RNN-based GHI forecasting for the real VPP node is performed by using measurements of irradiance taken in the PV facility located at the Polytechnic School of the University of Alcala (Spain). The initial training dataset is based on a 5-year period of irradiance values obtained from the CAMS dataset, since RNNs require a large amount of data for the learning process and GHI measurements are scarce in new installations. However, the test dataset is based on real measurements taken during the period from 1 June 2020 to 31 May 2021. Therefore, a whole year of real GHI values under different seasonal weather conditions are used to assess the accuracy of the forecasting approach. With a resolution of 15 min, the forecasting process starts at sunrise and ends at sunset. Furthermore, a new prediction is launched every 15 min and the dataset of irradiance is then updated, which ensures the accuracy of the results obtained. The network is trained with new measurements every day, during the night, to yield the best results. The GHI forecasts are given as a function of both the launch time and the lead time, parameters which are further defined, with the aim of computing the prediction intervals.
As far as the error assessment is concerned, this work relies on two types of metrics: (i) scale-dependent metrics such as the MAE and the Root Mean Square Error (RMSE); (ii) percentage-error metrics, such as the relative Mean Absolute Error (rMAE); and (iii) the relative Root Mean Square Error (rRMSE). Absolute values provide information about the average forecasting whereas the quadratic values are more sensitive to outliers, the combined analysis of the two allows for a thorough study of the results.. Error percentage values, on the other hand, provide an intuitive understanding of the error committed, which allows for a fair comparison to be conducted since the dependence on the magnitude is removed. However, when these values are near zero, scale-dependent metrics constitute the preferred option. The error metrics are summarized in
Table 2, where
is the measured data at time
,
is the forecast value at time
, and
is the length of the time series used to assess the accuracy of the algorithm.
The value of
denotes GHI at a specific hour of the day,
, and
is the prediction of
at
. The initial time,
, is fixed for each day and corresponds to the sunrise. To assess the error, two parameters are defined: lead time and launch time. Lead time corresponds to
and is the difference between the time instant of the prediction and the moment when the prediction is launched. Launch time, on the other hand, is denoted by
and is the difference between the current time and sunrise. Launch and lead time for the predictions of a particular day are better explained in
Figure 6. When the launch time is fixed and the lead time is used as a parameter, a vector of predictions is obtained. However, when both parameters are set to a value, a single point forecast is obtained (red diamond in
Figure 6).
The 3D plot in
Figure 7 depicts the errors as a function of the lead time and the launch time which leads to the following conclusions. Firstly, for the scaled error, a high error rate is observed for short launch times under medium lead times. It is expected that the scaled error is large under the previous conditions since the radiation is high. However, as the launch time increases, this error significantly decreases. Secondly, it was clear that the lower the radiation, the smaller the scaled error; however, for percentage errors, the opposite is the case; when the launch time is small (less than 1 h), the percentage error is high, irrespective of the lead time. These plots give some insight into the prediction behavior and become particularly useful in enhancing confidence in the prediction with respect to other forecasting techniques. In this particular case, the intra-daily prediction is used when the mean error is smaller than the day-ahead prediction [
37]. Finally, prediction intervals are derived from the MAE, assuming a particular distribution and splitting the predictions into groups as a function of the lead time, the launch time and the type of day, being very useful when a high degree of accuracy is required for the prediction.
Finally, the predictions obtained by the LSTM-RNN used in this work are compared with those available in the literature, which are depicted in
Table 3. It is worth noting that this comparative analysis should not be strictly considered, since each dataset can have a relative influence on the performance. Nevertheless, some preliminary conclusions can be drawn from the study. Firstly, taking into account other widely used techniques from [
45], the forecast error obtained in this work, in terms of the rMAE, is much smaller under short lead times (15 min), increasing until a similar value of the error is obtained under large lead times (6 h). A good performance under small forecast horizons is also obtained when comparing the results with [
46] for a statistical AutoRegressive Integrated Moving Average (ARIMA) model, in terms of the MAE, obtaining a similar error to that of traditional RNNs, and a higher error with respect to a similar LSTM-based approach presented in [
46], despite considering other inputs highly correlated with the irradiance. Finally, comparing the strategy presented in this paper with respect to the deep learning techniques (GRU, LSTM-RNN, and CNN-LSTM) from [
39,
47,
48,
49], a similar performance can be observed. To conclude, for small lead times, the forecasting approach introduced in this paper yields better results than those obtained by traditional methods. However, the forecasting error of the proposed LSTM-RNN-based method increases for higher lead times, until a similar performance is obtained with respect to the traditional methods compared from the literature. It is also observed that an increase in the number of inputs seems to slightly improve the performance of the forecast approach. Adding exogenous inputs to the forecast process is an alternative which is often used by researchers but negatively affects the performance when those resources are not available.
3.2. PV Power Estimation from the Forecasted GHI
The following step consists of estimating the power delivered by the PV modules from the GHI forecasts. To this end, the following parameters are required: (i) the prediction time instant; (ii) the site location in terms of latitude, longitude, and altitude; (iii) the installation characteristics, which include the orientation and inclination of the panels, rated parameters of the PV models available in datasheets, and losses associated with each part of the installation; and (iv) the ambient temperature, obtained from NWP maps. As stated above, analytical techniques exist to achieve this goal and, as a result, it is possible to quantify the error committed in the procedure.
This section focuses on two different approaches Firstly, real measurements of PV power are compared against the estimated values of PV power obtained from real measurements of GHI at the site. Secondly, the PV power is estimated from the forecasted values of GHI, evaluating the errors associated with the whole process. The GHI conversion searches for a reduced value of the error to maintain a similar performance to that obtained in the previous section, using the errors to construct the prediction intervals (
Section 3.3).
Figure 8 depicts the comparison between the measured values of PV power at the site with respect to the PV power estimation obtained from real GHI measurements taken at the site. Three types of days have been selected: a cloudy day, an overcast day and a sunny day. The
x axis is expressed in solar time. It is worth noting that the experimental setup at the site location has a building near the PV panels that generates partial shadows on some of them, starting from 16:36 and continuing until sunset. This event is also modelled in Equation (2), assuming a linear variation of this effect with respect to time (in
Figure 8 at 16:36, decreasing until
at sunset), and it also varies depending on the season of the year. Results show a reduced value for the error similar to that reported in other works [
28], obtaining an
for sunny days, an
for partially cloudy days, and an increased value of
for overcast days. In terms of the squared error, values range from
on sunny days and
on partially cloudy days, to
for overcast days. The transient characteristic of the inverter MPPT controller reveals that, in the presence of passing clouds, the inverter operating point becomes unstable. This is the reason why the error increases on these days. However, this does not pose any problem for the forecasting process since the time interval is 15 min, which considerably mitigates this negative effect.
Finally,
Figure 9 depicts the forecast error in terms of the difference between the measured and estimated PV power as a function of the lead time and the launch time. The shapes of the figures are similar to the previous section, with similar percentage errors. Therefore, from the figure, the same conclusions reached by analyzing
Figure 7 can be drawn: (i) the scaled error is high for short launch times and medium lead times but decreases significantly as the launch time increases; (ii) for launch times of less than an hour the percentage error is high, irrespective of the lead time; and (iii) the percentage error is high at lead times higher than approximately 7 h. The forecast error, which is dependent on the lead time and the launch time, is used to generate the prediction intervals in the following section.
3.3. Prediction Intervals of the Forecasted PV Power
Prediction intervals provide additional information about the plausible range of PV energy that will be generated at the site, for a defined confidence level selected by the user. Prediction intervals also indicate the degree of uncertainty in point forecasts. This could avoid unexpected energy shortages or, by contrast, energy surpluses, which are less critical than the former since the inverter can change its operational point to produce only the energy needed, despite wasting an exploitable energy resource.
In this paper, prediction intervals are obtained based on the work carried out in [
44]. Previous results show how dependent the forecast accuracy is on the lead time and the launch time. This fact is used to split the dataset of predictions and create groups, assuming a specific distribution which is built based on the
MAE. Therefore, each group is defined by selecting a launch time and a lead time, obtaining 365 samples per group, since a whole year is forecasted on this research.
Figure 10 shows different error distributions for launch time values of 2, 4, and 6 h, and lead time values of 1, 2, and 3 h. In all of them, a Laplacian distribution is considered, similar to the work carried out in [
37] but as a function of the CCF. Prediction intervals (
) for each subset can be defined in terms of the
MAE under this assumption: for a Laplacian distribution, a percentile
of probability
has an interval of
.
More detailed distributions can be determined provided that the selected groups are also created as a function of the CCF. However, by considering 10 groups as presented in [
37], the number of samples of each group is not sufficient to create a proper error distribution. To overcome this drawback, the number of CCF groups is reduced to three, using the type of day classification criteria (e.g., sunny, cloudy, and overcast). The CCF parameter has an hourly resolution, its value is 0 when the sun is not covered by clouds and 1 when the sunlight is totally blocked. The type of day is classified evaluating the CCF during the daylight hours, with an hourly weighting of the amount of energy produced during the day. After that, the k-nearest neighbors (k-NN) method is used to form the groups, since it allows the dataset to be split in a simple way, offering an independent solution for each site in the VPP.
The assumption of a Laplacian distribution for each new selected subset carries an error that is necessary to quantify. The Prediction Interval Coverage Probability (
PICP) [
50], in Equation (4), indicates the percentage of predicted values that are inside the interval selected, and it must be close to the confidence level (
). The confidence level selected in this research is
, although this parameter can be modified depending on the operational risks that the site can handle: the higher the risks, the higher the benefits from the installation:
Figure 11 depicts the absolute difference between the confidence level and the
PICP for each type of day, being an effective method when this difference is close to zero. On sunny days, the
PICP is close to the confidence level across the whole area, except for high lead times under small launch times where the difference increases. On cloudy days, the
PICP is quite different from the confidence level during sunset. Nevertheless, the difference is acceptable in the rest of the area. In this case, the forecast has a lesser value during sunset since the energy produced is significantly reduced. Hence, prediction intervals also offer valuable information on cloudy days. Finally, for overcast days, the difference between the
PICP and the confidence level increases with respect to sunny days, but the magnitude is acceptable and the prediction intervals are still valuable. To conclude, there are some zones with a high difference between the
PICP and the confidence level. However, these scenarios correspond to small PV power measurements with bad forecasting performance (
Figure 9). Therefore, prediction intervals are of little value for these points, since the strategy presented in this paper does not focus on those cases.
3.4. Evaluation of the GHI Forecasting for an Emulated VPP
The effectiveness of the whole forecasting process has been demonstrated for a single PV installation, which plays the role of a VPP node, along with its limitations with respect to the launch time and the lead time. The next step consists of assessing the algorithm performance for a set of PV facilities, forming a VPP. There are, however, no additional PV installations available in the study. Therefore, seven ground-based meteorological stations located in the Community of Madrid, apart from the PV facility at the university, are used to emulate the VPP nodes. Their locations are depicted in
Figure 12. These ground-based stations are equipped with GHI sensors which allow the GHI forecasts to be generated. As for the power conversion, the characteristics of the PV installation from the university are used to obtain the power estimation for each emulated VPP node (peak power,
, temperature coefficient,
, and the performance of the equipment).
The same results as those shown in
Figure 9 are used to quantify the accuracy of the prediction. However, in this case, the PV power forecast for each station is individually evaluated and the sum of power forecasts of the stations represents the PV power generated by the VPP, whose forecast error is depicted in
Figure 13. By doing so, the PV power obtained at each station can be compared with respect to the total PV power forecasted. It can be observed that the scaled values of the error (MAE and RMSE) are higher than those in
Figure 9. However, there is an 8-fold increase in the peak power with respect to a single facility. As a result, by looking at the relative values of the error (rMAE and rRMSE) it can be noted that the performance of the prediction increased for the VPP. The accuracy improvement of the PV power forecast can be expressed as the difference between the VPP forecast error and the sum of the error on each installation, dividing that value by the mean error committed on a single installation, obtaining a mean value of
with respect to the MAE, and
with respect to the RMSE. The shapes of the figures lead to identical conclusions to those reached by the analysis in
Figure 9. Therefore, the prediction intervals maintain their potential value for error forecasting in the case of a VPP.
4. Discussion and Conclusions
The technical development of VPPs must be supported by EMSs, for which PV power forecasting is an essential part. By knowing the energy produced by each VPP node, usually based on renewable resources such as solar technologies, it is possible to optimize the expected profit generated by energy exchanges with the grid operator. However, it is difficult to obtain PV power forecasts when it is necessary to gather information from several nodes scattered throughout a wide area, especially when the input data, required for the predictions, incur costs. This research presents a way of accomplishing this objective, using an LSTM-RNN-based strategy to, firstly, forecast the GHI by using a dataset of irradiance values derived from satellite data freely obtained from the CAMS, and secondly, estimate the solar power by utilizing a PV model of the installation. The forecast is updated during the day to achieve the highest accuracy, and prediction intervals are estimated as a function of the MAE. This provides a useful framework to understand the behavior of each installation that composes the VPP.
The first results provided are related to the GHI forecast for the installation and are based on the lead time and the launch time, which allow zones with a reduced error and a high level of confidence to be created in the shape of prediction intervals which depend on the type of day. The GHI error, as a function of the lead time and the launch time, shows a low performance when the launch time is lower than 1.5 h, corresponding to sunrise. To avoid this, the forecasting process can begin at 1.5 h after sunrise; before this time, this research can rely on the day-ahead prediction made in [
37] to obtain the irradiance forecast. To assess the accuracy of the intraday forecast, the results have been compared with those in the literature, achieving similar results to those obtained from deep learning algorithms and outperforming traditional techniques. The distinction between the lead time and the launch time means it possible to create better comparisons with respect to the literature, but also means it is difficult to summarize the research with only one value. The
MAE committed, without considering the lead time and the launch time, is of
, which is coherent with other studies.
Once the irradiance is forecasted, the conversion to PV power is analytically calculated, minimizing the error, which ranges from
to
in terms of the
rMAE and from
to
in terms of the
rRMSE. The error committed in this case is similar to the errors found in other articles [
26,
28]. The shapes of the error matrixes show similar results to those presented above. Therefore, similar conclusions can be drawn. The global
MAE committed in this case is
in a PV facility of
.
Prediction intervals are selected once the PV power forecast is available, which allow a range of plausible values of point forecasts to be obtained. The method considers a Laplacian distribution of the error and distinguishes between the lead time, the launch time and the type of day, which is selected with a k-NN algorithm as a function of the CCF. To verify whether the boundaries maintain the associated level of confidence, the PICP is calculated, obtaining values close to the selected confidence level of . In this case, results reveal a noticeable difference between the PICP and the confidence level on cloudy days close to sunset. However, the predictions at those hours have minor importance. It can be concluded that the selected prediction intervals are of great relevance.
Finally, the PV power forecast is created, and the prediction intervals are selected for the PV facility so that conclusions under a VPP environment can be drawn. In this case, a real PV facility and seven ground-based weather stations in the Community of Madrid are selected to emulate the VPP, obtaining an improvement in the accuracy of with respect to the MAE, and with respect to the RMSE. Similar conclusions can be reached regarding the error as a function of the lead time and the launch time. Therefore, the whole strategy can be applied under different scenarios for launch times higher than 1.5 hours, relying on the day-ahead prediction prior to this. For this case, the error matrixes also indicate the best moments to obtain the predictions of the nodes, making it possible to increase the reliability of the VPP operation.
The major limitation of this study is related to the information of temperature and cloudiness freely obtained in Spain from NWP maps. In locations where this information is not available forecasts cannot be provided. Future works will focus on the application of this strategy along with a day-ahead time horizon strategy to schedule the operation of a VPP, creating a software that simplifies the process.