2.1. Methods for Predicting Solar Irradiation
To obtain long sequences of solar irradiation data, many methods have been developed at different time scales. The two main scales at which most authors have presented methods for generating what is known as “synthetic solar irradiation data” are the daily scale and the hourly scale.
In all of these methods, the underlying idea can be summarized as follows: starting from an exhaustive statistical study of the historical records of the locality or localities for which solar data are available, a mathematical model for generating solar irradiation is then proposed. This statistical study must include at least the following two types of characteristics:
Time-independent characteristics of solar irradiation, such as means (both monthly and annual), variances, and standard deviations.
Time-dependent or sequential characteristics of solar irradiation: mainly the autocorrelation and partial autocorrelation functions.
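The two groups of statistics listed above can be estimated with a few lines of NumPy. The following is a minimal sketch, assuming the series is a 1-D array of daily or hourly irradiation values for one locality; `acf`, `pacf`, and `characterize` are illustrative names not taken from any of the cited works, and the PACF is obtained here with the standard Durbin-Levinson recursion.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: len(d) - k], d[k:]) / denom for k in range(max_lag + 1)])


def pacf(x, max_lag):
    """Partial autocorrelation via the Durbin-Levinson recursion on the sample ACF."""
    r = acf(x, max_lag)
    out = np.ones(max_lag + 1)
    phi = np.zeros(0)  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - np.dot(phi, r[k - 1:0:-1])
        den = 1.0 - np.dot(phi, r[1:k])
        phi_kk = num / den
        # Update the coefficients of the order-k fit.
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        out[k] = phi_kk
    return out


def characterize(series, max_lag=24):
    """Time-independent (mean, variance, std) and sequential (ACF, PACF) statistics."""
    s = np.asarray(series, dtype=float)
    return {
        "mean": s.mean(),
        "variance": s.var(ddof=1),
        "std": s.std(ddof=1),
        "acf": acf(s, max_lag),
        "pacf": pacf(s, max_lag),
    }
```

Running `characterize` on both a measured series and a synthetic one gives a direct, if simple, way of checking how close their statistics are.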
Once these parameters are known, the next step is to propose a mathematical model that generates synthetic irradiation series equivalent to the real ones, in the sense that the aforementioned statistical parameters must be similar (the closer, the better) to those of the real series, within certain reliability margins.
Methods for generating series at the daily scale are reviewed first, followed by those for the hourly scale.
One of the pioneering works on daily series was by Klein [32]. He exploited the fact that most of the seasonal variation in daily global irradiation is due to variation in extraterrestrial (extra-atmospheric) irradiation, that is, the irradiation reaching the upper layers of the atmosphere before it crosses the atmosphere. These seasonal variations can be removed by using the clarity index K_T (the ratio of global irradiation to extraterrestrial irradiation) as the variable. In this way, the variable to be modeled was not the global irradiation itself but the clarity index. However, many other researchers worked with the global irradiation itself.
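As an illustration of the clarity index, the sketch below computes the daily extraterrestrial irradiation on a horizontal surface, H0, with the usual textbook expressions (Cooper's declination, eccentricity correction, sunset hour angle) and then K_T = H/H0. The solar constant of 1367 W/m² and the formulas themselves are assumptions of this sketch, not necessarily the exact procedure followed by Klein [32].

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2; assumed value (some newer texts use 1361)


def declination_deg(n):
    """Cooper's approximation of solar declination (degrees) for day of year n."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0))


def extraterrestrial_daily(lat_deg, n):
    """Daily extraterrestrial irradiation on a horizontal surface, H0 (J/m^2)."""
    phi = math.radians(lat_deg)
    delta = math.radians(declination_deg(n))
    # Sunset hour angle; the argument is clamped to handle polar day or night.
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws = math.acos(cos_ws)  # radians
    eccentricity = 1.0 + 0.033 * math.cos(math.radians(360.0 * n / 365.0))
    return (24 * 3600 * SOLAR_CONSTANT / math.pi) * eccentricity * (
        math.cos(phi) * math.cos(delta) * math.sin(ws)
        + ws * math.sin(phi) * math.sin(delta)
    )


def clarity_index(h_global, lat_deg, n):
    """Daily clarity index K_T = H / H0, with H and H0 both in J/m^2."""
    return h_global / extraterrestrial_daily(lat_deg, n)
```

For example, `clarity_index(25e6, 37.73, 172)` gives the clarity index of a day with 25 MJ/m² of global irradiation at the latitude of Jaén around the June solstice.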
Brinkworth [33] carried out another of the first works, applying an autoregressive moving average (ARMA) model directly to daily global irradiation data. Paasen [34] modeled daily irradiation sequences in the Netherlands using a modified irradiation variable. Exell [35] and Vergara-Domínguez et al. [36] used a new variable, called clear-sky irradiation, which is similar to the clarity index. However, none of these authors analyzed the distribution of the data by means of the distribution function. In this respect, Amato et al. [37] included the distribution function of the daily global irradiation series; nevertheless, their model was applicable only to the locality under study, that is, it was not universally applicable.
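A minimal ARMA(1,1) generator helps make the idea concrete. The sketch below simulates a standardized ARMA(1,1) series and then rescales it with month-dependent means and standard deviations to produce daily irradiation; this deseasonalization step, the parameter values, and the function names are illustrative assumptions, not the formulation of Brinkworth [33] or of any other cited author.

```python
import numpy as np


def simulate_arma11(phi, theta, n, sigma=1.0, burn_in=500, seed=None):
    """Simulate y_t = phi*y_{t-1} + e_t + theta*e_{t-1}, with e_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    e = rng.normal(0.0, sigma, n + burn_in)
    y = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        y[t] = phi * y[t - 1] + e[t] + theta * e[t - 1]
    return y[burn_in:]  # discard the start-up transient


def synthetic_daily_irradiation(monthly_mean, monthly_std, month_of_day,
                                phi, theta, seed=None):
    """Rescale a standardized ARMA(1,1) series with month-dependent mean and std.

    month_of_day: integer month index (0..11) for each day to generate.
    """
    month_of_day = np.asarray(month_of_day)
    z = simulate_arma11(phi, theta, len(month_of_day), seed=seed)
    z = (z - z.mean()) / z.std()  # re-standardize the simulated series
    mean = np.asarray(monthly_mean, dtype=float)[month_of_day]
    std = np.asarray(monthly_std, dtype=float)[month_of_day]
    return np.clip(mean + std * z, 0.0, None)  # irradiation cannot be negative
```

In a real application, `phi`, `theta`, and the monthly statistics would be fitted to the measured series rather than chosen by hand.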
Although the distribution function of global irradiation depends on the locality, Liu and Jordan [38] showed that the distribution functions of the daily clarity index are universal. In addition, these functions are non-Gaussian and depend on the monthly mean clarity index, and are therefore monthly variables. Dagelman [39] carried out a work that already included the universal distribution functions of Liu and Jordan, proposing a method for generating daily clarity indices randomly from the Liu and Jordan distribution curves.
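Generating daily clarity indices randomly from a distribution curve amounts to inverse-transform sampling: uniform random numbers are mapped through the inverse of the cumulative distribution. In the sketch below, the tabulated curve (`KT_POINTS`, `CDF_POINTS`) is hypothetical; actual values would be read from the Liu and Jordan curves for the relevant monthly mean clarity index.

```python
import numpy as np

# Hypothetical tabulated cumulative distribution of daily K_T for one monthly
# mean clarity index (illustrative numbers, not the Liu and Jordan curves).
KT_POINTS = np.array([0.05, 0.20, 0.35, 0.50, 0.60, 0.70, 0.78])
CDF_POINTS = np.array([0.00, 0.08, 0.20, 0.38, 0.55, 0.80, 1.00])


def sample_daily_kt(n_days, kt_points=KT_POINTS, cdf_points=CDF_POINTS, seed=None):
    """Draw daily clarity indices by inverse-transform sampling of a tabulated CDF."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, n_days)
    # Linear interpolation of the inverse CDF: map probabilities back to K_T values.
    return np.interp(u, cdf_points, kt_points)
```

Because each day is drawn independently, the resulting sequence reproduces the distribution but not the day-to-day persistence of real data.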
Also important are the works of Boileau [40], based on Fourier series expansions, and of Bartoli et al. [41], also focused on Fourier series and Markov processes. However, the most widespread methods are those proposed by Graham and Hollands [42], based on Gaussian inversion techniques, and by Aguiar and Collares-Pereira [43], which use Markov Transition Matrices. These last two works are currently considered the best in this field, and they are usually taken as a basis for rigorously generating artificial sequences of solar irradiation. Graham and Hollands conducted their study with Canadian localities from different climates, whereas Aguiar and Collares-Pereira used locations in several countries, from Portugal to Macao (China).
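The Markov Transition Matrix idea can be sketched as follows: the daily clarity index is divided into intervals (states), and row i of the matrix gives the probability of moving from state i on one day to each state on the next. The four states, the matrix values, and the uniform draw within each interval are hypothetical simplifications; in practice the matrices are estimated from measured data, and the published method of Aguiar and Collares-Pereira [43] is more elaborate.

```python
import numpy as np

# Hypothetical 4-state example: K_T interval edges and a transition matrix whose
# row i gives the probabilities of moving from state i today to each state tomorrow.
BIN_EDGES = np.array([0.0, 0.3, 0.5, 0.65, 0.8])
TRANSITIONS = np.array([
    [0.40, 0.30, 0.20, 0.10],
    [0.20, 0.35, 0.30, 0.15],
    [0.10, 0.25, 0.40, 0.25],
    [0.05, 0.15, 0.30, 0.50],
])


def generate_kt_markov(n_days, edges=BIN_EDGES, matrix=TRANSITIONS,
                       start_state=0, seed=None):
    """Generate a daily K_T sequence from a first-order Markov transition matrix."""
    rng = np.random.default_rng(seed)
    matrix = np.asarray(matrix, dtype=float)
    if not np.allclose(matrix.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    n_states = matrix.shape[0]
    state = start_state
    out = np.empty(n_days)
    for day in range(n_days):
        # Next state drawn from the row of the current state.
        state = rng.choice(n_states, p=matrix[state])
        # Within the chosen interval, K_T is drawn uniformly (a simplifying assumption).
        out[day] = rng.uniform(edges[state], edges[state + 1])
    return out
```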
Regarding hourly generation methods, one of the pioneering works based on ARMA processes was by Goh and Tan [44], using data from Singapore. Mustacchi et al. [45], studying about twenty Italian localities, used Markov Transition Matrices to simulate the stochastic processes implicit in real solar irradiation time series. Balouktsis and Tsalides [46] presented a method based on spectral techniques for data from Athens. The Spanish researchers Llanos Mora and Mariano Sidrach [47] presented a model based on multiplicative ARMA processes using data from Spanish localities, while Palomo [48], also for Spanish localities, used Markov transition matrices.
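The multiplicative idea can be illustrated with a seasonal autoregressive model in which a short-memory factor (lag of 1 h) multiplies a daily-cycle factor (lag of 24 h). This is a simplified sketch: the pure AR form, the 24-step period (which assumes one value per hour over the whole day, night included), and the coefficients are assumptions, not the model of Mora and Sidrach [47].

```python
import numpy as np


def simulate_seasonal_ar(phi, big_phi, n, period=24, sigma=1.0, seed=None):
    """Simulate (1 - phi*B)(1 - big_phi*B^period) y_t = e_t.

    Expanding the product of the two factors gives
    y_t = phi*y_{t-1} + big_phi*y_{t-period} - phi*big_phi*y_{t-period-1} + e_t.
    """
    rng = np.random.default_rng(seed)
    burn = 10 * period  # start-up samples discarded at the end
    total = n + burn
    e = rng.normal(0.0, sigma, total)
    y = np.zeros(total)
    for t in range(period + 1, total):
        y[t] = (phi * y[t - 1] + big_phi * y[t - period]
                - phi * big_phi * y[t - period - 1] + e[t])
    return y[burn:]
```

With this structure, the lag-1 autocorrelation is close to `phi` and the lag-24 autocorrelation close to `big_phi`, which is what makes the multiplicative form convenient for hourly data with a daily cycle.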
Once again, however, the methods regarded as the paradigm in this field are those proposed by Graham and Hollands [49] and by Aguiar and Collares-Pereira [50]. The method of Graham and Hollands uses ARMA processes and Gaussian inversion, and is practically a continuation of their work on generating daily series. The work of Aguiar and Collares-Pereira, in contrast, differs considerably from their daily-series method: instead of Markov matrices, they start from a very exhaustive study of the available data, discovering certain properties that they then build into their new method. This method, called the Time-dependent Autoregressive Gaussian model (TAG), produces results that agree very satisfactorily with real hourly solar irradiation values.
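A time-dependent Gaussian autoregressive generator in the spirit of TAG can be sketched as an hour-dependent mean clarity index plus an hour-dependent standard deviation multiplied by a unit-variance Gaussian AR(1) process. The hourly profiles, the coefficient `phi`, and the final clipping to [0, 1] are assumptions of this sketch; it does not reproduce the published TAG model in detail.

```python
import numpy as np


def generate_hourly_kt(mean_by_hour, std_by_hour, phi, n_days, seed=None):
    """Hourly clarity index as a time-dependent mean plus a scaled Gaussian AR(1) term.

    k(h) = mean_by_hour[h] + std_by_hour[h] * y(h),
    y(h) = phi * y(h-1) + sqrt(1 - phi^2) * e(h),  e(h) ~ N(0, 1).
    The sqrt(1 - phi^2) factor keeps y at unit variance, so std_by_hour sets the spread.
    """
    mean_by_hour = np.asarray(mean_by_hour, dtype=float)
    std_by_hour = np.asarray(std_by_hour, dtype=float)
    hours = len(mean_by_hour)  # e.g. daylight hours only
    rng = np.random.default_rng(seed)
    n = n_days * hours
    scale = np.sqrt(1.0 - phi ** 2)
    y = np.empty(n)
    y[0] = rng.standard_normal()  # start from the stationary distribution
    for t in range(1, n):
        y[t] = phi * y[t - 1] + scale * rng.standard_normal()
    k = np.tile(mean_by_hour, n_days) + np.tile(std_by_hour, n_days) * y
    return np.clip(k, 0.0, 1.0).reshape(n_days, hours)
```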
It is very important to be able to generate solar irradiation data at sub-hourly time scales because, for instance, in photovoltaic design for smart grids, the data provided by the grid are recorded at intervals of minutes or even less. Nevertheless, there are few works for time scales below one hour [30,31,32]. A bibliographic search was carried out, and the most relevant works are summarized below.
In 2009, Reikard [51] presented a work analyzing different methods for predicting solar irradiation at two time scales: (a) hours (intervals of 1 h, 2 h, 3 h, and 4 h) and (b) minutes (intervals of 5 min, 15 min, 30 min, and 60 min). Notably, this work already goes down to the minute scale. Of the six methods analyzed, it concludes that, for hourly forecasts, the best method is based on ARIMA (AutoRegressive Integrated Moving Average) models, which have been widely validated by many other researchers. At the minute scale, ARIMA is still almost the best, although it is slightly outperformed by a neural network methodology, especially at 5 min; at longer intervals, ARIMA remains dominant.
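For reference, the simplest ARIMA variant, ARIMA(1,1,0) without a constant, models the differences d_t = x_t - x_{t-1} as an AR(1) process. The sketch below estimates its coefficient by least squares and produces recursive forecasts; it is illustrative only and is not the model specification used by Reikard [51].

```python
import numpy as np


def fit_arima110(series):
    """Least-squares phi for ARIMA(1,1,0): d_t = phi * d_{t-1} + e_t, d_t = x_t - x_{t-1}."""
    d = np.diff(np.asarray(series, dtype=float))
    return np.dot(d[:-1], d[1:]) / np.dot(d[:-1], d[:-1])


def forecast_arima110(series, phi, steps=1):
    """Recursive multi-step forecast of the level from the last observation and last difference."""
    x = np.asarray(series, dtype=float)
    level, diff = x[-1], x[-1] - x[-2]
    out = []
    for _ in range(steps):
        diff = phi * diff      # forecast of the next difference
        level = level + diff   # integrate back to the level
        out.append(level)
    return np.array(out)
```

For example, `forecast_arima110([0.0, 1.0, 2.0], 0.5, steps=2)` returns `[2.5, 2.75]`: the last increment of 1 is halved at each step and added to the level.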
Additionally, Barbieri et al. [52] presented a work that, on the one hand, explains how the power output of a PV system can be predicted in the very short term from solar irradiation forecasting methods (and also for wind) and, on the other, provides an extensive review of solar irradiation forecasting methods. The authors conclude that the most reliable methods for predicting solar irradiation are those based on neural networks. Although all of these researchers stress how difficult it is to predict the short-term output of a PV system, given the prior difficulty of predicting solar irradiation, the proposed methodology could be worth reproducing in our characterization.
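Once an irradiance forecast is available, a very simple way to turn it into a PV power estimate is a linear model with a temperature correction. The sketch below uses the common NOCT approximation for cell temperature; the default NOCT of 45 °C, the temperature coefficient of -0.4 %/°C, and the function names are assumed, generic values, not the method of Barbieri et al. [52].

```python
def cell_temperature(ambient_c, irradiance, noct_c=45.0):
    """NOCT approximation of cell temperature (deg C); irradiance in W/m^2."""
    return ambient_c + (noct_c - 20.0) / 800.0 * irradiance


def pv_power(irradiance, cell_temp_c, p_stc_w, gamma_per_c=-0.004):
    """Linear PV model: P = P_STC * (G / 1000) * (1 + gamma * (T_cell - 25)).

    irradiance: forecast plane-of-array irradiance (W/m^2)
    cell_temp_c: cell temperature (deg C)
    p_stc_w: rated power at standard test conditions (W)
    gamma_per_c: power temperature coefficient (1/deg C), assumed value
    """
    return p_stc_w * (irradiance / 1000.0) * (1.0 + gamma_per_c * (cell_temp_c - 25.0))
```

For example, `pv_power(800, cell_temperature(20, 800), 5000)` estimates the output of a 5 kW array under a forecast of 800 W/m² at 20 °C ambient temperature.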
Rahmann et al. [53] propose a control strategy to reduce the long-term impacts on PV plants caused by fluctuations in their input (solar irradiation). They predict solar irradiation at the daily scale using neural networks, ultimately considering three typical days: sunny (clear), partly cloudy, and totally cloudy.
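Grouping days into the three typical-day classes can be illustrated with the daily clarity index. This sketch is not the neural-network method of Rahmann et al. [53]: it classifies days with hypothetical K_T thresholds (0.3 and 0.6) and averages the intraday irradiation profiles within each class to obtain one typical day per class.

```python
import numpy as np


def classify_day(kt_daily, cloudy_max=0.3, clear_min=0.6):
    """Assign a day to a typical-day class from its daily clarity index.

    The thresholds are illustrative assumptions, not values from Rahmann et al.
    """
    if kt_daily < cloudy_max:
        return "totally cloudy"
    if kt_daily >= clear_min:
        return "sunny"
    return "partly cloudy"


def typical_day_profiles(profiles, kt_daily, cloudy_max=0.3, clear_min=0.6):
    """Average the intraday profiles of all days in each class.

    profiles: 2-D array (days x time steps) of irradiation values.
    kt_daily: daily clarity index of each day.
    """
    profiles = np.asarray(profiles, dtype=float)
    labels = np.array([classify_day(k, cloudy_max, clear_min) for k in kt_daily])
    return {lab: profiles[labels == lab].mean(axis=0)
            for lab in sorted(set(labels.tolist()))}
```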
Finally, as mentioned above, the work of Mora et al. [47] is worth highlighting: they propose an ARMA method for generating solar irradiation data at the hourly scale. It is one of the methods that gives the best results, so its reproducibility at sub-hourly time scales can be analyzed.
The first conclusion that can be drawn from this literature review is that there are well-validated methods for generating solar irradiation at the hourly scale, mainly of two types: those based on classical methods, such as ARIMA and ARMA, and those based on newer methods, such as artificial neural networks. However, there are few methods for predicting solar irradiation at sub-hourly time scales. Studies proposing methods to generate solar irradiation series are currently increasing, but these methods are usually quite complex and very local, since the variability of solar irradiation at sub-hourly time scales is complex and difficult to characterize. This is a wide field of study that the authors aim to cover.
2.2. Database of Solar Irradiation in Jaén
The starting material is a solar irradiation database for Jaén (Spain, latitude 37.73° N, longitude 3.73° W), where our research laboratory is located. The data are solar irradiation values at a sub-hourly time scale (one value every 10 min): our research group recorded solar irradiation in the city of Jaén from 1996 to 2003 and from 2005 to 2011 at 10 min intervals.
Subsequently, these data were sorted and filtered to eliminate errors. In some cases, the data measured and provided by the University of Jaén during the study period (1996 to 2011, excluding 2004) were null; these values are referred to as erroneous. For this reason, they were filtered in order to obtain a more complete database.
To this end, a preliminary study was carried out to sort the data and detect errors.
Table 1 shows the errors detected and the final percentage of errors.
Finally, a database covering more than 15 years of sub-hourly data was obtained, with approximately 788,400 values.
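The size of the database can be checked with simple arithmetic: with one value every 10 min there are 144 records per day, and the two measurement periods (1996 to 2003 and 2005 to 2011) add up to 15 years, so 15 × 365 × 144 = 788,400, ignoring leap days. The sketch below encodes this count together with the percentage of null (“erroneous”) records; the function names are illustrative.

```python
import math

RECORDS_PER_DAY = 24 * 60 // 10  # one value every 10 min -> 144 per day


def expected_records(n_years, days_per_year=365):
    """Nominal number of 10-min records, ignoring leap days."""
    return n_years * days_per_year * RECORDS_PER_DAY


def error_percentage(values):
    """Percentage of records that are null (None or NaN), i.e. 'erroneous' in the text."""
    bad = sum(1 for v in values
              if v is None or (isinstance(v, float) and math.isnan(v)))
    return 100.0 * bad / len(values)
```

Here `expected_records(15)` returns 788400.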
This database was already characterized, and its Typical Meteorological Year (TMY) [54] was obtained. The next two sections explain the characterization carried out and the calculation of the TMY.
2.4. Calculation of the Typical Meteorological Year (TMY) for Jaén
Finally, the Typical Meteorological Year (TMY) was calculated to complete the characterization of solar irradiation in Jaén. The main purpose of the TMY is to allow simulations to be performed when no method for generating solar irradiation series is available, since a TMY can be used in the same way as any generated series. A TMY preserves the fundamental parameters and characteristics of the solar irradiation at a given place.
By definition, a TMY collects the hourly values of global horizontal irradiation and ambient temperature over a hypothetical year formed by a succession of twelve months taken from a set of real years. These twelve months are chosen so that the TMY reliably represents the meteorological characteristics of the place in question.
TMYs are available for very few locations; for most places, it is even difficult to obtain hourly values of horizontal irradiation and ambient temperature.
Different base periods can be used to construct the TMY, although it is convenient to use the month: for each generic month that makes up the TMY, all the data of a single real month of the locality are used. In this way, the TMY represents both the variation of monthly averages throughout the year and the distribution of daily and hourly values within each month.
If the irradiation data of a single year were chosen as the typical year, neither the distribution nor the sequences of irradiation over this period would be taken into account; conversely, if the data of actual days were chosen for each generic day of the typical year, the result would be a succession of days with an almost uniform clarity index.
Two different criteria were used to select the months that make up the TMY of Jaén:
Criterion I: monthly mean values of daily irradiation.
This criterion consists of finding, for each month, the year whose mean daily irradiation is as close as possible to the mean daily irradiation of that month over all years.
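Criterion I can be sketched as follows, assuming the daily irradiation values are grouped by (year, month). Here the long-term value for each month is taken as the mean of the monthly means over all years, which is one possible reading of the criterion; the function name is illustrative.

```python
import numpy as np


def select_months_criterion_i(daily_by_year_month):
    """For each calendar month, pick the year whose mean daily irradiation is
    closest to the long-term mean of that month.

    daily_by_year_month: dict {(year, month): 1-D array of daily irradiation}
    Returns {month: selected_year}.
    """
    monthly_means = {key: float(np.mean(v)) for key, v in daily_by_year_month.items()}
    selection = {}
    for month in sorted({m for (_, m) in monthly_means}):
        candidates = {y: mu for (y, m), mu in monthly_means.items() if m == month}
        # Long-term value: mean of the monthly means of all available years.
        long_term = np.mean(list(candidates.values()))
        selection[month] = min(candidates, key=lambda y: abs(candidates[y] - long_term))
    return selection
```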
Criterion II: monthly distribution of clarity index values.
For this criterion, a similar study is carried out, but with the monthly distribution of clarity index values. In this case, a goodness-of-fit test is needed to achieve the appropriate fit; the Kolmogorov–Smirnov test [55] was used. This test examines a random sample (with some unknown distribution) against a known distribution function.
The one-sample Kolmogorov–Smirnov test is a goodness-of-fit procedure that measures the degree of agreement between the distribution of a data set and a specified theoretical distribution. Its objective is to indicate whether the data come from a population with the specified theoretical distribution, that is, it tests whether the observations could reasonably come from that distribution.
The Kolmogorov–Smirnov test was used to quantify the degree of similarity between the distribution function of each month and the generic distribution function.
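A minimal sketch of Criterion II, assuming the daily clarity indices of one calendar month are grouped by year and that the generic distribution is the empirical distribution of all years pooled (an assumption of this sketch). The Kolmogorov-Smirnov distance D is computed directly from the two empirical CDFs, and the year with the smallest D is selected; the function names are illustrative.

```python
import numpy as np


def ecdf(sample, x):
    """Empirical CDF of `sample` evaluated at points x."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, x, side="right") / len(s)


def ks_statistic(month_sample, generic_sample):
    """Kolmogorov-Smirnov distance D = max |F_month(x) - F_generic(x)|.

    Both CDFs are right-continuous step functions, so the maximum is reached at
    one of the observed values; evaluating at the pooled points is sufficient.
    """
    points = np.concatenate([np.asarray(month_sample, dtype=float),
                             np.asarray(generic_sample, dtype=float)])
    return float(np.max(np.abs(ecdf(month_sample, points) - ecdf(generic_sample, points))))


def select_month_criterion_ii(kt_by_year):
    """Pick the year whose daily K_T distribution for this month is closest
    (smallest D) to the generic distribution built from all years.

    kt_by_year: dict {year: 1-D array of daily K_T values for one calendar month}
    """
    generic = np.concatenate([np.asarray(v, dtype=float) for v in kt_by_year.values()])
    return min(kt_by_year, key=lambda y: ks_statistic(kt_by_year[y], generic))
```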