1. Introduction
This paper derives an analytical expression for the distribution of a stability test statistic when the underlying random variable follows the general Edgeworth–Sargan (ES) type of probability distribution. The distribution possesses several interesting properties and is able to capture departures from Gaussianity of various kinds, notably asymmetry and thick tails, i.e., skewness and kurtosis. The paper shows that the final expression is a weighted sum of Chi-squared distributions with increasing degrees of freedom, whose first term is the standard Chi-squared that applies when the variables are Gaussian. Thus, the test departs from the standard stability test, and the resulting upper-tail probability intervals will be less restrictive than the usual ones implemented in stability testing. This exact analytic result is convenient, since probability intervals can then be calculated easily from Chi-squared distribution tables, which are readily available.
The test is applied to the analysis of solar irradiance data provided by the PVGIS European database [1]. First, a distribution of this type is fitted to the data. Second, the stability test is implemented to check for possible disruptions over the period considered, using weekly data spanning 2005–2016.
The distribution was formally introduced by Edgeworth more than a century ago [2], but it was Sargan who brought it into theoretical and applied econometrics and statistics [3,4]. A further significant contribution was made by Gallant [5,6], who suggested a transformation to prevent the density from taking negative values. Although theoretically better and sufficiently general, this transformation is challenging to implement and has not shown a clear advantage over the original, simpler version.
An early empirical application of this distribution is [7]. Since then, it has slowly made its way into applied finance, with significant contributions by [8,9,10]. In particular, [11] provides a generalisation to a multivariate setting, accounting for cross moments beyond covariances, such as co-skewness and co-kurtosis [12]. The distribution has been compared to alternatives [13,14], notably the Student’s t [15] and its asymmetric generalisation [16]. In general, although from a theoretical point of view these alternatives can also capture some non-normal features, in practice they have failed to show their superiority, and their high nonlinearity makes it far more challenging to obtain generalisations and derive convenient results. Other popular alternatives include the multivariate skew-normal [17], although this distribution cannot account for any non-standard feature beyond asymmetry. However, beyond the field of applied finance, the ES distribution is largely unknown. A secondary purpose of this research is, therefore, to show its applicability in the burgeoning field of the statistical characterisation of solar irradiance.
The following section presents the formal derivation in a simplified case. It draws on a significant number of supporting and related results, which are discussed in several appendices to clarify the derivation. Section 3 presents the results of estimating the ES distribution for the data analysed and implements the test for several configurations. A discussion of possible alternative distributions and their empirical feasibility is conducted in Section 4, and the final section summarises the main contributions of the paper, suggesting immediate avenues for future research. Among the appendices, a generalisation of the primary result of Section 2 is presented in Appendix C.1, and Appendix C.2 summarises, for easy reference, a list of results required in several steps of the derivation. Finally, Appendix D reports additional empirical results and provides a detailed description of, and references for, the data analysed.
2. A Simplified Case
This section derives the proposed test in a simplified case that nevertheless involves the main steps and helps clarify the derivation. The general case is dealt with in Appendix C. Let us start by considering the distribution of a random variable,
, with probability density function (p.d.f.) given by:
where,
is the p.d.f. of a
, the
are some constants, and the
are Hermite polynomials of orders 2 and 3, respectively; see Appendix B.3. This can be conveniently rewritten as follows:
These transformed coefficients must fulfil the following properties: (1)
, so that the probability integral is one, and (2)
, for the mean to be zero. It is straightforward to check that the p.d.f. in (1) meets both conditions; see Appendix B.3. Note, also, that
, in general, although transforming to unit variance is immediate. It will be assumed from now on that this correction has been implemented, and the new variable is denoted by
. It may be helpful before proceeding to gather the assumptions implied in what follows: (a)
so that the probability integral is one; (b)
so that the variance is one; (c)
so that the mean is zero; (d)
is the p.d.f. of a standard
; (e) the
are i.i.d., i.e., independently and identically distributed. These assumptions are implied in the specification of (1), (2).
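Because the display equations (1) and (2) are not reproduced in this text, the following Python sketch assumes one common ES parameterisation: f(x) = φ(x)[1 + a3·He3(x) + a4·He4(x)], with the probabilists' Hermite polynomials He3(x) = x³ − 3x and He4(x) = x⁴ − 6x² + 3 (orders 3 and 4, as in the empirical section). The names `es_pdf`, `es_moment`, `a3` and `a4`, and the coefficient values, are hypothetical. The sketch checks assumptions (a)–(c) numerically by trapezoidal integration. Under this particular form, the variance is one by construction, and the skewness and excess kurtosis equal 6·a3 and 24·a4. This may differ from the exact form used in (1).

```python
import math

INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)


def es_pdf(x: float, a3: float, a4: float) -> float:
    """Assumed ES density: phi(x) * (1 + a3*He3(x) + a4*He4(x))."""
    phi = INV_SQRT_2PI * math.exp(-0.5 * x * x)
    he3 = x**3 - 3.0 * x
    he4 = x**4 - 6.0 * x**2 + 3.0
    return phi * (1.0 + a3 * he3 + a4 * he4)


def es_moment(k: int, a3: float, a4: float,
              lo: float = -12.0, hi: float = 12.0, n: int = 24000) -> float:
    """Raw moment E[X^k] by the trapezoidal rule on a truncated grid."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * x**k * es_pdf(x, a3, a4)
    return total * h


a3, a4 = 0.02, 0.05  # hypothetical coefficients, not the paper's estimates
mass = es_moment(0, a3, a4)              # assumption (a): integrates to one
mean = es_moment(1, a3, a4)              # assumption (c): mean is zero
var = es_moment(2, a3, a4) - mean**2     # assumption (b): variance is one
# With a zero mean, raw third and fourth moments coincide with central ones.
skew = es_moment(3, a3, a4) / var**1.5
exkurt = es_moment(4, a3, a4) / var**2 - 3.0
print(f"mass={mass:.6f} mean={mean:.6f} var={var:.6f}")
print(f"skewness={skew:.4f} vs 6*a3={6 * a3:.4f}")
print(f"excess kurtosis={exkurt:.4f} vs 24*a4={24 * a4:.4f}")
```

Replacing `es_pdf` allows the same checks to be run on any other candidate parameterisation.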
Consider now the joint distribution of the vector of independent variates
given by:
It is now convenient to implement the polar coordinate transform, i.e.,
and
The joint p.d.f. of the transformed variables is then given by:
where
is the absolute value of the relevant Jacobian; see Appendix A.2 for a complete derivation. Note that
in this context is just a shorthand for
so that it is not an independent variate. An explicit form for this p.d.f. is given by:
where
is the p.d.f. of a Chi-squared distribution with two degrees of freedom and
is the marginal density of
; see Appendix C for a detailed derivation of the general case. Suppose now that this p.d.f., i.e., the parameters
, have been estimated over a given sample
. For forecasting purposes, a stability test is in order. Consider, then, the simple case where two additional observations are available to conduct the test,
. Note that when
in (3),
is distributed as a standard normal, i.e.,
and therefore the standard forecast stability test statistic follows a Chi-squared distribution with two degrees of freedom, i.e.,
. By analogy with the standard normal case, a convenient statistic for the test would also be
. The p.d.f. of
can now be obtained immediately as the marginal of the joint distribution of
, i.e.:
where use is made of (A14)–(A16), whereby terms involving odd powers of
are zero, and cross-product moments are equal to the product of the individual moments. Solving now for the moments of
as given in (A16) yields:
Finally, applying the result in (A19) yields the explicit p.d.f. sought:
where
, because
, and
.
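The angular-averaging step can be illustrated numerically. This is a minimal sketch that assumes, for illustration only, the density φ(x)[1 + a3·He3(x) + a4·He4(x)] with probabilists' Hermite polynomials; this is not necessarily the paper's (1), and the function names are hypothetical. The sketch takes four steps:

1. It expands g(x1)·g(x2) in powers of x1 and x2.
2. It substitutes x1 = √s·cos t and x2 = √s·sin t.
3. It averages over a uniform angle t using the exact joint moments E[cosⁱ t sinʲ t].
4. It uses the identity sᵏ·χ²₂(s) = 2ᵏ·k!·χ²₂₊₂ₖ(s) to turn each power of s into a Chi-squared density with 2 + 2k degrees of freedom.

Any term with an odd power of cos t or sin t averages to zero. In this sketch, therefore, the odd-order coefficient a3 drops out of x1² + x2² entirely, and the resulting weights sum to one.

```python
import math


def g_coeffs(a3: float, a4: float) -> list[float]:
    """Power-series coefficients (index = power of x) of
    1 + a3*He3(x) + a4*He4(x), with He3 = x^3 - 3x, He4 = x^4 - 6x^2 + 3."""
    return [1.0 + 3.0 * a4, -3.0 * a3, -6.0 * a4, a3, a4]


def angle_moment(i: int, j: int) -> float:
    """E[cos^i(t) sin^j(t)] for t uniform on [0, 2*pi)."""
    if i % 2 or j % 2:
        return 0.0  # any odd power averages to zero over a full circle
    a, b = i // 2, j // 2
    f = math.factorial
    return f(2 * a) * f(2 * b) / (4 ** (a + b) * f(a) * f(b) * f(a + b))


def mixture_weights(a3: float, a4: float) -> list[float]:
    """w_k such that p(s) = sum_k w_k * chi2_{2+2k}(s), s = x1^2 + x2^2."""
    g = g_coeffs(a3, a4)
    c = [0.0] * len(g)  # p(s) = chi2_2(s) * sum_k c[k] * s^k
    for i, gi in enumerate(g):
        for j, gj in enumerate(g):
            if (i + j) % 2 == 0:
                # x1^i * x2^j = s^((i+j)/2) * cos^i(t) * sin^j(t)
                c[(i + j) // 2] += gi * gj * angle_moment(i, j)
    # s^k * chi2_2(s) = 2^k * k! * chi2_{2+2k}(s)
    return [ck * 2**k * math.factorial(k) for k, ck in enumerate(c)]


w = mixture_weights(0.02, 0.05)  # hypothetical coefficients
print("weights for chi2 with 2, 4, 6, 8, 10 d.o.f.:", [round(v, 6) for v in w])
print("sum of weights:", round(sum(w), 12))
```

Some of the weights can be negative; the mixture remains a proper density as long as the underlying ES density is non-negative.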
Using the operator defined in Appendix B.4, this last expression (9) can be written more compactly as:
With:
which can also be written as
, with
. This is a slightly more general but entirely equivalent notation, since the odd terms in (10) vanish, given that all odd moments of
are zero. It is also of interest to note that the distribution can be written as:
i.e., a weighted sum of Chi-squared distributions with two or more degrees of freedom, where:
as required for (12) to integrate to one and therefore be a proper p.d.f. The cumulative distribution function required to establish confidence intervals is then immediately given by:
A generalisation of this result to more complex and realistic cases, along with some computational considerations and additional results, is left to Appendix C.
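Upper-tail critical values follow directly from a c.d.f. of this form. This sketch uses closed-form mixture weights obtained by expanding the assumed density φ(x)[1 + a3·He3(x) + a4·He4(x)] and averaging over the polar angle. The weights are specific to that assumption, and a3 drops out. Because every component has an even number of degrees of freedom, each Chi-squared c.d.f. has the closed form F₂ₘ(s) = 1 − e^(−s/2)·Σⱼ₌₀^(m−1) (s/2)ʲ/j!, so no statistics library is needed. The critical value is found by bisection. The value a4 = 0.05 and all function names are hypothetical; the paper's Table 1 figures are not reproduced here.

```python
import math


def weights_from_a4(a4: float) -> list[float]:
    """Mixture weights for chi2 with 2, 4, 6, 8, 10 d.o.f. under the assumed
    density phi*(1 + a3*He3 + a4*He4); a3 does not enter x1^2 + x2^2."""
    return [(1 + 3 * a4) ** 2, -12 * a4 * (1 + 3 * a4),
            6 * a4 * (1 + 9 * a4), -36 * a4**2, 9 * a4**2]


def chi2_cdf_even(s: float, dof: int) -> float:
    """Closed-form Chi-squared c.d.f. for an even number of degrees of freedom."""
    half = s / 2.0
    tail = sum(half**j / math.factorial(j) for j in range(dof // 2))
    return 1.0 - math.exp(-half) * tail


def mixture_cdf(s: float, weights: list[float]) -> float:
    """c.d.f. of sum_k w_k * chi2_{2+2k}."""
    return sum(w * chi2_cdf_even(s, 2 + 2 * k) for k, w in enumerate(weights))


def critical_value(alpha: float, weights: list[float], tol: float = 1e-10) -> float:
    """Approximate (1 - alpha) quantile of the mixture, found by bisection."""
    lo, hi = 0.0, 10.0
    while mixture_cdf(hi, weights) < 1.0 - alpha:
        hi *= 2.0  # widen the bracket until it contains the quantile
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


for alpha in (0.10, 0.05, 0.01):
    gauss = critical_value(alpha, [1.0])            # standard chi2(2) case
    es = critical_value(alpha, weights_from_a4(0.05))
    print(f"alpha={alpha:.2f}: chi2(2)={gauss:.4f}  ES={es:.4f}")
```

With weights `[1.0]` the sketch reproduces the Gaussian benchmark −2·ln(α). A positive a4 (excess kurtosis) pushes the ES critical values above it, in line with the pattern described for Table 1.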
3. Empirical Results
The ES p.d.f. and related distributions addressing an assorted array of issues have been applied almost exclusively in financial analysis; see the introduction for a brief survey. However, they can also be applied in many other settings, including the study of meteorological data and, in particular, solar radiation data. This is a promising field of research, in view of the urgency of tackling the climate change threat by deploying a whole array of renewable energy technologies, particularly solar photovoltaic (PV), given its impressive and sustained cost decreases since its commercial introduction at the beginning of the 1980s. It is convenient to clarify at the outset that stability, or its lack thereof, is a property of a model, and appropriate statistical tests are required to check it empirically. In the present case, the focus is on the p.d.f. of the errors of a series once the annual cycle has been removed. The test proposed here is completely general and can therefore be applied to the residuals of any given model, e.g., a standard linear dynamic model, possibly estimated by ordinary least squares; in every case, what is being tested is the stability of the underlying model.
The primary data set analysed is the PVGIS-SARAH radiation database provided by the EU [1]. For technical details and other discussions related to its applicability, see [18,19]. The starting point was hourly data on PV power generation in central Spain for the period 2005–2016 (both end-years inclusive). Weekly observations were calculated from the hourly data, yielding a series of 624 data points (12 years of 52 weeks); see Appendix D for details of both steps.
The main series considered, the weekly generation of PV power,
, exhibits a substantial cycle over the year. The first step is to remove it and obtain a ‘de-cycled’ series, as explained next. The weekly average over the years is denoted by:
where the subindices i and t refer, respectively, to a given year,
, and the week within that year,
One straightforward and accurate way to define the cycle is given by:
i.e., a weighted moving average with appropriately selected weights; see, e.g., [20] for a related discussion of alternative weighting patterns and their optimality. Note that this weighted sum can be understood in the framework of a ‘circular’ time series, and hence there are no gaps at either extreme, i.e.,
,
, and similarly for other periods. The ‘de-cycled’ observation,
, is now straightforwardly given as:
The calculated cycle and its fitting accuracy are analysed graphically in Figure 1, where kWh denotes kilowatt-hour, i.e., the energy delivered by a power of one thousand watts over one hour (electric energy in this case). The specific details and references for the series analysed are given in Appendix D; the relevant literature references are [1,18,19].
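The de-cycling steps can be sketched as follows. This is an illustration under assumptions. The weights [0.25, 0.5, 0.25] are hypothetical, since the paper's weights are not reproduced here. The de-cycled value is taken as the difference between the observation and the cycle; the display equation is not reproduced, and a ratio would be an equally plausible choice. The data are a synthetic stand-in for the PVGIS series, and the function names are hypothetical. The modular indexing implements the ‘circular’ treatment, so the first and last weeks of the year borrow from each other.

```python
import math
import random


def weekly_means(x: list[list[float]]) -> list[float]:
    """x[i][t]: observation for year i and week t; mean over years per week."""
    n_years, n_weeks = len(x), len(x[0])
    return [sum(x[i][t] for i in range(n_years)) / n_years for t in range(n_weeks)]


def circular_cycle(means: list[float], weights: list[float]) -> list[float]:
    """Centred weighted moving average with wrap-around (week 0 follows the last week)."""
    n, m = len(means), len(weights) // 2
    return [sum(w * means[(t + k - m) % n] for k, w in enumerate(weights))
            for t in range(n)]


def decycle(x: list[list[float]], weights: list[float]) -> list[list[float]]:
    """Observation minus the estimated cycle (assumed additive de-cycling)."""
    cycle = circular_cycle(weekly_means(x), weights)
    return [[row[t] - cycle[t] for t in range(len(cycle))] for row in x]


random.seed(0)
years, weeks = 12, 52  # 12 x 52 = 624 weekly observations, as in the text
# Synthetic stand-in data (NOT PVGIS): an annual cycle plus Gaussian noise.
data = [[100 + 40 * math.sin(2 * math.pi * t / weeks) + random.gauss(0, 5)
         for t in range(weeks)] for _ in range(years)]
resid = decycle(data, [0.25, 0.5, 0.25])
flat = [v for row in resid for v in row]
print(len(flat), round(sum(flat) / len(flat), 9))
```

Because the weights sum to one and the average wraps around, the cycle has the same total as the weekly means, so the de-cycled residuals average exactly to zero (up to rounding).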
Possible remaining structure in the series has been examined by means of regression analysis, and no significant dynamic relationship has been detected. The residuals do not show clear signs of heteroskedasticity of any kind, although the normality hypothesis is strongly rejected: a relevant test yields a value of
, which suggests that a more general p.d.f. is warranted. Therefore, an ES p.d.f. of the type considered in this research has been estimated, yielding the following results for the coefficients associated with the Hermite polynomials of orders 3 and 4, respectively:
where the t-ratios are the figures in brackets, and no other polynomial is statistically significant. For these estimates, specific values for the confidence intervals of the test presented in (14) in Section 2 can be derived; accordingly, stability tests have been calculated for the following periods,
, and for the standard confidence levels
. The values, jointly with the corresponding values of the relevant Chi-squared distribution, are reported in Table 1.
It is immediately apparent that the interval limits for the ES case are higher in all cases, as the expression for the ES c.d.f. in (12)–(14) suggests. In the present case, however, the differences are not large, because the original variable does not depart strongly from normality, as shown by the estimated coefficients for
that imply skewness and excess kurtosis values of
, respectively: statistically significant, but not large in absolute value. Note that skewness and excess kurtosis, relative to the standard Gaussian, are respectively measured by
. They are also the two most immediate departures from Gaussianity and what any p.d.f. should aim to capture adequately in empirical distributions. Note, also, that the Bera–Jarque test against non-normality is based precisely on the joint departure from zero of these two values; recent empirical applications of the Bera–Jarque test can be found in, e.g., [21,22].
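For reference, the Bera–Jarque statistic combines the sample skewness S and excess kurtosis K as JB = (n/6)·(S² + K²/4). Under normality it is asymptotically Chi-squared with two degrees of freedom, so its 5% critical value is −2·ln(0.05) ≈ 5.991. The sketch below uses the standard moment-based formula with population (1/n) moments; the function name is hypothetical. The connection to the ES coefficients is an assumption: if the density is parameterised as φ(x)[1 + a3·He3(x) + a4·He4(x)] with probabilists' Hermite polynomials, then S = 6·a3 and K = 24·a4.

```python
import math


def jarque_bera(x: list[float]) -> tuple[float, float, float]:
    """Sample skewness, excess kurtosis and the Bera-Jarque statistic
    JB = n/6 * (S^2 + K^2/4), asymptotically chi2(2) under normality."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    skew = m3 / m2**1.5
    exkurt = m4 / m2**2 - 3.0
    jb = n / 6.0 * (skew**2 + exkurt**2 / 4.0)
    return skew, exkurt, jb


crit_5pct = -2.0 * math.log(0.05)  # chi2(2) upper 5% point
s, k, jb = jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"skewness={s:.4f} excess kurtosis={k:.4f} JB={jb:.4f} crit={crit_5pct:.4f}")
```

For a toy symmetric sample like this, the skewness is zero and only the (negative) excess kurtosis contributes to JB.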
For the estimated distribution and the values in Table 1, the results of the five-week stability test over the whole sample are displayed in Figure 2; note that the 624 weeks yield 124 test periods (5 × 124 = 620), plus a remainder of 4 weeks. In this case, the general conclusion would be that the distribution is acceptably stable over the whole period analysed, save a few localised exceptions. Nevertheless, this does not imply that data at other frequencies, e.g., daily or even hourly, exhibit the same stability over time, at the same or different historical dates. The results for the remaining stability periods considered, 2, 10, and 20 weeks ahead, are reported in Appendix D.
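The block-wise testing scheme can be sketched as follows. By analogy with the two-observation statistic in Section 2, the sketch assumes that each h-week test uses the sum of squared standardised residuals over consecutive, non-overlapping blocks, and that an incomplete final block is dropped. The only critical value hard-coded here is the Gaussian benchmark for five weeks, the upper 5% point of χ²(5) ≈ 11.0705; the ES counterparts would come from Table 1, which is not reproduced here. The residuals are synthetic, and the function names are hypothetical.

```python
import random

CHI2_5_95 = 11.0705  # upper 5% point of chi2(5): Gaussian benchmark only


def block_statistics(z: list[float], h: int) -> list[float]:
    """Sum of squared standardised residuals over consecutive, non-overlapping
    blocks of h observations; an incomplete final block is dropped."""
    n_blocks = len(z) // h
    return [sum(v * v for v in z[b * h:(b + 1) * h]) for b in range(n_blocks)]


def flag_unstable(stats: list[float], critical: float) -> list[int]:
    """Indices of blocks whose statistic exceeds the critical value."""
    return [b for b, s in enumerate(stats) if s > critical]


random.seed(42)
z = [random.gauss(0.0, 1.0) for _ in range(624)]  # synthetic residuals
stats = block_statistics(z, 5)
print(len(stats), "blocks; remainder", len(z) - 5 * len(stats), "weeks")
print("blocks above the Gaussian 5% benchmark:", flag_unstable(stats, CHI2_5_95))
```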
In this context, it is also worth noting that equivalent results could be produced from computer-generated pseudo-random numbers for the ES distribution. A random number from the ES p.d.f., as from any other p.d.f., can be derived from the following expression:
where
,
are respectively the p.d.f. and c.d.f. of a Gaussian (0,1) distribution,
is the c.d.f. of a uniform distribution over the (0,1) interval, and ‘
y’ is a pseudo-random number generated with a suitable algorithm, e.g., [23]. Solving this expression for ‘
x’ yields a random number that follows precisely that distribution, i.e., the ES with Hermite polynomials
and their associated coefficients
; see, e.g., [24]. Solving this highly non-linear equation for a large number of random values is computationally demanding, since it involves the inverse of
. Computational approximations are available [25], and there are also workarounds, e.g., computing and storing values of the ES c.d.f. in a first step. Nevertheless, although it is generally much easier, and exact, to derive the values of the test from (14) in Section 2, the random numbers derived using (20) may be helpful in specific cases, and may even provide an independent check of the analytical results.
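The inverse-transform idea can be sketched for one specific case. The sketch assumes the density φ(x)[1 + a3·He3(x) + a4·He4(x)] with probabilists' Hermite polynomials. For this assumed form, the c.d.f. has the closed form F(x) = Φ(x) − φ(x)[a3·He2(x) + a4·He3(x)], because the derivative of −φ(x)·He₍ₙ₋₁₎(x) is φ(x)·Heₙ(x). Expression (20) itself is not reproduced here. The sketch solves F(x) = y by bisection, which is valid while the density stays positive, and draws y from Python's `random` module, which uses the Mersenne Twister generator. The coefficients and function names are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2.0)
INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)


def es_cdf(x: float, a3: float, a4: float) -> float:
    """Assumed ES c.d.f.: Phi(x) - phi(x)*(a3*He2(x) + a4*He3(x)), the
    antiderivative of phi(x)*(1 + a3*He3(x) + a4*He4(x))."""
    phi = INV_SQRT_2PI * math.exp(-0.5 * x * x)
    big_phi = 0.5 * (1.0 + math.erf(x / SQRT2))
    return big_phi - phi * (a3 * (x * x - 1.0) + a4 * (x**3 - 3.0 * x))


def es_inverse(y: float, a3: float, a4: float, lo: float = -12.0,
               hi: float = 12.0, iters: int = 60) -> float:
    """Solve es_cdf(x) = y by bisection (requires a monotone c.d.f.)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if es_cdf(mid, a3, a4) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def es_sample(n: int, a3: float, a4: float, seed: int = 1) -> list[float]:
    """Inverse-transform draws: uniform y in [0, 1) mapped through F^{-1}."""
    rng = random.Random(seed)  # Mersenne Twister pseudo-random generator
    return [es_inverse(rng.random(), a3, a4) for _ in range(n)]


draws = es_sample(10000, 0.02, 0.05)  # hypothetical coefficients
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"sample mean={mean:.4f} sample variance={var:.4f}")
```

Each draw costs one full root search, which illustrates the computational burden noted above. Tabulating `es_cdf` once on a grid and interpolating is one way to implement the storage-based workaround.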