1. Introduction
The skew-normal distribution was introduced in [
1] and the skew-Student in [
2]. These two distributions share the property that they may be derived formally. There are several methods of derivation of which probably the best known is to consider the bivariate normal distribution of
X and
Y each with zero mean, unit variance, and correlation
. The skew-normal distribution then arises by considering the distribution of
X conditional on
[or
]. A second method of construction is to consider a random variable
where
U has a standard normal distribution truncated from below at zero, written
, where
denotes a normally distributed variable with mean
and standard deviation
truncated from below at
x and
. There are similar and equally well-known constructions for the skew-Student and for the multivariate versions of these distributions. That the conditioning variable
Y is required to be less than (greater than) zero and that
U follows a standard normal distribution truncated from below at zero are, however, limitations. This is for four principal reasons. First, using negative (positive) values of
Y to determine whether or not
X is observed is self-evidently a limitation. Depending on the application, the appropriate threshold or truncation point for
Y might take any nonzero value, as might the value of the mean of its underlying normal distribution. For example, the recent paper [
3] refers to early work by [
4]. The latter was concerned with the scores from admission examinations: in such a case, the mean of
Y would surely be greater than zero, as would the truncation point. Similarly, there is often no reason a priori for the underlying mean of
U to be zero.
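The equivalence of the two constructions described above can be illustrated by simulation. The following is a minimal sketch only: the function names, the value 0.8 for the correlation, and the particular scaling of the convolution (the correlation times a half-normal variable plus an independent normal term) are assumptions made for illustration and may differ from the parametrization used in this paper. Under both constructions the sample mean should be close to the correlation multiplied by the square root of 2/π.

```python
import math
import random


def skew_normal_mean(delta):
    """Theoretical mean of X given Y > 0 for a standard bivariate normal
    pair with correlation delta."""
    return delta * math.sqrt(2.0 / math.pi)


def simulate_two_constructions(delta=0.8, n=100_000, seed=0):
    """Return the sample means of X under (a) conditioning on Y > 0 and
    (b) the convolution with a normal variable truncated below at zero."""
    rng = random.Random(seed)
    root = math.sqrt(1.0 - delta * delta)
    cond_sum, cond_n, conv_sum = 0.0, 0, 0.0
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        x = delta * y + root * z      # corr(X, Y) = delta by construction
        if y > 0.0:                   # (a) X is retained only when Y > 0
            cond_sum += x
            cond_n += 1
        # (b) |y| is a standard normal variable truncated from below at zero
        conv_sum += delta * abs(y) + root * z
    return cond_sum / cond_n, conv_sum / n
```

Calling `simulate_two_constructions()` returns two sample means that can be compared with `skew_normal_mean(0.8)`; agreement up to sampling error is consistent with the two derivations giving the same distribution.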
Second, empirical evidence reported in the financial economics literature suggests that in the absence of truncation from below at zero, the distribution of the unobserved variable
U, denoted
in this paper, exhibits nonzero values of
(see, for example, [
5] or [
6]). In the first method of derivation above, the corresponding conditioning event would be that
(
). Such distributions are referred to in the literature (see [
7,
8]), as extended skew-normal or extended skew-Student. The importance of nonzero values of
also arises in stochastic frontier analysis, commonly referred to as SFA. SFA models are used to measure the efficiency of manufacturing companies and organizations such as banks. There is a detailed review of SFA models and methods in [
9]. In its basic form, SFA employs linear regression models in which the unobserved residual has two components, commonly written as
. The first term,
, is a standard
variate. The second term,
, is a non-negative variate assumed to have an
distribution, which is truncated from below at zero; that is, a half-normal distribution. The expected value of
, which is nonzero, measures inefficiency. With these assumptions, the residual
has a skew-normal distribution. A somewhat different model was introduced by [
10] in which the half-normal variable is replaced by one that has an exponential distribution. The paper by [
11] shows that in the limit as
, and with suitable choice of other model parameters, the extended skew-normal distribution encompasses both the half-normal and exponential distributions for the inefficiency term
. Use of the extended version of the distribution offers greater flexibility in modeling inefficiency: the distribution of the inefficiency variable may exhibit a nonzero mode or may decay steeply.
Nonzero and negative values also arise in the study of stock market crashes. The standard model in financial economics is that returns on risky financial assets follow a multivariate normal distribution. Under this assumption, formally, the basic model of portfolio theory is to consider the conditional distribution of asset returns given a specified return on a market index. The resulting conditional distribution is multivariate normal, leading in essence to regression models for the return on individual assets. In the same manner, a market crash may be studied by considering the distribution of asset returns given that the return on the market index is less than a specified negative value. The resulting distribution is multivariate extended skew-normal. For market crashes, the value of the parameter denoted is both negative and of substantial magnitude. Analogous results arise if it is assumed that returns follow a multivariate Student distribution. In both the normal and Student cases, the distributions that arise as are of interest, one reason being that the limiting properties are different.
Third, use of extended versions of the skew-normal or skew-Student gives greater variability in the moments and critical values of the distributions. For empirical applications, this offers the possibility of better model fit. For some applications, the implied flexibility in the formal foundations may offer insights into the underlying data generation process. Last, in the multivariate case, conditional distributions are in general of the extended type. Thus, for applications where conditional distributions play a role, extended versions are important if not unavoidable. The formal derivation of a skew-normal regression model as in [
5] offers an example of this.
Extended versions of the skew-normal and skew-Student distributions have explicit advantages for some purposes. They offer the potential for greater flexibility in empirical work and, in addition, methodological advantages in some cases. The main aim of this paper is to present properties of the multivariate extended skew-normal (MESN) and multivariate extended skew-Student (MEST) distributions. The results demonstrate the differences from the standard versions. The paper also studies limiting cases of the distributions as the magnitude of the extension parameter
increases without limit, extending a result reported in [
11]. As the paper shows, these limiting cases are of interest from a theoretical point of view and offer insights for some applications.
The methodological results are illustrated by two applications. First, there is a study of the effect of a stock market crash. The results are different depending on whether the underlying distributions are multivariate normal or Student. The study presented here is theoretical, but its results can inform the development of econometric models of stock returns. Second, some researchers in this area of statistics have suggested informally that the skew-Student could be used as an alternative to the extended skew-normal. For a specified univariate application, it would be straightforward to estimate the parameters of both distributions and then make an informed choice using a test of fit or, for example, consideration of the tails of the distribution. Such an alternative may be attractive, but the suggestion could equally well be made in reverse: the extended skew-normal could be an alternative to the skew-Student. A general investigation of the similarity of the two distributions, particularly for multivariate cases, would be a major task and beyond the scope of this paper. To inform further research into this issue, this paper contains a short study designed to investigate this conjecture.
The structure of this paper is as follows. In
Section 2 and
Section 3, results for the MESN and MEST distributions, respectively, are presented. The results in these two sections are based on the extended versions of the second method of construction referred to above.
Section 4 is concerned with the first method of construction, sometimes referred to in the literature as a hidden truncation model. This section contains the illustrative example of the effect of a stock market crash. The example shows that different behavior arises depending on the choice of model.
Section 5 describes a brief investigation into the use of the skew-Student as an alternative to the extended skew-normal.
Section 6 offers some concluding remarks. The abbreviations (E)SN and (E)ST are used for the univariate (extended) skew-normal and (extended) skew-Student distributions, respectively, with MSN and MST for the multivariate versions. Examples and graphs are based on univariate distributions, with most numerical results rounded to four decimal places. Notation not defined explicitly in the text is that in common use.
2. Multivariate Extended Skew-Normal Distribution
The multivariate skew-normal distribution was introduced by [
12]. The multivariate extended skew-normal distribution, MESN, with an additional parameter, was first described in [
13] and, independently, in [
8,
14]. Following the notation in the third of these papers, the distribution of an
n-vector
that follows this distribution is denoted
. The authors of reference [
13] derive the MESN distribution as a hidden truncation model. The authors of reference [
8] present a direct derivation and link it to results in [
7], who show that conditional distributions are in general of the extended type. The authors of reference [
14] derive it as the convolution
, where the random vector
has the multivariate normal distribution
and the scalar random variable
V is independently normally distributed as
truncated from below at 0, denoted
. The basic properties of the MESN distribution are described in this section using the notation in [
14]. The probability density function of the distribution of
is
where
is the probability density function of an
n-vector
, which has a multivariate normal distribution with mean vector
and covariance matrix
evaluated at
.
is the standard normal distribution function evaluated at
z, with
denoting the corresponding density function. The distribution is denoted
. The moment generating function of
is
The mean vector and covariance matrix of the
distribution are, respectively
where the function
is defined as
Note that the covariance matrix may also be written
a form that is referred to in
Section 3.3. Coskewness and cokurtosis, the latter defined here as the fourth cumulant, are given by
respectively.
For the skew-normal distribution itself, the mean of the underlying truncated normal variable denoted
V equals
. Rounded to four decimal places, this value is shown in panel 2, column 1 of
Table 1 in the row named “
mean”. When
the minimum and maximum values of the mean are 0.5251 and 1.2876, respectively, as shown in panels 1 and 3. The corresponding results for the higher moments are shown in the other rows of column 1 of the table. Columns 2 and 3 show the analogous results when
and ≤30, respectively. Thus, as well as arising automatically under conditioning, the extended version of the skew-normal provides for more flexibility in the moments of the distribution.
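The means quoted above can be reproduced with a few lines of code. This sketch assumes that V is a normal variable with mean τ and unit standard deviation truncated from below at zero, so that its mean is τ plus the ratio of the standard normal density to the standard normal distribution function, both evaluated at τ. The values τ = −1, 0 and 1 are assumptions chosen because they reproduce the figures 0.5251, 0.7979 and 1.2876; the function names are illustrative.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_normal_mean(tau):
    """Mean of a N(tau, 1) variable truncated from below at zero."""
    return tau + norm_pdf(tau) / norm_cdf(tau)


for tau in (-1.0, 0.0, 1.0):
    print(tau, round(truncated_normal_mean(tau), 4))
```

The case τ = 0 corresponds to the skew-normal distribution itself, for which the mean of V is the square root of 2/π.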
In their Lemma 2, reference [
11] report an apparently known result concerning the limiting distribution of
as
. The lemma is reported here for convenience.
Lemma 1 ([
11]).
Let be distributed as . As , the distribution of tends to the multivariate normal distribution . An implication of this result, described in more detail below, is that as increases in magnitude, V has an exponential distribution with parameter , that is, with mean and standard deviation both equal to . As , the distribution of tends to a multivariate normal with an unbounded mean vector but a finite covariance matrix .
The remainder of this section of the paper presents a number of properties of the MESN distribution.
Figure 1 shows two sets of examples of the density function of the (univariate) extended skew-normal distribution for
, and 0. In the left-hand set, the nonzero values of the extension parameter
are negative. In the right-hand set, the signs of the
are reversed. In both sets,
, and
. Both sets demonstrate that asymmetry disappears progressively as
increases and exhibit the properties reported in Lemma 1 and the text that follows it.
Papers by [
7,
15] show that a suitable linear transformation reduces the MSN distribution to a canonical form. Corresponding representations may be derived for the extended version of the distribution and, as shown below, for the extended skew-Student. These representations depend on the following standard result.
Lemma 2. Let denote an unit matrix, an n-vector and an n-vector of zeros. The eigenvalues of the matrix are (i) and (ii) 1 repeated times. The corresponding eigenvectors are (i) and (ii) an orthogonal matrix which satisfies .
This is used to establish the following:
Proposition 1. Let and let , where is a left square-root matrix of Σ. Then, , , and there exists an orthogonal transformation of such that and are independently distributed, and . Note that from Lemma 1, as , the limiting distribution of is the standard normal.
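The eigenvalue result in Lemma 2 can be checked numerically. The sketch below assumes that the matrix in the lemma is the unit matrix plus the outer product of an n-vector with itself, which is consistent with the eigenvalue 1 being repeated n − 1 times; the vectors `a` and `b` and the helper names are illustrative only.

```python
def identity_plus_outer(a):
    """Return the matrix I + a a' as a list of rows."""
    n = len(a)
    return [[(1.0 if i == j else 0.0) + a[i] * a[j] for j in range(n)]
            for i in range(n)]


def mat_vec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


a = [1.0, 2.0, -0.5]            # illustrative n-vector, a'a = 5.25
b = [2.0, -1.0, 0.0]            # any vector orthogonal to a
m = identity_plus_outer(a)

# (i) a is an eigenvector with eigenvalue 1 + a'a
lead = 1.0 + sum(x * x for x in a)
print(mat_vec(m, a), [lead * x for x in a])

# (ii) vectors orthogonal to a are eigenvectors with eigenvalue 1
print(mat_vec(m, b), b)
```

Because every vector orthogonal to `a` is left unchanged, an orthogonal matrix whose first column is proportional to `a` diagonalizes the matrix, which is the property used in the canonical form.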
2.1. The Truncated Normal Distribution and Its Approximations
The probability density function of the distribution of the truncated normal variable
is
The moment-generating function (MGF), originally reported in [
16], is
with the MGF valid for all
. Following on from [
17], numerous authors present results for the moments of the truncated normal distribution and generalizations thereof. These include [
18,
19,
20,
21,
22] and, recently, [
23], among others. For values of
that are less than zero, the asymptotic expansion of
from page 932 of [
24] is
Note that, with suitable choices of
m and values of
, the remainder term
may be ignored. In this case, the moment-generating function of
V is
This leads to a distribution for which the corresponding density function is a weighted average of gamma densities
where
denotes the density function of the gamma distribution
For sufficiently large values of
, terms after the first may be ignored, giving an exponential distribution with density function
When used to form the convolution
, the distribution at (
15) leads to the skew-normal exponential distribution described in [
11] but originally due to [
10].
Figure 2 shows sketches of the truncated normal density function for
and
. The steepness of decay increases with
.
Figure 3 shows the truncated normal density function for
, together with the corresponding exponential density function and an approximation based on the density at Equation (
13) with
. As
Figure 3 indicates, the three density functions are visually similar. In particular, there is little difference between the truncated normal density and the three-term mixture based on Equation (
13).
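The closeness of the truncated normal and exponential densities for negative τ can also be examined directly. The following sketch assumes a N(τ, 1) variable truncated from below at zero and an exponential distribution with rate equal to the magnitude of τ; both the unit scale and the function names are assumptions made for illustration.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_normal_pdf(v, tau):
    """Density of N(tau, 1) truncated from below at zero."""
    return norm_pdf(v - tau) / norm_cdf(tau) if v >= 0.0 else 0.0


def exponential_pdf(v, rate):
    """Density of the exponential distribution with the given rate."""
    return rate * math.exp(-rate * v) if v >= 0.0 else 0.0


def density_ratio(v, tau):
    """Ratio of the truncated normal density to the exponential density
    with rate |tau|; intended for tau < 0."""
    return truncated_normal_pdf(v, tau) / exponential_pdf(v, -tau)


for tau in (-5.0, -15.0, -30.0):
    print(tau, density_ratio(0.0, tau), density_ratio(1.0 / abs(tau), tau))
```

The ratio approaches one as τ becomes more negative, in line with the exponential limit described above.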
2.2. Moments of the Truncated Normal Distribution
Expressions for moments of the truncated normal distribution are reported in [
21], as well as in references cited above in
Section 2.1. In the notation of the present paper, from Equation (
9), the mean and variance of the truncated normal distribution are, respectively,
Skewness and kurtosis, defined here as the fourth cumulant, are respectively
and
Kurtosis, the fourth moment about the mean and denoted by
, is
Expressed in terms of
, this is
Note that, from [
25],
for all
. Using the first term of the asymptotic expansion for
for
, under which
V has the exponential distribution at Equation (
15), leads to the following expressions for the first four derivatives of
.
where in this paper the notation ≃ is taken to mean that the ratio of the two functions tends to unity as, in this case,
. These results give the same expressions for the first four moments as those computed from the exponential distribution at Equation (
15).
Table 2 shows the computed values of the first four moments of the truncated normal distribution, the limiting exponential distribution at Equation (
15), and the mixture distribution based on Equation (
13) with
. Values are shown for
, and
. In the table, kurtosis is the fourth moment about the mean, that is,
. As the table shows, the differences between the exact and approximate results are small and decline as
increases. Whether a given approximation may be used as a practical alternative to the truncated normal will depend on the magnitude of
and the application in question.
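A comparison of the kind summarized in Table 2 can be sketched by numerical integration. The code below assumes a N(τ, 1) variable truncated from below at zero and an exponential distribution with rate equal to the magnitude of τ; it does not reproduce the entries of Table 2, and the integration limits, step count, and function names are illustrative choices. Kurtosis is taken to be the fourth moment about the mean, as in the table.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def truncated_normal_moments(tau, upper=12.0, n=24_000):
    """Mean and second to fourth central moments of N(tau, 1) truncated
    from below at zero, using Simpson's rule on [0, upper] (n must be even)."""
    h = upper / n
    scale = norm_cdf(tau)

    def pdf(v):
        return norm_pdf(v - tau) / scale

    def simpson(f):
        total = f(0.0) + f(upper)
        for i in range(1, n):
            total += (4.0 if i % 2 else 2.0) * f(i * h)
        return total * h / 3.0

    mean = simpson(lambda v: v * pdf(v))
    m2, m3, m4 = (simpson(lambda v, k=k: (v - mean) ** k * pdf(v))
                  for k in (2, 3, 4))
    return mean, m2, m3, m4


def exponential_moments(rate):
    """Mean and second to fourth central moments of the exponential distribution."""
    return 1.0 / rate, 1.0 / rate ** 2, 2.0 / rate ** 3, 9.0 / rate ** 4


for tau in (-5.0, -15.0):
    print(tau, truncated_normal_moments(tau), exponential_moments(-tau))
```

The third central moment returned by `truncated_normal_moments` is positive for the values tried, consistent with the non-negativity of skewness noted above.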
2.3. Standardized Form of the Extended Skew-Normal Distribution
Additional insights into the MESN distribution may be obtained by standardization. If
denotes a left square root matrix of
, the random
n-vector
now defined as
satisfies
and
. The distribution of
has the density function
where
and
are as defined for Equation (
1) and
For the standardized form of extended skew-normal distribution, coskewness and cokurtosis (also defined here in terms of the fourth cumulant) are given by
respectively, where
is the standardized value of the skewness or shape parameter defined as
Both coskewness and cokurtosis tend to zero as
, in which case the limiting distribution of
is the standard multivariate normal. A suitable transformation similar to that in Proposition 1 shows that the standardized MESN distribution may be expressed in canonical form similar once again to that described in [
7] and [
15].
Proposition 2. Let and let , where is a left square-root matrix of Ω and let Then and are independently distributed with and as defined at Equation (16) and in Proposition 1. Note that, as in Proposition 1, as the limiting distribution of is the standard normal.
Figure 4 shows two sets of standardized extended skew-normal density functions. In both sets
,
and
. In the left-hand set, values of the extension parameter
are set to −30, −15, −5, −2.5 and 0. In the right-hand set, the signs of the
are reversed. Both sets of densities illustrate that for
little asymmetry is apparent even when the shape parameter
is substantial; in this case five times greater than the scale parameter
. Of the values of
shown in the figure, only
leads to a density function with a discernible amount of asymmetry.
Figure 5 shows two more sets of the skew-normal density functions. The panel on the left shows extended skew-normal density functions with
and
. The values of
are −5, −2.5, −1, 0, 1, 2.5 and 5. The panel on the right-hand side consists of the corresponding densities standardized to have mean equal to zero and variance equal to one. The X-scales are the same in each panel. As
Figure 5 shows, the skewness apparent for the extended skew-normal distributions reduces and largely disappears under standardization. There are analogous results for negative values of
.
Table 3 shows a selection of moments for the extended skew-normal distribution and the corresponding standardized form for values of
that are less than or equal to zero. Values of
and
are as shown in the table. Values of the location and scale parameter are
and
and are used in all numerical results. As the table shows, when
the values of standardized skewness and kurtosis are numerically close to 0 and 3 respectively, thus supporting the result of Lemma 1. For
and 1 there is evidence to support normality for
. Asymmetry is most evident when
is zero or close to it.
Table 4 shows the corresponding selection for positive values of
. The panel corresponding to
is repeated for ease of reading. The table indicates normality for
. Asymmetry is evident when
. The panel with
has values of
that are negative.
3. Multivariate Extended Skew-Student Distribution
The multivariate extended skew-Student distribution, MEST, is an extension of the multivariate skew-Student distribution originally introduced by [
2]. The extended version is reported in [
26] and later in both [
27,
28]. Following [
14], the former derives it as the convolution
, where the random vector
of length
has a multivariate Student distribution with location parameter vector
and scale matrix
with
V truncated from below at zero. Consistent with the notation in
Section 1, this is denoted
, where
denotes a Student’s
t variable with location parameter
and scale
truncated from below at
x. The marginal distribution of
has the symmetric density function reported in
Section 3.2 of [
27] and independently in [
28]. The probability density function of the distribution of
is
where
and where
is as defined at Equation (
2).
is the probability density function of an
n-vector
which has a multivariate Student distribution with location parameter vector
and scale matrix
evaluated at
.
is the distribution function of a Student’s
t variable with
degrees of freedom evaluated at
z and
is the corresponding density function. This distribution is denoted
. As in
Section 2, this section of the paper presents basic properties of the MEST distribution. Similar to
Table 1, unreported results show that nonzero values of
make a substantial difference to the moments of the distribution. As
, the limiting distribution of
is multivariate Student.
Proposition 3. Let . The limiting distribution as is multivariate Student with location parameter , scale matrix , and ν degrees of freedom; denoted .
The proof of this result uses the scale mixture representation reported in Lemma 3 of [
29]. This result is consistent with the analogous property of the MESN distribution reported in
Section 2. As shown later in this section, however, the limiting distribution of
as
in the MEST case is different from that for the MESN.
Figure 6 shows sketches of the extended skew-Student density function for
and
. The left-hand panel shows density functions with negative values of
ranging from
to
. The right-hand side shows densities with positive values of
ranging from 0 to 30. This symmetric density function is that reported in both [
27,
28]. Two notable features are, first, the similarity of the density function for increasing positive values of
, but, second, the increasing spread of the density function as
increases for negative values of
. For
, the left-hand panel of
Figure 7 shows density functions with the same negative values of
. The right-hand panel shows densities with
ranging from 0 to 20. In both of these figures,
and
. In the right-hand panel of
Figure 7, the density function is qualitatively similar to the corresponding skew-normal distribution: asymmetry disappears with increasing values of
, and the location parameter increases, but the spread does not. For negative values of
, the spread increases and asymmetry decreases with increasing values of
. To support the sketches in the figures, the moments of the extended skew-Student distribution are reported in
Section 3.3 below.
A canonical form of the MEST distribution may be derived using an approach that is essentially the same as that in Proposition 1.
Proposition 4. Let and let , where is a left square-root matrix of Σ. Then , , and there exists an orthogonal transformation of such that the density function of is where is the normalizing constant for an n-variate multivariate Student distribution with ν degrees of freedom, and Equivalently, where .
Standard manipulations show that
and that the marginal distribution of
has the symmetric Student-like density function reported in Section 3.2 of [
27].
3.1. The Truncated Student’s t Distribution
Similar to the extended skew-normal, the properties of the extended skew-Student distribution are substantially affected by those of the truncated form of Student’s
t. The density function of the truncated Student’s
t variable
v is
Figure 8 shows sketches of the truncated Student
t density function for
, together with two approximating beta type-2 density functions as described below in Lemma 5. The degrees of freedom
are 5 and 20, respectively. For a fixed value of
, the figure illustrates the increasing severity of decay as
increases. It is notable that for
, the truncated Student
t is well approximated by the beta type-2 densities.
3.2. Moments of the Truncated Student’s t Distribution
Moments of the truncated distribution at Equation (
30) may be evaluated directly. Note that expressions for the moments of a doubly truncated
t distribution may be found in [
30]. As reported in [
27], for
and
, respectively, the mean and variance of this distribution are
where
The following result, derived using integration by parts, leads to a more useful representation of .
Lemma 3. For , the following result holds Using this result, for
, the functions
and
are related by the identity
Equation (
33) allows the variance to be written as
Note that
is sufficient to show that the limiting values in Equation (
31) equal those for the truncated normal at Equation (
16). For
and
, skewness and kurtosis (the fourth moment about the mean), respectively, are
and
where
and
As already noted above, reference [
25] showed that the skewness of the truncated normal distribution is non-negative for all values of
. The following shows that the same result holds for the truncated Student distribution.
Proposition 5. Let . For , the following result holds: .
The proof is by contradiction. First, note that since , the sign of is determined by the sign of the expression in in Equation (35). This quadratic function of has roots Since the coefficient is positive, the function is negative between the roots, which is a contradiction.
Note that as
, Proposition 5 also establishes Sampford’s result, and note that the expressions for the first four moments tend to those for the truncated normal distribution at Equations (
16), (
17), and (
19).
Computation of limiting expressions for the moments as
requires a result that is analogous to the well-known asymptotic expression for normal distribution reported in [
24]. Such a result was first reported in [
31]. As it does not appear to be well known, it is summarized below in the notation of this paper.
Lemma 4 ([
31]). For values of
that are less than zero, the asymptotic expansion of
is
Note that, with suitable choices of
m and values of
, the remainder term
may be ignored.
Using the first two terms in the expansion in Lemma 4 for
and
gives
from which the asymptotic expected value is
For
, the corresponding expression for the asymptotic variance is
Thus, for fixed finite degrees of freedom
, the expected value and variance increase without limit as
. As
, the expected value and variance tend to
and
, respectively, the results for the truncated normal distribution. The corresponding expressions for skewness and kurtosis are omitted in view of their complexity. However, if just the terms proportional to
are considered, then for
as
, asymptotic skewness is
Similarly for
, asymptotic kurtosis is proportional to
.
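The contrast with the normal case can be illustrated numerically: for fixed degrees of freedom, the mean of the truncated Student's t variable grows as the location parameter becomes more negative. The sketch below assumes that V equals τ plus a standard Student's t variable with ν degrees of freedom, conditional on V being positive; the unit scale, the choice ν = 5, the change of variable used for the integration, and the function names are all assumptions made for illustration.

```python
import math


def student_pdf(x, nu):
    """Density of a standard Student's t variable with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1.0 + x * x / nu) ** (-(nu + 1) / 2)


def truncated_student_mean(tau, nu, n=20_000):
    """Mean of V = tau + T, T ~ t_nu, conditional on V > 0 (requires nu > 1).

    Uses the midpoint rule after mapping v in (0, inf) to u in (0, 1)
    through v = u / (1 - u), so that the heavy tail is integrated in full.
    """
    num = den = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        v = u / (1.0 - u)
        w = student_pdf(v - tau, nu) / (1.0 - u) ** 2   # density times Jacobian
        num += v * w
        den += w
    return num / den


for tau in (0.0, -5.0, -10.0):
    print(tau, truncated_student_mean(tau, 5))
```

For the corresponding truncated normal variable the mean instead decreases towards zero as the location parameter decreases, which is the difference in limiting behavior discussed in this section.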
Table 5 shows a selection of moments from the truncated Student’s
t distribution. As
increases above zero, the distribution increasingly resembles Student’s
t as demonstrated by the values in the bottom panel of the table. The top panel corresponding to
shows the increasing values of the moments. The analog of the limiting exponential distribution that arises in the normal case described in
Section 2.1 is as follows.
Lemma 5. Let . For , as the ratio increases without limit, the asymptotic distribution of is , that is, with density function. The proof of this lemma is in
Appendix A. An asymptotically equivalent result is that the variable
is also distributed as
.
It is straightforward to show that the conditional distribution of
given
follows a multivariate Student distribution with
degrees of freedom, location parameter vector
, and scale matrix
Use of this distribution in conjunction with the asymptotic distribution of
V in Equation (
45), for
does not lead to tractable results that are analogous to those in
Section 2.
3.3. Moments of the MEST Distribution
For
and
, respectively, the mean vector and covariance matrix of the MEST distribution are
and
Using the identity at Equation (
33) allows the covariance matrix to be written as
The similarity of the coefficient of
to the corresponding term in Equation (
6) may be noted. The coefficient of
provides the inequality
.
The skewness of a single variable
in
with scale denoted by
may be expressed in terms of the moments of
V the truncated Student’s
t variable, specifically Equations (
34) and (
35), and is given by
The kurtosis of
is given by
The corresponding expressions for coskewness and cokurtosis are omitted. A selection of moments of the extended skew-Student is shown in
Table 6,
Table 7,
Table 8 and
Table 9.
Table 6 [Table 7] shows results for
[
] for
. The panel for
is repeated for convenience and corresponds to Student’s
t distribution. The lower panels of
Table 6 show the increasing magnitude of variance and kurtosis as
increases, even for
.
Table 8 and
Table 9 show the corresponding results for
. Note that in
Table 8, some large results are shown to two decimal places only to preserve the formatting.
3.4. Standardized Forms of the MEST Distribution
As in
Section 2.3, further insights into the extended skew-Student distribution may be obtained by standardization. If
denotes a left square root matrix of
, the random vector
now defined as
satisfies
and
. The distribution of
has the density function
where
is as defined for Equation (
28),
is as defined for Equation (
1) and
The distribution at Equation (
54) has a canonical form. First, define
partition
into a scalar
and an
-vector
and let
be the quadratic form
where
is as defined in Proposition 1. Methods similar to those used in that proposition give the following result.
Proposition 6. Let and , where is a left square-root matrix of . Then where and The density function of is where
As Equations (
58) and (
59) show, under the canonical representation, the asymmetry in the density function is attributable solely to the scalar variable
. The marginal distribution of
is symmetric and of the same type reported in
Section 3.2 of [
27]. Examples of the EST and standardized EST density functions are shown in
Figure 9 for
and
and
, and 100. In the upper (lower) row,
. The X-scales are the same in each panel. The graphs confirm results from
Table 8 and
Table 9, namely that the degree of asymmetry is reduced under standardization. Examples of contour plots for the bivariate EST and standardized EST distributions are shown in
Figure 10.
To investigate the behavior of the distribution as
for fixed
, consider the scalar variable
, which has the marginal distribution
where
. For
, define
As
for fixed
, the asymptotic density function of
is
where
This leads to the following result:
Proposition 7. For , as , the distribution of has the asymptotic density function with the sign of determined by the sign of , and
The result in this proposition requires the asymptotic expression for the distribution function of Student’s
t. As noted above, such a result was first provided by [
31] and is summarized in Lemma 4. Comparative examples of the exact and asymptotic EST density functions are shown in
Figure 11. The implication of Proposition 7 is that as
, the standardized distribution is qualitatively similar to the corresponding form for the extended skew-normal in that dependence on
disappears. For nonzero values of
or
, however, the distribution remains asymmetric. It is important to note though that, unlike the MESN, dependence on
as it tends to
does not disappear in the nonstandardized MEST case. In addition to Proposition 7, recall from results in
Section 3.2 and
Section 3.3 that for finite degrees of freedom, the location parameter vector depends on
and the covariance matrix on
.
4. Hidden Truncation Models
In their simple form, hidden truncation models are concerned with the bivariate normal distribution of
in situations in which
X is observed if
Y is greater than (less than) a given threshold, here denoted
. The procedure is commonly referred to as selective sampling. The resulting conditional distribution is that of
. Such a construction is reported in a more general form in [
12] for the case in which the scalar
X is replaced by a random vector
. The phrase hidden truncation models is more often associated with [
13], which refers to an earlier work [
32]. In selective sampling situations, it seems self-evident that the threshold
will depend on the application in question. This is clearly implied in
Section 2 of [
13] in which they denote the threshold by
and report the resulting distribution of
X conditional on
, which is the extended skew-normal. The extended version of the skew-normal is also described in [
33]. In the introduction to a sole-authored later paper, [
34],
Y is assumed to exceed its expected value. This case is more in keeping with the skew-normal literature, which does not generally employ the extended version of the distribution. Subsequent sections of [
34], however, are inter alia concerned with extended versions of the skew-normal and other distributions.
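The selective sampling construction can be illustrated by simulation. The sketch below is a minimal example under stated assumptions, not the construction used in the cited papers: X and Y are taken to be standard bivariate normal with correlation `delta`, X is retained only when Y exceeds `-tau`, and the names `delta`, `tau`, and `sample_hidden_truncation` are hypothetical. Under these assumptions the mean of the retained values should be close to `delta * phi(tau) / Phi(tau)`, the standard result for a normal variable truncated from below.

```python
import math
import random


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def sample_hidden_truncation(n, delta, tau, rng):
    """Draw n values of X from a standard bivariate normal (X, Y) with
    correlation delta, keeping X only when Y > -tau (selective sampling)."""
    kept = []
    resid_sd = math.sqrt(1.0 - delta * delta)
    while len(kept) < n:
        y = rng.gauss(0.0, 1.0)
        if y > -tau:  # the hidden truncation: Y itself is discarded
            kept.append(delta * y + resid_sd * rng.gauss(0.0, 1.0))
    return kept


def conditional_mean(delta, tau):
    """E[X | Y > -tau] for the standardized case assumed above."""
    return delta * norm_pdf(tau) / norm_cdf(tau)


rng = random.Random(12345)  # fixed seed so the run is reproducible
delta, tau = 0.8, -1.0      # illustrative values only
draws = sample_hidden_truncation(20000, delta, tau, rng)
sample_mean = sum(draws) / len(draws)
print(sample_mean, conditional_mean(delta, tau))
```

A negative value of `tau` corresponds to sampling from the upper tail of Y, so the retained values of X are shifted upwards when `delta` is positive.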
The aim of this section is to present limiting forms of the extended skew-normal and skew-Student distributions when they are derived as hidden truncation models. Consistent with the results in
Section 2 and
Section 3, the limiting distributions exhibit different properties. The distributions of the hidden truncated variable
Y and the observed vector
differ markedly depending on whether the underlying form is normal or Student’s t. In selective sampling, limiting forms of the distributions arise when the notional observation on the conditioning variable
Y is required to be in one of the tails of its distribution. To illustrate the differences between the hidden truncation skew-normal and skew-Student distributions, either extended or not, this section contains a table of critical values corresponding to a probability of 0.025. Critical values corresponding to other probabilities are available on request. In addition to these general results,
Section 4.4 describes an application to stock market crashes, in which the truncated variable is not only material to the resulting distribution but is also observed.
4.1. Hidden Truncation under the Normal Distribution
It is assumed that the
n-vector
and a scalar variable denoted
Y have a multivariate normal distribution
The conditional distribution of
, given that
, has the probability density function
where
The moment-generating function of the conditional distribution of
given
is
and that of
Y is given by
Noting the similarity to the MGF of the truncated variable denoted
V in
Section 2.1, it follows that
As
, the variable
Y given that it is less than or equal to
becomes deterministic in the sense that its expected value is asymptotically equal to
, but its variance and all higher moments are asymptotically equal to zero. The conditional expected return and covariance matrix of
are, respectively,
As
, the vector of expected values and the covariance matrix become
It is interesting to note that element
i of the vector of expected values decreases or increases depending upon whether
is positive or negative. The joint moment-generating function of
and
Y conditional on
is
from which
Using similar arguments to those for Lemma 1, as , the covariances all tend to zero as expected.
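The limiting behaviour of the truncated variable can be checked numerically. The following sketch assumes, for simplicity, that Y is standard normal (zero mean, unit variance), which may differ from the general parameterization used above. It uses the textbook expressions for the mean and variance of a standard normal variable truncated from above at `c`; the function name is hypothetical.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def upper_truncated_normal_moments(c):
    """Mean and variance of a standard normal Y conditional on Y <= c."""
    h = norm_pdf(c) / norm_cdf(c)   # ratio phi(c) / Phi(c)
    mean = -h
    var = 1.0 - c * h - h * h
    return mean, var


# As the threshold moves into the lower tail, the mean tracks the
# threshold and the variance shrinks towards zero.
for c in (-1.0, -3.0, -6.0, -10.0):
    m, v = upper_truncated_normal_moments(c)
    print(c, m, v)
```

In this standardized setting the conditional mean is always below the threshold, with the gap and the variance both shrinking as the threshold decreases.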
4.2. Hidden Truncation under Student’s t Distribution
It is now assumed that the
n-vector
and a scalar variable
Y have a multivariate Student distribution with
degrees of freedom. The conditional distribution of
, given that
, has the probability density function
where
and
are as defined above and
The conditional mean and variance of
Y are
where
and
are defined at Equation (
32). As
, the asymptotic expected value and variance are
For finite and fixed degrees of freedom, and ignoring
for ease of exposition, the conditional expected value is uplifted through multiplication by
, that is, the effect is most pronounced when the degrees of freedom are small. The asymptotic variance increases with
, that is, potentially without limit. The conditional expected return and covariance matrix of
are
and
where
As
, the vector of expected values and the covariance matrix become
and
That is, for finite degrees of freedom, both expected values and the covariance matrix increase in magnitude without limit as
. Similar to Equation (
71), the conditional expected value of element
i of
will increase without limit if the corresponding value of
is negative and is unaffected if it equals zero.
Comparing the normal and Student hidden truncation models, the vectors of expected values are mainly determined by
. Differences will be marked only if the degrees of freedom are small. The covariance matrices differ substantially: in the Student case for fixed
, the covariance matrix increases without limit as
. For a given finite value of
, the increase in the elements of the covariance matrix decreases with increasing
. The conditional covariance between
and
Y is
Standard manipulations using Equation (
83) show that the conditional correlation between a typical element
i of
and
Y is asymptotically equal to
which tends to zero as
.
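The contrast with the normal case can be illustrated numerically for a scalar Student variable. The sketch below is not taken from the equations above. It assumes a standardized Student's t variable Y with `nu` degrees of freedom (`nu` greater than 2), conditions on Y being no greater than a negative threshold `c`, and computes the conditional mean and variance by the midpoint rule after the substitution y = c/u. The comparison values `nu / (nu - 1)` and `nu / ((nu - 1)**2 * (nu - 2))` come from treating the tail of Student's t as a Pareto tail, which is an assumption of this illustration.

```python
def t_kernel(y, nu):
    """Student's t density up to its normalizing constant, which cancels
    in conditional moments."""
    return (1.0 + y * y / nu) ** (-(nu + 1.0) / 2.0)


def truncated_t_moments(c, nu, n=20000):
    """Mean and variance of Y given Y <= c, for c < 0 and nu > 2.

    The substitution y = c / u maps u in (0, 1] to y in (-inf, c]."""
    m0 = m1 = m2 = 0.0
    for k in range(n):
        u = (k + 0.5) / n                 # midpoint of the k-th subinterval
        y = c / u
        w = t_kernel(y, nu) * (-c) / (u * u) / n
        m0 += w
        m1 += w * y
        m2 += w * y * y
    mean = m1 / m0
    return mean, m2 / m0 - mean * mean


nu = 5.0
for c in (-5.0, -20.0, -50.0):
    mean, var = truncated_t_moments(c, nu)
    # Ratios to the threshold and to its square, for comparison with
    # nu / (nu - 1) and nu / ((nu - 1)**2 * (nu - 2)).
    print(c, mean / c, var / (c * c))
```

Under these assumptions the ratio of the conditional mean to the threshold settles near 1.25 for five degrees of freedom, while the variance grows in proportion to the square of the threshold rather than vanishing.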
4.3. Hidden Truncation with Extended Distributions
Table 10 shows critical values corresponding to a probability of 0.025 for the univariate versions of distributions at Equations (
66) and (
75) for a range of values of
,
, and
. Table entries are computed numerically, displayed to two decimal places. In Panel 4, corresponding to the standard case
, the first row,
yields the critical values for Student’s t distribution with
, and 100 degrees of freedom and the standard normal distribution. The other rows in the same panel correspond to
, and
. As the panel shows, the critical values range from
to
. In Panels 1 to 3, for which
takes negative values, the range is greater and increases with the magnitude of
. In Panels 5 to 7, with positive values of
, the critical values closely approximate those of Student’s
t and the normal distribution as expected. In each panel, the rows corresponding to
are the critical values of the nonstandard symmetric Student-like distribution reported in both [
27,
28]. The distribution of
and
Y and the threshold
have a non-negligible effect on critical values; that is, for many applications, extended versions of the distributions may be preferred.
4.4. Stock Market Crashes
The basic empirical model for the returns on stocks is a regression in which the single explanatory variable is the contemporaneous return on a suitable market index, such as the UK’s FTSE100 or the USA’s S&P 500. The model is generally referred to as the market model. It is the operational version of the capital asset pricing model, universally referred to as the CAPM, of [
35,
36,
37]. Numerous other regression setups are in widespread use, but all maintain a close connection to the market model. More formally, it is assumed that the
n-vector of asset returns
and the contemporaneous return on the market index
have a multivariate normal distribution
where
. An element
of
may denote the return on an individual stock or a portfolio of stocks. The market model is then the conditional distribution of
given that $R_m = r_m$, that is,
or, if the market model is written in familiar regression style notation
The results with an underlying Student distribution are similar. For
, the conditional mean is the same, but for
, the conditional covariance matrix now depends on
as follows
That is, the conditional variance is inflated by a factor that is proportional to the squared deviation of the market return from its expected value.
In this subsection, the effect of a market crash is considered. A detailed coverage of the statistical and empirical properties of crashes is beyond the scope of this paper, but some theoretical insights into crashes may be derived using the skew-normal and skew-Student distributions. Specifically, the standard conditioning event
is changed to
. This characterizes a crash when
is both negative and of large magnitude. Comparison of Equation (
85) with (
65) and (
66) shows that the resulting conditional distribution of
is extended skew-normal or extended skew-Student. For underlying normal returns, the conditional mean and variance of market returns are, respectively,
where
. Similar to the results in
Section 4.1, in the limit, as
, market return becomes nonstochastic with (expected) value equal to
.
The corresponding results for the conditional mean vector and covariance matrix of asset returns
are
and
In a crash, the conditional expected return on asset
i decreases or increases without limit depending on the sign of
, but there is no effect if
. The conditional covariance matrix is asymptotically equal to
, the conventional case defined at Equation (
86). With underlying Student returns, for
, the conditional mean and variance of market returns are, respectively,
Using the results at Equation (
78), it follows that the expected value of market return in a crash is negative and increases pro rata to the standardized crash size. Unlike the results based on an underlying normal distribution, the conditional variance is proportional to the square of the standardized crash size; for given
, the variance increases without limit. A sketch of the conditional distribution of index returns under normal and Student’s
t distributions with five degrees of freedom and corresponding to a five-standard-deviation crash is shown in
Figure 12. As the sketch shows, the Student’s
t tail is longer and fatter than that of the normal.
The corresponding result for the conditional expected return vector is
As above, the conditional expected return for asset
i will increase or decrease without limit depending on the sign of
but is unchanged if it equals zero. Using Equation (
80), the conditional covariance matrix is
which, in keeping with Equation (
83), may also increase without limit. Noting that
, the similarities between Equation (
94) and (
88) are clear.
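For the case of underlying normal returns, the effect of a crash on a single asset can be sketched numerically. The example assumes the market model with a residual that is independent of the market return, so that the conditional mean and variance of the asset return follow from those of the truncated market return. All parameter names and values (`mu_m`, `sigma_m`, `mu_i`, `beta_i`, `resid_var_i`) are hypothetical and chosen only for illustration.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def crash_moments(mu_m, sigma_m, mu_i, beta_i, resid_var_i, threshold):
    """Conditional moments given market return <= threshold, assuming joint
    normality and R_i = alpha_i + beta_i * R_m + independent residual."""
    c = (threshold - mu_m) / sigma_m          # standardized crash size
    h = norm_pdf(c) / norm_cdf(c)
    mean_m = mu_m - sigma_m * h               # truncated-normal mean
    var_m = sigma_m ** 2 * (1.0 - c * h - h * h)
    mean_i = mu_i + beta_i * (mean_m - mu_m)  # shift scaled by beta
    var_i = resid_var_i + beta_i ** 2 * var_m
    return mean_m, var_m, mean_i, var_i


# Crashes of 2, 4 and 8 standard deviations for an asset with beta 1.2.
for k in (2, 4, 8):
    print(k, crash_moments(0.005, 0.04, 0.006, 1.2, 0.0009, 0.005 - k * 0.04))

deep = crash_moments(0.005, 0.04, 0.006, 1.2, 0.0009, 0.005 - 8 * 0.04)
```

As the crash deepens, the conditional variance of the asset return approaches the residual variance, while the conditional expected return moves with the sign of the asset's beta.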
5. Extended Skew-Normal versus Skew-Student
The literature concerning the skew-normal and skew-Student distributions is more abundant than that for the corresponding extended versions. It has been conjectured by some researchers in the area, albeit informally, that the skew-Student could be used as an alternative to the extended skew-normal distribution. To some extent, such a suggestion is motivated naturally by the similarities in the shapes of some of the respective density functions. Somewhat more formally, use of the skew-Student could be regarded as being closer in spirit to the original skew-normal literature. For univariate distributions, and from the perspective of empirical work, this is an issue that is more concerned with parameter estimation and tests of fit. That is, for a given data set, does the extended skew-normal or the skew-Student offer better fit? For multivariate distributions, the issue is the same in principle, although the details are more complex. It is of course also the case that the extended skew-normal might be preferred to the skew-Student. For example, for the former, all moments exist, which may be a consideration for some applications. Conditional distributions are in general of the extended type. For multivariate applications in which conditioning is a requirement, methodological issues could imply that extended versions of the distribution are more appropriate. That is, an MESN or even MEST distribution may be preferable to the MST.
To construct an approximation, at least two types of method suggest themselves. Given a specified extended skew-normal distribution, one method would be to minimize a suitable measure of the distance between the two density functions. Several measures of distance could be considered. Denoting the two density functions by
and
and assuming that the parameters of the former (latter) are given, the parameters of the latter (former) could be chosen by minimizing
Numerous variations on this theme could be constructed, for example, using a different norm or minimizing the divergence between the ESN and ST density functions using the Kullback–Leibler divergence measure [
38] or the Hellinger distance ([
39]). A second approach could be to seek to match the first four moments of the two distributions. It is clear that a comprehensive study of this conjecture, particularly bearing in mind multivariate distributions, would be a substantial undertaking. This section describes an initial investigation into the approximation of the univariate extended skew-normal distribution by the skew-Student, which may inform more comprehensive studies to be carried out in the future. The section is in two parts. In the first part, a theoretical investigation based on population moments is reported. In the second part, a study in which simulated data from a number of specified extended skew-normal distributions is used to estimate the parameters of both models is reported.
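A divergence measure of this kind can be computed by direct numerical integration. The sketch below is an illustration only. It assumes the standardized extended skew-normal density in the commonly used Azzalini form, with hypothetical parameter names `lam` (shape) and `tau` (extension), which may differ from the parameterization of this paper. As the skew-Student distribution function is not available in the Python standard library, the approximating density is taken to be a normal distribution with the same mean and variance; this substitutes a simpler approximating family for the skew-Student.

```python
import math


def norm_pdf(x, mean=0.0, var=1.0):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def esn_pdf(x, lam, tau):
    """Standardized ESN density (assumed Azzalini-type parameterization)."""
    arg = tau * math.sqrt(1.0 + lam * lam) + lam * x
    return norm_pdf(x) * norm_cdf(arg) / norm_cdf(tau)


def integrate(f, lo=-10.0, hi=10.0, n=4000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


def kl_esn_vs_matched_normal(lam, tau):
    """Kullback-Leibler divergence from the ESN to a moment-matched normal."""
    p = lambda x: esn_pdf(x, lam, tau)
    mean = integrate(lambda x: x * p(x))
    var = integrate(lambda x: (x - mean) ** 2 * p(x))

    def integrand(x):
        px, qx = p(x), norm_pdf(x, mean, var)
        # Skip points where either density underflows to zero.
        return px * math.log(px / qx) if px > 0.0 and qx > 0.0 else 0.0

    return integrate(integrand)


kl_symmetric = kl_esn_vs_matched_normal(0.0, -1.0)  # lam = 0 gives a normal
kl_skewed = kl_esn_vs_matched_normal(3.0, -1.0)
total_mass = integrate(lambda x: esn_pdf(x, 3.0, -1.0))
print(kl_symmetric, kl_skewed, total_mass)
```

The same integration routine could be reused with a different approximating density or a different divergence, such as the Hellinger distance, by changing only the integrand.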
There are three technical points to note. First, the choice of an approximating skew-Student distribution is informed by the limiting forms of the extended skew-normal. From Lemma 1, as
, the limiting form of the ESN distribution is
, which is the limiting form of a skew-Student distribution with
, that is, Student’s
t, as
. As also reported in
Section 2, a similar result holds as
. The implication is that using ST distributions to approximate the ESN is appropriate for values of
that are not too large. Second, motivated again by similarities in the shape of the density function, an ESN distribution may be approximated by the SN itself. Third, there are combinations of the parameters
and
for which approximation by moment matching is infeasible. To illustrate this, consider an approximation of a univariate ESN with parameters
,
,
and
by an SN with parameters
,
and
. Equating skewness shows that a real value of the ratio
requires that
and simple computations show that the inequality does not always hold.
5.1. Moment Matching Study
The study in this paper considers the approximation of an ESN distribution by an ST. As above, for the ESN,
and
. The extension parameter
takes 11 values in the range
. As skewness is asymmetric in the shape parameter,
takes 9 values in the range
. For practical reasons, the derived value of
is restricted to be an integer. For a given pair
, the approximating values of
and
are derived by minimizing the absolute difference in standardized skewness. This is done by grid search. The other parameters are computed by equating the expected value and variance of the two distributions. For
, pairs for which a moment matching approximation exists, the divergence between the ESN and ST density functions is computed using the Kullback–Leibler divergence measure [
38]. The values of this divergence measure are ranked from best to worst, with the parameters corresponding to the best ten and worst ten shown in
Table 11. The first two columns of each panel show the values of
and
. The next three columns show the computed values of
,
, and
for the approximating ST distribution, with values rounded to four decimal places. Computed values of
that were equal to 1000 or greater were replaced by
∞, that is, the approximating distribution is effectively skew-normal.
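The moment matching procedure can be sketched in code for the simpler case in which the approximating distribution is skew-normal rather than skew-Student. This is an illustration under stated assumptions and not the study's implementation: the ESN density is taken in the Azzalini form with hypothetical parameter names `lam` and `tau`, ESN moments are obtained by numerical integration, the skew-normal moments use the standard closed-form expressions, and the grid limits are arbitrary.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def esn_pdf(x, lam, tau):
    """Standardized ESN density (assumed Azzalini-type parameterization)."""
    arg = tau * math.sqrt(1.0 + lam * lam) + lam * x
    return norm_pdf(x) * norm_cdf(arg) / norm_cdf(tau)


def esn_moments(lam, tau, lo=-10.0, hi=10.0, n=4000):
    """Mean, variance and standardized skewness by the trapezoidal rule."""
    h = (hi - lo) / n
    xs = [lo + k * h for k in range(n + 1)]
    ws = [h * (0.5 if k in (0, n) else 1.0) * esn_pdf(x, lam, tau)
          for k, x in enumerate(xs)]
    mean = sum(w * x for w, x in zip(ws, xs))
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))
    third = sum(w * (x - mean) ** 3 for w, x in zip(ws, xs))
    return mean, var, third / var ** 1.5


def sn_moments(lam):
    """Closed-form mean, variance and skewness of the standard skew-normal."""
    b = math.sqrt(2.0 / math.pi) * lam / math.sqrt(1.0 + lam * lam)
    var = 1.0 - b * b
    return b, var, 0.5 * (4.0 - math.pi) * b ** 3 / var ** 1.5


def match_sn_to_esn(lam, tau, grid_step=0.01, grid_max=20.0):
    """Grid search on the SN shape to match skewness, then match mean and
    variance through location and scale."""
    mean_e, var_e, skew_e = esn_moments(lam, tau)
    steps = int(round(grid_max / grid_step))
    gap, lam_s = min((abs(sn_moments(k * grid_step)[2] - skew_e), k * grid_step)
                     for k in range(-steps, steps + 1))
    mean_s, var_s, _ = sn_moments(lam_s)
    scale = math.sqrt(var_e / var_s)
    return mean_e - scale * mean_s, scale, lam_s, gap


print(match_sn_to_esn(2.0, 0.0))   # tau = 0: the ESN is itself skew-normal
print(match_sn_to_esn(2.0, -1.0))
```

When the skewness of the ESN lies outside the range attainable by the skew-normal, the returned `gap` stays well above zero, which signals that an exact moment match does not exist for that parameter pair.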
Table 12 shows the corresponding values of the moments. As the Best 10 panel shows, the differences in the first four moments are negligible. For the Worst 10 panel, differences in mean, variance, and skewness are also negligible because of the method of construction. Unlike the results in the upper panel, there are differences in kurtosis.
Table 13 shows the corresponding critical values, displayed in eight columns. These show critical values at
p-values of 0.5%, 2.5%, 95.5%, and 99.5% in ESN/ST pairs. Values are rounded to two decimal places and were computed numerically. As the table shows, for the Best 10 approximations, the differences are negligible. For the Worst 10, the differences are more pronounced. To illustrate the effect of the moment matching procedure,
Figure 13 shows ESN and ST density functions for which the ST approximation is the worst according to the Kullback–Leibler divergence measure.
The results in
Table 11,
Table 12 and
Table 13 provide support to the implications of Equation (
95), namely that the method of approximation works well for values of
that are not too large. An interesting result is that for numerous parameter combinations, the extended skew-normal distribution may be well approximated by a skew-normal. The usefulness of the results in the Worst 10 panels will depend on the application. In some applications, accurate critical values are not necessary, but in others, they are. There are other methods of measuring the divergence between two density functions. Two well-known ones are Hellinger distance ([
39]) and Jensen–Shannon divergence ([
40]), both of which constitute topics for future investigation.
5.2. Simulation Study
The simulation study uses the same sets of values of
,
,
, and
. For each combination of the parameters, 100 samples of size 100 from an extended skew-normal distribution were drawn. The parameters were estimated by maximum likelihood for the ESN and ST distributions. In addition, motivated by the results in
Table 11, the parameters of the skew-normal distribution were also estimated. Summaries of the results are shown in
Table 14,
Table 15 and
Table 16.
Table 14 shows the value of the log-likelihood function for each parameter combination computed at its estimated maximum, averaged over the 100 samples and over values of
. The table has four columns, with the first showing values of
based on parameter values inferred from sample moments. As columns 2 through 4 of the table show, the value of
varies little with the choice of underlying distribution. For this relatively small sample size, if the value of
were the sole criterion for model selection, it would be difficult to discriminate between the three distributions.
For each parameter combination shown in
Table 15 and
Table 16, the entries are averages of the 100 samples.
Table 15 shows the root mean-square error in the moments for the three distributions and for 35 selected combinations of
. As the table shows, the lowest root mean-square error occurs under the ESN for 30 of the
combinations. Root mean square error is computed as the square root of the average squared difference between the population moments and the average of the estimated moments, where the latter are computed from the MLE parameter estimates for each distribution. The population moments included in the calculations are mean, variance, skewness, and kurtosis.
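The error measure described above can be written as a short function. The sketch reflects one reading of the description, in which the estimated moments are first averaged over the samples and the squared differences are then averaged over the four moments; the function and argument names are hypothetical.

```python
import math


def rmse_of_moments(population, estimates_by_sample):
    """Root mean square error between population moments and the
    across-sample average of the estimated moments.

    population: moments in the order mean, variance, skewness, kurtosis.
    estimates_by_sample: one list of estimated moments per simulated sample.
    """
    n_samples = len(estimates_by_sample)
    averaged = [sum(sample[j] for sample in estimates_by_sample) / n_samples
                for j in range(len(population))]
    squared = [(p - a) ** 2 for p, a in zip(population, averaged)]
    return math.sqrt(sum(squared) / len(population))


# Illustrative numbers only: two samples whose errors in the mean cancel.
population = [0.0, 1.0, 0.0, 3.0]
estimates = [[0.1, 1.0, 0.0, 3.0], [-0.1, 1.0, 0.0, 3.0]]
print(rmse_of_moments(population, estimates))
```

Because the estimates are averaged before differencing, this measure captures bias across samples rather than sampling variability within them.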
Table 16 shows the corresponding errors in the critical values. Root mean square error is computed as the square root of the average squared difference between the population critical values and the average of the estimated values based on MLE parameter estimates for each distribution. The critical values are computed at nominal percent probabilities equal to 0.05, 0.5, 2.5, 5.0, 95.0, 97.5, 99.5, and 99.95. The lowest root mean square error occurs under the ESN for 28 of the parameter combinations. In both
Table 15 and
Table 16, the root mean square error is generally the largest under the ST distribution.
6. Concluding Remarks
This paper presents results that demonstrate the properties of both the multivariate extended skew-normal and extended skew-Student distributions as the value of the extension parameter changes. In general, for given values of location, scale, and shape or skewness, nonzero values of the extension parameter lead to greater variability in both the moments and critical values. In turn, this offers greater flexibility in empirical applications of these distributions. From a theoretical perspective, increasing values of the extension parameter lead to more fundamental changes in both distributions. As it increases without limit, the asymptotic distributions are multivariate normal and multivariate Student, respectively. The respective vectors of expected values of both distributions are dependent on and are unbounded. The covariance matrices, however, remain finite. Skewness disappears for both distributions. By contrast, as , more substantial changes take place in the distributions. Most notable is that for the MESN distribution dependence on vanishes, but for the MEST in general, it does not. In the case of the MESN, the limiting distribution is multivariate normal. For the MEST distribution with finite degrees of freedom, asymmetry remains. For fixed , the extent of asymmetry decreases as the degrees of freedom increase. For fixed degrees of freedom, as , the vector of expected values and the covariance matrix are both unbounded.
To illustrate the potential of the MESN and MEST distributions, two applications are described. First, the effect of a stock market crash is studied assuming underlying multivariate normal and multivariate Student distributions. A crash, in which the return on a market index is less than a given negative threshold, results in multivariate extended skew-normal and multivariate extended skew-Student distributions. Under an underlying multivariate normal distribution, as the crash size increases without limit, the return on a stock market index becomes nonstochastic. In short, the market plummets: actual return equals expected return. Under an underlying multivariate Student distribution, expected return is broadly the same, but variability increases without limit. The market decline is noisy. There are analogous results for the returns on individual stocks. In particular, with underlying normality, the conditional covariance matrix remains finite, whereas under an underlying Student distribution, it does not. A detailed investigation of the implications and suitability of these models is beyond the scope of this particular paper, but it is reasonable to posit that the results offer support to the view that an underlying Student distribution is a more realistic model than the normal. Given that stock market collapses have in the past been of relatively short duration, the results also imply that for financial applications the models change. The methods described may be applied in principle to stock market booms. It may also be noted that if an inefficiency variable were to be constructed, SFA analysis could be treated in the same way.
Second, the conjecture that the skew-Student could be used instead of the extended skew-normal is an interesting one. Given the similarity in the shapes of the density functions for many combinations of parameters, this conjecture suggests that there is the possibility of flexible model choice. A general investigation of this conjecture would be a substantial task. The exercise reported in this paper is intended to offer evidence to motivate further research. The short study reported in this paper, part theoretical and part based on simulation, suggests that a given ESN distribution should be treated as such. However, the results also suggest that the ESN could be well approximated by the skew-normal in some circumstances, but in general not by the skew-Student.