More on the Supremum Statistic to Test Multivariate Skew-Normality

Opheim, Timothy; Roy, Anuradha

doi:10.3390/computation9120126

Open AccessReview

More on the Supremum Statistic to Test Multivariate Skew-Normality

by

Timothy Opheim

and

Anuradha Roy

^*

Department of Management Science and Statistics, The University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA

^*

Author to whom correspondence should be addressed.

Computation 2021, 9(12), 126; https://doi.org/10.3390/computation9120126

Submission received: 11 October 2021 / Accepted: 23 October 2021 / Published: 29 November 2021

(This article belongs to the Special Issue Modern Statistical Methods for Spatial and Multivariate Data)

Download Versions Notes

Abstract

:

This review is about verifying and generalizing the supremum test statistic developed by Balakrishnan et al. Exhaustive simulation studies are conducted for various dimensions to determine the effect, in terms of empirical size, of the supremum test statistic developed by Balakrishnan et al. to test multivariate skew-normality. Monte Carlo simulation studies indicate that the Type-I error of the supremum test can be controlled reasonably well for various dimensions for given nominal significance levels 0.05 and 0.01. Cut-off values are provided for the number of samples required to attain the nominal significance levels 0.05 and 0.01. Some new and relevant information of the supremum test statistic are reported here.

Keywords:

Monte Carlo simulations; multivariate skewness; skew-normal distribution; supremum test

1. Introduction

In this brief review, we summarize existing tests for multivariate skew-normality, both comparing their efficacy in terms of power as well as their practical implementation in terms of programming complexity. Afterwards, we focus on the superb supremum test developed by Balakrishnan et al. [1]. The exact distribution of the supremum test statistic was developed therein when the parameters of the skew-normal distribution were known. Exhaustive Monte Carlo simulations are conducted herein to determine the effect, in terms of empirical size, of the use of this distribution when the skew-normal parameters are unknown for the case of dimensions d equal to 3, 5, 8, and 10. Finally, cut-off values are provided for the number of samples required to attain the nominal 0.05 and 0.01 significance levels.

Many multivariate techniques have been developed based on the assumption of multivariate normality in the last century: see [2,3,4,5,6,7]. There is a great quote “Normality is a myth; there never was, and never will be, a normal distribution” in [8]. When the assumption of multivariate normality is met, a multivariate linear model with an unstructured variance–covariance matrix can be fit effortlessly due to the existence of closed-form maximum likelihood estimators (MLEs) for the unknown parameters. Due to this convenience, some leniency crept into the use of the multivariate normal distribution in order to avoid complexities arising in optimization procedures for multivariate non-normal distributions. Moreover, when the assumption of multivariate normality appeared to be a grave misspecification, a transformation of the response vector was recommended in order to fix the original deficiency. However, this fix cannot correct the non-normality issue completely in many datasets, but only to a certain extent. Nevertheless, with the advent of better computers, there appears to be no compelling reason to restrict oneself to the multivariate normal distribution when, in fact, real data mostly have some non-negligible skewness. For example, one prominent alternative to the normal distribution is the skew-normal (SN) distribution. Azzalini introduced the univariate SN distribution in 1985 [9], and after a decade, Azzalini and Dalla Valle [10] proposed a multivariate extension of Azzalini’s SN distribution. Interested readers are recommended to read the recent article by Kollo [11] on a journey from normal distribution to skewed multivariate distributions in order to get a deep insight on the evolution of multivariate SN distribution.

We write

Z \sim S N_{d} (0_{d}, \bar{Ω}, a)

to denote the normalized multivariate skew-normal distribution of Azzalini and Dalla Valle [10] type having probability density function (p.d.f.)

\begin{matrix} f_{Z} (x) = 2 ϕ_{d} (x; \bar{Ω}) Φ (a^{'} x), x \in R^{d}, \end{matrix}

where

\underset{d \times d}{\bar{Ω}}

is a positive-definite (PD) correlation matrix;

a

is a d-dimensional vector of shape (skewness) parameters, albeit indirectly;

ϕ_{d} (Z; Σ)

denotes the p.d.f. of a

N_{d} (0_{d}, Σ)

random variable; and

Φ (\cdot)

is the cumulative distribution function of a standard normal random variable. This family of distributions can be extended to include location and scale parameters through the usual transformation

\begin{matrix} Y = ξ + ω Z, \end{matrix}

where

ξ

is a d-dimensional location vector and

ω = diag (ω_{1}, \dots, ω_{d})

is a PD matrix (i.e.,

ω_{i} > 0

for all

i = 1, \dots, d

) representing the scale parameters. In this case, we write

Y \sim S N_{d} (ξ, Ω, a)

to denote that its p.d.f. is

\begin{matrix} f_{Y} (x) = 2 ϕ_{d} (x - ξ; Ω) Φ (a^{'} ω^{- 1} (x - ξ)), x \in R^{d}, \end{matrix}

where

Ω = ω \bar{Ω} ω

is a PD covariance matrix.

1.1. Canonical Form of a Multivariate Skew-Normal Variable

Definition 1.

Let

Y \sim S N_{d} (ξ, Ω, a)

and

a_{Y^{*}} = {(a_{*}, 0, \dots, 0)}^{'}

, where

a_{*} = {(a^{'} \bar{Ω} a)}^{1 / 2}

. An affine transformation of

Y,

Y^{*},

is said to be in canonical form if

Y^{*} \sim S N_{d} (0_{d}, I_{d}, a_{Y^{*}})

.

Result 1.

For

Y \sim S N_{d} (ξ, Ω, a)

, define

M = Ω^{- 1 / 2} Σ Ω^{- 1 / 2}

, where

Σ = v a r (Y)

and

Ω^{1 / 2}

is the unique positive definite symmetric square root of Ω. Let

Q Λ Q^{'}

denote a spectral decomposition of

M

, where without loss of generality, we assume that the diagonal elements of Λ are arranged in increasing order, and

H = Ω^{- 1 / 2} Q

. Then,

\begin{matrix} Y^{*} = H^{'} (Y - ξ) \end{matrix}

has canonical form.

See [12] for the above Result 1. An affine non-singular transformation of

Y

results in a canonical form with d uncorrelated components, where the first univariate component can be skewed

{S N}_{1} (0, 1, a_{*})

with

a_{*} = {(a^{'} \bar{Ω} a)}^{1 / 2}

, and the remaining components are distributed as

N_{1} (0, 1)

. Now, the question is: how do we test multivariate skew-normality, and how many samples are needed for the testing?

2. Hypothesis Test

Balakrishnan et al. [1] tested the following hypothesis:

\begin{matrix} H_{0} : & Y \sim a multivariate skew - normal distribution with some location - scale parameters \\ v s . & H_{1} : & Y ≁ a multivariate skew - normal distribution . \end{matrix}

(1)

To the best of our knowledge, there appears to be but two references on tests for multivariate skew-normality. As noted in Meintanis and Hlavka [13], tests can be constructed using the multivariate Kolmogorov–Smirnov and the Cramer–von Mises goodness-of-fit test statistics. However, these were shown to possess less power and more computational complexity than the so-called Meintanis and Hlavka test developed therein, which is based on the empirical moment generating function. Moreover, the Meintanis and Hlavka test statistic is not without its own computation complexities as well, since it requires bootstrap resampling and the numerical solution of a first-order partial differential equation. Finally, the Meintanis and Hlavka test was also shown to be more powerful than yet another test statistic redolent of the technique used in [1], where the modulation invariance property of the multivariate SN distribution was used to arrive at a test statistic approximated by a

χ^{2}

distribution.

Balakrishnan et al. [1] proposed a test for multivariate skew-normality by exploiting the modulation invariance property of the SN distribution and transformations of univariate normal distributions, such as the absolute value of a univariate SN distribution being equivalent to the absolute value of a univariate normal distribution, which in turn is a half-normal distribution [14], and the ratio of a univariate normal random variable and a half-normal random variable is a Cauchy random variable. In their paper, the authors obtained a test statistic from the canonical form of the multivariate skew-normal distribution as described in Section 1.1, and suggested three forms of combined statistics from n random samples (independent and identically distributed) of d variate observation, where

d \geq 2

. Of the three combined statistics, the bin-ensemble test was decided against in favor of the other two due to its seemingly large variability, while the remaining two statistics had similar power. Out of these two tests, the supremum test is the exact test; thus, we study it little bit more in this review and report some new and relevant information about it. As suggested by a referee, we include more details on the Balakrishnan et al.’s supremum test [1] in the following section. Some parts (e.g., Result 1) from Balakrishnan et al. [1] are also included in the previous section so that the reader could follow the following section with ease.

Balakrishnan et al.’s (2014) Supremum Test

Balakrishnan et al. [1] considered one

d

-dimensional observation

y = {[y_{1}, \dots, y_{d}]}^{'}

generated by the random variable

Y = {[Y_{1}, \dots, Y_{d}]}^{'}

to test the Hypothesis 1, i.e., if its distribution is a multivariate SN distribution. Let

Y^{*} = {[Y_{1}^{*}, \dots, Y_{d}^{*}]}^{'}

denote the canonical form of

Y

. If

H_{0}

is true, from Section 1.1,

Y_{1}^{*}

is univariate skew-normal, and all other variables

Y_{2}^{*}, \dots, Y_{d}^{*}

are distributed as normal random variables. For

i = 2, \dots, d,

the ratios

Y_{i}^{*} / | Y_{1}^{*} |

are distributed as Cauchy random variables, since the absolute value of a SN distribution,

| Y_{1}^{*} |

, is distributed as half-normal [14]. If

H_{1}

is true, the marginal components of

Y^{*}

may not follow the marginal distributions as mentioned in Section 1.1, and Balakrishnan et al. [1] used this property proficiently to construct a test statistic to test the Hypothesis 1.

When

d = 2

,

T_{2} = \frac{Y_{2}^{*}}{| Y_{1}^{*} |}

is a sensible test statistic to test the Hypothesis 1. Under

H_{0}

,

T_{2}

is distributed as a Cauchy random variable; if

H_{1}

is true,

Y_{2}^{*}

is not distributed as a standard normal, and a sample from this distribution will be likely to have larger values in absolute value with respect to a sample from the standard normal distribution. Balakrishnan et al. [1] mentioned that the critical region will be CR

= {| T_{2} | > t^{*}},

where

t^{*}

is the

1 - α

percentile of the Cauchy distribution for a test with significance level

α

. When

d > 2

, Balakrishnan et al. [1] introduced the following test statistic:

\begin{matrix} T & = & max \{\frac{Y_{2}^{*}}{| Y_{1}^{*} |}, \frac{Y_{3}^{*}}{| Y_{1}^{*} |}, \dots, \frac{Y_{d}^{*}}{| Y_{1}^{*} |}\} - min \{\frac{Y_{2}^{*}}{| Y_{1}^{*} |}, \frac{Y_{3}^{*}}{| Y_{1}^{*} |}, \dots, \frac{Y_{d}^{*}}{| Y_{1}^{*} |}\} \\ = & \frac{max {Y_{2}^{*}, Y_{3}^{*}, \dots, Y_{d}^{*}} - min {Y_{2}^{*}, Y_{3}^{*}, \dots, Y_{d}^{*}}}{| Y_{1}^{*} |} . \end{matrix}

(2)

If

H_{0}

is true, T has the distribution of the range of a Cauchy random variable; if

H_{1}

is true, a larger range is expected. Therefore, a realistic critical region is CR

= {T \geq t^{*}}

, where

t^{*}

is the

1 - α

percentile of the distribution of T under

H_{0}

for a test with significance level

α

. Note that both

T_{2}

and T are pivotal quantities. See Balakrishnan et al. [1] for the distribution of T under

H_{0}

that requires to evaluate a double integration numerically.

Their combined tests for a sample of size n follow the following rubric: for each of the n samples, they first calculated their proposed statistics

T_{1}, T_{2}, \dots, T_{n}

and then combined these n statistics in three different ways to get a pooled statistic. Out of these three pooled statistics, the supremum test statistic,

U = max_{n} {T_{1}, T_{2}, \dots, T_{n}}

, can be used to develop an exact test when

ξ

,

Ω

, and

a

are known, since under these conditions, the distribution function of T is known, from which the distribution function

P (U \leq u) = {[P (T \leq u)]}^{n}

. This is the reason we want to study this test statistic in more detail. When these parameters are unknown and n falls below the cut-off values developed here, critical values and p-values must be estimated by simulation or a like method. Finally, Balakrishnan et al. [1] compare their test statistic with the Meintanis and Hlavka test statistic when

d = 2

and conclude that the latter test tends to outperform or be equivalent to their test under the situations considered therein. Notwithstanding, the test statistic developed by Balakrishnan et al. [1] is far less computationally intensive.

To evaluate their three proposed tests, they carried out some Monte Carlo simulations to calculate empirical significance levels (ESLs) when the nominal significance level (NSL)

α

was 0.05. For sample sizes 100, 200, and 300, they simulated 1000 samples from a 3-dimensional SN distribution with fixed

\begin{matrix} ξ = (\begin{matrix} 1 \\ 2 \\ 3 \end{matrix}), Ω = (\begin{matrix} 1 & 1 & 1 \\ 1 & 2.5 & 1 \\ 1 & 1 & 5 \end{matrix}) and a = [\begin{matrix} 1 \\ - 2 \\ 3 \end{matrix}], \end{matrix}

but treated as unknown parameters. This choice of parameters is associated with a moderate skewness, having a Mardia index of skewness [15,16] as 0.5; a sample from this distribution would have

a_{*} \approx 3.3

. Balakrishnan et al. [1] have estimated these parameters by the maximum likelihood (ML) method and computed ESLs of all three pooled tests when the NSL was

α = 0.05

from a sample of 1000. See Table 1 in [1]. This Table 1 gives an idea of the needed sample size in a sample to test the skew-normality. We see from this table a sample size between 100 and 200 is required for the supremum test statistic to achieve the NSL when

d = 3

, and the critical value (CV) is obtained via the approximate distribution function, it being apparent that the NSL is always attained when the critical value is obtained via simulation. In real life, we often get dataset with larger d. However, nothing was mentioned about ESLs for larger d in Balakrishnan et al. (2014), thus motivating our study of the behavior of the supremum test statistic for larger d in this short review. In the process, we verify and extend their table for

d = 3

, but use 50,000 simulations instead of 1000. From our results, it appears that the previous mentioned sample size between 100 and 200 is larger than necessary to attain the requisite NSL.

3. Monte Carlo Simulation Studies

We shall use the component-wise representation to denote a

d \times 1

vector,

a = {(a_{1}, \dots, a_{d})}^{'}

, by

a = (a_{i})

and to denote a

d \times d

matrix,

A

, by

A = (a_{i j})

, where

i, j \in {1, 2, \dots, d}

. In the following tables, we suppose

d = 3, 5, 8, 10

and define the relevant skew-normal direct parameters as

\begin{matrix} \underset{d \times 1}{ξ} & = & (i) \\ \underset{d \times 1}{a} & = & (I (i \leq 3) {(- 1)}^{i + 1} i) \\ \underset{d \times d}{Ω} & = & \{\begin{matrix} (ω_{11}) = 1 \\ (ω_{i i}) = 2.5 (i - 1), & for i = 2, \dots, d \\ (ω_{i j}) = 1, & for i \neq j \end{matrix}, \end{matrix}

where

I (\cdot)

denotes the indicator function.

Table 1 replicates the simulation set-up in [1], changing only the number of Monte Carlo simulations [17] from 1000 to 50,000 and considering more sample sizes. That is, the same parameter values were selected, and MLEs were obtained for each simulated sample using the reliable selm.fit function occurring in Azzalini’s sn package in R. Taking into account the sampling error associated with the calculated empirical sizes, Table 1 agrees with that in [1] for n equal to 200 and 300, but it gravely disagrees with their reported value of 0.095 for the empirical size when

n = 100

. Perhaps this was merely a typographical error on their part or an unfortunate set of 1000 simulations; notwithstanding, our table ameliorates this deficiency, showing close agreement between the nominal and empirical size for a sample size of 100. This being found, the extension of sample sizes considered is shown in order to provide a lower acceptable bound on use of their reported critical value in order to attain the nominal size. Confidence intervals (CIs) of 95% for the corresponding ESLs are also computed in Table 1 to get a better idea on a sample size value. To be conservative, we recommend using

n = 55

as such a lower bound. Table 1 also calculates ESLs and 95% CIs for the corresponding ESLs for NSLs

α = 0.01

to get an idea of the needed sample size to test the multivariate skew-normality. Finally, in situations where this condition is violated, standard bootstrapping procedures can be employed to obtain critical values agreeing with the nominal size.

In addition, tables are also developed when the number of responses d equals 5, 8, and 10. From Table 2, Table 3 and Table 4, it appears that the test achieves the nominal significance level 0.05 when n is around 75 for

d = 5

, when n is around 110 for

d = 8

, and when n is around 135 for

d = 10

. For these values of d, no closed-form solution exists for the CDF of the supremum test statistic. Therefore, one must rely on numerical integration techniques to obtain the requisite critical values. Using a nested call of R’s integrate function, the critical values for various sample sizes n were obtained in order to calculate the empirical size. The R code used is given in the Appendix A.

Remark 1.

As expected, we observe from Table 1, Table 2, Table 3 and Table 4 that the required sample size increases with the increase in the number of response variables d. We also see from Table 2, Table 3 and Table 4 that the critical values or the cut-off values also increase with d. One can easily compute the CV for

d = 3

, as it has the closed-form solution. See Lemma 1 in [1]. Therefore, one must be very cautious in choosing the number of samples and the critical values as otherwise the Type I error rate could be substantially inflated.

We see the calculated value of the statistic T in (2) increases if a variable is added, and thus the calculated value of the supremum test statistic U. Since the critical values also increase with d, a variable should be added only if it adds a significant amount to U.

Remark 2.

It was found that for n suitably large, obtaining the upper 95% and 99% quantiles of the supremum test statistic led to numerical instabilities, notwithstanding fine tuning of the rel.tol parameter. Specifically, it was found that these instabilities occurred in the value of the upper 95% quantile for sample sizes near 265, 205, and 190 when the number of responses d equals 5, 8, and 10, respectively. Fortunately, MATLAB’s integral2 function possesses far greater accuracy than use of the nested base R functions in order to evaluate the requisite double integral.

As expected, we see from Table 1, Table 2, Table 3 and Table 4 the ESL decreases with the sample size n for fixed d. Moreover, the ESL increases as d increases for fixed n. This is true for

α = 0.05

as well as

α = 0.01 .

4. Discussions

Since U possesses a readily calcuable distribution function, one can obtain critical value (

u_{α}

) for the supremum test by using the distribution of T as

P (T \leq u_{α}) = {(1 - α)}^{1 / n}

. However, one must be careful about using the critical values obtained therefrom when

ξ

,

Ω

, and

a

are unknown since this implies that the double integral representation of the distribution function of T is an approximation. From Table 1, Table 2, Table 3 and Table 4, we see that for

d = 3

, a sample of 55 is needed for NSL

α = 0.05,

and for

d = 5, 8,

and 10, samples of sizes 75, 110, and 135 are needed for NSL

α = 0.05

. These samples of sizes are chosen as being the first (or nearly the first) to have a CI that contains the nominal size. Table 1, Table 2, Table 3 and Table 4 also calculate ESLs and 95% CIs for the corresponding ESLs for NSLs

α = 0.01

. We see that one could reduce the sample sizes by 5–15 for each d for NSLs

α = 0.01

. Furthermore, Table 2, Table 3 and Table 4 calculate empirical CVs (ECVs) of the supremum statistic. For

d = 3

, one can easily compute the CV, as it has the closed-form solution as

\sqrt{2} tan (\frac{π}{2} {(1 - α)}^{(1 / n)})

.

5. Conclusions

Basically, this review is about verifying and generalizing the supremum test statistic developed in [1] and reporting some new and notable information about it. In this short review, simulation tests are carried out for various dimensions to examine the influence of the supremum test statistic established by Balakrishnan et al. [1] to test for multivariate skew-normality in terms of empirical size. The number of samples necessary to achieve the nominal significance levels of 0.05 and 0.01 are given as cut-off values. We will extend the supremum test statistic to test the skew-normality of doubly multivariate data with Kronecker product covariance structure and report it in a future correspondence. Recently, Opheim and Roy [18] developed doubly multivariate linear models with matrix-variate skew-normally distributed errors by assuming that the covariance matrix defining the location-scale matrix-variate skew-normal distribution has block compound symmetry structure.

Author Contributions

Conceptualization, A.R.; Realization of the simulations and the evaluation of the results, T.O.; Supervision, A.R.; Writing—original draft preparation, A.R. and T.O.; Writing—review and editing, A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

The authors want to thank the four reviewers for their careful reading, valuable comments, and suggestions that led to a quite improved version of the manuscript. Timothy Opheim was not available to confirm coauthorship, but the corresponding author affirms that Timothy Opheim contributed to the paper and assures for Timothy Opheim’s coauthorship status.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. The R Code

T.integrand = function(x,y,z) { (d-1)*(pnorm(x+z*y)-pnorm(x))^(d-2)

*dnorm(x)*sqrt(2/pi)*exp(-.5*y^2) }

T.CDF = function(z) {

integrate(function(y) {

sapply(y, function(y) {

integrate(function(x) T.integrand(x,y,z), -Inf, Inf,

rel.tol=.Machine$double.eps^.85)$value

})

}, 0, Inf, rel.tol=.Machine$double.eps^.85)$value

}

References

Balakrishnan, N.; Capitanio, A.; Scarpa, B. A test for multivariate skew-normality based on its canonical form. J. Multivar. Anal. 2014, 128, 19–32. [Google Scholar] [CrossRef]
Anderson, T.W. Introduction to Multivariate Statistical Analysis; Wiley & Sons: Hoboken, NJ, USA, 1958. [Google Scholar]
Rao, C.R. Linear Statistical Inference and Its Applications; Wiley: New York, NY, USA, 1965. [Google Scholar]
Mardia, K.V.; Kent, J.T.; Bibby, J.M. Multivariate Analysis; Academic Press Inc.: New York, NY, USA, 1979. [Google Scholar]
Muirhead, R.J. Aspects of Multivariate Statistical Theory; Wiley-InterScience: Hoboken, NJ, USA, 1982. [Google Scholar]
Seber, G.A.E. Multivariate Observations; Wiley: New York, NY, USA, 1984. [Google Scholar]
Jensen, D.R. Multivariate distributions. In Encyclopedia of Statistical Sciences; Kotz, S., Johnson, N.L., Read, C.B., Eds.; Wiley: Hoboken, NJ, USA, 1985; Volume 6, pp. 43–55. [Google Scholar]
Geary, R.C. Testing for normality. Biometrika 1947, 34, 209–242. [Google Scholar] [CrossRef] [Green Version]
Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Statist. 1985, 12, 171–178. [Google Scholar]
Azzalini, A.; Dalla Valle, A. The multivariate skew-normal distribution. Biometrika 1996, 83, 715–726. [Google Scholar] [CrossRef]
Kollo, T. From normality to skewed multivariate distributions: A personal view. In Multivariate, Multilinear and Mixed Linear Models; Filipiak, K., Markiewicz, A., von Rosen, D., Eds.; Contributions to Statistics; Springer: Cham, Switzerland, 2021; pp. 17–40. [Google Scholar] [CrossRef]
Capitanio, A. On the Canonical Form of Scale Mixtures of Skew-Normal Distributions. arXiv 2012, arXiv:1207.0797. [Google Scholar]
Meintanis, S.; Hlavka, Z. Goodness-of-fit tests for bivariate and multivariate skew-normal distributions. Scand. J. Stat. 2010, 37, 701–714. [Google Scholar] [CrossRef]
Azzalini, A. Further results on a class of distributions which includes the normal ones. Statistica 1986, 46, 199–208. [Google Scholar]
Mardia, K.V. Measures of multivariate skewness and kurtosis with applications. Biometrika 1970, 57, 519–530. [Google Scholar] [CrossRef]
Mardia, K.V. Applications of some measures of multivariate skewness and kurtosis intesting normality and robustness studies. Sankhya 1974, 36, 115–128. [Google Scholar]
Rizzo, M.L. Statistical Computing with R, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2019. [Google Scholar]
Opheim, T.; Roy, A. Score tests for intercept and slope parameters of doubly multivariate linear models with skew-normal errors. J. Stat. Theory Pract. 2021, 15, 30. [Google Scholar] [CrossRef]

Table 1. ESLs with asymptotic 95% CIs of the supremum test statistic for various sample sizes when

d = 3