1. Introduction
Let $\mathcal{G}$ be a compact group of linear transformations (operators) from $\mathbb{R}^d$ to $\mathbb{R}^d$. A Borel probability measure P on $\mathbb{R}^d$ is called $\mathcal{G}$-symmetric if and only if there exist an affine nonsingular transformation A from $\mathbb{R}^d$ onto $\mathbb{R}^d$ and a $\mathcal{G}$-invariant Borel probability measure $P_0$ such that $P_0 = P \circ A^{-1}$. In other words, if X is a random vector with the distribution P, then there exists an affine nonsingular transformation A such that the random vector $A(X)$ is $\mathcal{G}$-invariant, which in turn means that $S A(X)$ has the same distribution as $A(X)$ for all transformations S from the group $\mathcal{G}$. Clearly, the affine transformation A is not unique, as one could take $S \circ A$ instead for an arbitrary transformation S from the group $\mathcal{G}$. On the other hand, one can always fix A by a suitable normalization.
Obviously, we can define as the distribution of the random variable . The couple will be called parameters (specifiers) of the -symmetric probability measure We denote by the set of all -symmetric distributions on
We are interested in the following problem. Given an independent identically distributed (i.i.d.) sample from the distribution defined on the probability space , construct and study tests for -symmetry of the distribution P.
Our group approach unifies the theory of tests for different types of symmetry. Below we give some examples of types of symmetry common in the literature, with the corresponding choices of the group.
Example 1. Let . Then -invariant probability measure is a so called sign change symmetrical measure. Probability measure P is -symmetric if there exists an affine transformation in the form with some such that . In this case, P is often called diagonally or reflectively symmetrical measure.
Let denote the Euclidean norm of a vector . If one can define . Obviously, the parameter and the transformation are uniquely defined for any P such that the corresponding X is integrable (not necessarily -symmetric). We can also define in such a generality as the distribution of the random variable .
Example 2. Let be the group of all orthogonal transformations in . Then -symmetrical probability measure P is often called ellipsoidally symmetric or elliptically symmetric or elliptically contoured measure and is spherically symmetric one. In this case, if one can define as the mean and as the square root of the covariance operator of so that for any x from . Define as the distribution of the random variable . As in the previous case, these parameters are defined for any P such that the corresponding X is square-integrable (not necessarily -symmetric).
We refer to the papers [1, 2, 3, 4, 5, 6, 7, 8, 9] for results on ellipsoidal and spherical symmetry testing.
Example 3. Let and let be a group of all transformations translating a regular polygon with k vertices centered at 0 into itself. Clearly, is a subgroup of the group of all orthogonal transformations. Thus, an affine transformation can be fixed in the same way as in example 2.
Example 4. Let be a group of all reflections about hyperplanes . Then for each there exists a permutation such that for all . In this case -invariant probability measure is called permutation symmetric measure.
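As in Examples 2 and 3, we can define $A(x) = \Sigma^{-1/2}(x - \mu)$ for any x from $\mathbb{R}^d$, where $\mu = \mathbb{E}X$ and $\Sigma^{1/2}$ is the square root of the covariance operator of X, provided $\mathbb{E}|X|^2 < \infty$ (also see [10, 11]).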
Tests for symmetry of a multivariate distribution play an important role in statistics and in various fields of science. To name a few examples: in finance theory, log-returns of assets are assumed to be ellipsoidally symmetric; in genetics, gene expression values are assumed to be diagonally symmetrically distributed; in image analysis, components are assumed to be spherically symmetric; in linear programming, the distribution of feasible solutions is assumed to be permutation symmetric. In statistics, the sliced inverse regression method due to Li [12] works for ellipsoidally symmetric distributions. Also, since tests for normality extend to tests for ellipsoidal symmetry, any research field that employs multivariate analysis based on a normality assumption can benefit from relaxing it to an assumption of ellipsoidal symmetry. Symmetry tests are therefore needed in applications. See [13] for a detailed survey on the use of symmetry in various scientific fields.
The rest of the paper is organized as follows. We introduce notation and construct test statistics, with examples, in Section 2. The main results and bootstrapped test statistics are given in Section 3, followed by a detailed example in Section 4. The proofs are in Section 5. Closing remarks are in Section 6, followed by technical details in the Appendix.
2. Notations and Preliminaries
Let
m denote the uniform distribution (the normalized Haar measure) on the group
Given a bounded Borel function
f on
we define
It is easy to check that for a
-symmetric
P distribution with specifiers
we have
for any bounded Borel function
f. Indeed,
for any
. Since
is a compact group, one can integrate the equality over
with respect to the uniform measure. Thus,
implies (2).
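The averaging argument can be written out as the following sketch. The symbols $P_0$ (the invariant measure), $A$ (the affine transformation), $m$ (the normalized Haar measure) and the operator name $m_{\mathcal{G}}$ are notation assumed here for illustration; the derivation itself uses only invariance and Fubini's theorem, which applies because f is bounded and m is a probability measure.

```latex
\begin{align*}
\int f \, dP_0 &= \int f(Sx)\, P_0(dx)
  && \text{for every } S \in \mathcal{G} \text{ (invariance of } P_0\text{)},\\
\int f \, dP_0 &= \int_{\mathcal{G}} \int f(Sx)\, P_0(dx)\, m(dS)
  = \int (m_{\mathcal{G}} f)(x)\, P_0(dx),
  && (m_{\mathcal{G}} f)(x) := \int_{\mathcal{G}} f(Sx)\, m(dS),\\
\int f(Ax)\, P(dx) &= \int (m_{\mathcal{G}} f)(Ax)\, P(dx)
  && \text{since } P_0 \text{ is the distribution of } A(X).
\end{align*}
```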
As a result, if a class
characterizes the distribution, i.e.
implies that
then
P is
-symmetric if and only if (2) holds for all
In general, we call
P a
-asymmetric distribution if and only if there exists a function
such that (2) does not hold. This observation is the key idea behind the tests that we construct and study. Naturally, a class
should be rich enough and possess good properties for further analysis. Let us describe it. Let
be a
semialgebraic subgraph class as introduced in [7]. Basically, for a function from such a class, its subgraph can be constructed as a union of intersections of a finite number of subgraphs of polynomials of finite degree in
. The same should be true for a product of two functions from such a class. For instance, one can use polynomials of bounded degree or trigonometric functions of bounded frequency as
. The precise definition can be found in the Appendix. We will provide a few examples later on.
Let
be the empirical distribution based on the sample
Assume in what follows that
is a
-consistent estimator of
. And, furthermore, there exists such a function
that
and as
For example, assuming
let us define
and
as
and
where all vectors are columns and superscript
T denotes transposition. Then one can define
in examples 2-4 and
in example 1 for any
. Under the condition
is a
-consistent estimator of
. Weaker moment assumptions on
P can be imposed if other statistics are considered for estimation of
, such as a sample median and an
M-estimator for the covariance matrix. See, e.g., [14].
The scaled residuals of the observations
are defined as
Let
denote the empirical distribution based on the sample
Our approach to the problem of testing for
-symmetry will be to use the sup-norms of the stochastic process
as test statistics
Such functionals can be viewed as “measures of asymmetry” of the empirical distribution because of the relationship (2).
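As an illustration of the scaled residuals for the moment-based estimators, here is a minimal sketch. It assumes the sample mean as the location estimate, a sample covariance with the 1/n normalization, and the symmetric inverse square root; these choices and the function name `scaled_residuals` are assumptions of the sketch, not a definitive implementation.

```python
import numpy as np

def scaled_residuals(X):
    """Return (Z, mu, inv_sqrt) with rows Z_j = inv_sqrt @ (X_j - mu).

    Assumptions of this sketch: mu is the sample mean, the covariance uses
    the 1/n normalization, and the symmetric inverse square root is taken.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / n
    eigval, eigvec = np.linalg.eigh(cov)
    if eigval.min() <= 0.0:
        raise ValueError("sample covariance is singular")
    # eigvec @ diag(eigval ** -0.5) @ eigvec.T
    inv_sqrt = (eigvec / np.sqrt(eigval)) @ eigvec.T
    # inv_sqrt is symmetric, so right-multiplication acts row by row
    Z = Xc @ inv_sqrt
    return Z, mu, inv_sqrt
```

By construction the residuals have zero sample mean and identity sample covariance, and a nonsingular affine map of the data changes them only by an orthogonal transformation.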
Note that a nonsingular affine transformation of the data results in an orthogonal transformation of the scaled residuals. If a class is invariant with respect to all orthogonal transformations (i.e. for all and any orthogonal transformation O we have ), then the test statistic defined as the sup-norm of the process is affine invariant. This is the case in the following examples.
Example 1.1. Consider
from example 1. Let
be the class of all half-spaces in
, where
denotes the unit sphere in
. Consider the class
For
, we have
The process
becomes
The test statistic is represented as
Example 1.2. Consider
from example 1. Let
be the class of all half-spaces in
as in the previous example. Let
for
with
. Then we have
. The process
becomes
and the test statistic looks like
In one-dimensional case
this test statistic becomes
One gets the expression that resembles a well known test for symmetry based on the empirical distribution function
,
See, for instance, the discussion in the paper [15].
Example 2.1. Consider
from example 2. Let
be the class of “caps” on the unit sphere
Consider the class
For
we have
The process
becomes now
The test statistic
can also be represented as
where
is the rearrangement of
such that
These tests were studied in [7] and [9].
Example 2.2. Consider
from example 2. Let
denote the linear space of spherical harmonics of degree less than or equal to
l in
and let
be the unit ball in
Denote
Then for
we have
and
, where
is the average of
on
In this case, the process
becomes
The statistic
becomes
where
denotes an orthonormal basis of the space
for
and 0 otherwise, and
of a set denotes the number of elements of the set. These tests were studied in [7] and [9], where their superiority over other tests in level preservation and power was shown both theoretically and in a simulation study. A similar approach was used to test for multivariate normality in [16]. The authors of [6] developed a different kind of test for ellipsoidal symmetry based on spherical harmonics.
Example 2.3. Consider
from example 2. Let
be the class of all half-spaces in
as in the example 1.1. For
where
we have
where
The process
in this case is
and the test statistic can be defined as
Test statistics of this type were systematically studied in [7, 9, 10].
Example 2.4. Consider
from example 2. Let
For
we have
where
Thus, the process
becomes
and the test statistic can be chosen as
Example 2.5. Consider
from example 2. Consider the class
For
we have
, where
denotes the Bessel function of the
l-th order, the constant
depends only on
d. The process
becomes
and the test statistic can be chosen as
Consider from example 3. Due to similarity between examples 2 and 3, one can choose the same classes of functions for from example 3. We give just one of the examples as an illustration.
Example 3.1. Consider the class
from example 2.1. Then for
we have
for all
, where
is the rotation on angle
,
. In this case, the process
is
The test statistic
can be also represented as
where
is the rearrangement of
such that
Example 4.1. Consider
from example 4. Let
be the class of all half-spaces in
as in the example 1.1. Denote
Then for
such that
for
we have
where the summation is over all permutations
of
. In this case the process
is
and the test statistic can be defined as
where the last supremum is taken over all combinations
out of
. Well known and frequently used Friedman’s rank tests are based on the similar choice of a class
. For reference, see the papers [17] and [11].
It is not hard to see that the function classes defined in examples 1.1, 1.2, 2.1–2.4, 3.1, 3.2, and 4.1 are semialgebraic subgraph. In addition, classes characterize the distribution in the case of examples 1.1, 2.1, 2.3, 2.4, 3.1, 4.1 above.
We say the class of transformations preserves the semialgebraic property if for any polynomial p on of degree less than or equal to r the set belongs to for some q and l (see Appendix for the definition). Classes , defined in examples 1–4, preserve the semialgebraic property.
Let
It follows from (2) that, for a
-symmetric distribution
P and for all
f
Let
denote the
P-Brownian bridge, i.e. a centered Gaussian process indexed by functions in
with the covariance
We will frequently use integral notation for
As always,
denotes the space of all uniformly bounded functions on
with the sup-norm
A sequence of stochastic processes
is said to converge weakly in
(in the sense of Hoffmann-Jørgensen) to the stochastic process
if and only if there exists a Radon probability measure
on
such that
is the distribution of
and, for all bounded and
-continuous functionals
we have
where
stands for the outer expectation, which is defined as
for a
. See, for instance, [18].
We assume in what follows that the class
satisfies standard measurability assumptions used in the theory of empirical processes (see [19] or [18]). We also need smoothness conditions (S) on
P and
, which are given in the Appendix.
3. Main Results
Theorem 1 Suppose that is a semialgebraic subgraph class, the smoothness conditions (S) hold and Define a Gaussian stochastic process whose distribution is a Radon measure in . Then the sequence of stochastic processes converges weakly in the space to the process In particular, if P is -symmetric with specifiers then the sequence converges weakly in the space to the process
Define the test statistics
Given
let
Let
be the hypothesis that
and let
be the alternative that
Also, denote by
the alternative that
P is
-asymmetric.
Theorem 1 and the well-known theorem of Cirel’son on continuity of the distribution of the sup-norm of Gaussian processes (see [20]) imply the following.
Corollary 1 Suppose all conditions of Theorem 1 hold. Under the hypothesis and under the alternative In particular, if characterizes the distribution, then under the alternative , i.e. for a fixed -asymmetric distribution P,
In most cases, however, the limit distributions of such statistics as
depend on the unknown parameters of the distribution
Thus, to implement the test one has to evaluate the distribution of the test statistic using, for instance, a bootstrap method. We describe below a version of the conditional bootstrap for
-symmetry testing. It is a generalization of the bootstrap method proposed in [7].
Given
let
denote the
-symmetric distribution with specifiers
It will be called
the -symmetrization of
Denote by
the
-symmetric distribution with specifiers
Let
, …,
be an i.i.d. sample from the distribution
defined on a probability space
One can construct such a sample using the following procedure. Take an i.i.d. sample
from
, which is a resampling from
. Define
Then conditionally on
,
is an i.i.d. sample from the
-symmetric distribution
In particular, for
from example 1
where
is a Rademacher i.i.d. sample, that is
with probability 1/2,
independent of
.
For
from example 2 one can take an i.i.d. sample
uniformly distributed on
and an i.i.d. sample
from
, the empirical distribution based on
, independent of
In other words,
is the resampling from the sample
. Then
For
from example 3 let
be an i.i.d. sample uniformly distributed on
independent of
, then
where
is a rotation on the angle
about 0.
Finally, for
from example 4 consider
n independent permutations
of
,
independent of
. Then
where
is a reflection transformation such that
for
.
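The four resampling schemes above share one pattern: resample the residuals, then apply an independent random group element to each of them. The sketch below works in the space of scaled residuals and leaves the mapping back by the inverse of the estimated affine transformation to the caller. The function names, the `group` labels, the reading of Example 1 as the two-element group {I, -I}, and the reading of Example 4 as coordinate permutations are assumptions made for illustration.

```python
import numpy as np

def resample_rows(Z, rng):
    """Draw n rows of Z with replacement (resampling from the empirical law)."""
    n = Z.shape[0]
    return Z[rng.integers(0, n, size=n)]

def random_group_action(Z, group, rng, k=None):
    """Apply an independent, uniformly drawn group element to each row of Z."""
    Z = np.asarray(Z, dtype=float)
    n, d = Z.shape
    if group == "sign":
        # Example 1 (assumed here to be G = {I, -I}): Rademacher signs
        eps = rng.choice([-1.0, 1.0], size=(n, 1))
        return eps * Z
    if group == "orthogonal":
        # Example 2: a Haar-random rotation of z is |z| times a uniform
        # point on the unit sphere (normalized Gaussian vector)
        U = rng.standard_normal((n, d))
        U /= np.linalg.norm(U, axis=1, keepdims=True)
        return np.linalg.norm(Z, axis=1, keepdims=True) * U
    if group == "polygon":
        # Example 3: rotation by 2*pi*j/k with j uniform on {0, ..., k-1}
        if d != 2 or k is None:
            raise ValueError("polygon group needs d == 2 and an integer k")
        phi = 2.0 * np.pi * rng.integers(0, k, size=n) / k
        c, s = np.cos(phi), np.sin(phi)
        return np.column_stack((c * Z[:, 0] - s * Z[:, 1],
                                s * Z[:, 0] + c * Z[:, 1]))
    if group == "permutation":
        # Example 4: an independent uniform permutation of the coordinates
        idx = np.argsort(rng.random((n, d)), axis=1)
        return np.take_along_axis(Z, idx, axis=1)
    raise ValueError(f"unknown group: {group}")
```

A bootstrap sample in the original coordinates would then be obtained by applying the inverse affine map to `random_group_action(resample_rows(Z, rng), group, rng)`.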
Let
denote the empirical measure based on the sample
and let
Define
the bootstrapped scaled residuals as
Let
denote the empirical distribution based on the sample
The bootstrap version of
is the process
Let
denote the set of all functionals
such that
for all
and
for all
. Given two stochastic processes
we define the following bounded Lipschitz distance:
where
denotes the outer expectation.
Now we are going to consider a bootstrap version of Theorem 1.
Theorem 2 Suppose that is a semialgebraic subgraph class, the smoothness conditions (S) hold and Then the sequence of stochastic processes converges weakly in the space to a version of the process (defined on the probability space ) in probability More precisely, In particular, if P is -symmetric, converges weakly to a version of the process
Define the test statistics
Given
let
In other words,
is a
-quantile of the distribution of
conditional on the sample
.
Then Theorems 1 and 2 imply the following.
Corollary 2 Suppose all the conditions of Theorems 1 and 2 hold. Under the hypothesis and under the alternative In particular, if characterizes the distribution, the bootstrap test is consistent against any asymmetric alternative (subject to the smoothness conditions (S)): under the alternative Thus, our method provides tests that are consistent against any -asymmetric alternative.
4. Detailed Example
In this section we provide an example for which we verify all the assumptions and supply a step-by-step computational algorithm. Let
and consider the problem of testing whether
P is an elliptically contoured measure (Example 2). For a vector
let
be its polar coordinates. For a fixed integer
l let
where
denotes the linear span of
with all the functions bounded by 1, see Example 2.2. This class
satisfies the following assumptions.
1. It characterizes the distribution only for
. For a finite
l it does not characterize the distribution, since one might find two different distributions
such that
and
for all
.
2.
is a semialgebraic subgraph class. Indeed, for
the sets
can be represented as unions of a finite number of intersections of polynomial sets of finite degree. For instance, for
we have the following representation
The representations for any
can be obtained similarly using trigonometric identities. Obviously, similar arguments work for sines, linear combinations of sines and cosines and products of any two functions from
.
3. is invariant with respect to all orthogonal transformations, which are rotations on the unit circle. Indeed, for any rotation on an angle a vector is transformed into the vector . So for any we have or , where both functions belong to the linear span of and are bounded by 1. So any linear combination of such functions, which is bounded by 1, would also lie in .
4. Condition (S2) holds. Indeed, is uniformly bounded by 1. For any and any we have for some constant , so that the measure of the set defined in (S2) is zero for .
Also note that the group
of all orthogonal transformations of
preserves the semialgebraic property. Indeed, for any rotation by an angle
the sets
are semialgebraic. The same holds for sines and linear combinations of sines and cosines.
We also require the following two conditions on P: (S1) holds and . The first condition is satisfied for absolutely continuous P with the uniformly bounded and continuously differentiable Lebesgue density with the corresponding derivative approaching zero at infinity faster than , . For example, distributions with densities on a finite support and normal distributions satisfy (S1). The last condition is satisfied if .
Given a random sample from a distribution P let us describe a step-by-step testing algorithm.
1. Obtain . In our example is the sample mean and is the square root of the sample covariance for the sample .
2. Calculate residuals .
3. Find the test statistics, which can be simplified as follows
where
is the polar coordinate of
.
4. Choose a number of bootstrap repetitions, say M. In practice one often takes a large number, for instance . Then the next four steps are repeated M times.
4.1. Generate a sample . In this example, first, generate a sample from a uniform distribution on the unit circle, independent of . Secondly, resample with replacement from to obtain . Thirdly, .
4.2. Obtain . In our example is the sample mean and is the square root of the sample covariance for the sample .
4.3. Calculate residuals .
4.4. Find the bootstrapped test statistics, which can be simplified as follows
where
is the polar coordinate of
.
5. Based on find the empirical -quantile of the distribution of , conditional on . Let us denote it as .
6. If then reject is elliptically contoured distribution, at the significance level .
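The steps above can be sketched in code for the planar case. Everything named here is an assumption of the sketch: the function names, the default values of `l`, `M` and `alpha`, the 1/n covariance normalization, the rule "reject when the statistic exceeds the empirical (1 - alpha)-quantile", and in particular the statistic itself, which is taken to be the Euclidean norm of the normalized sums of cos(k theta) and sin(k theta), k = 1, ..., l, as a stand-in for the simplified statistic of step 3. Because the critical value comes from the bootstrap, a different normalization of the statistic does not change the decision.

```python
import numpy as np

def standardize(X):
    """Steps 1-2: sample mean, inverse square root of the sample covariance,
    and the residuals (1/n covariance normalization is an assumption)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    w, V = np.linalg.eigh(Xc.T @ Xc / X.shape[0])
    inv_sqrt = (V / np.sqrt(w)) @ V.T
    return Xc @ inv_sqrt, mu, inv_sqrt

def harmonic_statistic(Z, l):
    """Step 3 (assumed form): norm of the sums of cos(k*theta), sin(k*theta)."""
    theta = np.arctan2(Z[:, 1], Z[:, 0])       # polar angles of the residuals
    k = np.arange(1, l + 1)
    c = np.cos(np.outer(theta, k)).sum(axis=0)
    s = np.sin(np.outer(theta, k)).sum(axis=0)
    return float(np.sqrt(np.sum(c**2 + s**2) / Z.shape[0]))

def bootstrap_symmetry_test(X, l=3, M=500, alpha=0.05, seed=0):
    """Return (statistic, bootstrap critical value, reject?)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z, mu, inv_sqrt = standardize(X)
    T = harmonic_statistic(Z, l)
    sqrt_cov = np.linalg.inv(inv_sqrt)         # maps residuals back to data scale
    radii = np.linalg.norm(Z, axis=1)
    T_boot = np.empty(M)
    for b in range(M):
        # Step 4.1: uniform directions, resampled radii, map back
        r = rng.choice(radii, size=n, replace=True)
        phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
        Z_tilde = r[:, None] * np.column_stack((np.cos(phi), np.sin(phi)))
        X_tilde = mu + Z_tilde @ sqrt_cov
        # Steps 4.2-4.4: re-estimate, recompute residuals and the statistic
        Z_boot, _, _ = standardize(X_tilde)
        T_boot[b] = harmonic_statistic(Z_boot, l)
    # Steps 5-6: empirical (1 - alpha)-quantile and decision
    q = float(np.quantile(T_boot, 1.0 - alpha))
    return T, q, bool(T > q)
```

A call such as `bootstrap_symmetry_test(X, l=3, M=1000)` on an n-by-2 data array returns the observed statistic, the bootstrap critical value and the decision. The assumed statistic is unchanged by rotations of the residuals, which matches the invariance property 3 above.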
5. Proofs
We use the ideas and methods of [7]. That technique was developed for ellipsoidal symmetry and needs to be adjusted for group symmetry. Basically, one should change
to
throughout the proofs. However, there are technical difficulties associated with using transformations
A instead of
, which are hidden in the proofs of the lemmas. We give a few details for completeness.
Let
denote a subset of all nonsingular linear transformations in
Given a transformation
denote
For a function
f on
let
Given a class
of functions on
, define
Now the process
is represented as
and the process
as
Clearly,
Define
Given a function
g on
we can write
where
A similar computation shows that
Let
Given a class
of functions, define
We state the following versions of the lemmas from [7], which describe smoothness properties of the functions introduced above and Donsker properties of the classes of functions given above. The convergence of transformations is with respect to the operator norm on the set of all linear transformations. Smoothness conditions are used in the proof of Lemma 1. Properties of Vapnik–Chervonenkis subgraph classes are used in the proof of Lemma 2. See [18] for details on Vapnik–Chervonenkis, Glivenko–Cantelli, and Donsker classes of functions.
Lemma 1 Suppose that P and satisfy the smoothness conditions (S). Then the following statements hold:
(C2) The function is differentiable at the point for any , and the Taylor expansion of the first order holds uniformly in .
(C3) Similarly, the function is differentiable at the point for any , and the Taylor expansion of the first order holds uniformly in .
(C4) The function is continuous with respect to A at uniformly in
(C5) The function is differentiable at the point for any and, moreover, the Taylor expansion of the first order holds uniformly in Moreover, the matrix-valued function is continuous at uniformly in
(C6) if then for all and
Lemma 2 For a uniformly bounded semialgebraic subgraph class the classes and are uniformly Donsker and uniformly Glivenko–Cantelli.
Proof of Theorem 1. Define a process
(C1) and
being a
P-Donsker class by Lemma 2 imply that we can use asymptotic equicontinuity to obtain
for all
. Clearly,
If
we have
Note that
Using (5) and
-consistency of
we obtain
as
uniformly in
. It follows from (C1) and
-consistency of
that
uniformly in
. Representations (6) and (7), the fact that
is a uniformly Donsker class from Lemma 2 and (C1) imply that the sequence
converges weakly in the space
to the Gaussian stochastic process
This implies the first statement of the theorem. If
P is $\mathcal{G}$-symmetric then
, which concludes the proof of Theorem 1.
Proof of Theorem 2. Define a process
By Lemma 2, the class
is uniformly Glivenko–Cantelli. This together with (C4) and representations (3), (4) implies that
uniformly in
. Similarly, since the class
is uniformly Glivenko–Cantelli, by (C5) and representations (3), (4), we obtain
uniformly in
.
Since
is a uniformly Donsker class, we can use Corollary 2.7 in [21] to prove that a.s.
converges weakly in the space
to the same limit as
, where
is the empirical measure based on a sample from
, i.e. to the
-Brownian bridge
Asymptotic equicontinuity and (C1) yield that for all
a.s.
Define
Since
we can write
If
we have
Note that
Using (8), (9) and standard asymptotic properties of the estimators
we obtain
as
uniformly in
. Here and in what follows the remainder term
converges to 0 as
uniformly in
in probability
.
Applying the asymptotic equicontinuity condition to the process
and using (C1), we obtain
as
Now we can write
Note that by (3) and (4)
Since, by Lemma 2,
is a uniformly Donsker class and since (C5) and (C6) hold, it is easy to prove the weak convergence of the processes
in the space
where
is a ball in
with the center
Using the asymptotic equicontinuity and (C6), we obtain
It follows from (C3) and standard asymptotic properties of the estimators
that
uniformly in
. Relationships (10)–(14) along with, again, Corollary 2.7 in [21], imply the statement of the theorem.
6. Conclusion
We propose and study a general class of tests for group symmetry, which encompasses different types of symmetry, such as ellipsoidal and permutation symmetries. Our approach is based on supremum norms of special empirical processes combined with bootstrap.
There are several advantages to our methodology. First, the test statistics are indexed by classes of functions that are rich enough and still relatively simple to use. This provides flexibility in choosing a class of functions suited to the problem, and hence an appropriate test. Second, when the class of functions characterizes the distribution, these tests are consistent against any fixed asymmetric alternative satisfying the smoothness conditions (S). Third, they are affine invariant whenever the class of functions is invariant with respect to all orthogonal transformations. Fourth, these are bootstrap tests. This could be considered a drawback, but the bootstrap is a way to deal with the complex asymptotic null distribution of a non-bootstrap semiparametric test, and the resulting tests have good theoretical properties. Fifth, this approach gathers separate ideas and methods developed for various types of symmetry under one umbrella: it provides a unified theory for studying statistical properties of seemingly different tests for different types of symmetry.
7. Appendix
Definition of a semialgebraic set. For any polynomial p on of degree less than or equal to r we will call a polynomial set of degree less than or equal to r in . Let denote the class of all polynomial sets in of degree less than or equal to r. Then any set from the union is called a semialgebraic set of degree less than or equal to r and order less than or equal to l, being the minimal set algebra generated by . Let denote the class of all semialgebraic sets of degree less than or equal to r and order less than or equal to l in .
A class of functions on is a semialgebraic subgraph class if and only if for some for all functions g from the set belongs to and for all functions from the set belongs to
Conditions on P and . We also introduce the following smoothness conditions on the distribution P and the class
(S1)
P is absolutely continuous with a uniformly bounded and continuously differentiable density
p such that for some
where
denotes the derivative of the density
p.
(S2) The class
is uniformly bounded and for all
and
Here
denotes Lebesgue measure in
and
The classes characterize the distribution. The classes
characterize the distribution in the case of Examples 1.1, 2.1, 2.3, 2.4, 3.1 and 4.1 above. Indeed, this is a well-known property of the classes used in Examples 1.1, 2.3 and 4.1. As to Example 2.4, we refer, e.g., to the paper [22] for similar statements. To prove that this is the case in Example 2.1 (and in Example 3.1, similarly), consider the map
Since this map is a Borel isomorphism (even a homeomorphism), it suffices to show that for any two finite measures
in
the condition
implies
We will prove that, in fact, for any two finite measures
on
the condition
where
is the class of all half-spaces in
implies that
(the previous statement then follows, since one can consider two measures in
both supported in
). The condition
is equivalent to the following one
for all
Using a standard approximation of Borel functions by simple functions, we extend this to the equality
that holds for all bounded Borel functions
If we set
and
we obtain that the characteristic functions of
P and
Q are equal, which implies that