1. Introduction
Goodness-of-fit tests are crucial in statistical inference: they allow researchers to assess whether a proposed distribution accurately reflects the observed data. These fundamental goodness-of-fit problems have a long history. The Cramér–von Mises test [1,2] provides a useful criterion for continuous distributions. For independent and identically distributed random variables $X_1, \ldots, X_n$ from a distribution with cumulative distribution function $F$, to test
$$H_0: F = F_0 \quad \text{versus} \quad H_1: F \neq F_0, \tag{1}$$
where $F_0$ is pre-specified, the Cramér–von Mises test statistic uses a squared distance between the empirical distribution function $F_n(x) = n^{-1} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$ and the given distribution $F_0$:
$$\omega_n^2 = n \int_{-\infty}^{\infty} \{F_n(x) - F_0(x)\}^2 \, \mathrm{d}F_0(x), \tag{2}$$
which is distribution-free if $F_0$ is continuous [3]. With the transformed random variables $U_i = F_0(X_i)$, $1 \le i \le n$, taking values in the range of $F_0$, one can equivalently test whether $U_1, \ldots, U_n$ follow the uniform $[0,1]$ distribution. For the order statistics $U_{(1)} \le \cdots \le U_{(n)}$, the Cramér–von Mises statistic can also be written as
$$\omega_n^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left( U_{(i)} - \frac{2i-1}{2n} \right)^2.$$
It has been well established that the empirical process of $U_1, \ldots, U_n$ converges weakly to a Gaussian process with covariance function $\min(u, v) - uv$ under the null hypothesis. As a consequence, it holds that
$$\omega_n^2 \Rightarrow W := \sum_{k=1}^{\infty} \frac{\chi_{1,k}^2}{k^2 \pi^2}, \tag{3}$$
where $\Rightarrow$ denotes convergence in distribution and the $\chi_{1,k}^2$'s are independent $\chi_1^2$ random variables (see Chapter 5 in [4]). Some variants of (2) with the incorporation of different weight functions have been further discussed (see, for example, [5,6], among others).
The information era has witnessed an explosion in the collection of high-dimensional data across a wide range of areas, where the number of dimensions can be comparable to or even much larger than the sample size. Model fitting and distribution verification represent a basic step in statistical inference for high-dimensional data [7,8]. Some progress has been made in the testing of high-dimensional multinomials for discrete data (see, for example, [9,10]). For continuous distributions, the goodness-of-fit test is much less understood in the high-dimensional case (see Zhang and Wu [11] for a Kolmogorov–Smirnov-type test and Liang et al. [12] for the special case of testing for high-dimensional normality).
In multivariate cases with fixed dimensions, Cramér–von Mises tests have also been investigated in a considerable body of literature. Rosenblatt [13] considered the transformation of writing the joint distribution into products of conditional distributions to facilitate Kolmogorov–Smirnov and Cramér–von Mises tests. The Cramér–von Mises test for independence has been studied by Blum et al. [14], Cotterill and Csörgő [15], Cotterill et al. [16], Genest and Rémillard [17], and Genest et al. [18], among others.
However, technical challenges arise when the dimension $d$ is large compared to the sample size $n$. Specifically, Portnoy [19] showed that the multivariate central limit theorem is generally not valid when $d$ grows too quickly with $n$. Heuristic arguments and classical fixed-dimension theory are inadequate in the high-dimensional case. Nevertheless, interestingly, we will show that a result similar to (3) can still be valid by using new technical tools to study the distribution of quadratic functions of high-dimensional stochastic processes.
An important direction in fitting and testing multivariate distributions is based on copula modeling, which has gained remarkable popularity in the last decade. Inspired by Sklar's representation [20], one can decompose inference on a multivariate distribution into the modeling of the marginal distributions and of the copula (see Genest and Nešlehová [21] and Joe [22] for more on copulas). A variety of tests have been developed to test the dependence structures implied by given copulas (see [23] for a survey and implementation). The majority of these tests focus on cases with a small number of dimensions. Hering and Hofert [24] considered testing Archimedean copulas in a high-dimensional setting. While copula modeling has received increasing attention, a simultaneous goodness-of-fit test on the margins in the high-dimensional setting remains unsolved. In this paper, we provide another solution to the goodness-of-fit problem by introducing Cramér–von Mises-type criteria, which can be used to test marginal distributions and copulas for multivariate continuous data, allowing the dimension $d$ to grow with the sample size $n$. Our focus is on the marginal tests.
One primary goal of the paper is to rigorously establish an asymptotic theory of the Cramér–von Mises-type test statistics for marginal distributions and copulas when the dimension is large. We show that the limiting distribution of the Cramér–von Mises-type test statistics can be written as a sum of weighted $\chi_1^2$ random variables, where the weights depend on the eigenvalues of the covariance function of the associated high-dimensional linear operators. In our asymptotic relation, we let $n \to \infty$, and we can allow for both a bounded dimension $d$ and an increasing dimension $d \to \infty$.
As another major contribution, two different procedures are proposed to estimate the limiting distribution of the proposed Cramér–von Mises test statistic: a plug-in calibration method and a subsampling method. The former estimates the eigenvalues of the covariance function to approximate the distribution of a linear combination of chi-squared distributions, the validity of which is guaranteed under normalized consistency, a new matrix convergence criterion. The latter can avoid estimating the eigenvalues by drawing inference from a large number of subsamples of the data.
This paper is organized as follows. In Section 2, we introduce the modified Cramér–von Mises statistics to test marginal distributions and copulas and develop the distributional approximation theory. In Section 3, we introduce two different procedures to implement the high-dimensional Cramér–von Mises test in practice and provide theoretical justification of the validity of both methods. In Section 4.1, we conduct a simulation study to evaluate the finite-sample performance of the two approaches proposed in Section 3. All the proofs are provided in Appendix A.
We introduce some notation. For any real-valued square matrix $A$, let $\mathrm{tr}(A)$ be the trace of $A$, $\|A\|$ be the spectral norm, and $\|A\|_F$ be the Frobenius norm. For a random variable $X$ and some constant $q \ge 1$, we define $\|X\|_q = (\mathbb{E}|X|^q)^{1/q}$. If $(Z_n)$ is a sequence of random variables and $(a_n)$ is a sequence of values such that $Z_n/a_n \to 0$ in probability as $n \to \infty$, we write $Z_n = o_{\mathbb{P}}(a_n)$. If, for any $\epsilon > 0$, there exist finite $M$ and $N$ such that $\mathbb{P}(|Z_n/a_n| > M) < \epsilon$ for any $n > N$, we write $Z_n = O_{\mathbb{P}}(a_n)$. For two sequences of positive numbers $(a_n)$ and $(b_n)$, we write $a_n \asymp b_n$ if there exist two positive constants $c_1$ and $c_2$ such that $c_1 b_n \le a_n \le c_2 b_n$ for all large $n$.
2. High-Dimensional Cramér–von Mises Test
In this section, we address the Cramér–von Mises-type test for high-dimensional data from a jointly continuous distribution. We first consider the testing of marginal distributions; without loss of generality, we test the uniform $[0,1]$ distribution for each margin. Let $X_i = (X_{i1}, \ldots, X_{id})^\top$, $1 \le i \le n$, be independent and identically distributed random vectors of dimension $d$. We aim to test
$$H_0: X_{1j} \text{ follows the uniform } [0,1] \text{ distribution for all } 1 \le j \le d. \tag{4}$$
Here, we can allow any dependence structure on the $d$ components of $X_i$. To account for the dimensionality, our Cramér–von Mises-type test statistic is defined by
$$T_n = \frac{1}{n} \sum_{j=1}^{d} \sum_{1 \le i \neq i' \le n} \int_0^1 \big( \mathbf{1}\{X_{ij} \le u\} - u \big) \big( \mathbf{1}\{X_{i'j} \le u\} - u \big) \, \mathrm{d}u. \tag{5}$$
Note that for $d = 1$, $T_n$ is a modified version of (2) with the diagonal elements ($i = i'$) removed. The computation of $T_n$ can be significantly simplified in view of the fact that, for $a, b \in [0,1]$,
$$\int_0^1 \big( \mathbf{1}\{a \le u\} - u \big) \big( \mathbf{1}\{b \le u\} - u \big) \, \mathrm{d}u = \frac{1}{3} - \max(a, b) + \frac{a^2 + b^2}{2}.$$
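A minimal R sketch of this computation follows, using the diagonal-removed form of $T_n$ and the closed-form kernel displayed above; both displays are reconstructions from the surrounding text, so the sketch should be checked against the original definitions.

```r
# Closed-form kernel: integral of (1{a<=u} - u)(1{b<=u} - u) over u in [0, 1].
cvm_kernel <- function(a, b) 1 / 3 - pmax(a, b) + (a^2 + b^2) / 2

# Diagonal-removed Cramer-von Mises-type statistic T_n for an n x d matrix X
# whose entries are hypothesized to be marginally uniform on [0, 1].
T_n <- function(X) {
  n <- nrow(X)
  total <- 0
  for (j in seq_len(ncol(X))) {
    K <- outer(X[, j], X[, j], cvm_kernel)  # n x n kernel matrix for margin j
    total <- total + sum(K) - sum(diag(K))  # drop the diagonal terms i = i'
  }
  total / n
}

set.seed(1)
X <- matrix(runif(200 * 50), nrow = 200)    # n = 200, d = 50 under the null
T_n(X)
```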
Let $g(x, u) = \mathbf{1}\{x \le u\} - u$ for $x, u \in [0, 1]$. For $(u, j), (v, j') \in [0,1] \times \{1, \ldots, d\}$, let the covariance function be $K\{(u, j), (v, j')\} = \mathrm{Cov}\big( g(X_{1j}, u), g(X_{1j'}, v) \big)$. We define the linear operator $\mathcal{K}$ as
$$(\mathcal{K} f)(u, j) = \sum_{j'=1}^{d} \int_0^1 K\{(u, j), (v, j')\} f(v, j') \, \mathrm{d}v. \tag{6}$$
According to Mercer's theorem [25], there exist countably many eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ and eigenfunctions $\phi_k$, $k \ge 1$, such that $K\{(u, j), (v, j')\} = \sum_{k \ge 1} \lambda_k \phi_k(u, j) \phi_k(v, j')$.
Theorem 1. Assume that $H_0$ in (4) is true. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ be the eigenvalues of the linear operator $\mathcal{K}$ in (6). We define
$$W = \sum_{k=1}^{\infty} \lambda_k (\chi_{1,k}^2 - 1),$$
where $\chi_{1,k}^2$, $k \ge 1$, represents independent $\chi_1^2$ random variables. Assume that, for some $\delta > 0$, condition (7) holds. Then, there exists some constant $C > 0$ such that the distributional approximation (8) between $T_n$ and $W$ holds.

Remark 1. In the special case of $d = 1$, condition (7) trivially holds when $H_0$ is true. In this case, the covariance function is $K(u, v) = \min(u, v) - uv$, $u, v \in [0, 1]$, and the linear operator $\mathcal{K}$ has eigenvalues $\lambda_k = 1/(k^2 \pi^2)$ and eigenfunctions $\phi_k(u) = \sqrt{2} \sin(k \pi u)$, $k \ge 1$ (see Chapter 5 in [4] for arguments). In this case, our result (8) agrees with the classical Cramér–von Mises criterion.

Remark 2. We now discuss the condition of the above theorem in multivariate scenarios and the convergence rate in the result (8). If the $d$ components are independent and identically distributed, then the required moment bounds can be obtained by applying Rosenthal's inequality [26]. In this case, elementary calculations imply that (7) holds for an arbitrary $\delta$ under a minimal growth condition on $n$, and the convergence rate in (8) is sharper for a larger $\delta$. More generally, if the components form a stable stationary process in the sense that its functional dependence measures are summable [27], then the same moment bounds hold and (7) is again satisfied. If the components are strongly dependent, the minimal condition also suffices for (7), and a larger value of $\delta$ is still favored to sharpen the rate. More generally, allowing any dependence structure among the components, condition (7) is the restriction we should impose on the orders of $d$ and $n$ for the validity of the theorem. As a special case, if we further assume that $d$ and $n$ satisfy a polynomial growth relation with exponent determined by $\delta$, then the rate in (8) is polynomial in $n$, with an order depending on how the exponent changes with $\delta$.

Remark 3. Theorem 1 above establishes a distributional approximation result for the diagonal-removed statistic $T_n$. We now consider adding the diagonals back and investigating the full version of the statistic, which retains the terms with $i = i'$ in (5). Under $H_0$, the diagonal terms concentrate around their mean, and the corresponding distributional approximation result, namely (9), holds under (7). In this case, the limiting distribution of the diagonal-included statistic is again a sum of weighted $\chi_1^2$ random variables. Therefore, in the low-dimensional scenario where $d$ is fixed and (7) is satisfied, it is asymptotically equivalent to use either the diagonal-removed statistic or the diagonal-included statistic. Both statistics have simple closed forms, allowing for straightforward computation based on the kernel identity above, and share a similar type of limiting distribution, namely a linear combination of chi-squared random variables. On the other hand, in the high-dimensional setting where $d$ is allowed to grow with $n$, the diagonal sum may be of the same order as, or much larger than, $n$. In this case, according to the central limit theorem, if the Lindeberg condition holds for the diagonal terms, the appropriately centered and scaled diagonal-included statistic satisfies the normal convergence in (10). Thus, (9) and (10) illustrate an interesting dichotomy in high-dimensional settings: the statistic with diagonals included can exhibit two different asymptotic distributions depending on the magnitudes of $n$ and $d$. This divergence in asymptotic behavior across different regimes presents practical challenges in determining the appropriate relationship between $n$ and $d$ and, consequently, which type of asymptotic distribution should be applied. Relying on a subjective choice of asymptotic distribution could lead to unreliable conclusions.
In contrast, using the diagonal-removed statistic $T_n$ offers the advantage of avoiding this dichotomy.

Testing for copulas can be addressed in a similar way. Assume that $X_1$ has a jointly continuous copula $C$, with each marginal distribution being uniform $[0,1]$. We test
$$H_0: C = C_0 \quad \text{versus} \quad H_1: C \neq C_0, \tag{11}$$
where $C_0$ is specified. The modified Cramér–von Mises test statistic then becomes
$$T_n^{c} = \frac{1}{n} \sum_{1 \le i \neq i' \le n} \int_{[0,1]^d} \big( \mathbf{1}\{X_i \le u\} - C_0(u) \big) \big( \mathbf{1}\{X_{i'} \le u\} - C_0(u) \big) \, \mathrm{d}C_0(u). \tag{12}$$
Let $g^{c}(x, u) = \mathbf{1}\{x \le u\} - C_0(u)$ for $x, u \in [0,1]^d$. For $u, v \in [0,1]^d$, let the covariance function be $K^{c}(u, v) = \mathrm{Cov}\big( g^{c}(X_1, u), g^{c}(X_1, v) \big)$. The associated linear operator $\mathcal{K}^{c}$ can be given by
$$(\mathcal{K}^{c} f)(u) = \int_{[0,1]^d} K^{c}(u, v) f(v) \, \mathrm{d}C_0(v).$$
Corollary 1 below provides the result for $T_n^{c}$ analogously.
Corollary 1. Assume $H_0$ in (11) is true. Let $\lambda_k^{c}$, $k \ge 1$, be the eigenvalues of the linear operator $\mathcal{K}^{c}$. We define $W^{c} = \sum_{k=1}^{\infty} \lambda_k^{c} (\chi_{1,k}^2 - 1)$, where $\chi_{1,k}^2$ represents independent $\chi_1^2$ random variables. Assume that, for some $\delta > 0$, the analogue of condition (7) holds. Then, the distributional approximation (13) between $T_n^{c}$ and $W^{c}$ holds.

As in the test of marginals, the asymptotic distribution of $T_n^{c}$ also depends on the eigenvalues $\lambda_k^{c}$. As a specific application, in many cases, we are interested in testing for independence. If the hypothesized distribution (for example, a joint normal distribution with a known mean and covariance matrix) admits explicit conditional distributions given preceding components, we can first apply the Rosenblatt transformation [13] to the data and then equivalently test whether the transformed sample is from a population uniformly distributed on the $d$-dimensional hypercube. Thus, the problem can be equivalently transformed to test
$$H_0: \text{the transformed vector is uniformly distributed on } [0,1]^d. \tag{14}$$
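For the joint normal case mentioned above, the Rosenblatt transformation can be computed from the Cholesky factor of the covariance matrix; the sketch below is a minimal illustration, with `rosenblatt_normal` a hypothetical helper name and the AR(1)-type covariance an arbitrary example.

```r
# Rosenblatt transform for a N(mu, Sigma) null with known parameters:
# the j-th coordinate is the conditional probability transform given
# coordinates 1, ..., j-1, computed via the lower Cholesky factor.
rosenblatt_normal <- function(X, mu, Sigma) {
  L <- t(chol(Sigma))                       # lower-triangular factor
  Z <- forwardsolve(L, t(X) - mu)           # standardized conditional residuals
  pnorm(t(Z))                               # n x d matrix in [0, 1]^d
}

set.seed(1)
d <- 5; n <- 100
Sigma <- 0.5 ^ abs(outer(1:d, 1:d, "-"))    # AR(1)-type covariance, for example
X <- matrix(rnorm(n * d), n) %*% chol(Sigma)
U <- rosenblatt_normal(X, mu = rep(0, d), Sigma = Sigma)
# Under the null, the rows of U are i.i.d. uniform on [0, 1]^d.
```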
A sequence of work has been devoted to the testing of independence (14) in classical settings with a fixed $d$. The Cramér–von Mises test has been investigated by Blum et al. [14], Cotterill and Csörgő [15], Cotterill et al. [16], and Genest and Rémillard [17], among others. Our result in Corollary 1 makes a theoretical contribution in the high-dimensional case, in which $d$ can increase with $n$.
We acknowledge that known marginals and copulas in high-dimensional scenarios may not be readily accessible in many practical applications. Our primary objective is to provide a theoretical investigation, and the results presented in this paper serve as a crucial initial advancement toward investigating more realistic settings, for instance, distributions with estimated parameters.
3. Estimating Distributions of Linear Combinations of Chi-Squares
In Section 2, we presented the asymptotic results of Theorem 1 and Corollary 1. If the eigenvalues $\lambda_k$ were known, then we could derive the distribution of $W$ either analytically or numerically via Monte Carlo simulation. With a significance level $\alpha$, such as $0.05$, take the critical value as $q_{1-\alpha}$, the $(1-\alpha)$th quantile of $W$. We reject $H_0$ if the value of $T_n$ exceeds $q_{1-\alpha}$. Similar steps apply for $T_n^{c}$. Practically, the eigenvalues are unknown. We can use (8) or (13) for high-dimensional Cramér–von Mises testing by estimating the distribution of $W$ (resp. $W^{c}$), which is a linear combination of chi-squared distributions, and computing the threshold value, i.e., $q_{1-\alpha}$ (resp. $q_{1-\alpha}^{c}$). A natural calibration involves estimating the eigenvalues and plugging them into the distribution of $W$ (resp. $W^{c}$). For the special case when $d = 1$, we can work out the eigenvalues as $\lambda_k = 1/(k^2 \pi^2)$. In most cases, however, it is highly nontrivial to analytically compute the covariance function and the associated eigenvalues. To this end, we propose procedures to implement the high-dimensional Cramér–von Mises test in practice. We introduce two different approaches. In Section 3.1, we propose a plug-in calibration approach, the validity of which is theoretically justified in Section 3.2. In Section 3.3, we introduce a subsampling approach with its theoretical guarantee. In this context, we estimate the distribution of $W$; the case of $W^{c}$ can be dealt with in a similar way.
Table 1 provides a summary of the notations commonly used in this section.
3.1. A Plug-In Procedure
Let $N$ be a sufficiently large integer and $p = Nd$. We first apply the discretization technique to the interval $[0, 1]$ by taking evenly spaced points $u_t = t/N$ for $1 \le t \le N$. Let $g(x, u) = \mathbf{1}\{x \le u\} - u$, and let the $p$-dimensional vector be $Y_i = \big( g(X_{ij}, u_t) \big)_{1 \le j \le d,\, 1 \le t \le N}$, where we recall that $X_i = (X_{i1}, \ldots, X_{id})^\top$ for $1 \le i \le n$. The covariance matrix is denoted as $\Gamma = \mathrm{Cov}(Y_1)$, with eigenvalues $\lambda_1(\Gamma) \ge \cdots \ge \lambda_p(\Gamma)$. Note that $\Gamma/N$ is a discretized version of the linear operator $\mathcal{K}$ given in (6) and that the eigenvalues of $\Gamma/N$ converge to those of $\mathcal{K}$ as $N \to \infty$ (cf. [25]). Recall that $\alpha$ is the significance level and $q_{1-\alpha}$ is the $(1-\alpha)$th quantile of $W$. The plug-in calibration approach consists of the three steps listed below.
Step 1: Find a good estimate of $\Gamma$ based on the data $X_1, \ldots, X_n$ and denote it by $\widehat{\Gamma}$, with eigenvalues $\lambda_1(\widehat{\Gamma}) \ge \cdots \ge \lambda_p(\widehat{\Gamma})$. Let
$$\widehat{W} = \frac{1}{N} \sum_{k=1}^{p} \lambda_k(\widehat{\Gamma}) \, (\chi_{1,k}^2 - 1), \tag{15}$$
where $\chi_{1,k}^2$ represents i.i.d. $\chi_1^2$ random variables that are independent of $X_1, \ldots, X_n$.

Step 2: Given $\widehat{\Gamma}$, obtain the $(1-\alpha)$th quantile of $\widehat{W}$, either analytically or by extensive simulation.

Step 3: Based on this quantile, compute $\widehat{q}_{1-\alpha}$, which serves as an estimate for $q_{1-\alpha}$. Then, use $\widehat{q}_{1-\alpha}$ as an estimate for the critical value of $W$ at a significance level of $\alpha$.
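A compact R sketch of Steps 1–3 follows, under the discretized-indicator construction and sample covariance estimate assumed in this rewrite; the scaling by $N$ and the centering of the chi-squared terms mirror the reconstructed displays and are not taken verbatim from the original.

```r
# Plug-in calibration (Steps 1-3): estimate the eigenvalues of the
# discretized covariance matrix and simulate the plug-in limit law.
# Intended for illustration with small N and d; p = N * d can be very large.
plugin_critical_value <- function(X, N = 100, alpha = 0.05, nrep = 2000) {
  n <- nrow(X); d <- ncol(X)
  u <- (1:N) / N
  # p-dimensional discretized vectors Y_i (assumed construction)
  Y <- matrix(NA_real_, n, N * d)
  for (j in 1:d) Y[, (j - 1) * N + 1:N] <- outer(X[, j], u, "<=") - rep(u, each = n)
  lam <- eigen(cov(Y), symmetric = TRUE, only.values = TRUE)$values / N
  lam <- pmax(lam, 0)
  # Simulate W-hat = sum_k lam_k (chi2_k - 1); take its (1 - alpha)th quantile
  Wrep <- replicate(nrep, sum(lam * (rchisq(length(lam), df = 1) - 1)))
  quantile(Wrep, 1 - alpha)
}
```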
To ensure the validity of the above procedure, we need to impose suitable conditions so that the following requirements are met:
- (i) The estimated quantile $\widehat{q}_{1-\alpha}$ is close to the $(1-\alpha)$th quantile of $W$;
- (ii) The ratio consistency $\widehat{q}_{1-\alpha} / q_{1-\alpha} \to 1$ holds in probability.
As discussed in Section 3.2, (i) requires that $N$ be sufficiently large and that $\widehat{\Gamma}$ be a normalized consistent estimate of $\Gamma$ (see Definition 1). In Section 3.2, we also discuss its relation with the classical spectral norm convergence (26). In this paper, we verify that the sample covariance matrix, i.e.,
$$\widehat{\Gamma} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})(Y_i - \bar{Y})^\top, \qquad \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i,$$
can be a good choice of $\widehat{\Gamma}$ in Step 1, and we illustrate the validity of the procedure using it in Theorem 2. The eigenvalues of the sample covariance matrix are denoted by $\lambda_1(\widehat{\Gamma}) \ge \cdots \ge \lambda_p(\widehat{\Gamma})$. We can define $\widehat{W}$ and $\widehat{q}_{1-\alpha}$ accordingly.
Theorem 2. Under $H_0$ in (4) and as $n \to \infty$, it holds that $\widehat{q}_{1-\alpha} / q_{1-\alpha} \to 1$ in probability, and, with probability converging to 1, the quantile approximation holds, where $\mathbb{P}^{*}$ denotes the conditional probability given $X_1, \ldots, X_n$.

Remark 4. To deal with $T_n^{c}$, we discretize the hypercube $[0,1]^d$. Let $N$ be a sufficiently large number; the resulting covariance matrix is a discretized version of the linear operator $\mathcal{K}^{c}$. Using similar arguments as in Theorem 2 and adapting the conditions accordingly, we can verify the validity of the procedure when it is applied to $T_n^{c}$.
Although Theorem 2 provides a theoretical guarantee for the plug-in method, practical implementation remains challenging due to the ultra-high dimension of the discretized sample covariance matrix $\widehat{\Gamma}$. In particular, $\widehat{\Gamma}$ is of size $p \times p$, with $p = Nd$ for the marginal testing problem. Directly estimating such a large number of eigenvalues involves a significant computational workload. To address this, an equivalent approach, the Gaussian multiplier bootstrap, can be employed for computational efficiency. Recall the discretized vectors $Y_i$ and their sample mean $\bar{Y}$ from Section 3.1. Let $B$ be a sufficiently large number and let $\xi_{1b}, \ldots, \xi_{nb}$, $1 \le b \le B$, be i.i.d. $N(0, 1)$ multipliers. Let
$$W_b^{*} = \frac{1}{N} \left( \Big\| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \xi_{ib} (Y_i - \bar{Y}) \Big\|^2 - \mathrm{tr}(\widehat{\Gamma}) \right), \quad 1 \le b \le B.$$
Then, it follows that, given the data $X_1, \ldots, X_n$, $W_b^{*}$ is distributed identically to the plug-in statistic $\widehat{W}$. Therefore, we can approximate the distribution of $W$ according to the empirical distribution of $W_1^{*}, \ldots, W_B^{*}$; let $\widehat{q}_{1-\alpha}^{*}$ denote its empirical $(1-\alpha)$th quantile. Then, $H_0$ is rejected if $T_n > \widehat{q}_{1-\alpha}^{*}$. Below, in Algorithm 1, we provide the pseudo-code for the implementation of the plug-in calibration approach for the marginal testing problem. It can easily be adapted to test for joint distributions. In the implementation, we recommend taking the values of $N$ and $B$ as large integers, such as 1000.
Algorithm 1: Cut-off value approximation by plug-in calibration
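The pseudo-code image for Algorithm 1 is not reproduced here; the following R sketch outlines one plausible implementation of the multiplier bootstrap calibration consistent with the description above, with the exact centering and scaling being reconstructions rather than the authors' verbatim algorithm.

```r
# Gaussian multiplier bootstrap calibration for the marginal test:
# avoids forming (and eigendecomposing) the p x p matrix Gamma-hat.
# N = 100 keeps the sketch light; the text recommends larger values.
multiplier_bootstrap_cutoff <- function(X, N = 100, B = 1000, alpha = 0.05) {
  n <- nrow(X); d <- ncol(X)
  u <- (1:N) / N
  # Discretized, centered indicator vectors, stored as an n x (N * d) matrix
  Y <- matrix(NA_real_, n, N * d)
  for (j in 1:d) Y[, (j - 1) * N + 1:N] <- outer(X[, j], u, "<=") - rep(u, each = n)
  Yc <- sweep(Y, 2, colMeans(Y))            # center the columns
  trace_hat <- sum(Yc^2) / n                # tr(Gamma-hat), without forming it
  Wstar <- replicate(B, {
    xi <- rnorm(n)                          # Gaussian multipliers
    s <- crossprod(Yc, xi) / sqrt(n)        # ~ N(0, Gamma-hat) given the data
    (sum(s^2) - trace_hat) / N              # centered, scaled bootstrap draw
  })
  quantile(Wstar, 1 - alpha)                # empirical (1 - alpha)th quantile
}
```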
3.2. Validity of the Plug-In Procedure
The idea of plug-in calibration is to approximate the distribution of $W$ according to that of $\widehat{W}$ defined in (15). To evaluate the validity of the distributional approximation, we first state a useful lemma concerning the closeness of two general linear combinations of chi-squared distributions.
Lemma 1. For $k \ge 1$, let $(a_k)$ and $(b_k)$ be two sequences of real numbers satisfying $a_1 \ge a_2 \ge \cdots \ge 0$ and $b_1 \ge b_2 \ge \cdots \ge 0$, and assume that both sequences are summable. Let $\chi_{1,k}^2$, $k \ge 1$, be i.i.d. $\chi_1^2$ random variables, and let $V_a = \sum_{k \ge 1} a_k (\chi_{1,k}^2 - 1)$ and $V_b = \sum_{k \ge 1} b_k (\chi_{1,k}^2 - 1)$. Then, there exists some constant $C > 0$ such that the Kolmogorov distance between the distributions of $V_a$ and $V_b$ is bounded by $C$ times a discrepancy measure between $(a_k)$ and $(b_k)$.

Similar to (15), we define
$$V = \sum_{k=1}^{p} \frac{\lambda_k(\Gamma)}{N} (\chi_{1,k}^2 - 1) \quad \text{and} \quad \widehat{V} = \sum_{k=1}^{p} \frac{\lambda_k(\widehat{\Gamma})}{N} (\chi_{1,k}^2 - 1),$$
where $\lambda_k(\Gamma)$ and $\lambda_k(\widehat{\Gamma})$ are the eigenvalues of $\Gamma$ and $\widehat{\Gamma}$, as defined in Section 3.1.
According to Lemma 1, plugging in $\lambda_k(\Gamma)/N$ and $\lambda_k(\widehat{\Gamma})/N$ for $(a_k)$ and $(b_k)$, respectively, we can bound the Kolmogorov distance between the distributions of $V$ and $\widehat{V}$. Note that the selection of $N$ is arbitrary and is independent of $d$: the eigenvalues of $\Gamma/N$ converge to those of $\mathcal{K}$ for any $d$ (cf. [25]), and, similarly, the quantiles of $V$ converge to those of $W$ as $N \to \infty$. By choosing a sufficiently large $N$, the two requirements ((i) and (ii)) described in Section 3.1 can be reduced to the following:
- (i') The estimated quantile $\widehat{q}_{1-\alpha}$ is close to the $(1-\alpha)$th quantile of $V$;
- (ii') The ratio consistency of $\widehat{q}_{1-\alpha}$ relative to the quantile of $V$ holds in probability.
For theoretical justification, we adopt the following more general setting in this subsection. Let $Z_1, \ldots, Z_n$ be independent and identically distributed $p$-dimensional random vectors with a mean of $\mu$ and a covariance matrix of $\Sigma$, with eigenvalues $\lambda_1(\Sigma) \ge \cdots \ge \lambda_p(\Sigma)$. Let $\widehat{\Sigma}$ be an estimate of $\Sigma$ with eigenvalues $\lambda_1(\widehat{\Sigma}) \ge \cdots \ge \lambda_p(\widehat{\Sigma})$. We employ the plug-in procedure described in Section 3.1 to approximate the distribution of $V$. According to Lemma 1, if the eigenvalue closeness condition (20) holds, then, with probability converging to 1, the quantile approximation (21) holds, where $\mathbb{P}^{*}$ denotes the conditional probability given $Z_1, \ldots, Z_n$. With (21), requirement (i') is met.
Next, we verify the condition (20). Interestingly, there is a simple sufficient condition for (20). According to Weyl's theorem (cf. Theorem 8.1.5 in Golub and Van Loan [28]), (20) follows from
$$\frac{\| \widehat{\Sigma} - \Sigma \|}{\mathrm{tr}(\Sigma)} \to 0 \quad \text{in probability}. \tag{22}$$
We say an estimator of a matrix is normalized consistent if (22) holds, as stated in the definition below.
Definition 1. Let $\widehat{\Sigma}$ be an estimator of $\Sigma$. We say $\widehat{\Sigma}$ is normalized consistent for $\Sigma$ if (22) holds.

Theorem 3 below indicates that, under mild conditions, the sample covariance matrix
$$\widehat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (Z_i - \bar{Z})(Z_i - \bar{Z})^\top, \qquad \bar{Z} = \frac{1}{n} \sum_{i=1}^{n} Z_i,$$
satisfies normalized consistency (22), which is a sufficient condition for $\widehat{\Sigma}$ to meet requirement (i'). Moreover, Theorem 3 shows that $\mathrm{tr}(\widehat{\Sigma}) / \mathrm{tr}(\Sigma) \to 1$ in probability and, thus, $\widehat{\Sigma}$ meets requirement (ii'). The theoretical justification of the validity of the plug-in procedure using the sample covariance matrix is then complete.
Theorem 3. Assume the moment-rate condition (24). Then, $\mathrm{tr}(\widehat{\Sigma}) / \mathrm{tr}(\Sigma) \to 1$ in probability, and the normalized error bound (25) holds, which further implies the normalized consistency (22) of $\widehat{\Sigma}$.

Remark 5. Inspection of the proof shows that Theorem 3 also holds for the sample covariance variant $n^{-1} \sum_{i=1}^{n} (Z_i - \mu)(Z_i - \mu)^\top$ when $\mu$ is known. Theorem 3 requires condition (24). This condition trivially holds if the entries of $Z_1$ are strongly dependent, in which case (24) reduces to a natural growth condition relating $p$ and $n$.
Next, we examine the condition (24) for the marginal testing problem, where the role of $Z_i$ is played by the discretized vector $Y_i$ and $p = Nd$. In the extreme case of complete dependency, where all the margins coincide, condition (24) holds under a minimal growth condition. If the margins are mutually independent, the condition fails when $d$ is much larger than $n$. Below, we provide an example to illustrate the sharpness of this condition in general.
Example 1. Let $Z_1, \ldots, Z_n$ be i.i.d. $p$-dimensional random vectors whose covariance matrix $\Sigma$ has one dominant eigenvalue, with the remaining eigenvalues equal. It can be computed that condition (24) holds if and only if the dominant eigenvalue is sufficiently large relative to the rest; Theorem 3 is then applicable.
On the other hand, we can show that, outside this regime, the sample covariance $\widehat{\Sigma}$ does not satisfy (22) and, thus, is not normalized consistent. In particular, we consider the eigendecomposition $\Sigma = Q D Q^\top$, where $D$ is a diagonal matrix whose entries are the eigenvalues of $\Sigma$ and $Q$ is the orthogonal matrix whose columns are the eigenvectors of $\Sigma$. Working in the eigenbasis, the transformed observations are independent with diagonal covariance $D$, and elementary calculations of the first two moments of the relevant quadratic forms show that the normalized error does not vanish in probability, so (22) fails in this regime. □

The concept of normalized consistency in (22) is closely related to, but different from, the classical definition of spectral norm consistency in the sense of
$$\| \widehat{\Sigma} - \Sigma \| \to 0 \quad \text{in probability}. \tag{26}$$
Normalized consistency does not generally imply spectral norm consistency (26). For example, let $p = n$ and $Z_1, \ldots, Z_n$ be i.i.d. standard normal random vectors, so that $\Sigma = I_p$. According to random matrix theory, (26) does not hold for the sample covariance matrix $\widehat{\Sigma}$, which is not a consistent estimate of $\Sigma$ (see Marčenko and Pastur [29], Wachter [30], Geman [31]). Indeed, the largest eigenvalue of $\widehat{\Sigma}$ converges to 4, while the smallest one converges to 0. However, the normalized consistency (22) holds, since $\| \widehat{\Sigma} - \Sigma \| = O_{\mathbb{P}}(1)$, while $\mathrm{tr}(\Sigma) = p \to \infty$. Without further conditions, spectral norm consistency (26) does not imply normalized consistency either. Proposition 1 relates these two types of convergence, which is of independent interest; a short numerical illustration of the example above follows.
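The following short R experiment, assuming the trace-normalized form of (22) used in this rewrite, reproduces the phenomenon numerically.

```r
# Empirical check of the example above: for p = n i.i.d. standard normal
# vectors, the sample covariance is spectrally inconsistent, yet its
# trace-normalized error is small (normalized consistency, as assumed here).
set.seed(1)
n <- p <- 500
Z <- matrix(rnorm(n * p), n, p)
S <- crossprod(Z) / n                   # sample covariance (mean known to be 0)
lam <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
range(lam)                              # approximately (0, 4), per Marchenko-Pastur
max(abs(lam - 1))                       # spectral error ||S - I|| stays O(1)
max(abs(lam - 1)) / p                   # trace-normalized error is tiny
```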
Proposition 1. For an estimate $\widehat{\Gamma}$ of $\Gamma$, under a suitable condition on $\mathrm{tr}(\Gamma)$ holding in probability, the normalized consistency (22) holds if and only if the corresponding spectral norm convergence holds.

3.3. A Subsampling Procedure
Plug-in calibration requires a discretization step and imposes condition (24) in addition to the conditions of Theorem 1. In this section, we provide a subsampling approach based on an idea different from that of the plug-in approach, which avoids the discretization step and covariance matrix estimation. For a subset $S \subset \{1, \ldots, n\}$, let $|S|$ be its cardinality. The marginal empirical distribution functions for the subsample and the entire sample are denoted as follows:
$$F_{S,j}(u) = \frac{1}{|S|} \sum_{i \in S} \mathbf{1}\{X_{ij} \le u\}, \qquad F_{n,j}(u) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_{ij} \le u\}, \qquad u \in [0, 1].$$
Associated with the subset $S$, the Cramér–von Mises-type subsampling statistic for the testing of marginal distributions is written as in (28), with the full-sample empirical distribution functions playing the role of the reference distribution. We consider a variant of (28) that excludes the diagonal elements; the additional centering quantities in this variant are introduced to remove the bias. We then introduce the empirical cumulative distribution functions (30) and (31), which provide a subsampling-based estimate of the distribution function of $W$.
As a slightly different version, one can also employ the following procedure. Let the index sets be $S_1, \ldots, S_K$, drawn as random subsets of $\{1, \ldots, n\}$ of size $m$. The subsampling empirical distribution function is then defined as in (31). Based on the result of (8), we can show that the subsampling approach is asymptotically valid (see Theorem 4 below for details, which shows that the empirical distribution functions defined in (30) and (31) both converge to the distribution function of $W$). The subsample proportion $m/n$ is a tuning parameter. Theoretically, we only require $m \to \infty$ and $m/n \to 0$.
Theorem 4. Let $m \to \infty$ and $m/n \to 0$. Assume that (7) holds, with $n$ therein replaced by $m$. Then, under $H_0$ in (4), (i) the convergence (32) holds for the subsampling empirical distribution function in (30); (ii) if the number of subsamples grows, then the convergence (32) also holds for the variant in (31).

The subsampling method avoids the discretization step. In addition, Theorem 4 imposes no additional condition beyond those of Theorem 1. It only requires condition (7), with $n$ replaced by $m$, where the subsample size $m$ can be any value such that $m \to \infty$ and $m/n \to 0$. Comparatively, plug-in calibration requires the additional condition (24) to guarantee its consistency, which may fail in high-dimensional settings (see Remark 5). On the other hand, a scrutinization of the proof also reveals that the convergence rate of the subsampling approximation is upper-bounded by the rate in (8) with $n$ replaced by $m$, which hampers the convergence rate if $n$ is small.
For ease of implementation, we provide the pseudo-code for the subsampling approach in Algorithm 2. A recommended choice takes the subsample size $m$ to be of a smaller order than $n$ and the subsampling number to be a large integer, such as 1000. A flowchart of the overall procedure for hypothesis testing is presented in Figure 1.
Algorithm 2: Cut-off value approximation by subsampling
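The pseudo-code image for Algorithm 2 is likewise not reproduced; the sketch below shows a simplified subsampling calibration that recomputes the diagonal-removed statistic (the `T_n` function from the earlier sketch) on random subsamples and centers at the known null c.d.f. rather than at the full-sample empirical c.d.f. used for bias removal in the text.

```r
# Subsampling calibration: approximate the null distribution of T_n by the
# empirical distribution of the statistic recomputed on random subsamples
# of size m << n (reusing T_n() from the earlier sketch).
subsampling_pvalue <- function(X, m = floor(nrow(X)^0.7), K = 1000) {
  n <- nrow(X)
  stat <- T_n(X)                                   # statistic on the full sample
  stat_sub <- replicate(K, T_n(X[sample(n, m), , drop = FALSE]))
  mean(stat_sub >= stat)                           # subsampling p value
}
```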
Remark 6. The subsampling approach can also be applied to the testing of joint distributions based on $T_n^{c}$; the subsampling test statistic can be corrected analogously.

4. Numerical Analysis
4.1. A Simulation Study
In this section, a simulation study is conducted to evaluate the performance of the test with plug-in and subsampling calibration. For $1 \le i \le n$ and $1 \le j \le d$, data are generated by
$$Z_{ij} = \sqrt{a}\, \epsilon_{i0} + \sqrt{1 - a}\, \epsilon_{ij},$$
where $\epsilon_{i0}, \epsilon_{i1}, \ldots, \epsilon_{id}$ represent i.i.d. standard normal random variables. The vectors $Z_i = (Z_{i1}, \ldots, Z_{id})^\top$ are i.i.d. $N(0, \Sigma^{(a)})$ random vectors, where $\Sigma^{(a)}$ has diagonal entries of 1 and off-diagonal entries of $a$, i.e.,
$$\Sigma^{(a)} = (1 - a) I_d + a \mathbf{1}_d \mathbf{1}_d^\top.$$
The value of $a$ controls the strength of the cross-sectional dependence among the components. We consider testing the null hypothesis in the marginal test.
Let $\Phi$ be the c.d.f. of the standard normal random variable. To test the marginal distributions, we first obtain $X_{ij} = \Phi(Z_{ij})$; the test is then equivalent to (4). The modified Cramér–von Mises statistics (5) are computed, and plug-in and subsampling calibration are performed following Algorithms 1 and 2, respectively, with large values of $N$ and $B$ and a subsample size $m$ of a smaller order than $n$. In the simulation study, we consider a range of sample sizes $n$, dimensions $d$, and dependence levels $a$. The $p$ values under the null hypothesis are reported from each of the 1000 repetitions.
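The data-generating mechanism can be sketched in R as follows, assuming the single-factor equicorrelation construction reconstructed above; the design values passed to `simulate_data` are placeholders rather than the settings reported in the original.

```r
# Generate one simulated dataset: Z_i ~ N(0, Sigma_a) with equicorrelation a,
# mapped to uniform margins via the standard normal c.d.f.
simulate_data <- function(n, d, a) {
  common <- rnorm(n)                               # shared factor per observation
  idio <- matrix(rnorm(n * d), n, d)               # idiosyncratic noise
  Z <- sqrt(a) * common + sqrt(1 - a) * idio       # equicorrelated normals
  pnorm(Z)                                         # uniform margins under H0
}
X <- simulate_data(n = 100, d = 500, a = 0.2)      # placeholder design values
```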
In Figure 2, Figure 3 and Figure 4, we present the empirical distribution of the $p$ values under $H_0$ for the various combinations of $n$, $d$, and $a$. Under $H_0$, the $p$ values obtained according to the true sampling distribution of the test statistic have a uniform distribution between 0 and 1. Therefore, the $p$ values estimated from an accurate approximation of the sampling distribution should have an empirical cumulative distribution function close to the 45-degree line. It is observed from Figure 2, Figure 3 and Figure 4 that, under $H_0$, the $p$ values of our Cramér–von Mises-type test obtained via subsampling calibration are uniformly distributed between 0 and 1 for all scenarios considered, which verifies Theorem 4. The plug-in procedure, on the other hand, results in uniform $p$ values in classical multivariate cases with a moderately low dimension, while in high-dimensional scenarios, the $p$ values are less dispersed in the cross-sectionally independent ($a = 0$) and weakly dependent (small $a$) cases, so the corresponding procedure tends to result in a true size smaller than the nominal size at small levels such as 0.05. This observation indicates that subsampling calibration outperforms plug-in calibration in high-dimensional settings with moderately large $n$ values, which aligns with Remark 5. The computational loads of the plug-in and subsampling procedures are similar, as the plug-in method is implemented with the Gaussian multiplier bootstrap, which avoids estimation of the ultra-high-dimensional covariance matrix and its eigenvalues (cf. Algorithm 1).
To assess the power, the data vectors are modified so that the marginal distributions of the first $\lfloor sd \rfloor$ margins follow a $t$ distribution, with the proportion $s$ evaluated at several values in the simulation, while the copula remains unchanged. Four methods, referred to as plug-in, subsampling, BH, and ZW, are compared. They are based on the modified Cramér–von Mises statistics (5) with plug-in and subsampling calibration, the Benjamini–Hochberg procedure (cf. [32]), and the Kolmogorov–Smirnov-type statistic presented in [11], respectively. The corresponding nominal significance levels and false discovery rate are set to 0.05. In Table 2, we report the proportions of rejection, i.e., the sizes and powers under the null and alternative hypotheses, respectively, obtained from 500 repetitions.
The size and power scenarios are from the same simulation repetitions. The plug-in procedure exhibits lower sizes than nominal, especially in the high-dimensional, weakly dependent settings, which aligns with the empirical CDF plots. When the changes are moderately sparse and dense, both calibrations of the modified Cramér–von Mises method outperform both the BH and ZW procedures in almost all the dimension and dependence combinations.
4.2. An Illustrative Example
Normality assumptions are usually imposed, for example, to construct forecast intervals in vector autoregressive (VAR) models. Even when this assumption is violated, central limit theorem-type results can, under certain conditions, still support the asymptotic normality of the estimation error of the conditional expectation. However, a depiction of the distribution of the white-noise component is required to construct the forecast intervals.
In this section, we illustrate the application of our proposed test statistic $T_n$ to the residuals from a high-dimensional VAR model studied in [33]. A VAR model was used to analyze a macroeconomic dataset compiled by Stock and Watson [34] and enlarged by Koop [35]. The full dataset contains 168 quarterly macroeconomic indicators between Quarter 2, 1959 and Quarter 4, 2007, which are available in The Journal of Applied Econometrics Data Archive.

In this application, we test the residuals from the medium-size VAR model following [33], where the VAR model estimated with the Lasso was considered. The time series includes 20 quarterly macroeconomic series, such as real gross domestic product (GDP251), the Consumer Price Index (CPIAUCSL), and the Federal Funds Rate (FYFF). The series is assumed to follow a VAR model driven by a white noise process. In [33], the 195 quarterly time points were partitioned such that the first 69 quarters' data were used to train the Lasso algorithm and estimate the coefficient matrix $A$, the next 75 were used for cross-validation, and the last 61 time points were used for evaluation of the performance. To test the marginal distribution of the residuals, we input the residuals extracted from the test set using the R functions provided by Nicholson et al. [33], with $n = 61$ and $d = 20$. The residuals are obtained with the BigVAR package and the data described in [33], which are available at https://cran.r-project.org/web/packages/BigVAR/index.html (accessed on 1 November 2024) and https://github.com/wbnicholson/BigVAR (accessed on 1 November 2024), respectively.
We remark that testing the normality of errors in high-dimensional VAR models is, itself, a challenging problem. Here, we only consider marginal tests of normality, which is a necessary but not sufficient condition for the joint Gaussian assumption. The mean vector of the prediction residuals is asymptotically 0 in the Euclidean norm under certain conditions (e.g., [36]). As an illustration of our test, here, we simply rescale each residual series with its sample standard deviation and test whether all the marginal distributions are standard normal. A more careful approximation and rigorous analysis of the residuals, as well as the test for joint normality, should be considered in future work.
The value of $T_n$ and the $p$ values estimated from the plug-in and subsampling methods both result in the rejection of the Gaussian marginal null hypothesis at a significance level of 0.05. Comparatively, classical multiple comparisons using individual Cramér–von Mises tests fail to reject the null hypothesis. Indeed, even the five smallest individual $p$ values lie above Bonferroni's cut-off value of $0.05/20 = 0.0025$ for a familywise error rate of 0.05, which suggests not rejecting any individual null hypothesis. Additionally, the Benjamini–Hochberg method [32] with a 0.05 false discovery rate also results in the rejection of none of the null hypotheses. Multiple comparisons are known to be conservative when cross-sectional dependence exists. A scrutinization of the QQ-normal plot (Figure 5) reveals that there are residual series that deviate remarkably from normality.
The smallest univariate Cramér–von Mises $p$ value is from the residuals of the FMRRA series (Depository Institution Reserves: total, adjusted for reserve requirement changes), which apparently has heavier tails than the Gaussian distribution. This $p$ value is also close to the Bonferroni correction cut-off value of 0.0025. We therefore consider fitting a $t$ distribution: using maximum likelihood estimation in R, we fit the FMRRA residuals with a $t$ marginal distribution with an estimated location, scale, and degrees of freedom.
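The original text does not name the routine used; one standard way to obtain such maximum likelihood estimates in R is `MASS::fitdistr`, sketched below with simulated stand-in data for the FMRRA residuals.

```r
# Maximum likelihood fit of a location-scale t distribution to one residual
# series (resid_fmrra is a hypothetical stand-in for the FMRRA residuals).
library(MASS)
set.seed(1)
resid_fmrra <- rt(61, df = 3)                  # placeholder data, n = 61
fit <- fitdistr(resid_fmrra, densfun = "t",
                start = list(m = median(resid_fmrra), s = mad(resid_fmrra), df = 5))
fit$estimate                                   # location m, scale s, and df
```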
Now, we test the null hypothesis that the FMRRA margin follows the fitted $t$ distribution while the remaining margins are standard normal. Applying the test (5), the estimated $p$ values from the plug-in and subsampling methods both exceed 0.05; thus, this null hypothesis is not rejected at the significance level of 0.05. Our test suggests that it is more appropriate to use these distributions, rather than all Gaussian margins, in further inference.