The limit distributions of the cointegrating rank test statistics are non-standard, as shown in the previous sub-sections; however, given the existing results in the literature, the distributions can be closely approximated by a gamma distribution identified by the first two moments. We first derive this approximation and then show how to implement it.
4.1. Derivation of Response Surface
The literature shows that the asymptotic distributions for cointegration rank testing are nearly gamma distributed. The approximating gamma distribution can be characterised either through the mean and variance of the asymptotic distribution or through the associated shape and scale parameters. The quality of the gamma-distribution approximation has been documented in several papers. Using analytic methods, Nielsen (1997) showed very good agreement between limit distributions and approximate gamma distributions in tests for unit roots. Doornik (1998) then conducted detailed simulation studies to demonstrate a similar agreement for standard full-system cointegration rank test statistics; see also Doornik (2003) for various tables of asymptotic quantiles produced by the gamma-distribution approximations. JMN also employed this method.
In order to apply the gamma approximation method, we first define parameters for shape and scale. By Theorem 3, the partial system statistic satisfies
, where the statistics
are identically distributed and also the pairs
are identically distributed. Thus, we get
Solve for
and
when
and insert above to get
Thus, it suffices to approximate the moments of the full sample distributions through simulation. Numerically, it appears that better approximations arise when approximating shape and scale parameters instead of mean and variance. We therefore write
From this, we get the shape and scale parameters as
Hence, we simulated
,
and
and constructed response surfaces to approximate the distribution of
. Following JMN and Doornik (1998), we applied a variety of data generating processes and present the results using response surface analysis.
The quantities , and were simulated for a set of given and relative break points. Following JMN, we chose as the maximum number of sub-samples, with a and b representing the smallest and the second smallest of relative sample lengths, respectively. For example, if along with , we then have and . The grid points a and b were selected in the same way as those for Figure 1 in JMN e.g., , so that they were subject to the constraints of and and the total number of their combinations was 20, along with the selection of non-stationary components . For the overall sample sizes or Ts, JMN used 10 integers derived from for but we quadrupled them in order to improve approximations to the underlying limit distributions of the response variables. Thus, we obtained a new set of 10 sample sizes, Ts, ranging from 200 to 2000. For and , this simulation design led to 1600 cases, while the number of cases was reduced to 1400 for as a result of missing values corresponding to .
The computational algorithm used in our study was based on Theorem 4. These asymptotic results justify simulating three sets of T-step random walks for broken linear-trend and constant cases and scaling them according to the pre-specified relative sample lengths. The number of simulation replications N was set at 100,000.
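The idea of approximating limit moments by simulating T-step random walks can be illustrated with a deliberately simplified sketch. It assumes the simplest possible setting (one non-stationary component, no breaks and no deterministic terms), which is not the design used in this paper; the function name and the default values of T, N and the seed are hypothetical and chosen only to keep the run short.

```python
import random


def simulate_trace_moments(T=200, N=2000, seed=12345):
    """Simulate the mean and variance of a rank-test-type statistic.

    Simplifying assumptions (not the paper's design): one random walk,
    no breaks, no deterministic terms. The statistic is the discrete
    analogue of (int B dB)^2 / int B^2 du.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(N):
        level = 0.0  # random walk value S_{t-1}
        num = 0.0    # running sum of S_{t-1} * e_t
        den = 0.0    # running sum of S_{t-1}^2
        for _ in range(T):
            e = rng.gauss(0.0, 1.0)
            num += level * e
            den += level * level
            level += e
        # The scaling by T cancels: (num/T)^2 / (den/T^2) = num^2 / den.
        stats.append(num * num / den)
    mean = sum(stats) / N
    var = sum((x - mean) ** 2 for x in stats) / (N - 1)
    return mean, var


mean, var = simulate_trace_moments()
print(f"simulated mean = {mean:.3f}, simulated variance = {var:.3f}")
```

Increasing T moves the simulated moments towards their limits, while increasing N reduces the Monte Carlo error; the design described above uses much larger values of both.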
For the response surface analysis, we used , and as the response variables, instead of the logged means and variance as in JMN. It turns out that the use of these response variables (, in particular) mitigates the residual heteroscedasticity problem, hence resulting in a reduction of the number of indicator variables required for and . Note that needs to be included in the set of response variables in any response surface study, in order to make use of Equation (30). In addition, note that taking the log of is not permissible, since covariance is not always positive.
Compared to JMN, we increased the maximum number of observations from
to
It was found that the large-sample (
) approximations of the mean and variance in small dimensions (
) tend to be rather different from those when T is small. This finding is consistent with Doornik (1998), who introduced a set of indicator variables taking the value 1 for
and
and assigned 0 otherwise; these indicators put residual heteroscedasticity under control even in the presence of influential values for
and
.
We regressed each of the three response variables,
,
and
on a set of regressors formed from
a,
b,
and
T. Our baseline functional form was a modified version of Equation (3.11) in JMN. In the context of the present paper, the equation in JMN is expressed as
where
y is either
,
or
, while
,
,
,
and
. Following Doornik (1998), we also added to this equation a set of indicator variables as explanatory variables, each of which is 1 for a selected value of dimension
and is 0 otherwise. Performing a series of regression analyses and carefully removing insignificant explanatory variables by utilising the Autometrics option available in PcGive (Doornik and Hendry 2013), we arrived at parsimonious response–surface functions for
,
and
; these functions are henceforth denoted
with
z taking values
and
, respectively.
Table A1 and Table A2 in Appendix A record the rounded coefficients for
a,
b,
and their variants in the response surface regression for the broken linear trend case and the broken constant case, respectively. The inverse of the observation number,
, and its variants such as
, also play critical roles in the response surface regression, but all of them are asymptotically irrelevant and are thus disregarded when calculating the limit approximations based on these tables.
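The role of the inverse sample size can be sketched as follows: if a simulated quantity is regressed on a constant and 1/T, the term in 1/T vanishes as T grows, so the fitted constant serves as the limit approximation. The sketch below uses a single 1/T regressor, an evenly spaced grid of sample sizes and synthetic data, all of which are illustrative assumptions rather than the specification or data of this paper.

```python
def fit_inverse_T(Ts, ys):
    """Ordinary least squares fit of y = c0 + c1 / T.

    Returns (c0, c1). The intercept c0 is the value implied for
    T -> infinity, since c1 / T vanishes in the limit.
    """
    xs = [1.0 / T for T in Ts]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = ybar - c1 * xbar
    return c0, c1


# Hypothetical grid of 10 sample sizes between 200 and 2000.
Ts = [200 * k for k in range(1, 11)]
# Synthetic responses with a known limit of 12.5 (for illustration only).
ys = [12.5 + 30.0 / T for T in Ts]

c0, c1 = fit_inverse_T(Ts, ys)
print(f"limit approximation (intercept) = {c0:.4f}")
```

In practice the regressions also involve a, b, the dimension and indicator variables; the point of the sketch is only that the terms in 1/T are dropped when the limit is evaluated.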
It should also be noted that a response–surface regression analysis of
was technically difficult in terms of residual diagnostic tests.
Doornik (1998) used the average of estimates for
when performing a response surface analysis for partial systems with no break. We adhered to the regression approach, rather than simply taking the average of the covariance estimates, by assigning importance to various significant influences of
a,
b and
on the behaviour of
This regression analysis indeed bore fruit and clarified the highly complex structure of the dependence of
on
a,
b,
and its variants, as shown in the third column of each of Table A1 and Table A2. These findings about
are new to the literature and thus add value to the response surface study conducted in this paper, although the impact of variation in
on the approximate shape and scale parameters may not always be large.
Table 1 and Table 2 display a set of examples demonstrating the accuracy of the response surface regression results. A set of approximate 95% limit quantiles is presented in each of the tables for various combinations of
a,
b,
and
, when either broken-linear-trend or broken-constant specifications are adopted in analysis. Approximate quantiles in the fifth column (
) in Table 1 and Table 2 are derived from Table A1 and Table A2, respectively; that is, they are from the full-system-based response surface analysis, combined with the mappings (29), (30) and (31). By contrast, approximate quantiles recorded in the sixth column (
) of each table, except those for
, were obtained directly from auxiliary response surface regressions based on partial-system simulations with the same Ts and N as above. Each of these auxiliary regression equations employed a simulated 95% quantile as a response variable and involved a constant,
and its powers if necessary, as explanatory variables. The regression equations vary in specification for the purpose of capturing the underlying smooth response surfaces of various simulated quantiles; the graph of each regression’s actual and fitted values was checked to ensure the capturing of the underlying smoothness. Estimated constants in these regression equations are recorded in the columns for
as approximate 95% limit quantiles. The limit quantiles in
for
(that is, no break cases) were taken from Doornik (2003).
Table 1 and Table 2 show that the quantiles in
almost coincide with those in
regardless of specifications of the deterministic terms; see the seventh column of each table for
, a series of absolute relative errors, all of which are very small. This correspondence can be seen as strong evidence supporting the validity of the proposed approximation method based on the full model. Furthermore, the eighth column of each table records a set of discrepancies in approximate p-values, defined as
, in which
represents a gamma density function calculated from simulated mean and variance. Most of the discrepancies are very small, and even the largest one is around 0.02 when
is relatively large, for which we should recall that a large value of
could give rise to various other distortion issues in practice. The overall evidence allows us to argue that the approximate quantiles work as useful critical values in applications from a practical viewpoint. The Supplementary Materials include Ox code for simulating the asymptotic distribution, which can be used if further precision is needed.
As a caveat in relation to large values for
, let us recall that our response surface regression was conducted using a realistic range for the number of non-stationary variables,
, which suffices in most applied research. Thus, an empirical study using a partial system of large dimension may require careful examination of the underlying cointegrating rank, in addition to the application of the proposed
tests to the data under study, as discussed by Juselius (2006, §8).
4.2. Implementation of Response Surface
The response surfaces in Table A1 and Table A2 are used as follows. They are designed for the situation with two breaks. However, Theorem 4 shows that, with a simple correction, they can also be used with a single break or no break.
In the case of sample periods and thus 2 breaks at , , where , we let a, b be the smallest and second-smallest relative sub-sample length. Thus, if , , so that . We choose and
In the case of sample periods and thus 1 break at , where , then , , so that . We let and .
In the case of
sample period and thus no break, let
. Theorem 4 and (28) show that the mean and variance for the cases where
can be found from those for
by choosing
as indicated and subtracting
and
, respectively.
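For the two-break case, the choice of a and b as the smallest and second-smallest relative sub-sample lengths can be sketched as below. The function name and the example break points are hypothetical, and the sketch covers only the case of three sample periods; the choices for a single break or no break are those stated in the text and are not reproduced in the code.

```python
def two_smallest_lengths(v1, v2):
    """Return (a, b) for two relative break points 0 < v1 < v2 < 1.

    a is the smallest and b the second-smallest of the three relative
    sub-sample lengths v1, v2 - v1 and 1 - v2.
    """
    if not 0.0 < v1 < v2 < 1.0:
        raise ValueError("break points must satisfy 0 < v1 < v2 < 1")
    lengths = sorted([v1, v2 - v1, 1.0 - v2])
    return lengths[0], lengths[1]


# Hypothetical example: breaks after 10% and 50% of the sample.
a, b = two_smallest_lengths(0.1, 0.5)
print(f"a = {a:.2f}, b = {b:.2f}")
```

Sorting the three lengths makes the result independent of where in the sample the short sub-samples occur, which matches the way a and b enter the response surfaces.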
Given the choices of
,
,
a and
b, compute the approximations to
Table A1 is used for the case with a broken linear trend, while Table A2 is used for the case with a broken constant. These approximations are then inserted in (31), which in turn is inserted into (29) and (30), while correcting for the number of breaks, that is,
Finally, we approximate the quantile of interest or the p-value of the observed
statistic using a gamma distribution with mean and variance matching (33) and (34). Equivalently, one can specify the shape and scale of the gamma distribution as
and
.
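The final step can be sketched in code under the standard moment-matching relations for a gamma distribution, shape = mean²/variance and scale = variance/mean. The sketch takes an approximate mean and variance as given inputs and returns a quantile or a p-value; the function names are hypothetical, the incomplete gamma function is evaluated with a textbook series and continued fraction, and the example uses chi-squared moments purely as a check, not values from Table A1 or Table A2.

```python
import math


def gamma_from_moments(mean, variance):
    """Shape and scale of the gamma distribution with the given moments."""
    return mean * mean / variance, variance / mean


def gamma_cdf(x, shape, scale):
    """Gamma distribution function (regularized lower incomplete gamma)."""
    if x <= 0.0:
        return 0.0
    a, z = shape, x / scale
    log_pref = a * math.log(z) - z - math.lgamma(a)
    if z < a + 1.0:
        # Series: P = pref * sum_{n>=0} z^n / (a (a+1) ... (a+n)).
        term = 1.0 / a
        total = term
        n = 0
        while abs(term) > 1e-15 * abs(total):
            n += 1
            term *= z / (a + n)
            total += term
        return math.exp(log_pref) * total
    # Continued fraction for the upper tail (modified Lentz method).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - math.exp(log_pref) * h


def gamma_quantile(prob, shape, scale):
    """Quantile by bisection on the distribution function."""
    lo, hi = 0.0, shape * scale + 1.0
    while gamma_cdf(hi, shape, scale) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma_p_value(stat, shape, scale):
    """Upper-tail probability of an observed statistic."""
    return 1.0 - gamma_cdf(stat, shape, scale)


# Check with chi-squared(2) moments: mean 2 and variance 4.
shape, scale = gamma_from_moments(2.0, 4.0)
print(gamma_quantile(0.95, shape, scale))  # about 5.9915
```

With a statistical library available, the same quantile and p-value could be obtained from its gamma distribution routines; the hand-written version is used here only to keep the sketch self-contained.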
A spreadsheet for implementing the response surfaces in Table A1 and Table A2 is available in the Supplementary Materials. This also includes an Ox program for simulating the asymptotic distributions and calculating p-values of observed test statistics for specifications outside the range covered by Table A1 and Table A2, for instance when the number of structural breaks is greater than 2 or
.