*Article* **Partial Cointegrated Vector Autoregressive Models with Structural Breaks in Deterministic Terms**

**Takamitsu Kurita 1,† and Bent Nielsen 2,\*,†**


Received: 10 December 2018; Accepted: 30 September 2019; Published: 6 October 2019

**Abstract:** This paper proposes a class of partial cointegrated models allowing for structural breaks in the deterministic terms. Moving-average representations of the models are given. It is then shown that, under the assumption of martingale difference innovations, the limit distributions of partial quasi-likelihood ratio tests for cointegrating rank have a close connection to those for standard full models. This connection facilitates a response surface analysis that is required to extract critical information about moments from large-scale simulation studies. An empirical illustration of the proposed methodology is also provided.

**Keywords:** partial cointegrated vector autoregressive models; structural breaks; deterministic terms; weak exogeneity; cointegrating rank; response surface

**JEL Classification:** C12; C32; C50

#### **1. Introduction**

Partial cointegration models with structural shifts in level or linear trends are quite common in practice; however, no formal analysis is available for these models. The likelihood analysis of the partial models with such breaks is based on reduced rank regression, just like the standard full cointegrated vector autoregressive models introduced by Johansen (1988, 1995). The main difference lies in the fact that likelihood-based tests for cointegrating rank in the partial models involve a set of new asymptotic distributions which reflect the combination of weakly exogenous regressors and broken deterministic terms. We generalise the standard assumption of normal innovations (Johansen 1995) to a flexible class of heterogeneous martingale difference innovations. We then derive the asymptotic distributions of the test statistics in question and provide a simulated response surface of the asymptotic distribution.

The presented models combine two widely used extensions of Johansen's original model. The first extension was a partial cointegrated system investigated by Harbo et al. (1998), referred to as HJNR henceforth; see also Pesaran et al. (2000). This partial system is a conditional vector autoregressive model for a vector of variables, *Yt*, given another vector of variables, *Zt*, as well as lags of both variables. They also presented simulated tables for asymptotic rank test distributions based on the partial system. Boswijk (1995) and Ericsson and MacKinnon (2002) explored the use of conditional autoregressive models. Recently, Cavaliere et al. (2018) considered information criteria based on the HJNR test statistics. The second extension was a full cointegrated system with structural breaks in a constant level or linear trend, a model explored by Johansen et al. (2000), referred to as JMN hereafter. This full model is a multivariate extension of model C of Perron (1989), where both level and linear trend slope change at the time of the break, as opposed to his models A and B, in which only one of the two is changing. Deterministic breaks in cointegrated systems have also been explored by Inoue (1999) and Hendry and Massmann (2007).

Each of the two extensions above has proved to be useful in empirical applications; furthermore, subsequent practical work has shown that we frequently require both extensions simultaneously. As an example, Bårdsen et al. (2005) built a large-scale model of the Norwegian economy by combining a number of smaller partial cointegration models. Each of these sub-systems is regarded as a partial model subject to structural shifts, and these types of models are useful in practice for empirical macroeconomic research. As it stands, however, the exact asymptotic properties of likelihood-based test statistics derived from the partial models with structural breaks are unknown, so that a formal econometric study based on these models is infeasible. This paper, therefore, conducts both analytical and simulation-based investigations into the unknown asymptotic properties so that researchers can perform a formal analysis using the partial models with structural breaks. Another example of these partial models is a trade model for the UK by Schreiber (2015), which we use as an empirical illustration later in this paper.

This paper shows that the asymptotic distributions of the proposed likelihood-based test statistics depend on the dimensions of the variables *Yt* and *Zt*, the cointegrating rank, and the number of breaks and their locations, but the distributions themselves are free of any unknown parameters. Hence, the limit distributions can be simulated given the above information, in a manner similar to Johansen (1995, §15), HJNR or MacKinnon et al. (1999). The Granger–Johansen representation for the full model in JMN is also reexamined as a basis for the required asymptotic study, and this reexamination can be viewed as a useful clarification of the roles of the starting values in the workings of the system. It should be noted that a condition for weak exogeneity reviewed in Section 2 is assumed to be satisfied when exploring the properties of the test statistics; the violation of this condition can give rise to a class of limit results that are unfavourable in applications, as discussed by Johansen (1992a). This assumption is testable by following an ex-post testing procedure suggested by Johansen (1992a) and others. We demonstrate this procedure in the empirical illustration in Section 5.

In deriving the asymptotic distributions of the test statistics, the assumption of normal innovations in Johansen (1995), HJNR and JMN, is relaxed to the assumption of martingale difference innovations, with a view to widening the scope of applications of the proposed models. This means we have to be careful in developing asymptotic arguments required for the quasi-likelihood ratio test statistics. We use martingale limit results of Anderson and Kunitomo (1992) and Brown (1971) for approximately stationary components and for non-stationary components, respectively.

Furthermore, it is shown that the derived asymptotic distributions can be approximated by gamma distributions, a class of common statistical distributions identified by their first two moments; the validity of this gamma-approximation method in various other existing models was documented by Nielsen (1997), Doornik (1998) and JMN. The study utilises the fact that the mean and variance of the limit distributions for the proposed partial models are expressible in terms of the mean and variance for full models and certain covariance terms. As a result, it is feasible to apply the gamma approximation method to simulation results based on the *full* models, in order to obtain precise limit quantiles of the test statistics for the proposed *partial* models. Hence, we are justified in conducting comprehensive simulations in the full-model framework, the results of which are applied in a response surface analysis combined with the gamma approximation method. The outcomes of the response surface analysis are tabulated in two tables, the accuracy of which is verified by moving back to the partial-model framework. The tables allow researchers to conduct formal applied studies with the proposed partial models. A brief empirical study is also provided.
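The moment-matching step behind the gamma approximation can be sketched in a few lines. The following is a minimal illustration (the moments below are made up for the example, not values from this paper's tables): a gamma distribution with shape $m^2/v$ and scale $v/m$ reproduces a given mean $m$ and variance $v$, and its quantiles then serve as approximate critical values.

```python
import numpy as np

def gamma_moment_match(mean, var):
    """Return the (shape, scale) of the gamma distribution whose first
    two moments match the given mean and variance:
    shape * scale = mean and shape * scale**2 = var."""
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale

# Hypothetical moments of a simulated rank-test limit distribution.
mean, var = 10.5, 21.0
shape, scale = gamma_moment_match(mean, var)

# Approximate the 95% critical value by sampling the fitted gamma.
rng = np.random.default_rng(0)
draws = rng.gamma(shape, scale, size=1_000_000)
q95 = np.percentile(draws, 95)
```

In the paper's procedure, the mean and variance fed into this step come from large-scale simulations of the full models and the response surface regressions, not from arbitrary values as above.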

Overall, this paper adds to the literature on time series econometrics and applied macroeconomics. As a result, the partial cointegrated models will be recognised as more flexible and practical devices for modelling and analysing non-stationary time series data containing structural breaks. For *I*(2) models, Paruolo and Rahbek (1999) proposed a partial analysis, while Kurita et al. (2011) introduced a model with deterministic shifts. In future work, it may be of interest to combine those ideas as well.

The rest of this paper consists of five sections. Section 2 introduces partial cointegrated models subject to deterministic breaks and their moving-average representations. Section 3 derives partial quasi-likelihood-based tests for cointegrating rank allowing for the breaks, and explores the limit distributions of the test statistics. Section 4 performs a response surface analysis using the simulated distributions and summarises the results of the analysis as a set of statistical tables. An empirical illustration of the proposed methodology is provided in Section 5. Finally, Section 6 gives concluding remarks. This study used *Ox* (Doornik 2013) and *PcGive* (Doornik and Hendry 2013) to conduct the simulations and the empirical study, respectively.

#### **2. Models and Representations**

We introduce partial cointegrated vector autoregressive models with deterministic breaks. Section 2.1 reviews the existing models, while Sections 2.2–2.4 provide details of the proposed models.

#### *2.1. Previous Models*

*The cointegrated vector autoregressive model* was proposed by Johansen (1988, 1995). Suppose that we observe a *p*-variate vector time series *Xt* integrated of order 1, denoted as *I*(1) hereafter. In the presence of two lags, a constant and a restricted linear trend, the model equation for *Xt* is

$$
\Delta X_t = (\Pi, \Pi_\ell) \begin{pmatrix} X_{t-1} \\ t \end{pmatrix} + \Gamma \Delta X_{t-1} + \mu + \varepsilon_t \quad \text{for} \quad t = 3, \dots, T,\tag{1}
$$

with the index $\ell$ indicating the linear-trend model. The associated cointegrating rank hypothesis is, for $r \le p$,

$$\text{rank}\,(\Pi,\Pi_{\ell}) \le r \qquad \text{so that} \qquad (\Pi,\Pi_{\ell}) = \alpha(\beta',\gamma). \tag{2}$$

Here, the initial values $X_1$ and $X_2$ are fixed, while the *p*-vector innovations $\varepsilon_3, \dots, \varepsilon_T$ are independently normally distributed, denoted $\mathsf{N}_p(0, \Omega)$. The parameters in Equation (1) are all variation free, defined as $\alpha, \beta \in \mathbb{R}^{p \times r}$, $\gamma \in \mathbb{R}^{r}$, $\mu \in \mathbb{R}^{p}$ and $\Gamma, \Omega \in \mathbb{R}^{p \times p}$, with $\Omega$ positive definite. This model is interpreted in terms of its Granger–Johansen representation. The likelihood function is maximised through reduced rank regression of $\Delta X_t$ on the vector $(X_{t-1}', t)'$ corrected for $\Delta X_{t-1}$. The cointegrating rank *r* can be determined through a sequence of rank test statistics, which have Dickey–Fuller-type limit distributions depending on the number of common trends, $p - r$ in this case, and with a linear trend adjustment. Once the rank is determined, asymptotic inference on the cointegrating vectors $\beta$ and the adjustment vectors $\alpha$ can be based on $\chi^2$ distributions.
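To fix ideas, here is a self-contained simulation of a bivariate special case of a model like (1), with hypothetical parameter values, rank $r = 1$ and the deterministic terms set to zero; it illustrates the defining feature that $\beta'X_t$ is approximately stationary while the individual components are *I*(1).

```python
import numpy as np

# Illustrative two-lag VAR with p = 2, r = 1 (parameter values are made up).
rng = np.random.default_rng(42)
T = 2000
alpha = np.array([[-0.5], [0.0]])   # adjustment coefficients
beta = np.array([[1.0], [-1.0]])    # cointegrating vector
Gamma = 0.1 * np.eye(2)             # short-run dynamics

X = np.zeros((T, 2))
for t in range(2, T):
    dX_lag = X[t - 1] - X[t - 2]
    dX = (alpha @ beta.T) @ X[t - 1] + Gamma @ dX_lag + rng.standard_normal(2)
    X[t] = X[t - 1] + dX

coint = (X @ beta).ravel()   # beta'X_t: a stable AR process
levels = X[:, 0]             # each component wanders like a random walk
```

With these values one unit root remains in the system, so the sample dispersion of `levels` dwarfs that of `coint`.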

*The partial model* is derived from the model given by Equation (1), which is referred to as *the full model* henceforth. The partial model allows for regressors whose own equations are not necessarily analysed. With a view to setting up the partial model, let us introduce an integer *m* satisfying $0 \le r \le m < p$, so that we can decompose *Xt* into an *m*-vector *Yt* and a vector *Zt* of dimension $p - m$. Decompose the parameters and error terms of Equation (1) conformably so that, for instance,

$$
\Pi = \begin{pmatrix} \Pi_y \\ \Pi_z \end{pmatrix}, \quad \Gamma = \begin{pmatrix} \Gamma_y \\ \Gamma_z \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_y \\ \mu_z \end{pmatrix}, \quad \varepsilon_t = \begin{pmatrix} \varepsilon_{y,t} \\ \varepsilon_{z,t} \end{pmatrix} \quad \text{and} \quad \Omega = \begin{pmatrix} \Omega_{yy} & \Omega_{yz} \\ \Omega_{zy} & \Omega_{zz} \end{pmatrix}.
$$

We also define the population regression coefficient $\omega = \Omega_{yz}\Omega_{zz}^{-1}$, which leads to the conditional coefficients $\Pi_{y \cdot z} = \Pi_y - \omega\Pi_z$, $\Gamma_{y \cdot z} = \Gamma_y - \omega\Gamma_z$ and $\mu_{y \cdot z} = \mu_y - \omega\mu_z$. The partial, or conditional, model for *Yt* given *Zt* is then presented as

$$
\Delta Y_t = \omega \Delta Z_t + (\Pi_{y \cdot z}, \Pi_{y \cdot z, \ell}) \begin{pmatrix} X_{t-1} \\ t \end{pmatrix} + \Gamma_{y \cdot z} \Delta X_{t-1} + \mu_{y \cdot z} + \varepsilon_{y \cdot z, t}, \tag{3}
$$

where the conditional innovation sequence $\varepsilon_{y \cdot z,t} = \varepsilon_{y,t} - \omega\varepsilon_{z,t}$ is $\mathsf{N}_m(0, \Omega_{yy \cdot z})$ distributed, so that $\varepsilon_{y \cdot z,t}$ is independent of $Z_t$ and the overall past of the series, while its variance is

$$
\Omega\_{yy\cdot z} = \Omega\_{yy} - \Omega\_{yz}\Omega\_{zz}^{-1}\Omega\_{zy}.\tag{4}
$$
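Numerically, $\omega$ and $\Omega_{yy \cdot z}$ in (4) are just a regression coefficient and a Schur complement. The following is a small sketch with an arbitrary positive definite $\Omega$, $p = 3$ and $m = 1$ (so $Y_t$ is scalar); the matrix entries are invented for illustration.

```python
import numpy as np

# Arbitrary positive definite innovation covariance for (Y_t', Z_t')'.
Omega = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
m = 1  # dimension of Y_t

Oyy, Oyz, Ozz = Omega[:m, :m], Omega[:m, m:], Omega[m:, m:]

omega = Oyz @ np.linalg.inv(Ozz)   # population regression coefficient
Oyy_z = Oyy - omega @ Oyz.T        # conditional variance, Equation (4)
```

Conditioning can only reduce the variance, and the Schur-complement identity $\det\Omega = \det\Omega_{zz}\,\det\Omega_{yy \cdot z}$ provides a quick check.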

The cointegration rank hypothesis is, for *r* ≤ *m*,

$$\text{rank}\left(\Pi_{y \cdot z}, \Pi_{y \cdot z,\ell}\right) \le r \qquad \text{so that} \qquad \left(\Pi_{y \cdot z}, \Pi_{y \cdot z,\ell}\right) = \alpha_{y \cdot z}(\beta', \gamma), \tag{5}$$

where $\alpha_{y \cdot z} = \alpha_y - \omega\alpha_z$. The marginal model for *Zt* is simply given as

$$
\Delta Z_t = \alpha_z(\beta', \gamma) \begin{pmatrix} X_{t-1} \\ t \end{pmatrix} + \Gamma_z \Delta X_{t-1} + \mu_z + \varepsilon_{z,t}. \tag{6}
$$

Due to the conditioning of *Yt* on *Zt*, the innovations $\varepsilon_{y \cdot z,t}$ and $\varepsilon_{z,t}$ are independent. Even so, the cointegrating relationships $\beta'X_{t-1} + \gamma t$ form cross-equation restrictions, so that maximum likelihood estimation involves a joint analysis of (3) and (6). The rank can be determined from a partial analysis using information criteria, albeit without size control, as argued by Cavaliere et al. (2018).

Weak exogeneity arises when $\alpha_z = 0$. In this case, the partial model and the marginal model are unrelated and *Zt* is weakly exogenous for a class of parameters of interest, $\alpha_y$, $\beta$ and $\gamma$, in the sense of Engle et al. (1983). See also Johansen (1992a, 1992b, 1995, §8) and HJNR. Maximum likelihood estimation can be performed by analysing the two models separately, i.e., the partial model is estimated by reduced rank regression while the marginal model is estimated by least squares regression. The maintained assumption is that the joint vector *Xt* has *r* cointegrating relations and hence $p - r$ common trends, with the cointegrating relations being in the partial model for *Yt*. A notable feature of the setup is that it is left unspecified whether or not *Zt* is cointegrated. In a one-lag model, *Zt* will not be cointegrated, but with further lags *Zt* could be cointegrated, since the short-run dynamics are determined by both $\alpha$ and $\Gamma$; see HJNR (p. 390) for an example of these models. HJNR explored an asymptotic theory for likelihood-based rank testing in the partial model (3). The asymptotic distribution of HJNR's rank test statistic is of the Dickey–Fuller type, now depending on both $m - r$ and $p - r$, which are the dimensions of the common trends for *Yt* and *Xt*, respectively. Seo (1998) suggested a class of cointegrated models in which a stationary regressor, $\Delta Z_t$, is included in a cointegration model. This corresponds to a model of the type (3), but with $X_{t-1}$ replaced by $Y_{t-1}$ only. In general, this results in inference that depends on nuisance parameters. Rahbek and Mosconi (1999) noticed that, if the stationary regressor $\Delta Z_t$ is cumulated and entered in the cointegrating vector $X_{t-1}$ as in (3), then the asymptotic distributions of HJNR would apply.

*Structural breaks in deterministic terms* were included in the full system model by JMN. The idea is to consider, say, two sub-samples starting at times $T_0$ and $T_1$, respectively, for $0 = T_0 < T_1 < T_2 = T$. The dynamic parameters in the model are the same for both sub-samples, while the parameters for deterministic terms can differ. In the model with lag length $k = 2$, the observations at times $T_{j-1}+1$ and $T_{j-1}+2$ for $j = 1, 2$ are held back as initial observations. Thus, the transition from one regime to the next is not modelled. Recently, Harvey and Thiele (2017) used a similar idea in a structural time series model.

#### *2.2. The Partial Model with Structural Breaks*

We are in a position to introduce a new model, a partial cointegrated model allowing for structural breaks in its deterministic terms.

We start by defining the timing of the sub-samples. Suppose we have *T* observations. We extend the partial model to one with a pre-specified number of sub-sample periods, *q* say, and *k* lags. Following JMN, we introduce the sub-sample structure $0 = T_0 < T_1 < \cdots < T_q = T$. Thus, for each sub-sample *j*, the effective range is $T_{j-1} + k < t \le T_j$. In summary, we have data for $0 < t \le T$, while the *effective sample* is the collection of *effective sub-samples*, that is,

$$T\_{j-1} + k < t \le T\_j \qquad \text{where} \qquad 1 \le j \le q. \tag{7}$$

The model has dynamic parameters that are common across the sub-sample periods, whereas the parameters for deterministic terms vary. This gives, for each effective sub-sample *j* as defined above:

$$
\Delta Y_t = \omega \Delta Z_t + \alpha_{y \cdot z}(\beta', \gamma_j) \begin{pmatrix} X_{t-1} \\ t \end{pmatrix} + \sum_{i=1}^{k-1} \Gamma_{y \cdot z,i} \Delta X_{t-i} + \mu_{y \cdot z,j} + \varepsilon_{y \cdot z,t}, \tag{8}
$$

where $\gamma_j \in \mathbb{R}^{r}$, $\mu_j = (\mu_{y,j}', \mu_{z,j}')' \in \mathbb{R}^{p}$ and $\mu_{y \cdot z,j} = \mu_{y,j} - \omega\mu_{z,j}$ for $j = 1, \dots, q$, along with $\Gamma_i = (\Gamma_{y,i}', \Gamma_{z,i}')' \in \mathbb{R}^{p \times p}$ and $\Gamma_{y \cdot z,i} = \Gamma_{y,i} - \omega\Gamma_{z,i}$ for $i = 1, \dots, k-1$; all the other parameters were defined in the previous sub-section. Note that the parameters for deterministic terms depend on *j*, indicating the presence of parameter shifts according to regime changes. The initial observations $X_{T_{j-1}+1}, \dots, X_{T_{j-1}+k}$ play the dual role of capturing the transition from the previous regime, $j-1$, and of serving as the initial observations for regime *j*. In some applications, the transition between the regimes may be longer than *k* observations, in which case more observations could be classified as initial observations. The marginal model for *Zt* under $\alpha_z = 0$ is

$$
\Delta Z_t = \sum_{i=1}^{k-1} \Gamma_{z,i} \Delta X_{t-i} + \mu_{z,j} + \varepsilon_{z,t}. \tag{9}
$$

We can form a full model equation as in Equation (1) for each sub-sample period. This is the model of JMN with weak exogeneity imposed. This model will be presented in the next sub-section.

The partial model can be formulated as a single equation for the full sample period in terms of the following notation. Following JMN, we define impulse dummy variables as

$$D_{j,t} = \begin{cases} 1 & \text{for } t = T_{j-1}, \\ 0 & \text{otherwise,} \end{cases} \quad \text{for } j = 1, \dots, q \quad \text{and} \quad t = 1, \dots, T,$$

so that $D_{j,t-i} = 1$ if $t = T_{j-1} + i$, and also define indicators for the effective samples as

$$E_{j,t} = \sum_{i=k+1}^{T_j - T_{j-1}} D_{j,t-i} = \begin{cases} 1 & \text{for } T_{j-1} + k < t \le T_j, \\ 0 & \text{otherwise,} \end{cases} \quad \text{and} \quad E_t = (E_{1,t}, \dots, E_{q,t})'.$$
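These definitions are easy to check mechanically. The sketch below builds $D_{j,t}$ and $E_{j,t}$ for a hypothetical setting with $T = 20$ observations, $k = 2$ lags and a single break at $T_1 = 10$ (so $q = 2$); the break dates are invented for illustration.

```python
import numpy as np

T, k = 20, 2
T_breaks = [0, 10, 20]       # T_0 < T_1 < T_2 = T
q = len(T_breaks) - 1

# D[j-1, t] = 1 iff t = T_{j-1}; the time index t runs over 1, ..., T.
D = np.zeros((q, T + 1), dtype=int)
for j in range(1, q + 1):
    if 1 <= T_breaks[j - 1] <= T:
        D[j - 1, T_breaks[j - 1]] = 1

# E[j-1, t] = 1 iff T_{j-1} + k < t <= T_j (the j-th effective sub-sample).
E = np.zeros((q, T + 1), dtype=int)
for j in range(1, q + 1):
    for t in range(T_breaks[j - 1] + k + 1, T_breaks[j] + 1):
        E[j - 1, t] = 1
```

Each effective sub-sample here has $10 - k = 8$ observations, and for $j = 2$ one can verify directly that summing $D_{2,t-i}$ over $i = k+1, \dots, T_2 - T_1$ reproduces $E_{2,t}$.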

The whole-sample model equation then has the form, with $X_{t-1}^{\ell} = (X_{t-1}', tE_t')'$, where the index $\ell$ indicates the model with a linear trend, for $t = k+1, \dots, T$,

$$
\Delta Y_t = \omega \Delta Z_t + (\Pi_y, \Pi_{y,\ell}) X_{t-1}^{\ell} + \sum_{i=1}^{k-1} \Gamma_{y \cdot z,i} \Delta X_{t-i} + \mu_{y \cdot z} E_t + \sum_{i=1}^{k}\sum_{j=2}^{q} \phi_{j,i} D_{j,t-i} + \varepsilon_{y \cdot z,t}, \tag{10}
$$

with cointegration rank hypothesis, for *r* ≤ *m*,

$$\mathsf{H}_{\ell}(r):\qquad \text{rank}\left(\Pi_y, \Pi_{y,\ell}\right) \le r \qquad \text{so that} \qquad \left(\Pi_y, \Pi_{y,\ell}\right) = \alpha_{y \cdot z}(\beta', \gamma). \tag{11}$$

Here, $\phi_{j,i} \in \mathbb{R}^{m}$ represents the parameters for $D_{j,t-i}$ for $i = 1, \dots, k$ and $j = 2, \dots, q$, while the parameters $\gamma$ and $\mu_{y \cdot z}$ are now redefined in a manner allowing for breaks as

$$\gamma = (\gamma_1, \dots, \gamma_q) \in \mathbb{R}^{r \times q} \quad \text{and} \quad \mu_{y \cdot z} = (\mu_{y \cdot z,1}, \dots, \mu_{y \cdot z,q}) \in \mathbb{R}^{m \times q},$$

which are used in the rest of this study. Equation (8), or its whole-sample form (10), is referred to as the partial model with a broken linear trend term.

#### *2.3. Representations*

Various properties of the proposed partial model (8) will be analysed using the Granger–Johansen representation of an *I*(1) process, which is formulated based on the full model for *Xt*; thus, the representation is the same as that in JMN (Theorem 2.1). In JMN, each sub-sample period is analysed conditionally on its initial observations. As a result, the representation for each sub-sample period is the same as that in Johansen (1995, Theorem 4.2). The initial values for each sub-sample can be large and thus be influential even in the asymptotic context, but, when following the underlying argument of JMN, one can see that such initial values do not play critical roles in the required asymptotic analysis. Following Kurita and Nielsen (2009), we show this in two steps: first, we analyse a homogeneous equation, and then consider the roles of deterministic terms by moving to a non-homogeneous equation. For further details, see the proof of Theorem 1 below.

For each sub-sample, the full model for *Xt* is defined as a joint system of (8) and (9) through:

$$
\Delta X_t = \alpha(\beta', \gamma_j) \begin{pmatrix} X_{t-1} \\ t \end{pmatrix} + \sum_{i=1}^{k-1} \Gamma_i \Delta X_{t-i} + \mu_j + \varepsilon_t, \tag{12}
$$

while the corresponding homogeneous equation is

$$
\Delta \tilde{X}_t = \alpha \beta' \tilde{X}_{t-1} + \sum_{i=1}^{k-1} \Gamma_i \Delta \tilde{X}_{t-i} + \varepsilon_t, \tag{13}
$$

where $\tilde{X}_t$ denotes a *p*-variate *mean-zero* vector time series. We then set up a companion vector based on (13) and analyse a companion form of this equation. Several choices are conceivable with respect to a companion form for (13), and we use the choice that appears, for instance, in Hansen (2005). For the purpose of studying details of the representation, the parameters need to satisfy Assumption 1 below. This is applicable to both (12) and (13). Some additional notation is required. When $\beta$ has full column rank *r*, let $\beta_{\perp}$ denote a $p \times (p-r)$ dimensional orthogonal complement, so that $(\beta, \beta_{\perp})$ is invertible and $\beta_{\perp}'\beta = 0$, and introduce the normalisation $\bar{\beta} = \beta(\beta'\beta)^{-1}$. The same notation applies to $\alpha$.

**Assumption 1.** *Assume that the roots of the characteristic polynomial,*

$$A(z) = (1 - z)I_p - \alpha \beta' z - \sum_{i=1}^{k-1} \Gamma_i (1 - z) z^i,$$

*are outside the complex unit circle or at unity; furthermore, assume that the matrices $\alpha$ and $\beta$ have full column rank r and that the square matrix $\alpha_{\perp}'\Psi\beta_{\perp}$ has full rank $p - r$, where $\Psi = I_p - \sum_{i=1}^{k-1}\Gamma_i$.*

Given Assumption 1, we can define $C = \beta_{\perp}(\alpha_{\perp}'\Psi\beta_{\perp})^{-1}\alpha_{\perp}'$. This matrix is often referred to as the impact matrix in the cointegration literature; see Paruolo (1997) for inference on this matrix.
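As a small numerical check (with hypothetical parameter values for $p = 3$, $r = 1$, $k = 2$), the orthogonal complements can be obtained from a singular value decomposition, and the resulting impact matrix satisfies $C\alpha = 0$ and $\beta'C = 0$ by construction.

```python
import numpy as np

def orth_complement(M):
    """Orthonormal basis of the orthogonal complement of span(M),
    taken from the trailing left-singular vectors of M."""
    p, r = M.shape
    U, _, _ = np.linalg.svd(M, full_matrices=True)
    return U[:, r:]

# Hypothetical parameter values satisfying Assumption 1.
alpha = np.array([[-0.4], [0.1], [0.0]])
beta = np.array([[1.0], [-1.0], [0.5]])
Gamma1 = 0.2 * np.eye(3)
Psi = np.eye(3) - Gamma1   # Psi = I_p - sum of the Gamma_i

a_perp = orth_complement(alpha)
b_perp = orth_complement(beta)

# Impact matrix C = beta_perp (alpha_perp' Psi beta_perp)^{-1} alpha_perp'.
C = b_perp @ np.linalg.inv(a_perp.T @ Psi @ b_perp) @ a_perp.T
```

The full-rank requirement on $\alpha_{\perp}'\Psi\beta_{\perp}$ in Assumption 1 is exactly what makes the inverse above well defined.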

We are now in a position to present the Granger–Johansen representation for each sub-sample period. For the homogeneous Equation (13), let us define

$$\mathcal{A} = \begin{pmatrix} \alpha & \Gamma_1 & \cdots & \Gamma_{k-1} \\ 0 & I_p & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & I_p \end{pmatrix}, \quad \Lambda = \begin{pmatrix} I_p & 0 & \cdots & 0 & 0 \\ I_p & -I_p & & & \vdots \\ 0 & I_p & \ddots & & \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & I_p & -I_p \end{pmatrix},\tag{14}$$

*Econometrics* **2019**, *7*, 42

as well as

$$\mathcal{B}' = \begin{pmatrix} \beta' & 0 \\ 0 & I_{p(k-1)} \end{pmatrix} \Lambda, \quad \mathfrak{X}_{t-1} = \begin{pmatrix} X_{t-1} \\ \vdots \\ X_{t-k} \end{pmatrix}, \tag{15}$$

and $\iota' = (I_p, 0, \dots, 0)$ together with $\tilde{r} = r + p(k-1)$. The representation is then given in the theorem below, the proof of which is provided in Appendix B.

**Theorem 1.** *Suppose that Assumption 1 is fulfilled. Then, the $\tilde{r}$-variate process $\mathcal{B}'\tilde{\mathfrak{X}}_t$ derived from Equation (13) satisfies, on the effective sample $T_{j-1} + k < t \le T_j$ for $1 \le j \le q$,*

$$
\mathcal{B}'\tilde{\mathfrak{X}}_t = (I_{\tilde{r}} + \mathcal{B}'\mathcal{A})\mathcal{B}'\tilde{\mathfrak{X}}_{t-1} + \mathcal{B}'\iota\,\varepsilon_t \quad \text{with} \quad |\text{eigen}(I_{\tilde{r}} + \mathcal{B}'\mathcal{A})| < 1,\tag{16}
$$

*which is a stable first-order vector autoregression. The solution to (13) is given as*

$$\tilde{X}_t = C \sum_{s=T_{j-1}+k+1}^{t} \varepsilon_s + \left\{(I - C\Psi)\bar{\beta}, C\Upsilon\right\} \mathcal{B}'\tilde{\mathfrak{X}}_t - C(\Psi, \Upsilon) \Lambda \tilde{\mathfrak{X}}_{T_{j-1}+k}, \tag{17}$$

*where $\Upsilon = (\Upsilon_1, \dots, \Upsilon_{k-1})$ with $\Upsilon_i = -\Gamma_i - \cdots - \Gamma_{k-1}$. Thus, the variable $X_t$ in (12) satisfies*

$$\begin{split} X_t = C \sum_{s=T_{j-1}+k+1}^{t} \varepsilon_s &+ (I - C\Psi)\bar{\beta}\beta'\tilde{X}_t - C\sum_{i=1}^{k-1}\Gamma_i\sum_{\ell=0}^{i-1}\Delta\tilde{X}_{t-\ell} \\ &- C\Psi\tilde{X}_{T_{j-1}+k} + C\sum_{i=1}^{k-1}\Gamma_i\sum_{\ell=0}^{i-1}\Delta\tilde{X}_{T_{j-1}+k-\ell} + \tau_{c,j} + \tau_{\ell,j}t, \end{split} \tag{18}$$

*for $\tilde{X}_t = X_t - \tau_{c,j} - \tau_{\ell,j}t$ with the parameters $\tau_{c,j}$ and $\tau_{\ell,j}$ satisfying*

$$
\Psi \tau_{\ell,j} = \alpha \beta'(\tau_{c,j} - \tau_{\ell,j}) + \mu_j \quad \text{and} \quad \beta' \tau_{\ell,j} + \gamma_j = 0.
$$

Note that the initial observations for the *j*-th sub-sample in (18) are expressed in terms of linear combinations of the mean-zero values $\tilde{X}_{T_{j-1}+1}, \dots, \tilde{X}_{T_{j-1}+k}$, so that we can in general argue that the starting values for each sub-sample period do not play critical roles in the asymptotic analysis. This property was not explicitly examined in JMN. Thus, Theorem 1 can be seen as a useful clarification of the roles of the initial values in the full cointegrated model subject to deterministic breaks. The Granger–Johansen representation is utilised in the proofs of the asymptotic theorems in Section 3.

As an alternative to the above sub-sample representation, one can derive a joint representation for the whole sample. For this purpose, we need a full system equation for $X_t$ over the entire sample period. This equation is derived from a combination of (12) over $j = 1, \dots, q$, augmented with the dummies $D_{j,t-i}$ and $E_{j,t}$ as in (10); that is,

$$
\Delta X_t = \alpha(\beta', \gamma) X_{t-1}^{\ell} + \sum_{i=1}^{k-1} \Gamma_i \Delta X_{t-i} + \mu E_t + \sum_{i=1}^{k}\sum_{j=2}^{q} \kappa_{j,i} D_{j,t-i} + \varepsilon_t, \tag{19}
$$

where $\kappa_{j,i} \in \mathbb{R}^{p}$ for $i = 1, \dots, k$ and $j = 2, \dots, q$, and $\mu = (\mu_1, \dots, \mu_q) \in \mathbb{R}^{p \times q}$; see Equation (2.6) in JMN. We then replace the innovations $\varepsilon_t$ with $\varepsilon_t^D = \varepsilon_t + \alpha\gamma t E_t + \mu E_t + \sum_{i=1}^{k}\sum_{j=2}^{q}\kappa_{j,i}D_{j,t-i}$ to reach a whole-sample representation such as

$$X_t \approx C \sum_{s=k+1}^{t} \varepsilon_s^D + C_1(L)\varepsilon_t^D + A \quad \text{for} \quad k < t \le T, \tag{20}$$

where $C_1(L)\varepsilon_t^D$ denotes a moving-average process whose coefficients decrease exponentially fast, and $A$ depends on the initial observations $X_1, \dots, X_k$ and satisfies $\beta'A = 0$. This is an approximation, since the precise formulation of the moving-average component requires the introduction of an infinite past, while the model is formulated conditionally on the initial observations. As before, the deterministic parts of the common trends $C\sum_{s=k+1}^{t}\varepsilon_s^D$ are piecewise constant since $C\alpha = 0$, so that each constant fails to cumulate to a linear trend. A similar approach was adopted in *I*(2) cointegration analysis by Kurita et al. (2011). The representation (20) is clear and concise, but the transition from one regime to another is treated as explicitly autoregressive, which may leave less flexibility to represent regime transitions of a persistent and messy nature. For the asymptotic study conducted below, we follow JMN by using the sub-sample representation (18).

#### *2.4. The Partial Model with Shifts in the Level*

In some applications, it suffices to exclude the broken linear trends and just include shifts in the constant term. By restricting the broken constant term to the cointegrating space, Equation (10) reduces to, with $X_{t-1}^{c} = (X_{t-1}', E_t')'$ and index *c* for the model with breaks in the constant level,

$$
\Delta Y_t = \omega \Delta Z_t + (\Pi_y, \Pi_{y,c}) X_{t-1}^{c} + \sum_{i=1}^{k-1} \Gamma_{y \cdot z,i} \Delta X_{t-i} + \sum_{i=1}^{k}\sum_{j=2}^{q} \phi_{j,i} D_{j,t-i} + \varepsilon_{y \cdot z,t}, \tag{21}
$$

and cointegration rank hypothesis, for *r* ≤ *m*,

$$\mathsf{H}_{c}(r):\qquad \text{rank}\left(\Pi_y, \Pi_{y,c}\right) \le r \qquad \text{so that} \qquad \left(\Pi_y, \Pi_{y,c}\right) = \alpha_{y \cdot z}(\beta', \gamma). \tag{22}$$

The Granger–Johansen representation has the same form as (18) but is subject to

$$
\beta' \tau_{c,j} + \gamma_j' = 0 \quad \text{and} \quad \tau_{\ell,j} = 0.
$$

#### **3. Testing for Cointegrating Rank in the Partial Models**

This section addresses the issue of testing for cointegrating rank in the suggested partial models with deterministic shifts. Section 3.1 introduces a partial likelihood ratio test for the choice of rank based on the broken linear-trend model and Section 3.2 derives its limit distribution. Section 3.3 then turns to the broken constant model and examines the test statistic based upon it. Finally, Section 4 derives a class of approximations to the limit distributions by means of computer simulations and response surface regression.

#### *3.1. Rank Test Statistic*

For each sub-sample period, the partial model (10) is equivalent to that in HJNR, given the presence of structural breaks in its deterministic terms. We derive the log partial likelihood ratio test statistic for the cointegrating rank hypothesis $\mathrm{H}_\ell(r)$ defined in (11). The likelihood is analysed by reduced rank regression in a manner similar to the original cointegration model of Johansen (1988, 1995). We show that the reduced rank regression can be carried out in three numerically equivalent ways.

The first approach is based on a full-sample reduced rank regression. Regress each of the vectors $\Delta Y_t$ and $X^\ell_{t-1} = (X_{t-1}', tE_t')'$ on a vector $H_t$ consisting of the variables $\Delta Z_t$, the lagged differences $\Delta X_{t-1}, \dots, \Delta X_{t-k+1}$, the intercepts $E_t$ and the impulse dummies $D_{j,t-i}$ for $i = 1, \dots, k$ and $j = 2, \dots, q$, so that $H_t$ has dimension $pk - m + q + k(q-1)$. This gives residuals $R_{0,t}$ and $R_{1,t}$:

$$
\begin{pmatrix} R_{0,t} \\ R_{1,t} \end{pmatrix} = \begin{pmatrix} \Delta Y_t \\ X_{t-1}^\ell \end{pmatrix} - \sum_{s=k+1}^T \begin{pmatrix} \Delta Y_s \\ X_{s-1}^\ell \end{pmatrix} H_s' \left( \sum_{s=k+1}^T H_s H_s' \right)^{-1} H_t \quad \text{for } k < t \le T.
$$
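The full-sample regression can be sketched in a few lines of numpy. The arrays `dY`, `X1` and `H` below are simulated stand-ins for $\Delta Y_t$, $X^\ell_{t-1}$ and $H_t$ (dimensions chosen for illustration only), not the paper's data:

```python
import numpy as np

def ols_residuals(lhs, rhs):
    """Residuals from regressing each column of lhs on rhs (rows index time)."""
    coef, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
    return lhs - rhs @ coef

rng = np.random.default_rng(0)
T, m, h_dim = 200, 3, 8
dY = rng.standard_normal((T, m))          # stand-in for Delta Y_t
X1 = rng.standard_normal((T, 5))          # stand-in for X^l_{t-1} = (X'_{t-1}, t E'_t)'
H = rng.standard_normal((T, h_dim))       # stand-in for H_t

R0 = ols_residuals(dY, H)                 # R_{0,t}
R1 = ols_residuals(X1, H)                 # R_{1,t}
```

By construction, the residuals are orthogonal to $H_t$, which is the property that the subsequent reduced rank regression relies on.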

*Econometrics* **2019**, *7*, 42

The second approach is a sub-sample approach. The impulse dummies result in a perfect fit for each transitional period between two adjoining regimes; thus, $R_{0,t}$ and $R_{1,t}$ are zero for all transitional periods. We can, therefore, compute the residuals $R_{0,t}$ and $R_{1,t}$ by analysing the effective sub-sample periods only; see also Doornik et al. (1998, §12.2). For this purpose, form regressors $P_t$ from the variables $\Delta Z_t$, the lagged differences $\Delta X_{t-1}, \dots, \Delta X_{t-k+1}$ and the intercepts $E_t$, so that $P_t$ is a vector of dimension $pk - m + q$. The residuals $R_{0,t}$ and $R_{1,t}$ then satisfy

$$
\begin{pmatrix} R_{0,t} \\ R_{1,t} \end{pmatrix} = \begin{pmatrix} \Delta Y_t \\ X_{t-1}^\ell \end{pmatrix} - \sum_{j=1}^q \sum_{s=T_{j-1}+k+1}^{T_j} \begin{pmatrix} \Delta Y_s \\ X_{s-1}^\ell \end{pmatrix} P_s' \left( \sum_{j=1}^q \sum_{s=T_{j-1}+k+1}^{T_j} P_s P_s' \right)^{-1} P_t
$$

for $T_{j-1} + k < t \le T_j$ with $1 \le j \le q$, while $R_{0,t}$ and $R_{1,t}$ are zero otherwise.
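A minimal numpy sketch of this sub-sample computation follows; the data and the regressor matrix `P` are placeholder stand-ins for $P_t$, not the paper's variables. The regression pools the effective observations of all regimes, and the transitional rows of the residual matrix stay at zero:

```python
import numpy as np

def subsample_residuals(Y, P, breaks, k):
    """OLS residuals of Y_t on P_t over the effective observations
    T_{j-1}+k < t <= T_j only; transitional rows are left at zero.
    `breaks` lists the regime boundaries [T_0=0, T_1, ..., T_q=T] (0-based)."""
    keep = np.zeros(len(Y), dtype=bool)
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        keep[lo + k:hi] = True
    coef, *_ = np.linalg.lstsq(P[keep], Y[keep], rcond=None)
    R = np.zeros_like(Y)
    R[keep] = Y[keep] - P[keep] @ coef
    return R

rng = np.random.default_rng(1)
T, k = 120, 2
breaks = [0, 40, 120]                        # q = 2 regimes, one break at T_1 = 40
Y = rng.standard_normal((T, 3))              # stand-in for the regressands
P = np.column_stack([rng.standard_normal((T, 4)),
                     (np.arange(T) < 40).astype(float),
                     (np.arange(T) >= 40).astype(float)])  # includes intercepts E_t
R = subsample_residuals(Y, P, breaks, k)
```

Since the residual rows outside the effective periods are zero, orthogonality of `R` to `P` holds over the full sample as well.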

The third approach is a two-step approach: we first demean the observed time series and then partial out the influence of the lagged differences. In the first step, we consider the two vectors $\Delta Y_t$ and $X^\ell_{t-1}$, along with a vector $V_t$ consisting of the variables $\Delta Z_t$ and the lagged differences $\Delta X_{t-1}, \dots, \Delta X_{t-k+1}$. These three vectors are demeaned within each sub-sample period, yielding $Z_{0,t}$, $Z_{1,t}$ and $Z_{2,t}$ defined as

$$
\begin{pmatrix} Z_{0,t} \\ Z_{1,t} \\ Z_{2,t} \end{pmatrix} = \begin{pmatrix} \Delta Y_t \\ X_{t-1}^\ell \\ V_t \end{pmatrix} - \frac{1}{T_j - T_{j-1} - k} \sum_{s=T_{j-1}+k+1}^{T_j} \begin{pmatrix} \Delta Y_s \\ X_{s-1}^\ell \\ V_s \end{pmatrix},\tag{23}
$$

for $1 \le j \le q$ and $T_{j-1} + k < t \le T_j$, and zero otherwise. In the second step, we compute

$$
\begin{pmatrix} R_{0,t} \\ R_{1,t} \end{pmatrix} = \begin{pmatrix} Z_{0,t} \\ Z_{1,t} \end{pmatrix} - \sum_{s=k+1}^{T} \begin{pmatrix} Z_{0,s} \\ Z_{1,s} \end{pmatrix} Z_{2,s}' \left( \sum_{s=k+1}^{T} Z_{2,s} Z_{2,s}' \right)^{-1} Z_{2,t}.
$$

Since $Z_{0,t}$ and $Z_{1,t}$ are zero within the transitional periods, so are the residuals $R_{0,t}$ and $R_{1,t}$. With the residuals $R_{0,t}$ and $R_{1,t}$ in hand, we can compute the product moments

$$
\begin{pmatrix} S_{00} & S_{01} \\ S_{10} & S_{11} \end{pmatrix} = \frac{1}{T - k} \sum_{t=k+1}^{T} \begin{pmatrix} R_{0,t} \\ R_{1,t} \end{pmatrix} \begin{pmatrix} R_{0,t} \\ R_{1,t} \end{pmatrix}' \tag{24}
$$

and a set of squared canonical correlations $1 \ge \hat{\lambda}_1 \ge \dots \ge \hat{\lambda}_m \ge 0$ by solving the eigenvalue problem

$$0 = \det(\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}).$$

Hence, the log partial likelihood ratio ($PLR$) test statistic for the null hypothesis of cointegrating rank $r$, $\mathrm{H}_\ell(r)$, against the hypothesis $\mathrm{H}_\ell(m)$ is

$$PLR\{\mathrm{H}_\ell(r)|\mathrm{H}_\ell(m)\} = -(T-k)\sum_{i=r+1}^{m} \log(1-\hat{\lambda}_i).\tag{25}$$
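The eigenvalue problem and the statistic (25) can be computed as follows; a Cholesky transformation of $S_{11}$ turns the determinantal equation into an ordinary symmetric eigenproblem. The residuals below are random placeholders (so the statistic itself is meaningless here); they merely exercise the computation:

```python
import numpy as np

def plr_statistic(R0, R1, T, k, r, m):
    """Squared canonical correlations of (R0, R1) and the PLR statistic (25)."""
    n = T - k
    S00 = R0.T @ R0 / n
    S01 = R0.T @ R1 / n
    S11 = R1.T @ R1 / n
    # solve det(lambda*S11 - S10 S00^{-1} S01) = 0 as a symmetric eigenproblem
    C = np.linalg.cholesky(S11)
    Ci = np.linalg.inv(C)
    M = Ci @ S01.T @ np.linalg.solve(S00, S01) @ Ci.T
    lam = np.sort(np.linalg.eigvalsh(M))[::-1]   # 1 >= lam_1 >= ... >= 0
    plr = -n * np.sum(np.log(1.0 - lam[r:m]))    # sum over i = r+1, ..., m
    return lam, plr

rng = np.random.default_rng(2)
T, k, m, dim1 = 200, 2, 3, 5                 # dim1 = dimension of X^l_{t-1}
R0 = rng.standard_normal((T - k, m))         # placeholder residuals R_{0,t}
R1 = rng.standard_normal((T - k, dim1))      # placeholder residuals R_{1,t}
lam, plr = plr_statistic(R0, R1, T, k, r=1, m=m)
```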

#### *3.2. Asymptotic Distribution of the Test Statistic*

We derive the asymptotic distribution of the rank test statistic in a setting where the relative break points satisfy $T_j/T \to v_j$ for $j = 0, \dots, q$ as $T$ goes to infinity. The relative break points satisfy $0 = v_0 < v_1 < \dots < v_q = 1$. With $\mathrm{int}(x)$ denoting the integer part of $x$, the $q$-vector $E_{\mathrm{int}(Tu)}$ on $u \in [0, 1]$ has the limit

$$e_u = \left\{ 1_{(v_0 < u < v_1)}, \dots, 1_{(v_{q-1} < u < v_q)} \right\}'. \tag{26}$$

In the standard framework developed by Johansen (1995), the innovation sequence $\varepsilon_t$ is assumed to be independent and identically Gaussian distributed, and this assumption was adopted by HJNR and JMN, as reviewed in Section 2.1 above. We relax the normality assumption to a martingale difference assumption. If the innovations $\varepsilon_t$ are not normal, the model equations define a quasi-likelihood function rather than a likelihood function. Weak exogeneity is preserved, as it is a property of the likelihood rather than of the distribution of the innovations as such. The partial innovation $\varepsilon_{y\cdot z,t} = \varepsilon_{y,t} - \omega\varepsilon_{z,t}$ and the marginal innovation $\varepsilon_{z,t}$ are uncorrelated, but they will not in general be independent once we move away from the normality assumption. We can then no longer appeal to the conditional-distribution argument implied in Equation (3); it is replaced with a regression argument, see Appendix C.2. The martingale difference assumption is summarised as Assumption 2 below.

**Assumption 2.** *Assume that $\varepsilon_t$ is a martingale difference sequence with respect to a filtration $\mathcal{F}_t$ such that $\mathrm{E}(\varepsilon_t|\mathcal{F}_{t-1}) = 0$ almost surely (a.s.). Let $\Omega$ be a positive definite matrix. Suppose that (i) $\mathrm{E}(\varepsilon_t \varepsilon_t') = \Omega$; (ii) $T^{-1} \sum_{t=1}^T \mathrm{E}(\varepsilon_t \varepsilon_t' | \mathcal{F}_{t-1}) \xrightarrow{P} \Omega$; and (iii) one of the following conditions holds:*

	- (*a*) $\sup_{t\in\mathbb{N}} \mathrm{E}\{\varepsilon_t' \varepsilon_t 1_{(\varepsilon_t' \varepsilon_t > a)} | \mathcal{F}_{t-1}\} \xrightarrow{P} 0$ *as* $a \to \infty$;
	- (*b*) $\sup_{t\in\mathbb{N}} \mathrm{E}|\varepsilon_t|^4 < \infty$.

The boundedness conditions in part (*iii*) are not nested. Part (*a*) can be satisfied without the existence of fourth moments as in part (*b*). Conversely, bounded fourth moments in part (*b*) do not necessarily imply part (*a*); see Remark A1 in Appendix C.1.

Under Assumption 2, we are able to apply the results of Brown (1971) to analyse the random walk components of the process. For this, we require a Lindeberg condition, which is established in Lemma A1 in Appendix C.1 under Assumption 2. Brown's result is for univariate martingale difference sequences and requires that the ratio of the sum of conditional variances to that of unconditional variances converges to unity. For the multivariate case, we can apply the Cramér–Wold device and form linear combinations of the multivariate martingale differences. Using parts (*i*) and (*ii*), we can then show that Brown's ratios converge to unity.

Under Assumption 2, we can also analyse the (approximately) stationary components of the process. Under part (*iii*.*a*), we can apply the results of Anderson and Kunitomo (1992), which exploit a truncation argument. Under part (*iii*.*b*) we can apply the same ideas as in Anderson and Kunitomo (1992) but without the truncation argument.

Cointegration models with heteroscedasticity have previously been analysed by, for instance, Cavaliere et al. (2010) and Boswijk et al. (2016). The former paper is concerned with rank testing in a full system. For the analysis of the (approximately) stationary components, it relies on Hannan and Heyde (1972), who require an almost sure version of Assumption 2(*ii*). The latter paper is concerned with testing on the cointegrating vectors in a full system with an elaborate, deterministic structure for the variances of the innovations.

Before proceeding to the main results, we present a set of stronger assumptions requiring constant conditional variance, see Assumption 3 and Lemma 1 below. These assumptions were used by Lai and Wei (1982, 1985) as well as Chan and Wei (1988). They have two advantages. First, they are easier to check for practitioners than the convergence results in Assumption 2. Second, the assumptions could be used to derive a variety of almost sure convergence results, as explored by Lai and Wei (1982, 1985) and Nielsen (2005), although we will not exploit those properties here.

**Assumption 3.** *Assume that $\varepsilon_t$ is a martingale difference sequence with respect to a filtration $\mathcal{F}_t$ such that $\mathrm{E}(\varepsilon_t|\mathcal{F}_{t-1}) = 0$ a.s. and*

	- (*i*) $\mathrm{Var}(\varepsilon_t|\mathcal{F}_{t-1}) = \Omega$ *a.s., where $\Omega$ is positive definite;*
	- (*ii*) $\sup_{t\in\mathbb{N}} \mathrm{E}(|\varepsilon_t|^{2+\xi} | \mathcal{F}_{t-1}) < \infty$ *a.s. for some $\xi > 0$.*

**Lemma 1.** *Assumption 3 implies Assumption 2.*

We can now present the limit distribution of the $PLR$ statistic (25), noting that, formally, it is a log partial quasi-likelihood ratio test statistic under Assumption 3. For this purpose, let $\xrightarrow{D}$ signify weak convergence, let $B_u$ represent a $(p-r)$-dimensional standard Brownian motion on $u \in [0, 1]$ and let $B_u^{(m-r)}$ be the first $m-r$ coordinates of $B_u$. The limit distribution is given in the next theorem, which is proved in Appendix C.3.

**Theorem 2.** *Suppose that Assumptions 1 and 2 are satisfied along with $\alpha_z = 0$, so that $Z_t$ is weakly exogenous with respect to $\alpha_y$, $\beta$ and $\gamma$. As $T \to \infty$, with relative break points satisfying $T_j/T \to v_j$ for $0 = v_0 < v_1 < \dots < v_q = 1$, the PLR test statistic (25) under $\mathrm{H}_\ell(r)$ satisfies*

$$PLR\{\mathrm{H}_\ell(r)|\mathrm{H}_\ell(m)\} \xrightarrow{D} \mathrm{DF}_\ell(m-r, p-r; v),\tag{27}$$

*where, with $e_u$ defined in (26),*

$$
\begin{aligned}
\mathrm{DF}_\ell(m-r,p-r;v) &= \mathrm{tr}\left\{\int_0^1 dB_u^{(m-r)} G_u' \left(\int_0^1 G_u G_u'\, du\right)^{-1} \int_0^1 G_u\, dB_u^{(m-r)\prime} \right\}, \\
G_u &= \begin{pmatrix} B_u \\ u e_u \end{pmatrix} - \int_0^1 \begin{pmatrix} B_s \\ s e_s \end{pmatrix} e_s'\, ds \left(\int_0^1 e_s e_s'\, ds\right)^{-1} e_u.
\end{aligned}
$$

Note that, when $p = m$, the result in Theorem 2 corresponds to Theorem 3.1 in JMN. A direct simulation of (27) is rather laborious; by exploiting some analytic properties of the distributions, we can simplify this simulation task. Theorem 3 below describes these properties by linking the moments of the limit distribution in Theorem 2 to those for the full model, and thereby provides a basis for the simulations in Section 4. The proof of the theorem, given in Appendix C.3, is based on a slight modification of results in Doornik (1998, §9); see also Boswijk and Doornik (2005).

**Theorem 3.** *Let $B_{i,u}$ be the $i$-th coordinate of the Brownian motion $B_u$. Let*

$$\mathrm{T}_i = \int_0^1 dB_{i,u}\, G_u' \left(\int_0^1 G_u G_u'\, du\right)^{-1} \int_0^1 G_u\, dB_{i,u} \quad \text{for} \quad i = 1, \dots, p-r.$$

*Then, $\mathrm{T}_1, \dots, \mathrm{T}_{p-r}$ are identically distributed and all pairs $(\mathrm{T}_j, \mathrm{T}_k)$, $j \ne k$, are also identically distributed. Moreover, the limiting statistic (27) satisfies $\mathrm{DF}_\ell(m-r, p-r; v) = \sum_{i=1}^{m-r} \mathrm{T}_i$ with expectation and variance given by*

$$
\begin{aligned}
\mathrm{E}\{\mathrm{DF}_\ell(m-r,p-r;v)\} &= \left(\frac{m-r}{p-r}\right)\mathrm{E}\left(\sum_{i=1}^{p-r} \mathrm{T}_i\right),\\
\mathrm{Var}\{\mathrm{DF}_\ell(m-r,p-r;v)\} &= \left(\frac{m-r}{p-r}\right)\mathrm{Var}\left(\sum_{i=1}^{p-r} \mathrm{T}_i\right) - (m-r)(p-m)\,\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2).
\end{aligned}
$$

Since Theorem 3 links the moments of the statistics for partial systems to those for a full system, we can now proceed by simulating distributions for full systems only. JMN simulated response surfaces for the mean and variance of $\sum_{i=1}^{p-r} \mathrm{T}_i$. As we also need a response surface for $\mathrm{Cov}(\mathrm{T}_1, \mathrm{T}_2)$, we have to redo their simulation exercise. For this purpose, we quote a result from JMN.


**Theorem 4** (JMN, Theorem 3.2)**.** *Let $B^{[1]}, \dots, B^{[q]}$ be independent $(p-r)$-dimensional standard Brownian motions and, for processes $x_u$ and $z_u$ on $[0,1]$, let $(x_u \mid z_u)$ denote the residual of $x_u$ from its $L^2$-projection on $z_u$. Define*

$$
\begin{aligned}
J_j &= \left\{\int_0^1 (u\,|\,1)^2\, du\right\}^{-1/2} \int_0^1 (u\,|\,1)\, dB_u^{[j](m-r)}, \\
K_j &= \int_0^1 \left(B_u^{[j]}\,\middle|\,u, 1\right) \left\{dB_u^{[j](m-r)}\right\}', \\
L_j &= \int_0^1 \left(B_u^{[j]}\,\middle|\,u, 1\right)\left(B_u^{[j]}\,\middle|\,u, 1\right)' du.
\end{aligned}
$$

*Then, the limiting variable (27) for a full sample with $p = m$ satisfies*

$$\mathrm{DF}_\ell(p-r,p-r;v) = \mathrm{tr}\left[\left(\sum_{j=1}^{q} K_j \Delta v_j\right)' \left\{\sum_{j=1}^{q} L_j \left(\Delta v_j\right)^{2}\right\}^{-1} \left(\sum_{j=1}^{q} K_j \Delta v_j\right)\right] + \sum_{j=1}^{q} J_j' J_j,$$

*where $\Delta v_j = v_j - v_{j-1}$. Here, the two summands are independent and $\sum_{j=1}^{q} J_j' J_j$ is distributed as $\chi^2\{q(m-r)\}$. Moreover, let $J_j^{[i]}$ and $K_j^{[i]}$ denote the $i$-th coordinate of $J_j$ and the $i$-th column of $K_j$, so that*

$$\mathrm{T}_i = \left(\sum_{j=1}^{q} K_j^{[i]} \Delta v_j\right)' \left\{\sum_{j=1}^{q} L_j \left(\Delta v_j\right)^{2}\right\}^{-1} \left(\sum_{j=1}^{q} K_j^{[i]} \Delta v_j\right) + \sum_{j=1}^{q} \left(J_j^{[i]}\right)^{2}.$$

As in JMN, we note that Theorem 4 implies a simple relation between the limiting statistics for models with $q$ and with $q-1$ sub-sample periods, that is,

$$\lim_{\Delta v_q \to 0} \mathrm{DF}_\ell(p-r, p-r; v_1, \dots, v_{q-1}, v_q) = \mathrm{DF}_\ell(p-r, p-r; v_1, \dots, v_{q-1}) + J_q' J_q, \tag{28}$$

where $\mathrm{DF}_\ell(p-r, p-r; v_1, \dots, v_{q-1})$ and $J_q' J_q$ are independent and $J_q' J_q$ is $\chi^2(p-r)$.
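Theorem 4 suggests a direct simulation scheme: draw a discretised standard Brownian motion per regime, correct it for $(u, 1)$, and combine the $K_j$, $L_j$ and $J_j$ with the weights $\Delta v_j$. The sketch below is not the authors' Ox code; the step count `n` and all names are illustrative:

```python
import numpy as np

def simulate_df_trend(p_r, dv, n=200, rng=None):
    """One draw approximating DF(p-r, p-r; v) in Theorem 4, with each
    per-regime Brownian motion discretised as an n-step random walk.
    `dv` holds the relative sub-sample lengths (Delta v_1, ..., Delta v_q)."""
    rng = np.random.default_rng() if rng is None else rng
    u = np.arange(n) / n                         # left endpoints of the steps
    U = np.column_stack([u, np.ones(n)])         # deterministic terms (u, 1)
    uc = u - u.mean()                            # (u | 1)
    Ksum = np.zeros((p_r, p_r))
    Lsum = np.zeros((p_r, p_r))
    jj = 0.0
    for dvj in dv:
        dB = rng.standard_normal((n, p_r)) / np.sqrt(n)
        B = np.cumsum(dB, axis=0) - dB           # B at the left endpoint of each step
        Bc = B - U @ np.linalg.lstsq(U, B, rcond=None)[0]   # (B | u, 1)
        J = uc @ dB / np.sqrt(np.mean(uc ** 2))  # approximately N(0, I_{p-r})
        Ksum += (Bc.T @ dB) * dvj                # K_j * Delta v_j
        Lsum += (Bc.T @ Bc / n) * dvj ** 2       # L_j * (Delta v_j)^2
        jj += J @ J                              # chi-squared part
    return np.trace(Ksum.T @ np.linalg.solve(Lsum, Ksum)) + jj

rng = np.random.default_rng(3)
draws = np.array([simulate_df_trend(2, [0.3, 0.7], rng=rng) for _ in range(500)])
```

The empirical quantiles of `draws` approximate those of the limit distribution; accuracy improves with larger `n` and more replications.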

#### *3.3. Asymptotic Distribution for the Broken Constant Case*

The model investigated above has a broken linear trend. A variant of this model excludes the linear trend but retains a broken constant; see Equation (21). Equation (8) then reduces to

$$
\Delta Y_t = \omega \Delta Z_t + \alpha_y(\beta', \gamma_j) \begin{pmatrix} X_{t-1} \\ 1 \end{pmatrix} + \sum_{i=1}^{k-1} \Gamma_{y\cdot z, i} \Delta X_{t-i} + \varepsilon_{y\cdot z, t}.
$$

As before, the partial quasi-likelihood is maximised by reduced rank regression. We follow the third approach (23) of Section 3.1, with the broken linear trend now replaced by the broken constant, so that we consider the vectors $\Delta Y_t$ and $X^c_{t-1} = (X_{t-1}', E_t')'$, together with the vector $V_t$ composed of the variables $\Delta Z_t$ and the lagged differences $\Delta X_{t-1}, \dots, \Delta X_{t-k+1}$. Equation (23) then reduces to

$$
\begin{pmatrix} Z_{0,t} \\ Z_{1,t} \\ Z_{2,t} \end{pmatrix} = \begin{pmatrix} \Delta Y_t \\ X_{t-1}^c \\ V_t \end{pmatrix}.
$$

The limit distribution of the log partial quasi-likelihood ratio test statistic for cointegrating rank *r*, denoted by *PLR*{H*c*(*r*)|H*c*(*m*)}, is given in Theorem 5 below. Its proof is based on a set of modifications of the proofs for the limit theorems in the previous sub-sections.

**Theorem 5.** *Suppose that Assumptions 1 and 2 are satisfied along with $\alpha_z = 0$, so that $Z_t$ is weakly exogenous with respect to $\alpha_y$, $\beta$ and $\gamma$. As $T \to \infty$, with relative break points satisfying $T_j/T \to v_j$ for $0 = v_0 < v_1 < \dots < v_q = 1$, the PLR test statistic (25) under $\mathrm{H}_c(r)$ satisfies*

$$PLR\{\mathrm{H}_c(r)|\mathrm{H}_c(m)\} \xrightarrow{D} \mathrm{DF}_c(m-r, p-r; v),$$

*where* DF*<sup>c</sup> is defined as in Theorem 2 with the difference that*

$$G_u = \begin{pmatrix} B_u \\ e_u \end{pmatrix}.$$

*The results in Theorems 3 and 4 also apply with the present choice of $G_u$.*

#### **4. Approximations of the Asymptotic Distributions**

The limit distributions of the cointegrating rank test statistics are non-standard, as shown in the previous sub-sections; however, given the existing results in the literature, the distributions can be closely approximated by a gamma distribution identified by its first two moments. We first derive this approximation and then show how to implement it.

#### *4.1. Derivation of Response Surface*

The literature shows that the asymptotic distributions for cointegration rank testing are nearly gamma distributed. The approximating gamma distribution can be captured either through the mean and variance of the asymptotic distribution or through the associated shape and scale parameters. The quality of the gamma-distribution approximation method has been documented in several papers. Using analytic methodology, Nielsen (1997) showed a very good agreement between limit distributions and approximate gamma distributions in tests for unit roots. Doornik (1998) then conducted detailed simulation studies to demonstrate a similar agreement for standard full-system cointegration rank test statistics; see also Doornik (2003) for various tables of asymptotic quantiles produced by the gamma-distribution approximations. JMN also employed this method.

In order to apply the gamma approximation method, we first define shape and scale parameters. By Theorem 3, the partial-system statistic satisfies $\mathrm{DF}_\ell(m-r, p-r; v) = \sum_{i=1}^{m-r} \mathrm{T}_i$, where the statistics $\mathrm{T}_i$ are identically distributed and the pairs $(\mathrm{T}_i, \mathrm{T}_j)$ are also identically distributed. Thus, we get

$$
\begin{aligned}
\mathrm{E}\left(\sum_{i=1}^{m-r} \mathrm{T}_i\right) &= (m-r)\,\mathrm{E}(\mathrm{T}_1),\\
\mathrm{Var}\left(\sum_{i=1}^{m-r} \mathrm{T}_i\right) &= (m-r)\,\mathrm{Var}(\mathrm{T}_1) + (m-r)(m-r-1)\,\mathrm{Cov}(\mathrm{T}_1, \mathrm{T}_2).
\end{aligned}
$$

Solving for $\mathrm{E}(\mathrm{T}_1)$ and $\mathrm{Var}(\mathrm{T}_1)$ when $m = p$ and inserting above, we get

$$\mathrm{E}\left(\sum_{i=1}^{m-r} \mathrm{T}_i\right) = \frac{m-r}{p-r}\,\mathrm{E}\left(\sum_{i=1}^{p-r} \mathrm{T}_i\right),\tag{29}$$

$$\mathrm{Var}\left(\sum_{i=1}^{m-r}\mathrm{T}_i\right) = \frac{m-r}{p-r}\,\mathrm{Var}\left(\sum_{i=1}^{p-r}\mathrm{T}_i\right) - (m-r)(p-m)\,\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2).\tag{30}$$

Thus, it suffices to approximate the moments of the full sample distributions through simulation. Numerically, it appears that better approximations arise when approximating shape and scale parameters instead of mean and variance. We therefore write

$$\mathrm{Var}\left(\sum_{i=1}^{p-r} \mathrm{T}_i\right) = \delta_{p-r}^2 \lambda_{p-r}, \qquad \mathrm{E}\left(\sum_{i=1}^{p-r} \mathrm{T}_i\right) = \delta_{p-r} \lambda_{p-r}. \tag{31}$$

From this, we get the shape and scale parameters as

$$\frac{1}{\lambda_{p-r}} = \frac{\mathrm{Var}(\sum_{i=1}^{p-r} \mathrm{T}_i)}{\{\mathrm{E}(\sum_{i=1}^{p-r} \mathrm{T}_i)\}^2}, \qquad \delta_{p-r} = \frac{\mathrm{Var}(\sum_{i=1}^{p-r} \mathrm{T}_i)}{\mathrm{E}(\sum_{i=1}^{p-r} \mathrm{T}_i)}.$$
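Given simulated full-system moments, the mappings (29)-(31) are elementary to code. The following minimal helpers (hypothetical names, not from the paper's software) mirror the algebra above:

```python
def partial_moments(mean_full, var_full, cov12, p_r, m_r):
    """Map the full-system moments of sum_{i<=p-r} T_i, together with
    Cov(T_1, T_2), to the partial-system mean and variance via (29)-(30);
    note p - m = (p - r) - (m - r)."""
    mean = (m_r / p_r) * mean_full
    var = (m_r / p_r) * var_full - m_r * (p_r - m_r) * cov12
    return mean, var

def gamma_shape_scale(mean, var):
    """Shape and scale of the approximating gamma distribution, cf. (31):
    mean = delta * lambda and variance = delta^2 * lambda."""
    return mean ** 2 / var, var / mean

# With m = p the mapping is the identity:
mean, var = partial_moments(10.0, 20.0, 0.5, p_r=4, m_r=4)
shape, scale = gamma_shape_scale(mean, var)
```

A gamma quantile function evaluated at these shape and scale parameters then delivers approximate critical values.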

Hence, we simulated $\lambda_{p-r}$, $\delta_{p-r}$ and $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$ and constructed response surfaces to approximate the distribution of $\mathrm{DF}_\ell(m-r, p-r; v)$. Following JMN and Doornik (1998), we applied a variety of data generating processes and present the results using response surface analysis.

The quantities $\lambda_{p-r}$, $\delta_{p-r}$ and $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$ were simulated for a set of given $p-r$, $T$ and relative break points. Following JMN, we chose $q = 3$ as the maximum number of sub-samples, with $a$ and $b$ representing the smallest and the second-smallest relative sub-sample lengths, respectively. For example, if $q = 2$ with $v_1 \le 1 - v_1$, we have $a = 0$ and $b = v_1$. The grid points $a$ and $b$ were selected in the same way as those for Figure 1 in JMN, e.g., $(a, b) = (0, 0), (0, 0.05), (0, 0.1), \dots$, subject to the constraints $a \le b$ and $b \le 1 - a - b$, giving 20 combinations in total; the number of non-stationary components was varied over $p - r = 1, \dots, 8$. For the overall sample sizes $T$, JMN used the 10 integers derived from $500/i$ for $i = 1, \dots, 10$, but we quadrupled these values in order to improve the approximation to the underlying limit distributions of the response variables, yielding a new set of 10 sample sizes ranging from 200 to 2000. For $\log\lambda_{p-r}$ and $\log\delta_{p-r}$, this simulation design led to 1600 ($= 20 \times 8 \times 10$) cases, while the number of cases was reduced to 1400 for $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$ as a result of the missing values corresponding to $p - r = 1$.

The computational algorithm used in our study was based on Theorem 4. These asymptotic results justify simulating three sets of $T$-step random walks for the broken linear-trend and broken constant cases and scaling them according to the pre-specified relative sample lengths. The number of simulation replications $N$ was set at 100,000.

For the response surface analysis, we used $\log\lambda_{p-r}$, $\log\delta_{p-r}$ and $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$ as the response variables, instead of the logged means and variances as in JMN. It turns out that the use of these response variables ($\log\lambda_{p-r}$ in particular) mitigates the residual heteroscedasticity problem, thereby reducing the number of indicator variables required for $p - r = 2$ and $p - r = 1$. Note that $\mathrm{Cov}(\mathrm{T}_1, \mathrm{T}_2)$ needs to be included in the set of response variables in any such response surface study in order to make use of Equation (30). In addition, note that taking the log of $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$ is not permissible, since a covariance is not always positive.

Compared to JMN, we increased the maximum number of observations from $T = 500$ to $T = 2000$. We found that the large-sample ($T \ge 1000$) approximations of the mean and variance in small dimensions ($p - r \le 3$) tend to differ considerably from those for small $T$. This finding is consistent with Doornik (1998), who introduced a set of indicator variables equal to 1 for $p - r = 2$ and $p - r = 1$ and 0 otherwise; these indicators keep residual heteroscedasticity under control even in the presence of influential values for $p - r = 2$ and $p - r = 1$.

We regressed each of the three response variables, $\log\lambda_{p-r}$, $\log\delta_{p-r}$ and $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$, on a set of regressors formed from $a$, $b$, $p-r$ and $T$. Our baseline functional form was a modified version of Equation (3.11) in JMN which, in the notation of the present paper, is expressed as

$$y = \sum_{m=0}^{2} \left( \phi_m + \sum_{i=1}^{4} \phi_{im} z_i + \sum_{i=1}^{4} \sum_{j \ge i} \psi_{ijm} z_i z_j + \sum_{i=1}^{4} \sum_{j \ge i} \sum_{k \ge j} \omega_{ijkm} z_i z_j z_k \right) d_m,$$

where $y$ is either $\log\lambda_{p-r}$, $\log\delta_{p-r}$ or $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$, while $z_1 = p - r$, $z_2 = a$, $z_3 = b$, $z_4 = T^{-1}$ and $d_m = (p-r)^{-m}$. Following Doornik (1998), we also added to this equation a set of indicator variables as explanatory variables, each of which is 1 for a selected value of the dimension $p - r$ and 0 otherwise. Performing a series of regression analyses and carefully removing insignificant explanatory variables using the *Autometrics* option available in *PcGive* (Doornik and Hendry 2013), we arrived at parsimonious response-surface functions for $\log\lambda_{p-r}$, $\log\delta_{p-r}$ and $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$; these functions are henceforth denoted $f^z_{p-r}(p-r, a, b, T)$ with $z$ taking the values $\lambda$, $\delta$ and $cov$, respectively.
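As an illustration of the baseline specification, the following sketch builds the regressor set $\{d_m z_i z_j z_k\}$ and fits it to a placeholder response by least squares. It omits the indicator variables and the Autometrics reduction step, and all names and grid choices are hypothetical:

```python
import numpy as np
from itertools import combinations_with_replacement

def design_matrix(Z):
    """Regressors of the baseline specification: all products z_i z_j z_k up
    to third order in z1 = p-r, z2 = a, z3 = b, z4 = 1/T, each multiplied by
    d_m = (p-r)^{-m} for m = 0, 1, 2; the d_m column itself carries the
    intercept-type coefficient phi_m."""
    terms = [c for deg in (1, 2, 3)
             for c in combinations_with_replacement(range(4), deg)]
    cols = []
    for m in (0, 1, 2):
        d = Z[:, 0] ** (-m)
        cols.append(d)
        cols.extend(d * np.prod(Z[:, c], axis=1) for c in terms)
    return np.column_stack(cols)

# Hypothetical simulated design: p-r in 1..8, (a, b) grid points, 1/T values
rng = np.random.default_rng(4)
N = 300
Z = np.column_stack([rng.integers(1, 9, N).astype(float),
                     rng.uniform(0, 1 / 3, N), rng.uniform(0, 1 / 3, N),
                     1.0 / rng.integers(200, 2001, N)])
X = design_matrix(Z)
y = rng.standard_normal(N)                  # placeholder response, e.g. log lambda
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Each of the three degrees contributes 4, 10 and 20 product terms, so with the three $d_m$ columns the design has $3 \times (1 + 34) = 105$ regressors before any reduction.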

Tables A1 and A2 in Appendix A record the rounded coefficients of $a$, $b$, $p-r$ and their variants in the response surface regressions for the broken linear trend case and the broken constant case, respectively. The inverse of the number of observations, $T^{-1}$, and its powers, such as $T^{-2}$, also play critical roles in the response surface regressions, but all of these terms vanish asymptotically and are thus disregarded when calculating the limit approximations based on these tables.

It should also be noted that a response-surface regression analysis of $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$ was technically difficult in terms of residual diagnostic tests. Doornik (1998) used the average of the estimates of $\mathrm{Cov}(\mathrm{T}_1, \mathrm{T}_2)$ when performing a response surface analysis for partial systems with no break. We adhered to the regression approach, rather than simply averaging the covariance estimates, in order to capture the various significant influences of $a$, $b$ and $p-r$ on the behaviour of $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$. This regression analysis indeed bore fruit and clarified the highly complex dependence of $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$ on $a$, $b$, $p-r$ and their variants, as shown in the third column of each of Tables A1 and A2. These findings about $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$ are not known in the literature, thus giving added value to the response surface study conducted in this paper, although the impact of variation in $\mathrm{Cov}(\mathrm{T}_1,\mathrm{T}_2)$ on the approximate shape and scale parameters may not always be large.

Tables 1 and 2 display a set of examples demonstrating the accuracy of the response surface regression results. Approximate 95% limit quantiles are presented in each of the tables for various combinations of $a$, $b$, $p-r$ and $m-r$, under either the broken-linear-trend or the broken-constant specification. The approximate quantiles in the fifth column ($q_{95}$) of Tables 1 and 2 are derived from Tables A1 and A2, respectively; that is, they come from the full-system-based response surface analysis combined with the mappings (29), (30) and (31). By contrast, the approximate quantiles recorded in the sixth column ($q^*_{95}$) of each table, except those for $a = b = 0$, were obtained directly from auxiliary response surface regressions based on partial-system simulations with the same $T$s and $N$ as above. Each of these auxiliary regression equations employed a simulated 95% quantile as the response variable and involved a constant, $T^{-1}$ and its powers where necessary, as explanatory variables. The regression equations vary in specification so as to capture the underlying smooth response surfaces of the simulated quantiles; the graph of each regression's actual and fitted values was checked to ensure that the underlying smoothness was captured. The estimated constants in these regression equations are recorded in the columns for $q^*_{95}$ as approximate 95% limit quantiles. The limit quantiles in $q^*_{95}$ for $a = b = 0$ (that is, the no-break cases) were taken from Doornik (2003).

Tables 1 and 2 show that the quantiles in $q_{95}$ almost coincide with those in $q^*_{95}$ regardless of the specification of the deterministic terms; see the seventh column of each table for $|q_{95}/q^*_{95} - 1|$, a series of absolute relative errors, all of which are very small. This correspondence is strong evidence supporting the validity of the proposed approximation method based on the full model. Furthermore, the eighth column of each table records the discrepancies in approximate $p$-values, defined as $\Delta p_{app} = g(q_{95})\,(q_{95} - q^*_{95})$, in which $g(\cdot)$ represents the gamma density function calculated from the simulated mean and variance. Most of the discrepancies are very small, and even the largest is around 0.02, occurring when $p-r$ is relatively large, for which we should recall that a large value of $p-r$ can give rise to various other distortions in practice. The overall evidence allows us to argue that the approximate quantiles work as useful critical values in applications. The Supplementary Materials include an Ox code for simulating the asymptotic distributions, which can be used if further precision is needed.

As a caveat in relation to large values of *p* − *r*, recall that our response surface regressions were conducted for a realistic range of numbers of non-stationary variables, *p* − *r* = 1, ... , 8, which suffices in most applied research. Thus, an empirical study using a partial system of large dimension may require careful examination of the underlying cointegrating rank, in addition to the application of the proposed *PLR* tests to the data under study, as discussed by Juselius (2006, §8).


**Table 1.** A comparative analysis of 95% limit quantiles: broken linear-trend models.

*Notes*. *q*95 denotes 95% limit quantiles approximated from the full systems, while *q*∗95 denotes those calculated directly from the partial systems. Δp*app* represents discrepancies in approximate *p*-values.


**Table 2.** A comparative analysis of 95% limit quantiles: broken constant models.

*Notes*. *q*95 denotes 95% limit quantiles approximated from the full systems, while *q*∗95 denotes those calculated directly from the partial systems. Δp*app* represents discrepancies in approximate *p*-values.

#### *4.2. Implementation of Response Surface*

The response surfaces in Tables A1 and A2 are used as follows. They are aimed at the situation with two breaks; however, Theorem 4 shows that, with a simple correction, they can also be used with a single break or no break.

In the case of *q* = 3 sample periods and thus 2 breaks at *T*1, *T*2, where 0 < *T*1 < *T*2 < *T*, we let *a*, *b* be the smallest and second-smallest relative sub-sample lengths. Thus, with *v*1 = *T*1/*T*, *v*2 = (*T*2 − *T*1)/*T* and *v*3 = (*T* − *T*2)/*T*, so that *v*1 + *v*2 + *v*3 = 1, we choose *a* = min(*v*1, *v*2, *v*3) and *b* = median(*v*1, *v*2, *v*3).

In the case of *q* = 2 sample periods and thus 1 break at *T*1, where 0 < *T*1 < *T*, we set *v*1 = *T*1/*T* and *v*2 = (*T* − *T*1)/*T*, so that *v*1 + *v*2 = 1. We let *a* = 0 and *b* = min(*v*1, *v*2).

In the case of *q* = 1 sample period and thus no break, let *a* = *b* = 0. Theorem 4 and (28) show that the mean and variance for the cases where *q* < 3 can be found from those for *q* = 3 by choosing *a*, *b* as indicated and subtracting (3 − *q*)(*p* − *r*) and 2(3 − *q*)(*p* − *r*), respectively.
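The rules above can be collected into a small helper; the function names and interface are illustrative only, not part of the paper's software.

```python
import statistics

def relative_break_points(break_dates, T):
    """Map break dates 0 < T_1 < ... < T_{q-1} < T into (a, b, q):
    q = 3: a = smallest, b = median relative sub-sample length;
    q = 2: a = 0, b = smallest relative sub-sample length;
    q = 1: a = b = 0."""
    q = len(break_dates) + 1
    ends = [0] + sorted(break_dates) + [T]
    v = [(ends[i + 1] - ends[i]) / T for i in range(q)]
    if q == 3:
        return min(v), statistics.median(v), q
    if q == 2:
        return 0.0, min(v), q
    return 0.0, 0.0, q

def correct_moments(mean_q3, var_q3, q, p_minus_r):
    """Adjust the q = 3 response-surface mean/variance for q < 3 by
    subtracting (3 - q)(p - r) and 2(3 - q)(p - r), respectively."""
    return (mean_q3 - (3 - q) * p_minus_r,
            var_q3 - 2 * (3 - q) * p_minus_r)
```

For example, breaks at observations 30 and 50 in a sample of 100 give relative lengths (0.3, 0.2, 0.5) and hence *a* = 0.2, *b* = 0.3.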

Given the choices of *p* − *r*, *m* − *r*, *a* and *b*, compute the approximations to

$$f\_{p-r}^{\lambda} = \log \lambda\_{p-r}, \qquad f\_{p-r}^{\delta} = \log \delta\_{p-r}, \qquad c\_{p-r} = \mathsf{Cov}(\mathsf{T}\_1, \mathsf{T}\_2). \tag{32}$$

Table A1 is used for the case with a broken linear trend, while Table A2 is used for the case with a broken constant. The approximations are then inserted into (31), which in turn is inserted into (29) and (30), while correcting for the number of breaks; that is,

$$\mathbb{E}\Big(\sum_{i=1}^{m-r} \mathbb{T}_i\Big) = \frac{m-r}{p-r} \exp\left(f_{p-r}^{\delta} + f_{p-r}^{\lambda}\right) - (3-q)(m-r), \tag{33}$$

$$\mathrm{Var}\Big(\sum_{i=1}^{m-r} \mathbb{T}_i\Big) = \frac{m-r}{p-r} \exp\left(2f_{p-r}^{\delta} + f_{p-r}^{\lambda}\right) - (m-r)(p-m)c_{p-r} - 2(3-q)(m-r). \tag{34}$$

Finally, we approximate the quantile of interest or the *p*-value of the observed *PLR* statistic using a gamma distribution with mean and variance matching (33) and (34). Equivalently, one can specify the shape and scale of the gamma distribution as *mean*<sup>2</sup>/*variance* and *variance*/*mean*, respectively.
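As a sketch of the full recipe, the mappings (33) and (34) followed by the gamma matching can be written as below. The inputs `f_lambda`, `f_delta` and `c` stand for the approximations in (32) read off Tables A1 and A2, and the scipy wrapper is illustrative rather than the authors' implementation (which is the Ox program and spreadsheet in the Supplementary Materials).

```python
from math import exp
from scipy.stats import gamma

def plr_limit_gamma(f_lambda, f_delta, c, p_r, m_r, q):
    """Moment-matched gamma approximation to the PLR limit distribution.
    f_lambda, f_delta, c: response-surface values from Tables A1/A2;
    p_r = p - r, m_r = m - r; q = number of sample periods (<= 3)."""
    mean = (m_r / p_r) * exp(f_delta + f_lambda) - (3 - q) * m_r
    var = ((m_r / p_r) * exp(2 * f_delta + f_lambda)
           - m_r * (p_r - m_r) * c - 2 * (3 - q) * m_r)
    # shape = mean^2/variance, scale = variance/mean
    return gamma(a=mean ** 2 / var, scale=var / mean)
```

For a test one would report `dist.ppf(0.95)` as the critical value and `dist.sf(plr_stat)` as the approximate *p*-value.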

A spreadsheet for implementing the response surfaces in Tables A1 and A2 is available in the Supplementary Materials. This also includes an Ox program for simulating the asymptotic distributions and calculating *p*-values of observed test statistics for specifications outside the range covered by Tables A1 and A2, for instance when the number of structural breaks is greater than 2 or *q* > 3.

#### **5. Empirical Illustration**

As an empirical illustration, we analyse a set of quarterly time series data from Schreiber (2015), who developed an econometric system for the exchange rate and bilateral trade between the UK and Germany. She decomposed the UK-Germany economic system into two blocks, a foreign exchange block and a bilateral trade block, in order to obtain a data-congruent representation useful for forecasting and policy analysis. Among the various econometric studies conducted by Schreiber (2015) was the analysis of a partial model for the bilateral trade block with a structural break. The methodology developed in the sections above enables us to conduct formal tests for the cointegrating rank that underlies such a partial system subject to a break. This partial-system analysis may also be encouraged on the grounds of the local power advantage of partial-system-based tests over those based on a full system under weak exogeneity, as demonstrated by Doornik et al. (1998) as well as Kurita (2011).

Figure 1 presents an overview of the quarterly data spanning the sample period from the first quarter of 1991 to the second quarter of 2014, denoted 1991.1–2014.2 hereafter. The variable *tbt* is the trade balance between the UK and Germany, i.e., the difference between the log of exports of UK goods to Germany and the log of imports of German goods to the UK; *dulct* represents the unit labour cost differential between the two countries; *yt* and *y*∗*t* denote the logs of the UK and German gross domestic products, respectively; *pppt* represents the terms of trade in logarithms. See Schreiber (2015) for further details of the data. The figure indicates the presence of a structural break around 2008–2009, attributable to the global economic recession over this period.

**Figure 1.** Data. (**a**) *tbt* is the trade balance between the UK and Germany; (**b**) *dulct* is the unit labour cost differential between the UK and Germany; (**c**) *yt* and *y*∗*t* are the logs of the UK and German gross domestic products, respectively; (**d**) *pppt* is the terms of trade.

In this empirical illustration, we analyse the data using a bivariate partial autoregressive model for *tbt* and *dulct*, with *yt*, *y*∗*t* and *pppt* assumed to be weakly exogenous for the parameters of interest, such as the cointegrating vectors; that is, *p* = 5 and *m* = 2. This assumption is based on Schreiber's study, which suggests that modelling the bilateral trade block centred on *tbt* and *dulct* conforms to the underlying data structure. The lag length *k* = 2 is selected for our bivariate partial autoregressive model.

With regard to the structural break, we adopt a broken-trend specification; that is, we allow a shift in the restricted trend as well as in the unrestricted constant. The second sub-sample period starts in 2008.3, corresponding to the observation point *Tq*−1 for *q* = 2, which results in the selection of relative break points *a* = 0 and *b* = 0.255. According to (10), our bivariate model with *k* = 2 requires a set of two impulse dummy variables for the initial values of the second sub-sample period. In addition, a pair of impulse dummy variables, *Dp*1998(1) and *Dp*2006(2), is employed in our model to capture outliers in the data, as in Schreiber (2015); the former variable is 1 in 1998.1 and zero otherwise, for an outlier due to the Asian financial crisis, while the latter is 1 in 2006.2 and zero otherwise, corresponding to an outlier attributable to an increase in oil prices.
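To fix ideas, the deterministic terms described above can be constructed as follows. The dates follow the text, but the column names and the pandas-based layout are our own illustrative choices, not the authors' code.

```python
import numpy as np
import pandas as pd

# Quarterly sample 1991.1-2014.2; the second sub-sample starts in 2008.3.
idx = pd.period_range("1991Q1", "2014Q2", freq="Q")
t = np.arange(1, len(idx) + 1)
second = (idx >= pd.Period("2008Q3", freq="Q")).astype(int)

det = pd.DataFrame({
    "t_sub1": t * (1 - second),   # restricted trend, first sub-sample
    "t_sub2": t * second,         # restricted trend, second sub-sample
    "shift": second,              # broken (unrestricted) constant
    "Dp1998_1": (idx == pd.Period("1998Q1", freq="Q")).astype(int),
    "Dp2006_2": (idx == pd.Period("2006Q2", freq="Q")).astype(int),
}, index=idx)

# With k = 2, two impulse dummies cover the initial values of the
# second sub-sample period (2008.3 and 2008.4).
for j, p in enumerate(["2008Q3", "2008Q4"], start=1):
    det[f"Di_{j}"] = (idx == pd.Period(p, freq="Q")).astype(int)
```

With this sample of 94 quarters and 24 quarters in the second sub-sample, *b* = 24/94 ≈ 0.255, matching the relative break point quoted above.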

A set of residual diagnostic tests for the partial system is reported in Table 3. Most of the test statistics are given in the form F*j*(*df*1, *df*2), which denotes an approximate *F* test (with relevant degrees of freedom *df*1 and *df*2) against the alternative hypothesis *j*. The alternative hypotheses are: 5th-order serial correlation (F*AR*5, Godfrey 1978), 4th-order autoregressive conditional heteroscedasticity (F*ARCH*4, Engle 1982) and heteroscedasticity (F*HET*, White 1980). Chi-squared tests for normality (*χ*<sup>2</sup>*ND*, Doornik and Hansen 2008) are also recorded in the table. We note the following caveats based on recent advances in the field of mis-specification testing: Nielsen (2006) demonstrated that F*AR*5 is a valid test in the presence of unit roots; Berenguer-Rico and Wilms (2018) showed that F*HET* is valid after eliminating outliers from the observations, while *χ*<sup>2</sup>*ND* is not necessarily valid after the removal of outliers, as demonstrated by Berenguer-Rico and Nielsen (2017). In any case, Table 3 provides no evidence of significant mis-specification problems. We can thus judge that this partial system is formulated sufficiently well to be subjected to *PLR* tests for cointegrating rank.


**Table 3.** Diagnostic test statistics for the estimated partial system.

*Notes.* Figures in square brackets are *p*-values.

Table 4 presents the *PLR* test statistics for the determination of the cointegrating rank, along with the corresponding *p*-values and approximate 95% limit quantiles calculated from the response surface outcomes in the previous section. We used Table A1 in Appendix A to calculate approximations to log *δp*−*r*, log *λp*−*r* and Cov(T1, T2), and then applied them to the mappings (29) and (30), adjusted for the extra *χ*<sup>2</sup> terms, so that the gamma-distribution approximation method yielded the *p*-values. Table 4 shows that, at the 5% level, the null hypothesis *r* = 0 is rejected while the hypothesis *r* ≤ 1 fails to be rejected. Hence, this formal analysis leads to the conclusion that *r* = 1, which supports the informal analysis of Schreiber (2015).

**Table 4.** Testing for cointegrating rank.


*Notes.* Figures in square brackets are *p*-values. ∗ denotes significance at the 5% level.

The estimated cointegrating relationship under some additional restrictions is

$$tb_t = \underset{(0.121)}{0.259}\, dulc_t - 0.726\, ppp_t + \underset{(0.323)}{2.34}\left(y_t^{*} - y_t\right) - 0.019\, t\, \mathbf{1}_{(t \geq 2009:1)} + v_t, \tag{35}$$

where the figure in brackets under each coefficient is a standard error and *vt* is a stationary error. The signs of the coefficients in (35) are the same as those in Schreiber's (2015) cointegrating equation, except for *y*∗*t*. The German income *y*∗*t* was insignificant in her cointegrating relationship and thus removed from it, while, in (35), *y*∗*t* plays a significant role along with *yt*. After checking a set of unrestricted estimates for the cointegrating vector, we arrived at Equation (35), in which the coefficients of *yt* and *y*∗*t* are restricted to sum to zero, while a zero restriction is placed on the coefficient of *t*1<sub>(*t*≤2008:2)</sub>; that is, a linear trend is present only in the second sub-sample period. The *PLR* test statistic for these restrictions is 3.571 [0.168], where the figure in square brackets is a *p*-value according to *χ*<sup>2</sup>(2). Thus, the hypothesis of the overall restrictions cannot be rejected at the 5% level.

There are several aspects of Equation (35) worth discussing here. The real income difference between Germany and the UK, *y*∗*t* − *yt*, has a positive coefficient, implying that a widening of the income difference leads to an improvement in the UK trade balance with Germany. This finding is interpretable in the context of an income effect from each of the two countries. The coefficient of the terms of trade, *pppt*, should also be noted. It is negative, indicating a relative price effect on the trade balance in a theory-consistent manner; that is, a decrease in export prices relative to import prices leads to an improvement in the trade balance, so that the well-known elasticity approach to the trade balance appears to be empirically valid for the two countries. Furthermore, the linear trend *t* is significant solely in the second sub-sample period, suggesting long-lasting influences of the global recession on the two countries' trade balance and other economic variables.

Finally, we check that the three variables, *yt*, *y*∗*t* and *pppt*, are indeed weakly exogenous for the parameters of interest. We follow the testing procedure suggested by Johansen (1992a), Boswijk (1992) and HJNR. First, the restricted cointegrating combination is added as a regressor to the marginal system (9) for *Zt* = (*yt*, *y*∗*t*, *pppt*)′. Second, a standard regression analysis is performed to test for the significance of the cointegrating combination in each equation. Table 5 reports the *LR* test statistics for the exclusion of the empirical cointegrating linkage from each equation in the marginal system. Judging from the reported *p*-values according to *χ*<sup>2</sup>(1), none of the test statistics indicate evidence against the assumption of weak exogeneity; thus, the preceding partial-system analysis of the cointegrating rank is justified.
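The exclusion test applied to each marginal equation amounts to a likelihood ratio comparison of two OLS regressions. The following single-equation sketch uses simulated data and omits the lagged differences and deterministic terms that a full marginal-system estimation would include.

```python
import numpy as np

def lr_exclusion_test(y, X_restricted, ecm):
    """LR statistic for excluding the lagged cointegrating combination
    `ecm` from one marginal equation; asymptotically chi-squared(1)."""
    def rss(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid

    T = len(y)
    X_full = np.column_stack([X_restricted, ecm])
    return T * np.log(rss(X_restricted, y) / rss(X_full, y))
```

Under weak exogeneity the statistic should be insignificant against *χ*<sup>2</sup>(1) in every marginal equation, as in Table 5.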

**Table 5.** Checking weak exogeneity.


#### **6. Conclusions**

This study has explored partial cointegrated vector autoregressive models subject to structural breaks in the deterministic terms, a linear trend and a constant. The Granger–Johansen representation of the full model in JMN has been re-examined, leading to a useful clarification of the roles of the initial values in the asymptotic analysis. A class of likelihood ratio test statistics for cointegrating rank has then been introduced in the proposed partial-model framework. We have investigated the asymptotic theory under a general class of innovation distributions, allowing martingale difference sequences with conditional heteroscedasticity. The derived limit distributions of the statistics are closely related to those for the full models investigated by JMN. This relationship allows us to perform a response surface analysis in a simplified full-system framework, instead of relying on laborious partial-system-based simulations. The outcomes of the analysis are summarised in two statistical tables providing valuable information for inference on the underlying cointegrating rank. Lastly, an empirical analysis of real-life data from the UK and Germany has demonstrated the practicality of these tables in applied economic research. As a result of this study, partial cointegrated models have become more flexible and reliable devices for modelling time series data subject to various structural breaks.

Recently, bootstrap methods have been proposed for cointegration rank testing in full systems (Cavaliere et al. 2012). It would be interesting to extend those to partial systems with or without breaks.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2225-1146/7/4/42/s1: a spreadsheet for implementing the response surfaces in Tables A1 and A2, as well as an Ox program for simulating the asymptotic distributions.

**Author Contributions:** The authors made equal contributions.

**Funding:** T. Kurita gratefully acknowledges financial support for this work from JSPS KAKENHI (Grant Nos: 15KK0141 and 26380349).

**Acknowledgments:** We are grateful to the editors and two anonymous referees for constructive comments. The results reported in this paper were presented at Norwegian University of Science and Technology (NTNU), Statistics Norway and the University of Copenhagen. We would like to thank Gunnar Bårdsen, Pål Boug, Søren Johansen and various other seminar participants for helpful comments and suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.


#### **Appendix A. Tables for Response Surfaces**

**Table A1.** Response surfaces for broken trend models.

*Note*: 1(*x*) is 1 when *p* − *r* = *x* and zero otherwise.


**Table A2.** Response surfaces for broken constant models.

Note: 1(*x*) is 1 when *p* − *r* = *x* and zero otherwise.

#### **Appendix B. Proof of the Granger–Johansen Representation**

This section provides a proof of Theorem 1, in which the Granger–Johansen representation of the full model with deterministic breaks is presented.

**Proof of Theorem 1.** The companion form of the homogenous Equation (13) is

$$
\Delta \widetilde{\mathbf{X}}_t = \boldsymbol{\alpha}\boldsymbol{\beta}' \widetilde{\mathbf{X}}_{t-1} + \boldsymbol{\iota}\varepsilon_t,
$$

on the effective sample; see (7). As shown by Hansen (2005, Lemma A.1), the condition rank(*α*′⊥Ψ*β*⊥) = *p* − *r* stated in Assumption 1 implies that the above homogenous equation is an *I*(1) system satisfying

$$
\boldsymbol{\beta}'\widetilde{\mathbf{X}}_t = (I_{r^*} + \boldsymbol{\beta}'\boldsymbol{\alpha})\boldsymbol{\beta}'\widetilde{\mathbf{X}}_{t-1} + \boldsymbol{\beta}'\boldsymbol{\iota}\varepsilon_t \quad \text{with} \quad |\mathrm{eigen}(I_{r^*} + \boldsymbol{\beta}'\boldsymbol{\alpha})| < 1,
$$

which is a stable equation (Lai and Wei 1985).

*Econometrics* **2019**, *7*, 42

We then follow Kurita and Nielsen (2009) in the analysis of the non-stationary components. Start from the homogenous Equation (13) for 1 ≤ *j* ≤ *q* and *T*<sub>*j*−1</sub> + *k* < *t* ≤ *T<sub>j</sub>*:

$$
\Delta \tilde{X}\_t = \mathfrak{a} \beta' \tilde{X}\_{t-1} + \sum\_{i=1}^{k-1} \Gamma\_i \Delta \tilde{X}\_{t-i} + \varepsilon\_t.
$$

Pre-multiplying the above equation by *α*′⊥ and substituting Δ*X̃*<sub>*t*−*i*</sub> = Δ*X̃*<sub>*t*</sub> − ∑<sub>ℓ=0</sub><sup>*i*−1</sup> Δ<sup>2</sup>*X̃*<sub>*t*−ℓ</sub>, we collect the repeated terms Δ*X̃*<sub>*t*</sub> on the left-hand side to find

$$\alpha'_{\perp} \Psi \Delta \widetilde{X}_t = -\alpha'_{\perp} \sum_{i=1}^{k-1} \Gamma_i \sum_{\ell=0}^{i-1} \Delta^2 \widetilde{X}_{t-\ell} + \alpha'_{\perp} \varepsilon_t,$$

by recalling Ψ = *I<sub>p</sub>* − ∑<sub>*i*=1</sub><sup>*k*−1</sup> Γ<sub>*i*</sub>. Summing *α*′⊥ΨΔ*X̃<sub>s</sub>* over *s* = *T*<sub>*j*−1</sub> + *k* + 1, . . . , *t* yields

$$\alpha'_{\perp} \Psi \widetilde{X}_t = \alpha'_{\perp} \sum_{s=T_{j-1}+k+1}^{t} \varepsilon_s - \alpha'_{\perp} \sum_{i=1}^{k-1} \Gamma_i \sum_{\ell=0}^{i-1} \Delta \widetilde{X}_{t-\ell} - \alpha'_{\perp} \Psi \widetilde{X}_{T_{j-1}+k} + \alpha'_{\perp} \sum_{i=1}^{k-1} \Gamma_i \sum_{\ell=0}^{i-1} \Delta \widetilde{X}_{T_{j-1}+k-\ell}.$$

Apply the orthogonal projection identity *α*′⊥Ψ*X̃<sub>t</sub>* = *α*′⊥Ψ*β*⊥*β̄*′⊥*X̃<sub>t</sub>* + *α*′⊥Ψ*β̄β*′*X̃<sub>t</sub>* to the left-hand side and then pre-multiply both sides by *β*⊥(*α*′⊥Ψ*β*⊥)<sup>−1</sup> to find the *C* matrix. Shifting *C*Ψ*β̄β*′*X̃<sub>t</sub>* to the right-hand side, we arrive at

$$\begin{aligned} \beta_{\perp}\bar{\beta}'_{\perp}\widetilde{X}_t &= C \sum_{s=T_{j-1}+k+1}^{t} \varepsilon_s - C\Psi\bar{\beta}\beta'\widetilde{X}_t - C \sum_{i=1}^{k-1} \Gamma_i \sum_{\ell=0}^{i-1} \Delta\widetilde{X}_{t-\ell} \\ &\quad - C\Psi\widetilde{X}_{T_{j-1}+k} + C \sum_{i=1}^{k-1} \Gamma_i \sum_{\ell=0}^{i-1} \Delta\widetilde{X}_{T_{j-1}+k-\ell}. \end{aligned}$$

Adding *β̄β*′*X̃<sub>t</sub>* to both sides results in

$$\begin{aligned} \widetilde{X}_t &= C \sum_{s=T_{j-1}+k+1}^{t} \varepsilon_s + (I - C\Psi)\bar{\beta}\beta'\widetilde{X}_t - C \sum_{i=1}^{k-1} \Gamma_i \sum_{\ell=0}^{i-1} \Delta\widetilde{X}_{t-\ell} \\ &\quad - C\Psi\widetilde{X}_{T_{j-1}+k} + C \sum_{i=1}^{k-1} \Gamma_i \sum_{\ell=0}^{i-1} \Delta\widetilde{X}_{T_{j-1}+k-\ell}. \end{aligned} \tag{A1}$$

Using the notation Υ<sub>*i*</sub> = −Γ<sub>*i*</sub> − ⋯ − Γ<sub>*k*−1</sub> for 1 ≤ *i* ≤ *k* − 1, together with the matrix Λ defined in (14), leads to the first desired result (17).

Next, we move on to the non-homogenous formulation (12), in which *μ<sub>j</sub>* and *γ<sub>j</sub>* are distinct from zero. Replace *X<sub>t</sub>* in (12) with *X̃<sub>t</sub>* + *τ<sub>c,j</sub>* + *τ<sub>l,j</sub>t* and refer to the proof of Theorem 2.1 in JMN to find

$$
\Psi\tau_{l,j} = \alpha\beta'(\tau_{c,j} - \tau_{l,j}) + \mu_j \quad \text{and} \quad \beta'\tau_{l,j} + \gamma_j = 0. \tag{A2}
$$

Applying (A2) to (12) recovers the homogenous Equation (13), so the above results derived for (13) are all applicable to (12) under (A2). Substituting *X̃<sub>t</sub>* = *X<sub>t</sub>* − *τ<sub>c,j</sub>* − *τ<sub>l,j</sub>t* into (A1) yields the desired representation (18).

#### **Appendix C. Proofs of Asymptotic Results**

In this section, we present a high-level assumption which overrides Assumptions 2 and 3 in the subsequent arguments. We then provide some specific lemmas required for proofs of the limit theorems in Section 3. Finally, we proceed to the proofs of Theorems 2 and 3.

We introduce some notation. For a vector *v*, the outer product is *v*<sup>⊗2</sup> = *vv*′. For a matrix *m*, the spectral norm is ‖*m*‖<sup>2</sup> = max eigen(*m*′*m*). Note that ‖*m*‖<sup>2</sup> ≤ tr(*m*′*m*).
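These norm facts are easy to verify numerically; the check below is a minimal illustration of the notation, not part of the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(42)
m = rng.standard_normal((4, 3))

# ||m||^2 = max eigen(m'm), and ||m||^2 <= tr(m'm)
spec_sq = np.max(np.linalg.eigvalsh(m.T @ m))
assert spec_sq <= np.trace(m.T @ m) + 1e-12
assert np.isclose(np.sqrt(spec_sq), np.linalg.norm(m, 2))

# v^{(x)2} = vv' for a vector v
v = rng.standard_normal(3)
assert np.allclose(np.outer(v, v), v[:, None] * v[None, :])
```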

#### *Appendix C.1. A High Level Assumption*

In order to prove the theorems introduced in this paper, we need a Law of Large Numbers for the approximately stationary components of the full model, as well as a Functional Central Limit Theorem and a convergence result for stochastic integrals for the non-stationary components. We formulate these as the following high-level assumption and then prove that it is satisfied under Assumptions 1 and 2.

**Assumption A1.** *Let ε<sub>t</sub> be p-dimensional random variables and suppose that Assumption 1 is satisfied. Let X̃<sub>t</sub> satisfy the homogenous Equation (13) and define U<sub>t−1</sub> as*

$$U_{t-1} = \begin{pmatrix} \beta'\widetilde{X}_{t-1} \\ \Delta \widetilde{X}_{t-1} \\ \vdots \\ \Delta \widetilde{X}_{t-k+1} \end{pmatrix}.$$

*Suppose that*

$$T^{-1/2} \max_{1 \le t \le T} |U_t| = \mathrm{o}_{\mathrm{P}}(1) \tag{A3}$$

*and*

$$T^{-1}\sum\_{t=1}^{T} \left(\begin{array}{cc} \varepsilon\_{t} \\ \mathbf{U}\_{t-1} \\ 1 \end{array}\right)^{\otimes 2} \stackrel{\mathbb{P}}{\rightarrow} \left(\begin{array}{cc} \Omega & 0 & 0 \\ 0 & \Sigma\_{u} & 0 \\ 0 & 0 & 1 \end{array}\right),\tag{A4}$$

*where* Ω *and* Σ*<sub>u</sub>* *are positive definite matrices. Furthermore, let W<sub>u</sub> be a p-dimensional Brownian motion with variance* Ω. *Suppose that, for* 0 ≤ *u* ≤ 1*,*

$$T^{-1/2} \sum_{t=1}^{\mathrm{int}(Tu)} \varepsilon_t \stackrel{\mathrm{D}}{\rightarrow} W_u \tag{A5}$$

*as a process on* (*D*[0, 1])*<sup>p</sup> endowed with the Skorokhod metric with common distortion. Finally,*

$$T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{t-1}\varepsilon_{s}\varepsilon_{t}' \stackrel{\mathrm{D}}{\rightarrow} \int_{0}^{1} W_u \, \mathrm{d}W_u'. \tag{A6}$$

The next result explores the conditions of Brown (1971). Subsequently, we use this to show that Assumptions 1 and 2 imply Assumption A1.

**Lemma A1.** *Suppose Assumption 2 is satisfied. Then,*

(*a*) *T*<sup>−1</sup> ∑<sub>*t*=1</sub><sup>*T*</sup> *ε<sub>t</sub>ε*′<sub>*t*</sub> → Ω *in probability;*
(*b*) *T*<sup>−1</sup> ∑<sub>*t*=1</sub><sup>*T*</sup> E{*ε*′<sub>*t*</sub>*ε<sub>t</sub>*1<sub>(*ε*′<sub>*t*</sub>*ε<sub>t</sub>*>*δT*)</sub>|F<sub>*t*−1</sub>} → 0 *in probability for all δ* > 0*;*
(*c*) *T*<sup>−1</sup> ∑<sub>*t*=1</sub><sup>*T*</sup> E{*ε*′<sub>*t*</sub>*ε<sub>t</sub>*1<sub>(*ε*′<sub>*t*</sub>*ε<sub>t</sub>*>*δT*)</sub>} → 0 *for all δ* > 0*;*
(*d*) max<sub>1≤*t*≤*T*</sub> |*ε<sub>t</sub>*|<sup>2</sup>/*T* → 0 *in probability.*

**Proof of Lemma A1.** (*b*, *c*) Brown (1971, Lemma 2) shows that the conditional Lindeberg condition (*b*) and the marginal Lindeberg condition (*c*) are equivalent under Assumption 2(*i*, *ii*).

First, suppose Assumption 2(*iii*.*a*), so that sup<sub>*t*∈N</sub> E{*ε*′<sub>*t*</sub>*ε<sub>t</sub>*1<sub>(*ε*′<sub>*t*</sub>*ε<sub>t</sub>*>*a*)</sub>|F<sub>*t*−1</sub>} = o<sub>P</sub>(1) as *a* → ∞. Thus, for every *ξ* > 0, there exists *a*<sub>0</sub> such that, for all *a* ≥ *a*<sub>0</sub>, it follows that

$$\mathbb{P}\left[\sup_{t\in\mathbb{N}}\mathbb{E}\{\varepsilon'_t\varepsilon_t\mathbf{1}_{(\varepsilon'_t\varepsilon_t>a)}|\mathcal{F}_{t-1}\}>\xi\right]<\xi.$$

Thus, given *δ* > 0 and for all *T* > *a*<sub>0</sub>/*δ*, we find

$$\mathbb{E}\{\varepsilon'_t \varepsilon_t \mathbf{1}_{(\varepsilon'_t \varepsilon_t > \delta T)} | \mathcal{F}_{t-1}\} \le \mathbb{E}\{\varepsilon'_t \varepsilon_t \mathbf{1}_{(\varepsilon'_t \varepsilon_t > a_0)} | \mathcal{F}_{t-1}\} < \xi,$$

so that the conditional Lindeberg condition (*b*) follows.

Second, suppose Assumption 2(*iii*.*b*), so that sup<sub>*t*∈N</sub> E|*ε<sub>t</sub>*|<sup>4</sup> < ∞. Chebychev's inequality gives

$$\mathbb{E}\{1\_{\left(\varepsilon\_t' \varepsilon\_t > \delta T\right)}\} = \mathbb{P}(\varepsilon\_t' \varepsilon\_t > \delta T) \le \frac{1}{\delta^2 T^2} \mathbb{E}|\varepsilon\_t|^4.$$

Next, by the Cauchy–Schwarz inequality and Assumption 2(*iii*.*b*), we get

$$\mathbb{E}\{\varepsilon_t'\varepsilon_t \mathbf{1}_{(\varepsilon_t'\varepsilon_t > \delta T)}\} \le \left[\mathbb{E}|\varepsilon_t|^4\, \mathbb{E}\{\mathbf{1}_{(\varepsilon_t'\varepsilon_t > \delta T)}\}\right]^{1/2} \le \delta^{-1}T^{-1}\mathbb{E}|\varepsilon_t|^4 \le \delta^{-1}T^{-1}\sup_{t \in \mathbb{N}}\mathbb{E}|\varepsilon_t|^4 \le \delta^{-1}T^{-1}C.$$

Hence, *T*<sup>−1</sup> ∑<sub>*t*=1</sub><sup>*T*</sup> E{*ε*′<sub>*t*</sub>*ε<sub>t</sub>*1<sub>(*ε*′<sub>*t*</sub>*ε<sub>t</sub>*>*δT*)</sub>} ≤ *δ*<sup>−1</sup>*T*<sup>−1</sup>*C* → 0, so that the marginal Lindeberg condition (*c*) holds.

(*a*) We define *U*<sup>2</sup><sub>*T*</sub> = *T*<sup>−1</sup> ∑<sub>*t*=1</sub><sup>*T*</sup> *ε<sub>t</sub>ε*′<sub>*t*</sub> and *V*<sup>2</sup><sub>*T*</sub> = *T*<sup>−1</sup> ∑<sub>*t*=1</sub><sup>*T*</sup> E(*ε<sub>t</sub>ε*′<sub>*t*</sub>|F<sub>*t*−1</sub>) and show that ‖*U*<sup>2</sup><sub>*T*</sub> − *V*<sup>2</sup><sub>*T*</sub>‖ → 0 in probability. Since *U*<sup>2</sup><sub>*T*</sub> − *V*<sup>2</sup><sub>*T*</sub> is symmetric, the spectral norm equals the spectral radius, so it suffices to show that *v*′(*U*<sup>2</sup><sub>*T*</sub> − *V*<sup>2</sup><sub>*T*</sub>)*v* vanishes for any linear combination *v*. In turn, it suffices to consider univariate martingale difference sequences *ε<sub>t</sub>*.

First, suppose Assumption 2(*iii*.*a*) holds, so that sup<sub>*t*∈N</sub> E{*ε*<sup>2</sup><sub>*t*</sub>1<sub>(*ε*<sup>2</sup><sub>*t*</sub>>*a*)</sub>|F<sub>*t*−1</sub>} = o<sub>P</sub>(1) as *a* → ∞. We follow an argument inspired by Anderson and Kunitomo (1992, Theorem 2). Hall and Heyde (1980, Theorem 2.23) show that *U*<sup>2</sup><sub>*T*</sub> − *V*<sup>2</sup><sub>*T*</sub> → 0 in probability whenever the Lindeberg condition in part (*b*) holds and sup<sub>*T*∈N</sub> P(*V*<sup>2</sup><sub>*T*</sub> > *λ*) → 0 as *λ* → ∞. To prove the tightness condition, note that

$$V_T^2 = T^{-1} \sum_{t=1}^T \left[\mathbb{E}\{\varepsilon_t^2 \mathbf{1}_{(\varepsilon_t^2 \le \lambda)} | \mathcal{F}_{t-1}\} + \mathbb{E}\{\varepsilon_t^2 \mathbf{1}_{(\varepsilon_t^2 > \lambda)} | \mathcal{F}_{t-1}\}\right] \le \lambda + \sup_{t \in \mathbb{N}} \mathbb{E}\{\varepsilon_t^2 \mathbf{1}_{(\varepsilon_t^2 > \lambda)} | \mathcal{F}_{t-1}\}.$$

Thus, for the tightness probability, we obtain the bound

$$\mathbb{P}(V_T^2 > 2\lambda) \le \mathbb{P}[\lambda + \sup_{t \in \mathbb{N}} \mathbb{E}\{\varepsilon_t^2 \mathbf{1}_{(\varepsilon_t^2 > \lambda)} | \mathcal{F}_{t-1}\} > 2\lambda] = \mathbb{P}[\sup_{t \in \mathbb{N}} \mathbb{E}\{\varepsilon_t^2 \mathbf{1}_{(\varepsilon_t^2 > \lambda)} | \mathcal{F}_{t-1}\} > \lambda].$$

This bound is uniform in *T*, so that sup<sub>*T*∈N</sub> P(*V*<sup>2</sup><sub>*T*</sub> > 2*λ*) ≤ P[sup<sub>*t*∈N</sub> E{*ε*<sup>2</sup><sub>*t*</sub>1<sub>(*ε*<sup>2</sup><sub>*t*</sub>>*λ*)</sub>|F<sub>*t*−1</sub>} > *λ*], which vanishes by Assumption 2(*iii*.*a*).

Second, suppose Assumption 2(*iii*.*b*) holds, so that sup<sub>*t*∈N</sub> E*ε*<sup>4</sup><sub>*t*</sub> < *C* < ∞. Let *m<sub>t</sub>* = *ε*<sup>2</sup><sub>*t*</sub> − E(*ε*<sup>2</sup><sub>*t*</sub>|F<sub>*t*−1</sub>). The Chebychev inequality and the uncorrelatedness of martingale differences give

$$\mathcal{P} = \mathbb{P}(|T^{-1}\sum\_{t=1}^{T} m\_t| > \epsilon) \le \frac{1}{\epsilon^2} \mathbb{E}|T^{-1}\sum\_{t=1}^{T} m\_t|^2 = \frac{1}{T^2\epsilon^2} \mathbb{E}\sum\_{t=1}^{T} m\_t^2.$$

Jensen's inequality shows E{E(*ε*<sup>2</sup><sub>*t*</sub>|F<sub>*t*−1</sub>)}<sup>2</sup> ≤ E{E(*ε*<sup>4</sup><sub>*t*</sub>|F<sub>*t*−1</sub>)} = E*ε*<sup>4</sup><sub>*t*</sub>. Thus, the inequality (*a* + *b*)<sup>2</sup> ≤ 2(*a*<sup>2</sup> + *b*<sup>2</sup>) shows that E*m*<sup>2</sup><sub>*t*</sub> ≤ 2[E*ε*<sup>4</sup><sub>*t*</sub> + E{E(*ε*<sup>2</sup><sub>*t*</sub>|F<sub>*t*−1</sub>)}<sup>2</sup>] ≤ 4E*ε*<sup>4</sup><sub>*t*</sub> ≤ 4*C* < ∞. Thus, P ≤ (*Tϵ*<sup>2</sup>)<sup>−1</sup>4*C* → 0.

(*d*) We show P<sub>*T*</sub> = P(max<sub>1≤*t*≤*T*</sub> |*ε<sub>t</sub>*|<sup>2</sup> > *δT*) → 0 for all *δ* > 0. Boole's inequality gives P<sub>*T*</sub> ≤ ∑<sub>*t*=1</sub><sup>*T*</sup> P(|*ε<sub>t</sub>*|<sup>2</sup> > *δT*) = ∑<sub>*t*=1</sub><sup>*T*</sup> E1<sub>(|*ε<sub>t</sub>*|<sup>2</sup>>*δT*)</sub>. On the set (|*ε<sub>t</sub>*|<sup>2</sup> > *δT*), we have 1 ≤ |*ε<sub>t</sub>*|<sup>2</sup>/(*δT*), which gives the further bound P<sub>*T*</sub> ≤ *δ*<sup>−1</sup>*T*<sup>−1</sup> ∑<sub>*t*=1</sub><sup>*T*</sup> E|*ε<sub>t</sub>*|<sup>2</sup>1<sub>(|*ε<sub>t</sub>*|<sup>2</sup>>*δT*)</sub>, which vanishes by part (*c*).
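Parts (*a*) and (*d*) can be illustrated by simulation for a conditionally heteroscedastic martingale difference sequence; the ARCH(1) design and its parameters below are our own illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
omega0, alpha1 = 0.5, 0.3        # ARCH(1): sigma_t^2 = 0.5 + 0.3 eps_{t-1}^2
eps = np.zeros(T)
for t in range(1, T):
    sigma2 = omega0 + alpha1 * eps[t - 1] ** 2
    eps[t] = np.sqrt(sigma2) * rng.standard_normal()

omega = omega0 / (1 - alpha1)    # unconditional variance
# (a): the sample second moment converges to Omega
assert abs(eps @ eps / T - omega) < 0.05
# (d): max_t eps_t^2 / T vanishes
assert np.max(eps ** 2) / T < 0.01
```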

We then prove that Assumption A1 is satisfied under Assumptions 1 and 2.

**Lemma A2.** *Suppose that Xt and ε<sup>t</sup> satisfy Assumptions 1 and 2, respectively, while X*˜*<sup>t</sup> solves the homogenous Equation (13). Then, Assumption A1 is satisfied.*

**Proof of Lemma A2.** Note that the process *U<sub>t</sub>* equals *β*′*X̃<sub>t</sub>* in the companion notation, which is studied in Theorem 1. It satisfies Equation (16), which is of the form *U<sub>t</sub>* = Φ*U*<sub>*t*−1</sub> + *Fε<sub>t</sub>* with Φ = (*I*<sub>*r*\*</sub> + *β*′*α*) for *r*\* = *r* + *p*(*k* − 1) and *F* = *β*′*ι*, where Φ has spectral radius less than unity, as verified in Theorem 1.

For (A3), we apply Anderson and Kunitomo (1992, Lemma 1). This requires that max<sub>1≤*t*≤*T*</sub> |*ε<sub>t</sub>*|<sup>2</sup> = o<sub>P</sub>(*T*), which is proved in Lemma A1 using Assumption 2.

For (A4) under Assumption 2(*iii*.*a*), we apply Lemma 2 in Anderson and Kunitomo (1992) to show the convergence of the product moment matrix, which requires Assumption 2(*ii*, *iii*.*a*). Assumption 2 states that Ω is a positive definite matrix, which results in the positive definiteness of Σ*<sub>u</sub>* by Anderson and Kunitomo (1992, Lemma 3).

For (A4) using Assumption 2(*iii*.*b*), we follow Anderson and Kunitomo (1992, Lemma 2) but avoid their truncation argument. We first argue that $\sum_{t=1}^{T}U_{t-1}\varepsilon_t'F' = o_P(T)$. Since $U_{t-1}\varepsilon_t'F'$ is a martingale difference sequence with second moments due to Assumption 2(*ii*), and the spectral norm is bounded by the trace, we obtain

$$\mathcal{E} = \mathbb{E}\left|\sum_{t=1}^{T}U_{t-1}\varepsilon_t'F'\right|^2 \le \mathbb{E}\,\mathrm{tr}\sum_{s=1}^{T}\sum_{t=1}^{T}U_{t-1}\varepsilon_t'F'F\varepsilon_s U_{s-1}' = \mathbb{E}\,\mathrm{tr}\sum_{t=1}^{T}U_{t-1}\varepsilon_t'F'F\varepsilon_t U_{t-1}'.$$

Applying iterated expectations and using that $\max_{1\le t\le T}\mathbb{E}|\varepsilon_t|^2$ is bounded by Assumption 2(*iii*.*b*) gives $\mathcal{E} \le C\sum_{t=1}^{T}\mathbb{E}|U_{t-1}|^2$. Noting that $U_t = \sum_{j=0}^{t-1}\Phi^j F\varepsilon_{t-j} + \Phi^t U_0$ and using that $\varepsilon_t$ is a martingale difference array, we arrive at

$$\mathbb{E}\,U_{t-1}'U_{t-1} = \mathbb{E}\sum_{j=0}^{t-2}\varepsilon_{t-1-j}'F'(\Phi^j)'\Phi^j F\varepsilon_{t-1-j} \le \sum_{j=0}^{t-2}\mathrm{tr}\{F'(\Phi^j)'\Phi^j F\}\,\mathbb{E}|\varepsilon_{t-1-j}|^2.$$

Using that $\max_{1\le t\le T}\mathbb{E}|\varepsilon_t|^2$ is bounded and $\Phi$ has spectral radius less than unity,

$$\mathbb{E}|\mathsf{U}\_{t-1}|^2 \le \sum\_{j=0}^{\infty} \text{tr}\left\{ F'(\Phi^j)'\Phi^j F \right\} \max\_{1 \le t \le T} \mathbb{E}|\varepsilon\_t|^2 \le C.$$

As a consequence, $\mathcal{E} = O(T)$ and, by the Markov inequality, $\sum_{t=1}^{T}U_{t-1}\varepsilon_t'F' = o_P(T)$. Next, we show that $T^{-1}\sum_{t=1}^{T}U_{t-1}U_{t-1}' \to \Sigma_U$ in probability. Since $U_{t-1} = \Phi U_{t-2} + F\varepsilon_{t-1}$, then

$$\begin{aligned} T^{-1} \sum\_{t=1}^{T} \mathbf{U}\_{t-1} \mathbf{U}\_{t-1}' &= \quad \Phi T^{-1} \sum\_{t=1}^{T} \mathbf{U}\_{t-2} \mathbf{U}\_{t-2}' \Phi' + T^{-1} \sum\_{t=1}^{T} F \varepsilon\_{t-1} \varepsilon\_{t-1}' F'\\ &+ \Phi T^{-1} \sum\_{t=1}^{T} \mathbf{U}\_{t-2} \varepsilon\_{t-1}' F' + T^{-1} \sum\_{t=1}^{T} F \varepsilon\_{t-1} \mathbf{U}\_{t-2}' \Phi'. \end{aligned}$$

Here, the second term converges to $F\Omega F'$ by Assumption 2(*i*), while the last two terms vanish. Since $\max_{1\le t\le T}|U_t| = o_P(T^{1/2})$, we therefore get

$$T^{-1}\sum\_{t=1}^{T} \mathbf{U}\_{t-1} \mathbf{U}\_{t-1}^{\prime} = \Phi T^{-1} \sum\_{t=1}^{T} \mathbf{U}\_{t-1} \mathbf{U}\_{t-1}^{\prime} \Phi^{\prime} + F \Omega F^{\prime} + \mathbf{o}\_{\mathbb{P}}(1).$$

This is a linear equation in $T^{-1}\sum_{t=1}^{T}U_{t-1}U_{t-1}'$, so that $T^{-1}\sum_{t=1}^{T}U_{t-1}U_{t-1}' \to \Sigma_U$ in probability, where $\Sigma_U$ solves $\Sigma_U = \Phi\Sigma_U\Phi' + F\Omega F'$. Anderson and Kunitomo (1992, Lemma 3) show invertibility of $\Sigma_U$.
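The fixed point $\Sigma_U = \Phi\Sigma_U\Phi' + F\Omega F'$ is a discrete Lyapunov equation, which can be solved and checked numerically. A minimal sketch, with illustrative matrices that are not taken from the model:

```python
import numpy as np

# Illustrative numbers only; Phi, F, Omega are not taken from the model.
Phi = np.array([[0.5, 0.2],
                [0.1, 0.3]])       # spectral radius < 1, as required in Theorem 1
F = np.eye(2)
Omega = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
assert max(abs(np.linalg.eigvals(Phi))) < 1

# Sigma_U solves the discrete Lyapunov equation Sigma_U = Phi Sigma_U Phi' + F Omega F'.
# Vectorising gives vec(Sigma_U) = (I - Phi kron Phi)^{-1} vec(F Omega F').
Q = F @ Omega @ F.T
n = Phi.shape[0]
vec_sigma = np.linalg.solve(np.eye(n * n) - np.kron(Phi, Phi), Q.reshape(-1))
Sigma_U = vec_sigma.reshape(n, n)

# Cross-check against iterating the recursion for the variance of U_t:
S = np.zeros((n, n))
for _ in range(200):
    S = Phi @ S @ Phi.T + Q
assert np.allclose(S, Sigma_U)
```

The vectorisation identity $\mathrm{vec}(\Phi\Sigma\Phi') = (\Phi\otimes\Phi)\mathrm{vec}(\Sigma)$ turns the fixed point into a linear system; the spectral radius condition on $\Phi$ guarantees a unique, positive definite solution.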

For (A5), the Functional Central Limit Theorem follows from the univariate result of Brown (1971, Theorem 3), equipped with the Cramér–Wold device; see Billingsley (1968, Theorem 7.7). Brown's result applies under Assumption 2(*i*, *ii*) and either of the Lindeberg conditions established in Lemma A1(*b*, *c*) under Assumption 2. When using Brown's result, it is convenient to define the univariate variables $S_t = \sum_{s=1}^{t}\varepsilon_s$ and $s_t^2 = \sum_{s=1}^{t}\mathbb{E}\varepsilon_s^2$. Brown is concerned with the random walk embedded continuously through the points $(s_t^2/s_T^2, S_t/s_T)$, while we are concerned with the right-continuous random walk that is constant at $S_t/T^{1/2}$ on the half-open intervals $[t/T, (t+1)/T)$. The two embeddings reconcile since $s_t^2 = t\sigma^2$ for some constant $\sigma^2$ by Assumption 2(*i*) and since $\max_{1\le t\le T}|\varepsilon_t|/T^{1/2}$ vanishes by Lemma A1(*d*) under Assumption 2.

For (A6), the convergence to a stochastic integral in the univariate case is based on the results of Jakubowski et al. (1989), as referred to by Kurtz and Protter (1996), while the convergence to a stochastic integral in the multivariate case is based on the results of Kurtz and Protter (1991). For the univariate case, Kurtz and Protter (1996, Theorem 7.1) show that we need to check that the martingale array $M_t^T = T^{-1/2}\sum_{s=1}^{t}\varepsilon_s$ is uniformly tight, as required by Jakubowski et al. (1989), or, equivalently, that it has uniformly controlled variations. We use Kurtz and Protter (1991, Theorem 2.2), which applies to the multivariate case; see also Hansen (1992, Theorem 2.1). Choose $\delta = \infty$ so that $M_t^{T,\delta} = M_t^T$ in Kurtz and Protter's notation. For each $\alpha > 0$ and $T \ge 1$, choose stopping times $\tau^{T,\alpha} = \infty$ so that $\mathbb{P}(\tau^{T,\alpha} \le \alpha) = 0 \le 1/\alpha$. Then, we obtain the quadratic variation processes

$$[M^{T,\delta}]_{t\wedge\tau^{T,\alpha}} = T^{-1}\sum_{s=1}^{t\wedge\tau^{T,\alpha}}\varepsilon_s\varepsilon_s' \le T^{-1}\sum_{t=1}^{T}\varepsilon_t\varepsilon_t' = [M^{T,\delta}]_T,$$

so that $\mathbb{E}[M^{T,\delta}]_{t\wedge\tau^{T,\alpha}} \le \mathbb{E}[M^{T,\delta}]_T$. From Assumption 2(*i*), it follows that $\mathbb{E}[M^{T,\delta}]_T \to \Omega$. Consequently, we have $\sup_T\|\mathbb{E}[M^{T,\delta}]_T\| < \infty$. In turn, $\sup_T\|\mathbb{E}[M^{T,\delta}]_{t\wedge\tau^{T,\alpha}}\| < \infty$ for each $t$, so that $M_t^T$ has uniformly controlled variations.

**Proof of Lemma 1.** Assumption 3 has $\mathbb{E}(\varepsilon_t\varepsilon_t'|\mathcal{F}_{t-1}) = \Omega$ *a*.*s*., so that $T^{-1}\sum_{t=1}^{T}\mathbb{E}(\varepsilon_t\varepsilon_t'|\mathcal{F}_{t-1}) = \Omega$ *a*.*s*. follows and Assumption 2(*ii*) holds. Taking iterated expectations, we obtain $\mathbb{E}(\varepsilon_t\varepsilon_t') = \Omega$, which leads to $T^{-1}\sum_{t=1}^{T}\mathbb{E}(\varepsilon_t\varepsilon_t') = \Omega$, and thus Assumption 2(*i*) is satisfied. Lastly, we show that Assumption 2(*iii*.*a*) is implied by Assumption 3(*ii*). Using Hölder's inequality, we find, for $\eta = \xi/2 > 0$,

$$\mathbb{E}\{\varepsilon\_t^{\prime}\varepsilon\_t \mathbf{1}\_{\left(\varepsilon\_t^{\prime}\varepsilon\_t > a\right)} | \mathcal{F}\_{t-1}\} \leq [\mathbb{E}\left(|\varepsilon\_t|^{2+2\eta} | \mathcal{F}\_{t-1}\right)]^{1/\left(1+\eta\right)} [\mathbb{E}\{\mathbf{1}\_{\left(\varepsilon\_t^{\prime}\varepsilon\_t > a\right)} | \mathcal{F}\_{t-1}\}]^{\eta/\left(1+\eta\right)}\,,$$

in which we note the equality $\mathbf{1}_{(\varepsilon_t'\varepsilon_t>a)}^{(1+\eta)/\eta} = \mathbf{1}_{(\varepsilon_t'\varepsilon_t>a)}$. Hence, writing the expectation of the indicator as a probability, and also using Markov's inequality, we arrive at

$$\mathbb{E}\{\mathbf{1}\_{\left(\varepsilon\_{t}^{\prime}\varepsilon\_{t}>a\right)}^{\left(1+\eta\right)/\eta}|\mathcal{F}\_{t-1}\} = \mathbb{P}\left(\varepsilon\_{t}^{\prime}\varepsilon\_{t}>a|\mathcal{F}\_{t-1}\right) \leq \frac{1}{a^{1+\eta}}\mathbb{E}\{\left(\varepsilon\_{t}^{\prime}\varepsilon\_{t}\right)^{1+\eta}|\mathcal{F}\_{t-1}\}.$$

In combination, we obtain $\mathbb{E}\{\varepsilon_t'\varepsilon_t\mathbf{1}_{(\varepsilon_t'\varepsilon_t>a)}|\mathcal{F}_{t-1}\} \le a^{-\eta}\mathbb{E}(|\varepsilon_t|^{2+2\eta}|\mathcal{F}_{t-1})$, which vanishes as $a \to \infty$ uniformly in $t$, since $\mathbb{E}(|\varepsilon_t|^{2+2\eta}|\mathcal{F}_{t-1})$ is uniformly bounded by assumption.

**Remark A1.** *We give an example of a martingale difference sequence $(\varepsilon_t, \mathcal{F}_t)$ satisfying $\sup_{t\in\mathbb{N}}\mathbb{E}|\varepsilon_t|^4 < \infty$ but violating Assumption 2(iii.a), that is, the requirement that $\sup_{t\in\mathbb{N}}\mathbb{E}\{|\varepsilon_t|^2\mathbf{1}_{(|\varepsilon_t|^2>a)}|\mathcal{F}_{t-1}\} \to 0$ in probability; the example is from Anderson and Kunitomo (1992). Consider the probability space $\{(0,1], \mathcal{F}, \mathbb{P}\}$, where $\mathcal{F}$ is the Borel field on $(0,1]$ and $\mathbb{P}$ is the uniform distribution. Consider also a dyadic sequence with indices $n = 1, 2, \dots$ and $k_n = 1, \dots, 2^n$, so that $t = \sum_{j=1}^{n-1}2^j + k_n$, and define*

$$\varepsilon_t(\omega) = \varepsilon_{nk_n}(\omega) = \begin{cases} n & \text{if } 2k_n - 1 < 2^{n+1}\omega \le 2k_n, \\ -n & \text{if } 2k_n - 2 < 2^{n+1}\omega \le 2k_n - 1, \\ 0 & \text{otherwise}. \end{cases}$$

*We note that $\mathbb{E}\varepsilon_t = 0$ while $\mathbb{E}\varepsilon_t^4 = n^4/2^n$ is uniformly bounded in $n$. The natural filtration of $\varepsilon_t$ is then given by the $\sigma$-fields $\mathcal{F}_0 = \sigma\{(0,1]\}$, $\mathcal{F}_1 = \sigma\{(0,\tfrac14], (\tfrac14,\tfrac12], \mathcal{F}_0\}$, $\mathcal{F}_2 = \sigma\{(\tfrac12,\tfrac34], (\tfrac34,1], \mathcal{F}_1\}$, $\mathcal{F}_3 = \sigma\{(0,\tfrac18], (\tfrac18,\tfrac14], \mathcal{F}_2\}$ and so on. We find that $\mathbb{E}(\varepsilon_t|\mathcal{F}_{t-1}) = 0$ while $[\sup_{t\in\mathbb{N}}\mathbb{E}\{|\varepsilon_t|^2\mathbf{1}_{(|\varepsilon_t|^2>a)}|\mathcal{F}_{t-1}\}](\omega) = \infty$ for all $a$ and all $\omega$, so that the random variables $\sup_{t\in\mathbb{N}}\mathbb{E}\{|\varepsilon_t|^2\mathbf{1}_{(|\varepsilon_t|^2>a)}|\mathcal{F}_{t-1}\}$ cannot vanish in probability.*
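The unconditional moment calculations in Remark A1 can be reproduced by exact integration over the dyadic intervals. A small sketch (the helper `moments` is ours, not from the paper):

```python
from fractions import Fraction

def moments(n, k):
    """Exact E[eps] and E[eps^4] for the dyadic variable eps_{n,k}:
    +n on ((2k-1)/2^(n+1), 2k/2^(n+1)], -n on ((2k-2)/2^(n+1), (2k-1)/2^(n+1)],
    0 elsewhere, under the uniform distribution on (0, 1]."""
    w = Fraction(1, 2 ** (n + 1))             # length of each half-interval
    mean = n * w + (-n) * w                   # = 0 by symmetry
    fourth = (n ** 4) * w + ((-n) ** 4) * w   # = n^4 / 2^n
    return mean, fourth

assert all(moments(n, 1)[0] == 0 for n in range(1, 20))
assert all(moments(n, 1)[1] == Fraction(n ** 4, 2 ** n) for n in range(1, 20))
# the fourth moment is uniformly bounded: the maximum of n^4/2^n is 20.25 at n = 6
assert max(float(moments(n, 1)[1]) for n in range(1, 60)) == 6 ** 4 / 2 ** 6
```

The failure of the conditional Lindeberg condition is not captured by these unconditional moments; it stems from the conditional expectations given the shrinking dyadic atoms, as explained in the remark.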

#### *Appendix C.2. Several Lemmas for the Partial Systems*

The asymptotic properties of the product moment matrices $S_{ij}$ for $i,j = 0,1$ defined in (24) are investigated so as to adapt Lemmas 10.1 and 10.3 of Johansen (1995) to the present model. We do this by combining various ideas and techniques from HJNR, JMN and Kurita and Nielsen (2009). These papers assume normal innovations, which we have generalised as Assumptions 2 and 3 in our study. This means that we have to be careful when defining the limits of product moments of various non-integrated components. This issue is addressed in the following lemma:

**Lemma A3.** *Suppose that Assumptions 1 and A1 are satisfied. Let*

$$\mathbf{V}\_{t} = \begin{pmatrix} \varepsilon\_{t} \\ \Delta X\_{t} \\ \beta^{\prime} X\_{t-1} + \gamma\_{\cdot} t \end{pmatrix}, \quad Q\_{t} = \begin{pmatrix} \Delta X\_{t-1} \\ \vdots \\ \Delta X\_{t-k+1} \end{pmatrix} \quad \text{and} \quad \overline{Q}\_{t} = \begin{pmatrix} Q\_{t} \\ 1 \end{pmatrix}.$$

*Let $v_j = T_j/T$ be relative break points for $j = 0, \dots, q$ and define the sample product moment matrix of $V_t$ corrected for $Q_t$ and a constant as*

$$\mathcal{M}_{VV\cdot Q,1} = \sum_{j=1}^{q}\frac{1}{T}\sum_{t=T_{j-1}+k}^{T_j}\left\{V_t - \sum_{s=T_{j-1}+k}^{T_j}V_s\overline{Q}_s'\left(\sum_{s=T_{j-1}+k}^{T_j}\overline{Q}_s^{\otimes 2}\right)^{-1}\overline{Q}_t\right\}^{\otimes 2}.$$

*Then, as $T \to \infty$ with fixed relative break points $v_j$, we get that $\mathcal{M}_{VV\cdot Q,1}$ converges in probability to a positive definite matrix with the structure*

$$
\begin{pmatrix}
\Omega & \Omega & 0 \\
\Omega & \Sigma_{xx} & \Sigma_{x\beta} \\
0 & \Sigma_{\beta x} & \Sigma_{\beta\beta}
\end{pmatrix}, \tag{A7}
$$

*where $\Sigma_{x\beta} = \alpha\Sigma_{\beta\beta}$ and $\Sigma_{xx} = \alpha\Sigma_{\beta x} + \Omega$ hold.*

**Proof of Lemma A3.** We start with the homogeneous Equation (16). For $1 \le j \le q$ and $T_{j-1} + k < t \le T_j$, this equation can always be solved as

$$\tilde\beta'\tilde X_t = \sum_{s=1}^{t-T_{j-1}-k}(I_{\tilde r} + \tilde\beta'\tilde\alpha)^{t-T_{j-1}-k-s}\tilde\beta'\iota\,\varepsilon_{T_{j-1}+k+s} + (I_{\tilde r} + \tilde\beta'\tilde\alpha)^{t-T_{j-1}-k}\tilde\beta'\tilde X_{T_{j-1}+k}$$

and, in the first sub-sample period or $j = 1$, the initial value $\tilde\beta'\tilde X_{T_0+k} = \tilde\beta'\tilde X_k$ can be treated as fixed, so that the process $\tilde\beta'\tilde X_t$ for $T_0 + k < t \le T_1$ becomes uniformly bounded in probability by noting that it equals $U_t$ in Assumption A1. Similarly, by iterating over all the other start-up values, the process $\tilde\beta'\tilde X_t$ for $T_{j-1} + k < t \le T_j$ and $j = 2, \dots, q$ is also uniformly bounded in probability. Since the number of breaks is finite, $\tilde\beta'\tilde X_t$ is uniformly bounded in probability jointly for $1 \le j \le q$.

Next, the Granger–Johansen representation (18) implies that, for $1 \le j \le q$ and $T_{j-1} + k < t \le T_j$,

$$
\Delta X_t = C\varepsilon_t + \Delta U_t + \tau_{\ell,j} \quad\text{and}\quad \beta'X_t + \gamma_j t = \beta'U_t + \beta'\tau_{c,j},
$$

*Econometrics* **2019**, *7*, 42

where

$$U_t = (I - C\Psi)\bar\beta\beta'\tilde X_t - C\sum_{i=1}^{k-1}\Gamma_i\sum_{\ell=0}^{i-1}\Delta\tilde X_{t-\ell}.$$

Since $\tilde\beta'\tilde X_t$ is uniformly bounded in probability, it follows that $U_t$ is also uniformly bounded in probability. Note that $U_t$ is identical throughout all sub-sample periods. The intercepts $\tau_{\ell,j}$ and $\beta'\tau_{c,j}$ are eliminated from $\Delta X_t$ and $\beta'X_t + \gamma_j t$, respectively, when demeaning them within each sub-sample period. Consequently, we can apply the Law of Large Numbers (A4) in Assumption A1 to

$$\frac{1}{T_j - T_{j-1} - k}\sum_{t=T_{j-1}+k}^{T_j}\left\{\begin{pmatrix} V_t \\ Q_t \end{pmatrix} - \frac{1}{T_j - T_{j-1} - k}\sum_{s=T_{j-1}+k}^{T_j}\begin{pmatrix} V_s \\ Q_s \end{pmatrix}\right\}^{\otimes 2} \tag{A8}$$

which, for 1 ≤ *j* ≤ *q*, converges in probability to a positive definite matrix denoted as

$$
\begin{pmatrix}
N_{\varepsilon\varepsilon}^{(j)} & N_{\varepsilon x}^{(j)} & N_{\varepsilon\beta}^{(j)} & N_{\varepsilon q}^{(j)} \\
N_{x\varepsilon}^{(j)} & N_{xx}^{(j)} & N_{x\beta}^{(j)} & N_{xq}^{(j)} \\
N_{\beta\varepsilon}^{(j)} & N_{\beta x}^{(j)} & N_{\beta\beta}^{(j)} & N_{\beta q}^{(j)} \\
N_{q\varepsilon}^{(j)} & N_{qx}^{(j)} & N_{q\beta}^{(j)} & N_{qq}^{(j)}
\end{pmatrix}
$$

with $N_{\varepsilon\varepsilon}^{(j)} = \Omega$ by Assumption A1. Since both $X_{t-1}$ and $Q_t$ consist of the past values of $\varepsilon_t$, it follows from (A4) that $N_{\varepsilon\beta}^{(j)} = 0$ and $N_{\varepsilon q}^{(j)} = 0$ hold. Note that the model equation is

$$
\varepsilon_t = \Delta X_t - \alpha(\beta'X_{t-1} + \gamma_j t) - \boldsymbol{\Gamma}Q_t + \mu_j, \tag{A9}
$$

for $\boldsymbol{\Gamma} = (\Gamma_1, \dots, \Gamma_{k-1})$. We can derive three properties from this equation. First, we post-multiply (A9) by $\varepsilon_t'$ and then exploit $N_{\varepsilon\beta}^{(j)} = 0$ and $N_{\varepsilon q}^{(j)} = 0$ to find $\Omega = N_{\varepsilon\varepsilon}^{(j)} = N_{x\varepsilon}^{(j)} - 0 - 0 = N_{x\varepsilon}^{(j)}$, which is the first property. The next one is

$$(N_{x\beta}^{(j)}, N_{xq}^{(j)})\begin{pmatrix} N_{\beta\beta}^{(j)} & N_{\beta q}^{(j)} \\ N_{q\beta}^{(j)} & N_{qq}^{(j)} \end{pmatrix}^{-1} = (\alpha, \boldsymbol{\Gamma}), \tag{A10}$$

where the left-hand side is the limit of the sample regression coefficient for $\Delta X_t$ regressed on $\beta'X_{t-1} + \gamma_j t$, $Q_t$ and an intercept. This property is demonstrated by substituting $\varepsilon_t + \alpha(\beta'X_{t-1} + \gamma_j t) + \boldsymbol{\Gamma}Q_t - \mu_j$ from (A9) into $\Delta X_t$; we then arrive at the limit result

$$(N_{x\beta}^{(j)}, N_{xq}^{(j)}) = (N_{\varepsilon\beta}^{(j)}, N_{\varepsilon q}^{(j)}) + (\alpha, \boldsymbol{\Gamma})\begin{pmatrix} N_{\beta\beta}^{(j)} & N_{\beta q}^{(j)} \\ N_{q\beta}^{(j)} & N_{qq}^{(j)} \end{pmatrix},$$

from which (A10) follows by noting that $N_{\varepsilon\beta}^{(j)} = 0$ and $N_{\varepsilon q}^{(j)} = 0$. The third property is

$$
\Omega = N_{xx}^{(j)} - (\alpha, \boldsymbol{\Gamma})\begin{pmatrix} N_{\beta\beta}^{(j)} & N_{\beta q}^{(j)} \\ N_{q\beta}^{(j)} & N_{qq}^{(j)} \end{pmatrix}\begin{pmatrix} \alpha' \\ \boldsymbol{\Gamma}' \end{pmatrix}. \tag{A11}
$$

The left-hand side of (A11) is the limit of the sample product moment of $\varepsilon_t$ regressed on $\beta'X_{t-1} + \gamma_j t$, $Q_t$ and an intercept, due to $N_{\varepsilon\beta}^{(j)} = 0$ and $N_{\varepsilon q}^{(j)} = 0$. The right-hand side of (A11) is the limit of the sample product moment of $\Delta X_t$ regressed on $\beta'X_{t-1} + \gamma_j t$, $Q_t$ and an intercept, where we have exploited the identity (A10).

Now, we return to (A8) and partial out $\overline{Q}_t$ to obtain

$$\frac{1}{T_j - T_{j-1} - k}\sum_{t=T_{j-1}+k}^{T_j}\left\{V_t - \sum_{s=T_{j-1}+k}^{T_j}V_s\overline{Q}_s'\left(\sum_{s=T_{j-1}+k}^{T_j}\overline{Q}_s^{\otimes 2}\right)^{-1}\overline{Q}_t\right\}^{\otimes 2} \stackrel{P}{\rightarrow} \begin{pmatrix} N_{\varepsilon\varepsilon}^{(j)} & N_{\varepsilon x}^{(j)} & N_{\varepsilon\beta}^{(j)} \\ N_{x\varepsilon}^{(j)} & N_{xx}^{(j)} & N_{x\beta}^{(j)} \\ N_{\beta\varepsilon}^{(j)} & N_{\beta x}^{(j)} & N_{\beta\beta}^{(j)} \end{pmatrix} - \begin{pmatrix} N_{\varepsilon q}^{(j)} \\ N_{xq}^{(j)} \\ N_{\beta q}^{(j)} \end{pmatrix}\left(N_{qq}^{(j)}\right)^{-1}\left(N_{q\varepsilon}^{(j)}, N_{qx}^{(j)}, N_{q\beta}^{(j)}\right), \tag{A12}$$

for which we have that $N_{\varepsilon\varepsilon}^{(j)} = N_{\varepsilon x}^{(j)} = \Omega$ while $N_{\varepsilon\beta}^{(j)} = 0$ and $N_{\varepsilon q}^{(j)} = 0$. Thus, (A12) is reduced to

$$
\begin{pmatrix}
\Omega & \Omega & 0 \\
\Omega & \Sigma_{xx}^{(j)} & \Sigma_{x\beta}^{(j)} \\
0 & \Sigma_{\beta x}^{(j)} & \Sigma_{\beta\beta}^{(j)}
\end{pmatrix},
$$

where

$$
\begin{pmatrix}
\Sigma_{xx}^{(j)} & \Sigma_{x\beta}^{(j)} \\
\Sigma_{\beta x}^{(j)} & \Sigma_{\beta\beta}^{(j)}
\end{pmatrix} = \begin{pmatrix}
N_{xx}^{(j)} & N_{x\beta}^{(j)} \\
N_{\beta x}^{(j)} & N_{\beta\beta}^{(j)}
\end{pmatrix} - \begin{pmatrix}
N_{xq}^{(j)} \\
N_{\beta q}^{(j)}
\end{pmatrix}\left(N_{qq}^{(j)}\right)^{-1}\left(N_{qx}^{(j)}, N_{q\beta}^{(j)}\right).
$$

Furthermore, noting $\sum_{j=1}^{q}\Delta v_j = 1$ and $\sum_{j=1}^{q}(\Delta v_j)\Omega = \Omega$, we define

$$
\Sigma_{ik} = \sum_{j=1}^{q}(\Delta v_j)\Sigma_{ik}^{(j)} \quad\text{and}\quad N_{lm} = \sum_{j=1}^{q}(\Delta v_j)N_{lm}^{(j)}, \tag{A13}
$$

for $i, k = x, \beta$ and $l, m = q, x, \beta$. The use of Slutsky's theorem then leads to (A7).

It is left to show that $\Sigma_{x\beta} = \alpha\Sigma_{\beta\beta}$ and $\Sigma_{xx} = \alpha\Sigma_{\beta x} + \Omega$. For the first expression, we apply the identities in (A13) to (A10) so as to obtain

$$(N_{x\beta}, N_{xq}) = (\alpha, \boldsymbol{\Gamma})\begin{pmatrix} N_{\beta\beta} & N_{\beta q} \\ N_{q\beta} & N_{qq} \end{pmatrix}. \tag{A14}$$

Taking partitioned inversion in (A14) results in $\alpha = N_{x\beta\cdot q}N_{\beta\beta\cdot q}^{-1} = \Sigma_{x\beta}\Sigma_{\beta\beta}^{-1}$. For the second expression, we apply the identities in (A13) to (A11) to find

$$
\Omega = N_{xx} - (\alpha, \boldsymbol{\Gamma})\begin{pmatrix} N_{\beta\beta} & N_{\beta q} \\ N_{q\beta} & N_{qq} \end{pmatrix}\begin{pmatrix} \alpha' \\ \boldsymbol{\Gamma}' \end{pmatrix}. \tag{A15}
$$

Inserting (A14) into (A15) and taking its partitioned inversion, we arrive at

$$
\Omega = N_{xx} - (N_{x\beta}, N_{xq})\begin{pmatrix} N_{\beta\beta} & N_{\beta q} \\ N_{q\beta} & N_{qq} \end{pmatrix}^{-1}\begin{pmatrix} N_{\beta x} \\ N_{qx} \end{pmatrix} = N_{xx\cdot q} - N_{x\beta\cdot q}N_{\beta\beta\cdot q}^{-1}N_{\beta x\cdot q} = \Sigma_{xx} - \alpha\Sigma_{\beta x},
$$

by noting from (A14) that $N_{x\beta\cdot q}N_{\beta\beta\cdot q}^{-1} = \alpha$ holds.
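The partitioned-inversion step $N_{x\beta\cdot q}N_{\beta\beta\cdot q}^{-1} = \alpha$ can be checked numerically: if $(N_{x\beta}, N_{xq}) = (\alpha, \boldsymbol{\Gamma})M$ as in (A14), then partialling out the $q$-block recovers $\alpha$ exactly. A sketch with random illustrative matrices (the dimensions and values are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
r, q_dim, p = 2, 3, 4   # illustrative dimensions

# Build a random positive definite moment matrix M = [[N_bb, N_bq], [N_qb, N_qq]]
A = rng.standard_normal((r + q_dim, r + q_dim))
M = A @ A.T + (r + q_dim) * np.eye(r + q_dim)
N_bb, N_bq = M[:r, :r], M[:r, r:]
N_qb, N_qq = M[r:, :r], M[r:, r:]

# Impose (A14): (N_xb, N_xq) = (alpha, Gamma) M for arbitrary coefficients
alpha = rng.standard_normal((p, r))
Gamma = rng.standard_normal((p, q_dim))
N_xb = alpha @ N_bb + Gamma @ N_qb
N_xq = alpha @ N_bq + Gamma @ N_qq

# Partial out the q-block: N_{xb.q} and N_{bb.q}, then recover alpha
N_xb_q = N_xb - N_xq @ np.linalg.solve(N_qq, N_qb)
N_bb_q = N_bb - N_bq @ np.linalg.solve(N_qq, N_qb)
alpha_hat = N_xb_q @ np.linalg.inv(N_bb_q)
assert np.allclose(alpha_hat, alpha)
```

The cancellation works because $N_{x\beta\cdot q} = \alpha N_{\beta\beta\cdot q}$ once the $\boldsymbol{\Gamma}$ terms are projected out, mirroring the algebra in the proof.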

Recalling the decomposition $\Delta X_t = (\Delta Y_t', \Delta Z_t')'$, we find the following equivalence in the lower-right submatrix of (A7) in Lemma A3:


$$
\begin{pmatrix}
\Sigma_{xx} & \Sigma_{x\beta} \\
\Sigma_{\beta x} & \Sigma_{\beta\beta}
\end{pmatrix} = \begin{pmatrix}
\Sigma_{yy} & \Sigma_{yz} & \Sigma_{y\beta} \\
\Sigma_{zy} & \Sigma_{zz} & \Sigma_{z\beta} \\
\Sigma_{\beta y} & \Sigma_{\beta z} & \Sigma_{\beta\beta}
\end{pmatrix} = \Sigma.
$$

Under the normality assumption for $\varepsilon_t$ as in HJNR, we could form the conditional variance of the two elements $\Delta Y_t$ and $\beta'X_{t-1} + t\gamma_j$ given the element $\Delta Z_t$. Moving away from normality under Assumptions 2 and 3, we need to consider instead the limit of a product moment matrix consisting of *linear combinations* of these elements, defined in the following manner:

$$
\begin{pmatrix}
\Sigma_{yy\cdot z} & \Sigma_{y\beta\cdot z} \\
\Sigma_{\beta y\cdot z} & \Sigma_{\beta\beta\cdot z}
\end{pmatrix} = A'\Sigma A = \begin{pmatrix}
\Sigma_{yy} & \Sigma_{y\beta} \\
\Sigma_{\beta y} & \Sigma_{\beta\beta}
\end{pmatrix} - \begin{pmatrix}
\Sigma_{yz} \\
\Sigma_{\beta z}
\end{pmatrix}\Sigma_{zz}^{-1}\left(\Sigma_{zy}, \Sigma_{z\beta}\right), \tag{A16}
$$

which appears in (A17) in Lemma A6, and where

$$A' = \begin{pmatrix} I & -\Sigma_{yz}\Sigma_{zz}^{-1} & 0 \\ 0 & -\Sigma_{\beta z}\Sigma_{zz}^{-1} & I \end{pmatrix}.$$

Let us recall the weak exogeneity condition $\alpha_z = 0$, which implies

$$\boldsymbol{\alpha} = \begin{pmatrix} \boldsymbol{\alpha}\_{\boldsymbol{y}} \\ \boldsymbol{0} \end{pmatrix} \quad \text{and} \quad \boldsymbol{\alpha}\_{\perp} = \begin{pmatrix} \boldsymbol{\alpha}\_{\boldsymbol{y}\perp} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{I}\_{\boldsymbol{p}-\boldsymbol{m}} \end{pmatrix}$$

Finally, recall from (4) that the limit variance of innovations in the partial equation equals $\Omega_{yy\cdot z} = \Omega_{yy} - \Omega_{yz}\Omega_{zz}^{-1}\Omega_{zy}$ under Assumption A1. Within each sub-sample period, the setup here is identical to that of HJNR. We therefore obtain the following equation, which adapts Equation (10.6) in Johansen (1995, Lemma 10.1).

**Lemma A4** (HJNR, Lemma 4)**.** *Suppose that Assumptions 1 and A1 are satisfied under α<sup>z</sup>* = 0*. Then,*

$$\alpha_{y\perp}(\alpha_{y\perp}'\Omega_{yy\cdot z}\alpha_{y\perp})^{-1}\alpha_{y\perp}' = \Sigma_{yy\cdot z}^{-1} - \Sigma_{yy\cdot z}^{-1}\Sigma_{y\beta\cdot z}(\Sigma_{\beta y\cdot z}\Sigma_{yy\cdot z}^{-1}\Sigma_{y\beta\cdot z})^{-1}\Sigma_{\beta y\cdot z}\Sigma_{yy\cdot z}^{-1}.$$
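Lemma A4 is an instance of the standard projection identity relating $\alpha_{y\perp}$ to an inverse variance matrix. Under the structure $\Sigma_{y\beta\cdot z} = \alpha_y\Sigma_{\beta\beta\cdot z}$ and $\Sigma_{yy\cdot z} = \alpha_y\Sigma_{\beta\beta\cdot z}\alpha_y' + \Omega_{yy\cdot z}$ implied by Lemma A3 and weak exogeneity, the identity can be checked numerically; a sketch with random illustrative matrices (all values ours):

```python
import numpy as np

rng = np.random.default_rng(3)
m, r = 4, 2   # illustrative dimensions, not from the paper

alpha_y = rng.standard_normal((m, r))
B = rng.standard_normal((r, r))
Sigma_bb = B @ B.T + np.eye(r)          # stands in for Sigma_{beta beta . z}
C = rng.standard_normal((m, m))
Omega_yy = C @ C.T + np.eye(m)          # stands in for Omega_{yy . z}

# Impose the assumed structure on the remaining blocks
Sigma_yb = alpha_y @ Sigma_bb
Sigma_yy = alpha_y @ Sigma_bb @ alpha_y.T + Omega_yy

# alpha_{y,perp}: orthonormal basis of the null space of alpha_y'
U, _, _ = np.linalg.svd(alpha_y)
a_perp = U[:, r:]

lhs = a_perp @ np.linalg.inv(a_perp.T @ Omega_yy @ a_perp) @ a_perp.T
Si = np.linalg.inv(Sigma_yy)
mid = Sigma_yb.T @ Si @ Sigma_yb
rhs = Si - Si @ Sigma_yb @ np.linalg.inv(mid) @ Sigma_yb.T @ Si
assert np.allclose(lhs, rhs)
```

The $\Sigma_{\beta\beta\cdot z}$ factors cancel inside the middle inverse, so the right-hand side reduces to the familiar form $\Sigma^{-1} - \Sigma^{-1}\alpha_y(\alpha_y'\Sigma^{-1}\alpha_y)^{-1}\alpha_y'\Sigma^{-1}$.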

We now explore the limit of the common trends within each sub-sample period. Define

$$B_T^{*\prime} = \begin{pmatrix} \alpha_\perp'\Gamma & 0 \\ 0 & T^{-1/2} \end{pmatrix}\begin{pmatrix} I_p & -\tau_{\ell,j} \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad X_t^* = \begin{pmatrix} X_{t-1} \\ t \end{pmatrix},$$

and the next lemma is a combination of Lemma A.1 in JMN and Lemma 5 in HJNR.

**Lemma A5.** *Suppose that Assumptions 1 and A1 are satisfied. Consider the $(p - r + 1)$-dimensional process $T^{-1/2}B_T^{*\prime}X^*_{\mathrm{int}(Tu)}$ on $D[0,1]$ endowed with the Skorokhod metric with common distortion across the dimensions. Let $W_u$ be a $p$-dimensional Brownian motion with variance $\Omega$ for $0 \le u \le 1$. For $v_{j-1} \le u < v_j$ and $1 \le j \le q$, the process $X^*_{\mathrm{int}(Tu)}$ satisfies*

$$T^{-1/2}B_T^{*\prime}\left(X^*_{\mathrm{int}(Tu)} - X^*_{\mathrm{int}(Tv_{j-1})}\right) \xrightarrow{\mathcal{D}} \begin{pmatrix} \alpha_\perp'(W_u - W_{v_{j-1}}) \\ u - v_{j-1} \end{pmatrix}.$$

*The convergence holds jointly for* 1 ≤ *j* ≤ *q and* 0 ≤ *u* ≤ 1*.*

**Proof of Lemma A5.** The Granger–Johansen representation (18) implies that, for $1 \le j \le q$ and $T_{j-1} + k < t \le T_j$,

$$T^{-1/2}\alpha_\perp'\Gamma(X_t - \tau_{\ell,j}t) = T^{-1/2}\alpha_\perp'\Gamma\left(C\sum_{s=T_{j-1}+k+1}^{t}\varepsilon_s + U_t + \tau_{c,j}\right).$$

Since $\alpha_\perp'\Gamma C = \alpha_\perp'$ has full row rank and $U_t$ is bounded in probability as shown in the proof of Lemma A3, the random walk component $\alpha_\perp'\sum_{s=T_{j-1}+k+1}^{t}\varepsilon_s$ dominates $\alpha_\perp'\Gamma U_t$. The initial value $\tau_{c,j}$ could be large when $j > 1$, but it is eliminated when taking the differences $X^*_{\mathrm{int}(Tu)} - X^*_{\mathrm{int}(Tv_{j-1})}$. Thus, the first element of $T^{-1/2}B_T^{*\prime}(X^*_{\mathrm{int}(Tu)} - X^*_{\mathrm{int}(Tv_{j-1})})$ converges to $\alpha_\perp'(W_u - W_{v_{j-1}})$ by the Functional Central Limit Theorem (A5) in Assumption A1. The second element also converges as desired since $T^{-1}\mathrm{int}(Tu)$ converges to $u$. As the number of breaks is finite, the convergence holds jointly for $1 \le j \le q$.
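Not part of the proof: the $T^{-1/2}$ scaling of the partial sums can be illustrated by simulation. A sketch under the simplifying assumption of i.i.d. Gaussian innovations, with all numbers illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
T, reps, u = 400, 2000, 0.5             # all illustrative choices
Omega = np.array([[1.0, 0.4],
                  [0.4, 1.5]])          # made-up innovation variance
L = np.linalg.cholesky(Omega)

# T^{-1/2} times the partial sum up to int(Tu) of i.i.d. N(0, Omega) innovations;
# by the Functional Central Limit Theorem its variance is close to u * Omega.
eps = rng.standard_normal((reps, int(T * u), 2)) @ L.T
W_u = eps.sum(axis=1) / np.sqrt(T)
V = np.cov(W_u.T)
assert np.allclose(V, u * Omega, atol=0.15)
```

The sample variance of the scaled partial sum at fraction $u$ is close to $u\Omega$, matching the variance of the Brownian limit at time $u$.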

Decompose $W_u = (W_{1u}', W_{2u}')'$, in which the dimensions of $W_{1u}$ and $W_{2u}$ are $m$ and $p - m$, respectively. Let us recall the notation $X^\ell_{t-1} = (X_{t-1}', tE_t')'$, which was used in the reduced rank regression in Section 3.1. The next lemma establishes the asymptotic theory for the product moment matrices $S_{ij}$ for $i,j = 0,1$ defined in (24).

**Lemma A6.** *Suppose that Assumptions 1 and A1 are satisfied under $\alpha_z = 0$. Define $\beta^\ell = (\beta', \gamma')'$, $\tau_\ell = (\tau_{\ell,1}, \dots, \tau_{\ell,q}) \in \mathbb{R}^{p\times q}$ and*

$$B_T^{\ell\prime} = \begin{pmatrix} \alpha_\perp'\Gamma & 0 \\ 0 & T^{-1/2}I_q \end{pmatrix}\begin{pmatrix} I_p & -\tau_\ell \\ 0 & I_q \end{pmatrix}.$$

*It then follows that*

$$
\begin{pmatrix}
S_{00} & S_{01}\beta^{\ell} \\
\beta^{\ell\prime}S_{10} & \beta^{\ell\prime}S_{11}\beta^{\ell}
\end{pmatrix} \quad\stackrel{P}{\rightarrow}\quad \begin{pmatrix}
\Sigma_{yy\cdot z} & \Sigma_{y\beta\cdot z} \\
\Sigma_{\beta y\cdot z} & \Sigma_{\beta\beta\cdot z}
\end{pmatrix}, \tag{A17}
$$

$$T^{-1}B_T^{\ell\prime}S_{11}B_T^{\ell} \quad\xrightarrow{\mathcal{D}}\quad \int_0^1 F_u F_u'\,du, \tag{A18}$$

$$B_T^{\ell\prime}\left(S_{10} - S_{11}\beta^{\ell}\alpha_y'\right) \quad\xrightarrow{\mathcal{D}}\quad \int_0^1 F_u\,d(W_{1u} - \omega W_{2u})', \tag{A19}$$

$$B_T^{\ell\prime}S_{11}\beta^{\ell} \quad=\quad O_P(1), \tag{A20}$$

*where*

$$F_u = \begin{pmatrix} \alpha_\perp'W_u \\ u\,e_u \end{pmatrix} - \int_0^1\begin{pmatrix} \alpha_\perp'W_s \\ s\,e_s \end{pmatrix}e_s'\,ds\left(\int_0^1 e_s e_s'\,ds\right)^{-1}e_u.$$

**Proof of Lemma A6.** Recall the decomposition $\Delta X_t = (\Delta Y_t', \Delta Z_t')'$.

For (A17), we start with the left-hand side of (A12) and further partial out Δ*Zt* from the process, to which we then apply the Law of Large Numbers (A4) in Assumption A1. Follow the proof of Lemma A3 afterwards, supplemented with the definition of the limit expression (A16), in order to verify (A17).

For (A18), use Lemma A5, the continuous mapping theorem and Johansen (1995, Lemma 10.3).

For (A19), we note $B_T^{\ell\prime}(S_{10} - S_{11}\beta^{\ell}\alpha_y') = B_T^{\ell\prime}S_{1\varepsilon}$. The Law of Large Numbers (A4) in Assumption A1 implies $B_T^{\ell\prime}S_{1\varepsilon} = B_T^{\ell\prime}(T-k)^{-1}\sum_{t=k+1}^{T}Z_{1,t-1}\varepsilon_{y\cdot z,t}' + o_P(1)$, in which $Z_{1,t}$ is the demeaned version of $X_t^\ell$ as defined in (23). By (A3) and the Granger–Johansen representation in Theorem 1, we can replace $B_T^{\ell\prime}Z_{1,t}$ with the demeaned version of $(\sum_{s=k+1}^{t}\varepsilon_s'\alpha_\perp, tE_t')'$. The stochastic integral (A6) in Assumption A1 then gives (A19).

For (A20), we follow the strategy used for (A19); see also Johansen (1995, Lemma 10.3).

#### *Appendix C.3. Proofs of the Theorems in Section 3*

**Proof of Theorem 2.** Follow the proof of Theorem 11.1 in Johansen (1995) by using Lemmas A4 and A6 given above instead of his Lemmas 10.1 and 10.3, and also utilise invariance properties with respect to non-singular linear transformations as in the proof of Theorem 1 in HJNR.

**Proof of Theorem 3.** The proof presented here is based on Doornik (1998, §9). The asymptotic distribution of the *LR* test statistic in the partial model in Theorem 2 is rewritten as

$$\begin{aligned} &\text{tr}\left\{\int\_0^1 dB\_u^{(m-r)} G\_u' \left(\int\_0^1 G\_u G\_u' \, du\right)^{-1} \int\_0^1 G\_u \, dB\_u^{(m-r)\prime} \right\} \\ &\quad = \sum\_{i=1}^{m-r} \int\_0^1 dB\_{i,u} G\_u' \left(\int\_0^1 G\_u G\_u' \, du\right)^{-1} \int\_0^1 G\_u \, dB\_{i,u} = \sum\_{i=1}^{m-r} \mathbb{T}\_i.\end{aligned}$$

The random variable $\mathbb{T}\_i$ for $i = 1, \dots, m-r$ is a function of $B\_{i,u}$ and $G\_u$, both of which are functions of the $(p-r)$-dimensional standard Brownian motion $B\_u$. Inspection of these functions shows that they are invariant to relabelling of the coordinates of $B\_u$, so that $\mathbb{T}\_1, \dots, \mathbb{T}\_{m-r}$ are identically distributed and all pairs $(\mathbb{T}\_j, \mathbb{T}\_k)$ with $j \neq k$ are also identically distributed. Hence,

$$\begin{split} \mathsf{E}\left(\sum\_{i=1}^{m-r} \mathbb{T}\_{i}\right) &=& \sum\_{i=1}^{m-r} \mathsf{E}\left(\mathbb{T}\_{i}\right) = \left(m-r\right) \mathsf{E}\left(\mathbb{T}\_{1}\right), \\ \mathsf{Var}\left(\sum\_{i=1}^{m-r} \mathbb{T}\_{i}\right) &=& \sum\_{j=1}^{m-r} \sum\_{k=1}^{m-r} \mathsf{Cov}\left(\mathbb{T}\_{j}, \mathbb{T}\_{k}\right) = \sum\_{j=1}^{m-r} \mathsf{Var}\left(\mathbb{T}\_{j}\right) + \sum\_{j \neq k}^{m-r} \mathsf{Cov}\left(\mathbb{T}\_{j}, \mathbb{T}\_{k}\right), \\ &=& \left(m-r\right) \mathsf{Var}\left(\mathbb{T}\_{1}\right) + \left(m-r\right) \left(m-r-1\right) \mathsf{Cov}\left(\mathbb{T}\_{1}, \mathbb{T}\_{2}\right). \end{split}$$

In order to relate the moments of the limit distributions of the *LR* test statistics in the partial and full models, we evaluate the above expressions for *m* − *r* in general and for *m* − *r* = *p* − *r*. For the means of the limit distributions, we find

$$\mathbb{E}\left(\sum\_{i=1}^{m-r} \mathbb{T}\_i\right) = (m-r)\mathbb{E}\left(\mathbb{T}\_1\right) \quad \text{and} \quad \mathbb{E}\left(\sum\_{i=1}^{p-r} \mathbb{T}\_i\right) = (p-r)\mathbb{E}\left(\mathbb{T}\_1\right).$$

Solving both equations for $\mathbb{E}(\mathbb{T}\_1)$ and equating the resulting expressions yields

$$\mathbb{E}\left(\sum\_{i=1}^{m-r} \mathbb{T}\_i\right) = \left(\frac{m-r}{p-r}\right) \mathbb{E}\left(\sum\_{i=1}^{p-r} \mathbb{T}\_i\right).$$
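The mean relation can be checked by a small Monte Carlo experiment. The sketch below is my own illustration, not the paper's response-surface design: it discretises the Brownian functionals in the simplest case without broken deterministic terms, taking $G\_u = B\_u$, and the function name `trace_terms` and all tuning constants (`T`, `n_rep`, the seed) are hypothetical choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def trace_terms(T=200, dim=3, n_rep=2000, rng=rng):
    """Simulate the per-coordinate terms T_i of the discretised trace
    statistic tr{ int dB G' (int G G' du)^{-1} int G dB' },
    in the simple case G_u = B_u (no deterministic corrections)."""
    terms = np.empty((n_rep, dim))
    for r in range(n_rep):
        e = rng.standard_normal((T, dim)) / np.sqrt(T)  # Brownian increments
        W = np.cumsum(e, axis=0)                        # discretised B_u
        G = np.vstack([np.zeros(dim), W[:-1]])          # lagged level G_{t-1}
        A = G.T @ e                                     # int G dB', column i uses coordinate i of B
        M = G.T @ G                                     # int G G' du
        Minv = np.linalg.inv(M)
        for i in range(dim):
            terms[r, i] = A[:, i] @ Minv @ A[:, i]      # T_i, a positive quadratic form
    return terms

p_minus_r, m_minus_r = 3, 2
terms = trace_terms(dim=p_minus_r)
mean_partial = terms[:, :m_minus_r].sum(axis=1).mean()  # estimate of E(sum_{i<=m-r} T_i)
mean_full = terms.sum(axis=1).mean()                    # estimate of E(sum_{i<=p-r} T_i)
print(mean_partial / mean_full)  # should be close to (m-r)/(p-r) = 2/3
```

Since the partial-model sum uses a subset of the same simulated terms, the Monte Carlo errors in the two sample means are strongly correlated, so the ratio settles near $(m-r)/(p-r)$ even for moderate replication counts.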

For their variances, we obtain a set of two equations in the same way, one for $m-r$ and one for $p-r$, which are solved for Var$(\mathbb{T}\_1)$ to find the desired expression.
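To make the last step explicit, the elimination can be carried out as follows; here $V\_m$ and $V\_p$ are shorthand I introduce for the variances of the partial- and full-model sums, and the result follows purely from the two variance equations displayed above.

```latex
% Variance equations for the partial (m - r) and full (p - r) cases:
%   V_m = (m-r) Var(T_1) + (m-r)(m-r-1) Cov(T_1, T_2),
%   V_p = (p-r) Var(T_1) + (p-r)(p-r-1) Cov(T_1, T_2).
% Solving the second equation for Var(T_1) and substituting into the first:
\begin{align*}
\mathrm{Var}(\mathbb{T}_1)
  &= \frac{V_p}{p-r} - (p-r-1)\,\mathrm{Cov}(\mathbb{T}_1, \mathbb{T}_2), \\
V_m
  &= \left(\frac{m-r}{p-r}\right) V_p
     + (m-r)\left[(m-r-1) - (p-r-1)\right]\mathrm{Cov}(\mathbb{T}_1, \mathbb{T}_2) \\
  &= \left(\frac{m-r}{p-r}\right) V_p
     + (m-r)(m-p)\,\mathrm{Cov}(\mathbb{T}_1, \mathbb{T}_2).
\end{align*}
```

Note that $m \le p$, so the covariance correction is non-positive when $\mathrm{Cov}(\mathbb{T}\_1, \mathbb{T}\_2) \ge 0$; whether this matches the paper's final response-surface formulation should be checked against Section 3.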

#### **References**

Anderson, Theodore W., and Naoto Kunitomo. 1992. Asymptotic distributions of regression and autoregression coefficients with martingale difference disturbances. *Journal of Multivariate Analysis* 40: 221–43. [CrossRef]

Bårdsen, Gunnar, Øyvind Eitrheim, Eilev S. Jansen, and Ragnar Nymoen. 2005. *The Econometrics of Macroeconomic Modelling*. Oxford: Oxford University Press.

Berenguer-Rico, Vanessa, and Bent Nielsen. 2017. *Marked and Weighted Empirical Processes of Residuals with Applications to Robust Regressions*. Discussion Paper 841. Oxford: Department of Economics, University of Oxford.

Berenguer-Rico, Vanessa, and Ines Wilms. 2018. *White Heteroscedasticity Testing in Robust Regressions*. Discussion Paper 853. Oxford: University of Oxford, Department of Economics.

Billingsley, Patrick. 1968. *Convergence of Probability Measures*. New York: John Wiley & Sons.

Boswijk, H. Peter. 1992. *Cointegration, Identification and Exogeneity*, 3rd ed. Tinbergen Institute Research Series. Amsterdam: Thesis Publishers, volume 37.


Doornik, Jurgen A. 1998. Approximation to the asymptotic distributions of cointegration tests. *Journal of Economic Surveys* 12: 573–93.

Harbo, Ingrid, Søren Johansen, Bent Nielsen, and Anders Rahbek. 1998. Asymptotic inference on cointegrating rank in partial systems. *Journal of Business & Economic Statistics* 16: 388–99.

Jakubowski, Adam, Jean Mémin, and Gilles Pagès. 1989. Convergence en loi des suites d'intégrales stochastiques sur l'espace *D*<sup>1</sup> de Skorokhod. *Probability Theory and Related Fields* 81: 111–37. [CrossRef]

Johansen, Søren. 1988. Statistical analysis of cointegration vectors. *Journal of Economic Dynamics & Control* 12: 231–54.

Johansen, Søren. 1995. *Likelihood-Based Inference in Cointegrated Vector Autoregressive Models*. Oxford: Oxford University Press.

Juselius, Katarina. 2006. *The Cointegrated VAR Model*. Oxford: Oxford University Press.


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
