*Article* **Modeling I(2) Processes Using Vector Autoregressions Where the Lag Length Increases with the Sample Size**

#### **Yuanyuan Li and Dietmar Bauer \***

Faculty of Business Administration and Economics, Bielefeld University, Universitätsstrasse 25, D-33615 Bielefeld, Germany; yuan\_yuan.li@uni-bielefeld.de

**\*** Correspondence: Dietmar.Bauer@uni-bielefeld.de

Received: 4 May 2018; Accepted: 8 September 2020; Published: 17 September 2020

**Abstract:** In this paper the theory on the estimation of vector autoregressive (VAR) models for I(2) processes is extended to the case of long VAR approximation of more general processes. Hereby the order of the autoregression is allowed to tend to infinity at a certain rate depending on the sample size. We deal with unrestricted OLS estimators (in the model formulated in levels as well as in vector error correction form) as well as with two stage estimation (2SI2) in the vector error correction model (VECM) formulation. Our main results are analogous to the I(1) case: We show that the long VAR approximation leads to consistent estimates of the long and short run dynamics. Furthermore, tests on the autoregressive coefficients follow standard asymptotics. The pseudo likelihood ratio tests on the cointegrating ranks (using the Gaussian likelihood) used in the 2SI2 algorithm show under the null hypothesis the same distributions as in the case of data generating processes following finite order VARs. The same holds true for the asymptotic distribution of the long run dynamics both in the unrestricted VECM estimation and the reduced rank regression in the 2SI2 algorithm. Building on these results we show that if the data is generated by an invertible VARMA process, the VAR approximation can be used in order to derive a consistent initial estimator for subsequent pseudo likelihood optimization in the VARMA model.

**Keywords:** vector autoregressions; vector error correction model; integrated processes of order two

#### **1. Introduction**

Many macroeconomic variables have been found to exhibit trend-like behaviour that can be modelled by using vector autoregressions (VARs). Katarina Juselius (2006) states that empirical modelling led to the development of I(1) and I(2) models since certain features of the datasets considered required including first and second differences in order to obtain stationary time series. Additionally cointegrating relations were found in the corresponding analyses. Similar findings have recurred numerous times in the literature, for example related to money demand Johansen (1992b); Juselius (1994), inflation Banerjee et al. (2001); Georgoutsos and Kouretas (2004), interest rates and real exchange rates Johansen et al. (2007); Juselius and Assenmacher (2017); Juselius and Stillwagon (2018); Stillwagon (2018), to mention only a few sources.

The predominant methodological approach to model integration and cointegration in the I(1) and the I(2) case in the vector autoregressive (VAR) framework has been established mainly by Søren Johansen and Katarina Juselius together with a number of coauthors (see the lists of references in Johansen (1995); Juselius (2006) for details) building on vector error correction models (see Engle and Granger (1987) for early comments on the history of using error correction models for co-integrated processes). Extending the main ideas of cointegration modeling for the I(1) setting, Johansen (see, e.g., Johansen (1992a)) suggested a representation for the I(2) case. Johansen (1997) established asymptotic distributions for the suggested two step I(2) estimator (2SI2) as an approximation to pseudo maximum likelihood estimation involving numerical optimization. Asymptotics for the corresponding likelihood ratio tests have been developed in Paruolo (1994, 1996); the asymptotic equivalence to pseudo likelihood (using the Gaussian distribution) optimization (and hence in a certain sense statistical efficiency) is shown in Paruolo (2000). However, Nielsen and Rahbek (2007) show that in finite samples the likelihood ratio test has size advantages. The testing of restrictions on the parameters has been investigated by Boswijk and Doornik (2004); Boswijk and Paruolo (2017); Johansen and Lütkepohl (2005). Due to the implicit vector error correction (VECM) modeling, deterministic terms in the VECM produce complex deterministic terms in the solution processes. In the I(2) context Nielsen and Rahbek (2007); Paruolo (1994, 2006); Rahbek et al. (1999); Kurita et al. (2011) discuss the impacts of deterministic terms.

As the VECM representation includes the representation of reduced rank matrices by a product of two matrices, identification conditions are of particular importance, see Juselius (2006); Mosconi and Paruolo (2013, 2017). In this context also weak exogeneity has been studied Kurita (2012); Paruolo and Rahbek (1999).

The main idea underlying the VECM approach for estimating VAR models in the I(2) context is to reparameterize the problem such that integration and cointegration properties relate to the rank of two matrices. Assuming the data generating process to be a VAR of known finite order, the rank of matrices can be tested using (pseudo) likelihood ratio tests.

Sometimes the assumption of a known order is not justified. For example it is known that a subset of variables that are generated using a finite order VAR in general cannot be described by a finite order VAR, but instead requires a vector autoregressive moving average (VARMA) model. However, the class of VARs provides flexibility in the sense that a VAR of infinite order can represent a large set of linear dynamical systems including all invertible VARMA systems. For stationary processes Berk (1974) and Lewis and Reinsel (1985) show that by letting the order of the VAR tend to infinity as a suitable function of the sample size, consistent estimation of the underlying transfer function can be achieved for data generating processes that can be described by a VAR(∞), subject to mild assumptions on the summability of the VAR coefficients. Additionally, Lewis and Reinsel (1985) establish asymptotic normality (in a very specific sense) of linear combinations of the estimated autoregressive coefficients. Hannan and Deistler (1988) make the concepts operational by showing that in the case of a VARMA process generating the dataset the required rate of letting the order tend to infinity can be estimated using BIC model selection.

In the case of I(1) processes the estimation theory for long VAR approximations to VARMA processes has been extended, based on the techniques of Lewis and Reinsel for the stationary case, in a series of papers by Saikkonen and coauthors Saikkonen (1991, 1992); Lütkepohl and Saikkonen (1997); Saikkonen and Lütkepohl (1996); Saikkonen and Luukkonen (1997). Additionally, the Johansen framework of rank restricted estimation in the VECM has been extended to long VAR approximations by Saikkonen and Luukkonen (1997). Bauer and Wagner (2004) provide extensions to the multi frequency I(1) case where unit roots may occur at the seasonal frequencies.

For the I(2) case no such extensions are currently known. This is the research gap this paper tries to fill: First, we establish consistency and asymptotic normality of estimated autoregressive coefficients (in the sense of Lewis and Reinsel) for unrestricted ordinary least squares (OLS) estimation in the VECM representation. This can be used in order to derive Wald type tests of linear restrictions on the autoregressive parameters. Second, we extend the rank restricted regression techniques for the I(2) case to long VAR approximations, showing that the asymptotics (for estimated cointegrating relations, likelihood ratio tests and the two step estimation procedures) are identical in the case of long VAR approximations and VARs of finite known order. Third, we show that if the data generating process is an invertible VARMA process, the long VAR system estimator can be used in order to obtain consistent initial estimators for subsequent pseudo likelihood maximization in the VARMA model class. In all results we limit ourselves to the case of no deterministic terms being included in the VECM representation. The inclusion of deterministic terms requires changing the test distributions, compare the theory contained for example in Rahbek et al. (1999).

The paper is organized as follows: In the next section the data generating process and the main assumptions are described. Section 3 then provides the results for the unrestricted estimation. Section 4 deals with rank restricted regression in the 2SI2 procedure, while Section 5 investigates the initial guess in the VARMA setting for subsequent pseudo likelihood maximization. Finally Section 6 concludes the paper. Proofs are relegated to an appendix.

Throughout the paper we will use the notation introduced by Johansen (1997): For a matrix $C \in \mathbb{R}^{p \times s}$, $s < p$, of full column rank we use the notation $\bar C = C(C'C)^{-1}$. Furthermore, $C\_\perp$ denotes a full column rank matrix of dimension $p \times (p-s)$ such that $C\_\perp' C = 0$. Whenever this notation is used the particular choice of $C\_\perp$ is not of importance. For a matrix $C = (C\_{i,j}) \in \mathbb{R}^{p \times s}$ we let $\|C\|$ denote the Frobenius norm $\|C\| = \big( \sum\_{i=1}^{p} \sum\_{j=1}^{s} C\_{i,j}^2 \big)^{1/2}$.

#### **2. Data Generating Process and Assumptions**

In this paper we use the following assumptions on the data generating process:

**Assumption 1** (DGP)**.** *The process $(y\_t)\_{t \in \mathbb{Z}}$, $y\_t \in \mathbb{R}^p$, is generated from the difference equation for $t \in \mathbb{Z}$:*

$$
\Delta^2 y\_t = \alpha\beta' y\_{t-1} + \Gamma \Delta y\_{t-1} + \sum\_{j=1}^{\infty} \Pi\_j \Delta^2 y\_{t-j} + \varepsilon\_t \tag{1}
$$

*where $\alpha, \beta \in \mathbb{R}^{p \times r}$, $0 \le r < p$, are full column rank matrices, $\Delta = (1-L)$ with $L$ denoting the backward shift operator such that $L(y\_t)\_{t \in \mathbb{Z}} = (y\_{t-1})\_{t \in \mathbb{Z}}$. The matrix function $A(z) = (1-z)^2 I\_p - \alpha\beta' z - \Gamma z(1-z) - \sum\_{j=1}^{\infty} \Pi\_j (1-z)^2 z^j$ fulfills the special marginal stability condition that*

$$|A(z)| = 0 \quad \text{implies that} \quad |z| > 1 \quad \text{or} \quad z = 1. \tag{2}$$

*Furthermore, there exists a real $\delta > 0$ such that the power series defining $A(z)$ converges absolutely for $|z| < 1 + \delta$. Define $\beta\_2 = \beta\_\perp \eta\_\perp$, $\alpha\_2 = \alpha\_\perp \zeta\_\perp$, where $\alpha\_\perp' \Gamma \beta\_\perp = \zeta \eta'$ and $\eta, \zeta \in \mathbb{R}^{(p-r) \times s}$ are of full column rank $s < p - r$. Then it is assumed that the matrix*

$$\alpha\_2' \Big( I\_p + \Gamma \bar\beta \bar\alpha' \Gamma - \sum\_{j=1}^{\infty} \Pi\_j \Big) \beta\_2 \tag{3}$$

*is nonsingular.*

*Furthermore, the process $(\varepsilon\_t)\_{t \in \mathbb{Z}}$ denotes independent identically distributed (iid) white noise with mean zero and variance $\Sigma\_\varepsilon > 0$.*

It is well known that the conditions (2) and (3) are necessary and sufficient for the existence of solutions to the difference equation that are I(2) processes, see for example Johansen (1992a). Moreover, note that the assumption of absolute convergence of $A(z)$ for $|z| < 1 + \delta$ implies that $\sum\_{j=0}^{\infty} j^k \|\Pi\_j\| < \infty$ for every $k \in \mathbb{N}$. In particular $\sum\_{j=0}^{\infty} j^2 \|\Pi\_j\| < \infty$ follows, as will be used frequently below.
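To make conditions (2) and (3) concrete, the following sketch checks them numerically for a small hypothetical system; all parameter values ($p = 3$, $r = s = 1$, $\Pi\_j = 0$) are our own illustrative choices, not taken from the paper:

```python
import numpy as np

# Hypothetical example: Delta^2 y_t = alpha beta' y_{t-1} + Gamma Delta y_{t-1} + eps_t
alpha = np.array([[-1/3], [0.0], [0.0]])
beta  = np.array([[1.0],  [0.0], [0.0]])
Gamma = np.diag([-5/6, -0.5, 0.0])

def perp(C):
    """Orthogonal complement C_perp of a full column rank matrix (via SVD)."""
    u, _, _ = np.linalg.svd(C, full_matrices=True)
    return u[:, C.shape[1]:]

def bar(C):
    """C_bar = C (C'C)^{-1}."""
    return C @ np.linalg.inv(C.T @ C)

a_perp, b_perp = perp(alpha), perp(beta)
M = a_perp.T @ Gamma @ b_perp              # = zeta eta', must have rank s < p - r
s = np.linalg.matrix_rank(M)               # here s = 1 < p - r = 2
u, sv, vt = np.linalg.svd(M)
zeta, eta = u[:, :s] * sv[:s], vt[:s, :].T
beta2, alpha2 = b_perp @ perp(eta), a_perp @ perp(zeta)

# condition (3) (with all Pi_j = 0): the matrix below must be nonsingular
cond3 = alpha2.T @ (np.eye(3) + Gamma @ bar(beta) @ bar(alpha).T @ Gamma) @ beta2
print(abs(np.linalg.det(cond3)))           # nonzero, so (3) holds

# condition (2): A(z) is diagonal here, so |A(z)| factors; check the roots
for i in range(3):
    c, g = float(alpha[i, 0] * beta[i, 0]), Gamma[i, i]
    roots = np.roots([1 + g, -(2 + c + g), 1])    # (1-z)^2 - c z - g z(1-z)
    assert all(abs(z) > 1 - 1e-6 for z in roots)  # |z| > 1 or z = 1
```

The same routine applies to any candidate parameterization; only the factorization of $|A(z)|$ used for checking (2) exploits the diagonal structure of this particular example.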

Every vector autoregressive function $A(z)$ corresponding to the autoregression $A(L)y\_t = \varepsilon\_t$ that fulfills Assumption 1 allows a representation as $A(z) = (1-z)^2 I\_p - \alpha\beta' z - \Gamma z(1-z) - \sum\_{j=1}^{\infty} \Pi\_j (1-z)^2 z^j = \tilde g(z) \tilde B(z)$ with $\tilde B(z) = (1-z)^2 I\_p - \tilde\Pi z - \tilde\Gamma z(1-z)$ and $\tilde g(z) = I\_p + \sum\_{j=1}^{\infty} G\_j z^j$. This can be seen as follows:

$$
\begin{aligned}
\varepsilon\_t = A(L)y\_t &= \big(A(1) - \dot A(1)\Delta + A^\*(L)\Delta^2\big)y\_t = \big(A(1) - \dot A(1)\Delta + A^\*(L)\Delta^2\big)\mathcal{B}\mathcal{B}'y\_t \\
&= \big([-\alpha, 0, 0] + [\alpha, 0, 0]\Delta - \Gamma\mathcal{B}\Delta + A^\*(L)\mathcal{B}\Delta^2\big)\mathcal{B}'y\_t \\
&= \Big([-\alpha, -\Gamma\beta\_1, -\Gamma\beta\_2] + [\alpha - \Gamma\beta, A\_1^\*(L), A\_2^\*(L)]\Delta + [A\_0^\*(L), 0, 0]\Delta^2\Big)\begin{pmatrix}\beta' \\ \beta\_1'\Delta \\ \beta\_2'\Delta\end{pmatrix} y\_t \\
&= \Big([-\alpha, -\Gamma\beta\_1, -\Gamma\beta\_2 + \alpha\bar\alpha'\Gamma\beta\_2] + [\alpha - \Gamma\beta, A\_1^\*(L), \tilde A\_2^\*(L)]\Delta + [A\_0^\*(L), 0, -A\_0^\*(L)\bar\alpha'\Gamma\beta\_2]\Delta^2\Big)\begin{pmatrix}\beta' + \bar\alpha'\Gamma\beta\_2\beta\_2'\Delta \\ \beta\_1'\Delta \\ \beta\_2'\Delta\end{pmatrix} y\_t \\
&= \Big([-\alpha, -\Gamma\beta\_1, \tilde A\_2^\*(L)] + [\alpha - \Gamma\beta, A\_1^\*(L), -A\_0^\*(L)\bar\alpha'\Gamma\beta\_2]\Delta + [A\_0^\*(L), 0, 0]\Delta^2\Big)\begin{pmatrix}\beta' + \bar\alpha'\Gamma\beta\_2\beta\_2'\Delta \\ \beta\_1'\Delta \\ \beta\_2'\Delta^2\end{pmatrix} y\_t \\
&= g(L)B(L)y\_t
\end{aligned}
$$

where $\mathcal{B} = [\beta, \beta\_1, \beta\_2]$, $\beta\_1 = \beta\_\perp \eta$, is without restriction of generality assumed to be an orthonormal matrix, $A^\*(L)\mathcal{B} = [A\_0^\*(L), A\_1^\*(L), A\_2^\*(L)]$, $A(1) = -\alpha\beta'$, $\dot A(1) = -\alpha\beta' + \Gamma$, and where we use that

$$
\Gamma \beta\_2 - \alpha \bar\alpha' \Gamma \beta\_2 = (I\_p - \alpha \bar\alpha') \Gamma \beta\_2 = \bar\alpha\_\perp \alpha\_\perp' \Gamma \beta\_\perp \eta\_\perp = 0.
$$

Here

$$B(L) = \begin{pmatrix} \beta' + \bar\alpha' \Gamma \beta\_2 \beta\_2' \Delta \\ \beta\_1' \Delta \\ \beta\_2' \Delta^2 \end{pmatrix}.$$

In this representation

$$g(1) = \left[ -\alpha, \; -\Gamma\beta\_1, \; \tilde A\_2^\*(1) \right],$$

is nonsingular due to assumption (3). Furthermore, $g(z) = \sum\_{j=0}^{\infty} G\_j z^j$ is a transfer function with $\sum\_{j=0}^{\infty} \|G\_j\| j^2 < \infty$ since $\sum\_{j=1}^{\infty} \|\Pi\_j\| j^2 < \infty$, and thus the same holds for the power series coefficients of $A^\*(L)$. Since $|B(z)| = 0$ only for $z = 1$, it follows that $|g(z)| \neq 0$ for $|z| \le 1$. Therefore

$$B(L)y\_t = u\_t, \quad \text{g}(L)u\_t = \varepsilon\_t \tag{4}$$

is a VAR process. Note, however, that $g(0) = G\_0 \neq I\_p$ in general. This constitutes a triangular representation of the process denoting $y\_{1,t} = \beta' y\_t \in \mathbb{R}^{p\_1}$, $y\_{2,t} = \beta\_1' y\_t \in \mathbb{R}^{p\_2}$, $y\_{3,t} = \beta\_2' y\_t \in \mathbb{R}^{p\_3}$ such that

$$\begin{aligned} y\_{1,t} &= -\bar\alpha' \Gamma \beta\_2 \Delta y\_{3,t} + u\_{1,t} = A \Delta y\_{3,t} + u\_{1,t}, \qquad A: p\_1 \times p\_3, \\ \Delta y\_{2,t} &= u\_{2,t}, \\ \Delta^2 y\_{3,t} &= u\_{3,t} \end{aligned}$$

where $u\_t = [u\_{1,t}', u\_{2,t}', u\_{3,t}']'$ has a VAR(∞) representation. Furthermore, defining

$$
\begin{aligned}
\tilde B(L) &= \mathcal{B} \begin{pmatrix} I\_{p\_1} & 0 & -\bar\alpha'\Gamma\beta\_2 \\ 0 & I\_{p\_2} & 0 \\ 0 & 0 & I\_{p\_3} \end{pmatrix} B(L) = \Delta^2 I\_p + \beta\beta' L + (\beta\beta' + \beta\bar\alpha'\Gamma\beta\_2\beta\_2' + \beta\_1\beta\_1') L\Delta, \\
\tilde g(L) &= g(L) \left( \mathcal{B} \begin{pmatrix} I\_{p\_1} & 0 & -\bar\alpha'\Gamma\beta\_2 \\ 0 & I\_{p\_2} & 0 \\ 0 & 0 & I\_{p\_3} \end{pmatrix} \right)^{-1}
\end{aligned}
$$

we obtain $A(L) = g(L)B(L) = \tilde g(L) \tilde B(L)$ such that

$$\tilde B(L)y\_t = \Delta^2 y\_t + \beta\beta' y\_{t-1} + (\beta\beta' + \beta\bar\alpha'\Gamma\beta\_2\beta\_2' + \beta\_1\beta\_1') \Delta y\_{t-1} = v\_t, \quad \tilde g(L) v\_t = \varepsilon\_t$$

is another representation of the process $(y\_t)\_{t \in \mathbb{Z}}$ with $\tilde B(0) = I\_p$. It follows that the triangular representation can be seen as a special case where one has partial information on the matrices $\beta$, $\beta\_1$, $\beta\_2$. For estimation the VECM representation is approximated using a finite order $h$:

$$
\Delta^2 y\_t = \Phi y\_{t-1} + \Psi \Delta y\_{t-1} + \sum\_{j=1}^{h-2} \Pi\_j \Delta^2 y\_{t-j} + e\_t
$$

where $e\_t = \varepsilon\_t + e\_{1t}$, $e\_{1t} = \sum\_{j=h-1}^{\infty} \Pi\_j \Delta^2 y\_{t-j}$. As in the VECM representation the dimensions of $\beta$, $\beta\_1$, $\beta\_2$ are linked to the ranks of the matrices $\Phi$ and $\alpha\_\perp' \Psi \beta\_\perp$. Restricting these matrices to be of particular rank is simpler than imposing the equivalent restrictions in the VAR($h$) representation directly.

In the following we will first investigate the unrestricted ordinary least squares estimator in the VECM representation without taking rank restrictions into account. In the second step the 2SI2 procedure as presented in Paruolo (2000) for imposing the two rank restrictions in two steps is investigated.

For both procedures the selection of the order *h* is of importance. In this respect the following assumption will be used:

**Assumption 2** (Lag order *h*)**.** *The order h is chosen subject to the following restrictions:*

$$\begin{array}{ll} 1. & h = o(T^{1/5}). \\ 2. & T^{1/2} \sum\_{j=h+1}^{\infty} ||\Pi\_j|| \to 0 \text{ as } T, h \to \infty. \end{array}$$

This condition defines an upper bound for the order which is usually directly assured during order selection using for example information criteria. The upper bound is smaller than the usual rate $T^{1/3}$ for technical reasons. The stronger bound is not needed for all results. However, the implications for practical applications are minor as for example in the range $1 \le T \le 950$ we have $2.5 T^{1/5} > T^{1/3}$. The second condition of Assumption 2 implies a lower bound for the increase of $h$ as a function of the sample size. Clearly $\sum\_{j=h+1}^{\infty} \|\Pi\_j\| \to 0$ for $h \to \infty$. The bound implies that for $h = h(T)$ this convergence needs to be fast enough such that $T^{1/2} \sum\_{j=h(T)+1}^{\infty} \|\Pi\_j\|$ still converges to zero. The lower bound depends on the underlying true parameters. For invertible VARMA processes (which can be seen as the leading case) $\|\Pi\_j\| \le C \rho\_0^j$ for some $0 \le \rho\_0 < 1$. Hannan and Deistler (1988) show that for an invertible stationary VARMA process the lower bound (in this case proportional to $\log T$) can be achieved asymptotically by using BIC as the order selection procedure. Thus in this case also the stronger condition $h = o(T^{1/5})$ is satisfied. Bauer and Wagner (2004) extend this result to the multi frequency I(1) setting. For the I(2) case no analogous result is known, although the developments of Bauer and Wagner (2004) suggest that a similar result holds there as well. This is left for future research.
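The comparison of the two rates mentioned above is easy to verify numerically (2.5 is the constant used in the text):

```python
import numpy as np

T = np.arange(1, 951)
# on 1 <= T <= 950 the admissible bound 2.5 T^{1/5} exceeds the usual rate T^{1/3}
print(bool(np.all(2.5 * T**(1/5) > T**(1/3))))

# the two curves cross where 2.5 = T^{1/3 - 1/5} = T^{2/15}, i.e. at T = 2.5^{15/2}
print(2.5 ** 7.5)   # slightly above 950, so the inequality holds on the whole range
```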

Therefore the differences between the 'usual' rates and the ones assumed above are deemed to be of minor practical consequence. Thus in the main text we are not explicit as to which results hold true under the less restrictive set of assumptions and which do not. In the appendix, we will comment on this point, however.

#### **3. Unrestricted Estimation**

In this section the results of Lewis and Reinsel (1985) and Saikkonen and Lütkepohl (1996) are extended to the I(2) case. To simplify notation define $\langle a\_t, b\_t \rangle = \sum\_{t=h+1}^{T} a\_t b\_t'$ for sequences $a\_t, b\_t$, $t = 1, \ldots, T$.<sup>1</sup> Then the unrestricted least squares estimator in the finite VECM model uses the regressor vector $Z\_{t,h} = [y\_{t-1}', \Delta y\_{t-1}', \Delta^2 y\_{t-1}', \ldots, \Delta^2 y\_{t-h+2}']' \in \mathbb{R}^{ph}$. The corresponding ordinary least squares estimator is given as

<sup>1</sup> Here somewhat sloppily we use the same symbols for processes and their realizations.

*Econometrics* **2020**, *8*, 38

$$
\begin{aligned}
[\hat\Phi, \hat\Psi, \hat\Pi\_1, \ldots, \hat\Pi\_{h-2}] &= \left[ \langle \Delta^2 y\_t, y\_{t-1} \rangle, \langle \Delta^2 y\_t, \Delta y\_{t-1} \rangle, \langle \Delta^2 y\_t, \Delta^2 y\_{t-1} \rangle, \ldots, \langle \Delta^2 y\_t, \Delta^2 y\_{t-h+2} \rangle \right] \langle Z\_{t,h}, Z\_{t,h} \rangle^{-1} \\
&= \langle \Delta^2 y\_t, Z\_{t,h} \rangle \langle Z\_{t,h}, Z\_{t,h} \rangle^{-1}.
\end{aligned}
$$

The noise covariance is estimated from the residuals as usual as

$$\hat\Sigma\_\varepsilon = N^{-1} \langle \hat\varepsilon\_t, \hat\varepsilon\_t \rangle, \quad \hat\varepsilon\_t = \Delta^2 y\_t - \hat\Phi y\_{t-1} - \hat\Psi \Delta y\_{t-1} - \sum\_{j=1}^{h-2} \hat\Pi\_j \Delta^2 y\_{t-j} \tag{5}$$

where *N* = *T* − *h* denotes the effective sample size.
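As a concrete illustration of the unrestricted estimator, the following sketch simulates a process satisfying Assumption 1 (all parameter values are our own illustrative choices: $p = 3$, $\Pi\_j = 0$ so that $h = 2$ suffices and the sum of lagged second differences is empty) and recovers $\Phi = \alpha\beta'$ and $\Psi = \Gamma$ by OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
T, p = 2000, 3
Phi_true = np.diag([-1/3, 0.0, 0.0])    # = alpha beta'
Psi_true = np.diag([-5/6, -0.5, 0.0])   # = Gamma

y = np.zeros((T + 2, p))
eps = rng.standard_normal((T + 2, p))
for t in range(2, T + 2):
    # Delta^2 y_t = Phi y_{t-1} + Psi Delta y_{t-1} + eps_t, rewritten in levels
    y[t] = 2*y[t-1] - y[t-2] + Phi_true @ y[t-1] + Psi_true @ (y[t-1] - y[t-2]) + eps[t]

d2y = y[2:] - 2*y[1:-1] + y[:-2]                # Delta^2 y_t
Z = np.hstack([y[1:-1], y[1:-1] - y[:-2]])      # Z_{t,h} = [y_{t-1}', Delta y_{t-1}']'
coef, *_ = np.linalg.lstsq(Z, d2y, rcond=None)  # OLS, equation by equation
Phi_hat, Psi_hat = coef[:p].T, coef[p:].T
resid = d2y - Z @ coef
Sigma_hat = resid.T @ resid / len(d2y)          # residual covariance as in (5)
print(np.round(Phi_hat, 2))
```

Note that this sketch uses the full sample in the regression, so the effective sample size equals $T$ here rather than $T - h$.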

#### *3.1. Estimation in the Triangular VECM Representation*

As is typical for the cointegration framework, the analysis is easier in the triangular representation which separates stationary components from I(1) and I(2) processes: Let $y\_t = [y\_{1,t}', y\_{2,t}', y\_{3,t}']' \in \mathbb{R}^p$ where $y\_{i,t} \in \mathbb{R}^{p\_i}$ is such that

$$\begin{aligned} y\_{1,t} &= A \Delta y\_{3,t} + u\_{1,t}, \\ \Delta y\_{2,t} &= u\_{2,t}, \\ \Delta^2 y\_{3,t} &= u\_{3,t} \end{aligned}$$

where $u\_t = [u\_{1,t}', u\_{2,t}', u\_{3,t}']'$ has a VAR(∞) representation $g(L)u\_t = \varepsilon\_t$ where

$$\mathbf{g}(0) = \begin{pmatrix} I & 0 & A \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix}.$$

Note, however, that using the triangular representation implies that the matrix $B(L)$ is known up to the value of the matrix $A$. In applications this is seldom the case.
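A minimal simulation sketch of the triangular representation (scalar blocks $p\_1 = p\_2 = p\_3 = 1$; for simplicity $u\_t$ is taken iid here, a special case of the VAR(∞) assumption, and $A = 0.7$ is our own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
T, A = 20000, 0.7
u = rng.standard_normal((T, 3))

y3 = np.cumsum(np.cumsum(u[:, 2]))   # Delta^2 y_{3,t} = u_{3,t}:  I(2)
y2 = np.cumsum(u[:, 1])              # Delta y_{2,t}   = u_{2,t}:  I(1)
dy3 = np.diff(y3, prepend=0.0)       # Delta y_{3,t}:              I(1)
y1 = A * dy3 + u[:, 0]               # y_{1,t} - A Delta y_{3,t} is stationary

# Delta y_3 is I(1), so the OLS coefficient of y_1 on it is superconsistent for A
A_hat = (dy3 @ y1) / (dy3 @ dy3)
print(round(A_hat, 3))
```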

Thus letting $g(z) = g(1) + g^\*(z)\Delta$ we obtain

$$
\begin{aligned}
\varepsilon\_t &= g(L) \begin{pmatrix} y\_{1,t} - A \Delta y\_{3,t} \\ \Delta y\_{2,t} \\ \Delta^2 y\_{3,t} \end{pmatrix} = g(L) \begin{pmatrix} \Delta^2 y\_{1,t} + \Delta y\_{1,t-1} + y\_{1,t-1} - A \Delta^2 y\_{3,t} - A \Delta y\_{3,t-1} \\ \Delta^2 y\_{2,t} + \Delta y\_{2,t-1} \\ \Delta^2 y\_{3,t} \end{pmatrix} \\
&= g(L) \begin{pmatrix} I & 0 & -A \\ 0 & I & 0 \\ 0 & 0 & I \end{pmatrix} \Delta^2 y\_t + g(L) \begin{pmatrix} y\_{1,t-1} \\ 0 \\ 0 \end{pmatrix} + g(L) \begin{pmatrix} \Delta y\_{1,t-1} - A \Delta y\_{3,t-1} \\ \Delta y\_{2,t-1} \\ 0 \end{pmatrix} \\
&= \tilde g(L) \Delta^2 y\_t + [g(1) + g^\*(L)\Delta] \begin{pmatrix} y\_{1,t-1} \\ 0 \\ 0 \end{pmatrix} + g(1) \begin{pmatrix} \Delta y\_{1,t-1} - A \Delta y\_{3,t-1} \\ \Delta y\_{2,t-1} \\ 0 \end{pmatrix} \\
&= \pi(L) \Delta^2 y\_t + g(1) \begin{pmatrix} y\_{1,t-1} \\ 0 \\ 0 \end{pmatrix} + \left[ \mathcal{G}\_1 + \mathcal{G}\_1^\*, \; \mathcal{G}\_2, \; -\mathcal{G}\_1 A \right] \begin{pmatrix} \Delta y\_{1,t-1} \\ \Delta y\_{2,t-1} \\ \Delta y\_{3,t-1} \end{pmatrix} \\
&= \pi(L) \Delta^2 y\_t + \left[ \mathcal{G}\_1, \; 0, \; 0 \right] y\_{t-1} + \left[ \mathcal{G}\_1 + \mathcal{G}\_1^\*, \; \mathcal{G}\_2, \; -\mathcal{G}\_1 A \right] \Delta y\_{t-1}
\end{aligned}
$$

where $\pi(L) = I\_p - \sum\_{j=1}^{\infty} \Pi\_j L^j$. This leads to the corresponding VECM representation:

$$
\Delta^2 y\_t = \Phi y\_{t-1} + \Psi \Delta y\_{t-1} + \sum\_{j=1}^{\infty} \Pi\_j \Delta^2 y\_{t-j} + \varepsilon\_t.
$$

Here $\mathcal{G} := g(1) = \sum\_{j=0}^{\infty} G\_j = [\mathcal{G}\_1, \mathcal{G}\_2, \mathcal{G}\_3]$, where $\mathcal{G}\_i$ is $p \times p\_i$ for $i = 1, 2, 3$. Similarly, $\mathcal{G}^\* := g^\*(1) = -\sum\_{j=0}^{\infty} j G\_j = [\mathcal{G}\_1^\*, \mathcal{G}\_2^\*, \mathcal{G}\_3^\*]$, where $\mathcal{G}\_i^\*$ is $p \times p\_i$ for $i = 1, 2, 3$. The sums exist since $\sum\_{j=1}^{\infty} \|G\_j\| j^2 < \infty$ by assumption. Similarly, we partition $\Phi$, $\Psi$ and $\Pi\_j$ into $[\Phi\_1, \Phi\_2, \Phi\_3]$, $[\Psi\_1, \Psi\_2, \Psi\_3]$ and $[\Pi\_{j1}, \Pi\_{j2}, \Pi\_{j3}]$, respectively. The analogous partitioning is used for estimates.

Then $\Phi = -[\mathcal{G}\_1, 0, 0]$ and $\Psi = [-\mathcal{G}\_1^\* - \mathcal{G}\_1, -\mathcal{G}\_2, \mathcal{G}\_1 A]$. Therefore $\Psi\_3 = -\Phi\_1 A$. Note that in this notation the I(2) components on the right hand side are $y\_{3,t-1}$, the I(1) components are $y\_{1,t-1}$, $y\_{2,t-1}$, $\Delta y\_{3,t-1}$, where $y\_{1,t-1} - A \Delta y\_{3,t-1}$ is stationary. Thus in order to separate regressors of different integration orders in the proofs (as is usually done in the literature) we use a transformation involving the unknown matrix $A$ such that the regressor $y\_{1,t-1}$ is replaced by $y\_{1,t-1} - A \Delta y\_{3,t-1}$. Consequently the estimate $\hat\Psi\_3$ of $\Psi\_3$ is replaced by the estimate $\hat\Theta = \hat\Psi\_3 + \hat\Phi\_1 A$ of $\Theta = \Psi\_3 + \Phi\_1 A = 0$.

Based on the estimates $\hat\Psi$ and $\hat\Phi$, the matrix $A$ can then be estimated as

$$
\hat A = -\big( \hat\Phi\_1' \hat\Sigma\_\varepsilon^{-1} \hat\Phi\_1 \big)^{-1} \hat\Phi\_1' \hat\Sigma\_\varepsilon^{-1} \hat\Psi\_3. \tag{6}
$$

Here the insertion of $\hat\Sigma\_\varepsilon^{-1}$ appears somewhat arbitrary. A motivation for this choice in the I(1) case can be found in Saikkonen (1992), equation (12). However, any other positive definite weighting matrix could be used as well. Currently nothing is known about the optimality of the choice suggested above.
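The following sketch illustrates estimator (6) on a simulated system (all parameter values are our own illustrative choices; $\Gamma$ has a nonzero (1,3) entry so that $A = -\bar\alpha'\Gamma\beta\_2 = 0.8$ is nonzero, and the coordinates are already triangular with $\beta = e\_1$, $\beta\_1 = e\_2$, $\beta\_2 = e\_3$):

```python
import numpy as np

rng = np.random.default_rng(2)
T, p, a = 3000, 3, 0.8
Phi = np.diag([-1/3, 0.0, 0.0])
Psi = np.array([[-5/6,  0.0, a/3],   # third column = Psi_3 = -Phi_1 A with A = a
                [ 0.0, -0.5, 0.0],
                [ 0.0,  0.0, 0.0]])

y = np.zeros((T + 2, p))
eps = rng.standard_normal((T + 2, p))
for t in range(2, T + 2):
    y[t] = 2*y[t-1] - y[t-2] + Phi @ y[t-1] + Psi @ (y[t-1] - y[t-2]) + eps[t]

# unrestricted VECM regression (Pi_j = 0, so h = 2 suffices)
d2y = y[2:] - 2*y[1:-1] + y[:-2]
Z = np.hstack([y[1:-1], y[1:-1] - y[:-2]])
coef, *_ = np.linalg.lstsq(Z, d2y, rcond=None)
Phi_hat, Psi_hat = coef[:p].T, coef[p:].T
resid = d2y - Z @ coef
Sigma_hat = resid.T @ resid / len(d2y)

# estimator (6)
Phi1, Psi3 = Phi_hat[:, :1], Psi_hat[:, 2:3]
Si = np.linalg.inv(Sigma_hat)
A_hat = -np.linalg.inv(Phi1.T @ Si @ Phi1) @ (Phi1.T @ Si @ Psi3)
print(round(float(A_hat[0, 0]), 3))   # close to a = 0.8
```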

In the asymptotic distribution of the estimation error Brownian motions occur relating to the process $(u\_t)\_{t \in \mathbb{Z}}$: Under Assumption 1 we have

$$\frac{1}{\sqrt{T}} \sum\_{t=1}^{\lfloor rT \rfloor} u\_t \Rightarrow B(r) = [B\_1(r)', B\_c(r)']' = [B\_1(r)', B\_2(r)', B\_3(r)']'$$

where *B*(*r*), 0 ≤ *r* ≤ 1, denotes a Brownian motion with corresponding variance

$$
\Omega = \left[ \begin{array}{c|c}
\Omega\_{11} & \Omega\_{1c} \\
\hline
\Omega\_{c1} & \Omega\_{cc}
\end{array} \right] = \left[ \begin{array}{c|cc}
\Omega\_{11} & \Omega\_{12} & \Omega\_{13} \\
\hline
\Omega\_{21} & \Omega\_{22} & \Omega\_{23} \\
\Omega\_{31} & \Omega\_{32} & \Omega\_{33}
\end{array} \right] = g(1)^{-1} \, \Sigma\_\varepsilon \, (g(1)')^{-1}
$$

where $B\_{1.c}(r) = B\_1(r) - \Omega\_{1c} \Omega\_{cc}^{-1} B\_c(r)$ is a $p\_1$-dimensional Brownian motion, which is independent of $B\_c(r)$, with covariance

$$
\Omega\_{1.c} = \Omega\_{11} - \Omega\_{1c} \Omega\_{cc}^{-1} \Omega\_{c1}.
$$

An estimator of $\Omega\_{1.c}$ is given by<sup>2</sup>

$$
\hat\Omega\_{1.c} = \big( \hat\Phi\_1' \hat\Sigma\_\varepsilon^{-1} \hat\Phi\_1 \big)^{-1}. \tag{7}
$$
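The algebraic identity behind (7), $\Omega\_{1.c} = (\Phi\_1' \Sigma\_\varepsilon^{-1} \Phi\_1)^{-1}$ (see footnote 2), can be verified numerically for an arbitrary invertible matrix in the role of $g(1)$ and an arbitrary $\Sigma\_\varepsilon > 0$:

```python
import numpy as np

rng = np.random.default_rng(3)
p1, p = 2, 5
G = rng.standard_normal((p, p))                 # plays the role of g(1)
S = rng.standard_normal((p, p))
Sigma = S @ S.T + np.eye(p)                     # Sigma_eps > 0
alpha = np.vstack([np.eye(p1), np.zeros((p - p1, p1))])   # alpha = [I_{p1}, 0]'

Omega = np.linalg.inv(G) @ Sigma @ np.linalg.inv(G).T     # = g(1)^{-1} Sigma (g(1)')^{-1}
lhs = np.linalg.inv(np.linalg.inv(Omega)[:p1, :p1])       # ([Omega^{-1}]_{11})^{-1}
Phi1 = -G[:, :p1]                               # Phi_1 = -G_1, first p1 columns of g(1)
rhs = np.linalg.inv(Phi1.T @ np.linalg.inv(Sigma) @ Phi1)
print(np.abs(lhs - rhs).max())                  # ~ 0 up to rounding
```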

With these definitions we can state our first result of the paper (which is proved in Appendix B):

**Theorem 1.** *Under Assumptions 1 and 2 for the triangular VECM representation we have: (A) Consistency:*

(i) $\hat\Phi \stackrel{p}{\rightarrow} \Phi$; (ii) $\hat\Sigma\_\varepsilon \stackrel{p}{\rightarrow} \Sigma\_\varepsilon$; (iii) $\hat\Omega\_{1.c} \stackrel{p}{\rightarrow} \Omega\_{1.c}$; (iv) $\hat\Psi \stackrel{p}{\rightarrow} \Psi$; (v) $\hat\Theta \stackrel{p}{\rightarrow} 0$; (vi) $\hat A \stackrel{p}{\rightarrow} A$.

*(B) Asymptotic distribution of coefficients to nonstationary regressors: Under Assumptions 1 and 2 we have (N* = *T* − *h):*

$$(i) \ \left[ N \hat\Phi\_2, \; N \hat\Theta, \; N^2 \hat\Phi\_3 \right] \stackrel{d}{\rightarrow} g(1) \int\_0^1 dB \, F' \left( \int\_0^1 F F' \right)^{-1}, \quad (ii) \ N(\hat A - A) \stackrel{d}{\rightarrow} \int\_0^1 dB\_{1.c} \, L' \left( \int\_0^1 L L' \right)^{-1} \tag{8}$$

<sup>2</sup> Note that $\alpha = [I\_{p\_1}, 0]'$, and thus $\Omega\_{1.c} = ([\Omega^{-1}]\_{11})^{-1} = (\alpha' \Omega^{-1} \alpha)^{-1} = \big( \alpha' g(1)' \Sigma\_\varepsilon^{-1} g(1) \alpha \big)^{-1} = \big( \Phi\_1' \Sigma\_\varepsilon^{-1} \Phi\_1 \big)^{-1}$.

$$\text{where } F(u) = \begin{bmatrix} B\_c(u) \\ \int\_0^u B\_3(v) dv \end{bmatrix}, \quad F\_a(u) = \begin{bmatrix} B\_2(u) \\ \int\_0^u B\_3(v) dv \end{bmatrix} \quad \text{and} \quad L(u) = B\_3(u) - \int\_0^1 B\_3 F\_a' \Big( \int\_0^1 F\_a F\_a' \Big)^{-1} F\_a(u).$$

*(C) Asymptotic distribution of coefficients to stationary regressors: Let $L\_h$ be a sequence of $(p^2(h-2) + p(2p\_1 + p\_2)) \times J$ matrices such that $L\_h'(\Gamma\_{ECM}^{-1} \otimes \Sigma\_\varepsilon)L\_h \to M > 0$ where $\Gamma\_{ECM} = \mathbb{E}(X\_t X\_t')$ with $X\_t := [u\_{1,t-1}', \Delta y\_{1,t-1}', \Delta y\_{2,t-1}', \Delta^2 y\_{t-1}', \ldots, \Delta^2 y\_{t-h+2}']'$. Let*

$$
\underline{\Pi} = \begin{bmatrix} \Phi\_1 & \Psi\_1 & \Psi\_2 & \Pi\_1 & \cdots & \Pi\_{h-2} \end{bmatrix}.
$$

*Then*

$$N^{\frac{1}{2}} L\_h' \mathbf{vec}(\hat{\underline{\Pi}} - \underline{\Pi}) \stackrel{d}{\rightarrow} N(0, M).$$

*(D) Asymptotic distribution of Wald type tests: Finally, letting*

$$\hat{\Gamma}\_{ECM} = N^{-1} (\langle \tilde{X}\_{t}, \tilde{X}\_{t} \rangle - \langle \tilde{X}\_{t}, \Delta y\_{3,t-1} \rangle \langle \Delta y\_{3,t-1}, \Delta y\_{3,t-1} \rangle^{-1} \langle \Delta y\_{3,t-1}, \tilde{X}\_{t} \rangle)$$

*where X*˜*<sup>t</sup>* = *y* 1,*t*−1, <sup>Δ</sup>*y* 1,*t*−1, <sup>Δ</sup>*y* 2,*t*−1, <sup>Δ</sup>2*y <sup>t</sup>*−1, ... , <sup>Δ</sup>2*y t*−*h*+2 *, the Wald test for the null hypothesis H*<sup>0</sup> : *L <sup>h</sup>vec*(Π) = *lh is given by*

$$\hat\lambda\_{Wald} = N \left( L\_h' \mathbf{vec}(\hat{\underline{\Pi}}) - l\_h \right)' \left( L\_h' \big( \hat\Gamma\_{ECM}^{-1} \otimes \hat\Sigma\_\varepsilon \big) L\_h \right)^{-1} \left( L\_h' \mathbf{vec}(\hat{\underline{\Pi}}) - l\_h \right).$$

*Then if $L\_h$ is such that $L\_h'(\Gamma\_{ECM}^{-1} \otimes \Sigma\_\varepsilon)L\_h \to M > 0$, under the null hypothesis $\hat\lambda\_{Wald} \stackrel{d}{\rightarrow} \chi^2(J)$.*

The theorem provides the asymptotic distributions of the OLS estimates in the triangular system. Note that in this somewhat special case the properties of the regressor components (stationary or not) are known such that for each entry the convergence speed is known. Correspondingly the definition of the regressor vector *X*˜*<sup>t</sup>* involves only lags of *yt* but omits all nonstationary regressors except the ones cointegrated with Δ*y*3,*t*−1.

The assumptions on $L_h$ are more restrictive than needed. Lewis and Reinsel (1985) and Saikkonen and Lütkepohl (1996) only require that $L_h$ has full column rank when deriving convergence to a normal distribution with unit variance as the limit for

$$N^{\frac{1}{2}} \left( L_h' \left( \Gamma_{ECM}^{-1} \otimes \Sigma \right) L_h \right)^{-1/2} L_h' \mathrm{vec}\left( \hat{\Pi} - \underline{\Pi} \right).$$

Similar arguments could be used here.

#### *3.2. Estimation in the General VECM Representation*

The previous section dealt with the special case that a triangular representation is used and hence knowledge of the matrices $[\beta, \beta_1, \beta_2]$ is given. This section provides a result for the general case, which, however, is limited to the coefficients to the stationary components. Since a general process generated according to Assumption 1 can be rewritten in triangular representation using the knowledge of $[\beta, \beta_1, \beta_2]$, some asymptotic properties of the unrestricted OLS estimators can be derived from Theorem 1 for the general case (the proof is given in Appendix C):

**Theorem 2.** *Let the regressor vector $Z_{t,h} = [y_{t-1}', \Delta y_{t-1}', \Delta^2 y_{t-1}', \dots, \Delta^2 y_{t-h+2}']'$ and define*

$$
\underline{\Lambda} = \begin{bmatrix} \Phi & \Psi & \Pi_1 & \dots & \Pi_{h-2} \end{bmatrix}, \quad \tilde{\Lambda} = \langle \Delta^2 y_t, Z_{t,h} \rangle \langle Z_{t,h}, Z_{t,h} \rangle^{-1}, \quad \tilde{\Gamma}_{ECM} = N^{-1} \langle Z_{t,h}, Z_{t,h} \rangle.
$$

*Then under Assumptions 1 and 2 it follows that $\tilde{\Lambda} - \underline{\Lambda} = o_P(1)$.*

*Furthermore, let $L_h \in \mathbb{R}^{p^2(h+2) \times J}$ be such that $L_h'(\tilde{\Gamma}_{ECM}^{-1} \otimes \Sigma)L_h \to M > 0$. Then*

$$N^{\frac{1}{2}}L_h'\mathrm{vec}(\tilde{\Lambda}-\underline{\Lambda}) \stackrel{d}{\rightarrow} N(0,M).$$

Besides consistency the theorem implies that linear combinations of the OLS estimators are asymptotically normal and hence allow standard inference, provided the asymptotic variance is nonsingular. One application of such results is the so-called 'surplus lag' formulation in the context of Granger causality testing, see Bauer and Maynard (2012); Dolado and Lütkepohl (1996).

Finally note that this section does not contain results with regard to the cointegrating rank or the cointegrating space. The theorem above merely allows testing coefficients corresponding to stationary regressors. Therefore its usage is limited to somewhat special situations like the surplus-lag causality tests. However, it is also relevant for impulse response analysis, compare Inoue and Kilian (2020).

#### **4. Rank Restricted Regression**

The previous sections show that for the estimators discussed there full inference on all coefficients is only possible when information on the matrices $\beta$, $\beta_1$ and $\beta_2$ exists. The dimensions of these matrices relate to the rank of $\Phi = \alpha\beta'$ and, conditional on this, to the rank of $\bar{\alpha}_\perp'\Psi\bar{\beta}_\perp$. The two rank restrictions make estimation and specification more complex than in the I(1) case.

Johansen (1995) provides the two-step approach 2SI2 that can be used for estimation and specification of the two integer valued parameters *p*<sup>1</sup> and *p*2. Paruolo and Rahbek (1999) extend the 2SI2 procedure suggested in section 8 of Johansen (1997). Paruolo (2000) shows that this 2SI2 procedure achieves the same asymptotic distribution as pseudo maximum likelihood estimation which could be performed subsequent to 2SI2 estimation. This makes the procedure attractive from a practical point of view. In this section we show that these approaches extend naturally to the long VAR case. The main focus here lies on the derivation of the asymptotic properties of the rank tests.

Recall the long VAR approximation given as

$$
\Delta^2 y\_t = \Phi y\_{t-1} + \Psi \Delta y\_{t-1} + \sum\_{j=1}^{h-2} \Pi\_j \Delta^2 y\_{t-j} + e\_t \tag{9}
$$

where $\Phi = \alpha\beta'$ has reduced rank $r < p$ and $\bar{\alpha}_\perp'\Psi\bar{\beta}_\perp = \zeta\eta'$ has reduced rank $s < p - r$. In this notation the 2SI2 procedure works as follows: In the first step the rank constraint on $\bar{\alpha}_\perp'\Psi\bar{\beta}_\perp$ is neglected and $\alpha$ and $\beta$ are estimated using reduced-rank regression (RRR). In the second step the reduced rank of $\bar{\alpha}_\perp'\Psi\bar{\beta}_\perp$ is imposed using RRR in a transformed equation.

In more detail, using the Johansen notation we denote by $R_{0t}$, $R_{1t}$ and $R_{2t}$ the residuals of regressing $\Delta^2 y_t$, $\Delta y_{t-1}$ and $y_{t-1}$ on $\Delta^2 y_{t-1}, \dots, \Delta^2 y_{t-h+2}$, respectively; then we can rewrite (9) as

$$R_{0t} = \alpha \beta' R_{2t} + \Psi R_{1t} + \varepsilon_t. \tag{10}$$
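The residual construction above can be sketched in a few lines; the following Python fragment is a minimal illustration with our own helper names and one of several equivalent indexing conventions:

```python
import numpy as np

def concentrate(Y, Z):
    """Residuals of an OLS regression of Y (N x k) on Z (N x m)."""
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return Y - Z @ coef

def johansen_residuals(y, h):
    """R0, R1, R2: residuals of regressing Delta^2 y_t, Delta y_{t-1}, y_{t-1}
    on the lagged differences Delta^2 y_{t-1}, ..., Delta^2 y_{t-h+2}."""
    dy = np.diff(y, axis=0)       # Delta y
    d2y = np.diff(dy, axis=0)     # Delta^2 y
    k = h - 2                     # number of Delta^2 lags in the VECM
    Z = np.hstack([d2y[k - j:-j] for j in range(1, k + 1)])
    R0 = concentrate(d2y[k:], Z)      # Delta^2 y_t
    R1 = concentrate(dy[k:-1], Z)     # Delta y_{t-1}
    R2 = concentrate(y[k + 1:-1], Z)  # y_{t-1}
    return R0, R1, R2
```

The three residual series are exactly the inputs needed for the eigenvalue problems of the two 2SI2 steps.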

Concentrating out $R_{1t}$ and denoting the residuals by $R_{0.1t}$ and $R_{2.1t}$, we obtain, with $S_{ij.1} = \langle R_{it}, R_{jt} \rangle - \langle R_{it}, R_{1t} \rangle \langle R_{1t}, R_{1t} \rangle^{-1} \langle R_{1t}, R_{jt} \rangle$, the solution to the RRR problem from solving the eigenvalue problem

$$|\lambda S\_{22.1} - S\_{20.1} S\_{00.1}^{-1} S\_{02.1}| = 0,\tag{11}$$

with solutions $1 > \hat{\lambda}_1 \geq \dots \geq \hat{\lambda}_p > 0$ ordered by decreasing size and corresponding eigenvectors $V = (v_1, \dots, v_p)$. Then as usual the trace statistic for testing the model $H_r$ with $\mathrm{rank}(\Phi) \leq r$, $r < p$, within the model $H_p$ with $\mathrm{rank}(\Phi) \leq p$ is given as

$$Q_r = -2\log Q(H_r|H_p) = -T\sum_{i=r+1}^p \log(1-\hat{\lambda}_i).\tag{12}$$

The optimizers for *α*, *β* are given by

$$
\hat{\beta} = (v_1, \dots, v_r), \qquad \hat{\alpha} = S_{02.1}\hat{\beta}, \qquad \hat{\Sigma}_{\varepsilon} = S_{00.1} - \hat{\alpha}\hat{\alpha}'. \tag{13}
$$
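The first step thus amounts to a generalized symmetric eigenvalue problem. A hedged Python sketch (the function name is ours; we use the effective sample size $N$ where (12) writes $T$, and the eigenvectors are normalized so that $\hat\beta' S_{22.1} \hat\beta = I$, matching (13)):

```python
import numpy as np
from scipy.linalg import eigh

def rrr_step1(R0, R1, R2, r):
    """First 2SI2 step: solve |lambda S22.1 - S20.1 S00.1^{-1} S02.1| = 0
    and read off beta_hat, alpha_hat and the trace statistic Q_r."""
    N = R0.shape[0]
    S = lambda a, b: a.T @ b / N
    # concentrate out R1 (the partial regression step behind S_{ij.1})
    conc = lambda a: a - R1 @ np.linalg.solve(S(R1, R1), S(R1, a))
    R0c, R2c = conc(R0), conc(R2)
    S00, S02 = S(R0c, R0c), S(R0c, R2c)
    S20, S22 = S(R2c, R0c), S(R2c, R2c)
    lam, V = eigh(S20 @ np.linalg.solve(S00, S02), S22)  # V' S22 V = I
    order = np.argsort(lam)[::-1]                        # decreasing eigenvalues
    lam, V = lam[order], V[:, order]
    beta_hat = V[:, :r]
    alpha_hat = S02 @ beta_hat
    Q_r = -N * np.sum(np.log(1.0 - lam[r:]))
    return lam, alpha_hat, beta_hat, Q_r
```

The returned `Q_r` is compared with the quantiles of the limit law in Theorem 3 (A), which by that theorem is the same as in the finite order VAR case.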

In the second step, treating $\alpha$ and $\beta$ as known, multiplying (10) by $\bar{\alpha}_\perp'$ yields

$$
\bar{\alpha}_\perp' R_{0t} = \bar{\alpha}_\perp'\Psi(\bar{\beta}_\perp\beta_\perp' + \bar{\beta}\beta')R_{1t} + \bar{\alpha}_\perp'\varepsilon_t = \zeta\eta'(\beta_\perp' R_{1t}) + C(\beta' R_{1t}) + \bar{\alpha}_\perp'\varepsilon_t.\tag{14}
$$

Note that $\beta' R_{1t}$ is stationary. Thus concentrating out $C$ and denoting the residuals by $R_{\bar{\alpha}_\perp.\beta,t}$ and $R_{\beta_\perp.\beta,t}$, respectively, we can define $S_{ab.\beta} := \langle R_{a.\beta,t}, R_{b.\beta,t} \rangle$ for $a, b = \bar{\alpha}_\perp$ or $\beta_\perp$. Then the likelihood ratio test of the model $H_{r,s}$ with $\mathrm{rank}(\zeta\eta') \leq s$, $s < p - r$, in the model $H_r^0$ with $\mathrm{rank}(\bar{\alpha}_\perp'\Psi\bar{\beta}_\perp) = p - r$ is given by

$$Q_{r,s} = -2\log Q\left(H_{r,s}|H_r^0\right) = -T\sum_{i=s+1}^{p-r} \log(1-\hat{\rho}_i),\tag{15}$$

where $1 > \hat{\rho}_1 \geq \dots \geq \hat{\rho}_{p-r} > 0$ are the solutions of the eigenvalue problem

$$|\rho S_{\beta_\perp\beta_\perp.\beta} - S_{\beta_\perp\bar{\alpha}_\perp.\beta} S_{\bar{\alpha}_\perp\bar{\alpha}_\perp.\beta}^{-1} S_{\bar{\alpha}_\perp\beta_\perp.\beta}| = 0,\tag{16}$$

and the corresponding eigenvectors are *W* = (*w*1,..., *wp*−*r*). Estimators of *ζ* and *η* are given by

$$
\hat{\eta} = (w_1, \dots, w_s), \qquad \hat{\zeta} = S_{\bar{\alpha}_\perp\beta_\perp.\beta}\,\hat{\eta}. \tag{17}
$$
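The second step can be sketched analogously; here we adopt the simplifying convention that the orthogonal complements are chosen orthonormal, so that $\bar\alpha_\perp = \alpha_\perp$ (the function name and conventions are ours):

```python
import numpy as np
from scipy.linalg import eigh, null_space

def rrr_step2(R0, R1, alpha, beta, s):
    """Second 2SI2 step: RRR of alpha_perp' R0t on beta_perp' R1t,
    concentrating out the stationary direction beta' R1t (Eqs. (14)-(17))."""
    N = R0.shape[0]
    a_perp = null_space(alpha.T)   # orthonormal complement of span(alpha)
    b_perp = null_space(beta.T)
    Y = R0 @ a_perp                # rows: (alpha_perp' R0t)'
    X = R1 @ b_perp                # rows: (beta_perp' R1t)'
    W = R1 @ beta                  # rows: (beta' R1t)', concentrated out
    conc = lambda a: a - W @ np.linalg.lstsq(W, a, rcond=None)[0]
    Yc, Xc = conc(Y), conc(X)
    Saa = Yc.T @ Yc / N            # S_{alpha_perp alpha_perp . beta}
    Sab = Yc.T @ Xc / N            # S_{alpha_perp beta_perp . beta}
    Sbb = Xc.T @ Xc / N            # S_{beta_perp beta_perp . beta}
    rho, Wv = eigh(Sab.T @ np.linalg.solve(Saa, Sab), Sbb)
    order = np.argsort(rho)[::-1]
    rho, Wv = rho[order], Wv[:, order]
    eta_hat = Wv[:, :s]
    zeta_hat = Sab @ eta_hat
    Q_rs = -N * np.sum(np.log(1.0 - rho[s:]))
    return rho, zeta_hat, eta_hat, Q_rs
```

In the 2SI2 procedure the arguments `alpha` and `beta` are replaced by the first-step estimates $\hat\alpha$, $\hat\beta$.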

For the 2SI2 procedure in this second step the first step estimates $\hat{\alpha}$ and $\hat{\beta}$ are used in place of the unknown true quantities. We then obtain the following analogue of the results in the finite order VAR framework (the proof is given in Appendix D):

**Theorem 3.** *Let the data be generated according to Assumption 1 and let the VAR order fulfill Assumption 2. Then the following asymptotic results hold:*

*(A) The asymptotic distribution of the likelihood ratio statistic Qr under the null hypothesis Hr is given by*

$$Q\_r \stackrel{d}{\rightarrow} \text{tr}\left\{ \int\_0^1 d\mathcal{W}\_\dagger F\_\dagger' \left( \int\_0^1 F\_\dagger F\_\dagger' du \right)^{-1} \int\_0^1 F\_\dagger d\mathcal{W}\_\dagger' \right\}. \tag{18}$$

*where $W_\dagger = (\alpha_\perp'\Sigma\alpha_\perp)^{-1/2}\alpha_\perp' W$, $F_a(u) = [B_2(u)', \int_0^u B_3(v)'dv]'$ and $F_\dagger(u) = F_a(u) - \int_0^1 F_a B_3' \left(\int_0^1 B_3 B_3'\right)^{-1} B_3(u)$. This is identical to the distribution achieved in the finite VAR case.*

*(B) The asymptotic distribution of the likelihood ratio statistic $Q_{r,s}$ under the null hypothesis $H_{r,s}$ is given by*

$$Q_{r,s} \stackrel{d}{\rightarrow} \mathrm{tr}\left\{\int_0^1 dW_2 B_3' \left(\int_0^1 B_3 B_3' du\right)^{-1} \int_0^1 B_3 dW_2'\right\}.\tag{19}$$

*where $W_2(u) = (\alpha_2'\Sigma\alpha_2)^{-1/2}\alpha_2' W(u)$.*

*(C) The asymptotic distribution of the test statistic $S_{r,s} = Q_r + Q_{r,s}$ under the null hypothesis $H_{r,s}$ is given by*

$$S_{r,s} \stackrel{d}{\rightarrow} \mathrm{tr}\left\{ \int_0^1 dW_\dagger F_\dagger' \left( \int_0^1 F_\dagger F_\dagger' du \right)^{-1} \int_0^1 F_\dagger dW_\dagger' \right\} + \mathrm{tr}\left\{ \int_0^1 dW_2 B_3' \left( \int_0^1 B_3 B_3' du \right)^{-1} \int_0^1 B_3 dW_2' \right\}. \tag{20}$$

*(D) Using suitable normalizations all estimators are consistent: $\hat{\alpha}(c_\alpha'\hat{\alpha})^{-1} \stackrel{p}{\rightarrow} \alpha$, $\hat{\beta}(c_\beta'\hat{\beta})^{-1} \stackrel{p}{\rightarrow} \beta$, $\hat{\zeta}(c_\zeta'\hat{\zeta})^{-1} \stackrel{p}{\rightarrow} \zeta$, $\hat{\eta}(c_\eta'\hat{\eta})^{-1} \stackrel{p}{\rightarrow} \eta$, $\hat{\Psi} \stackrel{p}{\rightarrow} \Psi$, $\hat{\Pi}_j \stackrel{p}{\rightarrow} \Pi_j$, where for example $c_\alpha'\alpha = I_r$.*

*(E) The asymptotic distributions of the coefficients to the nonstationary regressors are identical to the ones in the finite order VAR case stated in Paruolo (2000). The asymptotic distributions of the coefficients $\hat{\Pi}_j$ are identical to the ones in Theorem 1.*

The main message of the theorem is that the 2SI2 procedure, including the rank tests, shows the same asymptotic properties as in the finite order VAR case. As usual, restricting the coefficients for the non-stationary regressors does not influence the asymptotics of the coefficients corresponding to the stationary regressors.

Note that Paruolo (2000) shows that in the finite VAR case 2SI2 estimates have the same asymptotic distribution as pseudo maximum likelihood (pML) estimates maximizing the Gaussian likelihood. The first order conditions for the pML estimates of the coefficients to the non-stationary regressors provided in the first display on p. 548 in Paruolo (2000) depend on the data only via the matrices *Sij* defined above. These matrices depend on the lag length of the VECM only via the concentration step. The proof of our Theorem 3 shows that these terms have the same asymptotic distributions for the finite order VAR and the long VAR. Theorem 4.3 of Paruolo (2000) shows that the asymptotic distribution of the coefficients due to stationary regressors does not depend on the distribution of the coefficients corresponding to the non-stationary regressors as long as they are estimated super-consistently. Thus our results imply that also in the long VAR case the asymptotic distribution of all estimates for the 2SI2 and the pML approach is identical.

#### **5. Initial Guess for VARMA Estimation**

One usage of long VAR approximations is as a preliminary estimator for VARMA model estimation. Hannan and Kavalieris (1986) provide properties of such an approach in the stationary case; Lütkepohl and Claessen (1997) extend the procedure to the I(1) case. Here we extend this idea to the I(2) case.

The goal is to provide a consistent initial guess for the estimation of a VARMA model for I(2) processes. In this respect we assume the following data generating process:

**Assumption 3** (VARMA dgp)**.** *The process* (*yt*)*t*∈<sup>Z</sup> *is generated as the solution to the state space equations*

$$y\_t = \mathbb{C}\mathbf{x}\_t + \varepsilon\_t, \qquad \mathbf{x}\_{t+1} = A\mathbf{x}\_t + B\varepsilon\_t \tag{21}$$

*where* (*εt*)*t*∈<sup>Z</sup> *denotes white noise subject to the same assumptions as in Assumption 1.*

*Here $x_t \in \mathbb{R}^n$ is the unobserved state process. The system $(A, B, C)$ is assumed to be minimal and in the canonical form of Bauer and Wagner (2012), that is*

$$A = \begin{bmatrix} I_c & I_c & 0 & 0 \\ 0 & I_c & 0 & 0 \\ 0 & 0 & I_d & 0 \\ 0 & 0 & 0 & A_\bullet \end{bmatrix}, \quad B = \begin{bmatrix} B_1 \\ B_2 \\ B_3 \\ B_\bullet \end{bmatrix}, \quad C = \begin{bmatrix} C_1 & C_2 & C_3 & C_\bullet \end{bmatrix},$$

*where $|\lambda_{max}(A_\bullet)| < 1$ (the matrix $A_\bullet$ is stable), $C_1'C_1 = I_c$, $C_3'C_3 = I_d$, $C_1'C_3 = 0$, $C_1'C_2 = 0$, $C_2'C_3 = 0$. Furthermore, the system is strictly minimum-phase, that is $\rho_0 = |\lambda_{max}(A - BC)| < 1$. Finally the matrix $\bar{A} = A - BC$ is nonsingular.*

*At time $t = 0$ the state $x_0 = [x_{0,u}', x_{0,\bullet}']'$, $x_{0,u} \in \mathbb{R}^{2c+d}$, is such that $x_{0,u}$ is deterministic and $x_{0,\bullet} = \sum_{j=1}^{\infty} A_\bullet^{j-1} B_\bullet \varepsilon_{-j}$ denotes the stationary solution to the stable part of the system.*

In this situation it follows that $(y_t)_{t\in\mathbb{Z}}$ is an I(2) process in the definition of Bauer and Wagner (2012), that is, its second difference is a stationary VARMA process. The integers $c$ and $d$ are connected to the integers $p_1$, $p_2$, $p_3$ via $c = p_3$, $d = p_2$ such that $p_1 = p - c - d$. It can furthermore be shown that a process generated under Assumption 3 possesses a VAR($h$) approximation:

*Econometrics* **2020**, *8*, 38

$$y\_t + \sum\_{j=1}^{h} A\_j y\_{t-j} = \varepsilon\_t + \mathcal{C}(A - B\mathcal{C})^h x\_{t-h}$$

where $A_j = -C(A - BC)^{j-1}B$ with $\|A_j\| \leq \mu\rho^j$ ($0 \leq \rho_0 < \rho < 1$) converges to zero exponentially fast for $j \to \infty$ due to the strict minimum-phase condition. Letting $h \to \infty$ then implies the existence of a VAR($\infty$) representation. It follows that for such systems $A(z)$ converges absolutely for $|z| < \rho^{-1}$ where $1 < \rho^{-1}$.
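The geometric decay of the VAR coefficients is easy to check numerically. A toy sketch (the system matrices below are hypothetical numbers chosen only so that $A$ has a double unit root while $\bar A = A - BC$ is stable; they are not from the paper):

```python
import numpy as np

# hypothetical minimal system with c = 1, d = 0, p = 1, n = 2
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])     # double unit root (I(2) block)
B = np.array([[1.0],
              [1.0]])
C = np.array([[0.9, 0.4]])
Abar = A - B @ C
rho0 = np.max(np.abs(np.linalg.eigvals(Abar)))  # strict minimum-phase: rho0 < 1

def A_j(j):
    """VAR coefficient A_j = -C (A - BC)^{j-1} B of the VAR(h) approximation."""
    return -C @ np.linalg.matrix_power(Abar, j - 1) @ B

norms = np.array([np.linalg.norm(A_j(j)) for j in range(1, 61)])
```

Since $\rho_0 < 1$ here, $\|A_j\|$ falls below any tolerance for moderate $j$; this is what makes truncating to a VAR($h$) with slowly growing $h$ viable.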

From the autoregressive representation the VECM representation can be obtained:

$$a(z) = I_p + \sum_{j=1}^{\infty} A_j z^j = I_p - \sum_{j=1}^{\infty} C \bar{A}^{j-1} B z^j = (1 - z)^2 I_p - \Phi z - \Psi z (1 - z) - (1 - z)^2 \sum_{j=1}^{\infty} \Pi_j z^j$$

where *<sup>A</sup>*¯ <sup>=</sup> *<sup>A</sup>* <sup>−</sup> *BC* such that

$$
\Delta^2 y\_t = \Phi y\_{t-1} + \Psi \Delta y\_{t-1} + \sum\_{j=1}^{\infty} \Pi\_j \Delta^2 y\_{t-j} + \varepsilon\_t.
$$

A comparison of power series coefficients provides the identities:

$$\begin{aligned} \Phi &= -I_p + C(I - \bar{A})^{-1} B, \\ \Psi &= -I_p - C(I - \bar{A})^{-2} \bar{A} B, \\ \Pi_j &= [C \bar{A}^2 (I - \bar{A})^{-2}] \bar{A}^{j-1} B = D \bar{A}^{j-1} B, \quad j = 1, 2, \dots \end{aligned}$$
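These identities can be checked numerically by comparing the truncated power series for $a(z)$ with the closed VECM form. A sketch with a hypothetical system of our own choosing (a double unit root in $A$, stable $\bar A$):

```python
import numpy as np

# hypothetical system: A has a double unit root, Abar = A - BC is stable
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[1.0],
              [1.0]])
C = np.array([[0.9, 0.4]])
Abar = A - B @ C
Ip, In = np.eye(1), np.eye(2)        # output dimension p = 1, state dimension n = 2
M = np.linalg.inv(In - Abar)         # (I - Abar)^{-1}
Phi = -Ip + C @ M @ B                # Phi = -I_p + C (I - Abar)^{-1} B
Psi = -Ip - C @ M @ M @ Abar @ B     # Psi = -I_p - C (I - Abar)^{-2} Abar B
D = C @ Abar @ Abar @ M @ M          # D = C Abar^2 (I - Abar)^{-2}

def a_series(z, J=400):
    """a(z) = I_p - sum_{j=1}^J C Abar^{j-1} B z^j (truncated power series)."""
    out, P = Ip.copy(), np.eye(2)
    for j in range(1, J + 1):
        out = out - (z ** j) * (C @ P @ B)
        P = P @ Abar
    return out

def a_vecm(z):
    """(1-z)^2 I_p - Phi z - Psi z(1-z) - (1-z)^2 z D (I - z Abar)^{-1} B."""
    return ((1 - z) ** 2) * Ip - z * Phi - z * (1 - z) * Psi \
        - ((1 - z) ** 2) * z * (D @ np.linalg.inv(In - z * Abar) @ B)
```

At $z = 1$ the VECM form reduces to $a(1) = -\Phi$, in line with the display further below.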

It follows that the coefficients $\Pi_j$, $j = 1, 2, \dots$ form the impulse response of a rational transfer function of order smaller than or equal to $n$. If $\bar{A}$ is nonsingular then the order equals $n$ and the system $(\bar{A}, B, D)$ is minimal. Furthermore, it follows that for arbitrary $\Phi$ and $\Psi$ the transfer function

$$a(z) = (1 - z)^2 I_p - \Phi z - \Psi z (1 - z) - (1 - z)^2 z D (I - z\bar{A})^{-1} B$$

is a rational transfer function with the additional property that

$$a(1) = -\Phi = -\alpha\beta', \quad \bar{\alpha}_\perp'\dot{a}(1)\bar{\beta}_\perp = \bar{\alpha}_\perp'(-\Phi + \Psi)\bar{\beta}_\perp = -\bar{\alpha}_\perp' C(I - \bar{A})^{-2} B \bar{\beta}_\perp = \zeta\eta'.$$

Consequently Φ and Ψ determine the integration properties of processes generated using *a*(*z*). Conversely whenever the constraints

$$-I_p + C(I - \bar{A})^{-1}B = \alpha\beta', \quad -\bar{\alpha}_\perp' C(I - \bar{A})^{-2} B \bar{\beta}_\perp = \zeta\eta'$$

hold, the corresponding triple $(A, B, C)$ corresponds to an I(2) process (if the eigenvalues of $A$ are in the closed unit disc). Defining $C_* = \bar{\alpha}' C$, $C_\dagger = \bar{\alpha}_\perp' C$ we obtain

$$-\bar{\alpha}' + C_*(I - \bar{A})^{-1}B = \beta', \quad -\bar{\alpha}_\perp' + C_\dagger(I - \bar{A})^{-1}B = 0, \quad -C_\dagger(I - \bar{A})^{-2}B\bar{\beta}_\perp = \zeta\eta'.\tag{22}$$

The third equation does not have a solution in $C_\dagger$ for fixed $B\bar{\beta}_\perp$, $\zeta$, $\eta$, if the row space of $B\bar{\beta}_\perp$ does not contain the space spanned by the rows of $\eta'$. In this case row-wise projection of $\eta'$ onto the space spanned by the rows of $B\bar{\beta}_\perp$ allows for (not necessarily unique) solutions in $C_\dagger$. In the limit no projection is needed. Consequently for large enough $T$ the projected matrix will have full row rank. The second equation then determines $\bar{\alpha}_\perp$, which in turn determines $\bar{\alpha}$ up to the choice of basis such that $\bar{\alpha}' = T_C \bar{\alpha}_o$ for some full row rank matrix $\bar{\alpha}_o \in \mathbb{R}^{r \times p}$, $\bar{\alpha}_o \bar{\alpha}_\perp = 0$. The first equation then can be rewritten as

$$[T_C, C_*]\underbrace{\begin{bmatrix} -\bar{\alpha}_o \\ (I - \bar{A})^{-1}B \end{bmatrix}}_{R_1} = \beta'.$$

The second equation shows that the row space of $(I - \bar{A})^{-1}B$ contains the rows of $\bar{\alpha}_\perp'$. Thus the matrix $R_1$ has full row rank. It follows that this equation has solutions.
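That the system $[T_C, C_*] R_1 = \beta'$ is solvable when $R_1$ has full row rank can be illustrated with a generic least squares solve; the matrices below are made-up stand-ins for $R_1$ and $\beta'$, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
r, n, p = 1, 2, 3                     # hypothetical dimensions
R1 = rng.standard_normal((r + n, p))  # stand-in for [-alpha_o; (I - Abar)^{-1}B]
beta_t = rng.standard_normal((r, p))  # stand-in for beta'
# solve [T_C, C_*] R1 = beta' via least squares on the transposed system
sol, _, rank, _ = np.linalg.lstsq(R1.T, beta_t.T, rcond=None)
TC_Cstar = sol.T                      # the row block [T_C, C_*]
```

With full row rank $R_1$ the normal equations are consistent and the least squares solution reproduces $\beta'$ exactly.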

Having obtained solutions for $C_*$, $C_\dagger$, $\bar{\alpha}$, $\bar{\alpha}_\perp$, the matrix $C$ is obtained from

$$C = \begin{bmatrix} \alpha & \alpha_\perp \end{bmatrix} \begin{bmatrix} C_* \\ C_\dagger \end{bmatrix}.$$

A unique solution then can be obtained by adding the restrictions $\Pi_j = C(I - \bar{A})^{-2}\bar{A}^{j+1}B$, $j = 1, 2, \dots, 2n$, which for the estimates are to be solved in a least squares sense among all solutions to Equation (22).

It then follows that for the true matrices $\Phi$, $\Psi$, $\Pi_j$ the only solution for given $\bar{A}$, $B$ consists in the corresponding true $C$. These facts can therefore be used to develop an initial guess for subsequent pseudo likelihood maximization using the parameterization of I(2) processes in state space representation, given the integer valued parameters $n$, $c$ and $d$:


The algorithm obtains a minimal state space system of order *n* in the canonical form for I(2) processes given in Bauer and Wagner (2012) and hence can be used as an initial guess for subsequent pseudo-likelihood optimization in the set *Mn*(*r*,*s*) of all order *n* rational transfer functions corresponding to I(2) processes with state space unit root structure ((0,(*c*, *c* + *d*))).

**Theorem 4** (Consistent initial guess)**.** *Let $(y_t)_{t\in\mathbb{Z}}$ denote a process generated using the system $(A_0, B_0, C_0)$ according to Assumption 3 and let the system $(\tilde{A}, \tilde{B}, \tilde{C})$ be estimated based on the long VAR approximation with lag order chosen according to Assumption 2. Then $(\tilde{A}, \tilde{B}, \tilde{C})$ is a weakly consistent estimator of the data generating system $(A_0, B_0, C_0)$ in the sense that $\tilde{C}\tilde{A}^j\tilde{B} \stackrel{p}{\rightarrow} C_0 A_0^j B_0$, $j = 0, 1, \dots$, and hence the corresponding transfer functions converge in pointwise topology.*

The proof of this theorem can be found in Appendix E.

#### **6. Conclusions**

In this paper the theory on long VAR approximation of general linear dynamical processes is extended to the case of I(2) processes. We find that slightly tighter upper and lower bounds on the lag order are needed than in the I(1) case. The tighter bounds are not needed for all results and do not appear very restrictive for applications.

The main results are completely analogous to the I(1) case: the asymptotics are in many respects identical to the finite order VAR case. Asymptotic distributions for the coefficients to nonstationary variables are the same as in the finite order VAR case. This holds true both for unrestricted OLS estimates as well as for the 2SI2 approach in the Johansen framework. Tests on cointegrating ranks show identical asymptotic distributions under the null as in the finite order VAR case and hence do not require other tables. In this respect the main conclusion is that the usual procedure of estimating the lag order in a first step and then applying the Johansen procedure for the estimated lag order is justified also for processes generated from a VAR(∞) that is approximated with a lag order lying within the prescribed bounds.

Additionally in the VARMA case the long VAR approximation can be used in order to derive consistent initial guesses that can be used in subsequent pseudo likelihood estimation.

Thus the paper provides both a full extension of results achieved in the I(1) case and a useful starting point for subsequent VARMA modeling, which might be preferable in situations that require a high VAR order or involve a large number of variables, situations where VARMA models can be more parsimonious than VAR models.

**Author Contributions:** The two authors of the paper have contributed equally, via joint efforts, regarding both ideas, research, and writing. Conceptualization, Y.L. and D.B.; methodology, Y.L. and D.B.; software, not applicable; validation, not applicable; formal analysis, Y.L. and D.B.; investigation, Y.L. and D.B.; resources, not applicable; writing–original draft preparation, Y.L. and D.B.; writing–review and editing, Y.L. and D.B.; visualization, not applicable; supervision, not applicable; project administration, D.B.; funding acquisition, D.B. Both authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation —Projektnummer 276051388) which is gratefully acknowledged. We acknowledge support for the publication costs by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.

**Acknowledgments:** The reviewers and in particular the two guest editors provided significant comments that helped in improving the paper, which is gratefully acknowledged.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Preliminaries**

The theory in this paper follows closely the arguments in Lewis and Reinsel (1985) and its extension to the I(1) case in Saikkonen and Lütkepohl (1996). To this end consider the finite order VECM approximation:

$$
\Delta^2 y\_t = \Phi y\_{t-1} + \Psi \Delta y\_{t-1} + \sum\_{j=1}^h \Pi\_j \Delta^2 y\_{t-j} + \varepsilon\_t. \tag{A1}
$$

The properties of the various estimators heavily use the following rewriting of the approximation using the triangular representation of *yt*:

$$\begin{aligned} \Delta^2 y_t &= [\Phi_1, \Phi_2, \Phi_3] \begin{bmatrix} A\Delta y_{3,t-1} + u_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{bmatrix} + [\Psi_1, \Psi_2, \Psi_3] \begin{bmatrix} A u_{3,t-1} + \Delta u_{1,t-1} \\ u_{2,t-1} \\ \Delta y_{3,t-1} \end{bmatrix} \\ &\quad + \sum_{j=1}^h [\Pi_{j,1}, \Pi_{j,2}, \Pi_{j,3}] \begin{bmatrix} A\Delta u_{3,t-j} + \Delta^2 u_{1,t-j} \\ \Delta u_{2,t-j} \\ u_{3,t-j} \end{bmatrix} + \varepsilon_t \\ &= \Phi_2 y_{2,t-1} + \Phi_3 y_{3,t-1} + \Theta \Delta y_{3,t-1} + \sum_{j=1}^h \Xi_j u_{t-j} + [\Xi_{h+1,1}, \Xi_{h+1,2}] \begin{bmatrix} u_{1,t-h-1} \\ u_{2,t-h-1} \end{bmatrix} + \Xi_{h+2,1} \tilde{u}_{1,t-h-2} + \varepsilon_t, \end{aligned} \tag{A2}$$

where $\tilde{u}_{1,t-h-2} := u_{1,t-h-2} - Au_{3,t-h-1}$ and $\Phi_2 = \Phi_3 = 0$, $\Theta = \Phi_1 A + \Psi_3 = 0$, and

$$\begin{aligned} \Xi_1 &= [\Phi_1 + \Psi_1 + \Pi_{1,1},\ \Psi_2 + \Pi_{1,2},\ (\Psi_1 + \Pi_{1,1})A + \Pi_{1,3}], \\ \Xi_2 &= [-\Psi_1 + \Pi_{2,1} - 2\Pi_{1,1},\ \Pi_{2,2} - \Pi_{1,2},\ (\Pi_{2,1} - \Pi_{1,1})A + \Pi_{2,3}], \\ \Xi_j &= [\Pi_{j,1} - 2\Pi_{j-1,1} + \Pi_{j-2,1},\ \Pi_{j,2} - \Pi_{j-1,2},\ (\Pi_{j,1} - \Pi_{j-1,1})A + \Pi_{j,3}], \quad j = 3, \dots, h. \end{aligned}$$

Furthermore, we can see that $\sum_{j=1}^{h+2} \Xi_{j,1} = \Phi_1$, $\sum_{j=1}^{h+1} \Xi_{j,2} = \Psi_2$, and $\sum_{j=1}^{h} \Xi_{j,3} = \Psi_1 A + \sum_{j=1}^{h} \Pi_{j,3}$. Finally $\Psi_1 = -\sum_{j=2}^{h+2} (j-1)\Xi_{j,1}$.

Note that in the reparametrization (A2) the I(1) components, $y_{c,t} := (y_{2,t}', \Delta y_{3,t}')'$, as well as the I(2) components, $y_{3,t-1}$, are isolated from the stationary ones, $u_{t-j}$, and have coefficients equal to zero, which facilitates the derivation of the asymptotic properties.

In the reparameterized setting define<sup>3</sup> $\Xi := [\Xi_1, \dots, \Xi_h, \Xi_{h+1,1}, \Xi_{h+1,2}, \Xi_{h+2,1}]$ of dimension $p \times (ph + 2p_1 + p_2)$, $U_t := [u_{t-1}', \dots, u_{t-h}', u_{1,t-h-1}', u_{2,t-h-1}', \tilde{u}_{1,t-h-2}']'$ of dimension $(ph + 2p_1 + p_2) \times 1$, $\Lambda := [\Xi, \Phi_2, \Theta, \Phi_3] = [\Xi, 0]$ of dimension $p \times p(h+2)$, and $W_t := [U_t', y_{c,t-1}', y_{3,t-1}']'$ of dimension $p(h+2) \times 1$. Then we have

$$\Delta^2 y_t = \Lambda W_t + e_t, \tag{A3}$$

and correspondingly,

$$
\Delta^2 y_t = \hat{\Lambda} W_t + \hat{e}_t
$$

where

$$
\hat{\Lambda} = [\hat{\Xi},\ \hat{\Phi}_2,\ \hat{\Theta},\ \hat{\Phi}_3] = \langle \Delta^2 y_t, W_t \rangle \langle W_t, W_t \rangle^{-1},
$$

is the OLS estimator of $\Lambda$. Here $\langle X_t, Z_t \rangle := \sum_{t=h+3}^{T} X_t Z_t'$.

Note that $W_t$ and the regressors in (A1) are in one-to-one correspondence. In the original Equation (A1), beside the nonstationary regressors $y_{c,t-1}$ and $y_{3,t-1}$, the regressor vector

$$\tilde{X}_t = [y_{1,t-1}', \Delta y_{1,t-1}', u_{2,t-1}', \Delta^2 y_{t-1}', \dots, \Delta^2 y_{t-h}']' \in \mathbb{R}^{2p_1 + p_2 + ph}$$

occurs which cointegrates with Δ*y*3,*t*−<sup>1</sup> such that

$$X_t = \tilde{X}_t - [A', 0]'\Delta y_{3,t-1} = T_h U_t \tag{A4}$$

is stationary. Here the nonsingular matrix $T_h \in \mathbb{R}^{(ph+2p_1+p_2) \times (ph+2p_1+p_2)}$ is defined as:

It can be verified that *Th* is invertible. The asymptotic properties of <sup>Λ</sup><sup>ˆ</sup> <sup>−</sup> <sup>Λ</sup> are clarified in the next lemma:

**Lemma A1.** *Under the assumptions of Theorem 1, using $N = T - h - 2$ as the effective sample size,*

<sup>3</sup> In this appendix processes whose dimension depends on the choice of *h* are denoted using upper case letters neglecting the dependence on *h* in the notation otherwise for simplicity.

$$N^{\frac{1}{2}}(\hat{\Xi} - \underline{\Xi}) = N^{\frac{1}{2}} \langle \varepsilon_t, U_t \rangle \left(E\, U_t U_t'\right)^{-1} + o_P(h^{\frac{1}{2}}),$$

$$[N\hat{\Phi}_2, N\hat{\Theta}, N^2\hat{\Phi}_3] \Rightarrow g(1) \left[ \begin{array}{cc} \int_0^1 dB\, B_c' & \int_0^1 dB\, H_3' \end{array} \right] \left[ \begin{array}{cc} \int_0^1 B_c B_c' & \int_0^1 B_c H_3' \\ \int_0^1 H_3 B_c' & \int_0^1 H_3 H_3' \end{array} \right]^{-1}$$

*where $H_3(u) = \int_0^u B_3(s)ds$.*

**Proof.** The proof essentially shows that the coefficients corresponding to the stationary regressors and the ones corresponding to the integrated regressors can asymptotically be dealt with separately. Let $D_T := \mathrm{diag}[N^{-\frac{1}{2}} I_{ph+2p_1+p_2}, N^{-1} I_{p_2+p_3}, N^{-2} I_{p_3}]$. Note that $N^{\frac{1}{2}}(\hat{\Xi} - \underline{\Xi})$, $N[\hat{\Phi}_2, \hat{\Theta}]$, and $N^2\hat{\Phi}_3$ are the 1st, 2nd and 3rd column blocks of $(\hat{\Lambda} - \underline{\Lambda})D_T^{-1}$, respectively. Moreover, we have

$$(\hat{\Lambda} - \underline{\Lambda})D_T^{-1} = \langle e_t, W_t \rangle D_T \left( D_T \langle W_t, W_t \rangle D_T \right)^{-1}.$$

Let *<sup>R</sup>*<sup>ˆ</sup> :<sup>=</sup> *DTWt*, *Wt DT*, and define *<sup>R</sup>* :<sup>=</sup> diag [Γ*u*, *<sup>R</sup>*2] , where <sup>Γ</sup>*<sup>u</sup>* <sup>=</sup> <sup>E</sup>[*UtU <sup>t</sup>*] , and

$$R_2 := \begin{bmatrix} N^{-2} \langle y_{c,t-1}, y_{c,t-1} \rangle & N^{-3} \langle y_{c,t-1}, y_{3,t-1} \rangle \\ N^{-3} \langle y_{3,t-1}, y_{c,t-1} \rangle & N^{-4} \langle y_{3,t-1}, y_{3,t-1} \rangle \end{bmatrix}.$$


Note that each block of the matrix $R_2$ is of order $O_P(1)$, and moreover, both $R_2$ and its limit are almost surely invertible, as there is no cointegration between $y_{c,t-1}$ and $y_{3,t-1}$ (see Lemma 3.1.1 in Chan and Wei (1988), and Sims et al. (1990)). Note that

$$(\hat{\Lambda} - \underline{\Lambda})D_T^{-1} - \langle \varepsilon_t, W_t \rangle D_T R^{-1} = \underbrace{\langle \varepsilon_{1t}, W_t \rangle D_T R^{-1}}_{=:E_1} + \underbrace{\langle \varepsilon_{1t}, W_t \rangle D_T (\hat{R}^{-1} - R^{-1})}_{=:E_2} + \underbrace{\langle \varepsilon_t, W_t \rangle D_T (\hat{R}^{-1} - R^{-1})}_{=:E_3}.$$

Here $\langle \varepsilon_t, W_t \rangle D_T R^{-1}$ has the limits stated in the lemma since:

$$N^{-1}\langle \varepsilon_t, y_{c,t-1}\rangle \Rightarrow g(1)\int_0^1 dB\, B_c', \qquad N^{-2}\langle \varepsilon_t, y_{3,t-1}\rangle \Rightarrow g(1)\int_0^1 dB\, H_3',$$

$$\begin{bmatrix} N^{-2}\langle y_{c,t-1}, y_{c,t-1}\rangle & N^{-3}\langle y_{c,t-1}, y_{3,t-1}\rangle \\ N^{-3}\langle y_{3,t-1}, y_{c,t-1}\rangle & N^{-4}\langle y_{3,t-1}, y_{3,t-1}\rangle \end{bmatrix} \Rightarrow \begin{bmatrix} \int_0^1 B_c B_c' & \int_0^1 B_c H_3' \\ \int_0^1 H_3 B_c' & \int_0^1 H_3 H_3' \end{bmatrix}.$$

The lemma therefore holds if $E_1 = [o_P(h^{1/2}), o_P(1), o_P(1)]$, $E_2 = o_P(1)$ and $E_3 = o_P(1)$ can be shown (where the blocks of $E_1$ correspond to the partitioning of $W_t$ into stationary, I(1) and I(2) components). For this it is sufficient to show:

$$\text{(I)}\quad \|\hat R^{-1} - R^{-1}\|_1 = O_P(h/N^{1/2}),$$

$$\text{(II)}\quad \|\langle \varepsilon_{1t}, W_t\rangle D_T\| = o_P(h^{1/2}), \ \text{ where } N^{-1}\langle \varepsilon_{1t}, y_{c,t-1}\rangle = o_P(1) \text{ and } N^{-2}\langle \varepsilon_{1t}, y_{3,t-1}\rangle = o_P(1),$$

$$\text{(III)}\quad \|\langle \varepsilon_t, W_t\rangle D_T\| = O_P(h^{1/2}).$$

Here $\|\cdot\|_1$ denotes the spectral norm of a matrix, while $\|\cdot\|$ denotes the Frobenius norm. (I) To see $\|\hat R^{-1} - R^{-1}\|_1 = O_P(h/N^{1/2})$ it is, according to Lewis and Reinsel (1985), sufficient to show $\|\hat R - R\|_1 = O_P(h/N^{1/2})$ and $\|R^{-1}\|_1 = O_P(1)$. Note that

$$
\hat R - R = \begin{bmatrix} N^{-1}\langle U_t, U_t\rangle - \Gamma_u & N^{-3/2}\langle U_t, y_{c,t-1}\rangle & N^{-5/2}\langle U_t, y_{3,t-1}\rangle \\ N^{-3/2}\langle y_{c,t-1}, U_t\rangle & 0 & 0 \\ N^{-5/2}\langle y_{3,t-1}, U_t\rangle & 0 & 0 \end{bmatrix} =: \begin{bmatrix} \hat Q & \hat P_{12} & \hat P_{13} \\ \hat P_{21} & 0 & 0 \\ \hat P_{31} & 0 & 0 \end{bmatrix},
$$

then we have $\mathbb{E}\|\hat R - R\|_1^2 \le \mathbb{E}\|\hat R - R\|^2 = \mathbb{E}\|\hat Q\|^2 + 2(\mathbb{E}\|\hat P_{12}\|^2 + \mathbb{E}\|\hat P_{13}\|^2)$.

Now let $U_t^o := [u_{t-1}', \dots, u_{t-h-2}']'$. Then there exists a transformation $T_u$ of full row rank such that $U_t = T_u U_t^o$, where $T_u$ is a $(ph + 2p_1 + p_2) \times p(h+2)$ matrix.

Then we have $\hat Q = T_u \hat Q^o T_u'$, where $\hat Q^o = N^{-1}\langle U_t^o, U_t^o\rangle - \mathbb{E}[U_t^o U_t^{o\prime}]$; moreover, $\hat P_{1i} = T_u \hat P_{1i}^o$ for $i = 2, 3$, where $\hat P_{12}^o = N^{-3/2}\langle U_t^o, y_{c,t-1}\rangle$ and $\hat P_{13}^o = N^{-5/2}\langle U_t^o, y_{3,t-1}\rangle$. Since $\|T_u\|_1 = O(1)$, $\hat Q$ and $\hat P_{1i}$ have the same rates of convergence as $\hat Q^o$ and $\hat P_{1i}^o$, respectively. From Saikkonen (1991), Lemma A.2, we know $\mathbb{E}\|\hat Q^o\|^2 = O(h^2/N)$, and $\mathbb{E}\|\hat P_{12}^o\|^2 = O(h/N)$ follows by direct calculation.

For *P*ˆ*<sup>o</sup>* <sup>13</sup> note that

$$\mathbb{E}\left\|y_{3,t-1}\right\|^2 = \mathbb{E}\Big\|\sum_{j=1}^{t-1}\sum_{i=1}^j u_{3,i}\Big\|^2 = \mathbb{E}\Big\|\sum_{i=1}^{t-1} i\, u_{3,t-i}\Big\|^2 = O(t^3).$$

Then calculations analogous to those for $\hat P_{12}^o$ show that $\mathbb{E}\|\hat P_{13}^o\|^2 = O(h/N)$. Concluding, we obtain $\mathbb{E}\|\hat R - R\|_1^2 = O(h^2/N)$ such that $\|\hat R - R\|_1 = O_P(h/N^{1/2})$.

To show $\|R^{-1}\|_1 = O_P(1)$ note that $R^{-1} = \operatorname{diag}\{\Gamma_u^{-1}, R_2^{-1}\}$, where $\|\Gamma_u^{-1}\|_1 = O(1)$ (see Lewis and Reinsel (1985), p. 397) and $\|R_2^{-1}\|_1 = O_P(1)$, since $R_2$ is a.s. invertible and converges in distribution to an almost surely nonsingular random matrix.

(II) With respect to $\|\langle \varepsilon_{1t}, W_t\rangle D_T\| = o_P(h^{1/2})$ note that

$$\left\|\langle \varepsilon_{1t}, W_t\rangle D_T\right\| \le \left\|N^{-1/2}\langle \varepsilon_{1t}, U_t\rangle\right\| + \left\|N^{-1}\langle \varepsilon_{1t}, y_{c,t-1}\rangle\right\| + \left\|N^{-2}\langle \varepsilon_{1t}, y_{3,t-1}\rangle\right\|.$$

From Saikkonen (1991), Lemma A.5, we have $\|N^{-1/2}\langle \varepsilon_{1t}, U_t\rangle\| = o_P(h^{1/2})$ and $\|N^{-1}\langle \varepsilon_{1t}, y_{c,t-1}\rangle\| = o_P(1)$. Then $\mathbb{E}\|y_{3,t-1}\|^2 = O(t^3)$ and $\mathbb{E}\|\varepsilon_{1t}\|^2 = o(N^{-1})$ imply

$$\mathbb{E}\left\|N^{-2}\langle\varepsilon_{1t}, y_{3,t-1}\rangle\right\| \le N^{-2}\sum_{t=h+3}^{T}\left(\mathbb{E}\|\varepsilon_{1t}\|^2\, \mathbb{E}\|y_{3,t-1}\|^2\right)^{1/2} = o(N^{-2}\, N\, N^{-1/2}\, N^{3/2}) = o(1).$$

(III) To show $\|\langle \varepsilon_t, W_t\rangle D_T\| = O_P(h^{1/2})$ note that $\|N^{-1/2}\langle \varepsilon_t, U_t\rangle\| = O_P(h^{1/2})$ and $\|N^{-1}\langle \varepsilon_t, y_{c,t-1}\rangle\| = O_P(1)$ according to (A.7) of Saikkonen (1992). Moreover, $N^{-2}\sum_{t=h+3}^{T} \varepsilon_t y_{3,t-1}' \Rightarrow g(1)\int_0^1 dB\, H_3'$ implies $\|N^{-2}\langle \varepsilon_t, y_{3,t-1}\rangle\| = O_P(1)$.

Note that for the lemma to hold we only need $h^3/N \to 0$ and $N^{1/2}\sum_{j=h+1}^{\infty} \|\Pi_j\| = o(1)$.
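To illustrate these two conditions, assume a geometric bound $\|\Pi_j\| \le C\rho^j$, as holds for invertible VARMA processes; then a lag length growing like $h \propto N^{1/4}$, for example, satisfies both $h^3/N \to 0$ and $N^{1/2}\sum_{j>h}\|\Pi_j\| \to 0$. A small deterministic check (the constants $C$, $\rho$ and the rule $h = \lfloor 2N^{1/4}\rfloor$ are illustrative choices, not taken from the paper):

```python
import numpy as np

rho, C = 0.8, 1.0                       # assumed geometric bound ||Pi_j|| <= C * rho^j
Ns = np.array([10**3, 10**4, 10**6, 10**8])
h = np.floor(2 * Ns ** 0.25).astype(int)  # lag length growing like N^{1/4}

# N^{1/2} * sum_{j > h} ||Pi_j|| <= N^{1/2} * C * rho^{h+1} / (1 - rho)
tail = np.sqrt(Ns) * C * rho ** (h + 1) / (1 - rho)
ratio = h.astype(float) ** 3 / Ns       # h^3 / N

print(h, tail, ratio)   # both sequences decrease towards zero
```

Any other rule with $h^3/N \to 0$ and $h$ growing fast enough to kill the truncation bias works equally well; $h \propto N^{1/3}$ itself is the boundary case for the first condition.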

#### **Appendix B. Proof of Theorem 1**

#### *Appendix B.1. (A) Consistency*

(i) Lemma A1 implies $\hat\Phi_2 \to 0 = \Phi_2$ and $\hat\Phi_3 \to 0 = \Phi_3$. Furthermore, the reparameterization implies $\Phi_1 = \sum_{j=1}^{h+2} \Xi_{j,1}$ and thus $\hat\Phi_1 = \sum_{j=1}^{h+2} \hat\Xi_{j,1}$, leading to

*Econometrics* **2020**, *8*, 38

$$\|\hat\Phi_1 - \Phi_1\| \le \Big\|\sum_{j=1}^{h+2} \hat\Xi_{j,1} - \sum_{j=1}^{h+2} \Xi_{j,1}\Big\| \le \sum_{j=1}^{h+2} \|\hat\Xi_{j,1} - \Xi_{j,1}\| = O_P(h^{3/2}/N^{1/2}),$$

where the last bound holds due to $\langle \varepsilon_t, u_{t-j}\rangle = O_P(N^{1/2})$ in combination with Lemma A1. (ii) Note that

$$
\hat\Sigma_\varepsilon = N^{-1}\langle \Delta^2 y_t - \hat\Lambda W_t,\ \Delta^2 y_t - \hat\Lambda W_t\rangle = N^{-1}\langle \varepsilon_t + \varepsilon_{1t} + (\Lambda - \hat\Lambda)W_t,\ \varepsilon_t + \varepsilon_{1t} + (\Lambda - \hat\Lambda)W_t\rangle.
$$

Now

$$\langle (\Lambda - \hat\Lambda)W_t,\ (\Lambda - \hat\Lambda)W_t\rangle = (\Lambda - \hat\Lambda)D_T^{-1}\, D_T\langle W_t, W_t\rangle D_T\, D_T^{-1}(\Lambda - \hat\Lambda)',$$

where $\hat R = D_T\langle W_t, W_t\rangle D_T$ such that $\|\hat R\|_1 = O_P(1)$ and $\|(\Lambda - \hat\Lambda)D_T^{-1}\| = O_P(h^{1/2})$. Consequently

$$N^{-1}\langle (\Lambda - \hat\Lambda)W_t,\ (\Lambda - \hat\Lambda)W_t\rangle = O_P(h/N) \to 0.$$

Next, from the definition of $\varepsilon_{1t}$, we can show that

$$N^{-1}\langle \varepsilon_t + \varepsilon_{1t},\ \varepsilon_t + \varepsilon_{1t}\rangle = N^{-1}\langle \varepsilon_t, \varepsilon_t\rangle + o_P(1) = \Sigma_\varepsilon + o_P(1),$$

where the last equality follows from the law of large numbers and the first equality is implied by the facts that $\|\varepsilon_{1t}\|^2 = o_P(T^{-1})$ and $\|\varepsilon_t\|^2 = O_P(1)$. (iii) From (i) and (ii), $\hat\Omega_{1.c} = (\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}\hat\Phi_1)^{-1} = (\Phi_1'\Sigma_\varepsilon^{-1}\Phi_1)^{-1} + o_P(1) = \Omega_{1.c} + o_P(1)$ directly follows. (iv) With respect to $\hat\Psi$ recall that

$$\Psi_1 = -\sum_{j=2}^{h+2} (j-1)\Xi_{j,1}, \qquad \Psi_2 = \sum_{j=1}^{h+1} \Xi_{j,2}.$$

Lemma A1 shows that each entry of $\hat\Xi - \Xi$ is of order $O_P(h^{1/2}/N^{1/2})$. Then

$$\|\hat\Psi_1 - \Psi_1\| \le \sum_{j=2}^{h+2} (j-1)\|\hat\Xi_{j,1} - \Xi_{j,1}\| = O_P\Big(\sum_{j=2}^{h+2} (j-1)\, h^{1/2}/N^{1/2}\Big) = O_P(h^{5/2}/N^{1/2}),$$

which converges to zero for $h^5/T \to 0$. Similarly, $\|\hat\Psi_2 - \Psi_2\| = O_P(h^{3/2}/N^{1/2})$.

For $\hat\Psi_3$ note that $\Theta = \Phi_1 A + \Psi_3$. Thus $\hat\Psi_3 = \hat\Theta - \hat\Phi_1 A$, such that $\hat\Psi_3 \to \Psi_3$ follows from (i) and Lemma A1. (v) is contained in Lemma A1.

(vi) From (6) and the definition $\hat\Omega_{1.c} = (\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}\hat\Phi_1)^{-1}$ we have

$$\begin{split} \hat A - A &= -(\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}\hat\Phi_1)^{-1}\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}\hat\Psi_3 - A \\ &= -\hat\Omega_{1.c}\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}\hat\Psi_3 - \hat\Omega_{1.c}\hat\Omega_{1.c}^{-1}A = -\hat\Omega_{1.c}\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}\hat\Psi_3 - \hat\Omega_{1.c}\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}\hat\Phi_1 A \\ &= -\hat\Omega_{1.c}\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}(\hat\Psi_3 + \hat\Phi_1 A) = -\hat\Omega_{1.c}\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}\hat\Theta. \end{split}$$

Then (i)-(iii) and (v) show the result.

#### *Appendix B.2. (B) Asymptotic Distribution of Coefficients to Nonstationary Regressors*

(i) The distribution of the coefficients corresponding to the nonstationary components is contained in Lemma A1. (ii) With respect to the cointegrating relation note that from the proof of Theorem 1 we have

$$N(\hat A - A) = -N\hat\Omega_{1.c}\hat\Phi_1'\hat\Sigma_\varepsilon^{-1}\hat\Theta = -\Omega_{1.c}\Phi_1'\Sigma_\varepsilon^{-1}\cdot N\hat\Theta + o_P(1).$$

Note that $N\hat\Theta = [N\hat\Phi_2,\ N\hat\Theta,\ N^2\hat\Phi_3]\eta$, where $\eta = [0_{p_3\times p_2},\ I_{p_3},\ 0_{p_3\times p_3}]'$. Then by Lemma A1 we have

$$N(\hat A - A) \Rightarrow -\Omega_{1.c}\Phi_1'\Sigma_\varepsilon^{-1}\cdot g(1)\int_0^1 dB\, F'\Big(\int_0^1 FF'\Big)^{-1}\eta = -\Omega_{1.c}\Phi_1'\Sigma_\varepsilon^{-1}\cdot g(1)\int_0^1 dB\, L'\Big(\int_0^1 LL'\Big)^{-1}.$$

Note that $\Phi_1 = g(1)\alpha$ and, by the definition $\Omega = \begin{bmatrix}\Omega_{11} & \Omega_{1c} \\ \Omega_{c1} & \Omega_{cc}\end{bmatrix} = g(1)^{-1}\Sigma_\varepsilon\, g(1)^{-1\prime}$, we have

$$\begin{split} -\Omega_{1.c}\Phi_1'\Sigma_\varepsilon^{-1} g(1)B &= -\Omega_{1.c}\alpha' g(1)'\Sigma_\varepsilon^{-1} g(1)B = \Omega_{1.c}\left[I_{p_1}\ \ 0\right]\Omega^{-1}B \\ &= \Omega_{1.c}\left[(\Omega^{-1})_{11}\ \ (\Omega^{-1})_{1c}\right]B = \Omega_{1.c}\left[\Omega_{1.c}^{-1}\ \ -\Omega_{1.c}^{-1}\Omega_{1c}\Omega_{cc}^{-1}\right]B \\ &= \left[I_{p_1}\ \ -\Omega_{1c}\Omega_{cc}^{-1}\right]\begin{bmatrix}B_1 \\ B_c\end{bmatrix} = B_1 - \Omega_{1c}\Omega_{cc}^{-1}B_c = B_{1.c}. \end{split}$$

Therefore, we have

$$N(\hat A - A) \Rightarrow \int_0^1 dB_{1.c}\, L'\left(\int_0^1 LL'\right)^{-1}.$$

#### *Appendix B.3. (C) Asymptotic Distribution of Coefficients to Stationary Regressors*

Since the regressor vector $U_t$ is stationary, the asymptotic distribution of $N^{1/2}L_h'\,\mathrm{vec}(\hat\Xi - \Xi)$ follows from Lewis and Reinsel (1985) in combination with the uniform boundedness of the maximal and minimal eigenvalues of $\Gamma_u = \mathbb{E}\, U_t U_t'$, see above. Analogously, the results for the coefficients corresponding to the regressor vector $X_t$ follow, as $X_t = T_h U_t$ for a nonsingular matrix $T_h$.
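A simulation sketch of this stationary-part result for the simplest case (synthetic univariate MA(1) data; the data generating process and the rate $h \approx N^{1/3}$ are illustrative choices, not from the paper): the long autoregression recovers the leading AR($\infty$) coefficients $\pi_j = -(-\theta)^j$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, theta = 5000, 0.5
e = rng.standard_normal(N + 1)
y = e[1:] + theta * e[:-1]          # invertible MA(1); AR(inf) coefficients pi_j = -(-theta)^j

h = int(round(N ** (1 / 3)))        # lag length increasing with the sample size
Y = y[h:]
X = np.column_stack([y[h - j:N - j] for j in range(1, h + 1)])
pi_hat = np.linalg.lstsq(X, Y, rcond=None)[0]   # OLS in the long autoregression

print(pi_hat[:3])   # close to [0.5, -0.25, 0.125]
```

The estimation error of each coefficient is of order $N^{-1/2}$, while the truncation bias vanishes because the AR($\infty$) coefficients decay geometrically.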

#### *Appendix B.4. (D) Asymptotic Distribution of Wald Type Tests*

For the Wald test, in addition to (C), note that the variance $\Gamma_{ECM}$ is replaced by an estimate $\hat\Gamma_{ECM}$. For

$$L_h'(\Gamma_{ECM}^{-1} \otimes \Sigma_\varepsilon)L_h - L_h'(\hat\Gamma_{ECM}^{-1} \otimes \hat\Sigma_\varepsilon)L_h$$

note that $\hat\Sigma_\varepsilon - \Sigma_\varepsilon = o_P(1)$ due to (A)(ii). The regressor vectors $\tilde X_t$ and $X_t$ differ only in the first block, where $y_{1,t-1} = u_{1,t-1} + A\Delta y_{3,t-1}$ replaces $u_{1,t-1}$. Regressing out $\Delta y_{3,t-1}$ eliminates this difference. Then $\|\hat\Gamma_{ECM} - \Gamma_{ECM}\|_1 = O_P(h/N^{1/2})$ according to Saikkonen and Lütkepohl (1996, p. 835, l. 3), where also the invertibility of $\Gamma_{ECM}$ is shown. Using Lemma A.2 of Saikkonen and Lütkepohl (1996) this implies $\|\hat\Gamma_{ECM}^{-1} - \Gamma_{ECM}^{-1}\|_1 = O_P(h/N^{1/2})$.

The rest then follows as in the proof of Theorem 4 in Saikkonen and Lütkepohl (1996).

#### **Appendix C. Proof of Theorem 2**

Consistency follows directly from Theorem 1, as the general representation can be transformed into a triangular representation using the matrix $B = [\beta, \beta_1, \beta_2]$, see (4).

With respect to the asymptotic distribution, following the proof of Theorem 1, there exists a nonsingular transformation matrix $S_h$ such that $W_t = S_h Z_{t,h}$. From $\|\hat R^{-1} - R^{-1}\|_1 = O_P(h/N^{1/2})$ it follows that

$$(N^{-1}\langle W_t, W_t\rangle)^{-1} = \begin{bmatrix} \Gamma_u^{-1} & 0 \\ 0 & 0 \end{bmatrix} + o_P(h/N^{1/2}).$$

Therefore the blocks corresponding to the nonstationary regressors do not contribute to the asymptotic distribution, and standard arguments for the stationary part of the regressor vector can be used.

#### **Appendix D. Proofs for Theorem 3**

The proof combines the ideas of Saikkonen and Luukkonen (1997) (in the following S&L) with the asymptotics of 2SI2 of Paruolo (2000) (in the following P). In the proof we work, without restriction of generality, with the triangular representation.

The key to the asymptotic properties of the estimators obtained from the 2SI2 algorithm lies in Lemmas A.4 and A.5 in the appendix of P. These lemmas deal with the limits of various moment matrices of the form $N^{-a}\langle R_{it}, R_{jt}\rangle$ corrected for the stationary components $\Delta^2 y_{t-j}$, $j = 1, \dots, h-2$. The correction involves a regressor vector growing in dimension with the sample size; this is dealt with in S&L.

In this respect let $S_t = [\Delta^2 y_{t-1}', \dots, \Delta^2 y_{t-h+2}']'$, which according to (A4) is a linear function of $U_t$ such that $S_t = T_s U_t$. The definition of $U_t$ implies $\hat Q = N^{-1}\langle U_t, U_t\rangle - \mathbb{E}\, U_t U_t' = O_P(h/N^{1/2})$. On p. 543 in P the matrices $\Sigma_{ij}$, $i, j \in \{Y, U, 0\}$, are defined as limits of second moment matrices. Here $U$ refers to $\beta_1'\Delta y_{t-1} = u_{2,t-1}$ in the triangular representation, $Y$ refers to $\beta' y_{t-1} + \delta\beta_2'\Delta y_{t-1} = y_{1,t-1} - A\Delta y_{3,t-1} = u_{1,t-1}$, and $0$ refers to $\Delta^2 y_t$. These are all stationary processes and linear functions of $u_t, u_{t-1}, u_{t-2}$. In addition to $S_t$, also $\beta'\Delta y_{t-1} = \Delta u_{1,t-1} + Au_{3,t-1}$ is corrected for in the second stage.

The arguments on p. 114 and 115 of S&L deal with terms of the form

$$N^{-1}\langle u_{1,t-1}, u_{1,t-1}\rangle - N^{-1}\langle u_{1,t-1}, S_t\rangle\langle S_t, S_t\rangle^{-1}\langle S_t, u_{1,t-1}\rangle.$$

Arguments analogous to S&L (A.12) show that this equals, up to terms of order $o_P(1)$,

$$C_{11} = \mathbb{E}\, u_{1,t-1}u_{1,t-1}' - \mathbb{E}\, u_{1,t-1}S_t'\,(\mathbb{E}\, S_t S_t')^{-1}\, \mathbb{E}\, S_t u_{1,t-1}'.$$

S&L state that this is bounded from above and bounded away from zero. The second claim actually is wrong: if $(u_{1,t})_{t\in\mathbb{Z}}$ is univariate white noise with unit variance, then $C_{11} = \frac{1}{h}$ is achieved by predicting $u_{1,t-1}$ by

$$\sum_{j=1}^{h} \frac{h-j}{h}\, \Delta u_{1,t-j} = u_{1,t-1} - \frac{1}{h}\sum_{j=1}^{h} u_{1,t-j},$$

that is, by including an integration of the regressors in the form of the summation. This does not change the remaining arguments in S&L; it only implies that the separation between the eigenvalues corresponding to the stationary regressors and those corresponding to the nonstationary ones is weaker.
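This calculation can be checked exactly by linear algebra (a sketch for univariate white noise with unit variance; the helper `c11` is illustrative): the population projection of $u_{1,t-1}$ on $\Delta u_{1,t-j}$, $j = 1, \dots, h$, leaves the residual variance $1/(h+1)$, which is of the order $1/h$ stated above; the explicit (suboptimal) predictor in the display attains the upper bound $1/h$.

```python
import numpy as np

def c11(h):
    # Population moments for univariate white noise u_t with unit variance:
    # the regressors Delta u_{t-j}, j = 1..h, have tridiagonal Toeplitz covariance
    # with 2 on the diagonal and -1 on the first off-diagonals.
    gamma = 2.0 * np.eye(h) - np.eye(h, k=1) - np.eye(h, k=-1)
    c = np.zeros(h)
    c[0] = 1.0                      # Cov(u_{t-1}, Delta u_{t-j}) = 1 only for j = 1
    return 1.0 - c @ np.linalg.solve(gamma, c)   # projection residual variance

for h in (5, 10, 50, 200):
    print(h, c11(h), 1 / (h + 1))   # the two values coincide
```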

In the current case one can show that for

$$N^{-1}\langle u_{1,t-1}, u_{1,t-1}\rangle - N^{-1}\langle u_{1,t-1}, S_t\rangle\langle S_t, S_t\rangle^{-1}\langle S_t, u_{1,t-1}\rangle,$$

where $S_t$ contains $\Delta u_{1,t-1}$ and $\Delta^2 u_{1,t-j}$, $j = 1, \dots, h$, the corresponding limit $C_{11}$ obeys the lower bound $hC_{11} \ge cI$ for some $c > 0$. The order of the lower bound is attained by including a double integration of the regressors. For

$$N^{-1}\langle \Delta u_{1,t-1}, \Delta u_{1,t-1}\rangle - N^{-1}\langle \Delta u_{1,t-1}, S_t\rangle\langle S_t, S_t\rangle^{-1}\langle S_t, \Delta u_{1,t-1}\rangle = C_{\Delta\Delta} + o_P(1)$$

we have $h^3 C_{\Delta\Delta} \ge cI$. Here the arguments from above can be applied to the process $(\Delta u_t)_{t\in\mathbb{Z}}$. For a differenced process, the smallest eigenvalue of the matrix

$$\mathbb{E}\,\delta U_t\, \delta U_t', \qquad \delta U_t := \left[\Delta u_t', \Delta u_{t-1}', \dots, \Delta u_{t-h}'\right]',$$

is of order $h^{-2}$; compare Theorem 2 of Palma and Bondon (2003).
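This eigenvalue order can be verified numerically for univariate white noise $u_t$, where $\mathbb{E}\,\delta U_t\,\delta U_t'$ is the tridiagonal Toeplitz matrix with $2$ on the diagonal and $-1$ beside it; its smallest eigenvalue is known to be $2 - 2\cos(\pi/(h+2)) \approx \pi^2/(h+2)^2$ (the helper name is illustrative):

```python
import numpy as np

def min_eig(h):
    # Covariance of [Delta u_t, ..., Delta u_{t-h}] for univariate white noise u:
    # tridiagonal Toeplitz matrix of size h+1 with 2 on the diagonal, -1 off it
    m = 2.0 * np.eye(h + 1) - np.eye(h + 1, k=1) - np.eye(h + 1, k=-1)
    return np.linalg.eigvalsh(m)[0]

for h in (10, 40, 160):
    lam = min_eig(h)
    print(h, lam, lam * (h + 2) ** 2)   # the rescaled value approaches pi^2
```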

Since $N^{-1}\langle S_t, y_{c,t-1}\rangle = O_P(h^{1/2})$ and $N^{-2}\langle S_t, y_{3,t-1}\rangle = O_P(h^{1/2})$ it follows that

$$\begin{split} N^{-1}\big(\langle u_{1,t-1}, y_{c,t-1}\rangle - \langle u_{1,t-1}, S_t\rangle\langle S_t, S_t\rangle^{-1}\langle S_t, y_{c,t-1}\rangle\big) &= O_P(h^{1/2}), \\ N^{-2}\big(\langle y_{c,t-1}, y_{c,t-1}\rangle - \langle y_{c,t-1}, S_t\rangle\langle S_t, S_t\rangle^{-1}\langle S_t, y_{c,t-1}\rangle\big) &= N^{-2}\langle y_{c,t-1}, y_{c,t-1}\rangle + o_P((h/N)^{1/2}), \end{split}$$

as well as

$$\begin{split} N^{-2}\big(\langle u_{1,t-1}, y_{3,t-1}\rangle - \langle u_{1,t-1}, S_t\rangle\langle S_t, S_t\rangle^{-1}\langle S_t, y_{3,t-1}\rangle\big) &= O_P(h^{1/2}), \\ N^{-3}\big(\langle y_{c,t-1}, y_{3,t-1}\rangle - \langle y_{c,t-1}, S_t\rangle\langle S_t, S_t\rangle^{-1}\langle S_t, y_{3,t-1}\rangle\big) &= N^{-3}\langle y_{c,t-1}, y_{3,t-1}\rangle + o_P((h/N)^{1/2}), \\ N^{-4}\big(\langle y_{3,t-1}, y_{3,t-1}\rangle - \langle y_{3,t-1}, S_t\rangle\langle S_t, S_t\rangle^{-1}\langle S_t, y_{3,t-1}\rangle\big) &= N^{-4}\langle y_{3,t-1}, y_{3,t-1}\rangle + o_P((h/N)^{1/2}). \end{split}$$

Therefore the limits of the moment matrices $M_{ij}$ are not affected by the correction for the stationary terms even if $h \to \infty$, except for the terms involving the orders $O_P(h^{1/2})$. For all stationary terms we find convergence to the corresponding limits denoted $\Sigma_{ij}$ in P.

The first step in the 2SI2 procedure then uses RRR in the equation

$$
\Delta^2 y_t = \Psi\Delta y_{t-1} + \alpha\beta' y_{t-1} + \Pi S_t + e_t.
$$

Then $R_{0t}$ denotes $\Delta^2 y_t$ corrected for $S_t$, $R_{1t}$ denotes $\Delta y_{t-1}$ corrected for $S_t$, and $R_{2t}$ denotes $y_{t-1}$ corrected for $S_t$. Lemma A.4 of P derives the limits of various directions of $M_{ij.k}$, defined as

$$M_{ij.k} = M_{ij} - M_{ik}M_{kk}^{-1}M_{kj}, \qquad M_{ij} = N^{-1}\langle R_{i,t}, R_{j,t}\rangle,$$

where $i, j \in \{0, 1, 2, \varepsilon, \beta\}$. Here $R_{\varepsilon,t}$ equals $e_t$ corrected for $S_t$ and $R_{\beta,t} = \beta' R_{1,t}$. Further, P uses the notation $A_T = [\bar\beta_1, T^{-1}\bar\beta_2]$ and $\bar\beta_{2,T} = \bar\beta_2$. Here and below we assume without restriction of generality that $[\beta, \beta_1, \beta_2]$ is an orthonormal matrix; consequently $\bar\beta = \beta$, $\bar\beta_1 = \beta_1$, $\bar\beta_2 = \beta_2$. Then the results above imply all results of Lemma A.4 of P except that now $A_T' M_{20.1} = O_P(h^{1/2})$.

In particular we obtain the following limits:

$$\begin{aligned} A_T'M_{2\varepsilon.1} &\xrightarrow{d} \int_0^1 F_\dagger (dW)', & A_T'M_{22.1}A_T &\xrightarrow{d} \int_0^1 F_\dagger F_\dagger', \\ \beta_2'M_{1\varepsilon.\beta} &\xrightarrow{d} \int_0^1 B_3 (dW)', & T^{-1}\beta_2'M_{11.\beta}\beta_2 &\xrightarrow{d} \int_0^1 B_3 B_3', \\ \beta_2'M_{1\varepsilon.b} &\xrightarrow{d} \int_0^1 L (dW)', & T^{-1}\beta_2'M_{11.b}\beta_2 &\xrightarrow{d} \int_0^1 LL'. \end{aligned}$$

Here $W = g(1)B$ denotes the Brownian motion corresponding to $(\varepsilon_t)_{t\in\mathbb{Z}}$, and $F_\dagger$ denotes the Brownian motion corresponding to $R_{2t}$ (equaling $y_{t-1}$ corrected for $S_t$) corrected for $R_{1t}$ ($\Delta y_{t-1}$, whose only nonstationary component equals $\Delta y_{3,t-1}$ with corresponding Brownian motion $B_3$). Thus we obtain the following definitions (where $L$ is as in Theorem 1):

$$\begin{aligned} F_0(u) &= \begin{bmatrix} B_2(u) \\ \int_0^u B_3(v)\, dv \end{bmatrix}, \qquad F_\dagger(u) = F_0(u) - \int_0^1 F_0 B_3'\Big(\int_0^1 B_3 B_3'\Big)^{-1} B_3(u), \\ L(u) &= B_3(u) - \int_0^1 B_3 F_0'\Big(\int_0^1 F_0 F_0'\Big)^{-1} F_0(u). \end{aligned}$$

The above arguments show that in the current setting $U_{t-1} = u_{2,t-1}$ and $Y_{t-1} = u_{1,t-1}$ are contained in the space spanned by $S_t$ for $h \to \infty$; therefore $\Sigma_{ij} = 0$ for $i, j \in \{U, Y\}$. The subscript 'b' refers to correcting for $\beta_\perp' R_{2t}$, as used in the second stage of 2SI2.
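The moment-matrix calculus entering the first stage can be sketched numerically: the reduced rank regression eigenvalue problem $|\lambda M_{22.1} - M_{20.1}M_{00.1}^{-1}M_{02.1}| = 0$ is solved below via a Cholesky whitening. The processes, dimensions and names are synthetic placeholders only; in 2SI2 the inputs would be the residuals $R_{0t}$, $R_{2t}$ after correcting for $S_t$, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 500, 3

# Synthetic placeholders for the corrected processes R_{0t} (stationary)
# and R_{2t} (integrated); illustrative only
R0 = rng.standard_normal((N, p))
R2 = np.cumsum(rng.standard_normal((N, p)), axis=0)

M00 = R0.T @ R0 / N
M22 = R2.T @ R2 / N
M20 = R2.T @ R0 / N

# Whiten with the Cholesky factor of M22 and solve the symmetric eigenproblem
C = np.linalg.cholesky(M22)
Ci = np.linalg.inv(C)
A = Ci @ M20 @ np.linalg.solve(M00, M20.T) @ Ci.T
lam = np.linalg.eigvalsh(A)[::-1]   # squared sample canonical correlations, largest first

print(lam)
```

The eigenvalues are squared sample canonical correlations between the two sets of variables and therefore lie in $[0, 1]$; the eigenvectors, transformed back by the whitening, give the estimated cointegrating directions.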

Let $\tilde\Sigma_{YY}$ denote the limit of $h\langle Y_{t-1}, Y_{t-1}\rangle$ and analogously define $\tilde\Sigma_{YU}$, $\tilde\Sigma_{UU}$, $\tilde\Sigma_{0Y}$ and $\tilde\Sigma_{0U}$. For the latter two note that $\tilde\Sigma_{0Y}$ denotes the limit of

$$h\langle \Delta^2 y_t, Y_{t-1}\rangle = h\alpha\langle Y_{t-1}, Y_{t-1}\rangle + h\zeta\langle U_{t-1}, Y_{t-1}\rangle + h\zeta_2\langle \beta'\Delta y_{t-1}, Y_{t-1}\rangle + h\Pi\langle S_t, Y_{t-1}\rangle + h\langle e_t, Y_{t-1}\rangle$$

corrected for $S_t$ and $\beta'\Delta y_{t-1}$. Since $Y_{t-1}$ is stationary, the last term is of order $O_P((h^3/N)^{1/2}) = o_P(1)$. Therefore it follows that $\tilde\Sigma_{0Y} = \alpha\tilde\Sigma_{YY} + \zeta\tilde\Sigma_{UY}$. Then the results of Lemma A.5 of P hold, where in (A.11) and (A.14) $\Sigma_{ij}$ can be replaced by $\tilde\Sigma_{ij}$.

The asymptotic analysis below heavily uses the Johansen approach of investigating the solutions of eigenvalue problems in order to maximize the pseudo likelihood corresponding to the reduced rank regression problem. In order to use the corresponding local analysis, one first has to clarify consistency of the various estimators as well as rates of convergence.

The main tool in this respect is Theorem A.1 of Johansen (1997), which establishes, in the I(2) setting for the regression $y_t = \theta' Z_t + \varepsilon_t$ ($Z_t$ being composed of stationary, I(1) and I(2) components) where $D_T\langle Z_t, Z_t\rangle D_T = O_P(1)$ and $\langle D_T Z_t, \varepsilon_t\rangle = o_P(1)$, that $D_T^{-1}(\hat\theta - \theta) = o_P(1)$, where $\hat\theta$ denotes the pseudo likelihood estimator over some closed parameter set $\Theta$.

It is straightforward to see that analogous results hold in the present setting when first concentrating out the stationary components: consider $y_t = \theta_1' z_t + \theta_2' Z_t + e_t$. Then $\hat\theta_2(\theta_1)$ is obtained from the concentration step and the pseudo likelihood involves $\langle R_{t,y} - \theta_1' R_{t,z},\ R_{t,y} - \theta_1' R_{t,z}\rangle$, where the processes $R_{t,y}$ and $R_{t,z}$ denote $y_t$ and $z_t$ with the stationary regressors $Z_t$ regressed out. These concentrated quantities can now be used in the proof of Theorem A.1 of Johansen (1997), essentially without changes, to show consistency of $\hat\theta_1$. Consistency of $\hat\theta_2(\hat\theta_1)$ then follows from the unrestricted estimation as contained in Theorem 2. As shown above, for the coefficients corresponding to the nonstationary components of the regressors, the rates of convergence as well as the limits are unchanged in the long VAR case compared to the finite VAR case.
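The concentration step is an instance of the Frisch-Waugh-Lovell theorem, which can be checked directly (synthetic data, all names illustrative): ordinary least squares on the concentrated quantities reproduces the corresponding coefficients of the full regression.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 400
Z = rng.standard_normal((N, 4))    # stationary components to concentrate out
z = rng.standard_normal((N, 2))    # remaining regressors of interest
y = (z @ np.array([1.0, -0.5]) + Z @ np.array([0.3, 0.1, -0.2, 0.4])
     + 0.1 * rng.standard_normal(N))

# Full OLS of y on [z, Z]; keep the coefficients of z
full = np.linalg.lstsq(np.column_stack([z, Z]), y, rcond=None)[0][:2]

# Concentration step: regress Z out of both y and z, then run OLS on residuals
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
theta1 = np.linalg.lstsq(z - P @ z, y - P @ y, rcond=None)[0]

print(full, theta1)   # the two coefficient vectors coincide
```

This identity is exact in finite samples, which is why concentrating out the growing stationary block does not alter the estimator of the remaining parameters.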

Note that these results hold for a general closed parameter space $\Theta$, thus including the unrestricted as well as the rank-restricted problem. This shows that we can always reduce the asymptotic analysis of the eigenvalue problems to a neighborhood of the true value, as is done in P.

The first step in the proof of Theorem 4.1 of P consists in the investigation of the solutions to the equation (writing $\tilde\beta = \beta H + \beta_1 H_1 + \beta_2 H_2$ and letting $B_T' = \begin{bmatrix} \beta' \\ T^{-1/2}\beta_1' \\ T^{-3/2}\beta_2' \end{bmatrix}$)

$$B_T' M_{22.1} B_T \begin{bmatrix} H \\ T^{1/2}H_1 \\ T^{3/2}H_2 \end{bmatrix} \Lambda = B_T' M_{20.1} M_{00.1}^{-1} M_{02.1} B_T \begin{bmatrix} H \\ T^{1/2}H_1 \\ T^{3/2}H_2 \end{bmatrix}. \tag{A6}$$

Now Lemma A.4 implies that the matrix $B_T' M_{22.1} B_T$ on the left hand side converges to $\operatorname{diag}\big(\Sigma_{YY.U},\ \int_0^1 F_\dagger F_\dagger'\big)$, while $B_T' M_{20.1} = [\Sigma_{Y0.U}',\ 0]' + O_P(h^{1/2}T^{-1/2})$ and $M_{00.1} = \Sigma_{00.U} + O_P(T^{-1/2})$. Multiplying the equation by $h^2$ we obtain the limiting eigenvalue problem

$$
\begin{bmatrix}
\Sigma_{YY.U} & O_P(T^{-1/2}h^{3/2}) \\
O_P(T^{-1/2}h^{3/2}) & h\int_0^1 F_\dagger F_\dagger'
\end{bmatrix}
\begin{bmatrix}
H \\ T^{1/2}H_1 \\ T^{3/2}H_2
\end{bmatrix} h\Lambda =
\begin{bmatrix}
\Sigma_{Y0.U}\Sigma_{00.U}^{-1}\Sigma_{0Y.U} & O_P(T^{-1/2}h^{5/2}) \\
O_P(T^{-1/2}h^{5/2}) & O_P(T^{-1}h^{3})
\end{bmatrix}
\begin{bmatrix}
H \\ T^{1/2}H_1 \\ T^{3/2}H_2
\end{bmatrix}.
$$


Therefore asymptotically the first $p - r$ eigenvalues of $h\Lambda$ are positive, while the remaining ones tend to zero. Likewise the eigenvectors converge at the same speed as the matrices. Thus $H_1 = O_P(h^{5/2}/T)$ and $H_2 = O_P(h^{5/2}/T^2)$, from which

$$\beta^{'}M\_{22.1}\beta H \Lambda H^{-1} = \beta^{'}M\_{20.1}M\_{00.1}^{-1}M\_{02.1}\beta + O\_P(h^4/T)$$

and thus using (A.11)

$$H\Lambda H^{-1} = \Sigma_{YY.U}^{-1}\Sigma_{Y0.U}\Sigma_{00.U}^{-1}\Sigma_{0Y.U}/h + O_P(hT^{-1/2}) = \alpha'\Sigma_{00.U}^{-1}\alpha\,\Sigma_{YY.U}/h + O_P(hT^{-1/2})$$

follows. Then as in P we have<sup>4</sup>

$$M_{22.1}\underline{\tilde{\beta}} = M_{20.1}\left(\Sigma_{00.U}^{-1}\tilde{\Sigma}_{0Y.U}\left(hH\Lambda H^{-1}\right)^{-1} + O_P(hT^{-1/2})\right) = M_{22.1}\beta + M_{2\varepsilon.1}\Sigma_\varepsilon^{-1}\alpha\left(\alpha'\Sigma_\varepsilon^{-1}\alpha\right)^{-1} + a_1$$

where $a_1 = M_{20.1}O_P(h^2T^{-1/2}) = o_P(1)$ and $\underline{\tilde{\beta}} = \tilde{\beta}H^{-1}$. Then the remaining arguments on p. 546 of P show that the asymptotic distribution of $(T\beta_1, T^2\beta_2)'(\underline{\tilde{\beta}} - \beta)$ in the long VAR case is identical to that in the finite VAR case.

From these arguments the distribution of the likelihood ratio test of $H_r$ versus $H_p$ can be derived: Define $S_1(\lambda) := \lambda M_{22.1} - M_{20.1}M_{00.1}^{-1}M_{02.1}$, $A_T := (\beta_1, T^{-1}\beta_2)$ and $\tilde{B}_T := (\beta, A_T) = (\beta, \beta_1, T^{-1}\beta_2)$. Since $\tilde{B}_T$ is of full rank, (11) is equivalent to $|\tilde{B}_T' S_1(\lambda)\tilde{B}_T| = 0$; that is,

$$\left| \begin{pmatrix} \beta' \\ \beta_1' \\ T^{-1}\beta_2' \end{pmatrix} S_1(\lambda)(\beta, \beta_1, T^{-1}\beta_2) \right| = \left| \beta' S_1(\lambda)\beta \right| \cdot \left| A_T'\left( S_1(\lambda) - S_1(\lambda)\beta\left(\beta' S_1(\lambda)\beta\right)^{-1}\beta' S_1(\lambda)\right) A_T \right| = 0. \tag{A7}$$

Let $\delta_1 = T\lambda$, so that for every fixed $\delta_1$ we have $\lambda \to 0$ as $T \to \infty$. By the above arguments we have that

$$h^2\left|\beta' S_1(\lambda)\beta\right| = \left|\delta_1\frac{h^2}{T}\beta' M_{22.1}\beta - h^2\beta' M_{20.1}M_{00.1}^{-1}M_{02.1}\beta\right| \stackrel{p}{\to} \left|-\Sigma_{Y0.U}\Sigma_{00.U}^{-1}\Sigma_{0Y.U}\right| \neq 0,$$

which has no zero root. Moreover, we have

$$hA_T' S_1(\lambda)\beta = h\lambda A_T' M_{22.1}\beta - hA_T' M_{20.1}M_{00.1}^{-1}M_{02.1}\beta = -A_T' M_{20.1}\Sigma_{00.U}^{-1}\Sigma_{0Y.U} + o_P(1),$$

which yields that

$$\begin{aligned}
&A_T'\left(S_1(\lambda) - S_1(\lambda)\beta\left(\beta' S_1(\lambda)\beta\right)^{-1}\beta' S_1(\lambda)\right)A_T \\
&\quad= \delta_1\frac{1}{T}A_T' M_{22.1}A_T - A_T' M_{20.1}M_{00.1}^{-1}M_{02.1}A_T - A_T' S_1(\lambda)\beta\left(\beta' S_1(\lambda)\beta\right)^{-1}\beta' S_1(\lambda)A_T \\
&\quad= \delta_1\frac{1}{T}A_T' M_{22.1}A_T - A_T' M_{20.1}\left(M_{00.1}^{-1} - \Sigma_{00.U}^{-1}\tilde{\Sigma}_{0Y.U}\left(\tilde{\Sigma}_{Y0.U}\Sigma_{00.U}^{-1}\tilde{\Sigma}_{0Y.U}\right)^{-1}\tilde{\Sigma}_{Y0.U}\Sigma_{00.U}^{-1} + o_P(1)\right)M_{02.1}A_T \\
&\quad= \delta_1\frac{1}{T}A_T' M_{22.1}A_T - A_T' M_{20.1}\left(\Sigma_{00.U}^{-1} - \Sigma_{00.U}^{-1}\alpha\left(\alpha'\Sigma_{00.U}^{-1}\alpha\right)^{-1}\alpha'\Sigma_{00.U}^{-1} + o_P(1)\right)M_{02.1}A_T \\
&\quad\stackrel{d}{\to} \delta_1\int_0^1 F_\dagger F_\dagger' - \int_0^1 F_\dagger\,dW'\,\alpha_\perp\left(\alpha_\perp'\Sigma_\varepsilon\alpha_\perp\right)^{-1}\alpha_\perp'\int_0^1 dW\,F_\dagger' \\
&\quad= \delta_1\int_0^1 F_\dagger F_\dagger' - \left(\int_0^1 F_\dagger\,dW_\dagger'\right)\left(\int_0^1 dW_\dagger\,F_\dagger'\right)
\end{aligned}$$

where $W_\dagger = (\alpha_\perp'\Sigma_\varepsilon\alpha_\perp)^{-1/2}\alpha_\perp'W$. Thus, the smallest $p-r$ solutions of (11) converge in distribution to the solutions of $\left|\delta_1\int_0^1 F_\dagger F_\dagger' - \left(\int_0^1 F_\dagger\,dW_\dagger'\right)\left(\int_0^1 dW_\dagger\,F_\dagger'\right)\right| = 0$, which implies that the test statistic $Q_r$ has the following limiting distribution,

$$Q_r = \sum_{i=r+1}^{p}\delta_{1,i} + o_P(1) \stackrel{d}{\to} \mathrm{tr}\left(\left(\int_0^1 dW_\dagger\,F_\dagger'\right)\left(\int_0^1 F_\dagger F_\dagger'\right)^{-1}\left(\int_0^1 F_\dagger\,dW_\dagger'\right)\right).$$
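Trace functionals of this type can be approximated numerically by discretizing the Brownian motions. The following sketch is purely illustrative and hypothetical: it takes the process $F_\dagger$ to be the standard Brownian motion $W_\dagger$ itself, whereas the actual $F_\dagger$ of the theorem is a different functional of $W$ defined earlier in the paper.

```python
import numpy as np

def trace_stat(m, T=400, rng=None):
    """One simulated draw of tr((int dW F')(int F F')^{-1}(int F dW')).

    Illustration only: F is taken to be the m-dimensional standard Brownian
    motion itself; the F_dagger of the theorem is a different functional of W.
    """
    rng = np.random.default_rng(rng)
    dW = rng.standard_normal((T, m)) / np.sqrt(T)  # increments of W on [0, 1]
    W = np.cumsum(dW, axis=0)
    F = np.vstack([np.zeros((1, m)), W[:-1]])      # left endpoints (Ito sums)
    S_FdW = F.T @ dW                               # approximates int F dW'
    S_FF = F.T @ F / T                             # approximates int F F'
    return float(np.trace(S_FdW.T @ np.linalg.solve(S_FF, S_FdW)))
```

Averaging many such draws (with the correct $F_\dagger$ substituted) approximates the null distribution from which critical values are tabulated.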

For the second stage the arguments are very similar. The eigenvalue problem solved here is the following:

$$
\bar{\beta}_\perp' M_{11.\tilde{\beta}}\,\bar{\beta}_\perp\,\eta\,\Upsilon = \bar{\beta}_\perp' M_{1\tilde{\alpha}_\perp.\tilde{\beta}}\, M_{\tilde{\alpha}_\perp\tilde{\alpha}_\perp.\tilde{\beta}}^{-1} M_{\tilde{\alpha}_\perp 1.\tilde{\beta}}\,\bar{\beta}_\perp\,\eta.
$$

<sup>4</sup> Contrary to the usual Johansen notation we use Σ as the noise covariance and Ω as the variance of the Brownian motion corresponding to $(u_t)_{t\in\mathbb{Z}}$. Thus some of the formulas in this part take an 'unusual' form.

This formula uses $\tilde{\alpha}_\perp$, the ortho-complement of

$$\tilde{\alpha} = M_{02.1}\tilde{\beta}\left(\tilde{\beta}' M_{22.1}\tilde{\beta}\right)^{-1}.$$

From the above results, noting that $h\tilde{\beta}' M_{22.1}\tilde{\beta} \to \tilde{\Sigma}_{YY.U}$ and $hM_{02.1}\tilde{\beta} \to \alpha\tilde{\Sigma}_{YY.U}$ according to Lemma A.4, we have $\tilde{\alpha} \to \alpha$. Considering the order of convergence we obtain $\tilde{\alpha} - \alpha = O_P(hT^{-1/2})$. As in P this implies $\tilde{\alpha}_\perp - \alpha_\perp = O_P(hT^{-1/2})$. Using $\tilde{\beta} - \beta = O_P(h^{5/2}/T)$ from stage 1, one observes that in the eigenvalue problem the estimates can be replaced by the true quantities, introducing an error of order $o_P(hT^{-1/2})$:

$$
\bar{\beta}_\perp' M_{11.\beta}\,\tilde{\beta}_1\,\Upsilon = \bar{\beta}_\perp' M_{1\alpha_\perp.\beta}\, M_{\alpha_\perp\alpha_\perp.\beta}^{-1} M_{\alpha_\perp 1.\beta}\,\tilde{\beta}_1 + o_P(hT^{-1/2}).
$$

Then as in P consider $\tilde{\beta}_1 = \beta H + \beta_1 H_1 + \beta_2 H_2$, reusing the symbols $H$, $H_1$, $H_2$ here for $\tilde{\beta}_1$ in place of $\tilde{\beta}$ as before. Identical arguments as around (A6) show that $H_1 = O_P(1)$ and $H_2 = O_P(h^2/T)$. Then combining the arguments around (A6) with the developments in P, pp. 546 and 547, we obtain (A.21) of P:

$$
\bar{\beta}_\perp' M_{11.\beta}\left(\underline{\tilde{\beta}}_1 - \beta_1\right) = \bar{\beta}_\perp' M_{1\varepsilon.\beta}\,\alpha_\perp \Sigma_{\alpha_\perp\alpha_\perp}^{-1}\zeta\left(\zeta'\Sigma_{\alpha_\perp\alpha_\perp}^{-1}\zeta\right)^{-1} + o_P(1).
$$

The rest of the proof of (4.3a) and (4.3b) of P follows as in P. With respect to the second likelihood ratio test consider

$$\tilde{S}_2(\rho) = \rho\,\bar{\beta}_\perp' M_{11.\tilde{\beta}}\,\bar{\beta}_\perp - \bar{\beta}_\perp' M_{1\tilde{\alpha}_\perp.\tilde{\beta}}\, M_{\tilde{\alpha}_\perp\tilde{\alpha}_\perp.\tilde{\beta}}^{-1} M_{\tilde{\alpha}_\perp 1.\tilde{\beta}}\,\bar{\beta}_\perp.$$

The results above imply that $\tilde{S}_2(\rho)$ has, uniformly in $|\rho| < C$ (for every $0 < C < \infty$), distance to $S_2(\rho)$ of order $O_P(hT^{-1/2})$, where

$$S_2(\rho) = \rho\,\bar{\beta}_\perp' M_{11.\beta}\,\bar{\beta}_\perp - \bar{\beta}_\perp' M_{1\alpha_\perp.\beta}\, M_{\alpha_\perp\alpha_\perp.\beta}^{-1} M_{\alpha_\perp 1.\beta}\,\bar{\beta}_\perp.$$

Note that since $(\eta, \eta_\perp)$ is of full rank, (16) is equivalent to

$$\left|\begin{pmatrix} \eta' \\ \eta_\perp' \end{pmatrix} S_2(\rho)(\eta, \eta_\perp)\right| = \left|\eta' S_2(\rho)\eta\right| \cdot \left|\eta_\perp'\left(S_2(\rho) - S_2(\rho)\eta\left(\eta' S_2(\rho)\eta\right)^{-1}\eta' S_2(\rho)\right)\eta_\perp\right| = 0. \tag{A8}$$

Let $\delta_2 = T\rho$, so that for every fixed $\delta_2$ we have $\rho \to 0$ as $T \to \infty$. As above it can be seen that

$$h^2\left|\eta' S_2\!\left(\tfrac{\delta_2}{T}\right)\eta\right| = h^2\left|\tfrac{\delta_2}{T}\,\beta_1' M_{11.\beta}\beta_1 - \beta_1' M_{1\alpha_\perp.\beta} M_{\alpha_\perp\alpha_\perp.\beta}^{-1} M_{\alpha_\perp 1.\beta}\beta_1\right| \stackrel{p}{\to} \left|-\Sigma_{U0}\,\alpha_\perp\left(\alpha_\perp'\Sigma_{00}\alpha_\perp\right)^{-1}\alpha_\perp'\Sigma_{0U}\right| \neq 0.$$

This shows that the $s$ largest roots of $S_2(\rho)$ tend to zero slower than $O(1/T)$. Moreover, we have

$$h\,\eta_\perp' S_2\!\left(\tfrac{\delta_2}{T}\right)\eta = h\left(\tfrac{\delta_2}{T}\,\beta_2' M_{11.\beta}\beta_1 - \beta_2' M_{1\bar{\alpha}_\perp.\beta} M_{\bar{\alpha}_\perp\bar{\alpha}_\perp.\beta}^{-1} M_{\bar{\alpha}_\perp 1.\beta}\beta_1\right) = -\beta_2' M_{1\bar{\alpha}_\perp.\beta}\left(\alpha_\perp'\Sigma_{00}\alpha_\perp\right)^{-1}\alpha_\perp'\Sigma_{0U} + o_P(1),$$

which yields (using $P_M := (\alpha_\perp'\Sigma_{00}\alpha_\perp)^{-1}\alpha_\perp'\tilde{\Sigma}_{0U}\left(\tilde{\Sigma}_{U0}\alpha_\perp(\alpha_\perp'\Sigma_{00}\alpha_\perp)^{-1}\alpha_\perp'\tilde{\Sigma}_{0U}\right)^{-1}\tilde{\Sigma}_{U0}\alpha_\perp(\alpha_\perp'\Sigma_{00}\alpha_\perp)^{-1}$) that

$$\begin{aligned}
&\eta_\perp'\left(S_2\!\left(\tfrac{\delta_2}{T}\right) - S_2\!\left(\tfrac{\delta_2}{T}\right)\eta\left(\eta' S_2\!\left(\tfrac{\delta_2}{T}\right)\eta\right)^{-1}\eta' S_2\!\left(\tfrac{\delta_2}{T}\right)\right)\eta_\perp \\
&\quad= \delta_2\frac{1}{T}\beta_2' M_{11.\beta}\beta_2 - \beta_2' M_{1\bar{\alpha}_\perp.\beta} M_{\bar{\alpha}_\perp\bar{\alpha}_\perp.\beta}^{-1} M_{\bar{\alpha}_\perp 1.\beta}\beta_2 - h\eta_\perp' S_2\!\left(\tfrac{\delta_2}{T}\right)\eta\left(h^2\eta' S_2\!\left(\tfrac{\delta_2}{T}\right)\eta\right)^{-1} h\eta' S_2\!\left(\tfrac{\delta_2}{T}\right)\eta_\perp \\
&\quad= \delta_2\frac{1}{T}\beta_2' M_{11.\beta}\beta_2 - \beta_2' M_{1\alpha_\perp.\beta}\left(\alpha_\perp'\Sigma_{00}\alpha_\perp\right)^{-1} M_{\alpha_\perp 1.\beta}\beta_2 + \beta_2' M_{1\alpha_\perp.\beta} P_M M_{\alpha_\perp 1.\beta}\beta_2 + o_P(1) \\
&\quad\stackrel{d}{\to} \delta_2\int_0^1 B_3 B_3' - \int_0^1 B_3\,dW'\,\alpha_2\left(\alpha_2'\Sigma\alpha_2\right)^{-1}\alpha_2'\int_0^1 dW\,B_3' \\
&\quad= \delta_2\int_0^1 B_3 B_3' - \left(\int_0^1 B_3\,dW_2'\right)\left(\int_0^1 dW_2\,B_3'\right)
\end{aligned}$$

using the results of Lemma A.5 of P and (A.18) of Paruolo (1996) as an expression for

$$\left(\alpha_\perp'\Sigma_{00}\alpha_\perp\right)^{-1} - P_M,$$

where $W_2 = (\alpha_2'\Sigma\alpha_2)^{-1/2}\alpha_2'W$.

Thus, the smallest $p - r - s$ solutions of (16) converge in distribution to the solutions of

$$\left|\delta_2\int_0^1 B_3 B_3' - \left(\int_0^1 B_3\,dW_2'\right)\left(\int_0^1 dW_2\,B_3'\right)\right| = 0,$$

which shows that the test statistic $Q_{r,s}$ has the following limiting distribution,

$$Q_{r,s} = \sum_{i=s+1}^{p-r}\delta_{2,i} + o_P(1) \stackrel{d}{\to} \mathrm{tr}\left(\left(\int_0^1 dW_2\,B_3'\right)\left(\int_0^1 B_3 B_3'\right)^{-1}\left(\int_0^1 B_3\,dW_2'\right)\right).$$

It follows also that the sum $S_{r,s} = Q_r + Q_{r,s}$ converges in distribution, showing (C).

The rest of the proof of relations (4.3a, b) of P follows exactly as in P. In (4.4) of P the order of convergence is replaced by $o_P(T^{-1})$, in (4.5) the error term can be shown to be $o_P(T^{-1/2})$, and in (4.6) instead of the term $O_P(T^{-2})$ we obtain $o_P(1)$.

These results show consistency of $\tilde{\beta}$ and $\tilde{\eta}$. Using the results of Lemma A.4 of P, consistency of $\tilde{\alpha}$ and $\tilde{\zeta}$ then follows.

Following the proof of Theorem 4.2 on pp. 548–549 of P we can show consistency of $\tilde{\psi}$ of P. The only changes concern the orders of convergence, where our setting introduces powers of $h$ into the arguments. Jointly this proves consistency of $\tilde{\Psi}$ and $\tilde{\Gamma}$. Consistency of the coefficients of the stationary terms $\Delta^2 y_{t-j}$ follows as usual from the consistency of the estimates of the coefficients of the non-stationary regressors. This completes the proof of (D).

With respect to (E), note that the results above show that the two eigenvalue problems to be solved have the same asymptotic behavior as in the finite VAR case. This shows that the results of P in this respect hold also in the case of long VARs.

Finally, for the matrices $\Pi_j$ note that Theorem 4.3 of P shows that the asymptotic distribution of all quantities corresponding to stationary regressors is identical for every super-consistent estimator of the coefficients of the non-stationary components.

#### **Appendix E. Proof of Theorem 4**

From Theorem 3 it follows that $\hat{\Phi} = \hat{\alpha}\hat{\beta}' \to \Phi$, $\hat{\Psi} \to \Psi$, $\hat{\Pi}_j \to \Pi_j$, $j = 1, 2, \dots, 2f-1$. Therefore the Hankel matrix of the estimated impulse response coefficients $\hat{\Pi}_j$ converges to the Hankel matrix corresponding to the $\Pi_j$'s. As $(\bar{A}, B)$ is controllable, $(A, B, C)$ is minimal and $\bar{A}$ is nonsingular according to the assumptions, this Hankel matrix has rank $n$. This implies that the stochastic realisation algorithm of Appendix F provides consistent estimates $(\hat{\bar{A}}, \hat{B}, \hat{D}) \to (\bar{A}, B, D)$. This implies

$$
\hat{a}(z) = (1-z)^2 I_p - \hat{\Phi}z - \hat{\Psi}z(1-z) - (1-z)^2 z\hat{D}\left(I_n - z\hat{\bar{A}}\right)^{-1}\hat{B} \to a(z).
$$

For details see Appendix F.

$\hat{a}(z)$ does not necessarily correspond to a rational transfer function of order $n$. It does so, however, if the additional restrictions (22) hold, and steps 3 and 4 of the proposed algorithm achieve this. Here step 3 ascertains that solutions to the third equation exist. The second equation explicitly provides a solution $\bar{\alpha}_\perp$ for given $C_\dagger$. This solution is not necessarily of full row rank; since it is in the limit, it also is for large enough $T$. The first equation always admits solutions. Thus for large enough $T$ the set of all solutions is defined by polynomial restrictions. Adding the least squares distance to the estimated impulse response sequence then leads to a quadratic problem under non-linear differentiable constraints, which in the limit has a unique solution. Thus the solution is unique for large enough $T$.

Consistency of the estimates in combination with continuity of the solution of step 4 implies consistency of the system $(\hat{\bar{A}}, \hat{B}, \hat{C})$. This implies consistency of the inverse system $(\hat{A}, \hat{B}, \hat{C})$ in the sense of converging impulse response coefficients and hence consistency of the transfer function estimator in the pointwise topology. The fulfillment of the restrictions (22) ensures that the corresponding matrix $\hat{A}$ has the structure implied by the state space unit root structure $((0,(c, c+d)))$.

#### **Appendix F. Stochastic Realization Using Overlapping Echelon Forms**

This section describes the approximate realization of the impulse response coefficients $G_j$, $j = 1, \dots, 2f-1$, using a rational transfer function of order $n$, where $f \geq n$. More details can be found in Section 2.6 of Hannan and Deistler (1988).

Define the Hankel matrix

$$
\mathcal{H}_{f,f} = \begin{bmatrix}
G_1 & G_2 & G_3 & \dots & G_f \\
G_2 & G_3 & & & G_{f+1} \\
G_3 & & & & G_{f+2} \\
\vdots & & & & \vdots \\
G_f & G_{f+1} & \dots & & G_{2f-1}
\end{bmatrix} = \begin{bmatrix}
h(1,1) \\
h(1,2) \\
\vdots \\
h(1,p) \\
h(2,1) \\
\vdots \\
h(f,p)
\end{bmatrix}.
$$

Here $h(i,j)$ denotes the $j$-th row in the $i$-th block row. Let $\alpha = (n_1, \dots, n_p)$ define a *nice selection* of rows<sup>5</sup> of $\mathcal{H}_{f,f}$ such that the submatrix $\mathcal{H}_\alpha \in \mathbb{R}^{n\times fp}$ of $\mathcal{H}_{f,f}$ containing the rows $h(i,j)$, $i \leq n_j$, is of full row rank. If the impulse response corresponds to a transfer function of order at least $n$, such a nice selection $\alpha$ exists. Finally let $\mathcal{H}_{\alpha+1} \in \mathbb{R}^{n\times fp}$ denote the matrix $\mathcal{H}_\alpha$ shifted down one block row (that is, in each row where $\mathcal{H}_\alpha$ contains $h(i,j)$, $\mathcal{H}_{\alpha+1}$ contains $h(i+1,j)$).
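As an illustration, a nice selection can be found greedily: walk through the rows $h(1,1), \dots, h(f,p)$ in the row order of $\mathcal{H}_{f,f}$ and keep a row only if its predecessor in the same column position has been kept and it increases the rank of the selected submatrix. A minimal sketch (the function name and interface are hypothetical, not from the paper):

```python
import numpy as np

def nice_selection(H, p, n):
    """Greedily pick a nice selection of n rows of the block Hankel matrix H.

    H has f*p rows; the row h(i, j) of the text is row (i-1)*p + (j-1) of H.
    Nice-selection property: if h(i, j) is selected then so is h(l, j) for
    all l < i.  Returns the multiplicities (n_1, ..., n_p).
    """
    f = H.shape[0] // p
    mult = [0] * p      # n_j: rows selected so far in column position j
    selected = []       # indices of selected rows of H
    for i in range(f):
        for j in range(p):
            if mult[j] != i:   # predecessor h(i, j) was not selected
                continue
            cand = selected + [i * p + j]
            if np.linalg.matrix_rank(H[cand, :]) == len(cand):
                selected = cand
                mult[j] += 1
            if len(selected) == n:
                return tuple(mult)
    return tuple(mult)
```

For Hankel matrices a row that is linearly dependent on the selected rows above it has all its shifts dependent as well, so this greedy scan indeed produces a nice selection.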

It is then derived in Hannan and Deistler (1988), Theorem 2.6.2, that if the $G_j$ correspond to a transfer function $k(z) = \sum_{j=1}^{\infty} G_j z^{-j}$ of order exactly $n$ such that the corresponding $\mathcal{H}_\alpha$ is formed using a nice selection, then a system $(A, B, C)$ can be defined using the following formulas

$$A\mathcal{H}_\alpha = \mathcal{H}_{\alpha+1}, \quad B = \mathcal{H}_\alpha\begin{bmatrix} I_p \\ 0 \end{bmatrix}, \quad C\mathcal{H}_\alpha = \begin{bmatrix} G_1 & G_2 & \dots & G_f \end{bmatrix} \tag{A9}$$

such that $G_j = CA^{j-1}B$, $j = 1, 2, \dots$.

If the order of the transfer function is larger than $n$, then the equations for $A$ and $C$ can be solved in the least squares sense. If a sequence of impulse responses $\hat{G}_j \to G_j$, $j = 1, \dots, 2f-1$, is given and the limit $G_j$ corresponds to a transfer function for which the rank of $\mathcal{H}_\alpha$ equals $n$, then the resulting systems satisfy $(\hat{A}, \hat{B}, \hat{C}) \to (A, B, C)$, since in this case the least squares solution depends continuously on the matrix $\mathcal{H}_{f,f}$.
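The construction (A9) can be sketched numerically as follows. For simplicity the sketch selects the first $n$ linearly independent rows of the Hankel matrix (which yields a nice selection in the generic case) and solves for $A$ and $C$ by least squares; the function is a hypothetical illustration, not code from the paper:

```python
import numpy as np

def realize(G, n):
    """Ho-Kalman-style realization sketch following (A9).

    G: list of impulse response matrices G_1, ..., G_{2f-1} (each p x p).
    Picks the first n linearly independent rows of the block Hankel matrix
    and solves (A9), obtaining A and C in the least squares sense.
    """
    p = G[0].shape[0]
    f = (len(G) + 1) // 2
    # block Hankel matrix H_{f,f}: block (i, k) holds G_{i+k+1} (0-based i, k)
    H = np.block([[G[i + k] for k in range(f)] for i in range(f)])
    # greedily pick n independent rows among the first (f-1)*p rows, so that
    # each shifted row h(i+1, j) still lies inside H
    sel, rank = [], 0
    for r in range((f - 1) * p):
        if np.linalg.matrix_rank(H[sel + [r], :]) > rank:
            sel.append(r)
            rank += 1
        if rank == n:
            break
    Ha = H[sel, :]                      # H_alpha, n x fp
    Ha1 = H[[r + p for r in sel], :]    # H_{alpha+1}: rows shifted one block
    # (A9): A H_a = H_{a+1}, B = first block column of H_a, C H_a = [G_1 ... G_f]
    A = np.linalg.lstsq(Ha.T, Ha1.T, rcond=None)[0].T
    B = Ha[:, :p]
    C = np.linalg.lstsq(Ha.T, np.hstack(G[:f]).T, rcond=None)[0].T
    return A, B, C
```

For exact impulse responses of a minimal order-$n$ system the reconstruction reproduces $G_j = CA^{j-1}B$ (up to a state space basis change); for estimated $\hat{G}_j$ it returns the least squares approximation discussed above.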

#### **References**


<sup>5</sup> A nice selection is such that if $h(i,j)$ is contained in the selection, then $h(l,j)$ is also contained for all $0 < l < i$.

Bauer, Dietmar, and Martin Wagner. 2012. A state space canonical form for unit root processes. *Econometric Theory* 28: 1313–49. [CrossRef]

Berk, Kenneth N. 1974. Consistent autoregressive spectral estimates. *The Annals of Statistics* 2: 489–502. [CrossRef]


Hannan, Edward James, and Manfred Deistler. 1988. *The Statistical Theory of Linear Systems*. New York: John Wiley.


Johansen, Søren. 1997. Likelihood analysis of the I(2) model. *Scandinavian Journal of Statistics* 24: 433–62. [CrossRef]


Paruolo, Paolo. 1994. The role of the drift in I(2) systems. *Journal of the Italian Statistical Society* 3: 93–123. [CrossRef]

Paruolo, Paolo. 1996. On the determination of integration indices in I(2) systems. *Journal of Econometrics* 72: 313–56.


Paruolo, Paolo. 2006. Common trends and cycles in I(2) VAR systems. *Journal of Econometrics* 132: 143–68. [CrossRef]


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
