Article

Cointegration between Trends and Their Estimators in State Space Models and Cointegrated Vector Autoregressive Models

by Søren Johansen * and Morten Nyboe Tabor
Department of Economics, University of Copenhagen, Øster Farimagsgade 5, Building 26, 1353 Copenhagen K, Denmark
* Author to whom correspondence should be addressed.
Econometrics 2017, 5(3), 36; https://doi.org/10.3390/econometrics5030036
Submission received: 1 March 2017 / Revised: 15 August 2017 / Accepted: 17 August 2017 / Published: 22 August 2017
(This article belongs to the Special Issue Recent Developments in Cointegration)

Abstract

A state space model with an unobserved multivariate random walk and a linear observation equation is studied. The purpose is to find out when the extracted trend cointegrates with its estimator, in the sense that a linear combination is asymptotically stationary. It is found that this result holds for the linear combination of the trend that appears in the observation equation. If identifying restrictions are imposed on either the trend or its coefficients in the linear observation equation, it is shown that there is cointegration between the identified trend and its estimator, if and only if the estimators of the coefficients in the observation equations are consistent at a faster rate than the square root of sample size. The same results are found if the observations from the state space model are analysed using a cointegrated vector autoregressive model. The findings are illustrated by a small simulation study.

1. Introduction and Summary

This paper is inspired by a study on long-run causality, see Hoover et al. (2014). Causality is usually studied for a sequence of multivariate i.i.d. variables using conditional independence, see Spirtes et al. (2000) or Pearl (2009). For stationary autoregressive processes, causality is discussed in terms of the variance of the shocks, that is, the variance of the i.i.d. error term. For nonstationary cointegrated variables, the common trends play an important role for long-run causality. In Hoover et al. (2014), the concept is formulated in terms of independent common trends and their causal impact coefficients on the nonstationary observations. Thus, the emphasis is on independent trends, and how they enter the observation equations, rather than on the variance of the measurement errors.
The trend is modelled as an $m$-dimensional Gaussian random walk, starting at $T_0$,
$$T_{t+1} = T_t + \eta_{t+1}, \quad t = 0, \ldots, n-1, \tag{1}$$
where $\eta_t$ are i.i.d. $N_m(0, \Omega_\eta)$, that is, Gaussian in $m$ dimensions with mean zero and $m \times m$ variance $\Omega_\eta > 0$. This trend has an impact on future values of the $p$-dimensional observation $y_t$, modelled by
$$y_{t+1} = B T_t + \varepsilon_{t+1}, \quad t = 0, \ldots, n-1, \tag{2}$$
where $\varepsilon_t$ are i.i.d. $N_p(0, \Omega_\varepsilon)$ with $\Omega_\varepsilon > 0$. It is also assumed that $\varepsilon_s$ and $\eta_t$ are independent for all $s$ and $t$. In the following, the joint distribution of $T_1, \ldots, T_n, y_1, \ldots, y_n$ conditional on a given value of $T_0$ is considered.
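To make the data generating process concrete, the recursions (1) and (2) can be simulated directly. The sketch below is a minimal illustration, not code from the paper; the function name and the parameter values in the usage line are ours.

```python
import numpy as np

def simulate_dgp(B, Omega_eta, Omega_eps, n, T0=None, seed=0):
    """Simulate (1)-(2): T_{t+1} = T_t + eta_{t+1}, y_{t+1} = B T_t + eps_{t+1}."""
    rng = np.random.default_rng(seed)
    p, m = B.shape
    T = np.zeros((n + 1, m))          # rows hold T_0, T_1, ..., T_n
    if T0 is not None:
        T[0] = T0
    y = np.zeros((n, p))              # rows hold y_1, ..., y_n
    for t in range(n):
        y[t] = B @ T[t] + rng.multivariate_normal(np.zeros(p), Omega_eps)
        T[t + 1] = T[t] + rng.multivariate_normal(np.zeros(m), Omega_eta)
    return y, T

# two trends loading into three observed series (illustrative values)
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
y, T = simulate_dgp(B, np.eye(2), np.eye(3), n=100)
```

Starting the trend at $T_0 = 0$ (the default) matches the setup used in the simulation study of Section 5.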
The observations are collected in the matrices $Y_n$ ($p \times n$) and $\Delta Y_n$ ($p \times (n-1)$), which are defined as
$$Y_n = (y_1, \ldots, y_n), \quad \Delta Y_n = (y_2 - y_1, \ldots, y_n - y_{n-1}).$$
The processes $y_t$ and $T_t$ are obviously nonstationary, but the conditional distribution of $Y_n$ given $T_0$ is well defined. We define
$$E_t T_t = E(T_t \mid Y_t, T_0), \quad V_t = \mathrm{Var}_t(T_t) = \mathrm{Var}(T_t \mid Y_t, T_0).$$
Then the density of $Y_n$ conditional on $T_0$ is given by the prediction error decomposition
$$p(Y_n \mid T_0) = p(y_1 \mid T_0) \prod_{t=1}^{n-1} p(y_{t+1} \mid Y_t, T_0),$$
where $y_{t+1}$ given $(Y_t, T_0)$ is $p$-dimensional Gaussian with mean and variance
$$E_t y_{t+1} = B\, E_t T_t, \quad \mathrm{Var}_t(y_{t+1}) = B V_t B' + \Omega_\varepsilon.$$
In this model it is clear that $y_t$ and $T_t$ cointegrate, that is, $y_{t+1} - B T_{t+1} = \varepsilon_{t+1} - B \eta_{t+1}$ is stationary, and the same holds for $T_t$ and the extracted trend $E_t T_t = E(T_t \mid y_1, \ldots, y_t, T_0)$. Note that in the statistical model defined by (1) and (2) with parameters $B$, $\Omega_\eta$, and $\Omega_\varepsilon$, only the matrices $B \Omega_\eta B'$ and $\Omega_\varepsilon$ are identified, because for any $m \times m$ matrix $\xi$ of full rank, $B \xi^{-1}$ and $\xi \Omega_\eta \xi'$ give the same likelihood, by redefining the trend as $\xi T_t$.
Let $\hat{E}_t T_t$ denote an estimator of $E_t T_t$. The paper investigates whether there is cointegration between $E_t T_t$ and $\hat{E}_t T_t$ under two different estimation methods: a simple cointegrating regression, and the maximum likelihood estimator in an autoregressive representation of the state space model.
Section 2, on the probability analysis of the data generating process, formulates the model as a common trend state space model and summarizes some results in three lemmas. Lemma 1 contains the Kalman filter equations and the convergence of $\mathrm{Var}(T_t \mid y_1, \ldots, y_t, T_0)$, see Durbin and Koopman (2012), and shows how its limit can be calculated by solving an eigenvalue problem. Lemma 1 also shows how $y_t$ can be represented in terms of its prediction errors $v_j = y_j - E_{j-1} y_j$, $j = 1, \ldots, t$. This result is used in Lemma 2 to represent $y_t$ in steady state as an infinite order cointegrated vector autoregressive model, see Harvey (2006, p. 373). Section 3 discusses the statistical analysis of the data and the identification of the trends and their loadings. Two examples are discussed. In the first example, only $B$ is restricted and the trends are allowed to be correlated. In the second example, $B$ is restricted but the trends are uncorrelated, so that the variance matrix is also restricted. Lemma 3 analyses the data from (1) and (2) using a simple cointegrating regression, see Harvey and Koopman (1997), and shows that the estimator of the coefficient $B$, suitably normalized, is $n$-consistent.
Section 4 shows in Theorem 1 that the spread between $B E_t T_t$ and its estimator $\hat{B} \hat{E}_t T_t$ is asymptotically stationary irrespective of the identification of $B$ and $T_t$. Theorem 2 then shows that the spread between $E_t T_t$ and its estimator $\hat{E}_t T_t$ is asymptotically stationary if and only if $B$ has been identified so that the estimator of $B$ is superconsistent, that is, consistent at a rate faster than $n^{1/2}$.
The findings are illustrated with a small simulation study in Section 5. Data are generated from (1) and (2) with $T_0 = 0$, and the observations are analysed using the cointegrating regression discussed in Lemma 3. If the trends and their coefficients are identified by requiring the trends to be independent, the trend extracted by the state space model does not cointegrate with its estimator. If, however, the trends are identified by restrictions on the coefficients alone, they do cointegrate.

2. Probability Analysis of the Data Generating Model

This section first contains two examples, which illustrate the problem to be solved. Then a special parametrization of the common trends model is defined, and some, mostly known, results concerning the Kalman filter recursions are given in Lemma 1. Lemma 2 concerns the representation of the steady state solution as an autoregressive process. All proofs are given in the Appendix.

2.1. Two Examples

Two examples are given which illustrate the problem investigated. The examples are analysed further by a simulation study in Section 5.
Example 1.
In the first example the two random walks $T_{1t}$ and $T_{2t}$ are allowed to be dependent, so $\Omega_\eta$ is unrestricted, and identifying restrictions are imposed only on their coefficients $B$. The equations are
$$y_{1,t+1} = T_{1t} + \varepsilon_{1,t+1}, \quad y_{2,t+1} = T_{2t} + \varepsilon_{2,t+1}, \quad y_{3,t+1} = b_{31} T_{1t} + b_{32} T_{2t} + \varepsilon_{3,t+1}, \tag{3}$$
for $t = 0, \ldots, n-1$. Thus, $y_t = (y_{1t}, y_{2t}, y_{3t})'$, $T_t = (T_{1t}, T_{2t})'$, and
$$B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ b_{31} & b_{32} \end{pmatrix}. \tag{4}$$
Moreover, $\Omega_\eta > 0$ is $2 \times 2$, $\Omega_\varepsilon > 0$ is $3 \times 3$, and both are unrestricted positive definite. Simulations indicate that $E_t y_{t+1} - \hat{E}_t y_{t+1} = B E_t T_t - \hat{B} \hat{E}_t T_t$ is stationary, and this obviously implies that the same holds for the first two coordinates, $E_t T_{1t} - \hat{E}_t T_{1t}$ and $E_t T_{2t} - \hat{E}_t T_{2t}$.      ■
Example 2.
The second example concerns two independent random walks $T_{1t}$ and $T_{2t}$, and the three observation equations
$$y_{1,t+1} = T_{1t} + \varepsilon_{1,t+1}, \quad y_{2,t+1} = b_{21} T_{1t} + T_{2t} + \varepsilon_{2,t+1}, \quad y_{3,t+1} = b_{31} T_{1t} + b_{32} T_{2t} + \varepsilon_{3,t+1}. \tag{5}$$
In this example
$$B = \begin{pmatrix} 1 & 0 \\ b_{21} & 1 \\ b_{31} & b_{32} \end{pmatrix}, \quad \Omega_\eta = \mathrm{diag}(\sigma_1^2, \sigma_2^2), \tag{6}$$
and $\Omega_\varepsilon > 0$ is $3 \times 3$ and unrestricted positive definite. Thus the nonstationarity is caused by two independent trends. The first, $T_{1t}$, is the cause of the nonstationarity of $y_{1t}$, whereas both trends are causes of the nonstationarity of $(y_{2t}, y_{3t})$. From the first equation it is seen that $y_{1t}$ and $T_{1t}$ cointegrate. It is to be expected that the extracted trend $E_t T_{1t}$ also cointegrates with $T_{1t}$, and that $E_t T_{1t}$ cointegrates with its estimator $\hat{E}_t T_{1t}$. This is all supported by the simulations. Similarly, it turns out that
$$E_t y_{2,t+1} - \hat{E}_t y_{2,t+1} = b_{21} E_t T_{1t} - \hat{b}_{21} \hat{E}_t T_{1t} + E_t T_{2t} - \hat{E}_t T_{2t}$$
is asymptotically stationary. In this case, however, $E_t T_{2t} - \hat{E}_t T_{2t}$ is not asymptotically stationary, and the paper provides an answer to why this is the case.      ■
The problem to be solved is why cointegration was found between the extracted trends and their estimators in the first example, but not in the second. The solution is that it depends on how the trends and their coefficients are identified. For some identification schemes the estimator of $B$ is $n$-consistent, and then stationarity of $E_t T_t - \hat{E}_t T_t$ can be proved. But if identification is achieved by imposing restrictions also on the covariance of the trends, as in Example 2, then the estimator of $B$ is only $n^{1/2}$-consistent, and that is not enough to obtain asymptotic stationarity of $E_t T_t - \hat{E}_t T_t$.

2.2. Formulation of the Model as a Common Trend State Space Model

The common trend state space model with constant coefficients is defined by
$$\alpha_{t+1} = \alpha_t + \eta_t, \quad y_t = B \alpha_t + \varepsilon_t, \tag{7}$$
for $t = 1, \ldots, n$, see Durbin and Koopman (2012) or Harvey (1989), with initial state $\alpha_1$. Here $\alpha_t$ is the unobserved $m$-dimensional state variable, $y_t$ is the $p$-dimensional observation, and $B$ is $p \times m$ of rank $m < p$. The errors $\varepsilon_t$ and $\eta_t$ are as specified in the discussion of the model given by (1) and (2).
Defining $T_t = \alpha_{t+1}$, $t = 0, \ldots, n$, gives the model (1) and (2). Note that in this notation $E_t T_t = E_t \alpha_{t+1}$ is the predicted value of the trend $\alpha_{t+1}$, which makes it easy to formulate the Kalman filter.
The Kalman filter calculates the prediction $a_{t+1} = E_t \alpha_{t+1}$ and its conditional variance $P_{t+1} = \mathrm{Var}_t(\alpha_{t+1})$ by the equations
$$a_{t+1} = a_t + P_t B' (B P_t B' + \Omega_\varepsilon)^{-1} (y_t - E_{t-1} y_t), \tag{8}$$
$$P_{t+1} = P_t + \Omega_\eta - P_t B' (B P_t B' + \Omega_\varepsilon)^{-1} B P_t, \tag{9}$$
starting with $a_1 = \alpha_1$ and $P_1 = 0$.
The recursions (8) and (9) become
$$E_{t+1} T_{t+1} = E_t T_t + K_t (y_{t+1} - E_t y_{t+1}), \tag{10}$$
$$V_{t+1} = \Omega_\eta + V_t - K_t B V_t, \tag{11}$$
for $t = 0, \ldots, n-1$, starting with $E_1 T_1 = T_0$ and $V_1 = \Omega_\eta$, and defining the Kalman gain
$$K_t = V_t B' (B V_t B' + \Omega_\varepsilon)^{-1}. \tag{12}$$
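The recursions (10) and (11), together with the Kalman gain, are straightforward to implement. The following sketch uses our own function name and layout; it returns the extracted trend $E_t T_t$ and the variance sequence $V_t$, starting from $E_1 T_1 = T_0$ and $V_1 = \Omega_\eta$ as in the text.

```python
import numpy as np

def kalman_filter_trend(y, B, Omega_eta, Omega_eps, T0):
    """Kalman recursions for the common trend model:
    E_{t+1}T_{t+1} = E_t T_t + K_t (y_{t+1} - B E_t T_t),
    V_{t+1} = Omega_eta + V_t - K_t B V_t,
    K_t = V_t B' (B V_t B' + Omega_eps)^{-1}."""
    n, p = y.shape
    m = B.shape[1]
    ET = np.zeros((n, m))             # ET[t-1] holds E_t T_t
    ET[0] = T0                        # E_1 T_1 = T_0
    V = Omega_eta.copy()              # V_1 = Omega_eta
    Vs = [V.copy()]
    for t in range(1, n):
        K = V @ B.T @ np.linalg.inv(B @ V @ B.T + Omega_eps)
        ET[t] = ET[t - 1] + K @ (y[t] - B @ ET[t - 1])
        V = Omega_eta + V - K @ B @ V
        Vs.append(V.copy())
    return ET, Vs
```

In steady state the gain $K_t$ settles down to a constant $K$, which is the matrix appearing in Lemma 2 below.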
Lemma 1 contains the result that $V_{t+1}$ converges, as $t \to \infty$, to a finite limit $V$, which can be calculated by solving an eigenvalue problem. Equation (11) is an algebraic Riccati equation, see Chan et al. (1984), where the convergence result can be found. The recursion (10) is used to represent $y_{t+1}$ in terms of its cumulated prediction errors $v_{t+1} = y_{t+1} - E_t y_{t+1}$, as noted by Harvey (2006, Section 7.3.2).
Lemma 1.
Let $V_t = \mathrm{Var}(T_t \mid Y_t)$ and $E_t T_t = E(T_t \mid Y_t)$.
(a) The recursion (11) for $V_t$ can be expressed as
$$V_{t+1} = \Omega_\eta + V_t - V_t (V_t + \Omega_B)^{-1} V_t \to V, \quad t \to \infty, \tag{13}$$
where $\Omega_B = \mathrm{Var}(\bar{B}' \varepsilon_t \mid B'_\perp \varepsilon_t)$ for $\bar{B} = B (B' B)^{-1}$. Moreover,
$$I_m - K_t B = I_m - V_t B' (B V_t B' + \Omega_\varepsilon)^{-1} B \to I_m - K B = \Omega_B (V + \Omega_B)^{-1}, \quad t \to \infty, \tag{14}$$
which has positive eigenvalues less than one, such that $I_m - K B$ is a contraction, that is, $(I_m - K B)^n \to 0$, $n \to \infty$.
(b) The limit of $V_t$ can be found by solving the eigenvalue problem
$$|\lambda \Omega_B - \Omega_\eta| = 0,$$
for eigenvectors $W$ and eigenvalues $(\lambda_1, \ldots, \lambda_m)$, such that $W' \Omega_B W = I_m$ and $W' \Omega_\eta W = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$. Hence, $W' V W = \mathrm{diag}(\tau_1, \ldots, \tau_m)$ for
$$\tau_i = \tfrac{1}{2} \big\{ \lambda_i + (\lambda_i^2 + 4 \lambda_i)^{1/2} \big\}. \tag{15}$$
(c) Finally, using the prediction error $v_{t+1} = y_{t+1} - E_t y_{t+1}$, it is found from (10) that
$$E_t T_t = T_0 + \sum_{j=1}^{t} K_{j-1} v_j, \quad \text{and} \quad y_{t+1} = v_{t+1} + B \Big( T_0 + \sum_{j=1}^{t} K_{j-1} v_j \Big). \tag{16}$$
The prediction errors are independent Gaussian with mean zero and variances
$$\mathrm{Var}(v_{t+1}) = \mathrm{Var}_t(y_{t+1}) = \mathrm{Var}_t(B T_t + \varepsilon_{t+1}) = B V_t B' + \Omega_\varepsilon \to B V B' + \Omega_\varepsilon, \quad t \to \infty,$$
such that in steady state the prediction errors are i.i.d. $N_p(0, B V B' + \Omega_\varepsilon)$, and (16) shows that $y_t$ is approximately an $AR(\infty)$ process, for which the reduced form autoregressive representation can be found, see Harvey (2006, Section 7.3.2).
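Lemma 1(b) translates into a short numerical routine. The sketch below (our code, with illustrative matrices) computes $V$ from the eigenvalue problem and checks that it is a fixed point of (13), i.e. that $V (V + \Omega_B)^{-1} V = \Omega_\eta$.

```python
import numpy as np

def steady_state_V(Omega_eta, Omega_B):
    """Limit V of Lemma 1(b): solve |lambda*Omega_B - Omega_eta| = 0 with
    W'Omega_B W = I_m, W'Omega_eta W = diag(lambda), and set
    W' V W = diag(tau), tau_i = (lambda_i + sqrt(lambda_i^2 + 4*lambda_i))/2."""
    L = np.linalg.cholesky(Omega_B)
    Linv = np.linalg.inv(L)
    lam, U = np.linalg.eigh(Linv @ Omega_eta @ Linv.T)
    W = Linv.T @ U                      # W'Omega_B W = I, W'Omega_eta W = diag(lam)
    tau = 0.5 * (lam + np.sqrt(lam**2 + 4.0 * lam))
    Winv = np.linalg.inv(W)
    return Winv.T @ np.diag(tau) @ Winv  # V = W^{-T} diag(tau) W^{-1}

# check the fixed point of (13): V (V + Omega_B)^{-1} V = Omega_eta
Omega_eta = np.array([[1.0, 0.3], [0.3, 2.0]])
Omega_B = np.array([[0.5, 0.1], [0.1, 0.4]])
V = steady_state_V(Omega_eta, Omega_B)
resid = V @ np.linalg.inv(V + Omega_B) @ V - Omega_eta
```

The Cholesky-based transformation is just one convenient way to solve the generalized eigenvalue problem with the stated normalization of $W$.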
Lemma 2.
If the system (7) is in steady state, the prediction errors $v_t$ are i.i.d. $N_p(0, B V B' + \Omega_\varepsilon)$ and
$$\Delta y_t = \Delta v_t + B K v_{t-1}. \tag{17}$$
Applying the Granger Representation Theorem, $y_t$ is given by
$$\Delta y_t = \alpha \beta' y_{t-1} + \sum_{i=1}^{\infty} \Gamma_i \Delta y_{t-i} + v_t. \tag{18}$$
Here $\beta = B_\perp$ and $\alpha = -K_\perp (B'_\perp K_\perp)^{-1}$, where $K_\perp = (B V B' + \Omega_\varepsilon) B_\perp$ satisfies $K K_\perp = 0$.

2.3. Cointegration among the Observations and Trends

In the model (1) and (2), the equation $y_{t+1} = B T_t + \varepsilon_{t+1}$ shows that $y_t$ and $T_t$ are cointegrated. It also holds that $T_t - E_t T_t$ is asymptotically stationary, because
$$v_{t+1} = y_{t+1} - E_t y_{t+1} = B T_t + \varepsilon_{t+1} - B E_t T_t,$$
which shows that $B (T_t - E_t T_t) = v_{t+1} - \varepsilon_{t+1}$ is asymptotically stationary. Multiplying by $\bar{B}' = (B' B)^{-1} B'$, the same holds for $T_t - E_t T_t$.
In the model (18) the extracted trend is
$$\tilde{T}_t = \alpha'_\perp \sum_{i=1}^{t} v_i = K \sum_{i=1}^{t} v_i,$$
and (16) shows that in steady state, $y_{t+1} - B \tilde{T}_t = v_{t+1} + B T_0$ is stationary, so that $y_t$ cointegrates with $\tilde{T}_t$. Thus, the process $y_t$ and the trends $T_t$, $\tilde{T}_t$, and $E_t T_t$ all cointegrate, in the sense that suitable linear combinations are asymptotically stationary. The next section investigates when similar results hold for the estimated trends.

3. Statistical Analysis of the Data

In this section it is shown how the parameters of (7) can be estimated from the CVAR (18) using results of Saikkonen (1992) and Saikkonen and Lütkepohl (1996), or using a simple cointegrating regression, see Harvey and Koopman (1997, p. 276), as discussed in Lemma 3. For both the state space model (1) and (2) and the CVAR (18) there is an identification problem between $T_t$ and its coefficient $B$, or between $\beta_\perp$ and $\tilde{T}_t$, because for any $m \times m$ matrix $\xi$ of full rank, one can use $B \xi^{-1}$ as parameter, $\xi T_t$ as trend, and $\xi \Omega_\eta \xi'$ as variance, and similarly for $\beta_\perp$ and $\tilde{T}_t$. In order to estimate $B$, $T_t$, and $\Omega_\eta$, it is therefore necessary to impose identifying restrictions. Examples of such identification are given next.
Identification 1.
Because $B$ has rank $m$, the rows can be permuted such that $B = (B'_1, B'_2)'$, where $B_1$ is $m \times m$ and has full rank. Then the parameters and trend are redefined as
$$B^* = \begin{pmatrix} I_m \\ B_2 B_1^{-1} \end{pmatrix} = \begin{pmatrix} I_m \\ \gamma' \end{pmatrix}, \quad \Omega^*_\eta = B_1 \Omega_\eta B'_1, \quad T^*_t = B_1 T_t. \tag{19}$$
Note that $B^* T^*_t = B T_t$ and $B^* \Omega^*_\eta B^{*\prime} = B \Omega_\eta B'$. This parametrization is the simplest which separates the parameters that are $n$-consistently estimated, $\gamma$, from those that are $n^{1/2}$-consistently estimated, $(\Omega_\eta, \Omega_\varepsilon)$, see Lemma 3. Note that the (correlated) trends are redefined by choosing $T^*_{1t}$ as the trend in $y_{1t}$, then $T^*_{2t}$ as the trend in $y_{2t}$, as in Example 1.
A more general parametrization, which also gives $n$-consistency, is defined, as in simultaneous equations, by imposing linear restrictions on each of the $m$ columns and requiring the identification condition to hold, see Fisher (1966).   ■
Identification 2.
The normalization with a diagonal $\Omega_\eta$ is part of the next identification, because this is the assumption in the discussion of long-run causality. Let $\Omega^*_\eta = C_\eta \mathrm{diag}(\sigma_1^2, \ldots, \sigma_m^2) C'_\eta$ be a Cholesky decomposition of $\Omega^*_\eta$, that is, $C_\eta$ is lower-triangular with ones in the diagonal, corresponding to an ordering of the variables. Using this decomposition, the new parameters and trend are
$$B^{\#} = \begin{pmatrix} C_\eta \\ \gamma' C_\eta \end{pmatrix}, \quad \Omega^{\#}_\eta = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_m^2), \quad T^{\#}_t = C_\eta^{-1} T^*_t, \tag{20}$$
such that $B^{\#} T^{\#}_t = B^* T^*_t = B T_t$ and $B^{\#} \Omega^{\#}_\eta B^{\#\prime} = B^* \Omega^*_\eta B^{*\prime} = B \Omega_\eta B'$.
Identification of the trends is achieved in this case by defining the trends to be independent and constraining how they load into the observations. In Example 2, $T_{1t}$ was defined as the trend in $y_{1t}$, and $T_{2t}$ as the trend in $y_{2t}$, but orthogonalized on $T_{1t}$, such that the trend in $y_{2t}$ is a combination of $T_{1t}$ and $T_{2t}$.      ■

3.1. The Vector Autoregressive Model

When the process is in steady state, the infinite order CVAR representation is given in (18). The model is approximated by a sequence of finite lag models, depending on the sample size $n$,
$$\Delta y_t = \alpha \beta' y_{t-1} + \sum_{i=1}^{k_n} \Gamma_i \Delta y_{t-i} + v_t,$$
where the lag length $k_n$ is chosen to increase to infinity with $n$, but so slowly that $k_n^3 / n$ converges to zero. Thus one can choose for instance $k_n = n^{1/3} / \log n$ or $k_n = n^{1/3 - \epsilon}$ for some $\epsilon > 0$. With this choice of asymptotics, the parameters $\alpha$, $\beta$, $\Gamma = I_p - \sum_{i=1}^{\infty} \Gamma_i$, $\Sigma = \mathrm{Var}(v_t)$, and the residuals $v_t$ can be estimated consistently, see Johansen and Juselius (2014) for this application of the results of Saikkonen and Lütkepohl (1996).
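The lag length rule is easy to state in code; the helper below (our naming, with $\epsilon = 0.05$ chosen arbitrarily for the power rule) illustrates how slowly $k_n$ is allowed to grow.

```python
import math

def lag_length(n, rule="log"):
    """Lag truncation k_n with k_n -> infinity but k_n^3 / n -> 0:
    either n^(1/3)/log(n) or n^(1/3 - eps) with eps = 0.05."""
    if rule == "log":
        return max(1, int(n ** (1 / 3) / math.log(n)))
    return max(1, int(n ** (1 / 3 - 0.05)))
```

Even at $n = 10{,}000$ the logarithmic rule gives only $k_n = 2$, so in practice the truncation grows very slowly.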
This defines for each sample size consistent estimators $\breve{\alpha}$, $\breve{\beta}$, $\breve{\Gamma}$, and $\breve{\Sigma}$, as well as residuals $\breve{v}_t$. In particular, the estimator of the common trend is $\breve{T}_t = \breve{\alpha}'_\perp \sum_{i=1}^{t} \breve{v}_i$. Thus, $\breve{\alpha} \breve{\beta}' \xrightarrow{P} \alpha \beta'$, $\breve{C} = \breve{\beta}_\perp (\breve{\alpha}'_\perp \breve{\Gamma} \breve{\beta}_\perp)^{-1} \breve{\alpha}'_\perp \xrightarrow{P} C = B K$, and $\breve{\Sigma} \xrightarrow{P} \Sigma = B V B' + \Omega_\varepsilon$. If $\beta_\perp$ is identified as $\beta'_\perp = (I_m, \gamma)$, then $\breve{B} = \breve{\beta}_\perp \xrightarrow{P} \beta_\perp = B$. In steady state, the relations
$$\Omega_\eta = V B' (B V B' + \Omega_\varepsilon)^{-1} B V = V B' \Sigma^{-1} B V, \quad C = B K = B V B' (B V B' + \Omega_\varepsilon)^{-1} = B V B' \Sigma^{-1},$$
hold, see (11) and Lemma 2. It follows that
$$\breve{B} \breve{\Omega}_\eta \breve{B}' = \breve{C} \breve{\Sigma} \breve{C}' \xrightarrow{P} B \Omega_\eta B', \quad \text{and} \quad \breve{\Omega}_\eta = (\breve{B}' \breve{B})^{-1} \breve{B}' \breve{C} \breve{\Sigma} \breve{C}' \breve{B} (\breve{B}' \breve{B})^{-1} \xrightarrow{P} \Omega_\eta.$$
Finally, an estimator for $\Omega_\varepsilon$ can be found as
$$\breve{\Omega}_\varepsilon = \breve{\Sigma} - \tfrac{1}{2} (\breve{C} \breve{\Sigma} + \breve{\Sigma} \breve{C}') \xrightarrow{P} B V B' + \Omega_\varepsilon - \tfrac{1}{2} (B V B' + B V B') = \Omega_\varepsilon.$$
Note that $\breve{C} \breve{\Sigma}$ is not a symmetric matrix in model (18), but it converges in probability to the symmetric matrix $B V B'$.
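These steady-state relations can be verified numerically: iterate the variance recursion (11) to its fixed point $V$, and check that $\Sigma$, $C$, and the recovered $\Omega_\eta$ and $\Omega_\varepsilon$ are mutually consistent. The parameter values below are illustrative only.

```python
import numpy as np

# Population check of the steady-state relations:
#   Omega_eta = V B' Sigma^{-1} B V,   C = B V B' Sigma^{-1},
#   C Sigma C' = B Omega_eta B',       Sigma - (C Sigma + Sigma C')/2 = Omega_eps.
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
Omega_eta = np.eye(2)
Omega_eps = np.eye(3)

# iterate the Riccati recursion (11) to its fixed point V, starting from V_1 = Omega_eta
V = Omega_eta.copy()
for _ in range(500):
    S = B @ V @ B.T + Omega_eps
    V = Omega_eta + V - V @ B.T @ np.linalg.inv(S) @ B @ V

Sigma = B @ V @ B.T + Omega_eps
C = B @ V @ B.T @ np.linalg.inv(Sigma)
Omega_eta_back = V @ B.T @ np.linalg.inv(Sigma) @ B @ V
Omega_eps_back = Sigma - 0.5 * (C @ Sigma + Sigma @ C.T)
```

Since the recursion is a contraction by Lemma 1(a), a few hundred iterations are far more than needed for convergence.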

3.2. The State Space Model

The state space model is defined by (1) and (2). It can be analysed using the Kalman filter to calculate the diffuse likelihood function, see Durbin and Koopman (2012), and an optimizing algorithm can be used to find the maximum likelihood estimator of the parameters $\Omega_\eta$, $\Omega_\varepsilon$, and $B$, once $B$ is identified.
In this paper, an estimator is used which is simpler to analyse and which gives an $n$-consistent estimator of $B$, suitably normalized, see Harvey and Koopman (1997, p. 276).
The estimators are functions of $\Delta Y_n$ and $B'_\perp Y_n$, and therefore do not involve the initial value $T_0$. Irrespective of the identification, the relations
$$\mathrm{Var}(\Delta y_t) = B \Omega_\eta B' + 2 \Omega_\varepsilon, \tag{21}$$
$$\mathrm{Cov}(\Delta y_t, \Delta y_{t+1}) = -\Omega_\varepsilon, \tag{22}$$
hold, and they give rise to two moment estimators, which determine $\Omega_\eta$ and $\Omega_\varepsilon$ once $B$ has been identified and estimated.
Consider the identified parametrization (19), where $B' = (I_m, \gamma)$, and take $B'_\perp = (-\gamma', I_{p-m})$. Then define $z_{1t} = (y_{1t}, \ldots, y_{mt})'$ and $z_{2t} = (y_{m+1,t}, \ldots, y_{pt})'$, such that $y_t = (z'_{1t}, z'_{2t})'$ and $B'_\perp y_t = -\gamma' z_{1t} + z_{2t} = B'_\perp \varepsilon_t$, that is,
$$z_{2t} = \gamma' z_{1t} + B'_\perp \varepsilon_t. \tag{23}$$
This equation defines the regression estimator $\hat{\gamma}_{reg}$:
$$\hat{\gamma}_{reg} = \Big( \sum_{t=0}^{n-1} z_{1t} z'_{1t} \Big)^{-1} \sum_{t=0}^{n-1} z_{1t} z'_{2t} = \gamma + \Big( \sum_{t=0}^{n-1} z_{1t} z'_{1t} \Big)^{-1} \sum_{t=0}^{n-1} z_{1t} \varepsilon'_t B_\perp. \tag{24}$$
To describe the asymptotic properties of $\hat{\gamma}_{reg}$, two Brownian motions are introduced by
$$n^{-1/2} \sum_{t=1}^{[nu]} \varepsilon_t \xrightarrow{D} W_\varepsilon(u) \quad \text{and} \quad n^{-1/2} \sum_{t=1}^{[nu]} \eta_t \xrightarrow{D} W_\eta(u). \tag{25}$$
Lemma 3.
Let the data be generated by the state space model (1) and (2).
(a) From (21) and (22) it follows that
$$S_{n1} = n^{-1} \sum_{t=1}^{n} \Delta y_t \Delta y'_t \xrightarrow{P} B \Omega_\eta B' + 2 \Omega_\varepsilon, \quad S_{n2} = n^{-1} \sum_{t=2}^{n} (\Delta y_t \Delta y'_{t-1} + \Delta y_{t-1} \Delta y'_t) \xrightarrow{P} -2 \Omega_\varepsilon, \tag{26}$$
define $n^{1/2}$-consistent asymptotically Gaussian estimators of $B \Omega_\eta B'$ and $\Omega_\varepsilon$, irrespective of the identification of $B$.
(b) If $B$ and $B_\perp$ are identified as $B' = (I_m, \gamma)$ and $B'_\perp = (-\gamma', I_{p-m})$, and $\Omega_\eta$ is adjusted accordingly, then $\hat{\gamma}_{reg}$ in (24) is $n$-consistent with asymptotic mixed Gaussian distribution
$$n (\hat{\gamma}_{reg} - \gamma) = n (\hat{B} - B)' B_\perp \xrightarrow{D} \Big( \int_0^1 W_\eta W'_\eta \, du \Big)^{-1} \int_0^1 W_\eta (dW_\varepsilon)' B_\perp. \tag{27}$$
(c) If $B$ is identified as $B^{\#\prime} = (C'_\eta, C'_\eta \gamma)$ with $\Omega^{\#}_\eta = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_m^2)$, then $\hat{B}^{\#} - B^{\#} = O_P(n^{-1/2})$, but (27) still holds for $n (\hat{B}^{\#} - B^{\#})' B_\perp = \hat{C}'_\eta\, n (\hat{\gamma}_{reg} - \gamma)$, so that some linear combinations of $\hat{B}^{\#}$ are $n$-consistent.
Note that the parameters $B' = (I_m, \gamma)$, $\Omega_\eta$, and $\Omega_\varepsilon$ can be estimated consistently from (24) and (26) by
$$\hat{B} = \begin{pmatrix} I_m \\ \hat{\gamma}'_{reg} \end{pmatrix}, \quad \hat{\Omega}_\varepsilon = -\tfrac{1}{2} S_{n2}, \quad \hat{\Omega}_\eta = (\hat{B}' \hat{B})^{-1} \hat{B}' (S_{n1} + S_{n2}) \hat{B} (\hat{B}' \hat{B})^{-1}. \tag{28}$$
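Putting Lemma 3 and (28) together gives a complete estimation recipe. The sketch below simulates data from the model under the parametrization $B' = (I_m, \gamma)$ (the function name and parameter values are ours) and applies the cointegrating regression (24) together with the moment estimators (26) and (28).

```python
import numpy as np

def estimate_ssm(y, m):
    """Estimators of Lemma 3 under B' = (I_m, gamma): the cointegrating
    regression (24) for gamma and the moment estimators (26) and (28)."""
    n, p = y.shape
    z1, z2 = y[:, :m], y[:, m:]
    gamma_hat = np.linalg.solve(z1.T @ z1, z1.T @ z2)    # (24), m x (p-m)
    B_hat = np.vstack([np.eye(m), gamma_hat.T])
    dy = np.diff(y, axis=0)
    S1 = dy.T @ dy / n                                   # -> B Omega_eta B' + 2 Omega_eps
    S2 = (dy[1:].T @ dy[:-1] + dy[:-1].T @ dy[1:]) / n   # -> -2 Omega_eps
    Omega_eps_hat = -0.5 * S2                            # (28)
    Bbar = B_hat @ np.linalg.inv(B_hat.T @ B_hat)
    Omega_eta_hat = Bbar.T @ (S1 + S2) @ Bbar            # (28)
    return B_hat, Omega_eta_hat, Omega_eps_hat

# simulate a version of Example 1 with a large sample
rng = np.random.default_rng(42)
n = 2000
B_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
T_lag = np.vstack([np.zeros(2), np.cumsum(rng.standard_normal((n, 2)), axis=0)[:-1]])
y = T_lag @ B_true.T + rng.standard_normal((n, 3))       # y_{t+1} = B T_t + eps_{t+1}
B_hat, Omega_eta_hat, Omega_eps_hat = estimate_ssm(y, m=2)
```

With this sample size the $n$-consistency of $\hat{\gamma}_{reg}$ is clearly visible: the estimated loadings are much closer to the true values than the $n^{1/2}$-consistent variance estimates.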
In the simulations of Examples 1 and 2 the initial value is $T_0 = 0$, so the Kalman filter with $T_0 = 0$ is used to calculate the extracted trend $E_t T_t$ from the observations and the known parameters. Similarly, the estimator of the extracted trend, $\hat{E}_t T_t$, is calculated from the observations and the estimated parameters based on Lemma 3. The next section investigates to what extent these estimated trends cointegrate with the extracted trends, and whether they cointegrate with each other.

4. Cointegration between Trends and Their Estimators

This section gives the main results in two theorems, with proofs in the Appendix. Theorem 1 shows, using the state space model to extract the trends and the estimator from Lemma 3, that $B E_t T_t - \hat{B} \hat{E}_t T_t$ is asymptotically stationary. For the CVAR model it holds that $B \tilde{T}_t - \breve{B} \breve{T}_t \xrightarrow{P} 0$, such that this spread is asymptotically stationary. Finally, the estimated trends in the two models are compared, and it is shown that $\hat{B} \hat{E}_t T_t - \breve{B} \breve{T}_t$ is asymptotically stationary. The conclusion is that, in terms of cointegration between the trends and their estimators, it does not matter which model is used to extract the trends, as long as the focus is on the identified trends $B T_t$ and $B \tilde{T}_t$.
Theorem 1.
Let $y_t$ and $T_t$ be generated by the DGP given in (1) and (2).
(a) If the state space model is used to extract the trends, and Lemma 3 is used for estimation, then $B E_t T_t - \hat{B} \hat{E}_t T_t$ is asymptotically stationary.
(b) If the vector autoregressive model is used to extract the trends and for estimation, then $B \tilde{T}_t - \breve{B} \breve{T}_t \xrightarrow{P} 0$.
(c) Under the assumptions of (a) and (b), it holds that $\hat{B} \hat{E}_t T_t - \breve{B} \breve{T}_t$ is asymptotically stationary.
In Theorem 2 a necessary and sufficient condition is given for asymptotic stationarity of $\tilde{T}_t - \breve{T}_t$, $E_t T_t - \hat{E}_t T_t$, and $\hat{E}_t T_t - \breve{T}_t$.
Theorem 2.
In the notation of Theorem 1, any of the spreads $\tilde{T}_t - \breve{T}_t$, $E_t T_t - \hat{E}_t T_t$, or $\hat{E}_t T_t - \breve{T}_t$ is asymptotically stationary if and only if $B$ and the trend are identified such that the corresponding estimators of $B$ satisfy $n^{1/2} (\hat{B} - B) = o_P(1)$ and $n^{1/2} (\breve{B} - B) = o_P(1)$.
The missing cointegration between $E_t T_t$ and $\hat{E}_t T_t$, say, can be explained in terms of the identity
$$\hat{B} (E_t T_t - \hat{E}_t T_t) = (\hat{B} - B) E_t T_t + (B E_t T_t - \hat{B} \hat{E}_t T_t).$$
Here the second term, $B E_t T_t - \hat{B} \hat{E}_t T_t$, is asymptotically stationary by Theorem 1(a). But the first term, $(\hat{B} - B) E_t T_t$, is not necessarily asymptotically stationary, because in general, that is, depending on the identification of the trend and $B$, it holds that $\hat{B} - B = O_P(n^{-1/2})$ and $E_t T_t = O_P(n^{1/2})$, see (16).
The parametrization $B' = (I_m, \gamma)$ ensures $n$-consistency of $\hat{B}$, so there is asymptotic stationarity of $\tilde{T}_t - \breve{T}_t$, $E_t T_t - \hat{E}_t T_t$, and $\hat{E}_t T_t - \breve{T}_t$ in this case. This is not so surprising, because
$$B E_t T_t - \hat{B} \hat{E}_t T_t = \begin{pmatrix} E_t T_t - \hat{E}_t T_t \\ \gamma' E_t T_t - \hat{\gamma}' \hat{E}_t T_t \end{pmatrix}$$
is stationary. Another situation where the estimator of $B$ is $n$-consistent is if $B = (B_1, \ldots, B_m)$ satisfies linear restrictions on the columns, $R'_i B_i = 0$, or equivalently $B_i = R_{i\perp} \phi_i$ for some $\phi_i$, and the condition for identification is satisfied:
$$\mathrm{rank}\{ R'_i (R_{1\perp} \phi_1, \ldots, R_{m\perp} \phi_m) \} = m - 1, \quad \text{for } i = 1, \ldots, m, \tag{29}$$
see Fisher (1966). For a just-identified system, one can still use $\hat{\gamma}_{reg}$ and then solve for the identified parameters. For overidentified systems, the parameters can be estimated by a nonlinear regression of $z_{2t}$ on $z_{1t}$ reflecting the overidentified parametrization. In either case the estimator is $n$-consistent, such that $\tilde{T}_t - \breve{T}_t$, $E_t T_t - \hat{E}_t T_t$, and $\hat{E}_t T_t - \breve{T}_t$ are asymptotically stationary.
If the identification involves the variance $\Omega_\eta$, however, the estimator of $B$ is only $n^{1/2}$-consistent, and hence no cointegration is found between the trend and the estimated trend.
The analogy with the results for the CVAR, where $\beta$ and $\alpha$ need to be identified, is that if $\beta$ is identified using linear restrictions as in (29), then $\hat{\beta}$ is $n$-consistent, whereas if $\beta$ is identified by restrictions on $\alpha$, then $\hat{\beta}$ is only $n^{1/2}$-consistent. An example of the latter is if $\beta$ is identified as the first $m$ rows of the matrix $\Pi = \alpha \beta'$, corresponding to $\alpha' = (I_m, \phi')$; then $\hat{\beta}$ is $n^{1/2}$-consistent and asymptotically Gaussian, see Johansen (2010, Section 4.3).

5. A Small Simulation Study

The two examples introduced in Section 2.1 are analysed by simulation. The equations are given in (3) and (5). Both examples have $p = 3$ and $m = 2$. The parameters $B$ and $\Omega_\eta$ contain $6 + 3$ parameters, but the $3 \times 3$ matrix $B \Omega_\eta B'$ is of rank 2 and has only 5 estimable parameters. Thus, 4 restrictions must be imposed to identify the parameters. In both examples the Kalman filter with $T_0 = 0$ is used to extract the trends, and the cointegrating regression in Lemma 3 is used to estimate the parameters.
Example 1 continued.
The parameter $B$ is given in (4), and the parameters are just-identified. Now
$$B E_t T_t - \hat{B} \hat{E}_t T_t = \begin{pmatrix} E_t T_{1t} - \hat{E}_t T_{1t} \\ E_t T_{2t} - \hat{E}_t T_{2t} \\ b_{31} E_t T_{1t} + b_{32} E_t T_{2t} - \hat{b}_{31} \hat{E}_t T_{1t} - \hat{b}_{32} \hat{E}_t T_{2t} \end{pmatrix}. \tag{30}$$
As $E_t T_{1t} - \hat{E}_t T_{1t}$ and $E_t T_{2t} - \hat{E}_t T_{2t}$ are the first two rows of (30), they are both asymptotically stationary by Theorem 1(a).
To illustrate the results, data are simulated with $n = 100$ observations, starting with $T_0 = 0$ and parameter values $b_{31} = b_{32} = 0.5$, $\sigma_1^2 = \sigma_2^2 = 1$, and $\sigma_{12} = 0$, such that
$$B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0.5 & 0.5 \end{pmatrix}, \quad \Omega_\eta = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{31}$$
The parameters are estimated by (28), and the estimates become $\hat{b}_{31} = 0.48$, $\hat{b}_{32} = 0.41$, $\hat{\sigma}_1^2 = 0.93$, $\hat{\sigma}_{12} = 0.26$, and $\hat{\sigma}_2^2 = 1.63$. The extracted and estimated trends are plotted in Figure 1. Panels a and b show plots of $(E_t T_{1t}, \hat{E}_t T_{1t})$ and $(E_t T_{2t}, \hat{E}_t T_{2t})$, respectively, and it is seen that they co-move. In panels c and d, the differences $\hat{E}_t T_{1t} - E_t T_{1t}$ and $\hat{E}_t T_{2t} - E_t T_{2t}$ both appear to be stationary in this parametrization of the model.      ■
Example 2 continued.
The parameter $B$ in this example is given in (6), such that
$$B E_t T_t - \hat{B} \hat{E}_t T_t = \begin{pmatrix} E_t T_{1t} - \hat{E}_t T_{1t} \\ b_{21} E_t T_{1t} + E_t T_{2t} - \hat{b}_{21} \hat{E}_t T_{1t} - \hat{E}_t T_{2t} \\ b_{31} E_t T_{1t} + b_{32} E_t T_{2t} - \hat{b}_{31} \hat{E}_t T_{1t} - \hat{b}_{32} \hat{E}_t T_{2t} \end{pmatrix}. \tag{32}$$
By the results in Theorem 1(a), all three rows are asymptotically stationary, in particular $E_t T_{1t} - \hat{E}_t T_{1t}$. Moreover, the second row of (32), $(b_{21} E_t T_{1t} - \hat{b}_{21} \hat{E}_t T_{1t}) + (E_t T_{2t} - \hat{E}_t T_{2t})$, is asymptotically stationary. Thus, asymptotic stationarity of $E_t T_{2t} - \hat{E}_t T_{2t}$ requires asymptotic stationarity of the term
$$b_{21} E_t T_{1t} - \hat{b}_{21} \hat{E}_t T_{1t} = (b_{21} - \hat{b}_{21}) E_t T_{1t} + \hat{b}_{21} (E_t T_{1t} - \hat{E}_t T_{1t}). \tag{33}$$
Here, the second term, $\hat{b}_{21} (E_t T_{1t} - \hat{E}_t T_{1t})$, is asymptotically stationary because $E_t T_{1t} - \hat{E}_t T_{1t}$ is. However, the first term, $(b_{21} - \hat{b}_{21}) E_t T_{1t}$, is not asymptotically stationary, because $\hat{b}_{21}$ is only $n^{1/2}$-consistent. In this case $n^{1/2} (b_{21} - \hat{b}_{21}) \xrightarrow{D} Z$, which has a Gaussian distribution, and $n^{-1/2} E_{[nu]} T_{1,[nu]} \xrightarrow{D} W_{\eta_1}(u)$, where $W_{\eta_1}$ is the Brownian motion generated by the sums of $\eta_{1t}$. It follows that their product
$$(b_{21} - \hat{b}_{21}) E_{[nu]} T_{1,[nu]} = \{ n^{1/2} (b_{21} - \hat{b}_{21}) \} \{ n^{-1/2} E_{[nu]} T_{1,[nu]} \}$$
converges in distribution, as $n \to \infty$, to the product of $Z$ and $W_{\eta_1}(u)$, and this limit is nonstationary. It follows that $E_t T_{2t} - \hat{E}_t T_{2t}$ is not asymptotically stationary for the identification in this example. This argument is a special case of the proof of Theorem 2.
To illustrate the results, data are simulated from the model with $n = 100$ observations, starting with $T_0 = 0$ and parameter values $b_{21} = 0.0$, $b_{31} = b_{32} = 0.5$, and $\sigma_1^2 = \sigma_2^2 = 1$, which is identical to (31).
The model is written in the form (19) with transformed $B$ and $\Omega_\eta$, as
$$B^* = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ b_{31} - b_{21} b_{32} & b_{32} \end{pmatrix}, \quad \Omega^*_\eta = \begin{pmatrix} \sigma_1^2 & b_{21} \sigma_1^2 \\ b_{21} \sigma_1^2 & \sigma_2^2 + b_{21}^2 \sigma_1^2 \end{pmatrix}.$$
The parameters are estimated as in Example 1, and we find $\hat{b}_{31} - \hat{b}_{21} \hat{b}_{32} = 0.48$, $\hat{b}_{32} = 0.41$, $\hat{\sigma}_1^2 = 0.93$, $\hat{b}_{21} \hat{\sigma}_1^2 = 0.26$, and $\hat{\sigma}_2^2 + \hat{b}_{21}^2 \hat{\sigma}_1^2 = 1.63$, which are solved for $\hat{b}_{21} = 0.28$, $\hat{b}_{31} = 0.59$, $\hat{b}_{32} = 0.41$, $\hat{\sigma}_1^2 = 0.93$, and $\hat{\sigma}_2^2 = 1.56$. The extracted and estimated trends are plotted in Figure 2. Panels a and b show plots of $(E_t T_{1t}, \hat{E}_t T_{1t})$ and $(E_t T_{2t}, \hat{E}_t T_{2t})$, respectively. It is seen that $E_t T_{1t}$ and $\hat{E}_t T_{1t}$ co-move, whereas $E_t T_{2t}$ and $\hat{E}_t T_{2t}$ do not. In panels c and d, the differences $E_t T_{1t} - \hat{E}_t T_{1t}$ and $E_t T_{2t} - \hat{E}_t T_{2t}$ are plotted. Note that the first looks stationary, whereas the second is clearly nonstationary. Comparing with the plot of $E_t T_{1t}$ in panel a, it appears that the process $\hat{E}_t T_{1t}$ can explain the nonstationarity of $E_t T_{2t} - \hat{E}_t T_{2t}$. This is consistent with Equation (33) with $b_{21} = 0$ and $\hat{b}_{21} = 0.28$. In panel d, $E_t T_{2t} - \hat{E}_t T_{2t} - 0.28\, \hat{E}_t T_{1t}$ is also plotted, and it is indeed stationary.      ■
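The solving step at the end of the example is simple arithmetic. The snippet below reproduces it from the reported transformed estimates (rounding in the text accounts for the small discrepancies):

```python
# transformed estimates reported in the text
c1 = 0.48        # estimate of b31 - b21*b32
b32_hat = 0.41   # estimate of b32
s1_hat = 0.93    # estimate of sigma_1^2
c2 = 0.26        # estimate of b21*sigma_1^2
c3 = 1.63        # estimate of sigma_2^2 + b21^2*sigma_1^2

# solve for the structural parameters
b21_hat = c2 / s1_hat                  # approx 0.28
b31_hat = c1 + b21_hat * b32_hat       # approx 0.59
s2_hat = c3 - b21_hat**2 * s1_hat      # approx 1.56
```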

6. Conclusions

The paper analyses a sample of $n$ observations from a common trend model, where the state is an unobserved multivariate random walk and the observation is a linear combination of the lagged state variable and a noise term. For such a model, the trends and their coefficients in the observation equation need to be identified before they can be estimated separately. The model leads naturally to cointegration between observations, trends, and extracted trends. Using simulations it was discovered that the extracted trends do not necessarily cointegrate with their estimators. This problem is investigated, and it is found to be related to the identification of the trends and their coefficients in the observation equation. Theorem 1 shows that, provided only the linear combinations of the trends from the observation equation are considered, there is always cointegration between the extracted trends and their estimators. If the trends and their coefficients are defined by identifying restrictions, the same result holds if and only if the estimators of the identified coefficients in the observation equation are consistent at a rate faster than $n^{1/2}$. For the causality study mentioned in the introduction, where the components of the unobserved trend are assumed independent, the result has the following implication: for the individual extracted trends to cointegrate with their estimators, overidentifying restrictions must be imposed on the trends' causal impact coefficients on the observations, such that the estimators of these become superconsistent.

Acknowledgments

S.J. is grateful to CREATES—Center for Research in Econometric Analysis of Time Series (DNRF78), funded by the Danish National Research Foundation. M.N.T. is grateful to the Carlsberg Foundation (grant reference 2013_01_0972). We have benefitted from discussions with Siem Jan Koopman and Eric Hillebrand on state space models and thankfully acknowledge the insightful comments from two anonymous referees.

Author Contributions

S.J. has contributed most of the mathematical derivations. M.N.T. has performed the simulations and posed the problem to be solved.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Lemma 1
Proof of (a): Let $N = (\bar{B}, B_{\perp})$, $\bar{B} = B(B'B)^{-1}$, such that
$$K_tB = V_tB'[BV_tB' + \Omega_{\varepsilon}]^{-1}B = V_tB'N[N'(BV_tB' + \Omega_{\varepsilon})N]^{-1}N'B = V_t(I_m, 0)\begin{pmatrix} V_t + \bar{B}'\Omega_{\varepsilon}\bar{B} & \bar{B}'\Omega_{\varepsilon}B_{\perp} \\ B_{\perp}'\Omega_{\varepsilon}\bar{B} & B_{\perp}'\Omega_{\varepsilon}B_{\perp} \end{pmatrix}^{-1}\begin{pmatrix} I_m \\ 0 \end{pmatrix} = V_t(V_t + \Omega_B)^{-1},$$
where
$$\Omega_B = \bar{B}'[\Omega_{\varepsilon} - \Omega_{\varepsilon}B_{\perp}(B_{\perp}'\Omega_{\varepsilon}B_{\perp})^{-1}B_{\perp}'\Omega_{\varepsilon}]\bar{B} = \mathrm{Var}(\bar{B}'\varepsilon_t \mid B_{\perp}'\varepsilon_t).$$
This proves (13) and (14).
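The block-matrix identity above can be checked numerically. The following sketch (assuming NumPy, with arbitrary hypothetical parameter values; it is not part of the paper) draws a loading matrix $B$ and positive definite covariances and verifies that $K_tB = V_t(V_t + \Omega_B)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 4, 2  # observation and state dimensions (hypothetical values)

B = rng.standard_normal((p, m))                      # loading matrix
A1 = rng.standard_normal((m, m))
V = A1 @ A1.T + np.eye(m)                            # V_t, positive definite
A2 = rng.standard_normal((p, p))
Om_eps = A2 @ A2.T + np.eye(p)                       # Omega_eps, positive definite

# B_bar = B(B'B)^{-1} and an orthogonal complement B_perp with B'B_perp = 0.
B_bar = B @ np.linalg.inv(B.T @ B)
Q, _ = np.linalg.qr(B, mode="complete")
B_perp = Q[:, m:]

# Kalman gain times B: K_t B = V_t B'(B V_t B' + Omega_eps)^{-1} B.
KB = V @ B.T @ np.linalg.inv(B @ V @ B.T + Om_eps) @ B

# Omega_B = Var(B_bar' eps_t | B_perp' eps_t), the conditional variance.
cond = Om_eps - Om_eps @ B_perp @ np.linalg.inv(B_perp.T @ Om_eps @ B_perp) @ B_perp.T @ Om_eps
Om_B = B_bar.T @ cond @ B_bar

print(np.allclose(KB, V @ np.linalg.inv(V + Om_B)))  # True
```

The check works for any full-rank $B$ and positive definite $V_t$, $\Omega_\varepsilon$, since the identity is exact algebra rather than an approximation.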
Proof of (b): If the recursion starts with $V_1 = \Omega_\eta$, then $V_t$ can be diagonalized by $W$ for all $t$, and the limit satisfies $W'VW = \mathrm{diag}(\tau_1, \dots, \tau_m)$, where
$$\tau_i = \lambda_i + \tau_i - \frac{\tau_i^2}{1 + \tau_i}.$$
This has the solution given in (15).
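The fixed-point equation above reduces to $\tau_i^2 - \lambda_i\tau_i - \lambda_i = 0$, whose positive root is the natural candidate for the solution referred to in (15), which is not reproduced here. A small sketch (hypothetical values of $\lambda$; not from the paper) iterates the scalar Riccati recursion and compares it with that root:

```python
import math

def tau_recursion(lam, n_iter=200):
    """Iterate tau -> lam + tau - tau^2/(1 + tau), starting from tau = lam."""
    tau = lam
    for _ in range(n_iter):
        tau = lam + tau - tau**2 / (1.0 + tau)
    return tau

def tau_fixed_point(lam):
    """Positive root of tau^2 - lam*tau - lam = 0, the fixed point of the map."""
    return 0.5 * (lam + math.sqrt(lam**2 + 4.0 * lam))

for lam in (0.1, 1.0, 5.0):
    print(abs(tau_recursion(lam) - tau_fixed_point(lam)) < 1e-10)  # True
```

The derivative of the map $\tau \mapsto \lambda + \tau - \tau^2/(1+\tau)$ is $1/(1+\tau)^2 < 1$ at the fixed point, so the iteration converges geometrically.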
Proof of (c): The first result follows by summation from the recursion for $E_tT_t$ in (10), and the second from $y_{t+1} = v_{t+1} + BE_tT_t$.   ■
Proof of Lemma 2
The polynomial $\Phi(z) = I_p - z(I_p - BK)$ describes (17) as
$$(1 - L)y_t = \Phi(L)v_t.$$
Note that $\Phi(1) = BK$ is singular, and $d\Phi(z)/dz|_{z=1} = BK - I_p = BVB'(BVB' + \Omega_\varepsilon)^{-1} - I_p$ satisfies that $B_\perp'(BK - I_p)K_\perp = -B_\perp'\Omega_\varepsilon B_\perp$ is nonsingular, where $K_\perp = (BVB' + \Omega_\varepsilon)B_\perp$. This means that the Granger Representation Theorem (Johansen 1996, Theorem 4.5) can be applied and gives the expansion (18) for $\alpha = K_\perp(B_\perp'K_\perp)^{-1}$ and $\beta = B_\perp$.      ■
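The singularity of $\Phi(1)$ and the nonsingular rank condition used for the Granger Representation Theorem can also be verified numerically. The sketch below (NumPy, arbitrary hypothetical parameter values; not part of the paper) checks that $BK$ has rank $m$, that $KK_\perp = 0$, and that $B_\perp'(BK - I_p)K_\perp = -B_\perp'\Omega_\varepsilon B_\perp$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 4, 2  # observation and state dimensions (hypothetical values)

B = rng.standard_normal((p, m))
A1 = rng.standard_normal((m, m))
V = A1 @ A1.T + np.eye(m)                # steady-state V, positive definite
A2 = rng.standard_normal((p, p))
Om_eps = A2 @ A2.T + np.eye(p)           # Omega_eps, positive definite

Q, _ = np.linalg.qr(B, mode="complete")
B_perp = Q[:, m:]                        # orthogonal complement: B' B_perp = 0

S = B @ V @ B.T + Om_eps
K = V @ B.T @ np.linalg.inv(S)           # steady-state Kalman gain (m x p)
K_perp = S @ B_perp                      # K_perp = (B V B' + Omega_eps) B_perp

print(np.linalg.matrix_rank(B @ K))      # m = 2: Phi(1) = BK is singular for p > m
print(np.allclose(K @ K_perp, 0))        # True: columns of K_perp annihilate K
print(np.allclose(B_perp.T @ (B @ K - np.eye(p)) @ K_perp,
                  -B_perp.T @ Om_eps @ B_perp))  # True: the rank condition
```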
Proof of Lemma 3
Proof of (a): Consider first the product moments (21) and (22). The result (26) follows from the Law of Large Numbers, and the asymptotic Gaussian distribution of $\hat\Omega_\varepsilon = -\frac{1}{2}S_{n2}$ and $\hat\Omega_\eta = \hat{\bar B}'(S_{1n} + S_{n2})\hat{\bar B}$ follows from the Central Limit Theorem.
Proof of (b): It follows from (23), (24), and (25) that the least squares estimator $\hat\gamma_{reg}$ satisfies (27). Let $B_\perp' = (-\gamma', I_{p-m})$, then
$$(\hat B - B)'B_\perp = \hat\gamma_{reg} - \gamma = (B_\perp'(\hat B - B))'.$$
Proof of (c): Note that for the other parametrization, (20), where $B = (C_\eta', C_\eta'\gamma)'$, it holds that $B_\perp' = (-\gamma', I_{p-m})$, such that for both parametrizations (27) holds. The estimator of $B$, in the parametrization (20), is $\hat B = (\hat C_\eta', \hat C_\eta'\hat\gamma)'$, where $\hat C_\eta$ is derived from the $n^{1/2}$-consistent estimator of $\Omega_\eta$, such that for this parametrization, estimation of $B$ is not $n$-consistent, but only $n^{1/2}$-consistent and $\hat B - B = O_P(n^{-1/2})$.      ■
Proof of Theorem 1.
Proof of (a): Let $w_t = BT_t - \hat B\hat E_tT_t$, then
$$BE_tT_t - \hat B\hat E_tT_t = B(E_tT_t - T_t) + (BT_t - \hat B\hat E_tT_t) = B(E_tT_t - T_t) + w_t.$$
Here $B(E_tT_t - T_t)$ is stationary, so it is enough to show that $w_t$ is asymptotically stationary. From the definition of $T_{t+1}$ and the Kalman filter recursion (10), calculated for $T_0 = 0$ and for the estimated parameters, it holds that
$$BT_{t+1} = BT_t + B\eta_{t+1}, \qquad \hat B\hat E_{t+1}T_{t+1} = \hat B\hat E_tT_t + \hat B\hat K_t(y_{t+1} - \hat B\hat E_tT_t).$$
Subtracting the expressions gives
$$BT_{t+1} - \hat B\hat E_{t+1}T_{t+1} = BT_t + B\eta_{t+1} - \hat B\hat E_tT_t - \hat B\hat K_t(y_{t+1} - \hat B\hat E_tT_t) = BT_t - \hat B\hat E_tT_t - \hat B\hat K_t(BT_t + \varepsilon_{t+1} - \hat B\hat E_tT_t) + B\eta_{t+1},$$
which gives the recursion
$$w_{t+1} = (I_p - \hat B\hat K_t)w_t - \hat B\hat K_t\varepsilon_{t+1} + B\eta_{t+1}.$$
Note that $(I_p - \hat B\hat K_t)$ is not a contraction, because $p - m$ eigenvalues are one. Hence it is first proved that $\hat B_\perp'w_t$ is small, and then a contraction is found for $\hat{\bar B}'w_t$. From the definition of $w_t$, it follows from (27), that
$$\hat B_\perp'w_t = \hat B_\perp'BT_t = \hat B_\perp'(B - \hat B)T_t = O_P(n^{-1})O_P(n^{1/2}) = O_P(n^{-1/2}).$$
Next define $\hat{\bar B} = \hat B(\hat B'\hat B)^{-1}$ and $\hat{\bar B}_\perp = \hat B_\perp(\hat B_\perp'\hat B_\perp)^{-1}$, such that $I_p = \hat B\hat{\bar B}' + \hat B_\perp\hat{\bar B}_\perp'$. From (A2) it follows, by multiplying by $\hat{\bar B}'$ and using $\hat{\bar B}'B = \hat{\bar B}'(B - \hat B) + I_m = I_m + O_P(n^{-1/2})$, that
$$\hat{\bar B}'w_{t+1} = (\hat{\bar B}' - \hat K_t)w_t - \hat K_t\varepsilon_{t+1} + \hat{\bar B}'B\eta_{t+1} = (\hat{\bar B}' - \hat K_t)(\hat B\hat{\bar B}' + \hat B_\perp\hat{\bar B}_\perp')w_t - \hat K_t\varepsilon_{t+1} + \eta_{t+1} + \hat{\bar B}'(B - \hat B)\eta_{t+1} = (I_m - \hat K_t\hat B)\hat{\bar B}'w_t - \hat K_t\varepsilon_{t+1} + \eta_{t+1} + O_P(n^{-1/2}),$$
because $\hat{\bar B}'(B - \hat B)\eta_{t+1} = O_P(n^{-1/2})$ and $(\hat{\bar B}' - \hat K_t)\hat B_\perp\hat{\bar B}_\perp'w_t = -\hat K_t\hat B_\perp\hat{\bar B}_\perp'w_t = O_P(n^{-1/2})$.
From (14) it is seen that $I_m - \hat K_t\hat B \xrightarrow{P} \Omega_B(V + \Omega_B)^{-1}$ and $(I_m - KB)^n \to 0$ for $n \to \infty$. This shows that $\hat{\bar B}'w_t$, and hence $w_t$, is asymptotically a stationary $AR(1)$ process.
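The eigenvalue structure behind this contraction argument can be illustrated numerically: $I_p - BK$ keeps $p - m$ unit eigenvalues, while $I_m - KB$ has spectral radius strictly below one, so its powers vanish. A sketch with arbitrary hypothetical parameter values (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 4, 2  # observation and state dimensions (hypothetical values)

B = rng.standard_normal((p, m))
A1 = rng.standard_normal((m, m))
V = A1 @ A1.T + np.eye(m)                # positive definite state variance
A2 = rng.standard_normal((p, p))
Om_eps = A2 @ A2.T + np.eye(p)           # positive definite noise variance

K = V @ B.T @ np.linalg.inv(B @ V @ B.T + Om_eps)   # Kalman gain (m x p)

# I_p - BK is not a contraction: p - m of its eigenvalues equal one,
# which is why the directions B_perp and B are treated separately.
eig_p = np.linalg.eigvals(np.eye(p) - B @ K)
print(np.sum(np.isclose(eig_p, 1.0)))               # p - m = 2

# I_m - KB has spectral radius below one, so (I_m - KB)^n -> 0.
M = np.eye(m) - K @ B
rho = np.max(np.abs(np.linalg.eigvals(M)))
print(rho < 1.0)                                    # True
```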
Proof of (b): The CVAR (18) is expressed as $\Pi(L)y_t = v_t$, and the parameters are estimated by maximum likelihood with lag length $k_n \to \infty$ and $k_n^3/n \to 0$. This gives estimators $(\breve\alpha, \breve\beta, \breve\Gamma, \breve C, \breve\Sigma)$ and residuals $\breve v_t$. The representation of $y_t$ in terms of $v_t$ is given by
$$y_t = C\sum_{i=1}^{t}v_i + \sum_{i=0}^{\infty}C_iv_{t-i} + A,$$
where $\beta'A = 0$. This relation also holds for the estimated parameters and residuals, and, subtracting, one finds
$$BT_t - \sum_{i=1}^{t}v_i - \Big(\breve B\breve T_t - \sum_{i=1}^{t}\breve v_i\Big) = \sum_{i=0}^{\infty}\breve C_i\breve v_{t-i} - \sum_{i=0}^{\infty}C_iv_{t-i} - A + \breve A.$$
It is seen that the right-hand side is $o_P(1)$ and hence asymptotically stationary.
Proof of (c): Each estimated trend is compared with the corresponding trend, which gives
$$\hat B\hat E_tT_t - \breve B\breve T_t = (\hat B\hat E_tT_t - BE_tT_t) + (BE_tT_t - BT_t) + (BT_t - \breve B\breve T_t).$$
Here the first term is asymptotically stationary by part (a), the middle term is asymptotically stationary, and the last is $o_P(1)$ by part (b).    ■
Proof of Theorem 2.
The proof is the same for all the spreads, so consider $E_tT_t - \hat E_tT_t$ and the identity
$$\hat B'(BE_tT_t - \hat B\hat E_tT_t) = \hat B'(B - \hat B)E_tT_t + \hat B'\hat B(E_tT_t - \hat E_tT_t).$$
The left-hand side is asymptotically stationary by Theorem 1(a), and therefore $E_tT_t - \hat E_tT_t$ is asymptotically stationary if and only if
$$\hat B'(B - \hat B)E_tT_t = [n^{1/2}\hat B'(B - \hat B)][n^{-1/2}E_tT_t]$$
is asymptotically stationary. Here the second factor converges to a nonstationary process,
$$n^{-1/2}E_{[nu]}T_{[nu]} = n^{-1/2}E_0T_0 + n^{-1/2}\sum_{j=2}^{[nu]}K_{j-1}v_j \xrightarrow{D} W_v(u),$$
see (16), so for the term $[n^{1/2}\hat B'(B - \hat B)][n^{-1/2}E_tT_t]$ to be asymptotically stationary, it is necessary and sufficient that $n^{1/2}\hat B'(B - \hat B) \xrightarrow{P} 0$.   ■

References

1. Chan, Siew Wah, Graham Clifford Goodwin, and Kwai Sang Sin. 1984. Convergence properties of the Riccati difference equation in optimal filtering of nonstabilizable systems. IEEE Transactions on Automatic Control 29: 110–18.
2. Durbin, James, and Siem Jan Koopman. 2012. Time Series Analysis by State Space Methods, 2nd ed. Oxford: Oxford University Press.
3. Fisher, Franklin M. 1966. The Identification Problem in Econometrics. New York: McGraw-Hill.
4. Harvey, Andrew C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
5. Harvey, Andrew C. 2006. Forecasting with Unobserved Components Time Series Models. In Handbook of Economic Forecasting. Edited by G. Elliott, C. Granger and A. Timmermann. Amsterdam: North Holland, pp. 327–412.
6. Harvey, Andrew C., and Siem Jan Koopman. 1997. Multivariate structural time series models. In System Dynamics in Economics and Financial Models. Edited by C. Heij, J.M. Schumacher, B. Hanzon and C. Praagman. New York: John Wiley and Sons.
7. Hoover, Kevin D., Søren Johansen, Katarina Juselius, and Morten Nyboe Tabor. 2014. Long-run Causal Order: A Progress Report. Unpublished manuscript.
8. Johansen, Søren. 1996. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, 2nd ed. Oxford: Oxford University Press.
9. Johansen, Søren. 2010. Some identification problems in the cointegrated vector autoregressive model. Journal of Econometrics 158: 262–73.
10. Johansen, Søren, and Katarina Juselius. 2014. An asymptotic invariance property of the common trends under linear transformations of the data. Journal of Econometrics 178: 310–15.
11. Pearl, Judea. 2009. Causality: Models, Reasoning and Inference, 2nd ed. Cambridge: Cambridge University Press.
12. Saikkonen, Pentti. 1992. Estimation and testing of cointegrated systems by an autoregressive approximation. Econometric Theory 8: 1–27.
13. Saikkonen, Pentti, and Helmut Lütkepohl. 1996. Infinite order cointegrated vector autoregressive processes: Estimation and inference. Econometric Theory 12: 814–44.
14. Spirtes, Peter, Clark Glymour, and Richard Scheines. 2000. Causation, Prediction, and Search, 2nd ed. Cambridge: MIT Press.
Figure 1. The figure shows the extracted and estimated trends for the simulated data in Example 1 with the identification in (19). Panels (a) and (b) show plots of $E_tT_{1t}$ and $\hat E_tT_{1t}$, and $E_tT_{2t}$ and $\hat E_tT_{2t}$, respectively. Note that in both cases, the processes seem to co-move. In panels (c) and (d), $E_tT_{1t} - \hat E_tT_{1t}$ and $E_tT_{2t} - \hat E_tT_{2t}$ are plotted and appear stationary, because they are both recovered from $BE_tT_t - \hat B\hat E_tT_t$ as the first two coordinates, see (19).
Figure 2. The figure shows the extracted and estimated trends for the simulated data in Example 2 with the identification in (20). Panels (a) and (b) show plots of $E_tT_{1t}$ and $\hat E_tT_{1t}$, and $E_tT_{2t}$ and $\hat E_tT_{2t}$, respectively. Note that $E_tT_{1t}$ and $\hat E_tT_{1t}$ seem to co-move, whereas $E_tT_{2t}$ and $\hat E_tT_{2t}$ do not. In panel (c), $E_tT_{1t} - \hat E_tT_{1t}$ is plotted and appears stationary, but in panel (d) the spread $E_tT_{2t} - \hat E_tT_{2t}$ is nonstationary, whereas $E_tT_{2t} - \hat E_tT_{2t} - 0.28\,\hat E_tT_{1t}$ is stationary.

