Article

Cointegration between Trends and Their Estimators in State Space Models and Cointegrated Vector Autoregressive Models

by
Søren Johansen
* and
Morten Nyboe Tabor
Department of Economics, University of Copenhagen, Øster Farimagsgade 5, Building 26, 1353 Copenhagen K, Denmark
*
Author to whom correspondence should be addressed.
Econometrics 2017, 5(3), 36; https://doi.org/10.3390/econometrics5030036
Submission received: 1 March 2017 / Revised: 15 August 2017 / Accepted: 17 August 2017 / Published: 22 August 2017
(This article belongs to the Special Issue Recent Developments in Cointegration)

Abstract

A state space model with an unobserved multivariate random walk and a linear observation equation is studied. The purpose is to find out when the extracted trend cointegrates with its estimator, in the sense that a linear combination is asymptotically stationary. It is found that this result holds for the linear combination of the trend that appears in the observation equation. If identifying restrictions are imposed on either the trend or its coefficients in the linear observation equation, it is shown that there is cointegration between the identified trend and its estimator, if and only if the estimators of the coefficients in the observation equations are consistent at a faster rate than the square root of sample size. The same results are found if the observations from the state space model are analysed using a cointegrated vector autoregressive model. The findings are illustrated by a small simulation study.

1. Introduction and Summary

This paper is inspired by a study on long-run causality, see Hoover et al. (2014). Causality is usually studied for a sequence of multivariate i.i.d. variables using conditional independence, see Spirtes et al. (2000) or Pearl (2009). For stationary autoregressive processes, causality is discussed in terms of the variance of the shocks, that is, the variance of the i.i.d. error term. For nonstationary cointegrated variables, the common trends play an important role for long-run causality. In Hoover et al. (2014), the concept is formulated in terms of independent common trends and their causal impact coefficients on the nonstationary observations. Thus, the emphasis is on independent trends, and how they enter the observation equations, rather than on the variance of the measurement errors.
The trend is modelled as an $m$-dimensional Gaussian random walk, starting at $T_0$,
$$T_{t+1} = T_t + \eta_{t+1}, \quad t = 0, \dots, n-1, \tag{1}$$
where $\eta_t$ are i.i.d. $N_m(0, \Omega_\eta)$, that is, Gaussian in $m$ dimensions with mean zero and $m \times m$ variance $\Omega_\eta > 0$. This trend has an impact on future values of the $p$-dimensional observation $y_t$ modelled by
$$y_{t+1} = B T_t + \varepsilon_{t+1}, \quad t = 0, \dots, n-1, \tag{2}$$
where $\varepsilon_t$ are i.i.d. $N_p(0, \Omega_\varepsilon)$ and $\Omega_\varepsilon > 0$. It is also assumed that $\varepsilon_s$ and $\eta_t$ are independent for all $s$ and $t$. In the following, the joint distribution of $T_1, \dots, T_n, y_1, \dots, y_n$ conditional on a given value of $T_0$ is considered.
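As an illustration, the data generating process (1) and (2) can be simulated in a few lines. This is a minimal sketch, not the authors' code: the function name `simulate_common_trend`, the seed, and the choice $\Omega_\varepsilon = I_3$ are assumptions made for the example, while $B$ and $\Omega_\eta$ take the values used in the simulation study of Section 5. Observations are stored as rows, so the array `Y` is $n \times p$.

```python
import numpy as np

def simulate_common_trend(B, omega_eta, omega_eps, n, T0=None, seed=0):
    """Simulate T_{t+1} = T_t + eta_{t+1} and y_{t+1} = B T_t + eps_{t+1}, t = 0..n-1."""
    rng = np.random.default_rng(seed)
    p, m = B.shape
    T = np.zeros((n + 1, m))                      # rows are T_0, ..., T_n
    if T0 is not None:
        T[0] = T0
    eta = rng.multivariate_normal(np.zeros(m), omega_eta, size=n)   # eta_1..eta_n
    eps = rng.multivariate_normal(np.zeros(p), omega_eps, size=n)   # eps_1..eps_n
    T[1:] = T[0] + np.cumsum(eta, axis=0)         # random walk started at T_0
    Y = T[:-1] @ B.T + eps                        # rows are y_1..y_n; y_{t+1} loads on T_t
    return T, Y

# Assumed example values: Omega_eps = I_3 is not taken from the paper.
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
T, Y = simulate_common_trend(B, np.eye(2), np.eye(3), n=100)
```

The one-period lag between the trend and the observation is handled by pairing row `t` of `Y` (that is, $y_{t+1}$) with row `t` of `T` (that is, $T_t$).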
The observations are collected in the matrices $Y_n$, $p \times n$, and $\Delta Y_n$, $p \times (n-1)$, which are defined as
$$Y_n = (y_1, \dots, y_n), \quad \text{and} \quad \Delta Y_n = (y_2 - y_1, \dots, y_n - y_{n-1}).$$
The processes $y_t$ and $T_t$ are obviously nonstationary, but the conditional distribution of $Y_n$ given $T_0$ is well defined. We define
$$E_t T_t = E(T_t \mid Y_t, T_0), \quad V_t = \mathrm{Var}_t(T_t) = \mathrm{Var}(T_t \mid Y_t, T_0).$$
Then the density of $Y_n$ conditional on $T_0$ is given by the prediction error decomposition
$$p(Y_n \mid T_0) = p(y_1 \mid T_0) \prod_{t=1}^{n-1} p(y_{t+1} \mid Y_t, T_0),$$
where $y_{t+1}$ given $(Y_t, T_0)$ is $p$-dimensional Gaussian with mean and variance
$$E_t y_{t+1} = B E_t T_t, \quad \mathrm{Var}_t(y_{t+1}) = B V_t B' + \Omega_\varepsilon.$$
In this model it is clear that $y_t$ and $T_t$ cointegrate, that is, $y_{t+1} - B T_{t+1} = \varepsilon_{t+1} - B \eta_{t+1}$ is stationary, and the same holds for $T_t$ and the extracted trend $E_t T_t = E(T_t \mid y_1, \dots, y_t, T_0)$. Note that in the statistical model defined by (1) and (2) with parameters $B$, $\Omega_\eta$, and $\Omega_\varepsilon$, only the matrices $B \Omega_\eta B'$ and $\Omega_\varepsilon$ are identified because for any $m \times m$ matrix $\xi$ of full rank, $B \xi^{-1}$ and $\xi \Omega_\eta \xi'$ give the same likelihood, by redefining the trend as $\xi T_t$.
Let $\hat{E}_t T_t$ be an estimator of $E_t T_t$. The paper investigates whether there is cointegration between $E_t T_t$ and $\hat{E}_t T_t$ given two different estimation methods: a simple cointegrating regression and the maximum likelihood estimator in an autoregressive representation of the state space model.
Section 2, on the probability analysis of the data generating process, formulates the model as a common trend state space model, and summarizes some results in three Lemmas. Lemma 1 contains the Kalman filter equations and the convergence of $\mathrm{Var}(T_t \mid y_1, \dots, y_t, T_0)$, see Durbin and Koopman (2012), and shows how its limit can be calculated by solving an eigenvalue problem. Lemma 1 also shows how $y_t$ can be represented in terms of its prediction errors $v_j = y_j - E_{j-1} y_j$, $j = 1, \dots, t$. This result is used in Lemma 2 to represent $y_t$ in steady state as an infinite order cointegrated vector autoregressive model, see (Harvey 2006, p. 373). Section 3 discusses the statistical analysis of the data and the identification of the trends and their loadings. Two examples are discussed. In the first example, only $B$ is restricted and the trends are allowed to be correlated. In the second example, $B$ is restricted but the trends are uncorrelated, so that also the variance matrix is restricted. Lemma 3 analyses the data from (1) and (2) using a simple cointegrating regression, see Harvey and Koopman (1997), and shows that the estimator of the coefficient $B$ suitably normalized is $n$-consistent.
Section 4 shows in Theorem 1 that the spread between $B E_t T_t$ and its estimator $\hat{B} \hat{E}_t T_t$ is asymptotically stationary irrespective of the identification of $B$ and $T_t$. Then Theorem 2 shows that the spread between $E_t T_t$ and its estimator $\hat{E}_t T_t$ is asymptotically stationary if and only if $B$ has been identified so that the estimator of $B$ is superconsistent, that is, consistent at a rate faster than $n^{1/2}$.
The findings are illustrated with a small simulation study in Section 5. Data are generated from (1) and (2) with $T_0 = 0$, and the observations are analysed using the cointegrating regression discussed in Lemma 3. If the trends and their coefficients are identified by the trends being independent, the trend extracted by the state space model does not cointegrate with its estimator. If, however, the trends are identified by restrictions on the coefficients alone, they do cointegrate.

2. Probability Analysis of the Data Generating Model

This section first presents two examples, which illustrate the problem to be solved. Then a special parametrization of the common trends model is defined and some, mostly known, results concerning the Kalman filter recursions are given in Lemma 1. Lemma 2 is about the representation of the steady state solution as an autoregressive process. All proofs are given in the Appendix.

2.1. Two Examples

Two examples are given which illustrate the problem investigated. The examples are analysed further by a simulation study in Section 5.
Example 1.
In the first example the two random walks $T_{1t}$ and $T_{2t}$ are allowed to be dependent, so $\Omega_\eta$ is unrestricted, and identifying restrictions are imposed only on their coefficients $B$. The equations are
$$\begin{aligned} y_{1,t+1} &= T_{1t} + \varepsilon_{1,t+1}, \\ y_{2,t+1} &= T_{2t} + \varepsilon_{2,t+1}, \\ y_{3,t+1} &= b_{31} T_{1t} + b_{32} T_{2t} + \varepsilon_{3,t+1}, \end{aligned} \tag{3}$$
for $t = 0, \dots, n-1$. Thus, $y_t = (y_{1t}, y_{2t}, y_{3t})'$, $T_t = (T_{1t}, T_{2t})'$, and
$$B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ b_{31} & b_{32} \end{pmatrix}. \tag{4}$$
Moreover, $\Omega_\eta > 0$ is $2 \times 2$, $\Omega_\varepsilon > 0$ is $3 \times 3$, and both are unrestricted positive definite. Simulations indicate that $E_t y_{t+1} - \hat{E}_t y_{t+1} = B E_t T_t - \hat{B} \hat{E}_t T_t$ is stationary, and this obviously implies that the same holds for the first two coordinates $E_t T_{1t} - \hat{E}_t T_{1t}$ and $E_t T_{2t} - \hat{E}_t T_{2t}$. ■
Example 2.
The second example concerns two independent random walks $T_{1t}$ and $T_{2t}$, and the three observation equations
$$\begin{aligned} y_{1,t+1} &= T_{1t} + \varepsilon_{1,t+1}, \\ y_{2,t+1} &= b_{21} T_{1t} + T_{2t} + \varepsilon_{2,t+1}, \\ y_{3,t+1} &= b_{31} T_{1t} + b_{32} T_{2t} + \varepsilon_{3,t+1}. \end{aligned} \tag{5}$$
In this example
$$B = \begin{pmatrix} 1 & 0 \\ b_{21} & 1 \\ b_{31} & b_{32} \end{pmatrix}, \quad \Omega_\eta = \mathrm{diag}(\sigma_1^2, \sigma_2^2), \tag{6}$$
and $\Omega_\varepsilon > 0$ is $3 \times 3$ and unrestricted positive definite. Thus the nonstationarity is caused by two independent trends. The first, $T_{1t}$, is the cause of the nonstationarity of $y_{1t}$, whereas both trends are causes of the nonstationarity of $(y_{2t}, y_{3t})$. From the first equation it is seen that $y_{1t}$ and $T_{1t}$ cointegrate. It is to be expected that also the extracted trend $E_t T_{1t}$ cointegrates with $T_{1t}$, and also that $E_t T_{1t}$ cointegrates with its estimator $\hat{E}_t T_{1t}$. This is all supported by the simulations. Similarly, it turns out that
$$E_t y_{2,t+1} - \hat{E}_t y_{2,t+1} = b_{21} E_t T_{1t} - \hat{b}_{21} \hat{E}_t T_{1t} + E_t T_{2t} - \hat{E}_t T_{2t}$$
is asymptotically stationary. In this case, however, $E_t T_{2t} - \hat{E}_t T_{2t}$ is not asymptotically stationary, and the paper provides an answer to why this is the case. ■
The problem to be solved is why in the first example cointegration was found between the extracted trends and their estimators, whereas in the second example they do not cointegrate. The answer is that it depends on the way the trends and their coefficients are identified. For some identification schemes the estimator of $B$ is $n$-consistent, and then stationarity of $E_t T_t - \hat{E}_t T_t$ can be proved. But if identification is achieved by imposing restrictions also on the covariance of the trends, as in Example 2, then the estimator for $B$ is only $n^{1/2}$-consistent, and that is not enough to get asymptotic stationarity of $E_t T_t - \hat{E}_t T_t$.

2.2. Formulation of the Model as a Common Trend State Space Model

The common trend state space model with constant coefficients is defined by
$$\alpha_{t+1} = \alpha_t + \eta_t, \quad y_t = B \alpha_t + \varepsilon_t, \tag{7}$$
$t = 1, \dots, n$, see Durbin and Koopman (2012) or Harvey (1989), with initial state $\alpha_1$. Here $\alpha_t$ is the unobserved $m$-dimensional state variable, $y_t$ is the $p$-dimensional observation, and $B$ is $p \times m$ of rank $m < p$. The errors $\varepsilon_t$ and $\eta_t$ are as specified in the discussion of the model given by (1) and (2).
Defining $T_t = \alpha_{t+1}$, $t = 0, \dots, n$, gives the model (1) and (2). Note that in this notation $E_t T_t = E_t \alpha_{t+1}$ is the predicted value of the trend $\alpha_{t+1}$, which means that it is easy to formulate the Kalman filter.
The Kalman filter calculates the prediction $a_{t+1} = E_t \alpha_{t+1}$ and its conditional variance $P_{t+1} = \mathrm{Var}_t(\alpha_{t+1})$ by the equations
$$a_{t+1} = a_t + P_t B' (B P_t B' + \Omega_\varepsilon)^{-1} (y_t - E_{t-1}(y_t)), \tag{8}$$
$$P_{t+1} = P_t + \Omega_\eta - P_t B' (B P_t B' + \Omega_\varepsilon)^{-1} B P_t, \tag{9}$$
starting with $a_1 = \alpha_1$ and $P_1 = 0$.
The recursions (8) and (9) become
$$E_{t+1} T_{t+1} = E_t T_t + K_t (y_{t+1} - E_t y_{t+1}), \tag{10}$$
$$V_{t+1} = \Omega_\eta + V_t - K_t B V_t, \tag{11}$$
$t = 0, \dots, n-1$, starting with $E_1 T_1 = T_0$ and $V_1 = \Omega_\eta$, and defining the Kalman gain
$$K_t = V_t B' (B V_t B' + \Omega_\varepsilon)^{-1}. \tag{12}$$
Lemma 1 contains the result that $V_{t+1}$ converges for $t \to \infty$ to a finite limit $V$, which can be calculated by solving an eigenvalue problem. Equation (11) is an algebraic Riccati equation, see Chan et al. (1984), where the convergence result can be found. The recursion (10) is used to represent $y_{t+1}$ in terms of its cumulated prediction errors $v_{t+1} = y_{t+1} - E_t y_{t+1}$, as noted by Harvey (2006, Section 7.3.2).
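The Kalman filter recursions for the extracted trend, its conditional variance, and the Kalman gain can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `kalman_trend` is hypothetical, the parameters are treated as known, $\Omega_\varepsilon = I_3$ is an assumed value, and the recursion is started one step earlier, at $E_0 T_0 = T_0$ with zero variance, which reproduces $E_1 T_1 = T_0$ and $V_1 = \Omega_\eta$ after the first step.

```python
import numpy as np

def kalman_trend(Y, B, omega_eta, omega_eps, T0):
    """Return E_t T_t (rows t = 0..n) and V_t for observations Y with rows y_1..y_n."""
    n, p = Y.shape
    m = B.shape[1]
    ET = np.zeros((n + 1, m))
    V = np.zeros((n + 1, m, m))
    ET[0] = T0                                        # E_0 T_0 = T_0, V_0 = 0
    for t in range(n):
        S = B @ V[t] @ B.T + omega_eps                # Var_t(y_{t+1})
        K = V[t] @ B.T @ np.linalg.inv(S)             # Kalman gain K_t
        v = Y[t] - B @ ET[t]                          # prediction error v_{t+1}
        ET[t + 1] = ET[t] + K @ v                     # update of the extracted trend
        V[t + 1] = omega_eta + V[t] - K @ B @ V[t]    # Riccati recursion for V_t
    return ET, V

# Assumed example values (Omega_eps = I_3 is not taken from the paper).
rng = np.random.default_rng(0)
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
omega_eta, omega_eps = np.eye(2), np.eye(3)
n = 200
T = np.vstack([np.zeros(2), np.cumsum(rng.standard_normal((n, 2)), axis=0)])  # T_0..T_n
Y = T[:-1] @ B.T + rng.standard_normal((n, 3))                                # y_1..y_n
ET, V = kalman_trend(Y, B, omega_eta, omega_eps, T0=np.zeros(2))
```

In this example the variance sequence settles quickly, so the last two entries of `V` agree to numerical precision, which is the convergence to a finite limit discussed above.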
Lemma 1.
Let $V_t = \mathrm{Var}(T_t \mid Y_t)$ and $E_t T_t = E(T_t \mid Y_t)$.
(a) The recursion for $V_t$, (11), can be expressed as
$$V_{t+1} = \Omega_\eta + V_t - V_t (V_t + \Omega_B)^{-1} V_t \to V, \quad t \to \infty, \tag{13}$$
where $\Omega_B = \mathrm{Var}(\bar{B}' \varepsilon_t \mid B_\perp' \varepsilon_t)$ for $\bar{B} = B (B' B)^{-1}$. Moreover,
$$I_m - K_t B = I_m - V_t B' (B V_t B' + \Omega_\varepsilon)^{-1} B \to I_m - K B = \Omega_B (V + \Omega_B)^{-1}, \quad t \to \infty, \tag{14}$$
which has positive eigenvalues less than one, such that $I_m - K B$ is a contraction, that is, $(I_m - K B)^n \to 0$, $n \to \infty$.
(b) The limit of $V_t$ can be found by solving the eigenvalue problem
$$| \lambda \Omega_B - \Omega_\eta | = 0,$$
for eigenvectors $W$ and eigenvalues $(\lambda_1, \dots, \lambda_m)$, such that $W' \Omega_B W = I_m$ and $W' \Omega_\eta W = \mathrm{diag}(\lambda_1, \dots, \lambda_m)$. Hence, $W' V W = \mathrm{diag}(\tau_1, \dots, \tau_m)$ for
$$\tau_i = \tfrac{1}{2} \{ \lambda_i + (\lambda_i^2 + 4 \lambda_i)^{1/2} \}. \tag{15}$$
(c) Finally, using the prediction error, $v_{t+1} = y_{t+1} - E_t y_{t+1}$, it is found from (10) that
$$E_t T_t = T_0 + \sum_{j=1}^{t} K_{j-1} v_j, \quad \text{and} \quad y_{t+1} = v_{t+1} + B \Big( T_0 + \sum_{j=1}^{t} K_{j-1} v_j \Big). \tag{16}$$
The prediction errors are independent Gaussian with mean zero and variances
$$\mathrm{Var}(v_{t+1}) = \mathrm{Var}_t(y_{t+1}) = \mathrm{Var}_t(B T_t + \varepsilon_{t+1}) = B V_t B' + \Omega_\varepsilon \to B V B' + \Omega_\varepsilon, \quad t \to \infty,$$
such that in steady state the prediction errors are i.i.d. $N_p(0, B V B' + \Omega_\varepsilon)$, and (16) shows that $y_t$ is approximately an $AR(\infty)$ process, for which the reduced form autoregressive representation can be found, see (Harvey 2006, Section 7.3.2).
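The eigenvalue calculation in Lemma 1(b) can be checked numerically. The sketch below is an illustration under assumptions: the function name `steady_state_V` is hypothetical, the complement $B_\perp$ is taken from a full singular value decomposition, the generalized eigenvalue problem is reduced to a symmetric one through a Cholesky factor of $\Omega_B$, and $\Omega_\varepsilon = I_3$ is an assumed value.

```python
import numpy as np

def steady_state_V(B, omega_eta, omega_eps):
    """Limit V of the variance recursion via |lambda * Omega_B - Omega_eta| = 0."""
    p, m = B.shape
    U, _, _ = np.linalg.svd(B)                    # full SVD: last p - m columns span B_perp
    B_perp = U[:, m:]
    B_bar = B @ np.linalg.inv(B.T @ B)
    inner = np.linalg.inv(B_perp.T @ omega_eps @ B_perp)
    M = omega_eps - omega_eps @ B_perp @ inner @ B_perp.T @ omega_eps
    omega_B = B_bar.T @ M @ B_bar                 # Var(B_bar' eps | B_perp' eps)
    L = np.linalg.cholesky(omega_B)               # Omega_B = L L'
    Linv = np.linalg.inv(L)
    lam, Q = np.linalg.eigh(Linv @ omega_eta @ Linv.T)
    W = Linv.T @ Q                                # W' Omega_B W = I, W' Omega_eta W = diag(lam)
    tau = 0.5 * (lam + np.sqrt(lam**2 + 4 * lam))
    Winv = np.linalg.inv(W)
    V = Winv.T @ np.diag(tau) @ Winv              # from W' V W = diag(tau)
    return V, omega_B, lam, tau

# Assumed example values (Omega_eps = I_3 is not taken from the paper).
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
omega_eta, omega_eps = np.eye(2), np.eye(3)
V, omega_B, lam, tau = steady_state_V(B, omega_eta, omega_eps)
K = V @ B.T @ np.linalg.inv(B @ V @ B.T + omega_eps)   # steady-state Kalman gain
```

For these values the eigenvalues are 1 and 1.5, and the resulting `V` satisfies the fixed-point condition $\Omega_\eta = K B V$ implied by the recursion for $V_t$ in steady state.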
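Lemma 2.
If the system (7) is in steady state, prediction errors $v_t$ are i.i.d. $N(0, B V B' + \Omega_\varepsilon)$ and
$$\Delta y_t = \Delta v_t + B K v_{t-1}. \tag{17}$$
Applying the Granger Representation Theorem, $y_t$ is given by
$$\Delta y_t = \alpha \beta' y_{t-1} + \sum_{i=1}^{\infty} \Gamma_i \Delta y_{t-i} + v_t. \tag{18}$$
Here $\alpha = -K_\perp' (B_\perp' K_\perp')^{-1}$ and $\beta = B_\perp$.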
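The differenced representation in Lemma 2 follows from the steady-state version of the prediction error representation of $y_t$ in Lemma 1(c), where the Kalman gain $K_{j-1}$ is replaced by its limit $K$. As a worked step, with $B_\perp$ denoting a $p \times (p-m)$ matrix of full rank satisfying $B_\perp' B = 0$:

$$\begin{aligned} y_t &= v_t + B \Big( T_0 + K \sum_{j=1}^{t-1} v_j \Big), \\ y_{t-1} &= v_{t-1} + B \Big( T_0 + K \sum_{j=1}^{t-2} v_j \Big), \\ \Delta y_t &= v_t - v_{t-1} + B K v_{t-1} = \Delta v_t + B K v_{t-1}. \end{aligned}$$

Premultiplying the first line by $B_\perp'$ removes the trend and gives $B_\perp' y_t = B_\perp' v_t$, which is i.i.d. in steady state. This is why $B_\perp$ plays the role of the cointegrating matrix in the autoregressive representation.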

2.3. Cointegration among the Observations and Trends

In model (1) and (2), the equation $y_{t+1} = B T_t + \varepsilon_{t+1}$ shows that $y_t$ and $T_t$ are cointegrated. It also holds that $T_t - E_t T_t$ is asymptotically stationary because
$$v_{t+1} = y_{t+1} - E_t y_{t+1} = B T_t + \varepsilon_{t+1} - B E_t T_t,$$
which shows that $B (T_t - E_t T_t) = v_{t+1} - \varepsilon_{t+1}$ is asymptotically stationary. Multiplying by $\bar{B}' = (B' B)^{-1} B'$, the same holds for $T_t - E_t T_t$.
In model (18) the extracted trend is
$$T_t^* = \alpha_\perp' \sum_{i=1}^{t} v_i = K \sum_{i=1}^{t} v_i,$$
and (16) shows that in steady state, $y_{t+1} - B T_t^* = v_{t+1} + B T_0$ is stationary, so that $y_t$ cointegrates with $T_t^*$. Thus, the process $y_t$ and the trends $T_t$, $T_t^*$, and $E_t T_t$ all cointegrate, in the sense that suitable linear combinations are asymptotically stationary. The next section investigates when similar results hold for the estimated trends.

3. Statistical Analysis of the Data

In this section it is shown how the parameters of (7) can be estimated from the CVAR (18) using results of Saikkonen (1992) and Saikkonen and Lütkepohl (1996), or using a simple cointegrating regression, see (Harvey and Koopman 1997, p. 276) as discussed in Lemma 3. For both the state space model (1) and (2) and for the CVAR in (18) there is an identification problem between $T_t$ and its coefficient $B$, or between $\beta_\perp$ and $T_t^*$, because for any $m \times m$ matrix $\xi$ of full rank, one can use $B \xi^{-1}$ as parameter and $\xi T_t$ as trend and $\xi \Omega_\eta \xi'$ as variance, and similarly for $\beta_\perp$ and $T_t^*$. In order to estimate $B$, $T_t$, and $\Omega_\eta$, it is therefore necessary to impose identifying restrictions. Examples of such identification are given next.
Identification 1.
Because $B$ has rank $m$, the rows can be permuted such that $B = (B_1', B_2')'$, where $B_1$ is $m \times m$ and has full rank. Then the parameters and trend are redefined as
$$B^\dagger = \begin{pmatrix} I_m \\ B_2 B_1^{-1} \end{pmatrix} = \begin{pmatrix} I_m \\ \gamma' \end{pmatrix}, \quad \Omega_\eta^\dagger = B_1 \Omega_\eta B_1', \quad T_t^\dagger = B_1 T_t. \tag{19}$$
Note that $B^\dagger T_t^\dagger = B T_t$ and $B^\dagger \Omega_\eta^\dagger B^{\dagger\prime} = B \Omega_\eta B'$. This parametrization is the simplest which separates parameters that are $n$-consistently estimated, $\gamma$, from those that are $n^{1/2}$-consistently estimated, $(\Omega_\eta^\dagger, \Omega_\varepsilon)$, see Lemma 3. Note that the (correlated) trends are redefined by choosing $T_{1t}^\dagger$ as the trend in $y_{1t}$, then $T_{2t}^\dagger$ as the trend in $y_{2t}$, as in Example 1.
A more general parametrization, which also gives $n$-consistency, is defined, as in simultaneous equations, by imposing linear restrictions on each of the $m$ columns and requiring that the identification condition holds, see Fisher (1966). ■
Identification 2.
The normalization with diagonality of $\Omega_\eta$ is part of the next identification, because this is the assumption in the discussion of long-run causality. Let $\Omega_\eta^\dagger = C_\eta \, \mathrm{diag}(\sigma_1^2, \dots, \sigma_m^2) \, C_\eta'$ be a Cholesky decomposition of $\Omega_\eta^\dagger$. That is, $C_\eta$ is lower-triangular with one in the diagonal, corresponding to an ordering of the variables. Using this decomposition the new parameters and the trend are
$$B^\# = \begin{pmatrix} C_\eta \\ \gamma' C_\eta \end{pmatrix}, \quad \Omega_\eta^\# = \mathrm{diag}(\sigma_1^2, \dots, \sigma_m^2), \quad T_t^\# = C_\eta^{-1} T_t^\dagger, \tag{20}$$
such that $B^\# T_t^\# = B^\dagger T_t^\dagger = B T_t$ and $B^\# \Omega_\eta^\# B^{\#\prime} = B^\dagger \Omega_\eta^\dagger B^{\dagger\prime} = B \Omega_\eta B'$.
Identification of the trends is achieved in this case by defining the trends to be independent and constraining how they load into the observations. In Example 2, $T_{1t}$ was defined as the trend in $y_{1t}$, and $T_{2t}$ as the trend in $y_{2t}$, but orthogonalized on $T_{1t}$, such that the trend in $y_{2t}$ is a combination of $T_{1t}$ and $T_{2t}$. ■
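The two identification schemes are reparametrizations of the same product of loadings and trend variance, which can be verified numerically. The sketch below is illustrative only: the function names and the value $b_{21} = 0.3$ are assumptions for the example. The first function normalizes the top $m \times m$ block of the loading matrix to the identity, and the second rescales a Cholesky factor to a unit lower-triangular matrix so that the trend variance becomes diagonal.

```python
import numpy as np

def identification_1(B, omega_eta):
    """Normalize the top m x m block B_1 of B to I_m: B B_1^{-1} and B_1 Omega_eta B_1'."""
    m = B.shape[1]
    B1 = B[:m, :]
    return B @ np.linalg.inv(B1), B1 @ omega_eta @ B1.T

def identification_2(B_id1, omega_id1):
    """Unit lower-triangular C with omega_id1 = C diag(s2) C'; loadings become B_id1 C."""
    L = np.linalg.cholesky(omega_id1)     # omega_id1 = L L', L lower triangular
    d = np.diag(L)
    C = L / d                             # divide column j by L[j, j] to get a unit diagonal
    return B_id1 @ C, np.diag(d**2), C

# Hypothetical parameter values in the form of Example 2, with b21 = 0.3 assumed.
B = np.array([[1.0, 0.0], [0.3, 1.0], [0.5, 0.5]])
omega_eta = np.eye(2)
B1_form, omega1 = identification_1(B, omega_eta)
B2_form, omega2, C = identification_2(B1_form, omega1)
```

Starting from independent trends, the first step produces correlated trends with third-row loadings $(b_{31} - b_{32} b_{21}, b_{32}) = (0.35, 0.5)$, and the second step recovers the original loadings and the diagonal variance.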

3.1. The Vector Autoregressive Model

When the process is in steady state, the infinite order CVAR representation is given in (18). The model is approximated by a sequence of finite lag models, depending on sample size $n$,
$$\Delta y_t = \alpha \beta' y_{t-1} + \sum_{i=1}^{k_n} \Gamma_i \Delta y_{t-i} + v_t,$$
where the lag length $k_n$ is chosen to depend on $n$ such that $k_n$ increases to infinity with $n$, but so slowly that $k_n^3 / n$ converges to zero. Thus one can choose for instance $k_n = n^{1/3} / \log n$ or $k_n = n^{1/3 - \epsilon}$, for some $\epsilon > 0$. With this choice of asymptotics, the parameters $\alpha$, $\beta$, $\Gamma = I_p - \sum_{i=1}^{\infty} \Gamma_i$, $\Sigma = \mathrm{Var}(v_t)$, and the residuals, $v_t$, can be estimated consistently, see Johansen and Juselius (2014) for this application of the results of Saikkonen and Lütkepohl (1996).
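To give a feel for how slowly such lag lengths grow, the two rules can be tabulated. This is a small sketch under assumptions: the function names are hypothetical, and rounding down to an integer with a minimum of one lag is a convention chosen here, not something specified in the text.

```python
import math

def lag_length(n, eps=None):
    """Integer lag length from k_n = n^(1/3)/log(n), or k_n = n^(1/3 - eps) if eps is given."""
    k = n ** (1 / 3) / math.log(n) if eps is None else n ** (1 / 3 - eps)
    return max(1, math.floor(k))          # rounding rule is an assumption of this sketch

def rate_ratio(n):
    """k_n^3 / n for the unrounded rule k_n = n^(1/3)/log(n), which equals 1/(log n)^3."""
    return 1.0 / math.log(n) ** 3

for n in (100, 10_000, 1_000_000):
    print(n, lag_length(n), lag_length(n, eps=0.05), rate_ratio(n))
```

With the logarithmic rule the ratio $k_n^3 / n$ equals $1 / (\log n)^3$, which decreases to zero, while the lag length itself grows only from 1 at $n = 100$ to 7 at $n = 10^6$.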
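This defines for each sample size consistent estimators $\breve{\alpha}$, $\breve{\beta}$, $\breve{\Gamma}$ and $\breve{\Sigma}$, as well as residuals $\breve{v}_t$. In particular the estimator of the common trend is $\breve{T}_t^* = \breve{\alpha}_\perp' \sum_{i=1}^{t} \breve{v}_i$. Thus, $\breve{\alpha} \breve{\beta}' \xrightarrow{P} \alpha \beta'$, $\breve{C} = \breve{\beta}_\perp (\breve{\alpha}_\perp' \breve{\Gamma} \breve{\beta}_\perp)^{-1} \breve{\alpha}_\perp' \xrightarrow{P} C = B K$ and $\breve{\Sigma} \xrightarrow{P} \Sigma = B V B' + \Omega_\varepsilon$. If $\beta_\perp$ is identified as $(I_m, \gamma)'$, then $\breve{B} = \breve{\beta}_\perp \xrightarrow{P} \beta_\perp$. In steady state, the relations
$$\Omega_\eta = V B' (B V B' + \Omega_\varepsilon)^{-1} B V = V B' \Sigma^{-1} B V, \quad C = B K = B V B' (B V B' + \Omega_\varepsilon)^{-1} = B V B' \Sigma^{-1},$$
hold, see (11) and Lemma 2. It follows that
$$\breve{B} \breve{\Omega}_\eta \breve{B}' = \breve{C} \breve{\Sigma} \breve{C}' \xrightarrow{P} B \Omega_\eta B', \quad \text{and} \quad \breve{\Omega}_\eta = (\breve{B}' \breve{B})^{-1} \breve{B}' \breve{C} \breve{\Sigma} \breve{C}' \breve{B} (\breve{B}' \breve{B})^{-1} \xrightarrow{P} \Omega_\eta.$$
Finally, an estimator for $\Omega_\varepsilon$ can be found as
$$\breve{\Omega}_\varepsilon = \breve{\Sigma} - \tfrac{1}{2} (\breve{C} \breve{\Sigma} + \breve{\Sigma} \breve{C}') \xrightarrow{P} B V B' + \Omega_\varepsilon - \tfrac{1}{2} (B V B' + B V B') = \Omega_\varepsilon.$$
Note that $C \Sigma$ is not a symmetric matrix in model (18), but its estimator converges in probability towards the symmetric matrix $B V B'$.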

3.2. The State Space Model

The state space model is defined by (1) and (2). It can be analysed using the Kalman filter to calculate the diffuse likelihood function, see Durbin and Koopman (2012), and an optimizing algorithm can be used to find the maximum likelihood estimator for the parameters $\Omega_\eta$, $\Omega_\varepsilon$, and $B$, once $B$ is identified.
In this paper, an estimator is used which is simpler to analyse and which gives an $n$-consistent estimator for $B$ suitably normalized, see (Harvey and Koopman 1997, p. 276).
The estimators are functions of $\Delta Y_n$ and $B_\perp' Y_n$, and therefore do not involve the initial value $T_0$. Irrespective of the identification, the relations
$$\mathrm{Var}(\Delta y_t) = B \Omega_\eta B' + 2 \Omega_\varepsilon, \tag{21}$$
$$\mathrm{Cov}(\Delta y_t, \Delta y_{t+1}) = -\Omega_\varepsilon, \tag{22}$$
hold, and they give rise to two moment estimators, which determine $\Omega_\eta$ and $\Omega_\varepsilon$, once $B$ has been identified and estimated.
Consider the identified parametrization (19), where $B = (I_m, \gamma)'$, and take $B_\perp = (\gamma', -I_{p-m})'$. Then define $z_{1t} = (y_{1t}, \dots, y_{mt})'$ and $z_{2t} = (y_{m+1,t}, \dots, y_{pt})'$, such that $y_t = (z_{1t}', z_{2t}')'$ and $B_\perp' y_t = \gamma' z_{1t} - z_{2t} = B_\perp' \varepsilon_t$, that is,
$$z_{2t} = \gamma' z_{1t} - B_\perp' \varepsilon_t. \tag{23}$$
This equation defines the regression estimator $\hat{\gamma}_{reg}$:
$$\hat{\gamma}_{reg} = \Big( \sum_{t=0}^{n-1} z_{1t} z_{1t}' \Big)^{-1} \sum_{t=0}^{n-1} z_{1t} z_{2t}' = \gamma - \Big( \sum_{t=0}^{n-1} z_{1t} z_{1t}' \Big)^{-1} \sum_{t=0}^{n-1} z_{1t} \varepsilon_t' B_\perp. \tag{24}$$
To describe the asymptotic properties of $\hat{\gamma}_{reg}$, two Brownian motions are introduced
$$n^{-1/2} \sum_{t=1}^{[nu]} \varepsilon_t \xrightarrow{D} W_\varepsilon(u) \quad \text{and} \quad n^{-1/2} \sum_{t=1}^{[nu]} \eta_t \xrightarrow{D} W_\eta(u). \tag{25}$$
Lemma 3.
Let the data be generated by the state space model (1) and (2).
(a) From (21) and (22) it follows that
$$S_{n1} = n^{-1} \sum_{t=1}^{n} \Delta y_t \Delta y_t' \xrightarrow{P} B \Omega_\eta B' + 2 \Omega_\varepsilon, \quad S_{n2} = n^{-1} \sum_{t=2}^{n} (\Delta y_t \Delta y_{t-1}' + \Delta y_{t-1} \Delta y_t') \xrightarrow{P} -2 \Omega_\varepsilon, \tag{26}$$
define $n^{1/2}$-consistent asymptotically Gaussian estimators for $B \Omega_\eta B'$ and $\Omega_\varepsilon$, irrespective of the identification of $B$.
(b) If $B$ and $B_\perp$ are identified as $B = (I_m, \gamma)'$, $B_\perp = (\gamma', -I_{p-m})'$, and $\Omega_\eta$ is adjusted accordingly, then $\hat{\gamma}_{reg}$ in (24) is $n$-consistent with asymptotic Mixed Gaussian distribution
$$n (\hat{\gamma}_{reg} - \gamma) = -n (\hat{B} - B)' B_\perp = n B' (\hat{B}_\perp - B_\perp) \xrightarrow{D} -\Big( \int_0^1 W_\eta W_\eta' \, du \Big)^{-1} \int_0^1 W_\eta (dW_\varepsilon)' B_\perp. \tag{27}$$
(c) If $B$ is identified as $B = (C_\eta', C_\eta' \gamma)'$, $B_\perp = (\gamma', -I_{p-m})'$, and $\Omega_\eta = \mathrm{diag}(\sigma_1^2, \dots, \sigma_m^2)$, then $\hat{B} - B = O_P(n^{-1/2})$, but (27) still holds for $-n (\hat{B} - B)' B_\perp = \hat{C}_\eta' (\hat{\gamma}_{reg} - \gamma)$, so that some linear combinations of $\hat{B}$ are $n$-consistent.
Note that the parameters $B = (I_m, \gamma)'$, $\Omega_\eta$, and $\Omega_\varepsilon$ can be estimated consistently from (24) and (26) by
$$\hat{B} = \begin{pmatrix} I_m \\ \hat{\gamma}_{reg}' \end{pmatrix}, \quad \hat{\Omega}_\varepsilon = -\tfrac{1}{2} S_{n2}, \quad \text{and} \quad \hat{\Omega}_\eta = (\hat{B}' \hat{B})^{-1} \hat{B}' (S_{n1} + S_{n2}) \hat{B} (\hat{B}' \hat{B})^{-1}. \tag{28}$$
In the simulations of Examples 1 and 2 the initial value is $T_0 = 0$, so the Kalman filter with $T_0 = 0$ is used to calculate the extracted trend $E_t T_t$ using observations and known parameters. Similarly the estimator of the extracted trend $\hat{E}_t T_t$ is calculated using observations and estimated parameters based on Lemma 3. The next section investigates to what extent these estimated trends cointegrate with the extracted trends, and if they cointegrate with each other.
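The moment and regression estimators of Lemma 3 can be sketched as follows for the parametrization in which the first $m$ rows of $B$ form an identity matrix. This is an illustration under assumptions, not the authors' code: the function name `estimate_common_trend`, the seed, the large sample size, and the value $\Omega_\varepsilon = I_3$ are chosen for the example. The sign convention uses the fact that the first-order autocovariance of $\Delta y_t$ equals minus the measurement error variance.

```python
import numpy as np

def estimate_common_trend(Y, m):
    """Estimate B = (I_m, gamma)', Omega_eta and Omega_eps from Y with rows y_1..y_n."""
    n, p = Y.shape
    dY = np.diff(Y, axis=0)                              # rows Delta y_2..Delta y_n
    S1 = dY.T @ dY / n                                   # estimates B Omega_eta B' + 2 Omega_eps
    cross = dY[1:].T @ dY[:-1]                           # sum of Delta y_t Delta y_{t-1}'
    S2 = (cross + cross.T) / n                           # estimates -2 Omega_eps
    z1, z2 = Y[:, :m], Y[:, m:]
    gamma_hat = np.linalg.solve(z1.T @ z1, z1.T @ z2)    # regression of z2 on z1, m x (p - m)
    B_hat = np.vstack([np.eye(m), gamma_hat.T])
    omega_eps_hat = -0.5 * S2
    B_bar = B_hat @ np.linalg.inv(B_hat.T @ B_hat)
    omega_eta_hat = B_bar.T @ (S1 + S2) @ B_bar
    return B_hat, omega_eta_hat, omega_eps_hat

# Assumed example: parameters of Example 1, Omega_eps = I_3, and a large n to show consistency.
rng = np.random.default_rng(1)
n = 20000
B = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
T = np.vstack([np.zeros(2), np.cumsum(rng.standard_normal((n, 2)), axis=0)])  # T_0..T_n
Y = T[:-1] @ B.T + rng.standard_normal((n, 3))                                # y_1..y_n
B_hat, omega_eta_hat, omega_eps_hat = estimate_common_trend(Y, m=2)
```

In a sample of this size the estimated loadings should be much closer to their true values than the estimated variances, reflecting the faster rate of convergence of the regression estimator.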

4. Cointegration between Trends and Their Estimators

This section gives the main results in two theorems with proofs in the Appendix. In Theorem 1 it is shown, using the state space model to extract the trends and the estimator from Lemma 3, that $B E_t T_t - \hat{B} \hat{E}_t T_t$ is asymptotically stationary. For the CVAR model it holds that $B T_t^* - \breve{B} \breve{T}_t^* \xrightarrow{P} 0$, such that this spread is asymptotically stationary. Finally, the estimated trends in the two models are compared, and it is shown that $\hat{B} \hat{E}_t T_t - \breve{B} \breve{T}_t^*$ is asymptotically stationary. The conclusion is that in terms of cointegration of the trends and their estimators, it does not matter which model is used to extract the trends, as long as the focus is on the identified trends $B T_t$ and $B T_t^*$.
Theorem 1.
Let $y_t$ and $T_t$ be generated by the DGP given in (1) and (2).
(a) If the state space model is used to extract the trends, and Lemma 3 is used for estimation, then $B E_t T_t - \hat{B} \hat{E}_t T_t$ is asymptotically stationary.
(b) If the vector autoregressive model is used to extract the trends and for estimation, then $B T_t^* - \breve{B} \breve{T}_t^* \xrightarrow{P} 0$.
(c) Under the assumptions of (a) and (b), it holds that $\hat{B} \hat{E}_t T_t - \breve{B} \breve{T}_t^*$ is asymptotically stationary.
In Theorem 2 a necessary and sufficient condition for asymptotic stationarity of $T_t^* - \breve{T}_t^*$, $E_t T_t - \hat{E}_t T_t$, and $\hat{E}_t T_t - \breve{T}_t^*$ is given.
Theorem 2.
In the notation of Theorem 1, any of the spreads $T_t^* - \breve{T}_t^*$, $E_t T_t - \hat{E}_t T_t$ or $\hat{E}_t T_t - \breve{T}_t^*$ is asymptotically stationary if and only if $B$ and the trend are identified such that the corresponding estimator for $B$ satisfies $n^{1/2} (\hat{B} - B) = o_P(1)$ and $n^{1/2} (\breve{B} - B) = o_P(1)$.
The missing cointegration between $E_t T_t$ and $\hat{E}_t T_t$, say, can be explained in terms of the identity
$$\hat{B} (E_t T_t - \hat{E}_t T_t) = (\hat{B} - B) E_t T_t + (B E_t T_t - \hat{B} \hat{E}_t T_t).$$
Here the second term, $B E_t T_t - \hat{B} \hat{E}_t T_t$, is asymptotically stationary by Theorem 1(a). But the first term, $(\hat{B} - B) E_t T_t$, is not necessarily asymptotically stationary, because in general, that is, depending on the identification of the trend and $B$, it holds that $\hat{B} - B = O_P(n^{-1/2})$ and $E_t T_t = O_P(n^{1/2})$, see (16).
The parametrization $B = (I_m, \gamma)'$ ensures $n$-consistency of $\hat{B}$, so there is asymptotic stationarity of $T_t^* - \breve{T}_t^*$, $E_t T_t - \hat{E}_t T_t$, and $\hat{E}_t T_t - \breve{T}_t^*$ in this case. This is not so surprising because
$$B E_t T_t - \hat{B} \hat{E}_t T_t = \begin{pmatrix} E_t T_t - \hat{E}_t T_t \\ \gamma' E_t T_t - \hat{\gamma}' \hat{E}_t T_t \end{pmatrix}$$
is asymptotically stationary. Another situation where the estimator for $B$ is $n$-consistent is if $B = (B_1, \dots, B_m)$ satisfies linear restrictions on the columns, $R_i' B_i = 0$, or equivalently $B_i = R_{i\perp} \phi_i$ for some $\phi_i$, and the condition for identification is satisfied
$$\mathrm{rank} \{ R_i' (R_{1\perp} \phi_1, \dots, R_{m\perp} \phi_m) \} = r - 1, \quad \text{for } i = 1, \dots, m, \tag{29}$$
see Fisher (1966). For a just-identified system, one can still use $\hat{\gamma}_{reg}$, and then solve for the identified parameters. For overidentified systems, the parameters can be estimated by a nonlinear regression of $z_{2t}$ on $z_{1t}$ reflecting the overidentified parametrization. In either case the estimator is $n$-consistent such that $T_t^* - \breve{T}_t^*$, $E_t T_t - \hat{E}_t T_t$, and $\hat{E}_t T_t - \breve{T}_t^*$ are asymptotically stationary.
If the identification involves the variance $\Omega_\eta$, however, the estimator of $B$ is only $n^{1/2}$-consistent, and hence no cointegration is found between the trend and estimated trend.
The analogy with the results for the CVAR, where $\beta$ and $\alpha$ need to be identified, is that if $\beta$ is identified using linear restrictions (29) then $\hat{\beta}$ is $n$-consistent, whereas if $\beta$ is identified by restrictions on $\alpha$ then $\hat{\beta}$ is $n^{1/2}$-consistent. An example of the latter is if $\beta$ is identified as the first $m$ rows of the matrix $\Pi = \alpha \beta'$, corresponding to $\alpha = (I_m, \phi)'$, then $\hat{\beta}$ is $n^{1/2}$-consistent and asymptotically Gaussian, see (Johansen 2010, Section 4.3).

5. A Small Simulation Study

The two examples introduced in Section 2.1 are analysed by simulation. The equations are given in (3) and (5). Both examples have $p = 3$ and $m = 2$. The parameters $B$ and $\Omega_\eta$ contain 6 + 3 parameters, but the $3 \times 3$ matrix $B \Omega_\eta B'$ is of rank 2 and has only 5 estimable parameters. Thus, 4 restrictions must be imposed to identify the parameters. In both examples the Kalman filter with $T_0 = 0$ is used to extract the trends, and the cointegrating regression in Lemma 3 is used to estimate the parameters.
Example 1 continued.
The parameter $B$ is given in (4), and the parameters are just-identified. Now
$$E_t B T_t - \hat{E}_t \hat{B} T_t = \begin{pmatrix} E_t T_{1t} - \hat{E}_t T_{1t} \\ E_t T_{2t} - \hat{E}_t T_{2t} \\ b_{31} E_t T_{1t} + b_{32} E_t T_{2t} - \hat{b}_{31} \hat{E}_t T_{1t} - \hat{b}_{32} \hat{E}_t T_{2t} \end{pmatrix}. \tag{30}$$
As $E_t T_{1t} - \hat{E}_t T_{1t}$ and $E_t T_{2t} - \hat{E}_t T_{2t}$ are the first two rows of $E_t B T_t - \hat{E}_t \hat{B} T_t$ in (30), they are both asymptotically stationary by Theorem 1(a).
To illustrate the results, data are simulated with $n = 100$ observations starting with $T_0 = 0$ and parameter values $b_{31} = b_{32} = 0.5$, $\sigma_1^2 = \sigma_2^2 = 1$, and $\sigma_{12} = 0$, such that
$$B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0.5 & 0.5 \end{pmatrix}, \quad \Omega_\eta = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{31}$$
The parameters are estimated by (28) and the estimates become $\hat{b}_{31} = 0.48$, $\hat{b}_{32} = 0.41$, $\hat{\sigma}_1^2 = 0.93$, $\hat{\sigma}_{12} = 0.26$, and $\hat{\sigma}_2^2 = 1.63$. The extracted and estimated trends are plotted in Figure 1. Panels a and b show plots of $(E_t T_{1t}, \hat{E}_t T_{1t})$ and $(E_t T_{2t}, \hat{E}_t T_{2t})$, respectively, and it is seen that they co-move. In panels c and d the differences $\hat{E}_t T_{1t} - E_t T_{1t}$ and $\hat{E}_t T_{2t} - E_t T_{2t}$ both appear to be stationary in this parametrization of the model. ■
Example 2 continued.
The parameter $B$ in this example is given in (6) such that
$$E_t B T_t - \hat{E}_t \hat{B} T_t = \begin{pmatrix} E_t T_{1t} - \hat{E}_t T_{1t} \\ b_{21} E_t T_{1t} + E_t T_{2t} - \hat{b}_{21} \hat{E}_t T_{1t} - \hat{E}_t T_{2t} \\ b_{31} E_t T_{1t} + b_{32} E_t T_{2t} - \hat{b}_{31} \hat{E}_t T_{1t} - \hat{b}_{32} \hat{E}_t T_{2t} \end{pmatrix}. \tag{32}$$
By the results in Theorem 1(a), all three rows are asymptotically stationary, in particular $E_t T_{1t} - \hat{E}_t T_{1t}$. Moreover, the second row of (32), $(b_{21} E_t T_{1t} - \hat{b}_{21} \hat{E}_t T_{1t}) + (E_t T_{2t} - \hat{E}_t T_{2t})$, is asymptotically stationary. Thus, asymptotic stationarity of $E_t T_{2t} - \hat{E}_t T_{2t}$ requires asymptotic stationarity of the term
$$b_{21} E_t T_{1t} - \hat{b}_{21} \hat{E}_t T_{1t} = (b_{21} - \hat{b}_{21}) E_t T_{1t} + \hat{b}_{21} (E_t T_{1t} - \hat{E}_t T_{1t}). \tag{33}$$
Here, the second term, $\hat{b}_{21} (E_t T_{1t} - \hat{E}_t T_{1t})$, is asymptotically stationary because $E_t T_{1t} - \hat{E}_t T_{1t}$ is. However, the first term, $(b_{21} - \hat{b}_{21}) E_t T_{1t}$, is not asymptotically stationary because $\hat{b}_{21}$ is only $n^{1/2}$-consistent. In this case $n^{1/2} (b_{21} - \hat{b}_{21}) \xrightarrow{D} Z$, which has a Gaussian distribution, and $n^{-1/2} E_{[nu]} T_{1[nu]} \xrightarrow{D} W_{\eta_1}(u)$, where $W_{\eta_1}$ is the Brownian motion generated by the sum of $\eta_{1t}$. It follows that their product
$$(b_{21} - \hat{b}_{21}) E_{[nu]} T_{1[nu]} = \{ n^{1/2} (b_{21} - \hat{b}_{21}) \} \{ n^{-1/2} E_{[nu]} T_{1[nu]} \}$$
converges in distribution to the product of $Z$ and $W_{\eta_1}(u)$ as $n \to \infty$, and this limit is nonstationary. It follows that $E_t T_{2t} - \hat{E}_t T_{2t}$ is not asymptotically stationary for the identification in this example. This argument is a special case of the proof of Theorem 2.
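The order-of-magnitude argument can be illustrated with a stylized Monte Carlo experiment. The sketch below is not a simulation of the model itself: it replaces the estimation error by an independent Gaussian draw divided by $n^{1/2}$ and the extracted trend by a plain random walk, both of which are simplifying assumptions, and it tracks the cross-replication variance of their product over time.

```python
import numpy as np

rng = np.random.default_rng(0)
reps, n = 20000, 100
Z = rng.standard_normal(reps)                               # stand-in for the limit of the scaled error
walk = np.cumsum(rng.standard_normal((reps, n)), axis=1)    # stand-in for the extracted trend
product = (Z[:, None] / np.sqrt(n)) * walk                  # error of order n^{-1/2} times the trend
var_by_t = product.var(axis=0)                              # variance across replications at each t
```

Under these assumptions the variance of the product at time $t$ is $t / n$, so it grows linearly through the sample and reaches about one at $t = n$ instead of vanishing, which is the kind of behaviour expected from a nonstationary spread.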
To illustrate the results, data are simulated from the model with $n = 100$ observations starting with $T_0 = 0$ and parameter values $b_{21} = 0.0$, $b_{31} = b_{32} = 0.5$, and $\sigma_1^2 = \sigma_2^2 = 1$, which is identical to (31).
The model is written in the form (19) with a transformed $B$ and $\Omega_\eta$, as
$$B^\dagger = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ b_{31} - b_{32} b_{21} & b_{32} \end{pmatrix}, \quad \Omega_\eta^\dagger = \begin{pmatrix} \sigma_1^2 & b_{21} \sigma_1^2 \\ b_{21} \sigma_1^2 & \sigma_2^2 + b_{21}^2 \sigma_1^2 \end{pmatrix}.$$
The parameters are estimated as in Example 1 and we find $\hat{b}_{31} - \hat{b}_{32} \hat{b}_{21} = 0.48$, $\hat{b}_{32} = 0.41$, $\hat{\sigma}_1^2 = 0.93$, $\hat{b}_{21} \hat{\sigma}_1^2 = 0.26$, and $\hat{\sigma}_2^2 + \hat{b}_{21}^2 \hat{\sigma}_1^2 = 1.63$, which are solved for $\hat{b}_{21} = 0.28$, $\hat{b}_{31} = 0.59$, $\hat{b}_{32} = 0.41$, $\hat{\sigma}_1^2 = 0.93$, and $\hat{\sigma}_2^2 = 1.56$. The extracted and estimated trends are plotted in Figure 2. Panels a and b show plots of $(E_t T_{1t}, \hat{E}_t T_{1t})$ and $(E_t T_{2t}, \hat{E}_t T_{2t})$, respectively. It is seen that $E_t T_{1t}$ and $\hat{E}_t T_{1t}$ co-move, whereas $E_t T_{2t}$ and $\hat{E}_t T_{2t}$ do not co-move. In panels c and d, the differences $E_t T_{1t} - \hat{E}_t T_{1t}$ and $E_t T_{2t} - \hat{E}_t T_{2t}$ are plotted. Note that the first looks stationary, whereas the second is clearly nonstationary. When comparing with the plot of $E_t T_{1t}$ in panel a, it appears that the process $\hat{E}_t T_{1t}$ can explain the nonstationarity of $E_t T_{2t} - \hat{E}_t T_{2t}$. This is consistent with Equation (33) with $b_{21} = 0$ and $\hat{b}_{21} = 0.28$. In panel d, $E_t T_{2t} - \hat{E}_t T_{2t} - 0.28 \hat{E}_t T_{1t}$ is plotted and it is indeed stationary. ■
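The step from the estimates in the transformed parametrization to the estimates of the parameters of Example 2 is simple arithmetic, reproduced here as a check. The variable names are chosen for the illustration, and the inputs are the rounded estimates reported above.

```python
# Reported estimates in the transformed parametrization (Example 2 continued).
a31 = 0.48        # estimate of b31 - b32 * b21
b32_hat = 0.41    # estimate of b32
s11 = 0.93        # estimate of sigma_1^2
s12 = 0.26        # estimate of b21 * sigma_1^2
s22 = 1.63        # estimate of sigma_2^2 + b21^2 * sigma_1^2

b21_hat = s12 / s11                         # solve the off-diagonal element for b21
b31_hat = a31 + b32_hat * b21_hat           # undo the transformation of the third row
sigma2_sq_hat = s22 - b21_hat**2 * s11      # remove the part of the variance due to T_1

print(round(b21_hat, 2), round(b31_hat, 2), round(sigma2_sq_hat, 2))  # prints: 0.28 0.59 1.56
```

The results agree with the solved values reported in the example.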

6. Conclusions

The paper analyses a sample of $n$ observations from a common trend model, where the state is an unobserved multivariate random walk and the observation is a linear combination of the lagged state variable and a noise term. For such a model, the trends and their coefficients in the observation equation need to be identified before they can be estimated separately. The model leads naturally to cointegration between observations, trends, and the extracted trends. Using simulations, it was discovered that the extracted trends do not necessarily cointegrate with their estimators. This problem is investigated, and it is found to be related to the identification of the trends and their coefficients in the observation equation. It is shown in Theorem 1 that, provided only the linear combinations of the trends from the observation equation are considered, there is always cointegration between extracted trends and their estimators. If the trends and their coefficients are defined by identifying restrictions, the same result holds if and only if the estimated identified coefficients in the observation equation are consistent at a rate faster than $n^{1/2}$. For the causality study mentioned in the introduction, where the components of the unobserved trend are assumed independent, the result has the following implication: For the individual extracted trends to cointegrate with their estimators, overidentifying restrictions must be imposed on the trend’s causal impact coefficients on the observations, such that the estimators of these become super-consistent.

Acknowledgments

S.J. is grateful to CREATES—Center for Research in Econometric Analysis of Time Series (DNRF78), funded by the Danish National Research Foundation. M.N.T. is grateful to the Carlsberg Foundation (grant reference 2013_01_0972). We have benefitted from discussions with Siem Jan Koopman and Eric Hillebrand on state space models and thankfully acknowledge the insightful comments from two anonymous referees.

Author Contributions

S.J. has contributed most of the mathematical derivations. M.N.T. has performed the simulations and posed the problem to be solved.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Lemma 1
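Proof of (a): Let $N = (\bar{B}, B_\perp)$, $\bar{B} = B(B'B)^{-1}$, such that
$$K_t B = V_t B'[B V_t B' + \Omega_\varepsilon]^{-1} B = V_t B' N [N' B V_t B' N + N' \Omega_\varepsilon N]^{-1} N' B = V_t \begin{pmatrix} I_m & 0 \end{pmatrix} \begin{pmatrix} V_t + \bar{B}'\Omega_\varepsilon \bar{B} & \bar{B}'\Omega_\varepsilon B_\perp \\ B_\perp'\Omega_\varepsilon \bar{B} & B_\perp'\Omega_\varepsilon B_\perp \end{pmatrix}^{-1} \begin{pmatrix} I_m \\ 0 \end{pmatrix} = V_t (V_t + \Omega_B)^{-1},$$
where
$$\Omega_B = \bar{B}'[\Omega_\varepsilon - \Omega_\varepsilon B_\perp (B_\perp' \Omega_\varepsilon B_\perp)^{-1} B_\perp' \Omega_\varepsilon]\bar{B} = Var(\bar{B}'\varepsilon_t \mid B_\perp'\varepsilon_t).$$
This proves (13) and (14).
Proof of (b): If the recursion starts with $V_1 = \Omega_\eta$, then $V_t$ can be diagonalized by $W$ for all $t$ and the limit satisfies $W'VW = diag(\tau_1, \ldots, \tau_m)$, where
$$\tau_i = \lambda_i + \tau_i - \frac{\tau_i^2}{1 + \tau_i}.$$
This has solution given in (15).
Proof of (c): The first result follows by summation from the recursion for $E_t T_t$ in (10), and the second from $y_{t+1} = v_{t+1} + B E_t T_t$.   ■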
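The two identities in parts (a) and (b) can be checked numerically. The following is a minimal sketch, not the authors' code: the dimensions, the random parameter matrices and all variable names are hypothetical, and the covariance recursion is assumed to be $V_{t+1} = \Omega_\eta + V_t - K_t B V_t$, the form consistent with the fixed-point equation for $\tau_i$. The matrix $W$ is built so that $W'\Omega_B W = I_m$ and $W'\Omega_\eta W = diag(\lambda_1, \ldots, \lambda_m)$, and $\tau_i$ is taken as the positive root of $\tau^2 - \lambda_i \tau - \lambda_i = 0$, which is the fixed-point equation rearranged.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 4, 2  # hypothetical dimensions: p observations, m trends


def random_spd(k):
    """Return a random symmetric positive definite k x k matrix."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


B = rng.standard_normal((p, m))  # loadings, full column rank almost surely
omega_eps = random_spd(p)        # Var(eps_t), illustrative values
omega_eta = random_spd(m)        # Var(eta_t), illustrative values

# B_perp: orthonormal basis of the orthogonal complement of span(B).
u, _, _ = np.linalg.svd(B, full_matrices=True)
B_perp = u[:, m:]
B_bar = B @ np.linalg.inv(B.T @ B)

# Omega_B = Var(B_bar' eps_t | B_perp' eps_t).
proj = (omega_eps @ B_perp
        @ np.linalg.inv(B_perp.T @ omega_eps @ B_perp)
        @ B_perp.T @ omega_eps)
omega_B = B_bar.T @ (omega_eps - proj) @ B_bar


def gain(v):
    """Kalman gain K_t = V_t B' (B V_t B' + Omega_eps)^{-1}."""
    return v @ B.T @ np.linalg.inv(B @ v @ B.T + omega_eps)


# Part (a): K_t B = V_t (V_t + Omega_B)^{-1}, checked at V_1 = Omega_eta.
V = omega_eta.copy()
err_a = np.max(np.abs(gain(V) @ B - V @ np.linalg.inv(V + omega_B)))

# Part (b): iterate the (assumed) recursion V_{t+1} = Omega_eta + V_t - K_t B V_t.
for _ in range(500):
    V = omega_eta + V - gain(V) @ B @ V

# W satisfies W' Omega_B W = I_m and W' Omega_eta W = diag(lam).
L = np.linalg.cholesky(omega_B)
L_inv = np.linalg.inv(L)
lam, Q = np.linalg.eigh(L_inv @ omega_eta @ L_inv.T)
W = L_inv.T @ Q

# Positive root of tau^2 - lam * tau - lam = 0.
tau = (lam + np.sqrt(lam**2 + 4 * lam)) / 2
err_b = np.max(np.abs(W.T @ V @ W - np.diag(tau)))

print(f"part (a) error: {err_a:.2e}, part (b) error: {err_b:.2e}")
```

Both reported errors should be at the level of floating-point rounding; the check says nothing about equations (13) to (15) beyond what is displayed in this proof.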
Proof of Lemma 2
The polynomial $\Phi(z) = I_p - z(I_p - BK)$ describes (17) as
$$(1 - L) y_t = \Phi(L) v_t.$$
Note that $\Phi(1) = BK$ is singular and that $d\Phi(z)/dz|_{z=1} = BK - I_p = BVB'(BVB' + \Omega_\varepsilon)^{-1} - I_p$ satisfies that $B_\perp'(BK - I_p)K_\perp = -B_\perp'\Omega_\varepsilon B_\perp$ is nonsingular, where $K_\perp = (BVB' + \Omega_\varepsilon)B_\perp$. This means that the Granger Representation Theorem (Johansen 1996, Theorem 4.5) can be applied and gives the expansion (18) for $\alpha = -K_\perp(B_\perp'K_\perp)^{-1}$ and $\beta = B_\perp$.      ■
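The algebra in this proof can be illustrated numerically. The sketch below uses hypothetical dimensions and randomly drawn parameters, and obtains the steady-state $V$ by iterating the recursion $V_{t+1} = \Omega_\eta + V_t - K_t B V_t$ (an assumed form of the Kalman covariance update). It checks that $\Phi(1) = BK$ has rank $m$, that $K K_\perp = 0$, and that $B_\perp'(BK - I_p)K_\perp = -B_\perp'\Omega_\varepsilon B_\perp$. It also evaluates $(1 - z)\Phi(z)^{-1}$ close to $z = 1$ and compares it with $K_\perp(B_\perp'K_\perp)^{-1}B_\perp'$; under the sign convention $\Pi(1) = -\alpha\beta'$, which is an assumption about how (18) is written, this corresponds to $\alpha = -K_\perp(B_\perp'K_\perp)^{-1}$ and $\beta = B_\perp$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 4, 2  # hypothetical dimensions


def random_spd(k):
    """Return a random symmetric positive definite k x k matrix."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


B = rng.standard_normal((p, m))
omega_eps = random_spd(p)
omega_eta = random_spd(m)

# B_perp: orthonormal basis of the orthogonal complement of span(B).
u, _, _ = np.linalg.svd(B, full_matrices=True)
B_perp = u[:, m:]

# Steady-state V from the (assumed) recursion V <- Omega_eta + V - K B V.
V = omega_eta.copy()
for _ in range(500):
    K = V @ B.T @ np.linalg.inv(B @ V @ B.T + omega_eps)
    V = omega_eta + V - K @ B @ V

S = B @ V @ B.T + omega_eps
K = V @ B.T @ np.linalg.inv(S)
K_perp = S @ B_perp
I_p = np.eye(p)

# Phi(1) = BK is singular with rank m.
rank_phi1 = np.linalg.matrix_rank(B @ K)

# K K_perp = 0 and B_perp'(BK - I_p)K_perp = -B_perp' Omega_eps B_perp.
err_orth = np.max(np.abs(K @ K_perp))
err_ident = np.max(np.abs(
    B_perp.T @ (B @ K - I_p) @ K_perp + B_perp.T @ omega_eps @ B_perp
))

# (1 - z) Phi(z)^{-1} near z = 1, with Phi(z) = I_p - z (I_p - BK).
z = 1 - 1e-6
Pi1_numeric = (1 - z) * np.linalg.inv(I_p - z * (I_p - B @ K))
Pi1_formula = K_perp @ np.linalg.inv(B_perp.T @ K_perp) @ B_perp.T
err_pi = np.max(np.abs(Pi1_numeric - Pi1_formula))

print(rank_phi1, err_orth, err_ident, err_pi)
```

The last comparison is only approximate because $z$ is close to, not equal to, one, so its error is of the order of $1 - z$ rather than of rounding.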
Proof of Lemma 3
Proof of (a): Consider first the product moments (21) and (22). The result (26) follows from the Law of Large Numbers, and the asymptotic Gaussian distribution of $\hat{\Omega}_\varepsilon = -\frac{1}{2} S_{n2}$ and $\hat{\Omega}_\eta = \hat{\bar{B}}'(S_{1n} + S_{n2})\hat{\bar{B}}$ follows from the Central Limit Theorem.
Proof of (b): It follows from (23), (24), and (25) that the least squares estimator $\hat{\gamma}_{reg}$ satisfies (27). Let $B_\perp = (\gamma, -I_{p-m})'$, then
$$(\hat{B}_\perp - B_\perp)'B = \hat{\gamma}_{reg} - \gamma = -B_\perp'(\hat{B} - B).$$
Proof of (c): Note that for the other parametrization, (20), where $B' = (C_\eta', C_\eta'\gamma')$, it holds that $B_\perp = (\gamma, -I_{p-m})'$, such that (27) holds for both parametrizations. The estimator of $B$ in the parametrization (20) is $\hat{B}' = (\hat{C}_\eta', \hat{C}_\eta'\hat{\gamma}')$, where $\hat{C}_\eta$ is derived from the $n^{1/2}$-consistent estimator of $\Omega_\eta$, such that for this parametrization, estimation of $B$ is not $n$-consistent, but only $n^{1/2}$-consistent, and $\hat{B} - B = O_P(n^{-1/2})$.      ■
Proof of Theorem 1.
Proof of (a): Let $w_t = B T_t - \hat{B}\hat{E}_t T_t$, then
$$B E_t T_t - \hat{B}\hat{E}_t T_t = B(E_t T_t - T_t) + (B T_t - \hat{B}\hat{E}_t T_t) = B(E_t T_t - T_t) + w_t.$$
Here $B(E_t T_t - T_t)$ is stationary, so that it is enough to show that $w_t$ is asymptotically stationary. From the definition of $T_{t+1}$ and the Kalman filter recursion (10) calculated for $T_0 = 0$ and for the estimated parameters, it holds that
$$B T_{t+1} = B T_t + B\eta_{t+1}, \qquad \hat{B}\hat{E}_{t+1} T_{t+1} = \hat{B}\hat{E}_t T_t + \hat{B}\hat{K}_t(y_{t+1} - \hat{B}\hat{E}_t T_t).$$
Subtracting the expressions gives
$$B T_{t+1} - \hat{B}\hat{E}_{t+1} T_{t+1} = B T_t + B\eta_{t+1} - \hat{B}\hat{E}_t T_t - \hat{B}\hat{K}_t(y_{t+1} - \hat{B}\hat{E}_t T_t) = B T_t - \hat{B}\hat{E}_t T_t - \hat{B}\hat{K}_t(B T_t + \varepsilon_{t+1} - \hat{B}\hat{E}_t T_t) + B\eta_{t+1},$$
which gives the recursion
$$w_{t+1} = (I_p - \hat{B}\hat{K}_t) w_t - \hat{B}\hat{K}_t \varepsilon_{t+1} + B\eta_{t+1}. \qquad \text{(A2)}$$
Note that $(I_p - \hat{B}\hat{K}_t)$ is not a contraction, because $p - m$ eigenvalues are one. Hence it is first proved that $\hat{B}_\perp' w_t$ is small and then a contraction is found for $\hat{\bar{B}}' w_t$. From the definition of $w_t$, it follows from (27) that
$$\hat{B}_\perp' w_t = \hat{B}_\perp' B T_t = \hat{B}_\perp'(B - \hat{B}) T_t = -(B_\perp - \hat{B}_\perp)'\hat{B} T_t = O_P(n^{-1}) O_P(n^{1/2}) = O_P(n^{-1/2}).$$
Next define $\hat{\bar{B}} = \hat{B}(\hat{B}'\hat{B})^{-1}$ and $\hat{\bar{B}}_\perp = \hat{B}_\perp(\hat{B}_\perp'\hat{B}_\perp)^{-1}$, such that $I_p = \hat{B}\hat{\bar{B}}' + \hat{B}_\perp\hat{\bar{B}}_\perp'$. From (A2) it follows by multiplying by $\hat{\bar{B}}'$ and using $\hat{\bar{B}}' B = \hat{\bar{B}}'(B - \hat{B}) + I_m = I_m + O_P(n^{-1/2})$, that
$$\hat{\bar{B}}' w_{t+1} = (\hat{\bar{B}}' - \hat{K}_t) w_t - \hat{K}_t \varepsilon_{t+1} + \hat{\bar{B}}' B \eta_{t+1}$$
$$= (\hat{\bar{B}}' - \hat{K}_t)(\hat{B}\hat{\bar{B}}' + \hat{B}_\perp\hat{\bar{B}}_\perp') w_t - \hat{K}_t \varepsilon_{t+1} + \eta_{t+1} + \hat{\bar{B}}'(B - \hat{B})\eta_{t+1}$$
$$= (I_m - \hat{K}_t\hat{B})\hat{\bar{B}}' w_t - \hat{K}_t \varepsilon_{t+1} + \eta_{t+1} + O_P(n^{-1/2}),$$
because $\hat{\bar{B}}'(B - \hat{B})\eta_{t+1} = O_P(n^{-1/2})$ and $(\hat{\bar{B}}' - \hat{K}_t)\hat{B}_\perp\hat{\bar{B}}_\perp' w_t = -\hat{K}_t\hat{B}_\perp\hat{\bar{B}}_\perp' w_t = O_P(n^{-1/2})$.
From (14) it is seen that $I_m - \hat{K}_t\hat{B} \overset{P}{\to} \Omega_B(V + \Omega_B)^{-1}$ and $(I_m - KB)^n \to 0$, $n \to \infty$. This shows that $\hat{\bar{B}}' w_t$, and hence $w_t$, is asymptotically a stationary $AR(1)$ process.
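The eigenvalue structure behind this argument can be illustrated at the true parameter values. The following sketch makes several assumptions that are not in the paper: the dimensions and parameter matrices are drawn at random, the gain $K$ is the steady-state gain obtained by iterating $V_{t+1} = \Omega_\eta + V_t - K_t B V_t$, and estimated quantities are replaced by their population counterparts. It counts the unit eigenvalues of $I_p - BK$ and checks that $I_m - KB$ has spectral radius below one, so that its powers vanish.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 4, 2  # hypothetical dimensions


def random_spd(k):
    """Return a random symmetric positive definite k x k matrix."""
    a = rng.standard_normal((k, k))
    return a @ a.T + k * np.eye(k)


B = rng.standard_normal((p, m))
omega_eps = random_spd(p)
omega_eta = random_spd(m)

# Steady-state gain from the (assumed) recursion V <- Omega_eta + V - K B V.
V = omega_eta.copy()
for _ in range(500):
    K = V @ B.T @ np.linalg.inv(B @ V @ B.T + omega_eps)
    V = omega_eta + V - K @ B @ V
K = V @ B.T @ np.linalg.inv(B @ V @ B.T + omega_eps)

# I_p - BK: the p - m eigenvalues equal to one prevent a contraction.
eig_big = np.linalg.eigvals(np.eye(p) - B @ K)
n_unit = int(np.sum(np.abs(eig_big - 1) < 1e-8))

# I_m - KB: all eigenvalues lie strictly inside the unit circle.
small = np.eye(m) - K @ B
rho_small = np.max(np.abs(np.linalg.eigvals(small)))
power_norm = np.linalg.norm(np.linalg.matrix_power(small, 500))

print(n_unit, rho_small, power_norm)
```

This is why the proof works with the $m$-dimensional process $\hat{\bar{B}}' w_t$ rather than with the $p$-dimensional $w_t$ directly.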
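Proof of (b): The CVAR (18) is expressed as $\Pi(L) y_t = v_t$, and the parameters are estimated using maximum likelihood with lag length $k_n \to \infty$ and $k_n^3/n \to 0$. This gives estimators $(\breve{\alpha}, \breve{\beta}, \breve{\Gamma}, \breve{C}, \breve{\Sigma})$ and residuals $\breve{v}_t$. The representation of $y_t$ in terms of $v_t$ is given by
$$y_t = C \sum_{i=1}^{t} v_i + \sum_{i=0}^{\infty} C_i v_{t-i} + A,$$
where $\beta' A = 0$. This relation also holds for the estimated parameters and residuals, and subtracting one finds
$$C \sum_{i=1}^{t} v_i - \breve{C} \sum_{i=1}^{t} \breve{v}_i = \sum_{i=0}^{\infty} \breve{C}_i \breve{v}_{t-i} - \sum_{i=0}^{\infty} C_i v_{t-i} - A + \breve{A},$$
where the left hand side is the difference between the CVAR trend and its estimator $\breve{B}\breve{T}_t = \breve{C} \sum_{i=1}^{t} \breve{v}_i$. It is seen that the right hand side is $o_P(1)$ and hence asymptotically stationary.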
Proof of (c): Each estimated trend is compared with the corresponding trend, which gives
$$\hat{B}\hat{E}_t T_t - \breve{B}\breve{T}_t = (\hat{B}\hat{E}_t T_t - B T_t) + \Big(B T_t - C \sum_{i=1}^{t} v_i\Big) + \Big(C \sum_{i=1}^{t} v_i - \breve{B}\breve{T}_t\Big).$$
Here the first term is asymptotically stationary by the proof of Theorem 1(a), the middle term is asymptotically stationary, and the last is $o_P(1)$ by Theorem 1(b).    ■
Proof of Theorem 2.
The proof is the same for all the spreads, so consider $E_t T_t - \hat{E}_t T_t$ and the identity
$$\hat{B}'(B E_t T_t - \hat{B}\hat{E}_t T_t) = \hat{B}'(B - \hat{B}) E_t T_t + \hat{B}'\hat{B}(E_t T_t - \hat{E}_t T_t).$$
The left hand side is asymptotically stationary by Theorem 1(a), and therefore $E_t T_t - \hat{E}_t T_t$ is asymptotically stationary if and only if
$$\hat{B}'(B - \hat{B}) E_t T_t = [n^{1/2}\hat{B}'(B - \hat{B})][n^{-1/2} E_t T_t]$$
is asymptotically stationary. Here the second factor converges to a nonstationary process,
$$n^{-1/2} E_{[nu]} T_{[nu]} = n^{-1/2} E_0 T_0 + n^{-1/2} \sum_{j=2}^{[nu]} K_{j-1} v_j \overset{D}{\to} W_v(u),$$
see (16), so for the term $[n^{1/2}\hat{B}'(B - \hat{B})][n^{-1/2} E_t T_t]$ to be asymptotically stationary it is necessary and sufficient that $n^{1/2}\hat{B}'(B - \hat{B}) \overset{P}{\to} 0$.   ■
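The role of the $n^{1/2}$ rate in this argument can be seen from a back-of-the-envelope calculation. The sketch below is a deliberately simplified scalar stand-in, not the paper's simulation design: the extracted trend is replaced by a scalar random walk with innovation standard deviation `sigma`, the estimation error by a deterministic sequence `c * n**(-rate)`, and the function name and defaults are hypothetical. It returns the standard deviation at $t = n$ of the product of the two.

```python
import numpy as np


def spread_std(n, rate, c=1.0, sigma=1.0):
    """Std. dev. at t = n of delta_n * T_n for a scalar random walk T_t.

    delta_n = c * n**(-rate) stands in for the estimation error, and
    Var(T_n) = n * sigma**2 for a random walk started at zero.
    """
    delta_n = c * n ** (-rate)
    return delta_n * sigma * np.sqrt(n)


for n in (100, 10_000, 1_000_000):
    # rate 0.5: ordinary root-n consistency; rate 1.0: super-consistency.
    print(n, spread_std(n, 0.5), spread_std(n, 1.0))
```

With `rate=0.5` the product keeps a standard deviation of order one as $n$ grows, whereas with `rate=1.0` it shrinks at rate $n^{-1/2}$, which matches the requirement that $n^{1/2}\hat{B}'(B - \hat{B})$ vanish in probability.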

References

  1. Chan, Siew Wah, Graham Clifford Goodwin, and Kwai Sang Sin. 1984. Convergence properties of the Riccati difference equation in optimal filtering of nonstabilizable systems. IEEE Transactions on Automatic Control 29: 110–18.
  2. Durbin, Jim, and Siem Jan Koopman. 2012. Time Series Analysis by State Space Methods, 2nd ed. Oxford: Oxford University Press.
  3. Fisher, Franklin M. 1966. The Identification Problem in Econometrics. New York: McGraw-Hill.
  4. Harvey, Andrew. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
  5. Harvey, Andrew C. 2006. Forecasting with Unobserved Components Time Series Models. In Handbook of Economic Forecasting. Edited by G. Elliott, C. Granger and A. Timmermann. Amsterdam: North Holland, pp. 327–412.
  6. Harvey, Andrew C., and Siem Jan Koopman. 1997. Multivariate structural time series models. In System Dynamics in Economic and Financial Models. Edited by C. Heij, J.M. Schumacher, B. Hanzon and C. Praagman. New York: John Wiley and Sons.
  7. Hoover, Kevin D., Søren Johansen, Katarina Juselius, and Morten Nyboe Tabor. 2014. Long-run Causal Order: A Progress Report. Unpublished manuscript.
  8. Johansen, Søren. 1996. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, 2nd ed. Oxford: Oxford University Press.
  9. Johansen, Søren. 2010. Some identification problems in the cointegrated vector autoregressive model. Journal of Econometrics 158: 262–73.
  10. Johansen, Søren, and Katarina Juselius. 2014. An asymptotic invariance property of the common trends under linear transformations of the data. Journal of Econometrics 178: 310–15.
  11. Pearl, Judea. 2009. Causality: Models, Reasoning and Inference, 2nd ed. Cambridge: Cambridge University Press.
  12. Saikkonen, Pentti. 1992. Estimation and testing of cointegrated systems by an autoregressive approximation. Econometric Theory 8: 1–27.
  13. Saikkonen, Pentti, and Helmut Lütkepohl. 1996. Infinite order cointegrated vector autoregressive processes: Estimation and inference. Econometric Theory 12: 814–44.
  14. Spirtes, Peter, Clark Glymour, and Richard Scheines. 2000. Causation, Prediction, and Search, 2nd ed. Cambridge: MIT Press.
Figure 1. The figure shows the extracted and estimated trends for the simulated data in Example 1 with the identification in (19). Panels a and b show plots of $E_t T_{1t}$ and $\hat{E}_t T_{1t}$, and $E_t T_{2t}$ and $\hat{E}_t T_{2t}$, respectively. Note that in both cases, the processes seem to co-move. In panels c and d, $E_t T_{1t} - \hat{E}_t T_{1t}$ and $E_t T_{2t} - \hat{E}_t T_{2t}$ are plotted and appear stationary, because they are both recovered from $B E_t T_t - \hat{B}\hat{E}_t T_t$ as the first two coordinates, see (19).
Figure 2. The figure shows the extracted and estimated trends for the simulated data in Example 2 with the identification in (20). Panels a and b show plots of $E_t T_{1t}$ and $\hat{E}_t T_{1t}$, and $E_t T_{2t}$ and $\hat{E}_t T_{2t}$, respectively. Note that $E_t T_{1t}$ and $\hat{E}_t T_{1t}$ seem to co-move, whereas $E_t T_{2t}$ and $\hat{E}_t T_{2t}$ do not. In panel c, $E_t T_{1t} - \hat{E}_t T_{1t}$ is plotted and appears stationary, but in panel d the spread $E_t T_{2t} - \hat{E}_t T_{2t}$ is nonstationary, whereas $E_t T_{2t} - \hat{E}_t T_{2t} - 0.28\,\hat{E}_t T_{1t}$ is stationary.

Share and Cite

MDPI and ACS Style

Johansen, S.; Tabor, M.N. Cointegration between Trends and Their Estimators in State Space Models and Cointegrated Vector Autoregressive Models. Econometrics 2017, 5, 36. https://doi.org/10.3390/econometrics5030036
