Article

A Parameterization of Models for Unit Root Processes: Structure Theory and Hypothesis Testing

by Dietmar Bauer 1,*, Lukas Matuschek 1,2, Patrick de Matos Ribeiro 1,2 and Martin Wagner 3,4,5

1 Faculty of Business Administration and Economics, Bielefeld University, Universitätsstraße 25, 33615 Bielefeld, Germany
2 Faculty of Statistics, TU Dortmund, Vogelpothsweg 87, 44227 Dortmund, Germany
3 Department of Economics, University of Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt, Austria
4 Bank of Slovenia, Slovenska 35, 1505 Ljubljana, Slovenia
5 Institute for Advanced Studies, Josefstädter Straße 39, 1080 Vienna, Austria
* Author to whom correspondence should be addressed.
Econometrics 2020, 8(4), 42; https://doi.org/10.3390/econometrics8040042
Submission received: 18 April 2018 / Revised: 3 November 2020 / Accepted: 4 November 2020 / Published: 10 November 2020
(This article belongs to the Special Issue Celebrated Econometricians: Katarina Juselius and Søren Johansen)

Abstract: We develop and discuss a parameterization of vector autoregressive moving average processes with arbitrary unit roots and (co)integration orders. The detailed analysis of the topological properties of the parameterization—based on the state space canonical form of Bauer and Wagner (2012)—is an essential input for establishing statistical and numerical properties of pseudo maximum likelihood estimators as well as, e.g., pseudo likelihood ratio tests based on them. The general results are exemplified in detail for the empirically most relevant cases, the (multiple frequency or seasonal) I(1) and the I(2) case. For these two cases we also discuss the modeling of deterministic components in detail.

1. Introduction

Since the seminal contribution of Clive W.J. Granger (1981) that introduced the concept of cointegration, the modeling of multivariate (economic) time series with models and methods that allow for unit roots and cointegration has become standard econometric practice with applications ranging from macroeconomics to finance to climate science.
The most prominent (parametric) model class for cointegration analysis is the class of vector autoregressive (VAR) models, popularized by the important contributions of Søren Johansen and Katarina Juselius and their co-authors, see, e.g., the monographs Johansen (1995) and Juselius (2006). The popularity of VAR cointegration analysis stems not only from the (relative) simplicity of the model class, but also from the fact that the VAR cointegration literature is very well-developed and provides a large battery of tools for diagnostic testing, impulse response analysis, forecast error variance decompositions and the like. All this makes VAR cointegration analysis to a certain extent the benchmark in the literature.1
The imposition of specific cointegration properties on an estimated VAR model becomes increasingly complicated as one moves away from the I(1) case. As discussed in Section 2, e.g., in the I(2) case a triple of indices needs to be chosen (fixed or determined via testing) to describe the cointegration properties. The imposition of cointegration properties in the estimation algorithm then leads to “switching” type algorithms that come together with non-trivial parameterization restrictions involving non-linear inter-relations, compare Paruolo (1996) or Paruolo (2000).2 Mathematically, these complications arise from the fact that the unit root and cointegration properties are in the VAR setting related to rank restrictions on the autoregressive polynomial matrix and its derivatives.
Restricting cointegration analysis to VAR processes may, however, be unduly limiting. First, it is well-known since Zellner and Palm (1974) that VAR processes are not invariant with respect to marginalization, i.e., subsets of the variables of a VAR process are in general vector autoregressive moving average (VARMA) processes. Second, and similarly, aggregation of VAR processes also leads to VARMA processes, an issue relevant, e.g., in the context of temporal aggregation and in mixed-frequency settings. Third, the linearized solutions to dynamic stochastic general equilibrium (DSGE) models are typically VARMA rather than VAR processes, see, e.g., Campbell (1994). Fourth, a VARMA model may be a more parsimonious description of the data generating process (DGP) than a VAR model, with parsimony becoming more important with increasing dimension of the process.3
If one accepts the above arguments as a motivation for considering VARMA processes in cointegration analysis, it is convenient to move to the—essentially equivalent (see Hannan and Deistler 1988, chps. 1 and 2)—state space framework. A key challenge when moving from VAR to VARMA models—or state space models—is that identification becomes an important issue for the latter model class, whereas unrestricted VAR models are (reduced-form) identified. In other words, there are so-called equivalence classes of VARMA models that lead to the same dynamic behavior of the observed process. As is well-known, to achieve identification, restrictions have to be placed on the coefficient matrices in the VARMA case, e.g., zero or exclusion restrictions. A mapping attaching to every transfer function, i.e., the function relating the error sequence to the observed process, a unique VARMA (or state space) system from the corresponding class of observationally equivalent systems is called a canonical form. Since not all entries of the coefficient matrices in canonical form are free parameters, for statistical analysis a so-called parameterization is required that maps the free parameters from coefficient matrices in canonical form into a parameter vector. These issues, including the importance of properties of parameterizations such as continuity and differentiability, are discussed in detail in Hannan and Deistler (1988, chp. 2) and, of course, are also relevant for our setting in this paper.
The convenience of the state space framework for unit root and cointegration analysis stems from the fact that (static and dynamic) cointegration can be characterized by orthogonality constraints, see Bauer and Wagner (2012), once an appropriate basis of the state vector, which is a (potentially singular) VAR process of order one, is chosen. The integration properties are governed by the structure of the unit modulus eigenvalues of the system matrix in the state equation. Eigenvalues of unit modulus and orthogonality constraints are arguably easier restrictions to deal with or to implement than the interrelated rank restrictions considered in the VAR or VARMA setting. The canonical form of Bauer and Wagner (2012) is designed for cointegration analysis by using a basis of the state vector that puts the unit root and cointegration properties at the center and forefront. Consequently, these results are a key input for the present paper and are thus briefly reviewed in Section 3.
An important problem with respect to appropriately defining the "free parameters" in VARMA models is the fact that no continuous parameterization of all VARMA or state space models of a certain order $n$ exists in the multivariate case (see Hazewinkel and Kalman 1976). This implies that the model set, $M_n$ say, has to be partitioned into subsets on which continuous parameterizations exist, i.e., $M_n = \bigcup_{\Gamma \in G} M_\Gamma$ for some multi-index $\Gamma$ varying in an index set $G$. Based on the canonical form of Bauer and Wagner (2012), the partitioning is—in addition to other restrictions such as fixed order $n$—according to systems with fixed unit root properties, to be precise over systems with given state space unit root structure. This has the advantage that, e.g., pseudo maximum likelihood (PML) estimation can straightforwardly be performed over systems with fixed unit root properties without any further ado, i.e., without having to consider (or ignore) rank restrictions on polynomial matrices. The definition and detailed discussion of the properties of this parameterization constitute the first main result of the paper.
The second main set of results, provided in Section 4, is a detailed discussion of the relationships between the different subsets of models $M_\Gamma$ for different indices $\Gamma$ and the parameterization of the respective model sets. Knowledge concerning these relations is important to understand the asymptotic behavior of PML estimators and pseudo likelihood ratio tests based on them. In particular, the structure of the closure $\overline{M}$ of a considered model set $M$ has to be understood, since the difference $\overline{M} \setminus M$ cannot be avoided when maximizing the pseudo likelihood function.4 Additionally, the inclusion properties between different sets $M_\Gamma$ need to be understood, as this knowledge is important for developing hypothesis tests, in particular tests for the dimensions of cointegrating spaces. Hypothesis testing, with a focus on the MFI(1) and I(2) cases, is discussed in Section 5, which shows how the parameterization results of the paper can be used to formulate a large number of hypotheses on (static and polynomial) cointegrating relationships as considered in the VAR cointegration literature. This discussion also includes commonly used deterministic components such as the intercept, seasonal dummies, and linear trend, as well as restrictions on these components.
The paper is organized as follows: Section 2 briefly reviews VAR and VARMA models with unit roots and cointegration and discusses some of the complications arising in the VARMA case in addition to the complications arising due to the presence of unit roots and cointegration already in the VAR case. Section 3 presents the canonical form and the parameterization based on it, with the discussion starting with the multiple frequency I(1)—MFI(1)—and I(2) cases prior to a discussion of the general case. This section also provides several important definitions, e.g., that of the state space unit root structure. Section 4 contains a detailed discussion concerning the topological structure of the model sets and Section 5 discusses testing of a large number of hypotheses on the cointegrating spaces commonly tested in the cointegration literature. The discussion in Section 5 focuses on the empirically most relevant MFI(1) and I(2) cases and includes the usual deterministic components considered in the literature. Section 6 briefly summarizes and concludes the paper. All proofs are relegated to Appendix A and Appendix B.
Throughout we use the following notation: $L$ denotes the lag operator, i.e., $L(\{x_t\}_{t\in\mathbb{Z}}) := \{x_{t-1}\}_{t\in\mathbb{Z}}$, for brevity written as $Lx_t = x_{t-1}$. For a matrix $\gamma \in \mathbb{C}^{s\times r}$, $\gamma^* \in \mathbb{C}^{r\times s}$ denotes its conjugate transpose. For $\gamma \in \mathbb{C}^{s\times r}$ with full column rank $r < s$, we define $\gamma_\perp \in \mathbb{C}^{s\times(s-r)}$ of full column rank such that $\gamma^*\gamma_\perp = 0$. $I_p$ denotes the $p$-dimensional identity matrix, $0_{m\times n}$ the $m \times n$ zero matrix. For two matrices $A \in \mathbb{C}^{m\times n}$, $B \in \mathbb{C}^{k\times l}$, $A \otimes B \in \mathbb{C}^{mk\times nl}$ denotes the Kronecker product of $A$ and $B$. For a complex valued quantity $x$, $\Re(x)$ denotes its real part, $\Im(x)$ its imaginary part and $\bar{x}$ its complex conjugate. For a set $V$, $\overline{V}$ denotes its closure.5 For two sets $V$ and $W$, $V \setminus W$ denotes the difference of $V$ and $W$, i.e., $\{v \in V : v \notin W\}$. For a square matrix $A$ we denote the spectral radius (i.e., the maximum of the moduli of its eigenvalues) by $\lambda_{|\max|}(A)$ and by $\det(A)$ its determinant.

2. Vector Autoregressive, Vector Autoregressive Moving Average Processes and Parameterizations

In this paper, we define VAR processes $\{y_t\}_{t\in\mathbb{Z}}$, $y_t \in \mathbb{R}^s$, as solutions of

$$a(L) y_t = y_t + \sum_{j=1}^{p} a_j y_{t-j} = \varepsilon_t + \Phi d_t, \tag{1}$$

with $a(L) := I_s + \sum_{j=1}^{p} a_j L^j$, where $a_j \in \mathbb{R}^{s\times s}$ for $j = 1, \ldots, p$, $\Phi \in \mathbb{R}^{s\times m}$, $a_p \neq 0$, a white noise process $\{\varepsilon_t\}_{t\in\mathbb{Z}}$, $\varepsilon_t \in \mathbb{R}^s$, with $\Sigma := \mathbb{E}(\varepsilon_t \varepsilon_t') > 0$ and a vector sequence $\{d_t\}_{t\in\mathbb{Z}}$, $d_t \in \mathbb{R}^m$, comprising deterministic components like, e.g., the intercept, seasonal dummies or a linear trend. Furthermore, we impose the non-explosiveness condition $\det a(z) \neq 0$ for all $|z| < 1$, with $a(z) := I_s + \sum_{j=1}^{p} a_j z^j$ and $z$ denoting a complex variable.6
Thus, for given autoregressive order $p$, with—as defining characteristic of the order—$a_p \neq 0$, the considered class of VAR models with specified deterministic components $\{d_t\}_{t\in\mathbb{Z}}$ is given by the set of all polynomial matrices $a(z)$ such that (i) the non-explosiveness condition holds, (ii) $a(0) = I_s$ and (iii) $a_p \neq 0$; together with the set of all matrices $\Phi \in \mathbb{R}^{s\times m}$.
Equivalently, the model class can be characterized by a set of rational matrix functions $k(z) := a(z)^{-1}$, referred to as transfer functions, and the input-output description for the deterministic variables, i.e.,

$$V_{p,\Phi} := V_p \times \mathbb{R}^{s\times m}, \qquad V_p := \left\{ k(z) = \sum_{j=0}^{\infty} k_j z^j = a(z)^{-1} : a(z) = I_s + \sum_{j=1}^{p} a_j z^j,\ \det a(z) \neq 0 \text{ for } |z| < 1,\ a_p \neq 0 \right\}.$$
The associated parameter space is $\Theta_{p,\Phi} := \Theta_p \times \mathbb{R}^{sm} \subset \mathbb{R}^{s^2 p + sm}$, where the parameters

$$\theta := [\theta_a', \theta_\Phi']' = [\mathrm{vec}(a_1)', \ldots, \mathrm{vec}(a_p)', \mathrm{vec}(\Phi)']' \tag{2}$$

are obtained from stacking the entries of the matrices $a_j$ and $\Phi$, respectively.
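To fix ideas, the stacking in (2) can be written down directly. The following NumPy sketch (ours, not from the paper; the helper name stack_theta is hypothetical) maps the coefficient matrices to the parameter vector $\theta$:

```python
import numpy as np

def stack_theta(a_list, Phi):
    # theta = [vec(a_1)', ..., vec(a_p)', vec(Phi)']';
    # vec stacks columns, i.e., column-major ("F") flattening in NumPy.
    return np.concatenate([a.flatten(order="F") for a in a_list]
                          + [Phi.flatten(order="F")])
```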
Remark 1.
In the above discussion the parameters, $\theta_\Sigma$ say, describing the variance covariance matrix $\Sigma$ of $\varepsilon_t$ are not considered. These can be easily included, similarly to $\Phi$, by, e.g., parameterizing positive definite symmetric $s \times s$ matrices via their lower triangular Cholesky factor. This leads to a parameter space $\Theta_{p,\Phi,\Sigma} \subset \mathbb{R}^{s^2 p + sm + s(s+1)/2}$. We omit $\theta_\Sigma$ for brevity, since typically no cross-parameter restrictions involving parameters corresponding to $\Sigma$ are considered, whereas, as discussed in Section 5, parameter restrictions involving—in this paper in the state space rather than the VAR setting—both elements of $\Theta_p$ and $\Phi$, to, e.g., impose the absence of a linear trend in the cointegrating space, are commonly considered in the cointegration literature.7 The estimator of the variance covariance matrix $\Sigma$ often equals the sample variance of suitable residuals $\hat\varepsilon_t(\theta)$ from (1), if there are no cross-restrictions between $\theta$ and $\theta_\Sigma$. This holds, e.g., for the Gaussian pseudo maximum likelihood estimator. Thus, explicitly including $\theta_\Sigma$ and $\Theta_\Sigma$ in the discussion would only overload notation without adding any additional insights, given the simple nature of the parameterization of $\Sigma$.
Remark 2.
Our consideration of deterministic components is a special case of including exogenous variables. We include exogenous deterministic variables with a static input-output behavior governed solely by the matrix $\Phi$. More general exogenous variables that are dynamically related to the output $\{y_t\}_{t\in\mathbb{Z}}$ could be considered, thereby considering so-called VARX models rather than VAR models, which would necessitate considering in addition to the transfer function $k(z)$ also a transfer function $l(z)$, say, linking the exogenous variables dynamically to the output.
For the VAR case, the fact that the mapping assigning to a given transfer function $k(z) \in V_p$ a parameter vector $\theta_a \in \Theta_p$—the parameterization—is continuous with continuously differentiable inverse is immediate.8 Homeomorphicity of a parameterization is important for the properties of parameter estimators, e.g., the ordinary least squares (OLS) or Gaussian PML estimator, compare the discussion in Hannan and Deistler (1988, Theorem 2.5.3 and Remark 1, p. 65).
For OLS estimation one typically considers the larger set $V_p^{OLS}$ without the non-explosiveness condition and without the assumption $a_p \neq 0$:

$$V_p^{OLS} := \left\{ k(z) = \sum_{j=0}^{\infty} k_j z^j = a(z)^{-1} : a(z) = I_s + \sum_{j=1}^{p} a_j z^j \right\}.$$
Considering $V_p^{OLS}$ allows for unconstrained optimization. It is well-known that for $\{\varepsilon_t\}_{t\in\mathbb{Z}}$ as given above, the OLS estimator is consistent over the larger set $V_p^{OLS}$, i.e., without imposing non-explosiveness and also when specifying $p$ too high. Alternatively, and closely related to OLS in the VAR case, the pseudo likelihood can be maximized over $\Theta_{p,\Phi}$. With this approach, maxima respectively suprema can occur at the boundary of the parameter space, i.e., maximization effectively has to consider $\overline{\Theta}_{p,\Phi}$. It is well-known that the PML estimator is consistent for the stable case (cf. Hannan and Deistler 1988, Theorem 4.2.1), but the maximization problem is complicated by the restrictions on the parameter space stemming from the non-explosiveness condition. Avoiding these complications and the asymptotic equivalence of OLS and PML in the stable VAR case explain why VAR models are usually estimated by OLS.9
To be more explicit, ignore deterministic components for a moment and consider the case where the DGP is a stationary VAR process, i.e., a solution of (1) with $a(z)$ satisfying the stability condition $\det a(z) \neq 0$ for $|z| \leq 1$. Define the corresponding set of stable transfer functions by $V_{p,\bullet}$:

$$V_{p,\bullet} := \left\{ a(z)^{-1} \in V_p : \det a(z) \neq 0 \text{ for } |z| \leq 1,\ a_p \neq 0 \right\}.$$
Clearly, $V_{p,\bullet}$ is an open subset of $V_p$. If the DGP is a stationary VAR process, the above-mentioned consistency result of the OLS estimator over $V_p^{OLS}$ implies that the probability that the estimated transfer function, $\hat k(z) = \hat a(z)^{-1}$ say, is contained in $V_{p,\bullet}$ converges to one as the sample size tends to infinity. Moreover, the asymptotic distribution of the estimated parameters is normal, under appropriate assumptions on $\{\varepsilon_t\}_{t\in\mathbb{Z}}$.

The situation is a bit more involved if the transfer function of the DGP corresponds to a point in the set $\overline{V}_{p,\bullet} \setminus V_{p,\bullet}$, which contains systems with unit roots, i.e., determinantal roots of $a(z)$ on the unit circle, as well as lower order autoregressive systems—with these two cases non-disjoint. The stable lower order case is relatively unproblematic from a statistical perspective. If, e.g., OLS estimation is performed over $V_p^{OLS}$, while the true model corresponds to an element in $V_{p^*,\bullet}$, with $p^* < p$, the OLS estimator is still consistent, since $V_{p^*,\bullet} \subset V_p^{OLS}$. Furthermore, standard chi-squared pseudo likelihood ratio test based inference still applies. The integrated case, for a precise definition see the discussion below Definition 1, is a bit more difficult to deal with, as in this case not all parameters are asymptotically normally distributed and nuisance parameters may be present. Consequently, parameterizations that do not take the specific nature of unit root processes into account are not very useful for inference in the unit root case, see, e.g., Sims et al. (1990, Theorem 1). Studying the unit root and cointegration properties is facilitated by resorting to suitable parameterizations that "zoom in on the relevant characteristics".
In case the only determinantal root of $a(z)$ on the unit circle is at $z = 1$, the system corresponds to a so-called I($d$) process, with the integration order $d > 0$ made precise in Definition 1 below. Consider first the I(1) case: As is well-known, the rank of the matrix $a(1)$ equals the dimension of the cointegrating space given in Definition 3 below—also referred to as the cointegrating rank. Therefore, determination of the rank of this matrix is of key importance. With the parameterization used so far, imposing a certain (maximal) rank on $a(1)$ implies complicated restrictions on the matrices $a_j$, $j = 1, \ldots, p$. This in turn renders the correspondingly restricted optimization unnecessarily complicated and not conducive to developing tests for the cointegrating rank. It is more convenient to consider the so-called vector error correction model (VECM) representation of autoregressive processes, discussed in full detail in the monograph Johansen (1995). To this end let us first introduce the differencing operator at frequency $0 \leq \omega \leq \pi$:

$$\Delta_\omega := \begin{cases} I_s - 2\cos(\omega)L + L^2 & \text{for } 0 < \omega < \pi, \\ I_s - \cos(\omega)L & \text{for } \omega \in \{0, \pi\}. \end{cases} \tag{3}$$
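As an illustration (ours, not from the paper), the operator (3) acts on a sample $y_1, \ldots, y_T$ as follows; note that $\Delta_0 = I_s - L$ and $\Delta_\pi = I_s + L$, while interior frequencies use the second-order filter:

```python
import numpy as np

def delta_omega(y, omega):
    # Apply Delta_omega from (3) to an array y of shape (T, s), rows = y_t.
    if omega in (0.0, np.pi):
        return y[1:] - np.cos(omega) * y[:-1]               # I_s - cos(omega) L
    return y[2:] - 2.0 * np.cos(omega) * y[1:-1] + y[:-2]   # I_s - 2 cos(omega) L + L^2
```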
For notational brevity, we omit the dependence on $L$ in $\Delta_\omega(L)$, henceforth denoted as $\Delta_\omega$. Using this notation, the I(1) error correction representation is given by
$$\Delta_0 y_t = \Pi y_{t-1} + \sum_{j=1}^{p-1} \Gamma_j \Delta_0 y_{t-j} + \varepsilon_t + \Phi d_t = \alpha\beta' y_{t-1} + \sum_{j=1}^{p-1} \Gamma_j \Delta_0 y_{t-j} + \varepsilon_t + \Phi d_t, \tag{4}$$

with the matrix $\Pi := -a(1) = -(I_s + \sum_{j=1}^{p} a_j)$ of rank $0 \leq r \leq s$ factorized into the product of two full rank matrices $\alpha, \beta \in \mathbb{R}^{s\times r}$ and $\Gamma_j := \sum_{m=j+1}^{p} a_m$, $j = 1, \ldots, p-1$.
This constitutes a reparameterization, where $k(z) \in V_p$ is now represented by the matrices $(\alpha, \beta, \Gamma_1, \ldots, \Gamma_{p-1})$ and a corresponding parameter vector $\theta_a^{VECM} \in \Theta_{p,r}^{VECM}$. Please note that stacking the entries of the matrices does not lead to a homeomorphic mapping from $V_p$ to $\Theta_{p,s}^{VECM}$, since for $0 < r \leq s$ the matrices $\alpha$ and $\beta$ are not identifiable from the product $\alpha\beta'$, as $\alpha\beta' = \alpha M M^{-1}\beta' = \tilde\alpha\tilde\beta'$ for all regular matrices $M \in \mathbb{R}^{r\times r}$. One way to obtain identifiability is to introduce the restriction $\beta = [I_r, \beta^{*\prime}]'$, with $\beta^* \in \mathbb{R}^{(s-r)\times r}$ and $\alpha \in \mathbb{R}^{s\times r}$. With this additional restriction the parameter vector $\theta_a^{VECM}$ is given by stacking the vectorized matrices $\alpha, \beta^*, \Gamma_1, \ldots, \Gamma_{p-1}$, similarly to (2). Then $\Theta_{p,r,\Phi}^{VECM} = \Theta_{p,r}^{VECM} \times \mathbb{R}^{sm} \subset \mathbb{R}^{ps^2 - (s-r)^2 + sm}$. Note for completeness that the normalization $\beta = [I_r, \beta^{*\prime}]'$ may necessitate a re-ordering of the variables in $\{y_t\}_{t\in\mathbb{Z}}$ since—without potential reordering—this parameterization implies a restriction of generality as, e.g., processes where the first variable is integrated, but does not cointegrate with the other variables, cannot be represented.
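The algebra behind (4) is easily mechanized; a small sketch (ours) mapping the autoregressive coefficients $a_1, \ldots, a_p$ to $\Pi$ and $\Gamma_1, \ldots, \Gamma_{p-1}$:

```python
import numpy as np

def var_to_vecm(a_list):
    # a(L) = I_s + a_1 L + ... + a_p L^p maps to the VECM (4) via
    # Pi = -a(1) = -(I_s + sum_j a_j) and Gamma_j = sum_{m=j+1}^{p} a_m.
    s = a_list[0].shape[0]
    Pi = -(np.eye(s) + sum(a_list))
    Gammas = [sum(a_list[j:]) for j in range(1, len(a_list))]
    return Pi, Gammas
```

The cointegrating rank is then the rank of Pi, and a candidate beta can be read off a rank-revealing decomposition of Pi.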
Define the following sets of transfer functions:

$$V_{p,r} := \left\{ a(z)^{-1} \in V_p : \det a(z) \neq 0 \text{ for } z \text{ with } |z| = 1, z \neq 1,\ \mathrm{rank}(a(1)) \leq r \right\}, \qquad V_{p,r}^{RRR} := \left\{ a(z)^{-1} \in V_p^{OLS} : \mathrm{rank}(a(1)) \leq r \right\}.$$
The dimension of the parameter vector $\theta_a^{VECM}$ depends on the dimension of the cointegrating space, thus the parameterization of $k(z) \in V_{p,r}$ depends on $r$. The so-called reduced rank regression (RRR) estimator, given by the maximizer of the pseudo likelihood over $V_{p,r}^{RRR}$, is consistent, see, e.g., Johansen (1995, chp. 6). The RRR estimator uses an "implicit" normalization of $\beta$ and thereby implicitly addresses the mentioned identification problem. However, for testing hypotheses involving the free parameters in $\alpha$ or $\beta$, typically the identifying assumption given above is used, as discussed in Johansen (1995, chp. 7).
Furthermore, since $V_{p,r} \subset V_{p,r^*}$ for $r < r^* \leq s$, with $\Theta_{p,r}^{VECM}$ a lower dimensional subset of $\Theta_{p,r^*}^{VECM}$, pseudo likelihood ratio testing can be used to sequentially test for the rank $r$, starting with the hypothesis of rank $r = 0$ against the alternative of rank $0 < r \leq s$, and increasing the assumed rank consecutively until the null hypothesis is not rejected.
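A minimal sketch of this sequential procedure, here implemented with the Johansen trace test as provided by statsmodels (function and attribute names as in that library; this is our illustration under the stated assumptions, not part of the paper):

```python
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def select_rank(y, det_order=0, k_ar_diff=1, cv_col=1):
    # Sequential trace tests: return the smallest r whose null is not rejected.
    # cv_col selects the critical-value column (0: 90%, 1: 95%, 2: 99%).
    res = coint_johansen(y, det_order, k_ar_diff)
    for r in range(y.shape[1]):
        if res.lr1[r] < res.cvt[r, cv_col]:   # trace statistic vs. critical value
            return r
    return y.shape[1]
```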
Ensuring that $\{y_t\}_{t\in\mathbb{Z}}$ generated from (4) is indeed an I(1) process requires on the one hand that $\Pi$ is of reduced rank, i.e., $r < s$, and on the other hand that the matrix

$$\alpha_\perp' \Gamma \beta_\perp := \alpha_\perp' \left( I_s - \sum_{j=1}^{p-1} \Gamma_j \right) \beta_\perp \tag{5}$$

has full rank. It is well-known that condition (5) is fulfilled on the complement of a "thin" algebraic subset of $V_{p,r}^{RRR}$ and is, therefore, ignored in estimation, as it is "generically" fulfilled.10
The I(2) case is similar in structure to the I(1) case, but with two rank restrictions and one full rank condition to exclude even higher integration orders. The corresponding VECM is given by

$$\Delta_0^2 y_t = \alpha\beta' y_{t-1} - \Gamma \Delta_0 y_{t-1} + \sum_{j=1}^{p-2} \Psi_j \Delta_0^2 y_{t-j} + \varepsilon_t, \tag{6}$$
with $\alpha, \beta$ as defined in (4), $\Gamma := I_s - \sum_{j=1}^{p-1}\Gamma_j$ as defined in (5) and $\Psi_j := -\sum_{k=j+1}^{p-1} \Gamma_k$, $j = 1, \ldots, p-2$. From (5) we already know that reduced rank of

$$\alpha_\perp' \Gamma \beta_\perp =: \xi\eta', \tag{7}$$
with $\xi, \eta \in \mathbb{R}^{(s-r)\times m}$, $m < s - r$, is required for higher integration orders. The condition for the corresponding solution process $\{y_t\}_{t\in\mathbb{Z}}$ to be an I(2) process is given by full rank of

$$\xi_\perp' \alpha_\perp' \left( \Gamma \beta (\beta'\beta)^{-1} (\alpha'\alpha)^{-1} \alpha' \Gamma + I_s - \sum_{j=1}^{p-2} \Psi_j \right) \beta_\perp \eta_\perp,$$

which again is typically ignored in estimation, just like condition (5) in the I(1) case. Thus, I(2) processes correspond to a "thin subset" of $V_{p,r}^{RRR}$, which in turn constitutes a "thin subset" of $V_p^{OLS}$. The fact that integrated processes correspond to "thin sets" in $V_p^{OLS}$ implies that obtaining estimated systems with specific integration and cointegration properties requires restricted estimation based on parameterizations tailor made to highlight these properties.
Already for the I(2) case, formulating parameterizations that allow conveniently studying the integration and cointegration properties is a quite challenging task. Johansen (1997) contains several different (re-)parameterizations for the I(2) case and Paruolo (1996) defines "integration indices", $r_0, r_1, r_2$ say, as the numbers of columns of the matrices $\beta \in \mathbb{R}^{s\times r_0}$, $\beta_1 := \beta_\perp \eta \in \mathbb{R}^{s\times r_1}$ and $\beta_2 := \beta_\perp \eta_\perp \in \mathbb{R}^{s\times r_2}$. Clearly, the indices $r_0, r_1, r_2$ are linked to the ranks of the above matrices $\Pi$ and $\alpha_\perp'\Gamma\beta_\perp$, as $r_0 = r$ and $r_1 = m$, and the columns of $[\beta, \beta_1, \beta_2]$ form a basis of $\mathbb{R}^s$, such that $s = r_0 + r_1 + r_2$. It holds that $\{\beta_2' y_t\}_{t\in\mathbb{Z}}$ is an I(2) process without cointegration and $\{\beta_1' y_t\}_{t\in\mathbb{Z}}$ is an I(1) process without cointegration. The process $\{\beta' y_t\}_{t\in\mathbb{Z}}$ is typically I(1) and in this case cointegrates with $\{\beta_2' \Delta_0 y_t\}_{t\in\mathbb{Z}}$ to stationarity. Thus, there is a direct correspondence of these indices to the dimensions of the different cointegrating spaces—both static and dynamic (with precise definitions given below in Definition 3).11 Please note that again, as already before in the I(1) case, different values of the integration indices $r_0, r_1, r_2$ lead to parameter spaces of different dimensions. Furthermore, in these parameterizations matrices describing different cointegrating spaces are (i) not identified and (ii) linked by restrictions, compare the discussion in Paruolo (2000, sct. 2.2) and (7). These facts render the analysis of the cointegration properties in I(2) VAR systems complicated. Also, in the I(2) VAR case usually some forms of RRR estimators are considered over suitable subsets $V_{p,r,m}^{RRR}$ of $V_{p,r}^{RRR}$, again based on implicit normalizations. Inference, however, again requires one to consider parameterizations explicitly.
Estimation and inference issues are fundamentally more complex in the VARMA case than in the VAR case. This stems from the fact that unrestricted estimation—unlike in the VAR case—is not possible due to a lack of identification, as discussed below. This means that in the VARMA case identification and parameterization issues need to be tackled as the first step, compare the discussion in Hannan and Deistler (1988, chp. 2).
In this paper, we consider VARMA processes as solutions of the vector difference equation

$$y_t + \sum_{j=1}^{p} a_j y_{t-j} = \varepsilon_t + \sum_{j=1}^{q} b_j \varepsilon_{t-j} + \Phi d_t,$$

with $a(L) := I_s + \sum_{j=1}^{p} a_j L^j$, where $a_j \in \mathbb{R}^{s\times s}$ for $j = 1, \ldots, p$, $a_p \neq 0$ and the non-explosiveness condition $\det(a(z)) \neq 0$ for $|z| < 1$. Similarly, $b(L) := I_s + \sum_{j=1}^{q} b_j L^j$, where $b_j \in \mathbb{R}^{s\times s}$ for $j = 1, \ldots, q$, $b_q \neq 0$ and $\Phi \in \mathbb{R}^{s\times m}$. The transfer function corresponding to a VARMA process is $k(z) := a(z)^{-1} b(z)$.
It is well-known that without further restrictions the VARMA realization $(a(z), b(z))$ of the transfer function $k(z) = a(z)^{-1} b(z)$ is not identified, i.e., different pairs of polynomial matrices $(a(z), b(z))$ can realize the same transfer function $k(z)$. Indeed, for every non-singular polynomial matrix $m(z)$ the pair $(\tilde a(z), \tilde b(z)) := (m(z)a(z), m(z)b(z))$ satisfies $\tilde a(z)^{-1}\tilde b(z) = a(z)^{-1} m(z)^{-1} m(z) b(z) = k(z)$. Thus, the mapping $\pi$ attaching the transfer function $k(z) = a(z)^{-1} b(z)$ to the pair of polynomial matrices $(a(z), b(z))$ is not injective.12
Consequently, we refer for a given rational transfer function $k(z)$ to the class $\{(a(z), b(z)) : k(z) = a(z)^{-1} b(z)\}$ as the class of observationally equivalent VARMA realizations of $k(z)$. Achieving identification requires defining a canonical form, selecting one member of each class of observationally equivalent VARMA realizations for a set of considered transfer functions. A first step towards a canonical form is to only consider left coprime pairs $(a(z), b(z))$.13 However, left coprimeness is not sufficient for identification and thus further restrictions are required, leading to parameter vectors of smaller dimension than $s^2(p+q)$. A widely used canonical form is the (reverse) echelon canonical form, see Hannan and Deistler (1988, Theorem 2.5.1, p. 59), based on (monic) normalizations of the diagonal elements of $a(z)$ and degree relationships between diagonal and off-diagonal elements as well as the entries in $b(z)$, which lead to zero restrictions. The (reverse) echelon canonical form in conjunction with a transformation to an error correction model was used in VARMA cointegration analysis in the I(1) case, e.g., in Poskitt (2006, Theorem 4.1), but, as for the VAR case, understanding the interdependencies of rank conditions already becomes complicated once one moves to the I(2) case.
In the VARMA case matters are further complicated by another well-known problem that makes statistical analysis considerably more involved compared to the VAR case. Although there exists a generalization of the autoregressive order to the VARMA case, such that any transfer function corresponding to a VARMA system has an order $n \in \mathbb{N}$ (with the precise definition given in the next section), it is known since Hazewinkel and Kalman (1976) that no continuous parameterization of all rational transfer functions of order $n$ exists if $s > 1$. Therefore, if one wants to keep the above-discussed advantages that continuity of a parameterization provides, the set of transfer functions of order $n$, henceforth referred to as $M_n$, has to be partitioned into sets on which continuous parameterizations exist, i.e., $M_n = \bigcup_{\Gamma\in G} M_\Gamma$, for some index set $G$, as already mentioned in the introduction.14 For any given partitioning of the set $M_n$ it is important to understand the relationships between the different subsets $M_\Gamma$, as well as the closures of the pieces $M_\Gamma$, since in case of misspecification of $M_\Gamma$ points in $\overline{M}_\Gamma \setminus M_\Gamma$ cannot be avoided even asymptotically in, e.g., pseudo maximum likelihood estimation. These are more complicated issues in the VARMA case than in the VAR case, see the discussion in Hannan and Deistler (1988, Remark 1 after Theorem 2.5.3).
Based on these considerations, the following section provides and discusses a parameterization that focuses on unit root and cointegration properties, resorting to the state space framework that—as mentioned in the introduction—provides advantages for cointegration analysis. In particular, we derive an almost everywhere homeomorphic parameterization, based on partitioning the set of all considered transfer functions according to a multi-index $\Gamma$ that contains, among other elements, the state space unit root structure. This implies that certain cointegration properties are invariant for all systems corresponding to a subset $M_\Gamma$, i.e., the parameterization allows one to directly impose cointegration properties such as the integration indices of Paruolo (1996) mentioned before.

3. The Canonical Form and the Parameterization

As a first step we define the class of VARMA processes considered in this paper, using the differencing operator defined in (3):
Definition 1.
The s-dimensional real VARMA process $\{y_t\}_{t\in\mathbb{Z}}$ has unit root structure $\Omega := ((\omega_1, h_1), \ldots, (\omega_l, h_l))$ with $0 \leq \omega_1 < \omega_2 < \cdots < \omega_l \leq \pi$, $h_k \in \mathbb{N}$, $k = 1, \ldots, l$, $l \geq 1$, if it is a solution of the difference equation

$$\Delta_\Omega(y_t - \Phi d_t) := \prod_{k=1}^{l} \Delta_{\omega_k}^{h_k}(y_t - \Phi d_t) = v_t, \tag{8}$$

where $\{d_t\}_{t\in\mathbb{Z}}$ is an m-dimensional deterministic sequence, $\Phi \in \mathbb{R}^{s\times m}$ and $\{v_t\}_{t\in\mathbb{Z}}$ is a linearly regular stationary VARMA process, i.e., there exists a pair of left coprime matrix polynomials $(a(z), b(z))$, $\det a(z) \neq 0$, $|z| \leq 1$, such that $v_t = a(L)^{-1} b(L)(\varepsilon_t) =: c(L)(\varepsilon_t)$ for a white noise process $\{\varepsilon_t\}_{t\in\mathbb{Z}}$ with $\mathbb{E}(\varepsilon_t\varepsilon_t') = \Sigma > 0$, with furthermore $c(z) \neq 0$ for $z = e^{i\omega_k}$, $k = 1, \ldots, l$.

  • The process $\{y_t\}_{t\in\mathbb{Z}}$ is called a unit root process with unit roots $z_k := e^{i\omega_k}$ for $k = 1, \ldots, l$; the set $F(\Omega) := \{\omega_1, \ldots, \omega_l\}$ is the set of unit root frequencies and the integers $h_k$, $k = 1, \ldots, l$, are the integration orders.
  • A unit root process with unit root structure $((0, d))$, $d \in \mathbb{N}$, is an I(d) process.
  • A unit root process with unit root structure $((\omega_1, 1), \ldots, (\omega_l, 1))$ is an MFI(1) process.

A linearly regular stationary VARMA process has empty unit root structure $\Omega_0 := \{\}$.
As discussed in Bauer and Wagner (2012) the state space framework is convenient for the analysis of VARMA unit root processes. Detailed treatments of the state space framework are given in Hannan and Deistler (1988) and—in the context of unit root processes—Bauer and Wagner (2012).
A state space representation of a unit root VARMA process is15

$$y_t = C x_t + \Phi d_t + \varepsilon_t, \qquad x_{t+1} = A x_t + B \varepsilon_t, \tag{9}$$

for a white noise process $\{\varepsilon_t\}_{t\in\mathbb{Z}}$, $\varepsilon_t \in \mathbb{R}^s$, a deterministic process $\{d_t\}_{t\in\mathbb{Z}}$, $d_t \in \mathbb{R}^m$, and the unobserved state process $\{x_t\}_{t\in\mathbb{Z}}$, $x_t \in \mathbb{C}^n$, with $A \in \mathbb{C}^{n\times n}$, $B \in \mathbb{C}^{n\times s}$, $C \in \mathbb{C}^{s\times n}$ and $\Phi \in \mathbb{R}^{s\times m}$.
Remark 3. 
Bauer and Wagner (2012, Theorem 2) show that every real valued unit root VARMA process $\{y_t\}_{t\in\mathbb{Z}}$ as given in (8) has a real valued state space representation with $\{x_t\}_{t\in\mathbb{Z}}$ real valued and real valued system matrices $(A, B, C)$. Considering complex valued state space representations in (9) is merely for algebraic convenience, as in general some eigenvalues of $A$ are complex valued. Note for completeness that Bauer and Wagner (2012) contains a detailed discussion of why considering the A-matrix in the canonical form in (up to reordering) the Jordan normal form is useful for cointegration analysis. For the sake of brevity we abstain from including this discussion again in the present paper. The key aspect of this construction is its usefulness for cointegration analysis, which becomes visible in Remark 4, where the "simple" unit root properties of blocks of the state vector are discussed.
The transfer function $k(z)$ with real valued power series coefficients corresponding to a real valued unit root process $\{y_t\}_{t\in\mathbb{Z}}$ as given in Definition 1 is the rational matrix function $k(z) = \Delta_\Omega(z)^{-1} a(z)^{-1} b(z)$. The (possibly complex valued) matrix triple $(A, B, C)$ realizes the transfer function $k(z)$ if and only if $\pi(A, B, C) := I_s + zC(I_n - zA)^{-1}B = k(z)$. Please note that, as for VARMA realizations, for a transfer function $k(z)$ there exist multiple state space realizations $(A, B, C)$, with possibly different state dimensions $n$. A state space system $(A, B, C)$ is minimal if there exists no state space system of lower state dimension realizing the same transfer function $k(z)$. The order of the transfer function $k(z)$ is the state dimension of a minimal system $(A, B, C)$ realizing $k(z)$.
All minimal state space realizations of a transfer function $k(z)$ only differ in the basis of the state (cf. Hannan and Deistler 1988, Theorem 2.3.4), i.e., $\pi(A, B, C) = \pi(\tilde A, \tilde B, \tilde C)$ for two minimal state space systems $(A, B, C)$ and $(\tilde A, \tilde B, \tilde C)$ is equivalent to the existence of a regular matrix $T \in \mathbb{C}^{n\times n}$ such that $A = T\tilde A T^{-1}$, $B = T\tilde B$, $C = \tilde C T^{-1}$. Thus, the matrices $A$ and $\tilde A$ are similar for all minimal realizations of a transfer function $k(z)$.
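Both facts are easy to check numerically. The following sketch (ours; the helper transfer_coeffs is hypothetical) computes the power series coefficients $k_0 = I_s$, $k_j = CA^{j-1}B$ of $\pi(A, B, C)$ and verifies that a change of state basis leaves them unchanged:

```python
import numpy as np

def transfer_coeffs(A, B, C, n_terms=6):
    # k(z) = I_s + z C (I_n - zA)^{-1} B = I_s + sum_{j>=1} C A^{j-1} B z^j
    s, n = C.shape
    ks, P = [np.eye(s)], np.eye(n)
    for _ in range(n_terms):
        ks.append(C @ P @ B)
        P = P @ A
    return ks

rng = np.random.default_rng(1)
n, s = 3, 2
A, B, C = rng.normal(size=(n, n)), rng.normal(size=(n, s)), rng.normal(size=(s, n))
T = rng.normal(size=(n, n))                       # almost surely regular
k1 = transfer_coeffs(A, B, C)
k2 = transfer_coeffs(T @ A @ np.linalg.inv(T), T @ B, C @ np.linalg.inv(T))
assert all(np.allclose(u, v) for u, v in zip(k1, k2))
```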
By imposing restrictions on the matrices of a minimal state space system $(A, B, C)$ realizing $k(z)$, Bauer and Wagner (2012, Theorem 2) provide a canonical form, i.e., a mapping of the set $M_n$ of transfer functions with real valued power series coefficients defined below onto unique state space realizations $(A, B, C)$. The set $M_n$ is defined as

$$M_n := \left\{ k(z) = \pi(A, B, C) \,\middle|\, \lambda_{|\max|}(A) \leq 1,\ A \in \mathbb{R}^{n\times n},\ B \in \mathbb{R}^{n\times s},\ C \in \mathbb{R}^{s\times n},\ (A, B, C) \text{ minimal} \right\}.$$
To describe the necessary restrictions of the canonical form the following definition is useful:
Definition 2.
A matrix $B = [b_{i,j}]_{i=1,\ldots,c;\, j=1,\ldots,s} \in \mathbb{C}^{c\times s}$ is positive upper triangular (p.u.t.) if there exist integers $1 \leq j_1 < j_2 < \cdots < j_c \leq s$, such that for $j_i \leq s$ we have $b_{i,j} = 0$ for $j < j_i$ and $b_{i,j_i} \in \mathbb{R}^+$; i.e., $B$ is of the form

$$B = \begin{bmatrix} 0 & \cdots & 0 & b_{1,j_1} & * & \cdots & \cdots & * \\ 0 & \cdots & \cdots & 0 & \cdots & 0 & b_{2,j_2} & * \\ \vdots & & & & & & & \vdots \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 & b_{c,j_c} & * \end{bmatrix},$$

where the symbol $*$ indicates unrestricted complex-valued entries.
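A quick way to make Definition 2 concrete is a membership test; the sketch below (ours) checks that pivot columns strictly increase and that each pivot is real and positive:

```python
import numpy as np

def is_put(B, tol=1e-12):
    # p.u.t. check: the first nonzero entry of each row is real positive and
    # the pivot columns j_1 < j_2 < ... are strictly increasing.
    last_pivot = -1
    for row in np.atleast_2d(B):
        nz = np.flatnonzero(np.abs(row) > tol)
        if nz.size == 0:
            return False          # a zero row has no pivot (full row rank fails)
        j = nz[0]
        pivot = complex(row[j])
        if j <= last_pivot or abs(pivot.imag) > tol or pivot.real <= 0:
            return False
        last_pivot = j
    return True
```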
A unique state space realization of $k(z) \in M_n$ is given as follows (cf. Bauer and Wagner 2012, Theorem 2):
Theorem 1.
For every transfer function $k(z) \in M_n$ there exists a unique minimal (complex) state space realization $(A, B, C)$ such that

$$y_t = C x_{t,C} + \Phi d_t + \varepsilon_t, \qquad x_{t+1,C} = A x_{t,C} + B \varepsilon_t,$$

with:

(i) 
$$A := \mathrm{diag}(A_u, A_\bullet) := \mathrm{diag}(A_{1,C}, \ldots, A_{l,C}, A_\bullet), \qquad A_u \in \mathbb{C}^{n_u\times n_u},\ A_\bullet \in \mathbb{R}^{n_\bullet\times n_\bullet}, \tag{10}$$
where it holds for $k = 1, \ldots, l$ that
- for $0 < \omega_k < \pi$:
$$A_{k,C} := \begin{bmatrix} J_k & 0 \\ 0 & \bar J_k \end{bmatrix} \in \mathbb{C}^{2d^k\times 2d^k},$$
- for $\omega_k \in \{0, \pi\}$:
$$A_{k,C} := J_k \in \mathbb{R}^{d^k\times d^k},$$
with
$$J_k := \begin{bmatrix} \bar z_k I_{d_1^k} & [I_{d_1^k}, 0_{d_1^k\times(d_2^k-d_1^k)}] & 0 & \cdots & 0 \\ 0_{d_2^k\times d_1^k} & \bar z_k I_{d_2^k} & [I_{d_2^k}, 0_{d_2^k\times(d_3^k-d_2^k)}] & \cdots & 0 \\ 0 & 0 & \bar z_k I_{d_3^k} & \ddots & \vdots \\ \vdots & & & \ddots & [I_{d_{h_k-1}^k}, 0_{d_{h_k-1}^k\times(d_{h_k}^k-d_{h_k-1}^k)}] \\ 0 & 0 & \cdots & 0 & \bar z_k I_{d_{h_k}^k} \end{bmatrix},$$
where $0 < d_1^k \leq d_2^k \leq \cdots \leq d_{h_k}^k$ and $d^k := \sum_{j=1}^{h_k} d_j^k$.

(ii) 
$B := [B_u', B_\bullet']' := [B_{1,C}', \ldots, B_{l,C}', B_\bullet']'$ and $C := [C_u, C_\bullet] := [C_{1,C}, \ldots, C_{l,C}, C_\bullet]$ are partitioned accordingly. It holds for $k = 1, \ldots, l$ that
- for $0 < \omega_k < \pi$:
$$B_{k,C} := \begin{bmatrix} B_k \\ \bar B_k \end{bmatrix} \in \mathbb{C}^{2d^k\times s} \quad\text{and}\quad C_{k,C} := [C_k, \bar C_k] \in \mathbb{C}^{s\times 2d^k},$$
- for $\omega_k \in \{0, \pi\}$:
$$B_{k,C} := B_k \in \mathbb{R}^{d^k\times s} \quad\text{and}\quad C_{k,C} := C_k \in \mathbb{R}^{s\times d^k}.$$

(iii) 
Partitioning $B_k = [B_{k,1}', \ldots, B_{k,h_k}']'$ and further $B_{k,h_k} = [B_{k,h_k,1}', \ldots, B_{k,h_k,h_k}']'$, with $B_{k,h_k,j} \in \mathbb{C}^{(d_j^k - d_{j-1}^k)\times s}$, it holds that $B_{k,h_k,j}$ is p.u.t. for $d_j^k > d_{j-1}^k$, for $j = 1, \ldots, h_k$ and $k = 1, \ldots, l$.

(iv) 
For $k = 1, \ldots, l$ define $C_k = [C_{k,1}, C_{k,2}, \ldots, C_{k,h_k}]$, $C_{k,j} = [C_{k,j}^G, C_{k,j}^E]$, with $C_{k,j}^E \in \mathbb{C}^{s\times(d_j^k - d_{j-1}^k)}$ and $C_{k,j}^G \in \mathbb{C}^{s\times d_{j-1}^k}$ for $j = 1, \ldots, h_k$, with $d_0^k := 0$. Furthermore, define $C_k^E := [C_{k,1}^E, \ldots, C_{k,h_k}^E] \in \mathbb{C}^{s\times d_{h_k}^k}$. It holds that $(C_k^E)^* C_k^E = I_{d_{h_k}^k}$ and $(C_{k,j}^G)^* C_{k,i}^E = 0$ for $1 \leq i \leq j$, for $j = 2, \ldots, h_k$ and $k = 1, \ldots, l$.

(v) 
$\lambda_{|\max|}(A_\bullet) < 1$ and the stable subsystem $(A_\bullet, B_\bullet, C_\bullet)$ of state dimension $n_\bullet = n - n_u$ is in echelon canonical form (cf. Hannan and Deistler 1988, Theorem 2.5.2).
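To make the block structure of $J_k$ in Theorem 1 tangible, here is a small constructor (ours, under the stated reading of the theorem) that assembles $J_k$ from a unit root frequency and the integers $d_1^k \leq \cdots \leq d_{h_k}^k$; with omega_k = 0 and d = [1, 2] it reproduces the A matrix of Example 4 below.

```python
import numpy as np

def build_Jk(omega_k, d):
    # J_k: conj(z_k) = exp(-1j*omega_k) on the diagonal and super-diagonal
    # blocks [I_{d_j}, 0_{d_j x (d_{j+1}-d_j)}] linking block j to block j+1.
    zk_bar = np.conj(np.exp(1j * omega_k))
    n = int(np.sum(d))
    J = zk_bar * np.eye(n, dtype=complex)
    offs = np.concatenate(([0], np.cumsum(d)))
    for j in range(len(d) - 1):
        J[offs[j]:offs[j] + d[j], offs[j + 1]:offs[j + 1] + d[j]] += np.eye(d[j])
    return J

print(build_Jk(0.0, [1, 2]).real)   # the 3x3 A matrix of Example 4
```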
Remark 4.
As indicated in Remark 3 and discussed in detail in Bauer and Wagner (2012), considering complex valued quantities is merely for algebraic convenience. For econometric analysis, interest is, of course, in real valued quantities. These can be straightforwardly obtained from the representation given in Theorem 1 as follows. First define a transformation matrix (and its inverse):

$$T_{R,d} := \begin{bmatrix} I_d & I_d \\ -iI_d & iI_d \end{bmatrix} \in \mathbb{C}^{2d\times 2d}, \qquad T_{R,d}^{-1} := \frac{1}{2}\begin{bmatrix} I_d & iI_d \\ I_d & -iI_d \end{bmatrix}.$$

Starting from the complex valued canonical representation $(A, B, C)$, a real valued canonical representation

$$y_t = C_R x_{t,R} + \Phi d_t + \varepsilon_t, \qquad x_{t+1,R} = A_R x_{t,R} + B_R \varepsilon_t,$$

with real valued matrices $(A_R, B_R, C_R)$ follows from using the just defined transformation matrix. In particular it holds that:

$$A_R := \mathrm{diag}(A_{u,R}, A_\bullet) := \mathrm{diag}(A_{1,R}, \ldots, A_{l,R}, A_\bullet), \quad B_R := [B_{u,R}', B_\bullet']' := [B_{1,R}', \ldots, B_{l,R}', B_\bullet']', \quad C_R := [C_{u,R}, C_\bullet] := [C_{1,R}, \ldots, C_{l,R}, C_\bullet],$$

with

$$(A_{k,R}, B_{k,R}, C_{k,R}) := \begin{cases} \left( T_{R,d^k}\, A_{k,C}\, T_{R,d^k}^{-1},\ T_{R,d^k}\, B_{k,C},\ C_{k,C}\, T_{R,d^k}^{-1} \right) & \text{if } 0 < \omega_k < \pi, \\ (A_{k,C}, B_{k,C}, C_{k,C}) & \text{if } \omega_k \in \{0, \pi\}. \end{cases}$$

Before we turn to the real valued state process corresponding to the real valued canonical representation, we first consider the complex valued state process $\{x_{t,C}\}_{t\in\mathbb{Z}}$ in more detail. This process is partitioned according to the partitioning of the matrices $C_{k,C}$ into $x_{t,C} := [x_{t,u}', x_{t,\bullet}']' := [x_{t,1,C}', \ldots, x_{t,l,C}', x_{t,\bullet}']'$, where

$$x_{t,k,C} := \begin{cases} [x_{t,k}', \bar x_{t,k}']' & \text{if } 0 < \omega_k < \pi, \\ x_{t,k} & \text{if } \omega_k \in \{0, \pi\}, \end{cases}$$

with

$$x_{t+1,k} = J_k\, x_{t,k} + B_k\, \varepsilon_t, \qquad \text{for } k = 1, \ldots, l.$$

For $k = 1, \ldots, l$ the sub-vectors $x_{t,k}$ are further decomposed into $x_{t,k} := [(x_{t,k}^1)', \ldots, (x_{t,k}^{h_k})']'$, with $x_{t,k}^j \in \mathbb{C}^{d_j^k}$ for $j = 1, \ldots, h_k$, according to the partitioning $C_k = [C_{k,1}, \ldots, C_{k,h_k}]$.

The partitioning of the complex valued process $\{x_{t,C}\}_{t\in\mathbb{Z}}$ leads to an analogous partitioning of the real valued state process $\{x_{t,R}\}_{t\in\mathbb{Z}}$, $x_{t,R} := [x_{t,u,R}', x_{t,\bullet}']' := [x_{t,1,R}', \ldots, x_{t,l,R}', x_{t,\bullet}']'$, obtained from

$$x_{t,k,R} := \begin{cases} T_{R,d^k}\, x_{t,k,C} & \text{if } 0 < \omega_k < \pi, \\ x_{t,k} & \text{if } \omega_k \in \{0, \pi\}, \end{cases}$$

with the corresponding block of the state equation given by

$$x_{t+1,k,R} = A_{k,R}\, x_{t,k,R} + B_{k,R}\, \varepsilon_t.$$

For $k = 1, \ldots, l$ the sub-vectors $x_{t,k,R}$ are further decomposed into $x_{t,k,R} := [(x_{t,k,R}^1)', \ldots, (x_{t,k,R}^{h_k})']'$, with $x_{t,k,R}^j \in \mathbb{R}^{2d_j^k}$ if $0 < \omega_k < \pi$ and $x_{t,k,R}^j \in \mathbb{R}^{d_j^k}$ if $\omega_k \in \{0, \pi\}$ for $j = 1, \ldots, h_k$, and $C_{k,R} := [C_{k,1,R}, \ldots, C_{k,h_k,R}]$ decomposed accordingly.

Bauer and Wagner (2012, Theorem 3, p. 1328) show that the processes $\{x_{t,k,R}^j\}_{t\in\mathbb{Z}}$ have unit root structure $((\omega_k, h_k - j + 1))$ for $j = 1, \ldots, h_k$ and $k = 1, \ldots, l$. Furthermore, for $j = 1, \ldots, h_k$ and $k = 1, \ldots, l$ the processes $\{x_{t,k,R}^j\}_{t\in\mathbb{Z}}$ are not cointegrated, as defined in Definition 3 below. For $\omega_k = 0$, the process $\{x_{t,k,R}^j\}_{t\in\mathbb{Z}}$ is the $d_j^k$-dimensional process of stochastic trends of order $h_k - j + 1$, while the $2d_j^k$ components of $\{x_{t,k,R}^j\}_{t\in\mathbb{Z}}$, for $0 < \omega_k < \pi$, and the $d_j^k$ components of $\{x_{t,k,R}^j\}_{t\in\mathbb{Z}}$, for $\omega_k = \pi$, are referred to as stochastic cycles of order $h_k - j + 1$ at their corresponding frequencies $\omega_k$.
Remark 5.
Parameterizing the stable part of the transfer function using the echelon canonical form is merely one possible choice. Any other canonical form of the stable subsystem and suitable parameterization based on it can be used instead for the stable subsystem.
Remark 6.
Starting from a state space system (9) with matrices $(A, B, C)$ in canonical form, a solution for $y_t$, $t > 0$ (with the solution for $t < 0$ obtained completely analogously)—for some $x_1 = [x_{1,u}', x_{1,\bullet}']'$—is given by

$$y_t = \sum_{j=1}^{t-1} C_u A_u^{j-1} B_u \varepsilon_{t-j} + C_u A_u^{t-1} x_{1,u} + \sum_{j=1}^{t-1} C_\bullet A_\bullet^{j-1} B_\bullet \varepsilon_{t-j} + C_\bullet A_\bullet^{t-1} x_{1,\bullet} + \Phi d_t + \varepsilon_t.$$

Clearly, the term $C_u A_u^{t-1} x_{1,u}$ is stochastically singular and effectively acts like a deterministic component, which may lead to an identification problem with $\Phi d_t$. If the deterministic component $\Phi d_t$ is rich enough to "absorb" $C_u A_u^{t-1} x_{1,u}$, then one solution of the identification problem is to set $x_{1,u} = 0$. Rich enough here means, e.g., in the I(1) case with $A_u = I_{n_u}$ that $d_t$ contains an intercept. Analogously, in the MFI(1) case $d_t$ has to contain seasonal dummy variables corresponding to all unit root frequencies. The term $C_\bullet A_\bullet^{t-1} x_{1,\bullet}$ decays exponentially and, therefore, does not impact the asymptotic properties of any statistical procedure. It is, therefore, inconsequential for statistical analysis but convenient (with respect to our definition of unit root processes) to set $x_{1,\bullet} = \sum_{j=1}^{\infty} A_\bullet^{j-1} B_\bullet \varepsilon_{1-j}$. This corresponds to the steady state or stationary solution of the stable block of the state equation, and renders $\{x_{t,\bullet}\}_{t\in\mathbb{N}}$ or, when the solution on $\mathbb{Z}$ is considered, $\{x_{t,\bullet}\}_{t\in\mathbb{Z}}$ stationary. Please note that these issues with respect to starting values, potential identification problems and their impact or non-impact on statistical procedures also occur in the VAR setting.
Bauer and Wagner (2012, Theorem 2) show that minimality of the canonical state space realization $(A, B, C)$ implies full row rank of the p.u.t. blocks $B_{k,h_k,j}$ of $B_{k,h_k}$. In addition to proposing the canonical form, Bauer and Wagner (2012) also provide details on how to transform any minimal state space realization into canonical form: Given a minimal state space system $(A, B, C)$ realizing the transfer function $k(z) \in M_n$, the first step is to find a similarity transformation $T$ such that $\tilde A = TAT^{-1}$ is of the form given in (10) by using an eigenvalue decomposition, compare Chatelin (1993). In the second step the corresponding stable subsystem $(\tilde A_\bullet, \tilde B_\bullet, \tilde C_\bullet)$ is transformed to echelon canonical form as described in Hannan and Deistler (1988, chp. 2). These two transformations do not lead to a unique realization, because the restrictions on $A$ do not uniquely determine the unstable subsystem $(A_u, B_u, C_u)$.
For example, in the case $\Omega = ((\omega_1, h_1)) = ((0, 1))$, $n_\bullet = 0$, $d_1^1 < s$, such that $(I_{d_1^1}, B_1, C_1)$ is a corresponding state space system, the same transfer function $k(z) = I_s + zC_1(1-z)^{-1}B_1 = I_s + C_1 B_1 z(1-z)^{-1}$ is realized also by all systems $(I_{d_1^1}, TB_1, C_1 T^{-1})$, with $T \in \mathbb{C}^{d_1^1\times d_1^1}$ some regular matrix. To find a unique realization the product $C_1 B_1$ needs to be uniquely decomposed into factors $C_1$ and $B_1$. This is achieved by performing a QR decomposition of $C_1 B_1$ (without pivoting) that leads to $C_1' C_1 = I$. The additional restriction of $B_1$ being a p.u.t. matrix of full row rank then leads to a unique factorization of $C_1 B_1$ into $C_1$ and $B_1$. In the general case with an arbitrary unit root structure $\Omega$, similar arguments lead to p.u.t. restrictions on sub-blocks $B_{k,h_k,j}$ in $B_u$ and orthogonality restrictions on sub-blocks of $C_u$.
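The normalization step can be sketched in a few lines (ours, not the paper's code): orthonormalize the $C_1$ factor by a QR decomposition, absorb the triangular factor into $B_1$, and flip signs so the pivots of $B_1$ are positive; the product $C_1 B_1$, and hence the transfer function, is unchanged. Strictly increasing pivot positions, as required for a p.u.t. matrix, hold generically; degenerate cases would require further rotations.

```python
import numpy as np

def normalize_I1_block(C1_raw, B1_raw):
    # C1_raw: s x d with full column rank, B1_raw: d x s with full row rank.
    Q, R = np.linalg.qr(C1_raw)        # C1_raw = Q R with Q'Q = I_d
    B1 = R @ B1_raw                    # Q (R B1_raw) = C1_raw B1_raw
    for i in range(B1.shape[0]):       # make each row's pivot positive
        j = np.flatnonzero(np.abs(B1[i]) > 1e-12)[0]
        if B1[i, j] < 0:
            B1[i] *= -1.0
            Q[:, i] *= -1.0            # compensate to keep the product fixed
    return Q, B1
```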
The canonical form introduced in Theorem 1 was designed to be useful for cointegration analysis. Seeing this first requires definitions of static and polynomial cointegration (cf. Bauer and Wagner 2012, Definitions 3 and 4).
Definition 3. 
(i) 
Let $\tilde\Omega = ((\tilde\omega_1, \tilde h_1), \ldots, (\tilde\omega_{\tilde l}, \tilde h_{\tilde l}))$ and $\Omega = ((\omega_1, h_1), \ldots, (\omega_l, h_l))$ be two unit root structures. Then $\tilde\Omega \preceq \Omega$ if
- $F(\tilde\Omega) \subseteq F(\Omega)$,
- for all $\omega \in F(\tilde\Omega)$ and $\tilde k$, $k$ such that $\tilde\omega_{\tilde k} = \omega_k = \omega$ it holds that $\tilde h_{\tilde k} \leq h_k$.
Furthermore, $\tilde\Omega \prec \Omega$ if $\tilde\Omega \preceq \Omega$ and $\tilde\Omega \neq \Omega$. For two unit root structures $\tilde\Omega \preceq \Omega$ define the decrease $\delta_k(\Omega, \tilde\Omega)$ of the integration order at frequency $\omega_k$, for $k = 1, \ldots, l$, as

$$\delta_k(\Omega, \tilde\Omega) := \begin{cases} h_k - \tilde h_{\tilde k} & \text{if there is } \tilde k \text{ such that } \tilde\omega_{\tilde k} = \omega_k \in F(\tilde\Omega), \\ h_k & \text{if } \omega_k \notin F(\tilde\Omega). \end{cases}$$

(ii) 
An s-dimensional unit root process $\{y_t\}_{t\in\mathbb{Z}}$ with unit root structure $\Omega$ is cointegrated of order $(\Omega, \tilde\Omega)$, where $\tilde\Omega \prec \Omega$, if there exists a vector $\beta \in \mathbb{R}^s$, $\beta \neq 0$, such that $\{\beta' y_t\}_{t\in\mathbb{Z}}$ has unit root structure $\tilde\Omega$. In this case the vector $\beta$ is a cointegrating vector (CIV) of order $(\Omega, \tilde\Omega)$.
(iii) 
All CIVs of order $(\Omega, \tilde\Omega)$ span the (static) cointegrating space of order $(\Omega, \tilde\Omega)$.16
(iv) 
An s-dimensional unit root process $\{y_t\}_{t\in\mathbb{Z}}$ with unit root structure $\Omega$ is polynomially cointegrated of order $(\Omega, \tilde\Omega)$, where $\tilde\Omega \prec \Omega$, if there exists a vector polynomial $\beta(z) = \sum_{m=0}^{q} \beta_m z^m$, $\beta_m \in \mathbb{R}^s$, $m = 0, \ldots, q$, $\beta_q \neq 0$, for some integer $1 \leq q < \infty$ such that
- $\beta(L)'(\{y_t\}_{t\in\mathbb{Z}})$ has unit root structure $\tilde\Omega$,
- $\max_{k=1,\ldots,l} \|\beta(e^{i\omega_k})\|\, \delta_k(\Omega, \tilde\Omega) \neq 0$.
In this case the vector polynomial $\beta(z)$ is a polynomial cointegrating vector (PCIV) of order $(\Omega, \tilde\Omega)$.
(v) 
All PCIVs of order $(\Omega, \tilde\Omega)$ span the polynomial cointegrating space of order $(\Omega, \tilde\Omega)$.
Remark 7. 
(i) 
It is merely a matter of taste whether cointegrating spaces are defined in terms of their order $(\Omega, \tilde\Omega)$ or their decrease $\delta(\Omega, \tilde\Omega) := (\delta_1(\Omega, \tilde\Omega), \ldots, \delta_l(\Omega, \tilde\Omega))$, with $\delta_k(\Omega, \tilde\Omega)$ as defined above. Specifying $\Omega$ and $\delta(\Omega, \tilde\Omega)$ contains the same information as providing the order of (polynomial) cointegration.
(ii) 
Notwithstanding the fact that CIVs and PCIVs in general may lead to changes of the integration orders at different unit root frequencies, it may be of interest to "zoom in" on only one unit root frequency $\omega_k$, thereby leaving the potential reductions of the integration orders at other unit root frequencies unspecified. This allows one to—entirely similarly as in Definition 3—define cointegrating and polynomial cointegrating spaces of different orders at a single unit root frequency $\omega_k$. Analogously one can also define cointegrating and polynomial cointegrating spaces of different orders for subsets of the frequencies in $F(\Omega)$.
(iii) 
In principle the polynomial cointegrating spaces defined so far are infinite-dimensional as the polynomial degree is not bounded. However, since every polynomial vector $\beta(z)$ can be written as $\beta^0(z) + \beta^\Omega(z)\Delta_\Omega(z)$, where by definition $\{\Delta_\Omega y_t\}_{t\in\mathbb{Z}}$ has empty unit root structure, it suffices to consider PCIVs of polynomial degree smaller than the polynomial degree of $\Delta_\Omega(z)$. This shows that it is sufficient to consider finite dimensional polynomial cointegrating spaces. When considering, as in item (ii), (polynomial) cointegration only for one unit root, it similarly suffices to consider polynomials of maximal degree equal to $h_k - 1$ for real unit roots and $2h_k - 1$ for complex unit roots. Thus, in the I(2) case it suffices to consider polynomials of degree one.
(iv) 
The argument about maximal relevant polynomial degrees given in item (iii) can be made more precise and combined with the decrease in $\Omega$ achieved. Every polynomial vector $\beta(z)$ can be written as $\beta^0(z) + \beta^{\omega_k,\delta_k}(z)\Delta_{\omega_k}^{\delta_k}(z)$ for $\delta_k = 1, \ldots, h_k$. By definition it holds that $\{\Delta_{\omega_k}^{\delta_k} y_t\}_{t\in\mathbb{Z}}$ has integration order $h_k - \delta_k$ at frequency $\omega_k$. Thus, it suffices to consider PCIVs of polynomial degree smaller than $\delta_k$ for $\omega_k \in \{0, \pi\}$ or $2\delta_k$ for $0 < \omega_k < \pi$ when considering the polynomial cointegrating space at $\omega_k$ with decrease $\delta_k$. In the MFI(1) case, therefore, when considering only one unit root frequency, again only polynomials of degree one need to be considered. This space is often referred to in the literature as the dynamic cointegration space.
To illustrate the advantages of the canonical form for cointegration analysis consider

$$y_t = \sum_{k=1}^{l}\sum_{j=1}^{h_k} C_{k,j,R}\, x_{t,k,R}^j + C_\bullet x_{t,\bullet} + \Phi d_t + \varepsilon_t.$$
By Remark 4, the process $\{x_{t,k,R}^j\}_{t\in\mathbb{Z}}$ is not cointegrated. This implies that $\beta \in \mathbb{R}^s$, $\beta \neq 0$, reduces the integration order at unit root $z_k$ to $h_k - j$ if and only if $\beta'[C_{k,1,R}, \ldots, C_{k,j,R}] = 0$ and $\beta' C_{k,j+1,R} \neq 0$, or equivalently $\beta'[C_{k,1}, \ldots, C_{k,j}] = 0$ and $\beta' C_{k,j+1} \neq 0$ (using the transformation to the complex matrices of the canonical form, as discussed in Remark 4, and that $\beta'[C_k, \bar C_k] = 0$ if and only if $\beta' C_k = 0$). Thus, the CIVs are characterized by orthogonality to sub-blocks of $C_u$.
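Computationally, such orthogonality spaces are left null spaces; a short sketch (ours) that returns an orthonormal basis of all $\beta$ with $\beta'[C_{k,1}, \ldots, C_{k,j}] = 0$ via the SVD:

```python
import numpy as np

def civ_basis(C_blocks, j, tol=1e-10):
    # C_blocks: list of the s x d_i^k matrices C_{k,1}, ..., C_{k,h_k}.
    # Returns an orthonormal basis of {beta : beta' [C_{k,1},...,C_{k,j}] = 0}.
    M = np.hstack(C_blocks[:j])
    U, svals, _ = np.linalg.svd(M)
    rank = int(np.sum(svals > tol))
    return U[:, rank:]                # s x (s - rank), possibly empty
```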
The real valued representation given in Remark 4, used in its partitioned form just above, immediately leads to a necessary orthogonality constraint for polynomial cointegration of degree one: for $\beta(z) = \beta_0 + \beta_1 z$,

$$\begin{aligned} \beta(L)'(y_t) &= \beta(L)'\left( C_{u,R}\, x_{t,u,R} + C_\bullet x_{t,\bullet} + \Phi d_t + \varepsilon_t \right) \\ &= \beta_0' C_{u,R}\, x_{t,u,R} + \beta_1' C_{u,R}\, x_{t-1,u,R} + \beta(L)'(C_\bullet x_{t,\bullet} + \Phi d_t + \varepsilon_t) \\ &= \beta_0' C_{u,R}(A_{u,R}\, x_{t-1,u,R} + B_{u,R}\,\varepsilon_{t-1}) + \beta_1' C_{u,R}\, x_{t-1,u,R} + \beta(L)'(C_\bullet x_{t,\bullet} + \Phi d_t + \varepsilon_t) \\ &= (\beta_0' C_{u,R} A_{u,R} + \beta_1' C_{u,R})\, x_{t-1,u,R} + \beta_0' C_{u,R} B_{u,R}\, \varepsilon_{t-1} + \beta(L)'(C_\bullet x_{t,\bullet} + \Phi d_t + \varepsilon_t) \\ &= (\beta_0' C_u A_u + \beta_1' C_u)\, x_{t-1,u} + \beta_0' C_u B_u\, \varepsilon_{t-1} + \beta(L)'(C_\bullet x_{t,\bullet} + \Phi d_t + \varepsilon_t) \end{aligned}$$

follows. Since all terms except the first are stationary or deterministic, a necessary condition for a reduction of the unit root structure is the orthogonality of $[\beta_0', \beta_1']$ to sub-blocks of $\begin{bmatrix} C_{u,R} A_{u,R} \\ C_{u,R} \end{bmatrix}$ or sub-blocks of the complex matrix $\begin{bmatrix} C_u A_u \\ C_u \end{bmatrix}$. Please note, however, that this orthogonality condition is not sufficient for $[\beta_0', \beta_1']$ to be a PCIV, because it does not imply $\max_{k=1,\ldots,l} \|\beta(e^{i\omega_k})\|\, \delta_k(\Omega, \tilde\Omega) \neq 0$. For a detailed discussion of polynomial cointegration, when considering also higher polynomial degrees, see Bauer and Wagner (2012, sct. 5).
The following examples illustrate cointegration analysis in the state space framework for the empirically most relevant, i.e., the I(1), MFI(1) and I(2) cases.
Example 1 (Cointegration in the I(1) case).
In the I(1) case, neglecting the stable subsystem and the deterministic components for simplicity, it holds that

$$\begin{aligned} y_t &= C_1 x_{t,1} + \varepsilon_t, \qquad & y_t, \varepsilon_t \in \mathbb{R}^s,\ x_{t,1} \in \mathbb{R}^{d_1^1},\ C_1 \in \mathbb{R}^{s\times d_1^1}, \\ x_{t+1,1} &= x_{t,1} + B_1 \varepsilon_t, & B_1 \in \mathbb{R}^{d_1^1\times s}. \end{aligned}$$

The vector $\beta \in \mathbb{R}^s$, $\beta \neq 0$, is a CIV of order $((0,1),\{\})$ if and only if $\beta' C_1 = 0$.
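A simulation makes this concrete (our sketch, with randomly drawn system matrices): the state is a $d_1^1$-dimensional random walk, $y_t$ is I(1), and any $\beta$ in the left null space of $C_1$ yields a stationary linear combination.

```python
import numpy as np

rng = np.random.default_rng(0)
s, d, T = 3, 1, 5000
C1 = rng.normal(size=(s, d))
B1 = rng.normal(size=(d, s))
eps = rng.normal(size=(T, s))
x = np.zeros(d)
y = np.empty((T, s))
for t in range(T):
    y[t] = C1 @ x + eps[t]
    x = x + B1 @ eps[t]               # x_{t+1,1} = x_{t,1} + B_1 eps_t

beta = np.linalg.svd(C1)[0][:, d:]    # columns span {beta : beta' C1 = 0}
print((y @ beta).var(axis=0))         # bounded, in contrast to y's own variance
```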
Example 2 (Cointegration in the MFI(1) case with complex unit root z k ).
In the MFI(1) case with unit root structure $\Omega = ((\omega_k, 1))$ and complex unit root $z_k$, neglecting the stable subsystem and the deterministic components for simplicity, it holds that

$$y_t = C_{k,R}\, x_{t,k,R} + \varepsilon_t = [C_k\ \bar C_k] \begin{bmatrix} x_{t,k} \\ \bar x_{t,k} \end{bmatrix} + \varepsilon_t, \qquad \begin{bmatrix} x_{t+1,k} \\ \bar x_{t+1,k} \end{bmatrix} = \begin{bmatrix} \bar z_k I_{d_1^k} & 0 \\ 0 & z_k I_{d_1^k} \end{bmatrix} \begin{bmatrix} x_{t,k} \\ \bar x_{t,k} \end{bmatrix} + \begin{bmatrix} B_k \\ \bar B_k \end{bmatrix} \varepsilon_t,$$

with $y_t, \varepsilon_t \in \mathbb{R}^s$, $x_{t,k,R} \in \mathbb{R}^{2d_1^k}$, $x_{t,k} \in \mathbb{C}^{d_1^k}$, $C_{k,R} \in \mathbb{R}^{s\times 2d_1^k}$, $C_k \in \mathbb{C}^{s\times d_1^k}$ and $B_k \in \mathbb{C}^{d_1^k\times s}$.

The vector $\beta \in \mathbb{R}^s$, $\beta \neq 0$, is a CIV of order $(\Omega, \{\})$ if and only if

$$\beta' C_k = 0 \quad (\text{and thus } \beta' \bar C_k = 0).$$

The vector polynomial $\beta(z) = \beta_0 + \beta_1 z$, with $\beta_0, \beta_1 \in \mathbb{R}^s$, $[\beta_0', \beta_1']' \neq 0$, is a PCIV of order $(\Omega, \{\})$ if and only if

$$[\beta_0', \beta_1'] \begin{bmatrix} \bar z_k C_k & z_k \bar C_k \\ C_k & \bar C_k \end{bmatrix} = 0, \tag{11}$$

which is equivalent to

$$(\bar z_k \beta_0 + \beta_1)' C_k = 0.$$

The fact that the matrix in (11) has a block structure with two blocks of conjugate complex columns implies some additional structure also on the space of PCIVs, here with polynomial degree one. More specifically it holds that if $\beta_0 + \beta_1 z$ is a PCIV of order $(\Omega, \{\})$, also $-\beta_1 + (\beta_0 + 2\cos(\omega_k)\beta_1)z$ is a PCIV of order $(\Omega, \{\})$. This follows from

$$(\bar z_k(-\beta_1) + (\beta_0 + 2\cos(\omega_k)\beta_1))' C_k = (\beta_0 + (2\Re(z_k) - \bar z_k)\beta_1)' C_k = (\beta_0 + z_k\beta_1)' C_k = z_k(\bar z_k\beta_0 + \beta_1)' C_k = 0.$$
Thus, the space of PCIVs of degree (up to) one inherits some additional structure emanating from the occurrence of complex eigenvalues in complex conjugate pairs.
Example 3 (Cointegration in the I(2) case).
In the I(2) case, neglecting the stable subsystem and the deterministic components for simplicity, it holds that

$$\begin{aligned} y_t &= C_{1,1}^E\, x_{t,1}^E + C_{1,2}^G\, x_{t,2}^G + C_{1,2}^E\, x_{t,2}^E + \varepsilon_t, \\ x_{t+1,1}^E &= x_{t,1}^E + x_{t,2}^G + B_{1,1}\,\varepsilon_t, \\ x_{t+1,2}^G &= x_{t,2}^G + B_{1,2,1}\,\varepsilon_t, \\ x_{t+1,2}^E &= x_{t,2}^E + B_{1,2,2}\,\varepsilon_t, \end{aligned}$$

with $y_t, \varepsilon_t \in \mathbb{R}^s$, $x_{t,1}^E, x_{t,2}^G \in \mathbb{R}^{d_1^1}$, $x_{t,2}^E \in \mathbb{R}^{d_2^1 - d_1^1}$, $C_{1,1}^E, C_{1,2}^G \in \mathbb{R}^{s\times d_1^1}$, $C_{1,2}^E \in \mathbb{R}^{s\times(d_2^1 - d_1^1)}$, $B_{1,1}, B_{1,2,1} \in \mathbb{R}^{d_1^1\times s}$ and $B_{1,2,2} \in \mathbb{R}^{(d_2^1 - d_1^1)\times s}$.

The vector $\beta \in \mathbb{R}^s$, $\beta \neq 0$, is a CIV of order $((0,2),(0,1))$ if and only if

$$\beta' C_{1,1}^E = 0 \quad\text{and}\quad \beta'[C_{1,2}^G, C_{1,2}^E] \neq 0.$$

The vector $\beta \in \mathbb{R}^s$, $\beta \neq 0$, is a CIV of order $((0,2),\{\})$ if and only if

$$\beta'[C_{1,1}^E, C_{1,2}^G, C_{1,2}^E] = 0.$$

The vector polynomial $\beta(z) = \beta_0 + \beta_1 z$, with $\beta_0, \beta_1 \in \mathbb{R}^s$, is a PCIV of order $((0,2),\{\})$ if and only if

$$[\beta_0', \beta_1'] \begin{bmatrix} C_{1,1}^E & C_{1,1}^E + C_{1,2}^G & C_{1,2}^E \\ C_{1,1}^E & C_{1,2}^G & C_{1,2}^E \end{bmatrix} = 0 \quad\text{and}\quad \beta(1) = \beta_0 + \beta_1 \neq 0.$$

The above orthogonality constraint indicates that the two cases $C_{1,2}^G = 0$ and $C_{1,2}^G \neq 0$ have to be considered separately for polynomial cointegration analysis. Consider first the case $C_{1,2}^G = 0$. In this case the orthogonality constraints imply $\beta_0' C_{1,1}^E = 0$, $\beta_1' C_{1,1}^E = 0$ and $(\beta_0 + \beta_1)' C_{1,2}^E = 0$. Thus, the vector $\beta_0 + \beta_1$ is a CIV of order $((0,2),\{\})$ and therefore $\beta(z) = \beta_0 + \beta_1 z$ is of "non-minimum" degree, one in this case rather than zero ($\beta_0 + \beta_1$). For a formal definition of minimum degree PCIVs see Bauer and Wagner (2003, Definition 4). In case $C_{1,2}^G \neq 0$ there are PCIVs of degree one that are not simple transformations of static CIVs. Consider $\beta(z) = \beta_0 + \beta_1 z = \gamma_1(1 - z) + \gamma_2$ such that $\{\gamma_1'(y_t - y_{t-1}) + \gamma_2' y_t\}_{t\in\mathbb{Z}}$ is stationary. The integrated contribution to $\{\gamma_1'(y_t - y_{t-1})\}_{t\in\mathbb{Z}}$ is given by $\gamma_1'(1 - L)(\{C_{1,1}^E x_{t,1}^E\}_{t\in\mathbb{Z}}) = \{\gamma_1' C_{1,1}^E x_{t-1,2}^G + \gamma_1' C_{1,1}^E B_{1,1}\varepsilon_{t-1}\}_{t\in\mathbb{Z}}$, with $\gamma_1' C_{1,1}^E \neq 0$. This term is eliminated by $\{\gamma_2' C_{1,2}^G x_{t,2}^G\}_{t\in\mathbb{Z}}$ in $\{\gamma_2' y_t\}_{t\in\mathbb{Z}}$, if $\gamma_1' C_{1,1}^E + \gamma_2' C_{1,2}^G = 0$, which is only possible if $C_{1,2}^G \neq 0$. Additionally, $\gamma_2'[C_{1,1}^E, C_{1,2}^E] = 0$ needs to hold, such that there is no further integrated contribution to $\{\gamma_2' y_t\}_{t\in\mathbb{Z}}$. Neither $\gamma_1$ nor $\gamma_2$ are CIVs since both violate the necessary conditions given in the definition of CIVs, which implies that $\beta(z)$ is indeed a "minimum degree" PCIV.
As was shown above, the unit root and cointegration properties of $\{y_t\}_{t\in\mathbb{Z}}$ depend on the sub-blocks of $C_u$ and the eigenvalue structure of $A_u$. We, therefore, define the more encompassing state space unit root structure containing information on the geometrical and algebraic multiplicities of the eigenvalues of $A_u$ (cf. Bauer and Wagner 2012, Definition 2).
Definition 4.
A unit root process { y t } t Z with a canonical state space representation as given in Theorem 1 has state space unit root structure
Ω S : = ( ω 1 , d 1 1 , , d h 1 1 ) , , ( ω l , d 1 l , , d h l l )
where 0 d 1 k d 2 k d h k k s for k = 1 , , l . For { y t } t Z with empty unit root structure Ω S : = { } .
Remark 8.
The state space unit root structure Ω S contains information concerning the integration properties of the process { y t } t Z , since the integers d j k , k = 1 , , l , j = 1 , , h k describe (multiplied by two for k such that 0 < ω k < π ) the numbers of non-cointegrated stochastic trends or cycles of corresponding integration orders, compare again Remark 4. As such, Ω S describes properties of the stochastic process { y t } t Z —and, therefore, the state space unit root structure Ω S partitions unit root processes according to these (co-)integration properties. These (co-)integration properties, however, are invariant to a chosen canonical representation, or more generally invariant to whether a VARMA or state space representation is considered. For all minimal state representations of a unit root process { y t } t Z these indices—being related to the Jordan normal form—are invariant.
As mentioned in Section 2, Paruolo (1996, Definition 3) introduces integration indices at frequency zero as a triple of integers ( r 0 , r 1 , r 2 ) . These correspond to the numbers of columns of the matrices β , β 1 , β 2 in the error correction representation of I(2) VAR processes, see, e.g., Johansen (1997, sct. 3). Here, r 2 is the number of stochastic trends of order two, i.e., r 2 = d 1 1 . Furthermore, r 1 is the number of stochastic trends of order one that do not cointegrate with β 2 Δ 0 { y t } t Z and hence r 1 = d 2 1 d 1 1 . Therefore, the integration indices at frequency zero are in one-one correspondence with the state space unit root structure Ω S = ( ( 0 , d 1 1 , d 2 1 ) ) for I(2) processes and the dimension s = r 0 + r 1 + r 2 of the process.
The canonical form given in Theorem 1 imposes p.u.t. structures on sub-blocks of the matrix B u . The occurrence of these blocks—related to d j k > d j 1 k —is determined by the state space unit root structure Ω S . The number of free entries in these p.u.t.-blocks, however, is not determined by Ω S . Consequently, we need structure indices p N 0 n u indicating for each row the position of a potentially restricted positive element, as formalized below:
Definition 5 (Structure indices).
For the block B u C n u × s of the matrix B of a state space realization ( A , B , C ) in canonical form, define the corresponding structure indices p N 0 n u as
p i : = 0 if the i - th row of B u is not part of a p . u . t . block , j if the i - th row of B u is part of a p . u . t . block and its j - th entry is restricted to be positive .
Remark 9.
Since sub-blocks of B u corresponding to complex unit roots are of the form B k , C = [ B k , B ¯ k ] , the entries restricted to be positive are located in the same columns and rows of both B k and B ¯ k . Thus, the structure indices p i of the corresponding rows are identical for B k and B ¯ k . Therefore, it would be possible to omit the parts of p corresponding to the blocks B ¯ k . It is, however, as will be seen in Definition 9, advantageous for the comparison of unit root structures and structure indices that p is a vector with n u entries.
Example 4.
Consider the following state space system:
y t = C 1 , 1 E C 1 , 2 G C 1 , 2 E x t + ε t y t , ε t R 2 , x t R 3 , C 1 , 1 E , C 1 , 2 G , C 1 , 2 E R 2 × 1 x t + 1 = 1 1 0 0 1 0 0 0 1 x t + B 1 , 1 B 1 , 2 , 1 B 1 , 2 , 2 ε t , x 0 = 0 , B 1 , 1 , B 1 , 2 , 1 , B 1 , 2 , 2 R 1 × 2 .
In canonical form B 1 , 2 , 1 and B 1 , 2 , 2 are p.u.t. matrices and B 1 , 1 is unrestricted. If, e.g., the second entry b 1 , 2 , 1 , 2 of B 1 , 2 , 1 and the first entry b 1 , 2 , 2 , 1 of B 1 , 2 , 2 are restricted to be positive, then
B = * * 0 b 1 , 2 , 1 , 2 b 1 , 2 , 2 , 1 * ,
where the symbol * denotes unrestricted entries. In this case p = [ 0 , 2 , 1 ] .
For given state space unit root structure Ω S the matrix A u is fully determined. The parameterization of the set of feasible matrices B u for given structure indices p and of the set of stable subsystems ( A , B , C ) for given Kronecker indices α (cf. Hannan and Deistler 1988, chp. 2.) is straightforward, since the entries in these matrices are either unrestricted, restricted to zero or restricted to be positive. Matters are a bit more complicated for C u . One possibility to parameterize the set of possible matrices C u for a given state space unit root structure Ω S is to use real and complex valued Givens rotations (cf. Golub and van Loan 1996, chp. 5.1).
Definition 6 (Real Givens rotation).
The real Givens rotation R q , i , j ( θ ) R q × q , θ [ 0 , 2 π ) is defined as
R q , i , j ( θ ) : = I i 1 0 cos ( θ ) 0 sin ( θ ) 0 I j 1 i 0 sin ( θ ) 0 cos ( θ ) 0 I q j .
Remark 10.
Givens rotations allow transforming any vector v = [ v 1 , v 2 , . . . , v q ] R q into a vector of the form [ v ˜ 1 , 0 , . . . , 0 ] with v ˜ 1 0 . This is achieved by the following algorithm:
1. 
Set j = 1 , v 1 ( 1 ) = v 1 and v ( 1 ) = v .
2. 
Represent [ v 1 ( j ) , v q j + 1 ] using polar coordinates as [ v 1 ( j ) , v q j + 1 ] = [ r j cos ( θ q j ) , r j sin ( θ q j ) ] , with r j 0 and θ q j [ 0 , 2 π ) . If r j = 0 , set θ q j = 0 (cf. Otto 2011, chp. 1.5.3, p. 39). Then R 2 , 1 , 2 ( θ q j ) [ v 1 ( j ) , v q j + 1 ] = [ v 1 ( j + 1 ) , 0 ] such that v ( j + 1 ) = R q , 1 , q j + 1 ( θ q j ) v ( j ) = [ v 1 ( j + 1 ) , v 2 , , v q j , 0 , , 0 ] , with v 1 ( j + 1 ) 0 .
3. 
If j = q 1 , stop. Else increment j by one ( j j + 1 ) and continue at step 2.
This algorithm determines a unique vector θ = [ θ 1 , . . . , θ q 1 ] for every vector v R q .
Remark 11.
The determinant of real Givens rotations is equal to one, i.e., det ( R s , i , j ( θ ) ) = 1 for all s , i , j N and all θ [ 0 , 2 π ) . Thus, it is not possible to factorize an orthonormal matrix Q with det ( Q ) = 1 into a product of Givens rotations. This obvious fact has implications for the parameterization of C -matrices as is detailed below.
Definition 7 (Complex Givens rotation).
The complex Givens rotation Q q , i , j ( φ ) C q × q , φ : = [ φ 1 , φ 2 ] Θ C : = [ 0 , π / 2 ] × [ 0 , 2 π ) , is defined as
Q q , i , j ( φ ) : = I i 1 0 cos ( φ 1 ) 0 sin ( φ 1 ) e i φ 2 0 I j 1 i 0 sin ( φ 1 ) e i φ 2 0 cos ( φ 1 ) 0 I q j .
Remark 12.
Complex Givens rotations allow transforming any vector v = [ v 1 , v 2 , . . . , v q ] C q into a vector of the form [ v ˜ 1 , 0 , . . . , 0 ] with v ˜ 1 C . This is achieved by the following algorithm:
1. 
Set j = 1 , v 1 ( 1 ) = v 1 and v ( 1 ) = v .
2. 
Represent [ v 1 ( j ) , v q j + 1 ] using polar coordinates as [ v 1 ( j ) , v q j + 1 ] = [ a j e i φ a , j , b j e i φ b , j ] , with a j , b j 0 and φ a , j , φ b , j [ 0 , 2 π ) . If v 1 ( j ) = 0 , set φ a , j = 0 and if v q j + 1 = 0 , set φ b , j = 0 (cf. Otto 2011, chp. 8.1.3, p. 222).
3. 
Set
φ q j , 1 = tan 1 b j a j if a j > 0 , π / 2 if a j = 0 , b j > 0 , 0 if a j = 0 , b j = 0 , φ q j , 2 = φ a , j φ b , j mod 2 π .
Then Q 2 , 1 , 2 ( φ q j ) [ v 1 ( j ) , v q j + 1 ] = [ v 1 ( j + 1 ) , 0 ] such that v ( j + 1 ) = Q q , 1 , q j + 1 ( θ q 1 ) v ( j ) = [ v 1 ( j + 1 ) , v 2 , , v q j , 0 ] , with v 1 ( j + 1 ) C .
4. 
If j = q 1 , stop. Else increment j by one ( j j + 1 ) and continue at step 2.
This algorithm determines a unique vector φ = [ φ 1 , 1 , φ 1 , 2 , . . . , φ q 1 , 2 ] for every vector v C q .
To set the stage for the general case, we start the discussion of the parameterization of the set of matrices ( A , B , C ) in canonical form with the MFI(1) and I(2) cases. These two cases display all ingredients required later for the general case. The MFI(1) case illustrates the usage of either real or complex Givens rotations, depending on whether the considered C -block corresponds to a real or complex unit root. The I(2) case highlights recursive orthogonality constraints on the parameters of the C -block, which are related to the polynomial cointegration properties (cf. Example 3).

3.1. The Parameterization in the MFI(1) Case

The state space unit root structure of an MFI(1) process is given by Ω S = ( ( ω 1 , d 1 1 ) , , ( ω l , d 1 l ) ) . For the corresponding state space system ( A , B , C ) in canonical form, the sub-blocks of A u are equal to J k = z k ¯ I d 1 k , the sub-blocks B k of B u are p.u.t. and C k C k = I d 1 k , for k = 1 , , l .
Starting with the sub-blocks of C u , it is convenient to separate the discussion of the parameterization of C u -blocks into the real case, where ω k { 0 , π } and C k R s × d 1 k , and the complex case with 0 < ω k < π and C k C s × d 1 k . For the case of real unit roots the two cases d 1 k < s and d 1 k = s have to be distinguished. For brevity of notation refer to the considered real block simply as C R s × d . Using this notation, the set of matrices to be parameterized is
O s , d : = { C R s × d | C C = I d } .
The parameterization of O s , d is based on the combination of real Givens rotations, as given in Definition 6, that allow transforming every matrix in O s , d to the form [ I d , 0 ( s d ) × d ] for d < s . For d = s , Givens rotations allow transforming every matrix C O s , s either to I s or I s : = diag ( I s 1 , 1 ) , since, compare Remark 11, for the transformed matrix C ˜ ( s ) it holds that det ( C ) = det ( C ˜ ( s ) ) { 1 , 1 } . This is achieved with the following algorithm:
  • Set j = 1 and C ( 1 ) = C .
  • Transform the entries [ c j , j , , c j , d ] in the j-th row of C ( j ) , to [ c ˜ j , j , 0 , , 0 ] , c ˜ j , j 0 . Since this is a row vector, this is achieved by right-multiplication of C ( j ) with transposed Givens rotations and the required parameters are obtained via the algorithm described in Remark 10. The first j 1 entries of the j-th row remain unchanged. Denote the transformed matrix by C ( j + 1 ) .
  • If j = d 1 stop. Else increment j by one ( j j + 1 ) and continue at step 2.
  • Collect all parameters used for the Givens rotations in steps 1 to 3 in a parameter vector θ R . Steps 1–3 correspond to a QR decomposition of C = Q C ˜ , with an orthonormal matrix Q given by the product of the Givens rotations. Please note that the first j 1 entries of the j-th column of C ˜ = C ( d ) are equal to zero by construction.
  • Set j = 0 and C ˜ ( 0 ) = C ˜ .
  • Collect the entries in column d j of C ˜ ( j ) which have not been transformed to zero by previous transformations into the vector [ c d j , d j , c d + 1 , d j , , c s , d j ] . Using the algorithm described in Remark 10 transform this vector to [ c ˜ d j , d j , 0 , , 0 ] by left-multiplication of C ˜ ( j ) with Givens rotations. Since Givens rotations are orthonormal, the transformed matrix C ˜ ( j + 1 ) is still orthonormal implying for its entries c ˜ d j , d j = 1 and c ˜ i , d j = 0 for all i < d j . An exception occurs if d = s . In this case c d j , d j { 1 , 1 } and no Givens rotations are defined.
  • If j = d 1 stop. Else increment j by one ( j j + 1 ) and continue at step 6.
  • Collect all parameters used for the Givens rotations in steps 5 to 7 in a parameter vector θ L .
The parameter vector θ = [ θ L , θ R ] , contains the angles of the employed Givens rotations and provides one way of parameterizing O s , d . The following Lemma 1 demonstrates the usefulness of this parameterization.
Lemma 1 (Properties of the parameterization of O s , d ).
Define for d s a mapping θ C O ( θ ) from Θ O R : = [ 0 , 2 π ) d ( s d ) × [ 0 , 2 π ) d ( d 1 ) / 2 O s , d by
C O ( θ ) : = i = 1 d j = 1 s d R s , i , d + j ( θ L , ( s d ) ( i 1 ) + j ) I d 0 ( s d ) × d i = 1 d 1 j = 1 i R d , d i , d i + j ( θ R , i ( i 1 ) / 2 + j ) : = R L ( θ L ) I d 0 ( s d ) × d R R ( θ R ) ,
with θ : = [ θ L , θ R ] , where θ L : = [ θ L , 1 , , θ L , d ( s d ) ] and θ R : = [ θ R , 1 , , θ R , d ( d 1 ) / 2 ] . The following properties hold:
(i) 
O s , d is closed and bounded.
(ii) 
The mapping C O ( · ) is infinitely often differentiable.
For d < s , it holds that
(iii) 
For every C O s , d there exists a vector θ Θ O R such that
C = C O ( θ ) = R L ( θ L ) I d 0 ( s d ) × d R R ( θ R ) .
The algorithm discussed above defines the inverse mapping C O 1 : O s , d Θ O R .
(iv) 
The inverse mapping C O 1 ( · ) —the parameterization of O s , d —is infinitely often differentiable on the pre-image of the interior of Θ O R . This is an open and dense subset of O s , d .
For d = s , it holds that
(v) 
O s , s is a disconnected space in R s × s with two disjoint non-empty closed subsets O s , s + : = { C R s × s | C C = I s , det ( C ) = 1 } and O s , s : = { C R s × s | C C = I s , det ( C ) = 1 } .
(vi) 
For every C O s , s + there exists a vector θ Θ O R such that
C = C O ( θ ) = R L ( θ L ) I d R R ( θ R ) = R R ( θ R ) .
In this case, steps 1-4 of the algorithm discussed above define the inverse mapping C O 1 : O s , s + Θ O R .
(vii) 
Define v : = [ π , , π ] R s ( s 1 ) / 2 . Then a parameterization of O s , s is given by
C O ± ( C ) = v + C O 1 ( C ) if C O s , s + ( v + C O 1 ( C I s ) ) if C O s , s . .
The parameterization is infinitely often differentiable with infinitely often differentiable inverse on an open and dense subset of O s , s .
Remark 13.
The following arguments illustrate why C O 1 is not continuous on the pre-image of the boundary of Θ O R : Consider the unit sphere O 3 , 1 = { C R 3 | C C = C 2 = 1 } . One way to parameterize the unit sphere is to use degrees of longitude and latitude. Two types of discontinuities occur: After fixing the location of the zero degree of longitude, i.e., the prime meridian, its anti-meridian is described by both 180 W and 180 E. Using the half-open interval [ 0 , 2 π ) in our parametrization causes a similar discontinuity. Second, the degree of longitude is irrelevant at the north pole. As seen in Remark 10, with our parameterization a similar issue occurs when the first two entries of C to be compared are both equal to zero. In this case the parameter of the Givens rotation is set to zero, although every θ will produce the same result. Both discontinuities clearly occur on a thin subset of O s , d .
As in the parametrization of the VAR I(1)-case in the VECM framework, where the restriction β = [ I s d , β * ] can only be imposed when the upper ( s d ) × ( s d ) block of the true β 0 of the DGP is of full rank (cf. Johansen 1995, chp. 5.2), the set where the discontinuities occur can effectively be changed by a permutation of the components of the observed time series. This corresponds to redefining the locations of the prime meridian and the poles.
Remark 14.
Please note that the parameterization partitions the parameter vector θ into two parts θ L [ 0 , 2 π ) d ( s d ) and θ R [ 0 , 2 π ) ( d 1 ) d / 2 . Since changing the parameter values in θ R does not change the column space of C O ( θ ) , which, as seen above, determines the cointegrating vectors, θ L fully characterizes the (static) cointegrating space. Please note that the dimension of θ L is d ( s d ) and thus coincides with the number of free parameters in β in the VECM framework (cf. Johansen 1995, chp. 5.2).
Example 5.
Consider the matrix
C = 0 1 2 1 2 1 2 1 2 1 2
with d = 2 and s = 3 . As discussed, the static cointegrating space is characterized by the left kernel of this matrix. The left kernel of a matrix in R 3 × 2 with full rank two is given by a one-dimensional space, with the corresponding basis vector parameterized, when normalized to length one, by two free parameters. Thus, for the characterization of the static cointegrating space two parameters are required, which exactly coincides with the dimension of θ L given in Remark 14. The parameters in θ R correspond to the choice of a basis of the image of C. Having fixed the two-dimensional subspace through θ L , only one free parameter for the choice of an orthonormal basis remains, which again coincides with the dimension given in Remark 14. To obtain the parameter vector, the starting point is a QR decomposition of C = R R ( θ R ) C ˜ . In this example R R ( θ R ) = R 2 , 1 , 2 ( θ R , 1 ) , with θ R , 1 to be determined. To find θ R , 1 , solve [ 0 1 2 ] R 2 , 1 , 2 ( θ R , 1 ) = [ r 0 ] for r 0 and θ R , 1 [ 0 , 2 π ) . In other words, find r 0 and θ R , 1 [ 0 , 2 π ) such that [ 0 1 2 ] = r [ cos ( θ R , 1 ) sin ( θ R , 1 ) ] , which leads to r = 1 2 , θ R , 1 = π 2 . Thus, the orthonormal matrix R R ( θ R ) is equal to R 2 , 1 , 2 π 2 and the transpose of the upper triangular matrix C ˜ is equal to:
C ˜ = C ˜ ( 0 ) = C · R 2 , 1 , 2 π 2 = 0 1 2 1 2 1 2 1 2 1 2 0 1 1 0 = 1 2 0 1 2 1 2 1 2 1 2 .
Second, transform the entries in the lower 1 × 2 -sub-block of C ˜ ( 0 ) to zero, starting with the last column. For this find θ L , 2 [ 0 , 2 π ) such that R 3 , 2 , 3 ( θ L , 2 ) [ 0 1 2 1 2 ] = [ 0 1 0 ] , i.e., [ 1 2 1 2 ] = r [ cos ( θ L , 2 ) sin ( θ L , 2 ) ] . This yields r = 1 , θ L , 2 = 7 π 4 . Next compute C ˜ ( 1 ) = R 3 , 2 , 3 ( 7 π 4 ) C ˜ ( 0 ) :
C ˜ ( 1 ) = R 3 , 2 , 3 7 π 4 · C · R 2 , 1 , 2 π 2 = 1 0 0 0 1 2 1 2 0 1 2 1 2 0 1 2 1 2 1 2 1 2 1 2 0 1 1 0 = 1 2 0 0 1 1 2 0 .
In the final step find θ L , 1 [ 0 , 2 π ) such that R 3 , 1 , 3 ( θ L , 1 ) [ 1 2 0 1 2 ] = [ 1 0 0 ] , i.e., [ 1 2 1 2 ] = r [ cos ( θ L , 1 ) sin ( θ L , 1 ) ] . The solution is r = 1 , θ L , 1 = π 4 . Combining the transformations leads to
R 3 , 1 , 3 π 4 · R 3 , 2 , 3 7 π 4 · C · R 2 , 1 , 2 π 2 = 1 2 0 1 2 0 1 0 1 2 0 1 2 1 0 0 0 1 2 1 2 0 1 2 1 2 0 1 2 1 2 1 2 1 2 1 2 0 1 1 0 = 1 0 0 1 0 0 .
The parameter vector for this matrix is therefore θ = [ θ L , θ R ] = π 4 , 7 π 4 , π 2 with θ = C O 1 ( C ) .
In case of complex unit roots, referring for brevity again to the considered block C k simply as C C s × d , the set of matrices to be parameterized is
U s , d : = { C C s × d | C C = I d } .
The parameterization of this set is based on the combination of complex Givens rotations, as given in Definition 7, which can be used to transform every matrix in U s , d to the form [ D d , 0 ( s d ) × d ] with a diagonal matrix D d whose diagonal elements are of unit modulus. This transformation is achieved with the following algorithm:
  • Set j = 1 and C ( 1 ) = C .
  • Transform the entries [ c j , j , , c j , d ] in the j-th row of C ( j ) , to [ c ˜ j , j , 0 , , 0 ] . Since this is a row vector, this is achieved by right-multiplication of C with transposed Givens rotations and the required parameters are obtained via the algorithm described in Remark 12. The first j 1 entries of the j-th row remain unchanged. Denote the transformed matrix by C ( j + 1 ) .
  • If j = d 1 stop. Else increment j by one ( j j + 1 ) and continue at step 2.
  • Collect all parameters used for the Givens rotations in steps 1 to 3 in a parameter vector φ R . Step 1–3 corresponds to a QR decomposition of C = Q C ˜ , with a unitary matrix Q given by the product of the Givens rotations. Please note that the first j 1 entries of the j-th column of C ˜ = C ( d ) are equal to zero by construction.
  • Set j = 0 and C ˜ ( 0 ) = C ˜ .
  • Collect the entries in column d j of C ˜ ( j ) which have not been transformed to zero by previous transformations into the vector [ c d j , d j , c d + 1 , d j , , c s , d j ] . Using the algorithm described in Remark 12 transform this vector to [ c ˜ d j , d j , 0 , , 0 ] by left-multiplication of C ˜ ( j ) with Givens rotations. Since Givens rotations are unitary, the transformed matrix C ˜ ( j + 1 ) is still unitary implying for its entries | c ˜ d j , d j | = 1 and c ˜ i , d j = 0 for all i < d j . An exception occurs if d = s . In this case | c d j , d j | = 1 and no Givens rotations are defined.
  • If j = d 1 stop. Else increment j by one ( j j + 1 ) and continue at step 6.
  • Collect all parameters used for the Givens rotations in steps 5 to 7 in a parameter vector φ L .
  • Transform the diagonal entries of the transformed matrix C ˜ ( d ) = [ D d , 0 ( s d ) × d ] into polar coordinates and collect the angles in a parameter vector φ D .
The following lemma demonstrates the usefulness of this parameterization.
Lemma 2 (Properties of the parametrization of U s , d ).
Define for d s a mapping φ C U ( φ ) from Θ U C : = Θ C d ( s d ) × Θ C ( d 1 ) d / 2 × [ 0 , 2 π ) d U s , d by
C U ( φ ) : = i = 1 d j = 1 s d Q s , i , d + j ( φ L , ( s d ) ( i 1 ) + j ) D d ( φ D ) 0 ( s d ) × d i = 1 d 1 j = 1 i Q d , d i , d i + j ( φ R , i ( i 1 ) / 2 + j ) : = Q L ( φ L ) D d ( φ D ) 0 ( s d ) × d Q R ( φ R ) ,
with φ : = [ φ L , φ R , φ D ] , where φ L = [ φ L , 1 , , φ L , d ( s d ) ] , φ R : = [ φ R , 1 , , φ R , d ( d 1 ) / 2 ] and φ D : = [ φ D , 1 , , φ D , d ] and where D d ( φ D ) = d i a g ( e i φ D , 1 , , e i φ D , d ) . The following properties hold:
(i) 
U s , d is closed and bounded.
(ii) 
The mapping C U ( φ ) is infinitely often differentiable.
(iii) 
For every C U s , d a vector φ Θ U C exists such that
C = C U ( φ ) = Q L ( φ L ) D d ( φ D ) 0 ( s d ) × d Q R ( φ R ) .
The algorithm discussed above defines the inverse mapping C U 1 : U s , d Θ U R .
(iv) 
The inverse mapping C U 1 ( · ) —the parameterization of U s , d —is infinitely often differentiable on an open and dense subset of U s , d .
Remark 15.
Note the partitioning of the parameter vector φ into the parts φ L , φ D and φ R . The component φ L fully characterizes the column space of C U ( φ ) , i.e., φ L determines the cointegrating spaces.
Example 6.
Consider the matrix
C = 1 i 2 1 i 2 1 + i 2 1 i 2 0 0 .
The starting point is again a QR decomposition of C = Q R ( φ R ) C ˜ = Q 2 , 1 , 2 ( φ R , 1 ) C ˜ . To find a complex Givens rotation such that [ 1 i 2 1 i 2 ] Q 2 , 1 , 2 ( φ R , 1 ) = [ r e i φ a 0 ] with r > 0 , transform the entries of [ 1 i 2 1 i 2 ] into polar coordinates. The equation [ 1 i 2 1 i 2 ] = [ a e i φ a b e i φ b ] has the solutions a = b = 1 2 and φ a = φ b = 7 π 4 . Using the results of Remark 12, the parameters of the Givens rotation are φ R , 1 , 1 = tan 1 ( b a ) = π 4 and φ R , 1 , 2 = φ a φ b = 0 . Right-multiplication of C with Q 2 , 1 , 2 π 4 , 0 leads to
C ˜ = C Q 2 , 1 , 2 π 4 , 0 = C 1 2 1 2 1 2 1 2 = 1 i 2 0 0 1 i 2 0 0 = D 2 ( φ D ) 0 1 × 2 .
Since the entries in the lower 1 × 2 -sub-block of C ˜ are already equal to zero, the remaining complex Givens rotations are Q 3 , 2 , 3 ( [ 0 , 0 ] ) = Q 3 , 1 , 3 ( [ 0 , 0 ] ) = I 3 . Finally, the parameter values corresponding to the diagonal matrix D 2 ( φ D ) = d i a g ( e i φ D , 1 , e i φ D , 2 ) = d i a g ( 1 i 2 , 1 i 2 ) are φ D , 1 = 3 π 4 and φ D , 2 = 5 π 4 .
The parameter vector for this matrix is therefore φ = [ φ L , φ R , φ D ] = [ 0 , 0 , 0 , 0 ] , π 4 , 0 , 3 π 4 , 5 π 4 , with φ = C U 1 ( C ) .

Components of the Parameter Vector

Based on the results of the preceding sections we can now describe the parameter vectors for the general case. The dimensions of the parameter vectors of the respective blocks of the system matrices ( A , B , C ) depend on the multi-index Γ , consisting of the state space unit root structure Ω S , the structure indices p and the Kronecker indices α for the stable subsystem. A parameterization of the set of all systems in canonical form with given multi-index Γ for the MFI(1) case, therefore, combines the following components:
  • θ B , f : = [ θ B , f , 1 , . . . , θ B , f , l ] Θ B , f = R d B , f , with:
    θ B , f , k : = [ b 1 , p 1 k + 1 k , b 1 , p 1 k + 2 k , , b 1 , s k , b 2 , p 2 k + 1 k , , b d 1 k , s k ] for ω k { 0 , π } , R ( b 1 , p 1 k + 1 k ) , I ( b 1 , p 1 k + 1 k ) , R ( b 1 , p 1 k + 2 k ) , , I ( b 1 , s k ) , R ( b 2 , p 2 k + 1 k ) , , I ( b d 1 k , s k ) ] [ b 1 , p 1 k + 1 k , b 1 , p 1 k + 2 k , , b 1 , s k , b 2 , p 2 k + 1 k , , b d 1 k , s k ] for 0 < ω k < π ,
    for k = 1 , , l , with p j k denoting the j-th entry of the structure indices p corresponding to B k . The vectors θ B , f , k contain the real and imaginary parts of free entries in B k not restricted by the p.u.t. structures.
  • θ B , p : = [ θ B , p , 1 , . . . , θ B , p , l ] Θ B , p = R + d B , p : The vectors θ B , p , k : = b 1 , p 1 k k , , b d 1 k , p d 1 k k k contain the entries in B k restricted by the p.u.t. structures to be positive reals.
  • θ C , E : = [ θ C , E , 1 , . . . , θ C , E , l ] Θ C , E R d C , E : The parameters for the matrices C k as discussed in Lemma 1 and Lemma 2.
  • θ Θ , α R d : The parameters for the stable subsystem in echelon canonical form for Kronecker indices α .
Example 7.
Consider an MFI(1) process with Ω S = ( ( 0 , 2 ) , ( π 2 , 2 ) ) , p = [ 1 , 3 , 1 , 2 , 1 , 2 ] , n = 0 , and system matrices
A = d i a g ( 1 , 1 , i , i , i , i ) , B = 1 1 2 0 0 2 1 1 + i 1 i 0 2 i 1 1 i 1 + i 0 2 i , C = 0 1 2 1 i 2 1 i 2 1 + i 2 1 + i 2 1 2 1 2 1 + i 2 1 i 2 1 i 2 1 + i 2 1 2 1 2 0 0 0 0 ,
in canonical form. For this example it holds that θ B , f = [ [ 1 , 2 ] , [ 1 , 1 , 1 , 1 , 0 , 1 ] ] , θ B , p = [ [ 1 , 2 ] , [ 1 , 2 ] ] and
θ C , E = π 4 , 7 π 4 , π 2 , [ 0 , 0 , 0 , 0 ] , π 4 , 0 , 3 π 4 , 5 π 4 ,
with parameter values corresponding to the C-blocks collected in θ C , E considered in Examples 5 and 6.

3.2. The Parameterization in the I(2) Case

The canonical form provided above for the general case has the following form for I(2) processes with unit root structure Ω s = ( ( 0 , d 1 1 , d 2 1 ) ) :
A = I d 1 1 I d 1 1 0 0 0 I d 1 1 0 0 0 0 I d 2 1 d 1 1 0 0 0 0 A , B = B 1 , 1 B 1 , 2 , 1 B 1 , 2 , 2 B , C = C 1 , 1 E C 1 , 2 G C 1 , 2 E C ,
where 0 < d 1 1 d 2 1 s , B 1 , 2 , 1 and B 1 , 2 , 2 are p.u.t., C 1 , 1 E O s , d 1 1 , C 1 , 2 E O s , d 2 1 d 1 1 , ( C 1 , 1 E ) C 1 , 2 E = 0 d 1 1 × d 2 1 , ( C 1 , 1 E ) C 1 , 2 G = 0 d 1 1 × d 1 1 , ( C 1 , 2 E ) C 1 , 2 G = 0 ( d 2 1 d 1 1 ) × d 1 1 and ( A , B , C ) is in echelon canonical form with Kronecker indices α . All matrices are real valued.
The parameterizations of the p.u.t. matrices B 1 , 2 , 1 and B 1 , 2 , 2 are as discussed above. The entries of B 1 , 1 are unrestricted and thus included in the parameter vector θ B , f containing also the free entries in B 1 , 2 , 1 and B 1 , 2 , 2 . The subsystem ( A , B , C ) is parameterized using the echelon canonical form.
The parameterization of C 1 , 1 E O s , d 1 1 proceeds as in the MFI(1) case, using C O 1 ( C 1 , 1 E ) . The parameterization of C 1 , 2 E has to take the restriction of orthogonality of C 1 , 2 E to C 1 , 1 E into account, thus the set to be parameterized is given by
O s , d 2 1 d 1 1 ( C 1 , 1 E ) : = { C 1 , 2 E R s × ( d 2 1 d 1 1 ) | ( C 1 , 1 E ) C 1 , 2 E = 0 d 1 1 × ( d 2 1 d 1 1 ) , ( C 1 , 2 E ) C 1 , 2 E = I d 2 1 d 1 1 } .
The parameterization of this set again uses real Givens rotations. For C O s , d 2 1 d 1 1 ( C 1 , 1 E ) it follows that R L ( θ L ) C = [ 0 d 1 1 × ( d 2 1 d 1 1 ) , C ˜ ] for a matrix C ˜ such that C ˜ C ˜ = I d 2 1 d 1 1 with R L ( θ L ) corresponding to C 1 , 1 E . The matrix C ˜ is parameterized as discussed in Lemma 1.
Corollary 1 (Properties of the parameterization of O s , d 2 1 d 1 1 ( C 1 , 1 E ) ).
Define for d 1 1 < d 2 1 s a mapping θ ˜ C O , d 2 1 d 1 1 ( θ ˜ ; C 1 , 1 E ) from Θ O , d 2 1 R : = [ 0 , 2 π ) ( d 2 1 d 1 1 ) ( s d 2 1 ) × [ 0 , 2 π ) ( d 2 1 d 1 1 ) ( d 2 1 d 1 1 1 ) / 2 O s , d 2 1 d 1 1 ( C 1 , 1 E ) by
C O , d 2 1 d 1 1 ( θ ˜ ; C 1 , 1 E ) : = R L ( θ L ) 0 d 1 1 × ( d 2 1 d 1 1 ) C O ( θ ˜ ) ,
where θ L denotes the parameter values corresponding to [ θ L , θ R ] = C O 1 ( C 1 , 1 E ) as defined in Lemma 1. The following properties hold:
(i) 
O s , d 2 1 d 1 1 ( C 1 , 1 E ) is closed and bounded.
(ii) 
The mapping C O , d 2 1 d 1 1 ( θ ˜ ; C 1 , 1 E ) is infinitely often differentiable.
For d 2 1 < s , it holds
(iii) 
For every C 1 , 2 E O s , d 2 1 d 1 1 ( C 1 , 1 E ) there exists a vector θ ˜ = [ θ ˜ L , θ ˜ R ] Θ O , d 2 1 d 1 1 R such that
C 1 , 2 E = C O , d 2 1 d 1 1 ( θ ˜ ; C 1 , 1 E ) = R L ( θ L ) 0 d 1 1 × ( d 2 1 d 1 1 ) R L ( θ ˜ L ) I d 2 1 d 1 1 0 ( s d 2 1 ) × ( d 2 1 d 1 1 ) R R ( θ ˜ R ) .
The algorithm discussed above Lemma 1 defines the inverse mapping C O , d 2 1 d 1 1 1 .
(iv) 
The inverse mapping C O , d 2 1 d 1 1 1 ( · ; C 1 , 1 E ) —the parameterization of O s , d 2 1 d 1 1 ( C 1 , 1 E ) —is infinitely often differentiable on the pre-image of the interior of Θ O , d 2 1 d 1 1 R . This is an open and dense subset of O s , d 2 1 d 1 1 ( C 1 , 1 E ) .
For d 2 1 = s , it holds that
(v) 
O s , s d 1 1 ( C 1 , 1 E ) is a disconnected space with two disjoint non-empty closed subsets:
O s , s d 1 1 + ( C 1 , 1 E ) : = { C 1 , 2 E R s × ( s d 1 1 ) | ( C 1 , 1 E ) C 1 , 2 E = 0 d 1 1 × ( s d 1 1 ) , ( C 1 , 2 E ) C 1 , 2 E = I s d 1 1 , det ( [ C 1 , 1 E , C 1 , 2 E ] ) = 1 } , O s , s d 1 1 ( C 1 , 1 E ) : = { C 1 , 2 E R s × ( s d 1 1 ) | ( C 1 , 1 E ) C 1 , 2 E = 0 d 1 1 × ( s d 1 1 ) , ( C 1 , 2 E ) C 1 , 2 E = I s d 1 1 , det ( [ C 1 , 1 E , C 1 , 2 E ] ) = 1 } .
(vi) 
For every O s , s d 1 1 + ( C 1 , 1 E ) there exists a vector θ ˜ Θ O , d 2 1 d 1 1 R such that
C 1 , 2 E = C O , s d 1 1 ( θ ˜ ; C 1 , 1 E ) = R R ( θ ˜ R ) .
Steps 1–4 of the algorithm discussed above Lemma 1 define the inverse mapping C O , s d 1 1 1 ( · ; C 1 , 1 E ) : O s , s d 1 1 + ( C 1 , 1 E ) Θ O , s d 1 1 R .
(vii) 
Define v : = [ π , , π ] R ( s d 1 1 ) ( s d 1 1 1 ) / 2 . Then a parameterization of O s , s d 1 1 ( C 1 , 1 E ) is given by
C O , s d 1 1 ± ( C 1 , 2 E ; C 1 , 1 E ) = v + C O , s d 1 1 1 ( C 1 , 2 E ; C 1 , 1 E ) if C O s , s d 1 1 + ( C 1 , 1 E ) ( v + C O , s d 1 1 1 ( C 1 , 2 E I s d 1 1 ; C 1 , 1 E ) ) if C O s , s d 1 1 ( C 1 , 1 E ) .
The parameterization is infinitely often differentiable with infinitely often differentiable inverse on an open and dense subset of O s , s .
The proof of Corollary 1 uses the same arguments as the proof of Lemma 1 and is, therefore, omitted. It remains to provide a parameterization for C 1 , 2 G restricted to be orthogonal to both C 1 , 1 E and C 1 , 2 E . Thus, the set to be parametrized is given by
O s , G ( C 1 , 1 E , C 1 , 2 E ) : = { C 1 , 2 G R s × d 1 1 | ( C 1 , 1 E ) C 1 , 2 G = 0 d 1 1 × d 1 1 , ( C 1 , 2 E ) C 1 , 2 G = 0 ( d 2 1 d 1 1 ) × d 1 1 } .
The parameterization of O s , G ( C 1 , 1 E , C 1 , 2 E ) is straightforward: Left multiplication of C 1 , 2 G with R L ( θ L ) as defined in Lemma 1 and of the lower ( s d 1 1 ) × d 1 1 - block with R L ( θ ˜ L ) as defined in Corollary 1 transforms the upper d 2 1 × d 1 1 -block to zero and collects the free parameters in the lower ( s d 2 1 ) × d 1 1 -block. Clearly, this is a bijective and infinitely often differentiable mapping on O s , G ( C 1 , 1 E , C 1 , 2 E ) and thus a useful parameterization, since the matrix C 1 , 2 G is only multiplied with two constant invertible matrices. The entries of the matrix product are then collected in a parameter vector as shown in Corollary 2.
Corollary 2 (Properties of the parameterization of O s , G ( C 1 , 1 E , C 1 , 2 E ) ).
Define for given matrices C 1 , 1 E O s , d 1 1 and C 1 , 2 E O s , d 2 1 d 1 1 ( C 1 , 1 E ) a mapping λ C O , G ( λ ; C 1 , 1 E , C 1 , 2 E ) from R d 1 1 ( s d 2 1 ) O s , G ( C 1 , 1 E , C 1 , 2 E ) by
C O , G ( λ ; C 1 , 1 E , C 1 , 2 E ) : = R L ( θ L ) 0 d 1 1 × d 1 1 R L ( θ ˜ L ) 0 ( d 2 1 d 1 1 ) × 1 0 ( d 2 1 d 1 1 ) × 1 λ 1 λ d 1 1 λ d 1 1 + 1 λ 2 d 1 1 λ d 1 1 ( s d 2 1 1 ) + 1 λ d 1 1 ( s d 2 1 ) ,
where θ L denotes the parameter values corresponding to [ θ L , θ R ] = C O 1 ( C 1 , 1 E ) as defined in Lemma 1 and θ ˜ L denotes the parameter values corresponding to [ θ ˜ L , θ ˜ R ] = C O , d 2 1 d 1 1 1 ( C 1 , 2 E ; C 1 , 1 E ) as defined in Corollary 1. The set O s , G ( C 1 , 1 E , C 1 , 2 E ) is closed and both C O , G as well as C O , G 1 ( · ) —the parameterization of O s , G ( C 1 , 1 E , C 1 , 2 E ) —are infinitely often differentiable.

Components of the Parameter Vector

In the I(2) case, the multi-index Γ contains the state space unit root structure Ω S = ( ( 0 , d 1 1 , d 2 1 ) ) , the structure indices p N 0 d 1 1 + d 2 1 , encoding the p.u.t. structures of B 1 , 2 , 1 and B 1 , 2 , 2 , and the Kronecker indices α for the stable subsystem. The parameterization of the set of all systems in canonical form with given multi-index Γ for the I(2) case uses the following components:
  • θ B , f : = θ B , f , 1 Θ B , f = R d B , f : The vector θ B , f , 1 contains the free entries in B 1 not restricted by the p.u.t. structure, collected in the same order as for the matrices B k in the MFI(1) case.
  • θ B , p : = θ B , p , 1 Θ B , p = R + d B , p : The vector θ B , p , 1 : = b d 1 1 + 1 , p d 1 1 + 1 1 1 , , b d 1 1 + d 2 1 , p d 1 1 + d 2 1 1 1 contains the entries in B 1 restricted by the p.u.t. structures to be positive reals.
  • θ C , E : = [ θ C , E , 1 , 1 , θ C , E , 1 , 2 ] Θ C , E R d C , E : The parameters for the matrices C 1 , 1 E as in the MFI(1) case and C 1 , 2 E as discussed in Corollary 1.
  • θ C , G Θ C , G = R d C , G : The parameters for the matrix C 1 , 2 G as discussed in Corollary 2.
  • θ Θ , α R d : The parameters for the stable subsystem in echelon canonical form for Kronecker indices α .
Example 8.
Consider an I(2) process with Ω S = ( ( 0 , 1 , 2 ) ) , p = [ 0 , 1 , 1 ] , n = 0 and system matrices
A = 1 1 0 0 1 0 0 0 1 , B = 1 2 2 1 1 3 2 0 1 , C = 0 1 1 2 1 2 1 2 1 2 1 2 1 2 1 2 .
In this case, θ B , f , 1 = [ 1 , 2 , 2 , 1 , 3 , 0 , 1 ] , θ B , p , 1 = [ 1 , 2 ] . It follows from
R 3 , 1 , 2 7 π 4 R 3 , 1 , 3 π 2 C 1 , 1 E = [ 1 0 0 ] , R 3 , 1 , 2 7 π 4 R 3 , 1 , 3 π 2 C 1 , 2 E = 0 1 2 1 2 and R 2 , 1 , 2 7 π 4 1 2 1 2 = 1 0 , R 3 , 1 , 2 7 π 4 R 3 , 1 , 3 π 2 C 1 , 2 G = 0 1 1 and R 2 , 1 , 2 7 π 4 1 1 = 0 2 ,
that θ C , E = [ θ C , E , 1 , 1 , θ C , E , 1 , 2 ] = π 2 , 7 π 4 , 7 π 4 and θ C , G = [ 2 ] .

3.3. The Parameterization in the General Case

Inspecting the canonical form shows that all relevant building blocks are already present in the MFI(1) and the I(2) cases and can be combined to deal with the general case: The entries in B u are either unrestricted or follow restrictions according to given structure indices p, and the parameter space is chosen accordingly, as discussed for the MFI(1) and I(2) cases. The restrictions on the matrices C u and its blocks C k require more sophisticated parameterizations of parts of unitary or orthonormal matrices as well as of orthogonal complements. These are dealt with in Lemmas 1 and 2 and Corollaries 1 and 2 above. The extension of Corollaries 1 and 2 to complex matrices and to matrices which are orthogonal to a larger number of blocks of C k is straightforward.
The following theorem characterizes the properties of parameterizations for sets M Γ of transfer functions with (general) multi-index Γ and describes the relations between sets of transfer functions and the corresponding sets Δ Γ of triples ( A , B , C ) of system matrices in canonical form, defined below. Discussing the continuity and differentiability of mappings on sets of transfer functions and on sets of matrix triples also requires the definition of a topology on both sets.
Definition 8. 
(i) 
The set of transfer functions of order n, M n , is endowed with the pointwise topology T p t : First, identify transfer functions with their impulse response sequences. Then, a sequence of transfer functions k i ( z ) = I s + j = 1 K j , i z j converges in T p t to k 0 ( z ) = I s + j = 1 K j , 0 z j if and only if for every j N it holds that K j , i i K j , 0 .
(ii) 
The set of all triples ( A , B , C ) in canonical form corresponding to transfer functions with multi-index Γ is called Δ Γ . The set Δ Γ is endowed with the topology corresponding to the distance d ( ( A 1 , B 1 , C 1 ) , ( A 2 , B 2 , C 2 ) ) : = A 1 A 2 F r + B 1 B 2 F r + C 1 C 2 F r .
Please note that in the definition of the pointwise topology convergence does not need to be uniform in j and moreover, the power series coefficients do not need to converge to zero for j and hence the concept can also be used for unstable systems.
Theorem 2.
The set M n can be partitioned into pieces M Γ , where Γ : = { Ω S , p , α } , i.e.,
M n = Γ = { Ω S , p , α } | n u ( Ω S ) + n ( α ) = n M Γ ,
where n u ( Ω S ) : = k = 1 l j = 1 h k d j k δ k , with δ k = 1 for ω k { 0 , π } and δ k = 2 for 0 < ω k < π is the state dimension of the unstable subsystem ( A u , B u , C u ) with state space unit root structure Ω S and n ( α ) : = i = 1 s α , i is the state dimension of the stable subsystem with Kronecker indices α = ( α , 1 , , α , s ) , α , i N 0 .
For every multi-index Γ there exists a parameter space Θ Γ R d ( Γ ) for some integer d ( Γ ) , endowed with the Euclidean norm, and a function ϕ Γ : Δ Γ Θ Γ , such that for every ( A , B , C ) Δ Γ the parameter vector θ : = ϕ Γ ( A , B , C ) Θ Γ is composed of:
  • The parameter vector θ B , f = [ θ B , f , 1 , . . . , θ B , f , l ] Θ B , f = R d B , f , collecting the (real and imaginary parts of) non-restricted entries in B k , k = 1 , , l as described in the MFI(1) case.
  • The parameter vector θ B , p = [ θ B , p , 1 , . . . , θ B , p , l ] Θ B , p = R + d B , p , collecting the entries in B k , k = 1 , , l , restricted by the p.u.t. forms to be positive reals in a similar fashion as described for B 1 in the I(2) case.
  • The parameter vector θ C , E = [ θ C , E , 1 , . . . , θ C , E , l ] Θ C , E R d C , E , θ C , E , k = [ θ C , E , k , 1 , , θ C , E , k , h k ] collecting the parameters θ C , E , k , j for all blocks C k , j E , k = 1 , , l and j = 1 , , h k , obtained using Givens rotations (see Lemmas 1 and 2 and Corollary 1 and its extension to complex matrices).
  • The parameter vector θ C , G = [ θ C , G , 1 , . . . , θ C , G , l ] Θ C , G = R d C , G , θ C , G , k = [ θ C , G , k , 2 , , θ C , G , k , h k ] collecting the parameters θ C , G , k , j (real and imaginary parts for complex roots) for C k , j G , k = 1 , , l and j = 2 , , h k , subject to the orthogonality restrictions (see Corollary 2 and its extension to complex matrices).
  • The parameter vector θ Θ R d collecting the free entries in echelon canonical form with Kronecker indices α .
    (i) 
    The mapping ψ Γ : M Γ Δ Γ that attaches a triple ( A , B , C ) in canonical form to a transfer function in M Γ is continuous. It is the inverse (restricted to M Γ ) of the T p t -continuous function π : ( A , B , C ) k ( z ) = I s + z C ( I n z A ) 1 B .
    (ii) 
    Every parameter vector θ = [ θ B , f , θ B , p , θ C , E , θ C , G , θ ] Θ Γ Θ B , f × Θ B , p × Θ C , E × Θ C , G × Θ corresponds to a triple ( A ( θ ) , B ( θ ) , C ( θ ) ) Δ Γ and a transfer function k ( z ) = π ( A ( θ ) , B ( θ ) , C ( θ ) ) M Γ . The mapping ϕ Γ 1 : θ ( A ( θ ) , B ( θ ) , C ( θ ) ) is continuous on Θ Γ .
    (iii) 
    For every multi-index Γ the set of points in Δ Γ , where the mapping ϕ Γ is continuous, is open and dense in Δ Γ .
As mentioned in Section 2, the parameterization of Φ is straightforward. The s × m entries of Φ are collected in a parameter vector d . Thus, there is a one-to-one correspondence between state space realizations ( A , B , C , Φ ) Δ Γ × R s × m and parameter vectors τ = [ θ , d ] Θ Γ × R s m . The same holds true for parameters used for the symmetric, positive definite innovation matrix Σ R s × s obtained, e.g., from a lower triangular Cholesky factor of Σ .

4. The Topological Structure

The parameterization of M n in Theorem 2 partitions M n into subsets M Γ for a selection of multi-indices Γ . To every multi-index Γ there exists a corresponding associated parameter set Θ Γ . Thus, in practical applications, maximizing the pseudo likelihood requires choosing the multi-index Γ . Maximizing the pseudo likelihood over the set M Γ effectively amounts to including also all elements in the closure of M Γ , because of continuity of the parameterization. It is thus necessary to characterize the closures of the sets M Γ .
Moreover, maximizing the pseudo likelihood function over all possible multi-indices is time-consuming and not desirable. Fortunately, the results discussed below show that there exists a generic multi-index Γ g such that M n M Γ g ¯ . This generic choice corresponds to the set of all stable systems of order n corresponding to the generic neighborhood of the echelon canonical form. This multi-index, therefore, is a natural starting point for estimation.
However, in particular for hypotheses testing, it will be necessary to maximize the pseudo likelihood over sets of transfer functions of order n with specific state space unit root structure Ω S , denoted as M ( Ω S , n ) below, where n denotes the dimension of the stable part of the state. We show below that also in this case there exists a generic multi-index Γ g ( Ω S , n ) such that M ( Ω S , n ) M Γ g ( Ω S , n ) ¯ .
The main tool to obtain these results is investigating the properties of the mappings ψ Γ , that map transfer functions in M Γ to triples ( A , B , C ) Δ Γ , as well as analyzing the closures of the sets Δ Γ . The relation between parameter vectors θ Θ Γ and triples of system matrices ( A , B , C ) Δ Γ is easier to understand than the relation between Δ Γ and M Γ , due to the results of Theorem 2. Consequently, this section focuses on the relations between Δ Γ and M Γ —and their closures—for different multi-indices Γ .
To define the closures we embed the sets Δ Γ of matrices in canonical form with multi-indices Γ corresponding to transfer functions of order n into the space Δ n of all conformable complex matrix triples ( A , B , C ) with A C n × n , where additionally λ | m a x | ( A ) 1 . Since the elements of Δ n are matrix triples, this set is isomorphic to a subset of the finite dimensional space C n 2 + 2 n s , equipped with the Euclidean topology. Please note that Δ n also contains non-minimal state space realizations, corresponding to transfer functions of lower order.
Remark 16.
In principle the set Δ n also contains state space realizations of transfer functions k ( z ) = I s + j = 1 K j z j with complex valued coefficients K j . Since the subset of Δ n of state space systems realizing transfer functions with real valued K j is closed in Δ n , realizations corresponding to transfer functions with coefficients with non-zero imaginary part are irrelevant for the analysis of the closures of the sets Δ Γ .
After investigating the closure of Δ Γ in Δ n , denoted by Δ Γ ¯ , we consider the set of corresponding transfer functions π ( Δ Γ ¯ ) . Since we effectively maximize the pseudo likelihood over Δ Γ ¯ , we have to understand for which multi-indices Γ ˜ the set π ( Δ Γ ˜ ) is a subset of π ( Δ Γ ¯ ) . Moreover, we find a covering of π ( Δ Γ ¯ ) i I M Γ i . This restricts the set of multi-indices Γ that may occur as possible multi-indices of the limit of a sequence in π ( Δ Γ ) and thus the set of transfer functions that can be obtained by maximization of the pseudo likelihood.
The sets M Γ , are embedded into the vector space M of all causal transfer functions k ( z ) = I s + j = 1 K j z j . The vector space M is isomorphic to the infinite dimensional space Π j N R j s × s equipped with the pointwise topology. Since, as mentioned above, maximization of the pseudo likelihood function over M Γ effectively includes M Γ ¯ , it is important to determine for any given multi-index Γ , the multi-indices Γ ˜ for which the set M Γ ˜ is a subset of M Γ ¯ . Please note that M Γ ¯ is not necessarily equal to π ( Δ Γ ¯ ) . The continuity of π , as shown in Theorem 2 (i), implies the following inclusions:
M Γ = π ( Δ Γ ) π ( Δ Γ ¯ ) M Γ ¯ .
In general all these inclusions are strict. For a discussion in case of stable transfer functions see Hannan and Deistler (1988, Theorem 2.5.3).
We first define a partial ordering on the set of multi-indices Γ . Subsequently we examine the closure Δ ¯ Γ in Δ n and finally we examine the closures M ¯ Γ in M.
Definition 9. 
(i) 
For two state space unit root structures Ω S and Ω ˜ S with corresponding matrices A u C n u × n u and A ˜ u C n ˜ u × n ˜ u in canonical form, it holds that Ω ˜ S Ω S if and only if there exists a permutation matrix S such that
S A u S = A ˜ u J ˜ 12 0 J ˜ 2 .
Moreover, Ω ˜ S < Ω S holds if additionally Ω ˜ S Ω S .
(ii) 
For two state space unit root structures Ω S and Ω ˜ S and dimensions of the stable subsystems n , n ˜ N 0 we define
( Ω ˜ S , n ˜ ) ( Ω S , n ) if and only if Ω ˜ S Ω S , n ˜ n .
Strict inequality holds, if at least one of the two inequalities above holds strictly.
(iii) 
For two pairs ( Ω S , p ) and ( Ω ˜ S , p ˜ ) with corresponding matrices A u C n u × n u and A ˜ u C n ˜ u × n ˜ u in canonical form, it holds that ( Ω ˜ S , p ˜ ) ( Ω S , p ) if and only if there exists a permutation matrix S such that
S A u S = A ˜ u J ˜ 12 0 J ˜ 2 , S p = p 1 p 2 ,
where p 1 N 0 n ˜ u and p ˜ restricts at least as many entries as p 1 , i.e., p ˜ i ( p 1 ) i holds for all i = 1 , , n ˜ u . Moreover, ( Ω ˜ S , p ˜ ) < ( Ω S , p ) holds if additionally ( Ω ˜ S , p ˜ ) ( Ω S , p ) .
(iv) 
Let α = ( α , 1 , , α , s ) , α , i N 0 and α ˜ = ( α ˜ , 1 , , α ˜ , s ) , α ˜ , i N 0 . Then α ˜ α if and only if α ˜ , i α , i , i = 1 , , s . Moreover, α ˜ < α holds, if at least one inequality is strict (compare Hannan and Deistler 1988, sct. 2.5).
Finally, define
Γ ˜ = ( Ω ˜ S , p ˜ , α ˜ ) Γ = ( Ω S , p , α ) if and only if ( Ω ˜ S , p ˜ ) ( Ω S , p ) and α ˜ α .
Strict inequality holds, if at least one of the inequalities above holds strictly.
Please note that (i) implies that Ω ˜ S only contains unit roots that are also contained in Ω S , with the integration orders h ˜ k of the unit roots in Ω ˜ S smaller or equal to the integration orders of the respective unit roots in Ω S . Thus, denoting the unit root structures corresponding to Ω ˜ S and Ω S by Ω ˜ and Ω , it follows that Ω ˜ S Ω S implies Ω ˜ Ω . The reverse does not hold as, e.g., for Ω S = ( ( 0 , 1 , 1 ) ) (where hence Ω = ( ( 0 , 2 ) ) ) and Ω ˜ S = ( ( 0 , 2 ) ) (with Ω ˜ = ( ( 0 , 1 ) ) ) it holds that Ω ˜ Ω , but neither Ω ˜ S Ω S nor Ω S Ω ˜ S holds as here
A u = 1 1 0 1 , A ˜ u = 1 0 0 1 .
This partial ordering is convenient for the characterization of the closure of Δ Γ .

4.1. The Closure of Δ Γ in Δ n

Please note that the block-structure of A implies that every system in Δ Γ can be separated in two subsystems ( A u , B u , C u ) and ( A , B , C ) . Define Δ Ω S , p : = Δ ( Ω S , p , { } ) as the set of all state space realizations in canonical form corresponding to state space unit root structure Ω S , structure indices p and n = 0 . Analogously define Δ α : = Δ ( { } , { } , α ) as the set of all state space realizations in canonical form with Ω S = { } and Kronecker indices α . Examining Δ Ω S , p ¯ and Δ α ¯ separately simplifies the analysis.

4.1.1. The Closure of Δ Ω S , p

The canonical form imposes a lot of structure, i.e., restrictions on the matrices A , B and C . By definition Δ Ω S , p = Δ Ω S , p A × Δ Ω S , p B × Δ Ω S , p C and the closures of the three matrices can be analyzed separately. Δ Ω S , p A and Δ Ω S , p C are very easy to investigate. The structure of A is fully determined by Ω S and consequently Δ Ω S , p A consists of a single matrix A which immediately implies that Δ Ω S , p A ¯ = Δ Ω S , p A . The matrix C , compare Theorem 1 is composed of blocks C k E that are sub-blocks of unitary (or orthonormal) matrices and blocks C k G that have to fulfill (recursive) orthogonality constraints. The corresponding sets were shown to be closed in Lemmas 1 and 2 and Corollaries 1 and 2. Thus, Δ Ω S , p C ¯ = Δ Ω S , p C .
It remains to discuss Δ Ω S , p B ¯ . The structure indices p defining the p.u.t. structures of the matrices B k restrict some entries to be positive. Combining all the parameters—unrestricted with complex values parameterized by real and imaginary part and the positive entries—into a parameter vector leads to an open sub-set of R m for some m. For convergent sequences of systems with fixed Ω S and p, limits of entries restricted to be positive may be zero. When this happens, two cases have to be distinguished. First, all p.u.t. sub-matrices still have full row rank. In this case the limiting system, ( A 0 , B 0 , C 0 ) say, is still minimal and can be transformed to a system in canonical form ( A ˜ 0 , B ˜ 0 , C ˜ 0 ) with fewer unrestricted entries in B ˜ 0 .
Second, if at least one of the row ranks of the p.u.t. blocks decreases in the limit, the limiting system is no longer minimal. Consequently, ( Ω ˜ S , p ˜ ) < ( Ω S , p ) in the limit.
To illustrate this point consider again Example 4 with Equation (12) rewritten as
x t + 1 , 1 = x t , 1 + x t , 2 + B 1 , 1 ε t , x t + 1 , 2 = x t , 2 + B 1 , 2 , 1 ε t , x t + 1 , 3 = x t , 3 + B 1 , 2 , 2 ε t .
If B 1 , 2 , 1 = [ 0 , b 1 , 2 , 1 , 2 ] 0 and B 1 , 2 , 2 = [ b 1 , 2 , 2 , 1 , b 1 , 2 , 2 , 2 ] 0 , b 1 , 2 , 2 , 1 > 0 , it holds that { y t } t Z is an I(2) process with state space unit root structure Ω S = ( ( 0 , 1 , 2 ) ) .
Now consider a sequence of systems with all parameters except for b 1 , 2 , 1 , 2 constant and b 1 , 2 , 1 , 2 0 . The limiting system is then given by
y t = C 1 , 1 E x t , 1 + C 1 , 2 G x t , 2 + C 1 , 2 E x t , 3 + ε t , x t + 1 , 1 x t + 1 , 2 x t + 1 , 3 = 1 1 0 0 1 0 0 0 1 x t , 1 x t , 2 x t , 3 + b 1 , 1 , 1 b 1 , 1 , 2 0 0 b 1 , 2 , 2 , 1 b 1 , 2 , 2 , 2 ε t , x 1 , 1 = x 1 , 2 = x 1 , 3 = 0 .
In the limiting system x t , 2 = 0 is redundant and { y t } t Z is an I(1) process rather than an I(2) process. Dropping x t , 2 leads to a state space realisation of the limiting system { y t } t Z given by
y t = C 1 , 1 E x t , 1 + C 1 , 2 E x t , 3 + ε t = C ˜ x ˜ t + ε t , x ˜ t R 2 , x ˜ t + 1 = x t + 1 , 1 x t + 1 , 3 = 1 0 0 1 x t , 1 x t , 3 + b 1 , 1 , 1 b 1 , 1 , 2 b 1 , 2 , 2 , 1 b 1 , 2 , 2 , 2 ε t = x ˜ t + B ˜ ε t , x 1 , 1 = x 1 , 3 = 0 .
In case B ˜ has full rank, the above system is minimal. Since b 1 , 2 , 2 , 1 > 0 , the matrix B ˜ needs to be transformed into p.u.t. format. By definition all systems in the sequence, with b 1 , 2 , 1 , 2 0 , have structure indices p = [ 0 , 2 , 1 ] as discussed in Example 12. The limiting system—in case of full rank of B ˜ —has indices p ˜ = [ 1 , 2 ] . To relate to Definition 9 choose the permutation matrix S = 1 0 0 0 0 1 0 1 0 to arrive at
S A u S = 1 0 1 0 1 0 0 0 1 = I 2 J ˜ 12 0 J ˜ 2 , S p = 0 1 2 = ( p 1 ) 1 ( p 1 ) 2 p 2 .
This shows that ( p ˜ ) i > ( p 1 ) i , i = 1 , 2 and thus the limiting system has a smaller multi-index Γ than the systems of the sequence. In case B ˜ has reduced rank equal to one a further reduction in the system order to n = 1 along similar lines as discussed is possible, again leading to a limiting system with smaller multi-index Γ .
The discussion shows that the closure of Δ Ω S , p B is related to lower order systems in the sense of Definition 9. The precise statement is given in Theorem 3 after a discussion of the closure of the stable subsystems.

4.1.2. The Closure of Δ α

Consider a convergent sequence of systems { ( A j , B j , C j ) } j N in Δ α and denote the limiting system by ( A 0 , B 0 , C 0 ) . Clearly, λ | max | ( A 0 ) 1 holds true for the limit A 0 of the sequence { A j } j N with λ | max | ( A j ) < 1 for all j. Therefore, two cases have to be discussed for the limit:
  • If λ | max | ( A 0 ) < 1 , the potentially non-minimal limiting system ( A 0 , B 0 , C 0 ) corresponds to a minimal state space realization with Kronecker indices smaller or equal to α (cf. Hannan and Deistler 1988, Theorem 2.5.3).
  • If λ | max | ( A 0 ) = 1 , the limiting matrix A 0 is similar to a block matrix A ˜ = diag ( J ˜ 2 , A ˜ ) , where all eigenvalues of J ˜ 2 have unit modulus and λ | max | ( A ˜ ) < 1 .
The first case is well understood, compare Hannan and Deistler (1988, chp. 2), since the limit in this case corresponds to a stable transfer function. In the second case the limiting system can be separated into two subsystems ( J ˜ 2 , B ˜ u , C ˜ u ) and ( A ˜ , B ˜ , C ˜ ) , according to the block diagonal structure of A ˜ . The state space unit root structure of the limiting system ( A 0 , B 0 , C 0 ) depends on the multiplicities of the eigenvalues of the matrix J ˜ 2 and is greater (in the sense of Definition 9) than the empty state space unit root structure. At the same time the Kronecker indices of the subsystem ( A ˜ , B ˜ , C ˜ ) are smaller than α , compare again Hannan and Deistler (1988, chp. 2). Since the Kronecker indices impose restrictions on some entries of the matrices A j and thus also on A 0 , the block J ˜ 2 and consequently also the limiting state space unit root structure might be subject to further restrictions.

4.1.3. The Conformable Index Set and the Closure of Δ Γ

The previous subsection shows that the closure of Δ Γ does not only contain systems corresponding to transfer functions with multi-index smaller or equal to Γ , but also systems that are related in a different way that is formalized below.
Definition 10 (Conformable index set).
Given a multi-index Γ = ( Ω S , p , α ) , the set of conformable multi-indices K ( Γ ) contains all multi-indices Γ ˜ = ( Ω ˜ S , p ˜ , α ˜ ) , where:
  • The pair ( Ω ˜ S , p ˜ ) with corresponding matrix A ˜ u in canonical form extends ( Ω S , p ) with corresponding matrix A u in canonical form, i.e., there exists a permutation matrix S such that
    S A ˜ u S = A u 0 0 J ˜ 2 and S p ˜ = p p ˜ 2 ,
  • α ˜ α .
  • n ˜ u + n ˜ = n u + n .
Please note that the definition implies Γ K ( Γ ) . The importance of the set K ( Γ ) is clarified in the following theorem:
Theorem 3.
Transfer functions corresponding to state space realizations with multi-index Γ ˜ Γ are contained in the set π ( Δ Γ ¯ ) . The set π ( Δ Γ ¯ ) is contained in the union of all sets M Γ ˇ for Γ ˇ Γ ˜ with Γ ˜ conformable to Γ, i.e.,
Γ ˜ Γ M Γ ˜ π ( Δ Γ ¯ ) Γ ˜ K ( Γ ) Γ ˇ Γ ˜ M Γ ˇ .
Theorem 3 provides a characterization of the transfer functions corresponding to systems in the closure of Δ Γ . The conformable set K ( Γ ) plays a key role here, since it characterizes the set of all minimal systems that can be obtained as limits of convergent sequences from within the set Δ Γ . Conformable indices extend the matrix A u corresponding to the unit root structure by the block J ˜ 2 .
The second inclusion in Theorem 3 is potentially strict, depending on the Kronecker indices α in Γ . Equality holds, e.g., in the following case:
Corollary 3.
For every multi-index Γ with n = 0 the set of conformable indices consists only of Γ, which implies π ( Δ Γ ¯ ) = Γ ˜ Γ M Γ ˜ .

4.2. The Closure of M Γ

It remains to investigate the closure of M Γ in M. Hannan and Deistler (1988, Theorem 2.6.5 (ii) and Remark 3, p. 73) show that for any order n, there exist Kronecker indices α , g = α , g ( n ) corresponding to the generic neighborhood M α , g for transfer functions of order n such that
M , n : = α | n ( α ) = n M α M α , g ¯ ,
where M α : = π ( Δ α ) . Here M , n denotes the set of all transfer functions of order n with state space realizations ( A , B , C ) satisfying λ | max | ( A ) < 1 . Every transfer function in M , n can be approximated by a sequence of transfer functions in M α , g .
It can be easily seen that a generic neighborhood also exists for systems with state space unit root structure Ω S and without stable subsystem: Set the structure indices p to have a minimal number of elements restricted in p.u.t. sub-blocks of B u , i.e., for any block B k , h k , j C n k , h k , j × s , or B k , h k , j R n k , h k , j × s in case of a real unit root, set the corresponding structure indices to p = [ 1 , , n k , h k , j ] . Any p.u.t. matrix can be approximated by a matrix in this generic neighborhood with some positive entries restricted by the p.u.t. structure tending to zero. Combining these results with Theorem 3 implies the existence of a generic neighborhood for the canonical form considered in this paper:
Theorem 4.
Let M ( Ω S , n ) be the set of all transfer functions k ( z ) M n u ( Ω S ) + n with state space unit root structure Ω S . For every Ω S and n , there exists a multi-index Γ g : = Γ g ( Ω S , n ) such that
M ( Ω S , n ) M Γ g ¯ .
Moreover, it holds that M ( Ω S , n ) M α , g ( n ) ¯ for every Ω S and n satisfying n u ( Ω S ) + n n .
Theorem 4 is the basis for choosing a generic multi-index Γ for maximizing the pseudo likelihood function. For every Ω S and n there exists a generic piece that—in its closure—contains all transfer functions of order n u ( Ω S ) + n and state space unit root structure Ω S : The set of transfer functions corresponding to the multi-index with the largest possible structure indices p in the sense of Definition 9 (iii) and generic Kronecker indices for the stable subsystem. Choosing these sets and their corresponding parameter spaces as model sets is, therefore, the most convenient choice for numerical maximization, if only Ω S and n are known.
If, e.g., only an upper bound for the system order $n$ is known and the goal is only to obtain consistent estimators, using $\alpha_{\bullet,g}(n)$ is a feasible choice, since all transfer functions in the closure of the set $M_{\alpha_{\bullet,g}(n)}$ can be approximated arbitrarily well, regardless of their potential state space unit root structure $\Omega_S$ with $n_u(\Omega_S) \leq n$. For testing hypotheses, however, it is important to understand the topological relations between sets corresponding to different multi-indices $\Gamma$. In the following we focus on the multi-indices $\Gamma_g(\Omega_S, n_\bullet)$ for arbitrary $\Omega_S$ and $n_\bullet$.
The closure of $M(\Omega_S, n_\bullet)$ also contains transfer functions that have a different state space unit root structure than $\Omega_S$. Considering convergent sequences of state space realizations $(A_j, B_j, C_j)_{j \in \mathbb{N}}$ of transfer functions in $M(\Omega_S, n_\bullet)$, the state space unit root structure of $(A_0, B_0, C_0) := \lim_{j \to \infty}(A_j, B_j, C_j)$ may differ in three ways:
  • For sequences ( A j , B j , C j ) j N in canonical form rows of B u , j can tend to zero, which reduces the state space unit root structure as discussed in Section 4.1.1.
  • Stable eigenvalues of A j may converge to the unit circle, thereby extending the unit root structure.
  • Off-diagonal entries of the sub-block $A_{u,j}$ of $\tilde{A}_j := T_j A_j T_j^{-1}$ in canonical form may converge to zeros in the sub-block $A_{u,0}$ of the limit $\tilde{A}_0 := T_0 A_0 T_0^{-1}$ in canonical form, resulting in a different attainable state space unit root structure. Here $T_j \in \mathbb{C}^{n \times n}$ for all $j \in \mathbb{N}$ are regular matrices transforming $A_j$ to canonical form and $T_0 \in \mathbb{C}^{n \times n}$ transforms $A_0$ accordingly.
The first change of Ω S described above results in a transfer function with smaller state space unit root structure according to Definition 9 (ii). The implications of the other two cases are summarized in the following definition:
Definition 11 (Attainable unit root structures).
For given $n_\bullet$ and $\Omega_S$ the set $\mathcal{A}(\Omega_S, n_\bullet)$ of attainable unit root structures contains all pairs $(\tilde{\Omega}_S, \tilde{n}_\bullet)$, where $\tilde{\Omega}_S$ with corresponding matrix $\tilde{A}_u$ in canonical form extends $\Omega_S$ with corresponding matrix $A_u$ in canonical form, i.e., there exists a permutation matrix $S$ such that
$$S \tilde{A}_u S' = \begin{pmatrix} \check{A}_u & J_{12} \\ 0 & J_2 \end{pmatrix},$$
where $\check{A}_u$ can be obtained by replacing off-diagonal entries in $A_u$ by zeros and where $\tilde{n}_\bullet := n_\bullet - d_J$ with $d_J$ the dimension of $J_2 \in \mathbb{C}^{d_J \times d_J}$.
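The off-diagonal mechanism can be illustrated numerically (a minimal sketch in Python with a single, hypothetical two-dimensional unit root block): zeroing the off-diagonal entry of an I(2) Jordan block at $z = 1$ produces two separate I(1) unit roots, which is visible in the ranks of powers of $A_u - I$.

```python
import numpy as np

# A minimal sketch: one Jordan chain of length 2 at z = 1 (I(2) structure)
A_i2 = np.array([[1.0, 1.0],
                 [0.0, 1.0]])
# Off-diagonal entry replaced by zero: two separate I(1) unit roots
A_i1 = np.array([[1.0, 0.0],
                 [0.0, 1.0]])

for name, A in [("I(2) chain", A_i2), ("two I(1) roots", A_i1)]:
    r1 = np.linalg.matrix_rank(A - np.eye(2))
    r2 = np.linalg.matrix_rank(np.linalg.matrix_power(A - np.eye(2), 2))
    print(f"{name}: rank(A-I) = {r1}, rank((A-I)^2) = {r2}")

# I(2) chain:     rank(A-I) = 1, rank((A-I)^2) = 0  (nilpotent of order 2)
# two I(1) roots: rank(A-I) = 0, rank((A-I)^2) = 0  (diagonalizable)
```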
Remark 17.
It is a direct consequence of the definition of $\mathcal{A}(\Omega_S, n_\bullet)$ that $(\tilde{\Omega}_S, \tilde{n}_\bullet) \in \mathcal{A}(\Omega_S, n_\bullet)$ implies $\mathcal{A}(\tilde{\Omega}_S, \tilde{n}_\bullet) \subset \mathcal{A}(\Omega_S, n_\bullet)$.
Theorem 5. 
(i) 
$M_\Gamma$ is $T_{pt}$-open in $\overline{M_\Gamma}$ (see Definition 8 for a definition of $T_{pt}$).
(ii) 
For every generic multi-index $\Gamma_g$ corresponding to $\Omega_S$ and $n_\bullet$ it holds that
$$\pi(\overline{\Delta_{\Gamma_g}}) \subset \bigcup_{\tilde{\Gamma} \in K(\Gamma_g)} \bigcup_{\check{\Gamma} \leq \tilde{\Gamma}} M_{\check{\Gamma}} \subset \bigcup_{(\tilde{\Omega}_S, \tilde{n}_\bullet) \in \mathcal{A}(\Omega_S, n_\bullet)} \ \bigcup_{(\check{\Omega}_S, \check{n}_\bullet) \leq (\tilde{\Omega}_S, \tilde{n}_\bullet)} M(\check{\Omega}_S, \check{n}_\bullet) = \overline{M_{\Gamma_g}}.$$
Theorem 5 has important consequences for statistical analysis, e.g., PML estimation, since—as stated several times already—maximizing the pseudo likelihood function over Θ Γ effectively amounts to calculating the supremum over the larger set M Γ ¯ . Depending on the choice of Γ the following asymptotic behavior may occur:
  • If Γ is chosen correctly and the estimator of the transfer function is consistent, openness of M Γ in its closure implies that the probability of the estimator being an interior point of M Γ tends to one asymptotically. Since the mapping attaching the parameters to the transfer function is continuous on an open and dense set, consistency in terms of transfer functions, therefore, implies generic consistency of the parameter estimators.
  • If the multi-index is incorrectly chosen to equal $\Gamma$, estimator consistency is still possible if the true multi-index $\Gamma_0 < \Gamma$, as in this case $M_{\Gamma_0} \subset \overline{M}_\Gamma$. This is in some sense not too surprising and also well known from the simpler VAR framework, where consistency of OLS can be established when the true autoregressive order is smaller than the order chosen for estimation. Thus, analogous to the lag order in the VAR case, a necessary condition for consistency is to choose the system order larger than or equal to the true system order.
Finally, note that Theorem 5 also implies the following result relevant for the determination of the unit root structure, further discussed in Section 5.1.1 and Section 5.2.1:
Corollary 4.
For every pair $(\tilde{\Omega}_S, \tilde{n}_\bullet) \in \mathcal{A}(\Omega_S, n_\bullet)$ it holds that
$$\overline{M(\tilde{\Omega}_S, \tilde{n}_\bullet)} \subset \overline{M(\Omega_S, n_\bullet)}.$$

5. Testing Commonly Used Hypotheses in the MFI(1) and I(2) Cases

This section discusses a large number of hypotheses, respectively restrictions, on cointegrating spaces, adjustment coefficients and deterministic components often tested in the empirical literature. As in the VECM framework, discussed for the I(2) case in Section 2, testing hypotheses on the cointegrating spaces or adjustment coefficients may necessitate different reparameterizations.

5.1. The M F I ( 1 ) Case

By far the most widely used cases of MFI(1) processes are I(1) processes and seasonally (co)integrated processes for quarterly data with state space unit root structure $((0, d_1^1), (\pi/2, d_1^2), (\pi, d_1^3))$. In general, assuming for notational simplicity $\omega_1 = 0$ and $\omega_l = \pi$, it holds for $t > 0$ and $x_{1,u} = 0$ that
$$\begin{aligned}
y_t &= \sum_{k=1}^{l} C_{k,\mathbb{R}}\, x_{t,k,\mathbb{R}} + C_\bullet x_{t,\bullet} + \Phi d_t + \varepsilon_t \\
&= C_1 x_{t,1} + \sum_{k=2}^{l-1} \big( C_k x_{t,k} + \bar{C}_k \bar{x}_{t,k} \big) + C_l x_{t,l} + C_\bullet x_{t,\bullet} + \Phi d_t + \varepsilon_t \\
&= C_1 B_1 \sum_{j=1}^{t-1} \varepsilon_{t-j} + 2 \sum_{k=2}^{l-1} \Re\Big( C_k B_k \sum_{j=1}^{t-1} (\bar{z}_k)^{j-1} \varepsilon_{t-j} \Big) + C_l B_l \sum_{j=1}^{t-1} (-1)^{j-1} \varepsilon_{t-j} + C_\bullet \sum_{j=1}^{t-1} A_\bullet^{j-1} B_\bullet \varepsilon_{t-j} + C_\bullet A_\bullet^{t-1} x_{1,\bullet} + \Phi d_t + \varepsilon_t \\
&= C_1 B_1 \sum_{j=1}^{t-1} \varepsilon_{t-j} + 2 \sum_{k=2}^{l-1} \sum_{j=1}^{t-1} \big[ \Re(C_k B_k) \cos(\omega_k (j-1)) + \Im(C_k B_k) \sin(\omega_k (j-1)) \big] \varepsilon_{t-j} + C_l B_l \sum_{j=1}^{t-1} (-1)^{j-1} \varepsilon_{t-j} + C_\bullet \sum_{j=1}^{t-1} A_\bullet^{j-1} B_\bullet \varepsilon_{t-j} + C_\bullet A_\bullet^{t-1} x_{1,\bullet} + \Phi d_t + \varepsilon_t.
\end{aligned}$$
The above equation provides an additive decomposition of $\{y_t\}_{t \in \mathbb{Z}}$ into stochastic trends and cycles, the deterministic and the stationary components. The stochastic cycles at frequency $0 < \omega_k < \pi$ are, of course, given by the combination of sine and cosine terms. For the MFI(1) case this can also be seen directly from considering the real valued canonical form discussed in Remark 4, with the matrices $A_{k,\mathbb{R}}$ for $k = 2, \ldots, l-1$ given by $A_{k,\mathbb{R}} = I_{d_1^k} \otimes \begin{pmatrix} \cos(\omega_k) & \sin(\omega_k) \\ -\sin(\omega_k) & \cos(\omega_k) \end{pmatrix}$ in this case.
The ranks of $C_k B_k$ are equal to the integers $d_1^k$ in $\Omega_S = ((\omega_1, d_1^1), \ldots, (\omega_l, d_1^l))$. The number of stochastic trends is equal to $d_1^1$, the number of stochastic cycles at frequency $\omega_k$ is equal to $2 d_1^k$ for $k = 2, \ldots, l-1$ and equal to $d_1^l$ if $k = l$, as discussed in Section 3.
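This decomposition can be illustrated by simulation. The following sketch (with hypothetical system matrices chosen purely for illustration) generates a three-dimensional quarterly MFI(1) process with unit roots at $z = 1, i, -1$ from the real-valued canonical form and shows that seasonal differencing removes all stochastic trends and cycles:

```python
import numpy as np

rng = np.random.default_rng(0)
s, T = 3, 400

# Real-valued unit root state matrix: frequencies 0, pi/2, pi (quarterly data).
rot = np.array([[0.0, 1.0], [-1.0, 0.0]])          # rotation by pi/2, cf. A_{k,R}
A_u = np.zeros((4, 4))
A_u[0, 0] = 1.0                                    # omega = 0
A_u[1:3, 1:3] = rot                                # omega = pi / 2
A_u[3, 3] = -1.0                                   # omega = pi

B_u = rng.normal(size=(4, s))                      # arbitrary full-rank loading
C_u = rng.normal(size=(s, 4))

x = np.zeros(4)
y = np.zeros((T, s))
for t in range(T):
    eps = rng.normal(size=s)
    x = A_u @ x + B_u @ eps
    y[t] = C_u @ x + eps

# (1 - L^4) = (1-L)(1+L)(1+L^2) annihilates all unit roots at 0, pi/2, pi:
d4y = y[4:] - y[:-4]
print("variance of y, early vs. late:   ", y[:100].var(), y[-100:].var())  # grows
print("variance of (1-L^4)y, early/late:", d4y[:100].var(), d4y[-100:].var())  # stable
```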
Moreover, in the MFI(1) case, $d_1^k$ is linked to the complex cointegrating rank $r_k$ at frequency $\omega_k$, defined in Johansen (1991) and Johansen and Schaumburg (1999) in the VECM case as the rank of the matrix $\Pi_k := a(z_k)$. For VARMA processes with arbitrary integration orders the complex cointegrating rank $r_k$ at frequency $\omega_k$ is $r_k := \operatorname{rank}(k^{-1}(z_k))$, where $k(z)$ is the transfer function, with $r_k = s - d_1^k$ in the MFI(1) case. Thus, in the MFI(1) case, determination of the state space unit root structure corresponds to determination of the complex cointegrating ranks in the VECM case.
In the VECM setting, the matrix $\Pi_k$ is usually factorized into $\Pi_k = \alpha_k \beta_k'$, as presented for the I(1) case in Section 2. For $\omega_k \in \{0, \pi\}$ the column space of $\beta_k$ gives the cointegrating space of the process at frequency $\omega_k$. For $0 < \omega_k < \pi$ the relation between the column space of $\beta_k$ and the space of CIVs and PCIVs at the corresponding frequency is more involved. The columns of $\beta_k$ are orthogonal to the columns of $C_k$, the sub-block of $C$ from a state space realization $(A, B, C)$ in canonical form corresponding to the VAR process. Analogously, the column space of the matrix $\alpha_k$, containing the so-called adjustment coefficients, is orthogonal to the row space of the sub-block $B_k$ of $B$.
Both integers $d_1^k$ and $r_k$ are related to the dimensions of the static and dynamic cointegrating spaces in the MFI(1) case: For $\omega_k \in \{0, \pi\}$, the cointegrating rank $r_k = s - d_1^k$ coincides with the dimension of the static cointegrating space at frequency $\omega_k$. Furthermore, the dimension of the static cointegrating space at frequency $0 < \omega_k < \pi$ is bounded from above by $r_k = s - d_1^k$, since it is spanned by at most $s - d_1^k$ vectors $\beta \in \mathbb{R}^s$ orthogonal to the complex valued matrix $C_k$. The dimension of the dynamic cointegrating space at $0 < \omega_k < \pi$ is equal to $2 r_k = 2(s - d_1^k)$. Identifying again $\beta(z) = \beta_0 + \beta_1 z$ with the vector $[\beta_0', \beta_1']'$, a basis of the dynamic cointegrating space at $0 < \omega_k < \pi$ is then given by the column space of the product
$$\begin{pmatrix} \gamma_0 & \tilde{\gamma}_0 \\ \gamma_1 & \tilde{\gamma}_1 \end{pmatrix} := \begin{pmatrix} I_s & 0_{s \times s} \\ -\cos(\omega_k) I_s & -\sin(\omega_k) I_s \end{pmatrix} \begin{pmatrix} \Re(\beta_k) & -\Im(\beta_k) \\ \Im(\beta_k) & \Re(\beta_k) \end{pmatrix},$$
with the columns of $\beta_k \in \mathbb{C}^{s \times (s - d_1^k)}$ spanning the orthogonal complement of the column space of $C_k$, i.e., $\beta_k$ is of full rank and $\bar{\beta}_k' C_k = (\Re(\beta_k) - i \Im(\beta_k))' C_k = 0$. This holds true since both factors are of full rank and $[\gamma_0', \gamma_1']'$ satisfies $(\bar{z}_k \gamma_0 + \gamma_1)' C_k = 0$, which corresponds to the necessary condition given in Example 2 for the columns of $[\gamma_0', \gamma_1']'$ to be PCIVs. The latter implies $(\bar{z}_k \tilde{\gamma}_0 + \tilde{\gamma}_1)' C_k = 0$ also for $[\tilde{\gamma}_0', \tilde{\gamma}_1']'$, highlighting again the additional structure of the cointegrating space emanating from the complex conjugate pairs of eigenvalues (and matrices) as discussed in Example 2.
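As a quick numerical check of this construction (a sketch only; the dimensions, the frequency and $C_k$ are arbitrary placeholders, and the signs follow the display above):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
s, d, omega = 5, 2, np.pi / 3            # d = d_1^k unit roots at frequency omega
z = np.exp(1j * omega)

C = rng.normal(size=(s, d)) + 1j * rng.normal(size=(s, d))   # C_k, full rank
beta = np.conj(null_space(C.T))          # beta_k with conj(beta_k)' C_k = 0

R, I = beta.real, beta.imag
gamma0, gamma1 = R, -np.cos(omega) * R - np.sin(omega) * I
gamma0t, gamma1t = -I, np.cos(omega) * I - np.sin(omega) * R

# Necessary PCIV condition: (conj(z_k) gamma0 + gamma1)' C_k = 0
print(np.abs((np.conj(z) * gamma0 + gamma1).T @ C).max())     # ~ 1e-16
print(np.abs((np.conj(z) * gamma0t + gamma1t).T @ C).max())   # ~ 1e-16
```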
Please note that the relations between r k and d 1 k discussed above only hold in the MFI(1) and I(1) special cases. For higher orders of integration no such simple relations exist.
In the MFI(1) setting the deterministic component typically includes a constant, seasonal dummies and a linear trend. As discussed in Remark 6, a sufficiently rich set of deterministic components allows to absorb non-zero initial values x 1 , u .

5.1.1. Testing Hypotheses on the State Space Unit Root Structure

Using the generic sets of transfer functions $M_{\Gamma_g}$ presented in Theorem 4, we can construct pseudo likelihood ratio tests for different hypotheses $H_0: (\Omega_S, n_\bullet) = (\Omega_{S,0}, n_{\bullet,0})$ against chosen alternatives. Note, however, that by the results of Theorem 5 the null hypothesis includes all pairs $(\Omega_S, n_\bullet) \in \mathcal{A}(\Omega_{S,0}, n_{\bullet,0})$ as well as all pairs $(\Omega_S, n_\bullet)$ that are smaller than a pair $(\tilde{\Omega}_S, \tilde{n}_\bullet) \in \mathcal{A}(\Omega_{S,0}, n_{\bullet,0})$.
As is common in the VECM setting, first consider hypotheses at a single frequency $\omega_k$. For an MFI(1) process, the hypothesis of a state space unit root structure equal to $\Omega_{S,0} = ((\omega_k, d_{1,0}^k))$ corresponds to the hypothesis that the (complex) cointegrating rank $r_k$ at frequency $\omega_k$ equals $r_0 = s - d_{1,0}^k$. Maximization of the pseudo likelihood function over the set $\overline{M(((\omega_k, d_{1,0}^k)), n - \delta_k d_{1,0}^k)}$ (with a suitably chosen order $n$) leads to estimates that may be arbitrarily close to transfer functions with different state space unit root structures $\Omega_S$. These include $\Omega_S$ with additional unit root frequencies $\omega_{\tilde{k}}$, with the integers $d_1^{\tilde{k}}$ restricted only by the order $n$. Therefore, focusing on a single frequency $\omega_k$ does not rule out a more complicated true state space unit root structure. Assume $n \geq \delta_k s$ with $\delta_k = 1$ for $\omega_k \in \{0, \pi\}$ and $\delta_k = 2$ otherwise. Corollary 4 shows that
$$\overline{M(\{\}, n)} \supset \overline{M(((\omega_k, 1)), n - \delta_k)} \supset \cdots \supset \overline{M(((\omega_k, s)), n - s \delta_k)},$$
since, e.g., $(((\omega_k, 1)), n - \delta_k) \in \mathcal{A}(\{\}, n)$.
Analogously to the procedure for testing the complex cointegrating rank $r_k$ in the VECM setting, these inclusions can be employed to test for $d_1^k$: start with the hypothesis $d_1^k = s$ against the alternative $0 \leq d_1^k < s$ and decrease the hypothesized $d_1^k$ consecutively until the test does not reject the null hypothesis.
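Schematically, the resulting top-down sequence can be sketched as follows; `plr_stat` and `critical_value` are hypothetical placeholders for the pseudo likelihood ratio statistic and its critical value, whose construction is not the subject of this paper:

```python
def select_d1k(y, omega_k, s, plr_stat, critical_value, alpha=0.05):
    """Schematic sequence for the number d_1^k of unit roots at frequency
    omega_k, mirroring the rank testing sequence of the VECM framework."""
    for d in range(s, 0, -1):                 # start at the largest d_1^k = s
        stat = plr_stat(y, omega_k, d)        # test H0: d_1^k = d
        if stat <= critical_value(omega_k, d, alpha):
            return d                          # first non-rejection
    return 0                                  # all hypotheses rejected
```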
Furthermore, one can formulate hypotheses on d 1 k jointly at different frequencies ω k . Again, there exist inclusions based on the definition of the set of attainable state space unit root structures and Corollary 4, which can be used to consecutively test hypotheses on Ω S .

5.1.2. Testing Hypotheses on CIVs and PCIVs

Johansen (1995) considers in the I ( 1 ) case three types of hypotheses on the cointegrating space spanned by the columns of β that are each motivated by examples from economic research: The different cases correspond to different types of hypotheses related to restrictions implied by economic theory.
(i)
$H_0: \beta = H \varphi$, $\beta \in \mathbb{R}^{s \times r}$, $H \in \mathbb{R}^{s \times t}$, $\varphi \in \mathbb{R}^{t \times r}$, $r \leq t < s$: The cointegrating space is known to be a subspace of the column space of $H$ (which is of full column rank).
(ii)
$H_0: \beta = [b, \varphi]$, $\beta \in \mathbb{R}^{s \times r}$, $b \in \mathbb{R}^{s \times t}$, $\varphi \in \mathbb{R}^{s \times (r - t)}$, $0 < t \leq r$: Some cointegrating relations are known.
(iii)
$H_0: \beta = [H_1 \varphi_1, \ldots, H_c \varphi_c]$, $\beta \in \mathbb{R}^{s \times r}$, $H_j \in \mathbb{R}^{s \times t_j}$, $\varphi_j \in \mathbb{R}^{t_j \times r_j}$, $r_j \leq t_j \leq s$, for $j = 1, \ldots, c$ such that $\sum_{j=1}^c r_j = r$: Cointegrating relations are known to lie in the column spaces of the matrices $H_j$ (which are of full column rank).
As discussed in Example 1, cointegration at $\omega_k = 0$ occurs if and only if a vector $\beta_j$ satisfies $\beta_j' C_1 = 0$. In other words, the column space of $C_1$ is the orthocomplement of the cointegrating space spanned by the columns of $\beta$, and hypotheses on $\beta$ restrict entries of $C_1$.
The first type of hypothesis, $H_0$, implies that the column space of $C_1$ is equal to the orthocomplement of the column space of $H \varphi$. Assume w.l.o.g. $H \in O_{s,t}$, $\varphi_\perp \in O_{t, t-r}$ and $H_\perp \in O_{s, s-t}$, such that the columns of $[H \varphi_\perp, H_\perp]$ form an orthonormal basis for the orthocomplement of the cointegrating space. Consider now the mapping:
$$C_1^r(\check{\theta}_L, \theta_R) := \left[ H \cdot \check{R}_L(\check{\theta}_L) \begin{pmatrix} I_{t-r} \\ 0_{r \times (t-r)} \end{pmatrix}, \ H_\perp \right] \cdot R_R(\theta_R),$$
where $\check{R}_L(\check{\theta}_L) := \prod_{i=1}^{t-r} \prod_{j=1}^{r} \mathcal{R}_{t,i,t-r+j}(\theta_{L, r(i-1)+j}) \in \mathbb{R}^{t \times t}$ and $R_R(\theta_R) \in \mathbb{R}^{(s-r) \times (s-r)}$ as in Lemma 1. From this one can derive a parameterization of the set of matrices $C_1^r$ corresponding to $H_0$, analogously to Lemma 1. The difference between the number of free parameters under the null hypothesis and under the alternative is the difference between the number of free parameters in $\theta_L \in [0, 2\pi)^{r(s-r)}$ and $\check{\theta}_L \in [0, 2\pi)^{r(t-r)}$, implying a reduction of the number of free parameters of $r(s-t)$ under the null hypothesis. This necessarily coincides with the number of degrees of freedom of the corresponding test statistic in the VECM setting (cf. Johansen 1995, Theorem 7.2).
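To make the construction concrete, the following sketch (hypothetical dimensions, a randomly drawn $H$, and $R_R(\theta_R)$ set to the identity for brevity) implements the mapping via Givens rotations and verifies both the orthonormality of $C_1^r$ and the orthogonality $\beta' C_1^r = 0$ for a compatible $\beta = H \varphi$:

```python
import numpy as np

def givens(q, i, j, theta):
    """Givens rotation R_{q,i,j}(theta) acting in the (i, j) plane (0-based)."""
    R = np.eye(q)
    c, s_ = np.cos(theta), np.sin(theta)
    R[i, i], R[j, j], R[i, j], R[j, i] = c, c, -s_, s_
    return R

rng = np.random.default_rng(2)
s, t, r = 5, 3, 2                                  # r <= t < s
H, _ = np.linalg.qr(rng.normal(size=(s, t)))       # H in O_{s,t}
H_perp = np.linalg.svd(H)[0][:, t:]                # H_perp in O_{s,s-t}

theta_L = rng.uniform(0, 2 * np.pi, size=(t - r) * r)
R_L = np.eye(t)
for i in range(t - r):                             # product of Givens rotations
    for j in range(r):
        R_L = R_L @ givens(t, i, t - r + j, theta_L[r * i + j])

C1 = np.hstack([H @ R_L[:, : t - r], H_perp])      # first t-r rotated columns
beta = H @ R_L[:, t - r:]                          # a compatible beta = H varphi

print(np.abs(C1.T @ C1 - np.eye(s - r)).max())     # orthonormal columns, ~ 0
print(np.abs(beta.T @ C1).max())                   # beta' C_1^r = 0, ~ 0
```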
The second type of hypothesis, $H_0$, is also straightforwardly parameterized: In this case a subspace of the cointegrating space is known and given by the column space of $b \in \mathbb{R}^{s \times t}$. Assume w.l.o.g. $b \in O_{s,t}$. The orthocomplement of $\beta = [b, \varphi]$ is given by the set of matrices $C_1$ satisfying the restriction $b' C_1 = 0$, i.e., the set $O_{s, d_1^1}(b)$ defined in (13). The parameterization of this set has already been discussed. The reduction of the number of free parameters under the null hypothesis is $t(s-r)$, which again coincides with the number of degrees of freedom of the corresponding test statistic in the VECM setting (cf. Johansen 1995, Theorem 7.3).
Finally, the third type of hypothesis, $H_0$, is the most difficult to parameterize in our setting. As an illustrative example consider the case $H_0: \beta = [H_1 \varphi_1, H_2 \varphi_2]$, $\beta \in \mathbb{R}^{s \times r}$, $H_1 \in \mathbb{R}^{s \times t_1}$, $H_2 \in \mathbb{R}^{s \times t_2}$, $\varphi_1 \in \mathbb{R}^{t_1 \times r_1}$, $\varphi_2 \in \mathbb{R}^{t_2 \times r_2}$, $r_j \leq t_j \leq s$ and $r_1 + r_2 = r$. W.l.o.g. choose $H_b \in O_{s,t_b}$ such that its columns span the $t_b$-dimensional intersection of the column spaces of $H_1$ and $H_2$ and choose $\tilde{H}_j \in O_{s, \tilde{t}_j}(H_b)$, $j = 1, 2$, such that the columns of $\tilde{H}_j$ and $H_b$ span the column space of $H_j$. Define $\tilde{H} := [\tilde{H}_1, \tilde{H}_2, H_b] \in O_{s, \tilde{t}}$, with $\tilde{t} = \tilde{t}_1 + \tilde{t}_2 + t_b$. Let w.l.o.g. $\tilde{H}_\perp \in O_{s, s - \tilde{t}}(\tilde{H})$ and define $p_j := \min(r_j, \tilde{t}_j)$, $q_j := \max(r_j, \tilde{t}_j)$ for $j = 1, 2$ and $p_b = q_1 - \tilde{t}_1 + q_2 - \tilde{t}_2$. A parameterization of $\beta^r \in O_{s,r}$ satisfying the restrictions under the null hypothesis can be derived from the following mapping:
$$\beta^r(\theta_H, \theta_{R,\beta}) := \tilde{H} \cdot R_H(\theta_H) \begin{pmatrix} \mathbf{I}_{p_1} & 0_{p_1 \times p_2} & 0_{p_1 \times p_b} \\ 0_{(q_1 - r_1) \times p_1} & 0_{(q_1 - r_1) \times p_2} & 0_{(q_1 - r_1) \times p_b} \\ 0_{p_2 \times p_1} & \mathbf{I}_{p_2} & 0_{p_2 \times p_b} \\ 0_{(q_2 - r_2) \times p_1} & 0_{(q_2 - r_2) \times p_2} & 0_{(q_2 - r_2) \times p_b} \\ 0_{p_b \times p_1} & 0_{p_b \times p_2} & \mathbf{I}_{p_b} \\ 0_{(\tilde{t} - q_1 - q_2) \times p_1} & 0_{(\tilde{t} - q_1 - q_2) \times p_2} & 0_{(\tilde{t} - q_1 - q_2) \times p_b} \end{pmatrix} \cdot R_R(\theta_{R,\beta}),$$
where $R_R(\theta_{R,\beta}) \in \mathbb{R}^{r \times r}$ as in Lemma 1 and $R_H(\theta_H) := R_H(\theta_{H_1}, \theta_{H_2}, \theta_{H_b}) := R_{H_1}(\theta_{H_1}) R_{H_2}(\theta_{H_2}) R_{H_b}(\theta_{H_b}) \in \mathbb{R}^{\tilde{t} \times \tilde{t}}$ is a product of Givens rotations corresponding to the entries in the blocks highlighted by bold font. The three matrices are defined as follows:
$$\begin{aligned}
R_{H_1}(\theta_{H_1}) &:= \prod_{i=1}^{p_1} \prod_{j=1}^{\tilde{t} - q_2 - r_1} \mathcal{R}_{\tilde{t}, i, \delta_{H_1}(j) + j}\big(\theta_{H_1, (\tilde{t} - q_2 - r_1)(i-1) + j}\big), \qquad
\delta_{H_1}(j) := \begin{cases} p_1 & \text{if } j \leq q_1 - r_1, \\ \tilde{t}_1 + \tilde{t}_2 + p_b & \text{else}, \end{cases} \\
R_{H_2}(\theta_{H_2}) &:= \prod_{i=1}^{p_2} \prod_{j=1}^{\tilde{t} - q_1 - r_2} \mathcal{R}_{\tilde{t}, p_1 + i, \delta_{H_2}(j) + j}\big(\theta_{H_2, (\tilde{t} - q_1 - r_2)(i-1) + j}\big), \qquad
\delta_{H_2}(j) := \begin{cases} \tilde{t}_1 + p_2 & \text{if } j \leq q_2 - r_2, \\ \tilde{t}_1 + \tilde{t}_2 + p_b & \text{else}, \end{cases} \\
R_{H_b}(\theta_{H_b}) &:= \prod_{i=1}^{p_b} \prod_{j=1}^{\tilde{t} - q_1 - q_2} \mathcal{R}_{\tilde{t}, p_1 + p_2 + i, \tilde{t}_1 + \tilde{t}_2 + p_b + j}\big(\theta_{H_b, (\tilde{t} - q_1 - q_2)(i-1) + j}\big).
\end{aligned}$$
Consequently, a parameterization of the orthocomplement of the cointegrating space is based on the mapping:
$$C_1^r(\theta_H, \theta_{R,C}) := \left[ \tilde{H} \cdot R_H(\theta_H) \begin{pmatrix} 0_{p_1 \times (q_1 - r_1)} & 0_{p_1 \times (q_2 - r_2)} & 0_{p_1 \times (\tilde{t} - q_1 - q_2)} \\ \mathbf{I}_{q_1 - r_1} & 0_{(q_1 - r_1) \times (q_2 - r_2)} & 0_{(q_1 - r_1) \times (\tilde{t} - q_1 - q_2)} \\ 0_{p_2 \times (q_1 - r_1)} & 0_{p_2 \times (q_2 - r_2)} & 0_{p_2 \times (\tilde{t} - q_1 - q_2)} \\ 0_{(q_2 - r_2) \times (q_1 - r_1)} & \mathbf{I}_{q_2 - r_2} & 0_{(q_2 - r_2) \times (\tilde{t} - q_1 - q_2)} \\ 0_{p_b \times (q_1 - r_1)} & 0_{p_b \times (q_2 - r_2)} & 0_{p_b \times (\tilde{t} - q_1 - q_2)} \\ 0_{(\tilde{t} - q_1 - q_2) \times (q_1 - r_1)} & 0_{(\tilde{t} - q_1 - q_2) \times (q_2 - r_2)} & \mathbf{I}_{\tilde{t} - q_1 - q_2} \end{pmatrix}, \ \tilde{H}_\perp \right] \cdot R_R(\theta_{R,C}),$$
where $R_H(\theta_H) \in \mathbb{R}^{\tilde{t} \times \tilde{t}}$ as above and $R_R(\theta_{R,C}) \in \mathbb{R}^{(s-r) \times (s-r)}$ as in Lemma 1. Please note that for all $\theta_H$, $\theta_{R,\beta}$ and $\theta_{R,C}$ it holds that $\beta^r(\theta_H, \theta_{R,\beta})' C_1^r(\theta_H, \theta_{R,C}) = 0_{r \times (s-r)}$. The number of parameters restricted under $H_0$ is equal to $r_1(q_1 - r_1) + r_2(q_2 - r_2) + (r_1 + r_2)(\tilde{t} - q_1 - q_2) + (s-r)(s-r+1)/2$, and thus, through $q_1$ and $q_2$, depends on the dimension $t_b$ of the intersection of the column spaces of $H_1$ and $H_2$. The reduction of the number of free parameters matches the degrees of freedom of the test statistics in Johansen (1995, Theorem 7.5) if $\beta$ is identified, which is the case if $r_1 \leq \tilde{t}_1$ and $r_2 \leq \tilde{t}_2$.
Using the mapping β r ( · ) as a basis for a parameterization allows to introduce another type of hypotheses of the form:
(iv)
$H_0: \beta_\perp = C_1 = [H_1 \varphi_1, \ldots, H_c \varphi_c]$, $\beta_\perp \in \mathbb{R}^{s \times (s-r)}$, $H_j \in O_{s,t_j}$, $\varphi_j \in O_{t_j, r_j}$, $r_j \leq t_j \leq s$, for $j = 1, \ldots, c$ such that $\sum_{j=1}^c r_j = s - r$: The orthocomplement of the cointegrating space is contained in the column spaces of the (full column rank) matrices $H_j$.
This type of hypothesis allows one to test, e.g., for the presence of cross-unit cointegrating relations (cf. Wagner and Hlouskova 2009, Definition 1) in multi-country data sets.
Hypotheses on the cointegrating space at frequency ω k = π can be treated analogously to hypotheses on the cointegrating space at frequency ω k = 0 .
Testing hypotheses on cointegrating spaces at frequencies $0 < \omega_k < \pi$ has to be discussed in more detail, as one also has to consider the space spanned by PCIVs, compare Example 2. There are $2(s - d_1^k)$ linearly independent PCIVs of the form $\beta(z) = \beta_0 + \beta_1 z$. Every PCIV corresponds to a vector $\bar{z}_k \beta_0 + \beta_1 \in \mathbb{C}^s$ orthogonal to $C_k$ and consequently hypotheses on the space spanned by PCIVs can be transformed into hypotheses on the complex column space of $C_k \in \mathbb{C}^{s \times d_1^k}$.
Consider, e.g., an extension of the first type of hypothesis of the form
$$H_0^k: \begin{pmatrix} \gamma_0 & \tilde{\gamma}_0 \\ \gamma_1 & \tilde{\gamma}_1 \end{pmatrix} = \begin{pmatrix} I_s & 0_{s \times s} \\ -\cos(\omega_k) I_s & -\sin(\omega_k) I_s \end{pmatrix} \begin{pmatrix} \tilde{H}_0 \tilde{\phi}_0 - \tilde{H}_1 \tilde{\phi}_1 & -(\tilde{H}_0 \tilde{\phi}_1 + \tilde{H}_1 \tilde{\phi}_0) \\ \tilde{H}_0 \tilde{\phi}_1 + \tilde{H}_1 \tilde{\phi}_0 & \tilde{H}_0 \tilde{\phi}_0 - \tilde{H}_1 \tilde{\phi}_1 \end{pmatrix} = \begin{pmatrix} I_s & 0_{s \times s} \\ -\cos(\omega_k) I_s & -\sin(\omega_k) I_s \end{pmatrix} \begin{pmatrix} \tilde{H}_0 & -\tilde{H}_1 \\ \tilde{H}_1 & \tilde{H}_0 \end{pmatrix} \begin{pmatrix} \tilde{\phi}_0 & -\tilde{\phi}_1 \\ \tilde{\phi}_1 & \tilde{\phi}_0 \end{pmatrix},$$
with $\tilde{H}_0, \tilde{H}_1 \in \mathbb{R}^{s \times t}$, $\tilde{\phi}_0, \tilde{\phi}_1 \in \mathbb{R}^{t \times r}$, $r \leq t < s$, which implies that the column space of $C_k$ is equal to the orthocomplement of the column space of $(\tilde{H}_0 + i \tilde{H}_1)(\tilde{\phi}_0 + i \tilde{\phi}_1)$. This general hypothesis encompasses, e.g., the hypothesis $[\gamma_0', \gamma_1']' = H \phi = [H_0', H_1']' \phi$, with $H \in \mathbb{R}^{2s \times t}$, $H_0, H_1 \in \mathbb{R}^{s \times t}$, $\phi \in \mathbb{R}^{t \times r}$, by setting $\tilde{\phi}_0 := \phi$, $\tilde{\phi}_1 := 0$, $\tilde{H}_0 := H_0$ and $\tilde{H}_1 := -(\cos(\omega_k) H_0 + H_1)/\sin(\omega_k)$. The extension is tailored to include the pairwise structure of PCIVs and to simplify transformation into hypotheses on the complex matrix $C_k$ used in the parameterization. The parameterization of the set of matrices corresponding to $H_0^k$ is derived from a mapping of the form given in (15), with $\check{R}_L(\check{\theta}_L)$ and $R_R(\theta_R)$ replaced by $\check{Q}_L(\check{\varphi}_L) := \prod_{i=1}^{t-r} \prod_{j=1}^{r} \mathcal{Q}_{t,i,t-r+j}(\varphi_{L, r(i-1)+j})$ and $D_d(\varphi_D) Q_R(\varphi_R)$ as in Lemma 2.
Similarly, the three other types of hypotheses on the cointegrating spaces considered above can be extended to hypotheses on the space of PCIVs in the MFI(1) case. They translate into hypotheses on complex valued matrices β k orthogonal to C k . To parameterize the set of matrices restricted according to these null hypotheses, Lemma 2 is used. Thus, the restrictions implied by the extensions of all four types of hypotheses to hypotheses on the dynamic cointegrating spaces at frequencies 0 < ω k < π for MFI(1) processes can be implemented using Givens rotations.
A different case of interest is the hypothesis of at least $m$ linearly independent CIVs $b_j \in \mathbb{R}^s$, $j = 1, \ldots, m$ with $0 < m \leq s - d_1^k$, i.e., an $m$-dimensional static cointegrating space at frequency $0 < \omega_k < \pi$, which we discuss as a further illustrative example of the procedure in the case of cointegration at complex unit roots.
For the dynamic cointegrating space, this hypothesis implies the existence of $2m$ linearly independent PCIVs of the form $\beta^1(z) = b_j$ and $\beta^2(z) = b_j z$, $j = 1, \ldots, m$. In light of the discussion above, the necessary condition for these two polynomials to be PCIVs is equivalent to $b_j' C_k = 0$ for $j = 1, \ldots, m$. This restriction is similar to $H_0$ discussed above, except for the fact that the cointegrating vectors $b_j$ are not fully specified. The hypothesis is equivalent to the existence of an $m$-dimensional real kernel of $C_k'$. A suitable parameterization is derived from the following mapping
$$C(\theta_b, \varphi) := R_L(\theta_b) \begin{pmatrix} 0_{m \times d_1^k} \\ C_U(\varphi) \end{pmatrix},$$
where $\theta_b \in [0, 2\pi)^{m(s-m)}$ and $C_U(\varphi) := C_U(\varphi_L, \varphi_D, \varphi_R) \in U_{s-m, d_1^k}$ as in Lemma 2. The difference between the number of free parameters without and with the restrictions is equal to $m(s-m)$.
The hypotheses can also be tested jointly for the cointegrating spaces of several unit roots.

5.1.3. Testing Hypotheses on the Adjustment Coefficients

As in the case of hypotheses on the cointegrating spaces β k , hypotheses on the adjustment coefficients α k are typically formulated as hypotheses on the column spaces of α k . We only focus on hypotheses on the real valued α 1 corresponding to frequency zero. Analogous hypotheses may be considered for α k at frequencies ω k 0 , using the same ideas.
The first type of hypothesis on $\alpha_1$ is of the form $H_\alpha: \alpha_1 = A \psi$, $A \in \mathbb{R}^{s \times t}$, $\psi \in \mathbb{R}^{t \times r}$, and can therefore be rewritten as $B_1 A \psi = 0$. W.l.o.g. let $A \in O_{s,t}$ and $A_\perp \in O_{s, s-t}$. We deal with this type of hypothesis as with $H_0: \beta = H \varphi$ in the previous section by simply reversing the roles of $C_1$ and $B_1$: we consider the set of feasible matrices $B_1'$ as a subset of $O_{s, s-r}$ and use the mapping $B_1'(\check{\theta}_L, \theta_R) = [A \check{R}_L(\check{\theta}_L) [I_{t-r}', 0_{r \times (t-r)}']', A_\perp] R_R(\theta_R)$ to derive a parameterization, while $C_1$ is restricted to be a p.u.t. matrix and the set of feasible matrices $C_1$ is parameterized accordingly.
As a second type of hypothesis, Juselius (2006, sct. 11.9, p. 200) discusses $H_{\alpha_\perp}: \alpha_{1,\perp} = H \psi$, $H \in \mathbb{R}^{s \times t}$, $\psi \in \mathbb{R}^{t \times (s-r)}$, linked to the absence of permanent effects of the shocks $H_\perp' \varepsilon_t$ on any of the variables of the system. Assume w.l.o.g. $H_\perp \in O_{s, s-t}$. Using the parameterization of $O_{s, s-r}(H_\perp)$ defined in (13) for the set of feasible matrices $B_1'$ and the parameterization of the set of p.u.t. matrices for the set of feasible matrices $C_1$ implements this restriction.
The restrictions implied by $H_\alpha$ reduce the number of free parameters by $r(s-t)$ and the restrictions implied by $H_{\alpha_\perp}$ lead to a reduction by $(s-t)(s-r)$ free parameters, compared to the unrestricted case, which matches in both cases the number of degrees of freedom of the corresponding test statistic in the VECM framework.

5.1.4. Restrictions on the Deterministic Components

Including an unrestricted constant in the VECM equation $\Delta_0 y_t = \varepsilon_t + \Phi_0$ leads to a linear trend in the solution process $y_t = \sum_{j=1}^t (\varepsilon_j + \Phi_0) + y_0 = \sum_{j=1}^t \varepsilon_j + y_0 + \Phi_0 t$ for $t \geq 1$. If one restricts the constant to $\Phi_0 = \alpha \tilde{\Phi}_0$, $\tilde{\Phi}_0 \in \mathbb{R}^r$, in a general VECM equation as given in (4), with $\Pi = \alpha \beta'$ of rank $r$, no summation to a linear trend occurs in the solution process, while a constant non-zero mean is still present in the cointegrating relations, i.e., in the process $\{\beta' y_t\}_{t \in \mathbb{Z}}$. Analogously, an unrestricted linear trend $\Phi_1 t$ in the VECM equation leads to a quadratic trend of the form $\Phi_1 t(t-1)/2$ in the solution process, which is excluded by the restriction $\Phi_1 t = \alpha \tilde{\Phi}_1 t$.
In the VECM framework, compare Johansen (1995, sct. 5.7, p. 81), five restrictions related to the coefficients corresponding to the constant and the linear trend are commonly considered:
$$\begin{aligned}
&1.\ H(r): && \Phi d_t = \Phi_1 t + \Phi_0, && \text{unrestricted constant and linear trend}, \\
&2.\ H^*(r): && \Phi d_t = \alpha \tilde{\Phi}_1 t + \Phi_0, && \text{unrestricted constant, linear trend restricted to the cointegrating relations}, \\
&3.\ H_1(r): && \Phi d_t = \Phi_0, && \text{unrestricted constant, no linear trend}, \\
&4.\ H_1^*(r): && \Phi d_t = \alpha \tilde{\Phi}_0, && \text{constant restricted to the cointegrating relations, no linear trend}, \\
&5.\ H_2(r): && \Phi d_t = 0, && \text{no deterministic components present},
\end{aligned}$$
with $\Phi_0, \Phi_1 \in \mathbb{R}^s$ and $\tilde{\Phi}_0, \tilde{\Phi}_1 \in \mathbb{R}^r$, and the following consequences for the solution processes: Under $H(r)$ the solution process contains a quadratic trend in the direction of the common trends, i.e., in $\{\beta_\perp' y_t\}_{t \in \mathbb{Z}}$, and a linear trend in the direction of the cointegrating relations, i.e., in $\{\beta' y_t\}_{t \in \mathbb{Z}}$. Under $H^*(r)$ the quadratic trend is not present. $H_1(r)$ features a linear trend only in the directions of the common trends, $H_2(r)$ a constant only in these directions. Under $H_1^*(r)$ the constant is also present in the directions of the cointegrating relations.
In the state space framework the deterministic components can be added in the output equation y t = C x t + Φ d t + ε t , compare (9). Consequently, the above considered hypotheses can be imposed by formulating linear restrictions on Φ . These can be directly parameterized by including the following deterministic components in the five considered cases:
$$\begin{aligned}
&1.\ H(r): && \Phi d_t = C_1 \tilde{\Phi}_2 t^2 + \Phi_1 t + \Phi_0, \\
&2.\ H^*(r): && \Phi d_t = \Phi_1 t + \Phi_0, \\
&3.\ H_1(r): && \Phi d_t = C_1 \tilde{\Phi}_1 t + \Phi_0, \\
&4.\ H_1^*(r): && \Phi d_t = \Phi_0, \\
&5.\ H_2(r): && \Phi d_t = C_1 \tilde{\Phi}_0,
\end{aligned}$$
where $\Phi_0, \Phi_1 \in \mathbb{R}^s$ and $\tilde{\Phi}_0, \tilde{\Phi}_1, \tilde{\Phi}_2 \in \mathbb{R}^{d_1^1}$. The component $C_1 \tilde{\Phi}_0$ captures the influence of the initial value $C_1 x_{1,1}$ in the output equation.
In the VECM framework for the seasonal MFI(1) case, with $\Pi_k = \alpha_k \beta_k'$ of rank $r_k$ for $0 < \omega_k < \pi$, the deterministic component usually includes restricted seasonal dummies of the form $\alpha_k \tilde{\Phi}_k z_k^t + \overline{\alpha_k \tilde{\Phi}_k (z_k)^t}$, $\tilde{\Phi}_k \in \mathbb{C}^{r_k}$, to avoid summation in the directions of the stochastic trends. The state space framework allows to straightforwardly include seasonal dummies in the output equation in the form $\Phi_k z_k^t + \overline{\Phi_k (z_k)^t}$, $\Phi_k \in \mathbb{C}^s$. Again, it is of interest whether these components are unrestricted or whether they take the form $C_k \tilde{\Phi}_k z_k^t + \overline{C_k \tilde{\Phi}_k (z_k)^t}$, $\tilde{\Phi}_k \in \mathbb{C}^{d_1^k}$, similarly allowing for a reinterpretation of these components as the influence of the initial values $x_{1,k}$ on the output.
Please note that $\Phi_k z_k^t + \overline{\Phi_k (z_k)^t}$ is equivalently given by $\check{\Phi}_{k,1} \sin(\omega_k t) + \check{\Phi}_{k,2} \cos(\omega_k t)$ using real coefficients $\check{\Phi}_{k,1}, \check{\Phi}_{k,2} \in \mathbb{R}^s$, and the desired restrictions can be implemented accordingly.
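The equivalence is elementary and can be checked in a few lines (the complex coefficient below is an arbitrary placeholder):

```python
import numpy as np

t = np.arange(1, 9)
omega = np.pi / 2                       # quarterly frequency as an example
Phi = 0.3 - 0.7j                        # arbitrary complex coefficient
z = np.exp(1j * omega)

complex_form = Phi * z**t + np.conj(Phi * z**t)   # Phi z^t plus its conjugate
real_form = -2 * Phi.imag * np.sin(omega * t) + 2 * Phi.real * np.cos(omega * t)
print(np.abs(complex_form.real - real_form).max())  # ~ 0
```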

5.2. The I ( 2 ) Case

The state space unit root structure of I(2) processes is of the form $\Omega_S = ((0, d_1^1, d_2^1))$, where the integer $d_1^1$ equals the dimension of $x_{t,1}^E$ and $d_2^1$ equals the dimension of $[(x_{t,2}^G)', (x_{t,2}^E)']'$. Recall that the solution for $t > 0$ and $x_{1,u} = 0$ of the system in canonical form in this setting is given by
$$\begin{aligned}
y_t &= C_{1,1}^E x_{t,1}^E + C_{1,2}^G x_{t,2}^G + C_{1,2}^E x_{t,2}^E + C_\bullet x_{t,\bullet} + \Phi d_t + \varepsilon_t \\
&= C_{1,1}^E B_{1,2,1} \sum_{k=1}^{t-1} \sum_{j=1}^{k} \varepsilon_{t-j} + \big( C_{1,1}^E B_{1,1} + C_{1,2}^G B_{1,2,1} + C_{1,2}^E B_{1,2,2} \big) \sum_{j=1}^{t-1} \varepsilon_{t-j} + C_\bullet \sum_{j=1}^{t-1} A_\bullet^{j-1} B_\bullet \varepsilon_{t-j} + C_\bullet A_\bullet^{t-1} x_{1,\bullet} + \Phi d_t + \varepsilon_t.
\end{aligned}$$
For VAR processes integrated of order two the integers $d_1^1$ and $d_2^1$ of the corresponding state space unit root structure are linked to the ranks of the matrices $\Pi = \alpha \beta'$ (denoted as $r = r_0$) and $\alpha_\perp' \Gamma \beta_\perp = \xi \eta'$ (denoted as $m = r_1$) in the VECM setting, as discussed in Section 2. It holds that $r = s - d_2^1$ and $m = d_2^1 - d_1^1$. The relation of the state space unit root structure to the cointegration indices $r_0, r_1, r_2$ was also discussed in Section 3.
Again, both the integers $d_1^1$ and $d_2^1$ and the ranks $r$ and $m$, and consequently also the indices $r_0$, $r_1$ and $r_2$, are closely related to the dimensions of the spaces spanned by CIVs and PCIVs. In the I(2) case the static cointegrating space of order $((0,2),(0,1))$ is the orthocomplement of the column space of $C_{1,1}^E$ and thus of dimension $s - d_1^1$. The dimension of the space spanned by CIVs of order $((0,2),\{\})$ is equal to $s - d_2^1 - r_{c,G}$, where $r_{c,G}$ denotes the rank of $C_{1,2}^G$, since this space is the orthocomplement of the column space of $[C_{1,1}^E, C_{1,2}^G, C_{1,2}^E]$. The space spanned by the PCIVs $\beta_0 + \beta_1 z$ of order $((0,2),\{\})$ is of dimension smaller than or equal to $2s - d_1^1 - d_2^1$, due to the orthogonality constraint on $[\beta_0', \beta_1']'$ given in Example 3.
Consider the matrices $\beta$, $\beta_1$ and $\beta_2$ as defined in Section 2. From a state space realization $(A, B, C)$ in canonical form corresponding to a VAR process it immediately follows that the columns of $\beta_2$ span the same space as the columns of the sub-block $C_{1,1}^E$. The same relation holds true for $\beta_1$ and the sub-block $C_{1,2}^E$. With respect to polynomial cointegration, Bauer and Wagner (2012) show that the rank of $C_{1,2}^G$ determines the number of minimum degree polynomial cointegrating relations, as discussed in Example 3. If $C_{1,2}^G = 0$, then there exists no vector $\gamma$ such that $\{\gamma' y_t\}_{t \in \mathbb{Z}}$ is integrated and cointegrated with $\{\beta_2' \Delta_0 y_t\}_{t \in \mathbb{Z}}$. In this case $\{\beta' y_t\}_{t \in \mathbb{Z}}$ is a stationary process.
The deterministic components included in the I(2) setting are typically a constant and a linear trend. As in the MFI(1) case, identifiability problems occur, if we consider a non-zero initial state x 1 , u : The solution to the state space equations for t > 0 and x 1 , u 0 is given by:
$$y_t = \sum_{j=1}^{t-1} C A^{j-1} B \varepsilon_{t-j} + C_{1,1}^E \big( x_{1,1}^E + x_{1,2}^G (t-1) \big) + C_{1,2}^G x_{1,2}^G + C_{1,2}^E x_{1,2}^E + C_\bullet A_\bullet^{t-1} x_{1,\bullet} + \Phi d_t + \varepsilon_t.$$
Hence, if $\Phi d_t = \Phi_0 + \Phi_1 t$, the output equation contains the terms $C_{1,1}^E x_{1,1}^E + C_{1,2}^G x_{1,2}^G + C_{1,2}^E x_{1,2}^E - C_{1,1}^E x_{1,2}^G + \Phi_0$ and $(C_{1,1}^E x_{1,2}^G + \Phi_1) t$. Again, this implies non-identifiability, which is resolved by assuming $x_{1,u} = 0$, compare Remark 6.

5.2.1. Testing Hypotheses on the State Space Unit Root Structure

To simplify notation we use
$$\overline{M}(d_1^1, d_2^1) := \begin{cases} \overline{M(((0, d_1^1, d_2^1)), n - d_1^1 - d_2^1)} & \text{if } d_1^1 > 0, \\ \overline{M(((0, d_2^1)), n - d_2^1)} & \text{if } d_1^1 = 0, d_2^1 > 0, \\ \overline{M_{\bullet,n}} & \text{if } d_1^1 = d_2^1 = 0, \end{cases}$$
with $n \geq d_1^1 + d_2^1$. Here $\overline{M}(d_1^1, d_2^1)$ for $d_1^1 + d_2^1 > 0$ denotes the closure of the set of transfer functions of order $n$ that possess a state space unit root structure of either $\Omega_S = ((0, d_1^1, d_2^1))$ or, in case $d_1^1 = 0$, $\Omega_S = ((0, d_2^1))$, while $\overline{M}(0,0)$ denotes the closure of the set of all stable transfer functions of order $n$.
Considering the relations between the different sets of transfer functions given in Corollary 4 shows that the following relations hold (assuming $s \geq 4$; the columns are arranged to include transfer functions with the same dimension of $A_u$):
$$\begin{array}{ccccccccc}
\overline{M}(0,0) & \supset & \overline{M}(0,1) & \supset & \overline{M}(0,2) & \supset & \overline{M}(0,3) & \supset & \overline{M}(0,4) \\
 & & \cup & & \cup & & \cup & & \\
 & & \overline{M}(1,0) & \supset & \overline{M}(1,1) & \supset & \overline{M}(1,2) & & \\
 & & & & \cup & & & & \\
 & & & & \overline{M}(2,0) & & & &
\end{array}$$
Please note that $\overline{M}(d_1^1, d_2^1)$ corresponds to $H_{s - d_2^1, d_2^1 - d_1^1} = H_{r, r_1}$ in Johansen (1995). Therefore, the relationships between the subsets match those in Johansen (1995, Table 9.1) and those found by Jensen (2013). The latter type of inclusions appears, for instance, for $\overline{M}(0,2)$, containing transfer functions corresponding to I(1) processes, which is a subset of the set $\overline{M}(1,0)$ of transfer functions corresponding to I(2) processes.
The same remarks as in the MFI(1) case also apply in the I(2) case: When testing $H_0: \Omega_S = ((0, d_{1,0}^1, d_{2,0}^1))$, all attainable state space unit root structures in $\mathcal{A}(((0, d_{1,0}^1, d_{2,0}^1)), n_\bullet)$ have to be included in the null hypothesis.

5.2.2. Testing Hypotheses on CIVs and PCIVs

Johansen (2006) discusses several types of hypotheses on the cointegrating spaces of different orders. These deal with properties of β , joint properties of [ β , β 1 ] or the occurrence of non-trivial polynomial cointegrating relations. Boswijk and Paruolo (2017), moreover, discuss testing hypotheses on the loading matrices of common trends (corresponding in our setting to testing hypotheses on C 1 ).
We commence with hypotheses of the form $H_0: \beta = K \varphi$ and $H_0: \beta = [b, \varphi]$, just as in the MFI(1) case at unit root one, since hypotheses on $\beta$ correspond to hypotheses on its orthocomplement spanned by $[C_{1,1}^E, C_{1,2}^E]$ in the VARMA framework:
Hypotheses of the form $H_0: \beta = K \varphi$, $K \in \mathbb{R}^{s \times t}$, $\varphi \in \mathbb{R}^{t \times r}$ imply $\varphi' K' [C_{1,1}^E, C_{1,2}^E] = 0$. W.l.o.g. let $K \in O_{s,t}$ and $K_\perp \in O_{s, s-t}$. As in the parameterization under $H_0$ in the MFI(1) case at unit root one, compare (15), use the mapping
$$[C_{1,1}^{E,r}, C_{1,2}^{E,r}](\check{\theta}_L, \theta_R) := \left[ K \cdot \check{R}_L(\check{\theta}_L) \begin{pmatrix} I_{t-r} \\ 0_{r \times (t-r)} \end{pmatrix}, \ K_\perp \right] \cdot R_R(\theta_R)$$
to derive a parameterization of the set of feasible matrices $[C_{1,1}^E, C_{1,2}^E]$, i.e., a joint parameterization of both sets of matrices $C_{1,1}^E$ and $C_{1,2}^E$, where $[C_{1,1}^E, C_{1,2}^E] \in O_{s, s-r}$.
Hypotheses of the form $H_0: \beta = [b, \varphi]$, $b \in \mathbb{R}^{s \times t}$, $\varphi \in \mathbb{R}^{s \times (r-t)}$, $0 < t \leq r$, are equivalent to $b' [C_{1,1}^E, C_{1,2}^E] = 0$. Assume w.l.o.g. $b \in O_{s,t}$ and parameterize the set of feasible matrices $C_{1,1}^E$ using $O_{s, d_1^1}(b)$ as defined in (13) and the set of feasible matrices $C_{1,2}^E$ using $O_{s, d_2^1 - d_1^1}([b, C_{1,1}^E])$. Alternatively, parameterize the set of feasible matrices jointly as elements $[C_{1,1}^E, C_{1,2}^E] \in O_{s, s-r}(b)$.
Applications using the VECM framework allow for testing hypotheses on [ β , β 1 ] . In the VARMA framework, these correspond to hypotheses on the orthogonal complement of [ β , β 1 ] , i.e., C 1 , 1 E . Implementation of different types of hypotheses on [ β , β 1 ] proceeds as for similar hypotheses on β in the MFI(1) case at unit root one, replacing [ C 1 , 1 E , C 1 , 2 E ] by C 1 , 1 E .
The hypothesis of no minimum degree polynomial cointegrating relations implies the restriction C 1 , 2 G = 0 , compare Example 3. Therefore, we can test all hypotheses considered in Johansen (2006) also in our more general setting.

5.2.3. Testing Hypotheses on the Adjustment Coefficients

Hypotheses on $\alpha$ and $\xi$ as defined in (6) and (7) correspond to hypotheses on the spaces spanned by the rows of $B_{1,2,1}$ and $B_{1,2,2}$. For VAR processes integrated of order two, the row space of $B_{1,2,1}$ is equal to the orthogonal complement of the column space of $[\alpha, \alpha_\perp \xi]$, while the row space of $B_{1,2} := [B_{1,2,1}', B_{1,2,2}']'$ is equal to the orthogonal complement of the column space of $\alpha$. The restrictions corresponding to hypotheses on $\alpha$ and $\xi$ can be implemented analogously to the restrictions corresponding to hypotheses on $\alpha_1$ in Section 5.1.3, reversing the roles of the relevant sub-blocks in $B_u$ and $C_u$ accordingly.

5.2.4. Restrictions on the Deterministic Components

The I(2) case is, with respect to the modeling of deterministic components, less well studied than the MFI(1) case. In most theory papers they are simply left out, with the notable exception of Rahbek et al. (1999), which deals with the inclusion of a constant term in the I(2)-VECM representation. The main reason for this appears to be the way deterministic components in the defining vector error correction representation translate into deterministic components in the corresponding solution process. An unrestricted constant in the VECM for I(2) processes leads to a linear trend in $\{\beta_1' y_t\}_{t \in \mathbb{Z}}$ and a quadratic trend in $\{\beta_2' y_t\}_{t \in \mathbb{Z}}$, while an unrestricted linear trend results in quadratic and cubic trends in the respective directions. Already in the I(1) case discussed above, five different cases need to be considered separately with respect to integration and the asymptotic behavior of estimators and tests. An all-encompassing discussion of the restrictions on the coefficients of a constant and a linear trend in the I(2) case requires the specification of even more cases. As an alternative approach in the VECM framework, deterministic components could be dealt with by replacing $y_t$ with $y_t - \Phi d_t$ in the VECM equation. This has recently been considered in Johansen and Nielsen (2018) and is analogous to our approach in the state space framework.
As before, in the MFI(1) or I(1) case, the analysis of (the impact of) deterministic components is straightforward in the state space framework, which effectively stems from their additive inclusion in the Granger-type representation, compare (9). Choose, e.g., Φ d t = Φ 0 + Φ 1 t , as in the I(1) case. In analogy to Section 5.1.4, linear restrictions of deterministic components in relation to the static and polynomial cointegrating spaces can be embedded in a parameterization. Focusing on Φ 0 , e.g., this is achieved by
$$\Phi_0 = [C_{1,1}^E, C_{1,2}^E] \phi_0 + \tilde{C}_{1,2} \tilde{\phi}_0 + C_\perp \check{\phi}_0,$$
where the columns of $\tilde{C}_{1,2}$ form a basis for the column space of $C_{1,2}^G$, which does not necessarily have full column rank, and the columns of $C_\perp$ span the orthocomplement of the column space of $[C_{1,1}^E, C_{1,2}^E, \tilde{C}_{1,2}]$. The matrix $\Phi_1$ can be decomposed analogously. The corresponding parameterization then allows to consider different restricted versions of deterministic components and to study the asymptotic behavior of estimators and tests for these cases.
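The following sketch illustrates such a decomposition numerically (all matrices are randomly generated placeholders, and the basis extending the E-blocks is orthogonalized against them for convenience):

```python
import numpy as np

rng = np.random.default_rng(3)
s, d1, d2e = 6, 1, 2                                   # illustrative dimensions
CE = np.linalg.qr(rng.normal(size=(s, d1 + d2e)))[0]   # [C_{1,1}^E, C_{1,2}^E]
C12G = rng.normal(size=(s, d1))                        # C_{1,2}^G (full rank here)

# Orthonormal basis C_tilde for the part of col(C12G) not already in col(CE),
# and C_perp spanning the orthocomplement of [CE, C_tilde]:
Q = np.linalg.qr(np.hstack([CE, C12G]))[0]
C_tilde = Q[:, CE.shape[1]:]
C_perp = np.linalg.svd(Q)[0][:, Q.shape[1]:]

Phi0 = rng.normal(size=s)
basis = np.hstack([CE, C_tilde, C_perp])               # s x s, full rank
coef = np.linalg.solve(basis, Phi0)                    # stacked coefficients
print(np.abs(basis @ coef - Phi0).max())               # exact decomposition, ~ 0
```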

6. Summary and Conclusions

Vector autoregressive moving average (VARMA) processes, which can be cast equivalently in the state space framework, may be useful for empirical analysis compared to the more restrictive class of vector autoregressive (VAR) processes for a variety of reasons. These include invariance with respect to marginalization and aggregation, parsimony, as well as the fact that the log-linearized solutions to DSGE models are typically VARMA processes rather than VAR processes. Realizing these potential advantages requires, in our view, developing cointegration analysis for VARMA processes to a similar extent as it is developed for VAR processes. The necessary first steps of this research agenda are to develop a set of structure theoretical results that subsequently allow developing statistical inference procedures. Bauer and Wagner (2012) provide the very first step of this agenda with a canonical form for unit root processes in the state space framework, shown in that paper to be very convenient for cointegration analysis.
Based on this canonical form, the present paper derives a parameterization of state space models for VARMA processes with unit roots. The canonical form, and a fortiori the parameterization based on it, are constructed to facilitate the investigation of the unit root and (static and polynomial) cointegration properties of the considered process. Furthermore, the paper shows that the framework allows testing a large variety of hypotheses on cointegrating ranks and spaces, clearly a key aspect for the usefulness of any method to analyze cointegration. In addition to providing general results, throughout the paper all results are discussed in detail for the multiple frequency I(1) and I(2) cases, which cover the vast majority of applications.
Given the fact that (as shown in Hazewinkel and Kalman 1976) VARMA unit root processes cannot be continuously parameterized, the set of all unit root processes (as defined in this paper) is partitioned according to a multi-index $\Gamma$ that includes the state space unit root structure. The parameterization is shown to be a diffeomorphism on the interior of the considered sets. The topological relationships between the sets forming the partitioning of all transfer functions considered are studied in great detail for three reasons: First, pseudo maximum likelihood estimation effectively amounts to maximizing the pseudo likelihood function over the closures of sets of transfer functions, $\overline{M}_\Gamma$ in our notation. Second, related to the first item, the relations between the subsets $M_\Gamma$ have to be understood in detail, as knowledge of these relations is required for developing (sequential) pseudo likelihood ratio tests for the numbers of stochastic trends or cycles. Third, of particular importance for the implementation of, e.g., pseudo maximum likelihood estimators, we discuss the existence of generic pieces.
In this respect we derive two results: First, for correctly specified state space unit root structure and order of the stable subsystem, and thus correctly specified system order, we explicitly describe generic indices $\Gamma_g(\Omega_S, n_\bullet)$ such that $M_{\Gamma_g(\Omega_S, n_\bullet)}$ is open and dense in the set of all transfer functions with state space unit root structure $\Omega_S$ and stable subsystem order $n_\bullet$. This result forms the basis for establishing consistency of estimators of the transfer function and, via continuity of the parameterization, of the parameter estimators when the state space unit root structure and system order are known. Second, in case only an upper bound on the system order is known (or specified), we show the existence of a generic multi-index $\Gamma_{\alpha_{\bullet,g}(n)}$ for which the set of corresponding transfer functions $M_{\Gamma_{\alpha_{\bullet,g}(n)}}$ is open and dense in the set $\overline{M}_n$ of all non-explosive transfer functions whose order (or McMillan degree) is bounded by $n$. This result is the basis for consistent estimation (on an open and dense subset) when only an upper bound of the system order is known. In turn this estimator is the starting point for determining $\Omega_S$, using the subset relationships alluded to above in the second point. For the MFI(1) and I(2) cases we show in detail that similar subset relations (concerning cointegrating ranks) as in the cointegrated VAR MFI(1) and I(2) cases hold, which suggests constructing similar sequential test procedures for determining the cointegrating ranks as in the VAR cointegration literature.
Section 5 is devoted to a detailed discussion of testing hypotheses on the cointegrating spaces, again for both the MFI(1) and the I(2) case. In this section, particular emphasis is put on modeling deterministic components. The discussion details how all usually formulated and tested hypotheses concerning (static and polynomial) cointegrating vectors, potentially in combination with (un-)restricted deterministic components, in the VAR framework can also be investigated in the state space framework.
Altogether, the paper sets the stage to develop pseudo maximum likelihood estimators, investigate their asymptotic properties (consistency and limiting distributions) and tests based on them for determining cointegrating ranks that allow performing cointegration analysis for cointegrated VARMA processes. The detailed discussion of the MFI(1) and I(2) cases benefits the development of statistical theory dealing with these cases undertaken in a series of companion papers.

Author Contributions

The authors of the paper have contributed equally, via joint efforts, regarding ideas, research, and writing. Conceptualization, all authors; methodology, all authors; formal analysis, P.d.M.R. and L.M.; investigation, all authors; writing—original draft preparation, P.d.M.R. and L.M.; writing—review and editing, all authors; project administration, D.B. and M.W.; funding acquisition, D.B. and M.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation-Projektnummer 276051388) which is gratefully acknowledged. We acknowledge support for the publication costs by the Deutsche Forschungsgemeinschaft and the Open Access Publication Fund of Bielefeld University.

Acknowledgments

We thank the editors, Rocco Mosconi and Paolo Paruolo, as well as anonymous referees for helpful suggestions. The views expressed in this paper are solely those of the authors and not necessarily those of the Bank of Slovenia or the European System of Central Banks. On top of this the usual disclaimer applies.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of the Results of Section 3

Appendix A.1. Proof of Lemma 1

(i)
Let $C_j$ be a sequence in $O_{s,d}$ converging to $C_0$ for $j \to \infty$. By continuity of matrix multiplication
$$C_0' C_0 = \Big( \lim_{j \to \infty} C_j \Big)' \lim_{j \to \infty} C_j = \lim_{j \to \infty} (C_j' C_j) = I_d.$$
Thus, $C_0 \in O_{s,d}$, which shows that $O_{s,d}$ is closed. By construction $[C'C]_{i,i} = \sum_{j=1}^s c_{j,i}^2$. Since $[C'C]_{i,i} = 1$ for all $C \in O_{s,d}$ and $i = 1, \ldots, d$, the entries of $C$ are bounded.
(ii)
By definition C O ( θ ) is a product of matrices whose elements are either constant or infinitely often differentiable functions of the elements of θ .
(iii)
The algorithm discussed above Lemma 1 maps every $C \in O_{s,d}$ to $[I_d', 0_{(s-d) \times d}']'$ (see the numerical sketch at the end of this proof). Since $\mathcal{R}_{q,i,j}(\theta)^{-1} = \mathcal{R}_{q,i,j}(\theta)'$ for all $q, i, j$ and $\theta$, $C$ can be obtained by multiplying $[I_d', 0_{(s-d) \times d}']'$ with the transposed Givens rotations.
(iv)
As discussed, $C_O^{-1}(\cdot)$ is obtained from a repeated application of the algorithm described in Remark 10. In each step two entries are transformed to polar coordinates. According to Amann and Escher (2008, chp. 8, p. 204) the transformation to polar coordinates is infinitely often differentiable with infinitely often differentiable inverse for $\theta > 0$ (and hence $r > 0$), i.e., on the interior of the interval $[0, \pi)$. Thus, $C_O^{-1}$ is a concatenation of functions which are infinitely often differentiable on the interior of $\Theta_O^{\mathbb{R}}$ and is thus infinitely often differentiable if $\theta_j > 0$ for all components of $\theta$.
Clearly, the interior of $\Theta_O^{\mathbb{R}}$ is open and dense in $\Theta_O^{\mathbb{R}}$. By the definition of continuity the pre-image of the interior of $\Theta_O^{\mathbb{R}}$ is open in $O_{s,d}$. By (iii) there exists a $\theta_0$ for arbitrary $C_0 \in O_{s,d}$ such that $C_O(\theta_0) = C_0$. Since the interior of $\Theta_O^{\mathbb{R}}$ is dense in $\Theta_O^{\mathbb{R}}$, there exists a sequence $\theta_j$ in the interior of $\Theta_O^{\mathbb{R}}$ such that $\theta_j \to \theta_0$. Then $C_O(\theta_j) \to C_0$ because of the continuity of $C_O$. Since $C_O(\theta_j)$ is a sequence in the pre-image of the interior of $\Theta_O^{\mathbb{R}}$, it follows that the pre-image of the interior of $\Theta_O^{\mathbb{R}}$ is dense in $O_{s,d}$.
(v)
For any $C \in O_{s,s}$ it holds that $1 = \det(C'C) = \det(C)^2$ and $\det(C) \in \mathbb{R}$, which implies $\det(C) \in \{-1, 1\}$. Since the determinant is a continuous function on square matrices, the two sets $O_{s,s}^+$ and $O_{s,s}^-$ are disjoint and closed.
(vi)
The proof proceeds analogously to the proof of (iii).
(vii)
A function defined on two disjoint subsets is infinitely often differentiable if and only if the two functions restricted to the subsets are infinitely often differentiable. The same arguments as used in (iv), together with the results in (ii), imply that $C_O^{-1}: O_{s,s}^+ \to \Theta_O^{\mathbb{R}}$ and $C_O^\pm(\cdot)|_{O_{s,s}^+}$ are infinitely often differentiable with infinitely often differentiable inverse on an open subset of $O_{s,s}^+$. Clearly, the multiplication with a fixed orthogonal matrix with determinant $-1$ is infinitely often differentiable with infinitely often differentiable inverse, which implies that $C_O^\pm(\cdot)|_{O_{s,s}^-}$ is infinitely often differentiable with infinitely often differentiable inverse on an open subset of $O_{s,s}^-$, from which the result follows.
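For illustration, the elimination scheme underlying (iii) and (iv) can be sketched as follows (a simplified variant; the ordering of rotations in the algorithm above Lemma 1 may differ):

```python
import numpy as np

def givens_angles(C):
    """Reduce C in O_{s,d} to [I_d', 0']' by Givens rotations; return angles.
    Simplified sketch of the elimination algorithm above Lemma 1."""
    C = C.copy()
    s, d = C.shape
    angles = []
    for j in range(d):                         # eliminate below the diagonal
        for i in range(s - 1, j, -1):          # bottom-up within column j
            theta = np.arctan2(C[i, j], C[i - 1, j])
            G = np.eye(s)
            c, s_ = np.cos(theta), np.sin(theta)
            G[[i - 1, i], [i - 1, i]] = c      # rotation in rows (i-1, i)
            G[i - 1, i], G[i, i - 1] = s_, -s_
            C = G @ C                          # zeros out C[i, j], C[i-1, j] >= 0
            angles.append(theta)
    return angles, C                           # C is now approximately [I_d; 0]

rng = np.random.default_rng(4)
C0 = np.linalg.qr(rng.normal(size=(5, 2)))[0]  # a random element of O_{5,2}
angles, reduced = givens_angles(C0)
print(np.round(reduced, 10))                   # ~ [I_2; 0_{3x2}]
```

Since each rotation is orthogonal with known transpose inverse, applying the transposed rotations in reverse order to $[I_d', 0']'$ recovers $C_0$, mirroring part (iii).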

Appendix A.2. Proof of Lemma 2

(i)
Let $C_j$ be a sequence in $U_{s,d}$ converging to $C_0$ for $j \to \infty$. By continuity of matrix multiplication
$$C_0^* C_0 = \Big( \lim_{j \to \infty} C_j \Big)^* \lim_{j \to \infty} C_j = \lim_{j \to \infty} (C_j^* C_j) = I_d.$$
Thus, $C_0 \in U_{s,d}$, which shows that $U_{s,d}$ is closed. By construction $[C^* C]_{i,i} = \sum_{j=1}^s |c_{j,i}|^2$. Since $[C^* C]_{i,i} = 1$ for all $C \in U_{s,d}$ and $i = 1, \ldots, d$, the entries of $C$ are bounded.
(ii)
By definition C U ( φ ) is a product of matrices whose elements are either constant or infinitely often differentiable functions of the elements of φ .
(iii)
The algorithm discussed above Lemma 2 maps every $C \in U_{s,d}$ to $[D_d(\varphi_D)', 0_{(s-d) \times d}']'$ with $D_d(\varphi_D) = \operatorname{diag}(e^{i \varphi_{D,1}}, \ldots, e^{i \varphi_{D,d}})$. Since $\mathcal{Q}_{q,i,j}(\varphi)^{-1} = \mathcal{Q}_{q,i,j}(\varphi)^*$ for all $q, i, j$ and $\varphi$, $C$ can be obtained by multiplying $[D_d(\varphi_D)', 0_{(s-d) \times d}']'$ with the transposed Givens rotations.
(iv)
The algorithms in Remark 12 and above Lemma 2 describe $C_U^{-1}$ in detail. The determination of an element of $\varphi_L$ or $\varphi_R$ uses the transformation of two complex numbers into polar coordinates in step 2 of Remark 12, which according to Amann and Escher (2008, chp. 8, p. 204) is infinitely often differentiable with infinitely often differentiable inverse except for non-negative reals, which are the complement of an open and dense subset of the complex plane. Step 3 of Remark 12 uses the formulas $\varphi_1 = \tan^{-1}(b/a)$, which is infinitely often differentiable for $a > 0$, and $\varphi_2 = \varphi_a - \varphi_b \mod 2\pi$, which is infinitely often differentiable for $\varphi_a \neq \varphi_b$, which occurs on an open and dense subset of $[0, 2\pi) \times [0, 2\pi)$. For the determination of an element of $\varphi_D$ a complex number of modulus one is transformed into polar coordinates, which is infinitely often differentiable on an open and dense subset of the complex numbers of modulus one, compare again Amann and Escher (2008, chp. 8, p. 204). Thus, $C_U^{-1}$ is a concatenation of functions which are infinitely often differentiable on open and dense subsets of their domains of definition and is thus infinitely often differentiable on an open and dense subset of $U_{s,d}$.

Appendix A.3. Proof of Theorem 2

(i)
The multi-index $\Gamma$ is unique for a transfer function $k \in M_n$, since it only contains information encoded in the canonical form. Therefore, $M_\Gamma$ is well defined. Since conversely for every transfer function $k \in M_n$ a multi-index $\Gamma$ can be found, the sets $M_\Gamma$ constitute a partitioning of $M_n$. Furthermore, using the canonical form, it is straightforward to see that the mapping attaching the triple $(A, B, C) \in \Delta_\Gamma$ in canonical form to a transfer function $k \in M_\Gamma$ is homeomorphic (bijective, continuous, with continuous inverse): Bijectivity is a consequence of the definition of the canonical form. $T_{pt}$-continuity of the transfer function as a function of the matrix triples is obvious from the definition of $T_{pt}$. Continuity of the inverse can be shown by constructing the canonical form starting with an overlapping echelon form (which is continuous according to Hannan and Deistler 1988, chp. 2) and subsequently transforming the state basis to reach the canonical form. This involves the calculation of a Jordan normal form with fixed structure, which is an analytic mapping (cf. Chatelin 1993, Theorem 4.4.3). Finally, the restrictions on $C$ and $B$ are imposed. For given multi-index $\Gamma$ these transformations are continuous (as discussed above they involve QR decompositions to obtain unitary block columns for the blocks of $C$, rotations to p.u.t. form with fixed structure for the blocks of $B$ and transformations to echelon canonical form for the stable part).
(ii)
The construction of the triple $(A(\theta), B(\theta), C(\theta))$ for given $\theta$ and $\Gamma$ is straightforward: $A_u$ is uniquely determined by $\Gamma$. Since $\theta_{B,p}$ contains the entries of $B_u$ restricted to be positive and $\theta_{B,f}$ contains the free parameters of $B_u$, the mapping $(\theta_{B,p}, \theta_{B,f}) \mapsto B_u$ is continuous. The mapping $\theta_\bullet \mapsto (A_\bullet, B_\bullet, C_\bullet)$ is continuous (cf. Hannan and Deistler 1988, Theorem 2.5.3 (ii)). The mapping $(\theta_{C,E}, \theta_{C,G}) \mapsto C_u$ consists of iterated applications of $C_O$ and $C_U$ (compare Lemmas 1 and 2), which are differentiable and thus continuous, and iterated applications of the extensions of the mappings $C_{O, d_2 - d_1}$ and $C_{O,G}$ (compare Corollaries 1 and 2) to general unit root structures and to complex matrices. The proof that these functions are differentiable is analogous to the proofs of Lemmas 1 and 2.
(iii)
The definitions of θ B , f and θ B , p immediately imply that they depend continuously on B u . The parameter vector θ depends continuously on ( A , B , C ) (cf. Hannan and Deistler 1988, Theorem 2.5.3 (ii)). The existence of an open and dense subset of matrices C u such that the mapping attaching parameters to the matrices is continuous follows from arguments contained in the proofs of Lemmas 1 and 2.

Appendix B. Proofs of the Results of Section 4

Appendix B.1. Proof of Theorem 3

For the first inclusion the proof can be divided into two parts, discussing the stable and the unstable subsystem separately. The result with regard to the stable subsystem is due to Hannan and Deistler (1988, Theorem 2.5.3 (iv)). For the unstable subsystem $(\tilde{\Omega}_S, \tilde{p}) \leq (\Omega_S, p)$ implies the existence of a matrix $S$ as described in Definition 9. Partition $S = [S_1', S_2']'$ such that $S_1 p = p^1 \geq \tilde{p}$. Let $\tilde{k}$ be an arbitrary transfer function in $M_{\tilde{\Gamma}} = \pi(\Delta_{\tilde{\Gamma}})$ with corresponding state space realization $(\tilde{A}, \tilde{B}, \tilde{C}) \in \Delta_{\tilde{\Gamma}}$. Then, we find matrices $B_1$ and $C_1$ such that for the state space realization given by
$$A = S' \begin{pmatrix} \tilde{A} & \tilde{J}_{12} \\ 0 & \tilde{J}_2 \end{pmatrix} S, \qquad B = S' \begin{pmatrix} \tilde{B} \\ B_1 \end{pmatrix}, \qquad C = [\tilde{C}, C_1] S$$
it holds that $(A, B, C) \in \Delta_\Gamma$. Then $(A_j, B_j, C_j) = (A, S' \operatorname{diag}(I_{n_1}, j^{-1} I_{n_2}) S B, C) \in \Delta_\Gamma$, where $n_i$ denotes the number of rows of $S_i$ for $i = 1, 2$, converges for $j \to \infty$ to $(A, S' [\tilde{B}', 0']', C) \in \overline{\Delta_\Gamma}$, which is observationally equivalent to $(\tilde{A}, \tilde{B}, \tilde{C})$. Consequently, $\tilde{k} = \pi(A, S' [\tilde{B}', 0']', C) \in \pi(\overline{\Delta_\Gamma})$.
To show the second inclusion, consider a sequence of systems $(A_j, B_j, C_j) \in \Delta_\Gamma$, $j \in \mathbb{N}$, converging to $(A_0, B_0, C_0) \in \overline{\Delta_\Gamma}$. We need to show $\bar{\Gamma} \in \bigcup_{\tilde{\Gamma} \in K(\Gamma)} \{\check{\Gamma} \leq \tilde{\Gamma}\}$, where $\bar{\Gamma}$ is the multi-index corresponding to $(A_0, B_0, C_0)$.
For the stable part we can separate the subsystem $(A_{j,s}, B_{j,s}, C_{j,s})$ remaining stable in the limit from the part with eigenvalues of $A_j$ tending to the unit circle. As discussed in Section 4.1.2, $(A_{j,s}, B_{j,s}, C_{j,s})$ converges to the stable subsystem $(A_{0,\bullet}, B_{0,\bullet}, C_{0,\bullet})$ whose Kronecker indices can only be smaller than or equal to $\alpha$ (cf. Hannan and Deistler 1988, Theorem 2.5.3).
The remaining subsystem consists of the unstable subsystem of ( A j , B j , C j ) which converges to ( A 0 , u , B 0 , u , C 0 , u ) and the second part of the stable subsystem containing all stable eigenvalues of A j converging to the unit circle. The limiting combined subsystem ( A 0 , c , B 0 , c , C 0 , c ) is such that A 0 , c is block diagonal. If the limiting combined subsystem is minimal and B 0 , u has a structure corresponding to p, this shows that the pair ( Ω ¯ S , p ¯ ) extends ( Ω S , p ) in accordance with the definition of K ( Γ ) .
Since the limiting subsystem is not necessarily minimal and B 0 , u has not necessarily a structure corresponding to p, eliminating coordinates of the state and adapting the corresponding structure indices p may result in a pair ( Ω ¯ S , p ¯ ) that is smaller than the pair ( Ω ˜ S , p ˜ ) corresponding to an element of K ( Γ ) .

Appendix B.2. Proof of Theorem 4

The multi-index $\Gamma$ contains three components: $\Omega_S$, $p$, $\alpha$. For given $\Omega_S$ the selection of the structure indices $p^{\max}$ introducing the fewest restrictions, such that all possible p.u.t. matrices occur in the boundary, was discussed in Section 4.2. Choosing this maximal element $p^{\max}$ then implies that all systems of given state space unit root structure correspond to a multi-index that is smaller than or equal to $(\Omega_S, p^{\max}, \beta)$, where $\beta$ is a Kronecker index corresponding to state space dimension $n_\bullet$. For the Kronecker indices of order $n_\bullet$ it is known that there exists one index $\alpha_{\bullet,g}$ such that $M_{\alpha_{\bullet,g}}$ is open and dense in $\overline{M_{\bullet,n_\bullet}}$. The set $M_{(\Omega_S, p^{\max}, \beta)}$ is, therefore, contained in $\overline{M_{(\Omega_S, p^{\max}, \alpha_{\bullet,g})}}$, which implies (14) with $\Gamma_g(\Omega_S, n_\bullet) := (\Omega_S, p^{\max}, \alpha_{\bullet,g})$.
For the second claim choose an arbitrary state space realization $(A, B, C)$ in canonical form such that $\pi(A, B, C) \in M(\Omega_S, n)$ for arbitrary $\Omega_S$. Define the sequence $(A_j, B_j, C_j)_{j \in \mathbb{N}}$ by $A_j = (1 - j^{-1})A$, $B_j = (1 - j^{-1})B$, $C_j = C$. Then $\lambda_{|\max|}(A_j) < 1$ holds for all $j$, which implies $\pi(A_j, B_j, C_j) \in \overline{M_{\Gamma_{\alpha_g}(n)}}$ for every $n \geq n_u(\Omega_S) + n_\bullet$ and every $j$. The continuity of $\pi$ implies $\pi(A, B, C) = \lim_{j \to \infty} \pi(A_j, B_j, C_j) \in \overline{M_{\Gamma_{\alpha_g}(n)}}$.
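The contraction argument lends itself to a simple numerical check. The sketch below uses an arbitrary hypothetical I(1) system (ours, purely illustrative): all eigenvalues of $A_j = (1 - j^{-1})A$ lie strictly inside the unit circle, while the power series coefficients converge to those of the unit root system, i.e., convergence holds in $T_{pt}$.

```python
import numpy as np

# Hypothetical I(1) system: one unit root and one stable state.
A = np.array([[1.0, 0.0],
              [0.0, 0.5]])
B = np.array([[1.0],
              [1.0]])
C = np.array([[1.0, 1.0]])

def markov(A, B, C, H=20):
    # Power series coefficients C A^h B; T_pt convergence is coefficient-wise.
    return np.array([(C @ np.linalg.matrix_power(A, h) @ B).item() for h in range(H)])

target = markov(A, B, C)
for j in [2, 10, 100, 1000]:
    A_j, B_j = (1 - 1 / j) * A, (1 - 1 / j) * B
    # Every element of the approximating sequence is stable:
    assert np.max(np.abs(np.linalg.eigvals(A_j))) < 1
    print(j, np.max(np.abs(markov(A_j, B_j, C) - target)))  # -> 0 as j grows
```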

Appendix B.3. Proof of Theorem 5

(i)
Assume that there exists a sequence $k_i \in \overline{M_\Gamma}$ converging to a transfer function $k_0 \in M_\Gamma$. For such a sequence the sizes of the Jordan blocks for every unit root are identical from some $i_0$ onwards, since eigenvalues depend continuously on the matrices (cf. Chatelin 1993). Thus, the stable part of the transfer functions $k_i$ must converge to the stable part of the transfer function $k_0$, since the sum of the algebraic multiplicities of all eigenvalues inside the open unit disc cannot drop in the limit. Since $V_\alpha$ (the set of all stable transfer functions with Kronecker index $\alpha$) is open in $\overline{V_\alpha}$ according to Hannan and Deistler (1988, Theorem 2.5.3), this implies that the stable part of $k_i$ has Kronecker index $\alpha$ from some $i_0$ onwards.
For the unstable part of the transfer function note that in $M_\Gamma$ for every unit root $z_j$ the rank of $(A - z_j I_n)^r$ is the same for all elements, for every $r$. Thus, the maximum over $\overline{M_\Gamma}$ cannot be larger, due to lower semi-continuity of the rank. It follows that for $k_i \to k_0$ the ranks of $(A - z_j I_n)^r$ for all $|z_j| = 1$ and for all $r \in \mathbb{N}_0$ are identical to the ranks corresponding to $k_0$ from some point onwards, showing that $k_i$ has the same state space unit root structure as $k_0$ from some $i_0$ onwards. Finally, the p.u.t. structure of sub-blocks of $B_k$ clearly defines an open set, being characterized via strict inequalities. This shows that $k_i \in M_\Gamma$ from some $i_0$ onwards, implying that $M_\Gamma$ is open in $\overline{M_\Gamma}$.
(ii)
The first inclusion was shown in Theorem 3. Comparing Definitions 10 and 11 we see that $\tilde{\Gamma} \in K(\Gamma_g)$ implies $M_{\tilde{\Gamma}} \subseteq \bigcup_{(\tilde{\Omega}_S, \tilde{n}) \in A(\Omega_S, n)} M(\tilde{\Omega}_S, \tilde{n})$. By the definition of the partial ordering (compare Definition 9), $\tilde{\Gamma} \leq \Gamma_g$ implies $M_{\tilde{\Gamma}} \subseteq \bigcup_{(\tilde{\Omega}_S, \tilde{n}) \leq (\Omega_S, n)} M(\tilde{\Omega}_S, \tilde{n})$. Together these two statements imply the second inclusion.
$\bigcup_{(\tilde{\Omega}_S, \tilde{n}) \in A(\Omega_S, n)} \bigcup_{(\check{\Omega}_S, \check{n}) \leq (\tilde{\Omega}_S, \tilde{n})} M(\check{\Omega}_S, \check{n}) \subseteq \overline{M_{\Gamma_g(\Omega_S, n)}}$ is a consequence of the following two statements:
(a)
If $M(\tilde{\Omega}_S, \tilde{n}) \subseteq \overline{M(\Omega_S, n)}$, then $\bigcup_{(\check{\Omega}_S, \check{n}) \leq (\tilde{\Omega}_S, \tilde{n})} M(\check{\Omega}_S, \check{n}) \subseteq \overline{M(\Omega_S, n)}$.
(b)
If $(\tilde{\Omega}_S, \tilde{n}) \in A(\Omega_S, n)$, then $M(\tilde{\Omega}_S, \tilde{n}) \subseteq \overline{M(\Omega_S, n)}$.
For (a) note that for an arbitrary transfer function $\check{k} \in M(\check{\Omega}_S, \check{n})$ with $(\check{\Omega}_S, \check{n}) \leq (\tilde{\Omega}_S, \tilde{n})$ there is a multi-index $\check{\Gamma}$ such that $\check{k} \in M_{\check{\Gamma}}$. By the definition of the partial ordering (compare Definition 9) we find a multi-index $\tilde{\Gamma} \geq \check{\Gamma}$ such that $M_{\tilde{\Gamma}} \subseteq M(\tilde{\Omega}_S, \tilde{n})$. By Theorem 3 and the continuity of $\pi$ we have $M_{\check{\Gamma}} \subseteq \pi(\overline{\Delta_{\tilde{\Gamma}}}) \subseteq \overline{M_{\tilde{\Gamma}}}$. Since $\overline{M(\tilde{\Omega}_S, \tilde{n})} \subseteq \overline{M(\Omega_S, n)}$ by assumption, $\check{k} \in \overline{M_{\tilde{\Gamma}}} \subseteq \overline{M(\tilde{\Omega}_S, \tilde{n})} \subseteq \overline{M(\Omega_S, n)}$, which finishes the proof of (a).
With respect to (b) note that by Definition 11, $A(\Omega_S, n)$ contains transfer functions with two types of state space unit root structures. First, $\tilde{A}_u$ corresponding to state space unit root structure $\tilde{\Omega}_S$ may be of the form
$$
S \tilde{A}_u S' = \begin{pmatrix} A_u & J_{12} \\ 0 & J_2 \end{pmatrix}. \tag{A1}
$$
Second, $\check{A}_u$ corresponding to state space unit root structure $\check{\Omega}_S$ may be of the form (A1) with the off-diagonal elements of $A_u$ replaced by zero. To prove (b) we need to show that in both cases the corresponding transfer function is contained in $\overline{M(\Omega_S, n)}$.
We start by showing that in the second case the transfer function $\check{k}$ is contained in $\overline{M(\tilde{\Omega}_S, \tilde{n})}$, where $\tilde{\Omega}_S$ is the state space unit root structure corresponding to $\tilde{A}_u$ in (A1). For this, consider the sequence
$$
A_j = \begin{pmatrix} 1 & j^{-1} \\ 0 & 1 \end{pmatrix}, \qquad
B_j = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}, \qquad
C_j = \begin{pmatrix} C_1 & C_2 \end{pmatrix}.
$$
Clearly, every system $(A_j, B_j, C_j)$ corresponds to an I(2) process, while the limit for $j \to \infty$ corresponds to an I(1) process. This shows that in the limit one I(2) component can be traded for two I(1) components, leading to more transfer functions in the $T_{pt}$ closure of $M_{\Gamma_g(\Omega_S, n)}$ than only the ones included in $\pi(\overline{\Delta_{\Gamma_g(\Omega_S, n)}})$: in the canonical form the off-diagonal entry of $A_j$ is restricted to equal one, and hence the corresponding sequence of systems in the canonical form diverges. In a sense these systems correspond to "points at infinity": For the example given above we obtain the canonical form
$$
A_j = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
B_j = \begin{pmatrix} B_1 \\ B_2 / j \end{pmatrix}, \qquad
C_j = \begin{pmatrix} C_1 & j C_2 \end{pmatrix}.
$$
Thus, the parameter entries corresponding to $B_{j,2}$ converge to zero, while the ones corresponding to $C_{j,2}$ diverge to infinity.
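A short numerical sketch (ours, with hypothetical scalar blocks $B_1 = B_2 = C_1 = C_2 = 1$) makes the trade visible: for every finite $j$ the Markov coefficients $C A_j^h B$ grow linearly in $h$, as is characteristic for the I(2) case, whereas in the limit they are constant in $h$, as for an I(1) process with two common trends.

```python
import numpy as np

B = np.array([[1.0], [1.0]])   # hypothetical blocks B_1 = B_2 = 1
C = np.array([[1.0, 1.0]])     # hypothetical blocks C_1 = C_2 = 1

for j in [1, 10, 1000]:
    A_j = np.array([[1.0, 1.0 / j], [0.0, 1.0]])
    coeffs = [(C @ np.linalg.matrix_power(A_j, h) @ B).item() for h in (1, 10, 100)]
    print(j, coeffs)           # C A_j^h B = 2 + h/j: linear growth with slope 1/j
# In the limit A_0 = I_2 the coefficients equal 2 for all h: constant, I(1) behavior.
```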
Generalizing this argument shows that every transfer function corresponding to a pair $(\check{\Omega}_S, \check{n}) \in A(\tilde{\Omega}_S, \tilde{n})$, where $\check{A}_u$ can be obtained by replacing off-diagonal entries of $A_u$ with zero, can be reached from within $M(\tilde{\Omega}_S, \tilde{n})$.
To prove $\tilde{k} \in \overline{M(\Omega_S, n)}$ in the first case, where the state space unit root structure is extended as in Equation (A1), consider the sequence
$$
\tilde{A}_j = \begin{pmatrix} 1 & 1 \\ 0 & 1 - j^{-1} \end{pmatrix}, \qquad
\tilde{B}_j = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}, \qquad
\tilde{C}_j = \begin{pmatrix} C_1 & C_2 \end{pmatrix},
$$
corresponding to the following system in canonical form (except that the stable subsystem is not necessarily in echelon canonical form):
$$
\tilde{A}_j = \begin{pmatrix} 1 & 0 \\ 0 & 1 - j^{-1} \end{pmatrix}, \qquad
\tilde{B}_j = \begin{pmatrix} B_1 + j B_2 \\ - j B_2 \end{pmatrix}, \qquad
\tilde{C}_j = \begin{pmatrix} C_1 & C_1 - C_2 / j \end{pmatrix}.
$$
This sequence shows that there exist sequences of transfer functions corresponding to I(1) processes with one common trend that converge to a transfer function corresponding to an I(2) system. Again, in the canonical form this cannot happen, as there the $(1,2)$ entry of $\tilde{A}_j$ would be restricted to be equal to zero. At the same time note that the dimension of the stable subsystem is reduced, due to one component of the state changing from the stable to the unit root part.
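The change of basis behind this canonical form is easy to verify numerically. The following sketch (ours, with arbitrary hypothetical numbers for $B_1$, $B_2$, $C_1$, $C_2$) confirms that the eigenvector matrix $T$ diagonalizes $\tilde{A}_j$ and produces exactly the stated blocks, with entries of $\tilde{B}_j$ diverging at rate $j$.

```python
import numpy as np

j = 10.0
B1, B2, C1, C2 = 1.0, 2.0, 3.0, 4.0            # hypothetical values

A = np.array([[1.0, 1.0], [0.0, 1.0 - 1.0 / j]])
B = np.array([[B1], [B2]])
C = np.array([[C1, C2]])

T = np.array([[1.0, 1.0], [0.0, -1.0 / j]])    # columns: eigenvectors of A
Ti = np.linalg.inv(T)

print(np.round(Ti @ A @ T, 12))   # diag(1, 1 - 1/j)
print(Ti @ B)                     # [B1 + j*B2, -j*B2]': diverges as j grows
print(C @ T)                      # [C1, C1 - C2/j]
```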
Now for a unit root structure $\tilde{\Omega}_S$ such that $(\tilde{\Omega}_S, \tilde{n}) \in A(\Omega_S, n)$, satisfying
$$
S \tilde{A}_u S' = \begin{pmatrix} A_u & J_{12} \\ 0 & J_2 \end{pmatrix},
$$
the Jordan blocks corresponding to $\Omega_S$ are sub-blocks of the ones corresponding to $\tilde{\Omega}_S$, potentially involving a reordering of coordinates using the permutation matrix $S$. Taking as approximating sequence transfer functions $\tilde{k}_j \in M_{\Gamma_g(\Omega_S, n)}$, obtained from $k_0 \in M_{\Gamma_g(\tilde{\Omega}_S, \tilde{n})}$ by keeping the structure $\tilde{\Omega}_S$ but replacing $J_2$ by $\frac{j-1}{j} J_2$, leads to processes with state space unit root structure $\Omega_S$ converging to $k_0$.
For the stable part of $\tilde{k}_j$ we can separate the part containing poles tending to the unit circle (contained in $\frac{j-1}{j} J_2$) from the remaining transfer function $\tilde{k}_{j,s}$, which has Kronecker indices $\tilde{\alpha} \leq \alpha$. The results of Hannan and Deistler (1988, Theorem 2.5.3) then imply that the limit remains in $\overline{M_\alpha}$ and hence allows for an approximating sequence in $M_\alpha$.
Both results combined cover the whole set of attainable state space unit root structures in Definition 11 and prove (b).
As follows from Corollary 4, $\overline{M(\Omega_S, n)} = \overline{M_{\Gamma_g(\Omega_S, n)}}$. Thus, (b) implies $\bigcup_{(\tilde{\Omega}_S, \tilde{n}) \in A(\Omega_S, n)} M(\tilde{\Omega}_S, \tilde{n}) \subseteq \overline{M_{\Gamma_g(\Omega_S, n)}}$ and (a) adds the second union, showing the subset inclusion.
It remains to show equality for the last set inclusion. Thus, we need to show that for $k_j \in M_{\Gamma_g(\Omega_S, n)}$, $k_j \to k_0$, it holds that $k_0 \in M(\tilde{\Omega}_S, \tilde{n})$ for some $(\tilde{\Omega}_S, \tilde{n}) \leq (\check{\Omega}_S, \check{n}) \in A(\Omega_S, n)$. To this end note that the rank of a matrix is a lower semi-continuous function, such that for a sequence of matrices $E_j$ with limit $E_0$ we have
$$
\operatorname{rank}\Big(\lim_{j \to \infty} E_j\Big) = \operatorname{rank}(E_0) \leq \liminf_{j \to \infty} \operatorname{rank}(E_j).
$$
Then, consider a sequence $k_j(z) \in M_{\Gamma_g(\Omega_S, n)}$, $j \in \mathbb{N}$. We can find a converging sequence of systems $(A_j, B_j, C_j)$ realizing $k_j(z)$. Therefore, choosing $E_j = (A_j - z_k I_n)^r$ we obtain that
$$
\operatorname{rank}\big((A_0 - z_k I_n)^t\big) \leq n - \sum_{r=1}^{t} d^k_{j, h_k - r + 1},
$$
since $k_j(z) \in M_{\Gamma_g(\Omega_S, n)}$ implies that the numbers $d^k_{j, h_k - r + 1}$ of generalized eigenvectors at the unit roots are governed by the entries of the state space unit root structure $\Omega_S$. This implies that $\sum_{r=1}^{t} d^k_{j, h_k - r + 1} \leq \sum_{r=1}^{t} d^k_{0, h_k - r + 1}$ for $t = 1, 2, \ldots, n$. Consequently, the limit has at least as many chains of generalized eigenvectors of each maximal length as dictated by the state space unit root structure $\Omega_S$ for each unit root of the limiting system.
Rearranging the rows and columns of the Jordan normal form using a permutation matrix $S$, it is then obvious that either the limiting matrix $A_0$ has additional eigenvalues, in which case
$$
S A_0 S' = \begin{pmatrix} A_j & \tilde{J}_{12} \\ 0 & \tilde{J}_2 \end{pmatrix}
$$
must hold, or upper diagonal entries in $A_j$ must be changed from ones to zeros in order to convert some of the chains to lower order. One example in this respect was given above: For $A_j = \begin{pmatrix} 1 & 1/j \\ 0 & 1 \end{pmatrix}$ the rank of $(A_j - I_2)^r$ is equal to 1 for $r = 1$ and to 0 for $r = 2$. For the limit we obtain $A_0 = I_2$ and hence the rank is zero for $r = 1, 2$. The corresponding indices are $d^1_{j,1} = 1$, $d^1_{j,2} = 1$ for the approximating sequence and $d^1_{0,1} = 0$, $d^1_{0,2} = 2$ for the limit, respectively. Summing these indices starting from the last one, one obtains $d^1_{j,2} = 1 \leq d^1_{0,2} = 2$ and $d^1_{j,1} + d^1_{j,2} = 2 \leq d^1_{0,1} + d^1_{0,2} = 2$.
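These rank and index computations can be checked numerically. The sketch below (ours, purely illustrative) reproduces the example and shows that the rank can only drop in the limit, in line with lower semi-continuity.

```python
import numpy as np

def chain_ranks(A):
    # Ranks of (A - I)^r for r = 1, 2, encoding the Jordan chain structure at z = 1.
    n = A.shape[0]
    return [np.linalg.matrix_rank(np.linalg.matrix_power(A - np.eye(n), r))
            for r in (1, 2)]

for j in [1, 10, 100]:
    A_j = np.array([[1.0, 1.0 / j], [0.0, 1.0]])
    print(j, chain_ranks(A_j))          # [1, 0]: one chain of length two (d_1 = d_2 = 1)
print('limit', chain_ranks(np.eye(2)))  # [0, 0]: two chains of length one (d_1 = 0, d_2 = 2)
```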
Hence the state space unit root structure corresponding to $(A_0, B_0, C_0)$ must be attainable according to Definition 11, and the number of stable state components must decrease accordingly.
Finally, the limiting system $(A_0, B_0, C_0)$ is potentially not minimal. In this case the pair $(\tilde{\Omega}_S, \tilde{n})$ is reduced to a smaller one, concluding the proof.

References

1. Amann, Herbert, and Joachim Escher. 2008. Analysis III. Basel: Birkhäuser.
2. Aoki, Masanao. 1990. State Space Modeling of Time Series. New York: Springer.
3. Bauer, Dietmar, and Martin Wagner. 2003. On Polynomial Cointegration in the State Space Framework. Mimeo.
4. Bauer, Dietmar, and Martin Wagner. 2005. Autoregressive Approximations of Multiple Frequency I(1) Processes. IHS Economics Series, Institut für Höhere Studien–Institute for Advanced Studies (IHS) Vienna, No. 174. Available online: http://hdl.handle.net/10419/72306 (accessed on 3 November 2020).
5. Bauer, Dietmar, and Martin Wagner. 2012. A State Space Canonical Form for Unit Root Processes. Econometric Theory 28: 1313–49.
6. Boswijk, H. Peter, and Paolo Paruolo. 2017. Likelihood Ratio Tests of Restrictions on Common Trends Loading Matrices in I(2) VAR Systems. Econometrics 5: 28.
7. Campbell, John Y. 1994. Inspecting the Mechanism: An Analytical Approach to the Stochastic Growth Model. Journal of Monetary Economics 33: 463–506.
8. Chatelin, Françoise. 1993. Eigenvalues of Matrices. New York: John Wiley & Sons.
9. Engle, Robert F., and Clive W.J. Granger. 1987. Co-Integration and Error Correction: Representation, Estimation and Testing. Econometrica 55: 251–76.
10. Golub, Gene H., and Charles F. Van Loan. 1996. Matrix Computations, 3rd ed. Baltimore: The Johns Hopkins University Press.
11. Granger, Clive W.J. 1981. Some Properties of Time Series Data and Their Use in Econometric Model Specification. Journal of Econometrics 16: 121–30.
12. Hannan, Edward J., and Manfred Deistler. 1988. The Statistical Theory of Linear Systems. New York: John Wiley & Sons.
13. Hazewinkel, Michiel, and Rudolf E. Kalman. 1976. Invariants, Canonical Forms and Moduli for Linear, Constant, Finite Dimensional, Dynamical Systems. In Mathematical Systems Theory. Edited by Giovanni Marchesini and Sanjoy Kumar Mitter. Berlin: Springer, chp. 4, pp. 48–60.
14. Jensen, Andreas N. 2013. The Nesting Structure of the Cointegrated Vector Autoregressive Models. Paper presented at the QED Conference 2013, Vienna, Austria, May 3–4.
15. Johansen, Søren. 1991. Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models. Econometrica 59: 1551–80.
16. Johansen, Søren. 1995. Likelihood-Based Inference in Cointegrated Vector Auto-Regressive Models. Oxford: Oxford University Press.
17. Johansen, Søren. 1997. Likelihood Analysis of the I(2) Model. Scandinavian Journal of Statistics 24: 433–62.
18. Johansen, Søren. 2006. Statistical Analysis of Hypotheses on the Cointegrating Relations in the I(2) Model. Journal of Econometrics 132: 81–115.
19. Johansen, Søren, and Morten Ø. Nielsen. 2018. The Cointegrated Vector Autoregressive Model with General Deterministic Terms. Journal of Econometrics 202: 214–29.
20. Johansen, Søren, and Ernst Schaumburg. 1999. Likelihood Analysis of Seasonal Cointegration. Journal of Econometrics 88: 301–39.
21. Juselius, Katarina. 2006. The Cointegrated VAR Model: Methodology and Applications. Oxford: Oxford University Press.
22. Lewis, Richard, and Gregory C. Reinsel. 1985. Prediction of Multivariate Time Series by Autoregressive Model Fitting. Journal of Multivariate Analysis 16: 393–411.
23. Otto, Markus. 2011. Rechenmethoden für Studierende der Physik im ersten Jahr. Heidelberg: Spektrum Akademischer Verlag.
24. Paruolo, Paolo. 1996. On the Determination of Integration Indices in I(2) Systems. Journal of Econometrics 72: 313–56.
25. Paruolo, Paolo. 2000. Asymptotic Efficiency of the Two Stage Estimator in I(2) Systems. Econometric Theory 16: 524–50.
26. Poskitt, Donald S. 2006. On the Identification and Estimation of Nonstationary and Cointegrated ARMAX Systems. Econometric Theory 22: 1138–75.
27. Rahbek, Anders, Hans C. Kongsted, and Clara Jørgensen. 1999. Trend Stationarity in the I(2) Cointegration Model. Journal of Econometrics 90: 265–89.
28. Saikkonen, Pentti. 1992. Estimation and Testing of Cointegrated Systems by an Autoregressive Approximation. Econometric Theory 8: 1–27.
29. Saikkonen, Pentti, and Ritva Luukkonen. 1997. Testing Cointegration in Infinite Order Vector Autoregressive Processes. Journal of Econometrics 81: 93–126.
30. Sims, Christopher A., James H. Stock, and Mark W. Watson. 1990. Inference in Linear Time Series Models with Some Unit Roots. Econometrica 58: 113–44.
31. Wagner, Martin. 2018. Estimation and Inference for Cointegrating Regressions. In Oxford Research Encyclopedia of Economics and Finance. Oxford: Oxford University Press.
32. Wagner, Martin, and Jaroslava Hlouskova. 2009. The Performance of Panel Cointegration Methods: Results from a Large Scale Simulation Study. Econometric Reviews 29: 182–223.
33. Zellner, Arnold, and Franz C. Palm. 1974. Time Series Analysis and Simultaneous Equation Econometric Models. Journal of Econometrics 2: 17–54.
1. Please note that the original contribution to the estimation of cointegrating relationships was least squares estimation in a non- or semi-parametric regression setting, see, e.g., Engle and Granger (1987). A recent survey of regression-based cointegration analysis is provided by Wagner (2018).
2. The complexity of these inter-relations is probably well illustrated by the fact that only Jensen (2013) notes that "even though the I(2) models are formulated as submodels of I(1) models, some I(1) models are in fact submodels of I(2) models".
3. The literature often uses VAR models as approximations, based on the fact that VARMA processes can often be approximated by VAR models with the order tending to infinity with the sample size at certain rates. This line of work goes back to Lewis and Reinsel (1985) for stationary processes and was extended to (co)integrated processes by Saikkonen (1992), Saikkonen and Luukkonen (1997) and Bauer and Wagner (2005). In addition to the issue of the existence and properties of a sequence of VAR approximations, the question whether a VAR approximation is parsimonious remains.
4. Below we often use the term "likelihood" as short form of "likelihood function".
5. We are confident that this dual usage of notation does not lead to confusion.
6. Our definition of VAR processes differs to a certain extent from some widely used definitions in the literature. Given our focus on unit root and cointegration analysis we, unlike Hannan and Deistler (1988), allow for determinantal roots on the unit circle that, as is well known, lead to integrated processes. We also include deterministic components in our definition, i.e., we allow for a special case of exogenous variables, compare also Remark 2 below. There is, however, also a large part of the literature that refers to this setting simply as (cointegrated) vector autoregressive models, see, e.g., Johansen (1995) and Juselius (2006).
7. Of course, the statistical properties of the parameter estimators depend in many ways on the deterministic components.
8. The set $V_p$ is endowed with the pointwise topology $T_{pt}$, defined in Section 3. For now, in the context of VAR models, it suffices to know that convergence in the pointwise topology is equivalent to convergence of the VAR coefficient matrices $a_1, \ldots, a_p$ in the Frobenius norm.
9. Please note that in case of restricted estimation, i.e., zero restrictions or cross-equation restrictions, OLS is not asymptotically equivalent to PML in general.
10. A similar property holds for $V_{p,r}^{RRR}$, being a "thin" subset of $V_p^{OLS}$. This implies that the probability that the OLS estimator calculated over $V_p^{OLS}$ corresponds to an element of $V_{p,r}^{RRR} \subset V_p^{OLS}$ is equal to zero in general.
11. Below Example 3 we clarify how these indices are related to the state space unit root structure defined in Bauer and Wagner (2012, Definition 2) and link these to the dimensions of the cointegrating spaces in Section 5.2.
12. Uniqueness of realizations in the VAR case stems from the normalization $m(z) b(z) = I_s$, which reduces the class of observationally equivalent VAR realizations of the same transfer function $k(z) = a(z)^{-1} b(z)$, with $b(z) = I_s$, to a singleton.
13. The pair $(a(z), b(z))$ is left coprime if all its left divisors are unimodular matrices. Unimodular matrices are polynomial matrices with constant non-zero determinant. Thus, pre-multiplication of, e.g., $a(z)$ with a unimodular matrix $u(z)$ does not affect the determinantal roots that shape the dynamic behavior of the solutions of VAR models.
14. When using the echelon canonical form, the partitioning is according to the so-called Kronecker indices, related to a basis selection for the row space of the Hankel matrix corresponding to the transfer function $k(z)$, see, e.g., Hannan and Deistler (1988, chp. 2.4) for a precise definition.
15. Here and below we only consider state space systems in so-called innovation representation, with the same error in both the output equation and the state equation. Since every state space system has an innovation representation this is no restriction, compare Aoki (1990, chp. 7.1).
16. The definition of cointegrating spaces as linear subspaces allows to characterize them by a basis and implies a well-defined dimension. These advantages, however, have the implication that the zero vector is an element of all cointegrating spaces, despite not being a cointegrating vector according to our definition, which excludes the zero vector. This issue is, of course, well known in the cointegration literature.