*Article* **Johansen's Reduced Rank Estimator Is GMM**

#### **Bruce E. Hansen**

Department of Economics, University of Wisconsin, Madison, WI 53706, USA; bruce.hansen@wisc.edu; Tel.: +1-608-263-3880

Received: 30 January 2018; Accepted: 16 May 2018; Published: 18 May 2018

**Abstract:** The generalized method of moments (GMM) estimator of the reduced-rank regression model is derived under the assumption of conditional homoscedasticity. It is shown that this GMM estimator is algebraically identical to the maximum likelihood estimator under normality developed by Johansen (1988). This includes the vector error correction model (VECM) of Engle and Granger. It is also shown that GMM tests for reduced rank (cointegration) are algebraically similar to the Gaussian likelihood ratio tests. This shows that normality is not necessary to motivate these estimators and tests.

**Keywords:** GMM; VECM; reduced rank

**JEL Classification:** C3

#### **1. Introduction**

The vector error correction model (VECM) of Engle and Granger (1987) is one of the most widely used time-series models in empirical practice. The predominant estimation method for the VECM is the reduced-rank regression method introduced by Johansen (1988, 1991, 1995). Johansen's estimation method is widely used because it is straightforward, it is a natural extension of the VAR model of Sims (1980), and it is computationally tractable.

Johansen motivated his estimator as the maximum likelihood estimator (MLE) of the VECM under the assumption that the errors are i.i.d. normal. For many users, it is unclear whether the estimator has a broader justification. In contrast, it is well known that least-squares estimation is both maximum likelihood under normality and method of moments under uncorrelatedness.

This paper provides the missing link. It is shown that Johansen's reduced-rank estimator is algebraically identical to the generalized method of moments (GMM) estimator of the VECM, under the imposition of conditional homoscedasticity. This GMM estimator only uses uncorrelatedness and homoscedasticity. Thus Johansen's reduced-rank estimator can be motivated under much broader conditions than normality.

The asymptotic efficiency of the estimator in the GMM class relies on the assumption of homoscedasticity (but not normality). When homoscedasticity fails, the reduced-rank estimator loses asymptotic efficiency but retains its interpretation as a GMM estimator.

It is also shown that the GMM tests for reduced (cointegration) rank are nearly identical to Johansen's likelihood ratio tests. Thus the standard likelihood ratio tests for cointegration can be interpreted more broadly as GMM tests.

This paper does not introduce new estimation nor inference methods. It merely points out that the currently used methods have a broader interpretation than may have been understood. The results leave open the possibility that new GMM methods that do not impose homoscedasticity could be developed.

This connection is not new. In a different context, Adrian et al. (2015) derived the equivalence of the likelihood and minimum-distance estimators of the reduced-rank model. The equivalence between the Limited Information Maximum Likelihood (LIML) estimator (which has a dual relation with reduced-rank regression) and a minimum distance estimator was discovered by Goldberger and Olkin (1971). Recently, Kolesár (2018) drew out connections between likelihood-based and minimum-distance estimation of endogenous linear regression models.

This paper is organized as follows. Section 2 introduces reduced-rank regression models and Johansen's estimator. Section 3 presents the GMM and states the main theorems demonstrating the equivalence of the GMM and MLE. Section 4 presents the derivation of the GMM estimator. Section 5 contains two technical results relating generalized eigenvalue problems and the extrema of quadratic forms.

#### **2. Reduced-Rank Regression Models**

The VECM for *p* variables of cointegrating rank *r* with *k* lags is

$$
\Delta X\_t = \alpha\beta' X\_{t-1} + \sum\_{i=1}^{k-1} \Gamma\_i \Delta X\_{t-i} + \Phi D\_t + e\_t, \tag{1}
$$

where *Dt* are the deterministic components. Observations are *t* = 1, ..., *T*. The matrices *α* and *β* are *p* × *r* with *r* ≤ *p*. This is a famous workhorse model in applied time series, largely because of the seminal work of Engle and Granger (1987).

The primary estimation method for the VECM is known as reduced-rank regression and was developed by Johansen (1988, 1991, 1995). Algebraically, the VECM (1) is a special case of the reduced-rank regression model:

$$Y\_t = \alpha\beta' X\_t + \Psi Z\_t + e\_t, \tag{2}$$

where *Yt* is *p* × 1, *Xt* is *m* × 1, and *Zt* is *q* × 1. The coefficient matrix *α* is *p* × *r* and *β* is *m* × *r* with *r* ≤ min(*m*, *p*). Johansen derived the MLE for model (2) under the assumption that *et* is i.i.d. *N* (0, Ω). This immediately applies to the VECM (1) and is the primary application of reduced-rank regression in econometrics.

Canonical correlations were introduced by Hotelling (1936), and reduced-rank regression was introduced by Bartlett (1938). A complete theory was developed by Anderson and Rubin (1949, 1950) and Anderson (1951). These authors developed the MLE for the model:

$$Y\_t = \Pi X\_t + e\_t, \tag{3}$$

$$
\Gamma'\Pi = 0,\tag{4}
$$

where Γ is *p* × (*p* − *r*) and is unknown. This is an alternative parameterization of (2) without the covariates *Zt*. Anderson and Rubin (1949, 1950) considered the case *p* − *r* = 1 and primarily focused on estimation of the vector Γ. Anderson (1951) considered the case *p* − *r* ≥ 1.

While the models (2) and (3)–(4) are equivalent and thus have the same MLE, the different parameterizations led the authors to different derivations. Anderson and Rubin derived the estimator of (3) and (4) by a tedious application of constrained optimization. (Specifically, they maximized the likelihood of (3) imposing the constraint (4) using Lagrange multiplier methods. The solution turned out to be tedious because (4) is a nonlinear function of the parameters Γ and Π.) The derivation is so cumbersome that it is excluded from nearly all statistics and econometrics textbooks, despite the fact that it is the source of the famous LIML estimator.

The elegant derivation used by Johansen (1988) is algebraically unrelated to that of Anderson-Rubin and is based on applying a concentration argument to the product structure in (2). It is similar to the derivation in Tso (1981), although the latter did not include the covariates *Zt*. Johansen's derivation is algebraically straightforward and thus is widely taught to students.

It is useful to briefly describe the likelihood problem. The log-likelihood for model (2) under the assumption that *et* is i.i.d. *N* (0, Ω) is

$$\ell\left(\alpha,\beta,\Psi,\Omega\right) = -\frac{T}{2}\log\det\Omega - \frac{1}{2}\sum\_{t=1}^{T}\left(Y\_t - \alpha\beta' X\_t - \Psi Z\_t\right)'\Omega^{-1}\left(Y\_t - \alpha\beta' X\_t - \Psi Z\_t\right). \tag{5}$$

The MLE maximizes $\ell\left(\alpha,\beta,\Psi,\Omega\right)$. Johansen's solution is as follows. Define the projection matrix $M\_Z = I\_T - Z(Z'Z)^{-1}Z'$ and the residual matrices $\widetilde{Y} = M\_Z Y$ and $\widetilde{X} = M\_Z X$. Consider the generalized eigenvalue problem:

$$\left| \widetilde{X}'\widetilde{Y}\left(\widetilde{Y}'\widetilde{Y}\right)^{-1}\widetilde{Y}'\widetilde{X} - \widetilde{X}'\widetilde{X}\lambda \right| = 0. \tag{6}$$

The solutions $1 > \widehat{\lambda}\_1 > \cdots > \widehat{\lambda}\_p > 0$ satisfy

$$
\widetilde{X}'\widetilde{Y}\left(\widetilde{Y}'\widetilde{Y}\right)^{-1}\widetilde{Y}'\widetilde{X}\widehat{\nu}\_i = \widetilde{X}'\widetilde{X}\widehat{\nu}\_i\widehat{\lambda}\_i,
$$

where $(\widehat{\lambda}\_i, \widehat{\nu}\_i)$ are known as the generalized eigenvalues and eigenvectors of $\widetilde{X}'\widetilde{Y}\left(\widetilde{Y}'\widetilde{Y}\right)^{-1}\widetilde{Y}'\widetilde{X}$ with respect to $\widetilde{X}'\widetilde{X}$. The normalization $\widehat{\nu}\_i'\widetilde{X}'\widetilde{X}\widehat{\nu}\_i = 1$ is imposed.

Given the normalization $\beta'\widetilde{X}'\widetilde{X}\beta = I\_r$, Johansen's reduced-rank estimator for $\beta$ is

$$
\widehat{\beta}\_{\text{mle}} = \left[\widehat{\nu}\_1, \dots, \widehat{\nu}\_r\right].
$$

The MLE $\widehat{\alpha}\_{\text{mle}}$ and $\widehat{\Psi}\_{\text{mle}}$ are found by least-squares regression of $Y\_t$ on $\widehat{\beta}\_{\text{mle}}'X\_t$ and $Z\_t$.
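The procedure above can be sketched numerically. The following is a minimal illustration on simulated data (all dimensions and the data-generating process are invented for the example), using `scipy.linalg.eigh` to solve the generalized eigenvalue problem (6); note that $\beta$ is identified only up to the normalization, so `beta_hat` estimates the column space of $\beta$:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
T, p, m, q, r = 500, 3, 3, 2, 1

# Simulate model (2) with a rank-r coefficient matrix alpha beta'.
alpha = rng.normal(size=(p, r))
beta = rng.normal(size=(m, r))
Psi = rng.normal(size=(p, q))
X = rng.normal(size=(T, m))
Z = rng.normal(size=(T, q))
Y = X @ beta @ alpha.T + Z @ Psi.T + rng.normal(size=(T, p))

# Partial out Z: Ytil = M_Z Y, Xtil = M_Z X.
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
Ytil, Xtil = Y - PZ @ Y, X - PZ @ X

# Generalized eigenvalue problem (6).
A = Xtil.T @ Ytil @ np.linalg.solve(Ytil.T @ Ytil, Ytil.T @ Xtil)
B = Xtil.T @ Xtil
lam, V = eigh(A, B)               # ascending order; eigh enforces V' B V = I
lam, V = lam[::-1], V[:, ::-1]    # re-sort descending, as in the text

beta_hat = V[:, :r]               # = [nu_1, ..., nu_r]

# alpha_hat and Psi_hat by least squares of Y on (X beta_hat, Z).
R = np.column_stack([X @ beta_hat, Z])
coef = np.linalg.solve(R.T @ R, R.T @ Y).T
alpha_hat, Psi_hat = coef[:, :r], coef[:, r:]
```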

#### **3. Generalized Method of Moments**

Define $W\_t = (X\_t', Z\_t')'$. The GMM estimator of the reduced-rank regression model (2) is derived under the standard orthogonality restriction:

$$\mathbb{E}\left(W\_t e\_t'\right) = 0 \tag{7}$$

plus the homoscedasticity condition:

$$\mathbb{E}\left(e\_t e\_t' \otimes W\_t W\_t'\right) = \Omega \otimes Q, \tag{8}$$

where $\Omega = \mathbb{E}\left(e\_t e\_t'\right)$ and $Q = \mathbb{E}\left(W\_t W\_t'\right)$. These moment conditions are implied by the normal regression model. (Equations (7) and (8) can be deduced from the first-order conditions for maximization of (5).) Because (7) and (8) can be deduced from (5) but not vice versa, the moment condition model (7) and (8) is considerably more general than the normal regression model (5).

The efficient GMM criterion (see Hansen 1982) takes the form

$$J\_r(\alpha, \beta, \Psi) = T\,\overline{g}\_r(\alpha, \beta, \Psi)'\,\widehat{V}^{-1}\,\overline{g}\_r(\alpha, \beta, \Psi),$$

where

$$\overline{g}\_{r}\left(\alpha,\beta,\Psi\right) = \frac{1}{T}\sum\_{t=1}^{T}\left(\left(Y\_{t} - \alpha\beta' X\_{t} - \Psi Z\_{t}\right)\otimes W\_{t}\right),\tag{9}$$

$$\begin{aligned} \widehat{V} &= \widehat{\Omega}\otimes\widehat{Q},\\ \widehat{\Omega} &= \frac{1}{T}\sum\_{t=1}^{T}\widehat{e}\_{t}\widehat{e}\_{t}',\\ \widehat{Q} &= \frac{1}{T}\sum\_{t=1}^{T}W\_{t}W\_{t}',\end{aligned}\tag{10}$$

and $\widehat{e}\_t$ are the least-squares residuals of the unconstrained model:

$$
\widehat{e}\_t = Y\_t - \widehat{\Pi} X\_t - \widehat{\Psi} Z\_t.
$$
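These pieces can be assembled numerically. The sketch below (dimensions and data invented for illustration) builds $\overline{g}\_r$ and $\widehat{V}$ from (9) and (10), evaluates the criterion at arbitrary trial parameters, and checks it against the trace form used in the derivation of Section 4:

```python
import numpy as np

rng = np.random.default_rng(1)
T, p, m, q, r = 200, 2, 2, 1, 1
X = rng.normal(size=(T, m))
Z = rng.normal(size=(T, q))
Y = rng.normal(size=(T, p))
W = np.column_stack([X, Z])

# Omega_hat from unconstrained least-squares residuals; Q_hat from W.
ehat = Y - W @ np.linalg.solve(W.T @ W, W.T @ Y)
Omega = ehat.T @ ehat / T
Q = W.T @ W / T
V = np.kron(Omega, Q)                      # V_hat = Omega_hat (x) Q_hat

# Evaluate J_r at arbitrary trial parameters (not the minimizers).
alpha = rng.normal(size=(p, r))
beta = rng.normal(size=(m, r))
Psi = rng.normal(size=(p, q))
E = Y - X @ beta @ alpha.T - Z @ Psi.T
gbar = (W.T @ E).flatten(order='F') / T    # (1/T) sum_t e_t (x) W_t = vec(W'E)/T
J = T * gbar @ np.linalg.solve(V, gbar)

# Equivalent trace form: tr(Omega^{-1} E'W (W'W)^{-1} W'E).
J_tr = np.trace(np.linalg.solve(Omega, E.T @ W @ np.linalg.solve(W.T @ W, W.T @ E)))
```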

The GMM estimators are the parameter values that jointly minimize the criterion $J\_r(\alpha, \beta, \Psi)$ subject to the normalization $\beta'\widetilde{X}'\widetilde{X}\beta = I\_r$:

$$\left(\widehat{\alpha}\_{\text{gmm}}, \widehat{\beta}\_{\text{gmm}}, \widehat{\Psi}\_{\text{gmm}}\right) = \underset{\beta'\widetilde{X}'\widetilde{X}\beta = I\_r}{\text{argmin}}\ J\_r\left(\alpha, \beta, \Psi\right).$$

The main contribution of the paper is the following surprising result.

**Theorem 1.** $\left(\widehat{\alpha}\_{\text{gmm}}, \widehat{\beta}\_{\text{gmm}}, \widehat{\Psi}\_{\text{gmm}}\right) = \left(\widehat{\alpha}\_{\text{mle}}, \widehat{\beta}\_{\text{mle}}, \widehat{\Psi}\_{\text{mle}}\right)$.

**Theorem 2.** $J\_r(\widehat{\alpha}\_{\text{gmm}}, \widehat{\beta}\_{\text{gmm}}, \widehat{\Psi}\_{\text{gmm}}) = \text{tr}\left(\widehat{\Omega}^{-1}\widetilde{Y}'\widetilde{Y}\right) - Tp - T\sum\_{i=1}^{r}\frac{\widehat{\lambda}\_i}{1-\widehat{\lambda}\_i}$, *where* $\widehat{\lambda}\_i$ *are the eigenvalues from (6).*

Theorem 1 states that the GMM estimator is algebraically identical to the Gaussian maximum likelihood estimator.

This shows that Johansen's reduced-rank regression estimator is not tied to the normality assumption. This is similar to the equivalence of least-squares as a method of moments estimator and the Gaussian MLE in the regression context.

The key is the use of the homoscedastic weight matrix. This shows that the Johansen reduced-rank estimator is an efficient GMM estimator under conditional homoscedasticity. When homoscedasticity fails, the Johansen reduced-rank estimator continues to be a GMM estimator but is no longer the efficient GMM estimator.

It is important to understand that Theorem 1 is different from the trivial statement that the MLE is GMM applied to the first-order condition of the likelihood (e.g., Hall (2005), Section 3.8.1). Specifically, if you take the derivatives of the Gaussian log-likelihood function (5) and treat these as moment conditions and solve, this is a GMM estimator, and thus MLE can be interpreted as GMM. That is not what Theorem 1 states.

GMM hypothesis tests can be constructed from the difference in GMM criteria. Here, tests for reduced rank are considered, which in the context of the VECM are tests for cointegration rank. The model

$$\mathbf{Y}\_t = \Pi \mathbf{X}\_t + \Psi Z\_t + \mathbf{e}\_t$$

is taken and the following hypotheses on reduced rank are considered:

$$
\mathbb{H}\_r \text{ : } \text{rank}\left(\Pi\right) = r.
$$

The GMM test statistic for $\mathbb{H}\_r$ against $\mathbb{H}\_{r+1}$ is

$$C\_{r,r+1} = \min\_{\beta'\widetilde{X}'\widetilde{X}\beta = I\_r} J\_r\left(\alpha, \beta, \Psi\right) - \min\_{\beta'\widetilde{X}'\widetilde{X}\beta = I\_{r+1}} J\_{r+1}\left(\alpha, \beta, \Psi\right).$$

The GMM test statistic for $\mathbb{H}\_r$ against $\mathbb{H}\_p$ is

$$C\_{r,p} = \min\_{\beta'\widetilde{X}'\widetilde{X}\beta = I\_r} J\_r\left(\alpha, \beta, \Psi\right) - \min\_{\beta'\widetilde{X}'\widetilde{X}\beta = I\_{p}} J\_{p}\left(\alpha, \beta, \Psi\right).$$

**Theorem 3.** *The GMM test statistics for reduced rank are*

$$\begin{aligned} C\_{r,r+1} &= T\left(\frac{\widehat{\lambda}\_{r+1}}{1 - \widehat{\lambda}\_{r+1}}\right), \\ C\_{r,p} &= T\sum\_{i=r+1}^{p} \frac{\widehat{\lambda}\_{i}}{1 - \widehat{\lambda}\_{i}}, \end{aligned}$$

*where* $\widehat{\lambda}\_i$ *are the eigenvalues from (6).*

Recall, in contrast, that the likelihood ratio test statistics derived by Johansen are

$$\begin{aligned} LR\_{r,r+1} &= -T \log\left(1 - \widehat{\lambda}\_{r+1}\right), \\ LR\_{r,p} &= -T \sum\_{i=r+1}^{p} \log\left(1 - \widehat{\lambda}\_{i}\right). \end{aligned}$$

The GMM test statistic $C\_{r,r+1}$ and the likelihood ratio (LR) statistic $LR\_{r,r+1}$ yield equivalent tests, as they are monotonic functions of one another. (If the bootstrap is used to assess significance, the two statistics will yield numerically identical *p*-values.) They are asymptotically identical under standard approximations and in practice will be nearly identical, because the eigenvalues $\widehat{\lambda}\_i$ tend to be quite small in value (at least under the null hypothesis), so that $-\log(1-\lambda) \approx \lambda/(1-\lambda) \approx \lambda$. For $p - (r+1) > 1$, the GMM test statistic $C\_{r,p}$ and the LR statistic $LR\_{r,p}$ do not provide equivalent tests (they cannot be written as monotonic functions of one another), but they are also asymptotically equivalent and will be nearly identical in practice.
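The closeness of the two families of statistics is easy to see numerically. The eigenvalues below are invented for illustration, chosen modest in size as would be typical under the null:

```python
import numpy as np

T = 500
lam = np.array([0.20, 0.03, 0.01])   # hypothetical eigenvalues from (6)
p, r = len(lam), 1

C_r_r1 = T * lam[r] / (1 - lam[r])            # C_{r,r+1}
LR_r_r1 = -T * np.log(1 - lam[r])             # LR_{r,r+1}
C_r_p = T * np.sum(lam[r:] / (1 - lam[r:]))   # C_{r,p}
LR_r_p = -T * np.sum(np.log(1 - lam[r:]))     # LR_{r,p}
```

Since $\lambda/(1-\lambda) = \lambda + \lambda^2 + \cdots$ dominates $-\log(1-\lambda) = \lambda + \lambda^2/2 + \cdots$ term by term, the GMM statistic is always slightly larger, with the gap shrinking as the eigenvalues shrink.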

An interesting connection noted by a referee is that the statistic $C\_{r,p}$ was proposed by Pillai (1955) and Muirhead (1982, Section 11.2.8).

#### **4. Derivation of the GMM Estimator**

It is convenient to rewrite the criterion in standard matrix notation, defining the matrices *Y*, *X*, *Z*, and *W* by stacking the observations. Model (2) is

$$Y = X\beta\alpha' + Z\Psi' + e.$$

The moment (9) is

$$\overline{g}\_r\left(\alpha,\beta,\Psi\right) = \frac{1}{T}\text{vec}\left(W'\left(Y - X\beta\alpha' - Z\Psi'\right)\right).$$

Using the relation

$$\text{tr}\left(ABCD\right) = \text{vec}\left(D'\right)'\left(C' \otimes A\right)\text{vec}\left(B\right),$$

the following is obtained:

$$\begin{split} J\_r(\alpha,\beta,\Psi) &= T\,\overline{g}\_{r}(\alpha,\beta,\Psi)'\left(\widehat{\Omega}^{-1}\otimes\widehat{Q}^{-1}\right)\overline{g}\_{r}(\alpha,\beta,\Psi) \\ &= \text{vec}\left(W'\left(Y-X\beta\alpha'-Z\Psi'\right)\right)'\left(\widehat{\Omega}^{-1}\otimes\left(W'W\right)^{-1}\right)\text{vec}\left(W'\left(Y-X\beta\alpha'-Z\Psi'\right)\right) \\ &= \text{tr}\left(\widehat{\Omega}^{-1}\left(Y-X\beta\alpha'-Z\Psi'\right)'W\left(W'W\right)^{-1}W'\left(Y-X\beta\alpha'-Z\Psi'\right)\right). \end{split}$$
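The trace identity invoked above can be verified numerically; the shapes below are arbitrary, and `order='F'` makes `flatten` stack columns, matching the vec operator:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))
D = rng.normal(size=(5, 2))

# tr(ABCD) = vec(D')' (C' kron A) vec(B), with vec stacking columns.
lhs = np.trace(A @ B @ C @ D)
rhs = D.T.flatten(order='F') @ np.kron(C.T, A) @ B.flatten(order='F')
```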

Following the concentration strategy used by Johansen, $\beta$ is fixed and $\alpha$ and $\Psi$ are concentrated out, producing a concentrated criterion that is a function of $\beta$ only. The system is linear in the regressors $X\beta$ and $Z$. Given the homoscedastic weight matrix, the GMM estimator of $(\alpha, \Psi)$ is multivariate least-squares. Using the partialling-out (residual regression) approach, the least-squares residual can be written as the residual from the regression of $\widetilde{Y}$ on $\widetilde{X}\beta$, where $\widetilde{Y} = M\_Z Y$ and $\widetilde{X} = M\_Z X$ are the residuals from regressions on $Z$. That is, the least-squares residual is

$$\begin{aligned} \widehat{e}(\beta) &= \widetilde{Y} - \widetilde{X}\beta\left(\beta'\widetilde{X}'\widetilde{X}\beta\right)^{-1}\beta'\widetilde{X}'\widetilde{Y} \\ &= \widetilde{Y} - \widetilde{X}\beta\beta'\widetilde{X}'\widetilde{Y}, \end{aligned}$$

where the second equality uses the normalization $\beta'\widetilde{X}'\widetilde{X}\beta = I\_r$. Because the space spanned by $W = (X, Z)$ equals that spanned by $(\widetilde{X}, Z)$, the following can be written:

$$W\left(W'W\right)^{-1}W' = Z\left(Z'Z\right)^{-1}Z' + \widetilde{X}\left(\widetilde{X}'\widetilde{X}\right)^{-1}\widetilde{X}'.$$
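Because $Z$ and $\widetilde{X} = M\_Z X$ span orthogonal pieces of the column space of $W$, this decomposition of the projection matrix can be confirmed on a toy example (dimensions invented):

```python
import numpy as np

rng = np.random.default_rng(3)
T, m, q = 50, 3, 2
X = rng.normal(size=(T, m))
Z = rng.normal(size=(T, q))
W = np.column_stack([X, Z])

def proj(M):
    # Orthogonal projection onto the column span of M.
    return M @ np.linalg.solve(M.T @ M, M.T)

Xtil = X - proj(Z) @ X            # Xtil = M_Z X
PW = proj(W)
PZ_plus_PXtil = proj(Z) + proj(Xtil)
```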

Because $Z'\widehat{e}(\beta) = 0$, then

$$\begin{aligned} W\left(W'W\right)^{-1}W'\widehat{e}(\beta) &= \widetilde{X}\left(\widetilde{X}'\widetilde{X}\right)^{-1}\widetilde{X}'\widehat{e}(\beta) \\ &= \widetilde{X}\left(\widetilde{X}'\widetilde{X}\right)^{-1}\widetilde{X}'\widetilde{Y} - \widetilde{X}\beta\beta'\widetilde{X}'\widetilde{Y} \end{aligned}$$

and

$$\begin{aligned} \widehat{e}(\beta)'W\left(W'W\right)^{-1}W'\widehat{e}(\beta) &= \widetilde{Y}'\widetilde{X}\left(\widetilde{X}'\widetilde{X}\right)^{-1}\widetilde{X}'\widetilde{Y} - \widetilde{Y}'\widetilde{X}\beta\beta'\widetilde{X}'\widetilde{Y} \\ &= \widetilde{Y}'\widetilde{Y} - \widetilde{Y}'M\_{\widetilde{X}}\widetilde{Y} - \widetilde{Y}'\widetilde{X}\beta\beta'\widetilde{X}'\widetilde{Y}, \end{aligned}$$

where

$$M\_{\widetilde{X}} = I - \widetilde{X}\left(\widetilde{X}'\widetilde{X}\right)^{-1}\widetilde{X}'.$$

Using the partialling out (residual regression) approach, the variance estimator (10) can be written as

$$
\widehat{\Omega} = \frac{1}{T} Y'\left(I - W\left(W'W\right)^{-1}W'\right)Y = \frac{1}{T}\widetilde{Y}'M\_{\widetilde{X}}\widetilde{Y}.
$$
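The two expressions for $\widehat{\Omega}$ just equated can be checked on simulated data (dimensions invented for the example):

```python
import numpy as np

rng = np.random.default_rng(6)
T, p, m, q = 80, 2, 3, 2
X = rng.normal(size=(T, m))
Z = rng.normal(size=(T, q))
Y = rng.normal(size=(T, p))
W = np.column_stack([X, Z])

def proj(M):
    # Orthogonal projection onto the column span of M.
    return M @ np.linalg.solve(M.T @ M, M.T)

Omega1 = Y.T @ (np.eye(T) - proj(W)) @ Y / T   # (1/T) Y'(I - P_W) Y

Ytil = Y - proj(Z) @ Y                          # partial out Z
Xtil = X - proj(Z) @ X
Omega2 = Ytil.T @ (np.eye(T) - proj(Xtil)) @ Ytil / T   # (1/T) Ytil' M_Xtil Ytil
```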

Thus the concentrated GMM criterion is

$$\begin{split} J\_{r}^{*}(\beta) &= \text{tr}\left(\widehat{\Omega}^{-1}\widehat{e}(\beta)'W\left(W'W\right)^{-1}W'\widehat{e}(\beta)\right) \\ &= \text{tr}\left(\widehat{\Omega}^{-1}\widetilde{Y}'\widetilde{Y}\right) - \text{tr}\left(\widehat{\Omega}^{-1}\widetilde{Y}'M\_{\widetilde{X}}\widetilde{Y}\right) - \text{tr}\left(\widehat{\Omega}^{-1}\widetilde{Y}'\widetilde{X}\beta\beta'\widetilde{X}'\widetilde{Y}\right) \\ &= \text{tr}\left(\widehat{\Omega}^{-1}\widetilde{Y}'\widetilde{Y}\right) - Tp - T\,\text{tr}\left(\beta'\widetilde{X}'\widetilde{Y}\left(\widetilde{Y}'M\_{\widetilde{X}}\widetilde{Y}\right)^{-1}\widetilde{Y}'\widetilde{X}\beta\right). \end{split} \tag{11}$$

The GMM estimator minimizes $J\_r^{*}(\beta)$ or, equivalently, maximizes the third term in (11). This is a generalized eigenvalue problem. Lemma 2 (in the next section) shows that the solution is $\widehat{\beta}\_{\text{gmm}} = [\widehat{\nu}\_1, \dots, \widehat{\nu}\_r]$ as claimed.

Because the estimates $\widehat{\alpha}\_{\text{gmm}}$ and $\widehat{\Psi}\_{\text{gmm}}$ are found by regression given $\widehat{\beta}\_{\text{gmm}}$, and because this is equivalent to the MLE, it is also concluded that $\widehat{\alpha}\_{\text{gmm}} = \widehat{\alpha}\_{\text{mle}}$ and $\widehat{\Psi}\_{\text{gmm}} = \widehat{\Psi}\_{\text{mle}}$. This completes the proof of Theorem 1.

To establish Theorem 2, Lemma 2 also shows that the minimum of the criterion is

*Econometrics* **2018**, *6*, 26

$$\begin{split} J\_r(\widehat{\alpha}\_{\text{gmm}}, \widehat{\beta}\_{\text{gmm}}, \widehat{\Psi}\_{\text{gmm}}) &= \min\_{\beta'\widetilde{X}'\widetilde{X}\beta = I\_r} J\_r(\alpha,\beta,\Psi) \\ &= \min\_{\beta'\widetilde{X}'\widetilde{X}\beta = I\_r} J\_r^{*}(\beta) \\ &= \text{tr}\left(\widehat{\Omega}^{-1}\widetilde{Y}'\widetilde{Y}\right) - Tp - T\max\_{\beta'\widetilde{X}'\widetilde{X}\beta = I\_r}\text{tr}\left(\beta'\widetilde{X}'\widetilde{Y}\left(\widetilde{Y}'M\_{\widetilde{X}}\widetilde{Y}\right)^{-1}\widetilde{Y}'\widetilde{X}\beta\right) \\ &= \text{tr}\left(\widehat{\Omega}^{-1}\widetilde{Y}'\widetilde{Y}\right) - Tp - T\sum\_{i=1}^{r}\frac{\widehat{\lambda}\_{i}}{1-\widehat{\lambda}\_{i}}. \end{split}$$

This establishes Theorem 2.

#### **5. Extrema of Quadratic Forms**

To establish Theorems 1 and 2, an extremal property of quadratic forms is needed. First, a property that relates the maximization of quadratic forms to generalized eigenvalues and eigenvectors is given. It is a slight extension of Theorem 11.13 of Magnus and Neudecker (1988).

**Lemma 1.** *Suppose $A$ and $C$ are $p \times p$ real symmetric matrices with $C > 0$. Let $\lambda\_1 > \cdots > \lambda\_p > 0$ be the generalized eigenvalues of $A$ with respect to $C$ and $\nu\_1, \dots, \nu\_p$ be the associated eigenvectors. Then*

$$\max\_{\beta' C \beta = I\_r} \text{tr}\left(\beta' A \beta\right) = \sum\_{i=1}^r \lambda\_i$$

*and*

$$\underset{\beta' C \beta = I\_r}{\text{argmax}}\ \text{tr}\left(\beta' A \beta\right) = \left[\nu\_1, \dots, \nu\_r\right].$$

**Proof.** Define $\gamma = C^{1/2}\beta$ and $\overline{A} = C^{-1/2}AC^{-1/2}$. The eigenvalues of $\overline{A}$ are equal to the generalized eigenvalues $\lambda\_i$ of $A$ with respect to $C$. The associated eigenvectors of $\overline{A}$ are $C^{1/2}\nu\_i$. Thus by Theorem 11.13 of Magnus and Neudecker (1988),

$$\max\_{\beta' C \beta = I\_r} \text{tr}\left(\beta' A \beta\right) = \max\_{\gamma'\gamma = I\_r} \text{tr}\left(\gamma' \overline{A} \gamma\right) = \sum\_{i=1}^r \lambda\_i$$

and

$$\begin{aligned} \underset{\beta' C \beta = I\_r}{\text{argmax}}\ \text{tr}\left(\beta' A \beta\right) &= C^{-1/2}\underset{\gamma'\gamma = I\_r}{\text{argmax}}\ \text{tr}\left(\gamma' \overline{A} \gamma\right) \\ &= C^{-1/2}C^{1/2}\left[\nu\_1, \dots, \nu\_r\right] \\ &= \left[\nu\_1, \dots, \nu\_r\right], \end{aligned}$$

as claimed.
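Lemma 1 can be illustrated numerically. In the sketch below (matrices invented), `scipy.linalg.eigh` solves the generalized problem with the normalization $V'CV = I$, and a randomly drawn feasible $\beta$ never beats the eigenvector solution:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
p, r = 5, 2
M = rng.normal(size=(p, p))
A = M @ M.T                        # symmetric (here positive definite)
N = rng.normal(size=(p, p))
C = N @ N.T + p * np.eye(p)        # symmetric positive definite

lam, V = eigh(A, C)                # ascending; columns satisfy V' C V = I
lam, V = lam[::-1], V[:, ::-1]

beta_star = V[:, :r]               # claimed argmax
val_star = np.trace(beta_star.T @ A @ beta_star)

# A random feasible beta (rescaled so beta' C beta = I_r) for comparison.
G = rng.normal(size=(p, r))
S = G.T @ C @ G
w, U = np.linalg.eigh(S)
beta_rand = G @ U @ np.diag(w ** -0.5) @ U.T
val_rand = np.trace(beta_rand.T @ A @ beta_rand)
```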

**Lemma 2.** *Let $M\_X = I - X(X'X)^{-1}X'$. If $X'X > 0$ and $Y'M\_XY > 0$ then*

$$\max\_{\beta'X'X\beta=I\_r} \text{tr}\left(\beta'X'Y(Y'M\_XY)^{-1}Y'X\beta\right) = \sum\_{i=1}^r \frac{\lambda\_i}{1-\lambda\_i}$$

*and*

$$\underset{\beta'X'X\beta=I\_r}{\text{argmax}}\ \text{tr}\left(\beta'X'Y(Y'M\_XY)^{-1}Y'X\beta\right) = \left[\nu\_1, \dots, \nu\_r\right],$$

*where* $1 > \lambda\_1 > \cdots > \lambda\_p > 0$ *are the generalized eigenvalues of* $X'Y(Y'Y)^{-1}Y'X$ *with respect to* $X'X$, *and* $\nu\_1, \dots, \nu\_p$ *are the associated eigenvectors.*


**Proof.** By Lemma 1,

$$\max\_{\beta'X'X\beta=I\_r} \text{tr}\left(\beta'X'Y(Y'M\_XY)^{-1}Y'X\beta\right) = \sum\_{i=1}^r \widetilde{\lambda}\_i$$

and

$$\underset{\beta' \mathbf{X}' \mathbf{X} \beta = I\_r}{\operatorname{argmax}} \, \text{tr}\left(\beta' \mathbf{X}' \mathbf{Y} (\mathbf{Y}' \mathbf{M}\_X \mathbf{Y})^{-1} \mathbf{Y}' \mathbf{X} \beta\right) = \left[\tilde{\nu}\_1, \dots, \tilde{\nu}\_r\right],$$

where $\widetilde{\lambda}\_1 > \cdots > \widetilde{\lambda}\_p > 0$ are the generalized eigenvalues of $X'Y(Y'M\_XY)^{-1}Y'X$ with respect to $X'X$ and $\widetilde{\nu}\_1, \dots, \widetilde{\nu}\_p$ are the associated eigenvectors. The proof is established by showing that $\widetilde{\lambda}\_i = \lambda\_i/(1-\lambda\_i)$ and $\widetilde{\nu}\_i = \nu\_i$.

Let $(\widetilde{\nu}, \widetilde{\lambda})$ be a generalized eigenvector/eigenvalue pair of $X'Y(Y'M\_XY)^{-1}Y'X$ with respect to $X'X$. The pair satisfies

$$X'Y\left(Y'M\_XY\right)^{-1}Y'X\widetilde{\nu} = X'X\widetilde{\nu}\widetilde{\lambda}. \tag{12}$$

By the Woodbury matrix identity (e.g., Magnus and Neudecker (1988), Equation (7)),

$$\begin{aligned} \left(Y'M\_XY\right)^{-1} &= \left(Y'Y - Y'X\left(X'X\right)^{-1}X'Y\right)^{-1} \\ &= \left(Y'Y\right)^{-1} + \left(Y'Y\right)^{-1}Y'X\left(X'X - X'Y\left(Y'Y\right)^{-1}Y'X\right)^{-1}X'Y\left(Y'Y\right)^{-1} \\ &= \left(Y'Y\right)^{-1} + \left(Y'Y\right)^{-1}Y'X\left(X'M\_YX\right)^{-1}X'Y\left(Y'Y\right)^{-1}, \end{aligned}$$

where $M\_Y = I - Y(Y'Y)^{-1}Y'$. Thus

$$\begin{aligned} X'Y\left(Y'M\_XY\right)^{-1}Y'X &= X'Y\left(Y'Y\right)^{-1}Y'X + X'Y\left(Y'Y\right)^{-1}Y'X\left(X'M\_YX\right)^{-1}X'Y\left(Y'Y\right)^{-1}Y'X \\ &= X'P\_YX + X'P\_YX\left(X'M\_YX\right)^{-1}X'P\_YX \\ &= X'X\left(X'M\_YX\right)^{-1}X'P\_YX, \end{aligned}$$

where $P\_Y = Y(Y'Y)^{-1}Y'$ and the final equality uses $X'P\_YX = X'X - X'M\_YX$. Substituting into (12) produces

$$X'X\left(X'M\_YX\right)^{-1}X'P\_YX\widetilde{\nu} = X'X\widetilde{\nu}\widetilde{\lambda}.$$

Multiplying both sides by $\left(X'M\_YX\right)\left(X'X\right)^{-1}$, this implies

$$\begin{aligned} X'P\_YX\widetilde{\nu} &= X'M\_YX\widetilde{\nu}\widetilde{\lambda} \\ &= X'X\widetilde{\nu}\widetilde{\lambda} - X'P\_YX\widetilde{\nu}\widetilde{\lambda}. \end{aligned}$$

By collecting terms,

$$X'P\_YX\widetilde{\nu}\left(1 + \widetilde{\lambda}\right) = X'X\widetilde{\nu}\widetilde{\lambda},$$

which implies

$$X'P\_YX\widetilde{\nu} = X'X\widetilde{\nu}\frac{\widetilde{\lambda}}{1 + \widetilde{\lambda}}.$$

This is an eigenvalue equation. It shows that $\widetilde{\lambda}/(1+\widetilde{\lambda}) = \lambda$ is a generalized eigenvalue and $\widetilde{\nu}$ is the associated eigenvector of $X'P\_YX$ with respect to $X'X$. Solving, $\widetilde{\lambda} = \lambda/(1-\lambda)$. This means that the generalized eigenvalues and eigenvectors of $X'Y(Y'M\_XY)^{-1}Y'X$ with respect to $X'X$ are $\lambda\_i/(1-\lambda\_i)$ and $\nu\_i$. Because $\lambda/(1-\lambda)$ is monotonically increasing on $[0, 1)$ and $\lambda\_i < 1$, the orderings of $\lambda\_i$ and $\widetilde{\lambda}\_i$ are identical. Thus $\widetilde{\lambda}\_i = \lambda\_i/(1-\lambda\_i)$ and $\widetilde{\nu}\_i = \nu\_i$ as claimed.
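The eigenvalue relation in Lemma 2 is easy to confirm numerically (simulated $X$ and $Y$ with invented dimensions):

```python
import numpy as np
from scipy.linalg import eigvalsh

rng = np.random.default_rng(5)
T, m, p = 100, 3, 4
X = rng.normal(size=(T, m))
Y = rng.normal(size=(T, p))

PX = X @ np.linalg.solve(X.T @ X, X.T)
B = X.T @ X
A1 = X.T @ Y @ np.linalg.solve(Y.T @ Y, Y.T @ X)                 # X'P_Y X
A2 = X.T @ Y @ np.linalg.solve(Y.T @ Y - Y.T @ PX @ Y, Y.T @ X)  # X'Y (Y'M_X Y)^{-1} Y'X

lam = eigvalsh(A1, B)[::-1]      # lambda_i, descending
lam_t = eigvalsh(A2, B)[::-1]    # lambda_tilde_i, descending
```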

**Acknowledgments:** This research is supported by the National Science Foundation and the Phipps Chair. Thanks to Richard Crump, the co-editors, and two referees for helpful comments on an earlier version. The author gives special thanks to Soren Johansen and Katerina Juselius for many years of stunning research, stimulating conversations, and impeccable scholarship.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
