#### *2.4. Test of Mean-Variance Efficiency*

To test $H_\alpha: \boldsymbol{\alpha} = \mathbf{0}$, the three classic likelihood-based tests are considered: the Wald test, the likelihood-ratio test, and the score test; see, for instance, Boos and Stefanski (2013). Let $\boldsymbol{\theta} = (\boldsymbol{\alpha}^T, \boldsymbol{\theta}_2^T)^T$, with $\boldsymbol{\theta}_2 = (\boldsymbol{\beta}^T, \boldsymbol{\sigma}^T, \eta)^T$, and let $U(\boldsymbol{\theta}) = (U_\alpha^T(\boldsymbol{\theta}), U_2^T(\boldsymbol{\theta}))^T$ denote the score function (8), partitioned according to the partition of $\boldsymbol{\theta}$. In this case, after some algebraic manipulation, the test statistics are given by

$$\begin{aligned} Lr &= n \log\{|\tilde{\boldsymbol{\Sigma}}|/|\hat{\boldsymbol{\Sigma}}|\} + 2\sum\_{t=1}^{n} \log\{g(\hat{\delta}\_t)/g(\tilde{\delta}\_t)\},\\ Wa &= n\, c\_{\alpha}(\hat{\eta})\, (1 + \bar{x}^2/s^2)^{-1}\, \hat{\boldsymbol{\alpha}}^T \hat{\boldsymbol{\Sigma}}^{-1} \hat{\boldsymbol{\alpha}},\\ Sc &= \frac{1}{n\, c\_{\alpha}(\tilde{\eta})}\, (1 + \bar{x}^2/s^2)\, \boldsymbol{d}^T \tilde{\boldsymbol{\Sigma}}^{-1} \boldsymbol{d}, \end{aligned}$$

where $\bar{x} = (1/n)\sum_{t=1}^{n} x_t$, $s^2 = (1/n)\sum_{t=1}^{n}(x_t - \bar{x})^2$, $\hat{\delta}_t = \hat{\boldsymbol{\epsilon}}_t^T \hat{\boldsymbol{\Sigma}}^{-1} \hat{\boldsymbol{\epsilon}}_t$ and $\tilde{\delta}_t = (\boldsymbol{y}_t - \tilde{\boldsymbol{\beta}} x_t)^T \tilde{\boldsymbol{\Sigma}}^{-1} (\boldsymbol{y}_t - \tilde{\boldsymbol{\beta}} x_t)$; $\hat{\boldsymbol{\alpha}}$, $\hat{\boldsymbol{\beta}}$, $\hat{\boldsymbol{\Sigma}}$ and $\hat{\eta}$ are the ML estimators in model (6); $\tilde{\boldsymbol{\beta}}$, $\tilde{\boldsymbol{\Sigma}}$ and $\tilde{\eta}$ are the ML estimators of $\boldsymbol{\beta}$, $\boldsymbol{\Sigma}$ and $\eta$ under $H_\alpha$; $\boldsymbol{d} = \sum_{t=1}^{n} \tilde{\omega}_t (\boldsymbol{y}_t - \tilde{\boldsymbol{\beta}} x_t)$; and $\tilde{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\theta}}$ are the restricted and unrestricted ML estimators of $\boldsymbol{\theta}$, respectively. Under $H_\alpha$, each of these test statistics is asymptotically distributed as $\chi^2(p)$. Note that $c_\alpha(\eta) = 1$ when $\eta = 0$, in which case $Wa = n(1 + \bar{x}^2/s^2)^{-1} \hat{\boldsymbol{\alpha}}^T \hat{\boldsymbol{\Sigma}}^{-1} \hat{\boldsymbol{\alpha}}$, which corresponds to the Wald test under normality; see, for instance, Campbell et al. (1997). In addition, under multivariate normality the likelihood-ratio test is given by $Lr = n \log(1 + Wa/n)$ and the score test takes the form $Sc = Wa/(1 + Wa/n)$. The gradient test of Terrell (2002) is also considered, defined as
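Under normality, the closed-form relations above determine all three statistics from $Wa$ alone. The following sketch (illustrative only; the simulated data and the function name are ours) computes $Wa$ from the unrestricted OLS/ML estimates and then $Lr$ and $Sc$ from it:

```python
import numpy as np
from scipy import stats

def capm_tests_normal(Y, x):
    """Wald, LR and score statistics for H_alpha: alpha = 0 under
    normality (c_alpha(0) = 1), using Lr = n log(1 + Wa/n) and
    Sc = Wa / (1 + Wa/n)."""
    n, p = Y.shape
    xbar, s2 = x.mean(), x.var()           # s^2 = (1/n) sum (x_t - xbar)^2
    X = np.column_stack([np.ones(n), x])   # regressors (1, x_t)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # rows: alpha^T, beta^T
    alpha_hat = coef[0]
    resid = Y - X @ coef
    Sigma_hat = resid.T @ resid / n        # ML estimator of Sigma
    Wa = n / (1 + xbar**2 / s2) * alpha_hat @ np.linalg.solve(Sigma_hat, alpha_hat)
    Lr = n * np.log(1 + Wa / n)
    Sc = Wa / (1 + Wa / n)
    pvals = {k: 1 - stats.chi2.cdf(v, df=p)
             for k, v in {"Wa": Wa, "Lr": Lr, "Sc": Sc}.items()}
    return Wa, Lr, Sc, pvals

# simulated excess returns with alpha = 0 (mean-variance efficiency holds)
rng = np.random.default_rng(0)
n, p = 240, 3
x = rng.normal(0.005, 0.04, n)
Y = 0.9 * x[:, None] + rng.normal(0, 0.02, (n, p))
Wa, Lr, Sc, pvals = capm_tests_normal(Y, x)
print(Wa, Lr, Sc)
```

The ordering $Sc \le Lr \le Wa$ always holds for these statistics, which is a quick sanity check on any implementation.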

$$Ga = U^{T}(\tilde{\boldsymbol{\theta}})(\hat{\boldsymbol{\theta}} - \tilde{\boldsymbol{\theta}}). \tag{11}$$

Since $U_2(\tilde{\boldsymbol{\theta}}) = \mathbf{0}$, the gradient statistic in (11) can be written as $Ga = U_\alpha^T(\tilde{\boldsymbol{\theta}})\hat{\boldsymbol{\alpha}} = \boldsymbol{d}^T \tilde{\boldsymbol{\Sigma}}^{-1} \hat{\boldsymbol{\alpha}}$. The statistic $Ga$ is attractive because it is simple to compute and, unlike $Wa$ and $Sc$, does not require knowledge of the Fisher information matrix (10). Asymptotically, $Ga$ has a chi-square distribution with $p$ degrees of freedom under $H_\alpha$. For more details and applications of this test, see Terrell (2002) and Lemonte (2016). Under the normality assumption, however, the gradient statistic does not offer an alternative for testing mean-variance efficiency, since $Sc = Ga$; see Appendix D.

To calculate the statistics $Lr$, $Sc$ and $Ga$, it is necessary to estimate $\boldsymbol{\theta}$ under $H_\alpha$. The EM algorithm leads to the following equations for the ML estimates of $\boldsymbol{\beta}$, $\boldsymbol{\Sigma}$ and $\eta$ under $H_\alpha$:

$$\tilde{\boldsymbol{\beta}} = \frac{\sum\_{t=1}^{n} \tilde{\omega}\_t x\_t \boldsymbol{y}\_t}{\sum\_{t=1}^{n} \tilde{\omega}\_t x\_t^2}, \quad \tilde{\boldsymbol{\Sigma}} = \frac{1}{n} \sum\_{t=1}^{n} \tilde{\omega}\_t (\boldsymbol{y}\_t - \tilde{\boldsymbol{\beta}} x\_t)(\boldsymbol{y}\_t - \tilde{\boldsymbol{\beta}} x\_t)^T$$

and

$$\tilde{\eta}^{-1} = \frac{2}{a + \log a - 1} + 0.0416\left\{1 + \mathrm{erf}\left(0.6594 \log\left(\frac{2.1971}{a + \log a - 1}\right)\right)\right\},$$

where $a = -(1/n)\sum_{t=1}^{n}(v_{t2} - v_{t1})$, with $v_{t1} = (1 + p\eta)/(1 + c(\eta)\tilde{\delta}_t)$ and $v_{t2} = \psi\!\left(\frac{1 + p\eta}{2\eta}\right) - \log\!\left(\frac{1 + c(\eta)\tilde{\delta}_t}{2\eta}\right)$, for $t = 1, \dots, n$.
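The restricted estimating equations for $\tilde{\boldsymbol{\beta}}$ and $\tilde{\boldsymbol{\Sigma}}$ can be sketched as a fixed-point (EM-type) iteration. The sketch below is illustrative only: it holds $\eta$ fixed (skipping the $\tilde{\eta}$ update) and assumes the parametrization $c(\eta) = \eta/(1 - 2\eta)$ with weights $\tilde{\omega}_t = (1 + p\eta)/(1 + c(\eta)\tilde{\delta}_t)$, i.e., $\tilde{\omega}_t = v_{t1}$ evaluated at the current iterate; these choices come from the model setup, which is not reproduced in this section.

```python
import numpy as np

def em_restricted(Y, x, eta, tol=1e-10, max_iter=500):
    """EM-type iteration for the restricted ML estimates (alpha = 0)
    of beta and Sigma with eta held fixed.  Assumes (illustratively)
    c(eta) = eta / (1 - 2*eta), so the weights are
    omega_t = (1 + p*eta) / (1 + c(eta) * delta_t)."""
    n, p = Y.shape
    c = eta / (1 - 2 * eta)
    beta = (x @ Y) / (x @ x)                       # OLS through the origin
    Sigma = np.cov((Y - np.outer(x, beta)).T, bias=True)
    for _ in range(max_iter):
        E = Y - np.outer(x, beta)
        delta = np.einsum("ti,ij,tj->t", E, np.linalg.inv(Sigma), E)
        w = (1 + p * eta) / (1 + c * delta)        # weights omega_t
        beta_new = (w * x) @ Y / (w @ x**2)        # update for beta
        E = Y - np.outer(x, beta_new)
        Sigma_new = (E * w[:, None]).T @ E / n     # update for Sigma
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta, Sigma = beta_new, Sigma_new
        if converged:
            break
    return beta, Sigma, w

# simulated data with heavy-tailed errors; eta = 1/nu with nu = 5
rng = np.random.default_rng(1)
n, p = 300, 2
x = rng.normal(0.0, 0.05, n)
Y = 1.1 * x[:, None] + rng.standard_t(5, (n, p)) * 0.02
beta_t, Sigma_t, w = em_restricted(Y, x, eta=0.2)
print(beta_t)
```

Downweighting is visible in `w`: observations with large Mahalanobis distance receive weights below one, which is the source of the robustness of the t-based fit.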

#### *2.5. Model Assessment and Outlier Detection*

Any statistical analysis should include a critical assessment of the model assumptions. Following Lange et al. (1989), the Mahalanobis distance is used in this work to assess the fit of the CAPM. Indeed, the random variables

$$F\_t = \left(\frac{1}{1 - 2\eta}\right) \frac{\delta\_t}{p} \sim F(p, 1/\eta)$$

for $t = 1, \dots, n$. Substituting the ML estimators yields $\hat{F}_t = F_t(\hat{\boldsymbol{\theta}})$, which asymptotically has the same $F$ distribution as $F_t$, for $t = 1, \dots, n$. Using the Wilson-Hilferty approximation,

$$z\_t = \frac{\left(1 - \frac{2\eta}{9}\right) \hat{F}\_t^{1/3} - \left(1 - \frac{2}{9p}\right)}{\sqrt{\frac{2\eta}{9} \hat{F}\_t^{2/3} + \frac{2}{9p}}}\tag{12}$$

for $t = 1, \dots, n$, which are approximately standard normal. Thus, a $QQ$-plot of the transformed distances $\{z_1, \dots, z_n\}$ can be used to evaluate the fit of the CAPM under the multivariate $t$-distribution. For $\eta = 0$, the transformed distances simplify to $z_t = \{\hat{F}_t^{1/3} - (1 - 2/9p)\}/\sqrt{2/9p}$ and can be used to assess the fit of the CAPM under the assumption of normality. Additionally, the Mahalanobis distance can be used for multivariate outlier detection: larger-than-expected values of $\hat{F}_t$, $t = 1, \dots, n$, identify outlying cases (see Lange et al. 1989).
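Transformation (12) is easy to check by simulation: applying it to draws from $F(p, 1/\eta)$ should produce values that are approximately standard normal. A minimal sketch:

```python
import numpy as np

def wilson_hilferty(F_hat, p, eta):
    """Transformed distances z_t of (12); approximately N(0,1) when
    the fitted t-CAPM is adequate.  Setting eta = 0 recovers the
    normal-case transformation."""
    num = (1 - 2 * eta / 9) * F_hat ** (1 / 3) - (1 - 2 / (9 * p))
    den = np.sqrt(2 * eta / 9 * F_hat ** (2 / 3) + 2 / (9 * p))
    return num / den

# sanity check: F_t ~ F(p, 1/eta) should give roughly standard-normal z_t
rng = np.random.default_rng(2)
p, eta, n = 4, 0.1, 5000
F_t = rng.f(p, 1 / eta, n)
z = wilson_hilferty(F_t, p, eta)
print(z.mean(), z.std())   # both should be close to 0 and 1
```

In practice one would apply `wilson_hilferty` to the fitted distances $\hat{F}_t$ and inspect a normal $QQ$-plot of the result.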

#### *2.6. Generalized Method of Moments Tests*

If the iid multivariate $t$ assumption on the random errors is violated, hypothesis (3) can be tested using the Generalized Method of Moments (GMM); see Hansen (1982). No distributional assumptions are needed beyond the data being stationary and ergodic, so within the GMM framework the random errors may be both serially dependent and conditionally heteroskedastic. From (2), we have $\boldsymbol{\epsilon}_t = \boldsymbol{y}_t - \boldsymbol{\alpha} - \boldsymbol{\beta} x_t$, for $t = 1, \dots, n$. The idea of the GMM approach (see Hansen 1982) is to replace the population moment conditions of the model restrictions with their sample counterparts. The relevant population moment conditions are $E(\boldsymbol{\epsilon}_t) = \mathbf{0}$ and $E(x_t \boldsymbol{\epsilon}_t) = \mathbf{0}$, for $t = 1, \dots, n$.

Define the $2p \times 1$ vectors $f_t(\boldsymbol{\psi})$ and $\boldsymbol{g}_n(\boldsymbol{\psi})$ as follows:

$$f\_t(\boldsymbol{\psi}) = (\epsilon\_{t1}, x\_t \epsilon\_{t1}, \dots, \epsilon\_{tp}, x\_t \epsilon\_{tp})^T$$

and

$$\boldsymbol{g}\_n(\boldsymbol{\psi}) = \frac{1}{n} \sum\_{t=1}^{n} f\_t(\boldsymbol{\psi}) = \frac{1}{n} \sum\_{t=1}^{n} (\boldsymbol{\epsilon}\_t \otimes \boldsymbol{x}\_t),$$

where $\boldsymbol{\psi} = (\alpha_1, \beta_1, \dots, \alpha_p, \beta_p)^T$ is of dimension $2p \times 1$, $p$ is the number of assets, and $\boldsymbol{x}_t = (1, x_t)^T$ for $t = 1, \dots, n$. The GMM estimator is obtained by minimizing the quadratic form

$$Q(\boldsymbol{\psi}) = \boldsymbol{g}\_n^T(\boldsymbol{\psi}) W\_n \boldsymbol{g}\_n(\boldsymbol{\psi}),$$

where $W_n$ is a $2p \times 2p$ weighting matrix. As noted by MacKinlay and Richardson (1991), in this exactly identified model the GMM estimator does not depend on the weighting matrix and always coincides with the OLS estimators, which are given by

$$
\hat{\boldsymbol{\alpha}} = \bar{\boldsymbol{y}} - \bar{x}\hat{\boldsymbol{\beta}}, \quad \text{and} \quad \hat{\boldsymbol{\beta}} = \frac{\sum\_{t=1}^{n} (x\_t - \bar{x})(\boldsymbol{y}\_t - \bar{\boldsymbol{y}})}{\sum\_{t=1}^{n} (x\_t - \bar{x})^2}, \tag{13}
$$

where $\bar{\boldsymbol{y}} = (1/n)\sum_{t=1}^{n} \boldsymbol{y}_t$ and $\bar{x} = (1/n)\sum_{t=1}^{n} x_t$.
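The estimators (13) are just asset-by-asset OLS and can be computed in vectorized form; a minimal sketch with simulated data (names and data are illustrative):

```python
import numpy as np

def gmm_ols(Y, x):
    """OLS estimators of (13); in this exactly identified setting they
    coincide with the GMM estimators for any weighting matrix."""
    xbar, ybar = x.mean(), Y.mean(axis=0)
    beta = (x - xbar) @ (Y - ybar) / ((x - xbar) @ (x - xbar))
    alpha = ybar - xbar * beta
    return alpha, beta

rng = np.random.default_rng(3)
n, p = 200, 3
x = rng.normal(0, 0.05, n)
Y = 0.01 + 1.2 * x[:, None] + rng.normal(0, 0.02, (n, p))
alpha_hat, beta_hat = gmm_ols(Y, x)
print(alpha_hat, beta_hat)
```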

There are several versions of the GMM test; for simplicity, the Wald-type GMM test is considered in this work (see MacKinlay and Richardson 1991). It is well known that the GMM estimators (13) are asymptotically normally distributed. Indeed, from MacKinlay and Richardson (1991) it follows that the asymptotic sampling distribution of the estimator $\hat{\boldsymbol{\psi}}$ is given by

$$
\sqrt{n}(\hat{\boldsymbol{\psi}} - \boldsymbol{\psi}) \stackrel{\mathcal{D}}{\longrightarrow} \mathcal{N}\_{2p}(\mathbf{0}, \boldsymbol{\Psi}),
$$

and a consistent estimator of $\boldsymbol{\Psi}$ is $\hat{\boldsymbol{\Psi}} = (D_n^T S_n^{-1} D_n)^{-1}$, with $D_n = (1/n)\sum_{t=1}^{n} (\boldsymbol{I}_p \otimes \boldsymbol{x}_t \boldsymbol{x}_t^T) = (1/n)(\boldsymbol{I}_p \otimes X^T X)$ and $S_n = (1/n)\sum_{t=1}^{n} (\hat{\boldsymbol{\epsilon}}_t \hat{\boldsymbol{\epsilon}}_t^T \otimes \boldsymbol{x}_t \boldsymbol{x}_t^T)$. The Wald-type GMM test is given by

$$Wa\_{GMM} = n\, \hat{\boldsymbol{\alpha}}^T (C \hat{\boldsymbol{\Psi}} C^T)^{-1} \hat{\boldsymbol{\alpha}},$$

where $C = \boldsymbol{I}_p \otimes (1, 0)$, so that $C\hat{\boldsymbol{\psi}} = \hat{\boldsymbol{\alpha}}$. This test statistic asymptotically follows a $\chi^2(p)$ distribution, that is, a chi-square distribution with $p$ degrees of freedom. See MacKinlay and Richardson (1991) for more details.
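A sketch of the Wald-type GMM test, building $D_n$, $S_n$ and the sandwich estimator $\hat{\boldsymbol{\Psi}}$ as defined above (the function name and simulated data are ours; the sketch assumes the heteroskedasticity-robust $S_n$ without serial-correlation corrections):

```python
import numpy as np
from scipy import stats

def gmm_wald_test(Y, x):
    """Wald-type GMM test of H_alpha: alpha = 0 using the robust
    covariance estimator Psi_hat = (D_n^T S_n^{-1} D_n)^{-1}."""
    n, p = Y.shape
    X = np.column_stack([np.ones(n), x])             # x_t = (1, x_t)^T
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ coef                                 # residuals eps_t
    Dn = np.kron(np.eye(p), X.T @ X / n)             # D_n = I_p kron X'X/n
    # S_n = (1/n) sum_t (eps_t eps_t^T  kron  x_t x_t^T)
    Sn = sum(np.kron(np.outer(E[t], E[t]), np.outer(X[t], X[t]))
             for t in range(n)) / n
    Psi_hat = np.linalg.inv(Dn.T @ np.linalg.solve(Sn, Dn))
    C = np.kron(np.eye(p), np.array([[1.0, 0.0]]))   # picks out alpha
    alpha_hat = coef[0]
    W = n * alpha_hat @ np.linalg.solve(C @ Psi_hat @ C.T, alpha_hat)
    return W, 1 - stats.chi2.cdf(W, df=p)            # statistic, p-value

rng = np.random.default_rng(4)
n, p = 250, 3
x = rng.normal(0, 0.05, n)
Y = 1.0 * x[:, None] + rng.normal(0, 0.02, (n, p))   # alpha = 0 holds
W, pval = gmm_wald_test(Y, x)
print(W, pval)
```

Because $S_n$ is built from outer products of the residuals, the test retains its asymptotic size under conditional heteroskedasticity, unlike the likelihood-based tests of Section 2.4.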

For the development of the methodology proposed in this paper, the classical approach is used; alternatively, a Bayesian approach could be adopted. Two recent references are Barillas and Shanken (2018) and Borup (2019), who propose Bayesian frameworks for asset pricing models. For applications of Bayesian inference using Markov chain Monte Carlo (MCMC) methods in capital asset pricing models, see Glabadanidis (2014).
