**2. Methodology**

Consider the following model:

$$\mathbf{Y}_{i} = \mathbf{X}_{i}\boldsymbol{\beta}_{i} + \boldsymbol{\varepsilon}_{i}, \quad i = 1, 2, \dots, M,\tag{1}$$

the *i*th equation of a seemingly unrelated regression (SUR) system of $M$ equations with $T$ observations per equation. $\mathbf{Y}_i$ is a $T \times 1$ vector of $T$ observations; $\mathbf{X}_i$ is a $T \times p_i$ full-column-rank matrix of $T$ observations on $p_i$ regressors; and $\boldsymbol{\beta}_i$ is a $p_i \times 1$ vector of unknown parameters.

Equation (1) can be rewritten as follows:

$$
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{2}
$$

where $\mathbf{Y} = (\mathbf{Y}_1', \mathbf{Y}_2', \dots, \mathbf{Y}_M')'$ is the vector of responses and $\boldsymbol{\varepsilon} = (\boldsymbol{\varepsilon}_1', \boldsymbol{\varepsilon}_2', \dots, \boldsymbol{\varepsilon}_M')'$ is the vector of disturbances, both of dimension $TM \times 1$; $\mathbf{X} = \operatorname{diag}(\mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_M)$ is of dimension $TM \times p$; and $\boldsymbol{\beta} = (\boldsymbol{\beta}_1', \boldsymbol{\beta}_2', \dots, \boldsymbol{\beta}_M')'$ is of dimension $p \times 1$, where $p = \sum_{i=1}^{M} p_i$.

The disturbance vector $\boldsymbol{\varepsilon}$ satisfies the following properties:

$$E(\boldsymbol{\varepsilon}) = \mathbf{0}$$

and:

$$E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}') = \begin{bmatrix} \sigma_{11}\mathbf{I} & \dots & \sigma_{1M}\mathbf{I} \\ \vdots & \ddots & \vdots \\ \sigma_{M1}\mathbf{I} & \dots & \sigma_{MM}\mathbf{I} \end{bmatrix} = \boldsymbol{\Sigma} \otimes \mathbf{I},$$

where $\boldsymbol{\Sigma} = [\sigma_{ij}]$, $i, j = 1, 2, \dots, M$, is an $M \times M$ positive definite symmetric matrix, $\otimes$ denotes the Kronecker product, and $\mathbf{I}$ is the identity matrix of order $T \times T$. Following Greene (2019), we assume strict exogeneity of $\mathbf{X}_i$,

$$E\left[\boldsymbol{\varepsilon} \mid \mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_M\right] = \mathbf{0},$$

and homoscedasticity:

$$E\left[\boldsymbol{\varepsilon}_i \boldsymbol{\varepsilon}_i' \mid \mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_M\right] = \sigma_{ii} \mathbf{I}.$$

It is further assumed that disturbances are uncorrelated across observations, that is,

$$E\left[\varepsilon_{it}\varepsilon_{js} \mid \mathbf{X}_{1}, \mathbf{X}_{2}, \dots, \mathbf{X}_{M}\right] = \sigma_{ij} \text{ if } t = s, \text{ and } 0 \text{ otherwise},$$

and it is assumed that disturbances are correlated across equations, that is,

$$E\left[\boldsymbol{\varepsilon}_i \boldsymbol{\varepsilon}_j' \mid \mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_M\right] = \sigma_{ij}\mathbf{I}.$$
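The block structure of $E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}') = \boldsymbol{\Sigma} \otimes \mathbf{I}$ can be illustrated numerically; the values of $\boldsymbol{\Sigma}$, $M$, and $T$ below are arbitrary examples, not taken from this paper:

```python
import numpy as np

# Illustrative error-covariance structure Sigma (x) I for M = 2 equations
# and T = 4 observations; Sigma is any positive definite symmetric matrix.
M, T = 2, 4
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])

# TM x TM covariance matrix of the stacked disturbance vector
Omega = np.kron(Sigma, np.eye(T))

# Block (i, j) of Omega equals sigma_ij * I: disturbances are correlated
# across equations at the same observation t, and uncorrelated across t != s.
block_01 = Omega[0:T, T:2*T]
```

Each off-diagonal block is a scalar multiple of the identity, which is exactly the "correlated across equations, uncorrelated across observations" assumption.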

The OLS and GLS estimators of model (2) are thus given by:

$$
\hat{\boldsymbol{\beta}}^{\text{OLS}} = (\mathbf{X}^\prime \mathbf{X})^{-1} \mathbf{X}^\prime \mathbf{Y}
$$

and:

$$\hat{\boldsymbol{\beta}}^{\text{GLS}} = (\mathbf{X}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I})\mathbf{X})^{-1}\mathbf{X}'(\boldsymbol{\Sigma}^{-1} \otimes \mathbf{I})\mathbf{Y}.$$

$\hat{\boldsymbol{\beta}}^{\text{OLS}}$ simply consists of the OLS estimators computed separately from each equation and ignores the correlations between equations, as can be seen in Kuan (2004). Hence, the GLS estimator should be used when correlations exist among equations. However, the true covariance matrix $\boldsymbol{\Sigma}$ is generally unknown. The solution to this problem is feasible generalized least squares (FGLS) estimation, which uses an estimate $\hat{\boldsymbol{\Sigma}}$ of $\boldsymbol{\Sigma}$ in the GLS formula. In many cases, the residual covariance matrix is calculated by:

$$\hat{\sigma}_{ij} = \frac{\hat{\boldsymbol{\varepsilon}}_i' \hat{\boldsymbol{\varepsilon}}_j}{T - \max(p_{i}, p_j)}, \quad i, j = 1, \dots, M,$$

where $\hat{\boldsymbol{\varepsilon}}_i = \mathbf{Y}_i - \mathbf{X}_i\hat{\boldsymbol{\beta}}_i$ represents the residuals from the *i*th equation, and $\hat{\boldsymbol{\beta}}_i$ may be the OLS or ridge regression (RR) estimate $(\mathbf{X}_i'\mathbf{X}_i + \lambda\mathbf{I})^{-1}\mathbf{X}_i'\mathbf{Y}_i$ with tuning parameter $\lambda \ge 0$. Note that we use the RR solution to estimate $\boldsymbol{\Sigma}$ in our numerical studies because we assume that two or more explanatory variables in each equation are linearly related. Therefore, with $\hat{\boldsymbol{\Omega}} = \hat{\boldsymbol{\Sigma}} \otimes \mathbf{I}$, the FGLS estimator of the SUR system is:

$$
\hat{\boldsymbol{\beta}}^{\text{FGLS}} = (\mathbf{X}^{\prime}\hat{\boldsymbol{\Omega}}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\prime}\hat{\boldsymbol{\Omega}}^{-1}\mathbf{Y}.
$$
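The FGLS procedure just described (first-stage per-equation ridge fits, residual covariance estimate, then GLS with $\hat{\boldsymbol{\Omega}} = \hat{\boldsymbol{\Sigma}} \otimes \mathbf{I}$) can be sketched as follows; the function name `fgls_sur` and all data in the usage below are purely illustrative:

```python
import numpy as np

def fgls_sur(X_list, Y_list, lam=0.1):
    """Sketch of the FGLS estimator for a SUR system:
    1) per-equation ridge fits, 2) residual covariance Sigma_hat,
    3) GLS with Omega_hat = Sigma_hat (x) I."""
    M = len(X_list)
    T = X_list[0].shape[0]
    # Step 1: ridge estimates (X_i'X_i + lam I)^{-1} X_i'Y_i and residuals
    resid = []
    for X_i, Y_i in zip(X_list, Y_list):
        p_i = X_i.shape[1]
        b_i = np.linalg.solve(X_i.T @ X_i + lam * np.eye(p_i), X_i.T @ Y_i)
        resid.append(Y_i - X_i @ b_i)
    # Step 2: sigma_ij = e_i'e_j / (T - max(p_i, p_j))
    Sigma_hat = np.empty((M, M))
    for i in range(M):
        for j in range(M):
            d = T - max(X_list[i].shape[1], X_list[j].shape[1])
            Sigma_hat[i, j] = resid[i] @ resid[j] / d
    # Step 3: GLS on the stacked system with block-diagonal X
    p = sum(Xi.shape[1] for Xi in X_list)
    X = np.zeros((T * M, p))
    col = 0
    for i, X_i in enumerate(X_list):
        X[i*T:(i+1)*T, col:col + X_i.shape[1]] = X_i
        col += X_i.shape[1]
    Y = np.concatenate(Y_list)
    Omega_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
    return np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ Y)
```

For example, with two simulated equations whose errors share a common shock, `fgls_sur([X1, X2], [Y1, Y2])` returns the stacked $p \times 1$ coefficient vector.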

Following Srivastava and Giles (1987) and Zeebari et al. (2012), we first transform Equation (2) as follows, in order to retain the information contained in the correlation matrix of the cross-equation errors:

$$\mathbf{Y}_{*} = \left(\hat{\boldsymbol{\Sigma}}^{-1/2} \otimes \mathbf{I}\right) \mathbf{Y}, \quad \mathbf{X}_{*} = \left(\hat{\boldsymbol{\Sigma}}^{-1/2} \otimes \mathbf{I}\right) \mathbf{X} \quad \text{and} \quad \boldsymbol{\varepsilon}_{*} = \left(\hat{\boldsymbol{\Sigma}}^{-1/2} \otimes \mathbf{I}\right) \boldsymbol{\varepsilon}.$$

Hence, Model (2) turns into:

$$\mathbf{Y}_* = \mathbf{X}_* \boldsymbol{\beta} + \boldsymbol{\varepsilon}_*. \tag{3}$$
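The whitening effect of this transformation can be checked numerically: premultiplying errors with covariance $\boldsymbol{\Sigma} \otimes \mathbf{I}$ by $\boldsymbol{\Sigma}^{-1/2} \otimes \mathbf{I}$ yields identity covariance. The helper `inv_sqrt_sym` and the $\boldsymbol{\Sigma}$ values are illustrative:

```python
import numpy as np

def inv_sqrt_sym(S):
    """Inverse symmetric square root of a positive definite matrix,
    computed from its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
T = 4
W = np.kron(inv_sqrt_sym(Sigma), np.eye(T))   # (Sigma^{-1/2} (x) I)

# Errors with covariance Sigma (x) I become uncorrelated with unit variance:
Omega = np.kron(Sigma, np.eye(T))
whitened_cov = W @ Omega @ W.T
```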

The spectral decomposition of the symmetric matrix $\mathbf{X}_*'\mathbf{X}_*$ is $\mathbf{X}_*'\mathbf{X}_* = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}'$ with $\mathbf{P}\mathbf{P}' = \mathbf{I}$. Model (3) can then be written as:

$$\begin{aligned} \mathbf{Y}_* &= \mathbf{X}_*\mathbf{P}\mathbf{P}'\boldsymbol{\beta} + \boldsymbol{\varepsilon}_* \\ &= \mathbf{Z}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}_*, \end{aligned} \tag{4}$$

with $\mathbf{Z} = \mathbf{X}_*\mathbf{P}$, $\boldsymbol{\alpha} = \mathbf{P}'\boldsymbol{\beta}$ and $\mathbf{Z}'\mathbf{Z} = \mathbf{P}'\mathbf{X}_*'\mathbf{X}_*\mathbf{P} = \boldsymbol{\Lambda}$, so that $\boldsymbol{\Lambda}$ is a diagonal matrix of eigenvalues and $\mathbf{P}$ is a matrix whose columns are eigenvectors of $\mathbf{X}_*'\mathbf{X}_*$.

The OLS estimator of model (4) is:

$$
\hat{\boldsymbol{\alpha}}^{\text{OLS}} = (\mathbf{Z}'\mathbf{Z})^{-1} \mathbf{Z}'\mathbf{Y}_*.
$$

The least squares estimate of $\boldsymbol{\beta}$ in model (2) can then be obtained by the inverse linear transformation:

$$
\hat{\boldsymbol{\beta}}^{\text{OLS}} = (\mathbf{P}')^{-1} \hat{\boldsymbol{\alpha}}^{\text{OLS}} = \mathbf{P} \hat{\boldsymbol{\alpha}}^{\text{OLS}}.\tag{5}
$$
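The canonical form (4) and the inverse transformation (5) can be sketched numerically; since $\mathbf{Z}'\mathbf{Z} = \boldsymbol{\Lambda}$ is diagonal, the OLS step reduces to an elementwise division. All data below are simulated stand-ins for the transformed quantities:

```python
import numpy as np

# Illustrative canonical-form OLS: eigendecompose X*'X*, regress on Z = X*P,
# then map alpha back to beta via Eq. (5).
rng = np.random.default_rng(1)
Xs = rng.standard_normal((30, 4))            # stands in for the transformed X_*
Ys = Xs @ np.array([1.0, 0.0, -1.0, 2.0]) + 0.1 * rng.standard_normal(30)

lam_vals, P = np.linalg.eigh(Xs.T @ Xs)      # X*'X* = P Lambda P'
Z = Xs @ P
alpha_ols = (Z.T @ Ys) / lam_vals            # (Z'Z)^{-1}Z'Y*, since Z'Z = Lambda
beta_ols = P @ alpha_ols                     # inverse transformation (5)
```

Because $\mathbf{P}$ is orthogonal, $\hat{\boldsymbol{\beta}}^{\text{OLS}} = \mathbf{P}\hat{\boldsymbol{\alpha}}^{\text{OLS}}$ coincides exactly with the ordinary least squares solution on the untransformed design.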


Furthermore, following Alkhamisi and Shukur (2008), the full-model ridge SUR regression parameter estimate is:

$$
\hat{\boldsymbol{\alpha}}^{\text{RR}} = (\mathbf{Z}'\mathbf{Z} + \mathbf{K})^{-1} \mathbf{Z}'\mathbf{Y}_{*}, \tag{6}
$$

where $\mathbf{K} = \operatorname{diag}(\mathbf{K}_1, \mathbf{K}_2, \dots, \mathbf{K}_M)$, $\mathbf{K}_i = \operatorname{diag}(k_{i1}, k_{i2}, \dots, k_{ip_i})$, and $k_{ij} = 1/(\hat{\alpha}_{ij}^{\text{OLS}})^2 > 0$ for $i = 1, 2, \dots, M$ and $j = 1, 2, \dots, p_i$.
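The construction of the penalty matrix $\mathbf{K}$ and the resulting shrinkage in Eq. (6) can be sketched as follows; the $\hat{\boldsymbol{\alpha}}^{\text{OLS}}$ values and eigenvalues are arbitrary examples:

```python
import numpy as np

# Illustrative ridge penalty for Eq. (6), with the data-driven choice
# k_ij = 1/(alpha_ij^OLS)^2 described above.
alpha_ols = np.array([0.5, -2.0, 1.0, 0.25])   # stacked alpha_hat^OLS entries
K = np.diag(1.0 / alpha_ols**2)                # each k_ij > 0

Lam = np.diag([4.0, 3.0, 2.0, 1.0])            # example Lambda = Z'Z
ZtY = Lam @ alpha_ols                          # Z'Y* consistent with alpha_ols
alpha_rr = np.linalg.solve(Lam + K, ZtY)       # ridge SUR estimate, Eq. (6)
```

Since both $\boldsymbol{\Lambda}$ and $\mathbf{K}$ are diagonal here, each component is shrunk by the factor $\lambda_j/(\lambda_j + k_{ij}) < 1$ toward zero.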

Now let us assume that uncertain non-sample prior information (UNPI) on the parameter vector $\boldsymbol{\beta}$ is available, either from previous studies, expert knowledge, or the researcher's experience. Such information can improve the quality of the estimators when the sample data are of low quality or unreliable (Ahmed 2014). It is assumed that the UNPI on the vector of parameters takes the form of the following linear restriction for Model (2),

$$\mathbf{R}\boldsymbol{\beta} = \mathbf{r},\tag{7}$$

where $\mathbf{R} = \operatorname{diag}(\mathbf{R}_1, \mathbf{R}_2, \dots, \mathbf{R}_M)$, each $\mathbf{R}_i$, $i = 1, \dots, M$, is a known $m_i \times p_i$ matrix of rank $m_i < p_i$, and $\mathbf{r}$ is a known $\sum_{i=1}^{M} m_i \times 1$ vector. In order to use restriction (7) in Equation (2), we transform it as follows:

$$\mathbf{R}\mathbf{P}\mathbf{P}'\boldsymbol{\beta} = \mathbf{H}\boldsymbol{\alpha} = \mathbf{r},\tag{8}$$

where $\mathbf{H} = \mathbf{R}\mathbf{P}$ and $\boldsymbol{\alpha} = \mathbf{P}'\boldsymbol{\beta}$, as defined above. Hence, the restricted ridge SUR regression estimate is obtained from the following objective function:

$$\begin{split} \tilde{\boldsymbol{\alpha}}^{\text{RR}} &= \arg\min_{\boldsymbol{\alpha}} \left\{ (\mathbf{Y}_{*} - \mathbf{Z}\boldsymbol{\alpha})'(\mathbf{Y}_{*} - \mathbf{Z}\boldsymbol{\alpha}) \right\} \text{ subject to } \mathbf{H}\boldsymbol{\alpha} = \mathbf{r} \text{ and } \boldsymbol{\alpha}'\mathbf{K}\boldsymbol{\alpha} \le \tau^{2} \\ &= \hat{\boldsymbol{\alpha}}^{\text{RR}} - \mathbf{Z}_{\mathbf{K}}^{-1}\mathbf{H}'(\mathbf{H}\mathbf{Z}_{\mathbf{K}}^{-1}\mathbf{H}')^{-1}(\mathbf{H}\hat{\boldsymbol{\alpha}}^{\text{RR}} - \mathbf{r}), \end{split} \tag{9}$$

where $\mathbf{Z}_{\mathbf{K}} = (\mathbf{Z}'\mathbf{Z} + \mathbf{K})$.
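The correction step in Eq. (9) can be sketched numerically; the restricted estimate satisfies the constraint $\mathbf{H}\boldsymbol{\alpha} = \mathbf{r}$ exactly. All matrices below are illustrative:

```python
import numpy as np

# Illustrative restricted ridge SUR estimate, Eq. (9): project the ridge
# estimate onto the restriction H alpha = r.
rng = np.random.default_rng(2)
p = 4
Lam = np.diag([5.0, 4.0, 2.0, 1.0])            # Lambda = Z'Z
K = 0.5 * np.eye(p)                            # example ridge penalty
alpha_rr = rng.standard_normal(p)              # stands in for alpha_hat^RR
H = np.array([[1.0, -1.0, 0.0, 0.0]])          # one restriction: alpha_1 = alpha_2
r = np.array([0.0])

ZK_inv = np.linalg.inv(Lam + K)                # Z_K^{-1} = (Z'Z + K)^{-1}
correction = ZK_inv @ H.T @ np.linalg.solve(H @ ZK_inv @ H.T, H @ alpha_rr - r)
alpha_rrr = alpha_rr - correction              # restricted ridge estimate
```

Algebraically, $\mathbf{H}\tilde{\boldsymbol{\alpha}}^{\text{RR}} = \mathbf{H}\hat{\boldsymbol{\alpha}}^{\text{RR}} - (\mathbf{H}\hat{\boldsymbol{\alpha}}^{\text{RR}} - \mathbf{r}) = \mathbf{r}$, which the code reproduces.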

**Theorem 1.** *The risks of* $\hat{\boldsymbol{\alpha}}^{\text{RR}}$ *and* $\tilde{\boldsymbol{\alpha}}^{\text{RR}}$ *are given by:*

$$\begin{aligned} R\left(\hat{\boldsymbol{\alpha}}^{\text{RR}};\boldsymbol{\alpha}\right) &= \operatorname{tr}\left[(\boldsymbol{\Lambda}+\mathbf{K})^{-1}\boldsymbol{\Lambda}(\boldsymbol{\Lambda}+\mathbf{K})^{-1}\right] + \boldsymbol{\alpha}'\mathbf{K}(\boldsymbol{\Lambda}+\mathbf{K})^{-2}\mathbf{K}\boldsymbol{\alpha}, \\ R\left(\tilde{\boldsymbol{\alpha}}^{\text{RR}};\boldsymbol{\alpha}\right) &= \operatorname{tr}\left[(\boldsymbol{\Lambda}+\mathbf{K})^{-1}\left(\boldsymbol{\Lambda}-\mathbf{H}'\left(\mathbf{H}\boldsymbol{\Lambda}^{-1}\mathbf{H}'\right)^{-1}\mathbf{H}\right)(\boldsymbol{\Lambda}+\mathbf{K})^{-1}\right] \\ &\quad + \boldsymbol{\alpha}'\mathbf{K}(\boldsymbol{\Lambda}+\mathbf{K})^{-2}\mathbf{K}\boldsymbol{\alpha} + \boldsymbol{\delta}'\boldsymbol{\Lambda}(\boldsymbol{\Lambda}+\mathbf{K})^{-2}\boldsymbol{\Lambda}\boldsymbol{\delta} \\ &\quad + 2\boldsymbol{\delta}'\boldsymbol{\Lambda}(\boldsymbol{\Lambda}+\mathbf{K})^{-2}\mathbf{K}\boldsymbol{\alpha}, \end{aligned}$$

*where* $\boldsymbol{\delta} = \boldsymbol{\Lambda}^{-1}\mathbf{H}'\left(\mathbf{H}\boldsymbol{\Lambda}^{-1}\mathbf{H}'\right)^{-1}(\mathbf{H}\boldsymbol{\alpha} - \mathbf{r})$*.*

**Proof.** For the risks of the estimators $\hat{\boldsymbol{\alpha}}^{\text{RR}}$ and $\tilde{\boldsymbol{\alpha}}^{\text{RR}}$, we consider:

$$R\left(\boldsymbol{\alpha}^*;\boldsymbol{\alpha}\right) = E\left[\left(\boldsymbol{\alpha}^*-\boldsymbol{\alpha}\right)'\left(\boldsymbol{\alpha}^*-\boldsymbol{\alpha}\right)\right] = \operatorname{tr}\left[M\left(\boldsymbol{\alpha}^*\right)\right],$$

where $\boldsymbol{\alpha}^*$ is one of the estimators $\hat{\boldsymbol{\alpha}}^{\text{RR}}$ and $\tilde{\boldsymbol{\alpha}}^{\text{RR}}$, and $M(\boldsymbol{\alpha}^*) = E\left[(\boldsymbol{\alpha}^* - \boldsymbol{\alpha})(\boldsymbol{\alpha}^* - \boldsymbol{\alpha})'\right]$. Since:

$$\begin{aligned} \hat{\boldsymbol{\alpha}}^{\text{RR}} &= (\boldsymbol{\Lambda} + \mathbf{K})^{-1}\mathbf{Z}'\mathbf{Y}_* \\ &= (\boldsymbol{\Lambda} + \mathbf{K})^{-1}\boldsymbol{\Lambda}\hat{\boldsymbol{\alpha}}^{\text{OLS}} \\ &= \left[\boldsymbol{\Lambda}^{-1}(\boldsymbol{\Lambda} + \mathbf{K})\right]^{-1}\hat{\boldsymbol{\alpha}}^{\text{OLS}} \\ &= \left[\mathbf{I} + \boldsymbol{\Lambda}^{-1}\mathbf{K}\right]^{-1}\hat{\boldsymbol{\alpha}}^{\text{OLS}} \\ &= \boldsymbol{\Lambda}(\mathbf{K})\hat{\boldsymbol{\alpha}}^{\text{OLS}}, \quad \text{and} \\ \hat{\boldsymbol{\alpha}}^{\text{OLS}} &= \boldsymbol{\Lambda}^{-1}\mathbf{Z}'\mathbf{Y}_* \\ &= \boldsymbol{\alpha} + \boldsymbol{\Lambda}^{-1}\mathbf{Z}'\boldsymbol{\varepsilon}_*, \end{aligned}$$

where $\boldsymbol{\Lambda} = \mathbf{Z}'\mathbf{Z}$, it follows that:

$$\begin{aligned} E\left(\hat{\boldsymbol{\alpha}}^{\text{RR}} - \boldsymbol{\alpha}\right) &= E\left(\boldsymbol{\Lambda}(\mathbf{K})\hat{\boldsymbol{\alpha}}^{\text{OLS}} - \boldsymbol{\alpha}\right) \\ &= \left[\boldsymbol{\Lambda}(\mathbf{K}) - \mathbf{I}\right]\boldsymbol{\alpha}. \end{aligned}$$

Using $\boldsymbol{\Lambda}(\mathbf{K}) = \left[\mathbf{I} + \boldsymbol{\Lambda}^{-1}\mathbf{K}\right]^{-1}$ and $k_{ij} \ge 0$, we get:

$$\begin{aligned} \boldsymbol{\Lambda}^{-1}(\mathbf{K}) &= \mathbf{I} + \boldsymbol{\Lambda}^{-1}\mathbf{K} \\ \mathbf{I} &= \boldsymbol{\Lambda}(\mathbf{K}) + \boldsymbol{\Lambda}(\mathbf{K})\boldsymbol{\Lambda}^{-1}\mathbf{K} \\ \boldsymbol{\Lambda}(\mathbf{K}) - \mathbf{I} &= -\boldsymbol{\Lambda}(\mathbf{K})\boldsymbol{\Lambda}^{-1}\mathbf{K} \\ &= -(\boldsymbol{\Lambda} + \mathbf{K})^{-1}\mathbf{K}. \end{aligned}$$
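This bias identity, $\boldsymbol{\Lambda}(\mathbf{K}) - \mathbf{I} = -(\boldsymbol{\Lambda} + \mathbf{K})^{-1}\mathbf{K}$, can be checked numerically; the diagonal matrices below are arbitrary positive examples:

```python
import numpy as np

# Numerical check of the identity Lambda(K) - I = -(Lambda + K)^{-1} K,
# with Lambda(K) = [I + Lambda^{-1} K]^{-1}.
Lam = np.diag([3.0, 2.0, 1.0])
K = np.diag([0.5, 1.0, 0.2])

LamK = np.linalg.inv(np.eye(3) + np.linalg.inv(Lam) @ K)   # Lambda(K)
lhs = LamK - np.eye(3)
rhs = -np.linalg.inv(Lam + K) @ K
```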

Hence,

$$\begin{aligned} E\left(\hat{\boldsymbol{\alpha}}^{\text{RR}} - \boldsymbol{\alpha}\right) &= -(\boldsymbol{\Lambda} + \mathbf{K})^{-1}\mathbf{K}\boldsymbol{\alpha}, \\ \operatorname{Var}\left(\hat{\boldsymbol{\alpha}}^{\text{RR}} - \boldsymbol{\alpha}\right) &= \operatorname{Var}\left(\boldsymbol{\Lambda}(\mathbf{K})\hat{\boldsymbol{\alpha}}^{\text{OLS}}\right) \\ &= (\boldsymbol{\Lambda} + \mathbf{K})^{-1}\boldsymbol{\Lambda}(\boldsymbol{\Lambda} + \mathbf{K})^{-1}. \end{aligned}$$

Therefore, the risk of $\hat{\boldsymbol{\alpha}}^{\text{RR}}$ is directly obtained from this definition. Similarly,

$$\begin{aligned} \tilde{\boldsymbol{\alpha}}^{\text{RR}} &= \boldsymbol{\Lambda}(\mathbf{K})\left(\hat{\boldsymbol{\alpha}}^{\text{OLS}} - \boldsymbol{\Lambda}^{-1}\mathbf{H}'\left(\mathbf{H}\boldsymbol{\Lambda}^{-1}\mathbf{H}'\right)^{-1}\left(\mathbf{H}\hat{\boldsymbol{\alpha}}^{\text{OLS}} - \mathbf{r}\right)\right) \\ &= \boldsymbol{\Lambda}(\mathbf{K})\hat{\boldsymbol{\alpha}}^{\text{OLS}} - (\boldsymbol{\Lambda} + \mathbf{K})^{-1}\mathbf{H}'\left(\mathbf{H}\boldsymbol{\Lambda}^{-1}\mathbf{H}'\right)^{-1}\left(\mathbf{H}\hat{\boldsymbol{\alpha}}^{\text{OLS}} - \mathbf{r}\right), \\ E\left(\tilde{\boldsymbol{\alpha}}^{\text{RR}} - \boldsymbol{\alpha}\right) &= E\left(\boldsymbol{\Lambda}(\mathbf{K})\hat{\boldsymbol{\alpha}}^{\text{OLS}} - \boldsymbol{\alpha}\right) - E\left((\boldsymbol{\Lambda} + \mathbf{K})^{-1}\mathbf{H}'\left(\mathbf{H}\boldsymbol{\Lambda}^{-1}\mathbf{H}'\right)^{-1}\left(\mathbf{H}\hat{\boldsymbol{\alpha}}^{\text{OLS}} - \mathbf{r}\right)\right) \\ &= -(\boldsymbol{\Lambda} + \mathbf{K})^{-1}\mathbf{K}\boldsymbol{\alpha} - \boldsymbol{\Lambda}(\mathbf{K})\boldsymbol{\delta}, \end{aligned}$$

and:

$$\begin{aligned} \operatorname{Var}\left(\tilde{\boldsymbol{\alpha}}^{\text{RR}} - \boldsymbol{\alpha}\right) &= \operatorname{Var}\left(\boldsymbol{\Lambda}(\mathbf{K})\left(\hat{\boldsymbol{\alpha}}^{\text{OLS}} - \boldsymbol{\Lambda}^{-1}\mathbf{H}'\left(\mathbf{H}\boldsymbol{\Lambda}^{-1}\mathbf{H}'\right)^{-1}\left(\mathbf{H}\hat{\boldsymbol{\alpha}}^{\text{OLS}} - \mathbf{r}\right)\right)\right) \\ &= \boldsymbol{\Lambda}(\mathbf{K})\left(\boldsymbol{\Lambda}^{-1} - \boldsymbol{\Lambda}^{-1}\mathbf{H}'\left(\mathbf{H}\boldsymbol{\Lambda}^{-1}\mathbf{H}'\right)^{-1}\mathbf{H}\boldsymbol{\Lambda}^{-1}\right)\boldsymbol{\Lambda}'(\mathbf{K}) \\ &= (\boldsymbol{\Lambda} + \mathbf{K})^{-1}\left(\boldsymbol{\Lambda} - \mathbf{H}'\left(\mathbf{H}\boldsymbol{\Lambda}^{-1}\mathbf{H}'\right)^{-1}\mathbf{H}\right)(\boldsymbol{\Lambda} + \mathbf{K})^{-1}. \end{aligned}$$

Thus, the risk of $\tilde{\boldsymbol{\alpha}}^{\text{RR}}$ is directly obtained from the definition. □
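The final variance simplification in the proof is a deterministic matrix identity and can be verified numerically; the matrices below are arbitrary examples:

```python
import numpy as np

# Check: Lambda(K) (Lam^{-1} - Lam^{-1} H'(H Lam^{-1} H')^{-1} H Lam^{-1}) Lambda(K)'
#      = (Lam + K)^{-1} (Lam - H'(H Lam^{-1} H')^{-1} H) (Lam + K)^{-1},
# where Lambda(K) = (Lam + K)^{-1} Lam.
Lam = np.diag([4.0, 3.0, 2.0, 1.0])
K = np.diag([0.3, 0.7, 0.1, 0.5])
H = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 2.0, 0.0, 1.0]])

Lam_inv = np.linalg.inv(Lam)
M_H = H.T @ np.linalg.solve(H @ Lam_inv @ H.T, H)   # H'(H Lam^{-1} H')^{-1} H
LamK = np.linalg.inv(Lam + K) @ Lam                 # Lambda(K)
lhs = LamK @ (Lam_inv - Lam_inv @ M_H @ Lam_inv) @ LamK.T
rhs = np.linalg.inv(Lam + K) @ (Lam - M_H) @ np.linalg.inv(Lam + K)
```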
