**4. EM Algorithm**

The EM algorithm (Dempster et al. [10]) is a useful method for maximum likelihood (ML) estimation in the presence of latent variables.

To facilitate the estimation process, we introduce latent variables $W\_1, \ldots, W\_n$ through the following hierarchical representation of the MSHN model:

$$T\_i \mid W\_i = w\_i \sim \mathrm{HN}\left(\frac{\sigma}{w\_i}\right) \quad \text{and} \quad W\_i \sim \mathrm{Wei}\left(q, \tfrac{1}{2}\right).$$
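
For concreteness, this hierarchy can be used directly to simulate MSHN data. The following Python sketch (NumPy assumed; the function name `rmshn` is ours, not from the paper) draws $W\_i$ from $\mathrm{Wei}(q, 1/2)$, parametrized here, consistently with the complete log-likelihood below, so that its survival function is $\exp(-2w^q)$, and then draws $T\_i \mid W\_i = w\_i$ from $\mathrm{HN}(\sigma/w\_i)$:

```python
import numpy as np

def rmshn(n, sigma, q, seed=None):
    """Simulate n observations from the MSHN model via the hierarchy
    T_i | W_i = w_i ~ HN(sigma / w_i),  W_i ~ Wei(q, 1/2)."""
    rng = np.random.default_rng(seed)
    # Wei(q, 1/2) with survival exp(-2 w^q): a standard Weibull(q)
    # draw rescaled by (1/2)^(1/q).
    w = 0.5 ** (1.0 / q) * rng.weibull(q, size=n)
    # Half-normal with scale sigma / w_i is |N(0, (sigma / w_i)^2)|.
    t = np.abs(rng.normal(0.0, sigma / w))
    return t, w

t, w = rmshn(500, sigma=2.0, q=1.5, seed=1)
```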

In this setting, we have that

$$f(w \mid t) \propto w^{q} \exp\left\{-\left(\frac{w^2 t^2}{2\sigma^2} + 2w^{q}\right)\right\}.$$
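
The normalizing constant of this conditional density is not available in closed form. One simple numerical device for drawing from it (our choice for illustration; the paper does not prescribe a particular sampler) is to evaluate the unnormalized density on a fine grid and sample by the inverse-CDF method:

```python
import numpy as np

def log_f_w_given_t(w, t, sigma, q):
    """Unnormalized log-density of W | T = t:
    q*log(w) - w^2 t^2 / (2 sigma^2) - 2 w^q  (up to a constant)."""
    return q * np.log(w) - (w * t) ** 2 / (2.0 * sigma**2) - 2.0 * w**q

def sample_w_given_t(t, sigma, q, size, rng, grid_n=2048):
    """Approximate draws from W | T = t via a discretized inverse CDF."""
    # Beyond w_max the prior survival exp(-2 w^q) is about e^-40, so the
    # conditional density there is negligible.
    w_max = 20.0 ** (1.0 / q)
    grid = np.linspace(1e-8, w_max, grid_n)
    logp = log_f_w_given_t(grid, t, sigma, q)
    p = np.exp(logp - logp.max())
    return rng.choice(grid, size=size, p=p / p.sum())
```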

Therefore, the complete log-likelihood function can be expressed as

$$\ell\_c(\theta \mid \boldsymbol{t}\_c) \propto -n \log(\sigma) - \sum\_{i=1}^{n} \frac{w\_i^2 t\_i^2}{2\sigma^2} + \ell\_c(q \mid \boldsymbol{w}\_c),$$

where $\ell\_c(q \mid \boldsymbol{w}\_c) = n \log(q) + q \sum\_{i=1}^{n} \log(w\_i) - 2 \sum\_{i=1}^{n} w\_i^{q}$.
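
In code, this piece of the complete log-likelihood is direct to evaluate. A minimal sketch (the name `ell_c_q` is ours), reused below when approximating $Q(q \mid \widehat{\boldsymbol{\theta}})$:

```python
import numpy as np

def ell_c_q(q, w):
    """l_c(q | w) = n log(q) + q sum_i log(w_i) - 2 sum_i w_i^q."""
    w = np.asarray(w)
    return w.size * np.log(q) + q * np.log(w).sum() - 2.0 * (w**q).sum()
```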

Letting $\widehat{w\_i^2} = \mathrm{E}[W\_i^2 \mid t\_i, \boldsymbol{\theta} = \widehat{\boldsymbol{\theta}}]$, it follows that the conditional expectation of the complete log-likelihood function has the form

$$Q(\theta \mid \widehat{\boldsymbol{\theta}}) \propto -n \log(\sigma) - \sum\_{i=1}^{n} \frac{\widehat{w\_i^2}\, t\_i^2}{2\sigma^2} + Q(q \mid \widehat{\boldsymbol{\theta}}),\tag{11}$$

where $Q(q \mid \widehat{\boldsymbol{\theta}}) = n \log(q) + q S\_{1n} - 2 S\_{2n,q}$, with $S\_{1n} = \sum\_{i=1}^{n} \mathrm{E}[\log(W\_i) \mid t\_i]$ and $S\_{2n,q} = \sum\_{i=1}^{n} \mathrm{E}[W\_i^{q} \mid t\_i]$.

Since neither $S\_{1n}$ nor $S\_{2n,q}$ has an explicit form under the MSHN model, both must be computed numerically. Thus, to compute $Q(q \mid \widehat{\boldsymbol{\theta}})$ we use an approach similar to that of Lee and Xu ([11], Section 3.1): taking $\{\boldsymbol{w}\_r;\, r = 1, \ldots, R\}$ to be a random sample from the conditional distribution of $\boldsymbol{W} \mid (\boldsymbol{T} = \boldsymbol{t}, \boldsymbol{\theta} = \widehat{\boldsymbol{\theta}})$, $Q(q \mid \widehat{\boldsymbol{\theta}})$ can be approximated as

$$Q(q \mid \widehat{\boldsymbol{\theta}}) \approx \frac{1}{R} \sum\_{r=1}^{R} \ell\_c(q \mid \boldsymbol{w}\_r).$$
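
In other words, the Monte Carlo estimate of $Q(q \mid \widehat{\boldsymbol{\theta}})$ is simply the average of $\ell\_c(q \mid \boldsymbol{w}\_r)$ over the $R$ conditional draws. A sketch under the assumptions above, reusing `sample_w_given_t` and `ell_c_q` from the earlier snippets (one draw of the $n$-vector $\boldsymbol{w}\_r$ per replicate):

```python
import numpy as np

def approx_Q_q(q, t, sigma_hat, q_hat, rng, R=200):
    """Monte Carlo approximation of Q(q | theta_hat) as the average of
    l_c(q | w_r) over R draws w_r from W | (T = t, theta = theta_hat)."""
    # Rows of W are replicates w_r = (w_1r, ..., w_nr), shape (R, n).
    W = np.column_stack([
        sample_w_given_t(ti, sigma_hat, q_hat, size=R, rng=rng) for ti in t
    ])
    return np.mean([ell_c_q(q, W[r]) for r in range(R)])
```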

Therefore, the EM algorithm for the MSHN model is given by the following steps:

**E-step:** Given $\boldsymbol{\theta} = \widehat{\boldsymbol{\theta}}^{(k)} = (\widehat{\sigma}^{(k)}, \widehat{q}^{(k)})$, compute $\widehat{w\_i^2}^{(k)}$, for $i = 1, \ldots, n$, together with the conditional draws $\{\boldsymbol{w}\_r;\, r = 1, \ldots, R\}$ needed to approximate $Q(q \mid \widehat{\boldsymbol{\theta}}^{(k)})$.

**CM-step I:** Update $\widehat{\sigma}^{(k)}$ via

$$
\widehat{\sigma}^{2(k+1)} = S\_u^{(k)}, \quad \text{where} \quad S\_u^{(k)} = \frac{1}{n} \sum\_{i=1}^{n} \widehat{w\_i^2}^{(k)} t\_i^2.
$$

**CM-step II:** Fix $\sigma = \widehat{\sigma}^{(k+1)}$ and update $\widehat{q}^{(k)}$ by solving $\widehat{q}^{(k+1)} = \arg\max\_{q} Q(\widehat{\sigma}^{(k+1)}, q \mid \widehat{\boldsymbol{\theta}}^{(k)})$.

The E, CM-I and CM-II steps are repeated until a convergence rule is satisfied, say until $|\ell(\widehat{\boldsymbol{\theta}}^{(k+1)}) - \ell(\widehat{\boldsymbol{\theta}}^{(k)})|$ is sufficiently small. Finally, standard errors (SE) can be estimated using the inverse of the observed information matrix.
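
Putting the steps together, one possible Monte Carlo ECM loop looks as follows. This is a sketch under the assumptions above, not the paper's implementation: the helpers from the earlier snippets are assumed to be in scope, SciPy's bounded scalar optimizer stands in for the CM-step II maximization, and a simple parameter-change rule stands in for the log-likelihood criterion.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def em_mshn(t, sigma0=1.0, q0=1.0, R=200, max_iter=200, tol=1e-5, seed=0):
    """Monte Carlo ECM for the MSHN model (illustrative sketch)."""
    t = np.asarray(t, dtype=float)
    rng = np.random.default_rng(seed)
    sigma, q = sigma0, q0
    for _ in range(max_iter):
        # E-step: R conditional draws per observation, shape (R, n).
        W = np.column_stack([
            sample_w_given_t(ti, sigma, q, size=R, rng=rng) for ti in t
        ])
        # CM-step I: sigma^2 <- (1/n) sum_i E[W_i^2 | t_i] t_i^2.
        w2_hat = (W**2).mean(axis=0)
        sigma_new = np.sqrt(np.mean(w2_hat * t**2))
        # CM-step II: maximize the Monte Carlo estimate of Q(q | theta_hat).
        neg_Q = lambda qq: -np.mean([ell_c_q(qq, W[r]) for r in range(R)])
        q_new = minimize_scalar(neg_Q, bounds=(1e-3, 50.0), method="bounded").x
        done = max(abs(sigma_new - sigma), abs(q_new - q)) < tol
        sigma, q = sigma_new, q_new
        if done:
            break
    return sigma, q
```

A common refinement in Monte Carlo EM is to increase $R$ as the iterations proceed, which damps the simulation noise in the approximated $Q$ function near convergence.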
