*4.1. Batch Mode EM Algorithm*

Batch-mode parameter estimation with the EM algorithm is straightforward and has been used in many studies. The dominant approach is the well-known Baum-Welch algorithm, which performs the expectation and maximization steps iteratively. Here we briefly describe how to extend the two steps to the SemiTMC-GMM model within one iteration of the EM algorithm. The forward and backward recursions [7] are recalled below:

$$\begin{split} \alpha\_{1}(k) &= p\left( (v\_{1}, d\_{1}) = k, y\_{1} \right), \\ \alpha\_{n}(k) &= \sum\_{l \in \Lambda \times \Gamma \times L} \alpha\_{n-1}(l) \cdot p\left( (v\_{n}, d\_{n}) = k \,|\, (v\_{n-1}, d\_{n-1}) = l \right) \cdot p\left( y\_{n} \,|\, (v\_{n}, d\_{n}) = k \right), \\ \beta\_{N}(k) &= 1, \\ \beta\_{n}(k) &= \sum\_{k' \in \Lambda \times \Gamma \times L} p\left( (v\_{n+1}, d\_{n+1}) = k' \,|\, (v\_{n}, d\_{n}) = k \right) \cdot p\left( y\_{n+1} \,|\, (v\_{n+1}, d\_{n+1}) = k' \right) \cdot \beta\_{n+1}(k'). \end{split} \tag{12}$$

In the above equations, $\alpha\_n(k)$ and $\beta\_n(k)$ are the forward and backward quantities, $p\left( (v\_n, d\_n) \,|\, (v\_{n-1}, d\_{n-1}) \right)$ is the state transition probability described in Equations (8)–(10), and $p\left( y\_n \,|\, (v\_n, d\_n) \right) = \sum\_{h\_n} p(h\_n | v\_n)\, p(y\_n | v\_n, h\_n)$ is the conditional observation density.
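As an illustration, the recursions in Equation (12) can be sketched in scaled form (rescaling each step to avoid numerical underflow on long sequences), treating each composite state $(v\_n, d\_n)$ as a single flat index. The function name and array layout below are our own choices for the sketch, not part of the model:

```python
import numpy as np

def forward_backward(pi, A, B):
    """Scaled forward-backward pass in the style of Equation (12).

    pi : (S,) initial distribution over flattened composite states k = (v, d)
    A  : (S, S) transition matrix, A[l, k] = p(k | l)
    B  : (N, S) emission likelihoods, B[n, k] = p(y_n | k)
    Returns scaled alpha, beta of shape (N, S) and the scaling factors c.
    """
    N, S = B.shape
    alpha = np.zeros((N, S))
    beta = np.ones((N, S))
    c = np.zeros(N)                      # per-step scaling factors

    # forward recursion, normalized at each step
    alpha[0] = pi * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * B[n]
        c[n] = alpha[n].sum()
        alpha[n] /= c[n]

    # backward recursion, reusing the forward scaling factors
    for n in range(N - 2, -1, -1):
        beta[n] = (A @ (B[n + 1] * beta[n + 1])) / c[n + 1]
    return alpha, beta, c
```

With these scaled quantities, the product $\alpha\_n(k)\beta\_n(k)$ is already normalized over $k$, so the posterior of Equation (13) needs no further division.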

Then, the algorithm requires the following probabilities:

$$\gamma\_n(k) = p\left( (v\_n, d\_n) = k \,|\, y\_1^N \right) = \frac{\alpha\_n(k)\beta\_n(k)}{\sum\_{k' \in \Lambda \times \Gamma \times L} \alpha\_n(k')\beta\_n(k')},\tag{13}$$

$$\tilde{\gamma}\_n(i) = \sum\_{d\_n} \gamma\_n((i, d\_n)) = \sum\_{d\_n} p\left( v\_n = i, d\_n \,|\, y\_1^N \right), \tag{14}$$

$$\tilde{\gamma}\_n(i,j) = \tilde{\gamma}\_n(i) \cdot \frac{c\_{ij}\, p(y\_n \,|\, v\_n = i, h\_n = j)}{\sum\_{j' \in K} c\_{ij'}\, p(y\_n \,|\, v\_n = i, h\_n = j')},\tag{15}$$

$$\xi\_n(l,k) = \frac{\alpha\_n(l) \cdot p\left( (v\_{n+1}, d\_{n+1}) = k \,|\, (v\_n, d\_n) = l \right) \cdot p\left( y\_{n+1} \,|\, (v\_{n+1}, d\_{n+1}) = k \right) \cdot \beta\_{n+1}(k)}{\sum\_{l', k' \in \Lambda \times \Gamma \times L} \alpha\_n(l') \cdot p\left( (v\_{n+1}, d\_{n+1}) = k' \,|\, (v\_n, d\_n) = l' \right) \cdot p\left( y\_{n+1} \,|\, (v\_{n+1}, d\_{n+1}) = k' \right) \cdot \beta\_{n+1}(k')}. \tag{16}$$

$\gamma\_n(k)$ is the probability of $(v\_n, d\_n) = k$ conditioned on all observed data $y\_1^N$. $\tilde{\gamma}\_n(i)$ is the marginal of $\gamma\_n(k)$ over $d\_n$; this is the probability we need in order to estimate the hidden state $v\_n$. $\tilde{\gamma}\_n(i, j)$ is the probability of each Gaussian component *w.r.t.* $\tilde{\gamma}\_n(i)$; it serves to compute the parameters of the Gaussian mixture, i.e., $c\_{ij}$, $\mu\_{ij}$, $\Sigma\_{ij}$. $\xi\_n(l, k)$ is the joint probability of $(v\_n, d\_n) = l$ and $(v\_{n+1}, d\_{n+1}) = k$ conditioned on $y\_1^N$. The parameter updates based on Equations (13)–(16) are:
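Under the same flat-index convention for the composite state $(v\_n, d\_n)$, the posteriors of Equations (13) and (16) could be computed from scaled forward-backward outputs `alpha` and `beta` as sketched below (the helper name and shapes are illustrative assumptions). Equations (14) and (15) then amount to summing `gamma` over the duration index and re-weighting by the mixture-component likelihoods:

```python
import numpy as np

def posteriors(alpha, beta, A, B):
    """Posterior quantities of Equations (13) and (16).

    alpha, beta : (N, S) scaled forward/backward arrays
    A           : (S, S) transition matrix, A[l, k] = p(k | l)
    B           : (N, S) emission likelihoods, B[n, k] = p(y_n | k)
    Returns gamma[n, k] = p(state_n = k | y_1^N) and
    xi[n, l, k] = p(state_n = l, state_{n+1} = k | y_1^N), n < N-1.
    """
    N, S = alpha.shape
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)      # Equation (13)

    xi = np.zeros((N - 1, S, S))
    for n in range(N - 1):
        # numerator of Equation (16):
        # alpha_n(l) * p(k | l) * p(y_{n+1} | k) * beta_{n+1}(k)
        xi[n] = alpha[n, :, None] * A * (B[n + 1] * beta[n + 1])[None, :]
        xi[n] /= xi[n].sum()                       # denominator of (16)
    return gamma, xi
```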

$$\zeta\_k = \gamma\_1(k), \tag{17}$$

$$a\_{lk} = \frac{\sum\_{n=1}^{N-1} \xi\_n(l,k)}{\sum\_{n=1}^{N-1} \gamma\_n(l)},\tag{18}$$

$$c\_{ij} = \frac{\sum\_{n=1}^{N} \tilde{\gamma}\_n(i, j)}{\sum\_{n=1}^{N} \tilde{\gamma}\_n(i)},\tag{19}$$

$$\mu\_{ij} = \frac{\sum\_{n=1}^{N} \tilde{\gamma}\_n(i, j)\, y\_n}{\sum\_{n=1}^{N} \tilde{\gamma}\_n(i, j)},\tag{20}$$

$$\Sigma\_{ij} = \frac{\sum\_{n=1}^{N} \tilde{\gamma}\_n(i, j)\, (y\_n - \mu\_{ij})^\top (y\_n - \mu\_{ij})}{\sum\_{n=1}^{N} \tilde{\gamma}\_n(i, j)}. \tag{21}$$

In fact, Equations (13)–(16) constitute the expectation step of one EM iteration, while Equations (17)–(21) constitute the maximization step. The parameters can then be learned by alternating the two steps until the iteration count exceeds a pre-defined value, for example, a maximum of 100 iterations.
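For the mixture-related updates (19)–(21), the maximization step might be sketched as follows, assuming the component posteriors $\tilde{\gamma}\_n(i,j)$ of Equation (15) have already been collected in an array `gamma_ij`; names and shapes here are our own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def gmm_m_step(y, gamma_ij):
    """Maximization step for the mixture parameters, Equations (19)-(21).

    y        : (N, D) observations
    gamma_ij : (N, I, J) component posteriors of Equation (15)
    Returns mixture weights c, means mu, and covariances Sigma.
    """
    N, D = y.shape
    _, I, J = gamma_ij.shape
    occ = gamma_ij.sum(axis=0)                    # sum_n gamma~_n(i, j)
    c = occ / occ.sum(axis=1, keepdims=True)      # Equation (19)

    # Equation (20): responsibility-weighted means
    mu = np.einsum('nij,nd->ijd', gamma_ij, y) / occ[..., None]

    # Equation (21): weighted outer products of the centered observations
    # (the product (y_n - mu)^T (y_n - mu) with row-vector observations)
    Sigma = np.zeros((I, J, D, D))
    for i in range(I):
        for j in range(J):
            r = y - mu[i, j]                      # centered data
            Sigma[i, j] = (gamma_ij[:, i, j, None, None]
                           * r[:, :, None] * r[:, None, :]).sum(axis=0) / occ[i, j]
    return c, mu, Sigma
```

With a single state and a single component whose responsibilities are uniform, these updates reduce to the sample mean and sample covariance, which is a quick sanity check on any implementation.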
