*4.2. Sufficient Data Statistics*

Since Gaussian Markov models belong to the exponential family, the likelihood function of SemiTMC-GMM can be written in the form of [30]

$$p\_{\theta}(\mathbf{z}\_{n}, d\_{n}) = f(\mathbf{z}\_{n}, d\_{n}) \exp\left( \langle s(\mathbf{z}\_{n}, d\_{n}), \psi(\theta) \rangle - J(\theta) \right), \tag{22}$$

where $s(\mathbf{z}\_{n}, d\_{n})$ is a vector of complete-data sufficient statistics belonging to a convex set $\mathcal{S}$, $\langle \cdot, \cdot \rangle$ denotes the scalar product, the function $\psi(\cdot)$ maps $\theta$ to the natural parametrization, and $J(\cdot)$ is the log-partition function. For SemiTMC-GMM, the statistics are defined as

$$s\_{n',lk}^{(1)} = \mathbb{1}\left\{ (\mathfrak{v}\_{n'}, d\_{n'}) = l, (\mathfrak{v}\_{n'+1}, d\_{n'+1}) = k \right\},\tag{23}$$

$$s\_{n',k}^{(2)} = \mathbb{1}\{ (\mathfrak{v}\_{n'}, d\_{n'}) = k \},\tag{24}$$

$$s\_{n',ij}^{(3)} = \mathbb{1}\{\mathfrak{v}\_{n'} = i, h\_{n'} = j\},\tag{25}$$

$$s\_{n',ij}^{(4)} = \mathbb{1}\{\mathfrak{v}\_{n'} = i, h\_{n'} = j\} \boldsymbol{y}\_{n'},\tag{26}$$

$$s\_{n',ij}^{(5)} = \mathbb{1}\{\mathfrak{v}\_{n'} = i, h\_{n'} = j\} \boldsymbol{y}\_{n'}^{\mathsf{T}} \boldsymbol{y}\_{n'},\tag{27}$$

where $\mathbb{1}\{\cdot\}$ is the indicator function and $n' = 1, \dots, N$. The statistics vector at time $n'$ is then $s\_{n'} = \left( s\_{n',lk}^{(1)}, s\_{n',k}^{(2)}, s\_{n',ij}^{(3)}, s\_{n',ij}^{(4)}, s\_{n',ij}^{(5)} \right)$. Consequently, the sufficient statistic $S\_n$ is the conditional expectation of the time-averaged statistics given $y\_1^{n}$:

$$S\_{n} = \frac{1}{n} \mathbb{E}\_{\theta}\left( \sum\_{n'=1}^{n} s\_{n'} \,\middle|\, y\_1^{n} \right). \tag{28}$$
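To make Equations (23)–(28) concrete, the minimal Python sketch below computes the per-step statistics vector $s\_{n'}$ and its time average for the idealized case in which the hidden pair $(\mathfrak{v}\_{n'}, d\_{n'})$, the mixture label $h\_{n'}$, and the following pair are known; in the actual algorithm, Equation (28) replaces these indicators by their conditional expectations given $y\_1^{n}$. All names, array shapes, and the NumPy dependency are assumptions made purely for illustration.

```python
import numpy as np

def step_statistics(pair, next_pair, v, h, y, n_pairs, n_states, n_comps):
    """Per-step complete-data statistics of Equations (23)-(27).

    Hypothetical helper: `pair` and `next_pair` index the (state, duration)
    pairs at times n' and n'+1; `v` and `h` are the hidden state and mixture
    label at time n'; `y` is the observation vector y_{n'}.
    """
    dim = y.shape[0]
    s1 = np.zeros((n_pairs, n_pairs));  s1[pair, next_pair] = 1.0   # Eq. (23)
    s2 = np.zeros(n_pairs);             s2[pair] = 1.0              # Eq. (24)
    s3 = np.zeros((n_states, n_comps)); s3[v, h] = 1.0              # Eq. (25)
    s4 = np.zeros((n_states, n_comps, dim))
    s4[v, h] = y                                                    # Eq. (26)
    s5 = np.zeros((n_states, n_comps, dim, dim))
    s5[v, h] = np.outer(y, y)      # Eq. (27), reading y^T y as an outer product
    return s1, s2, s3, s4, s5

def running_average(per_step):
    """Average the per-step statistics over n' = 1..n, mirroring Equation (28)."""
    n = len(per_step)
    return tuple(sum(parts) / n for parts in zip(*per_step))
```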

Denote $S\_n = \left( S\_{n,lk}^{(1)}, S\_{n,k}^{(2)}, S\_{n,ij}^{(3)}, S\_{n,ij}^{(4)}, S\_{n,ij}^{(5)} \right)$, in which each element is the conditional expectation of the corresponding element of $s\_{n'}$. Now, comparing Equation groups (13)–(21) and (23)–(28), we can rewrite the parameter update Equations (17)–(21) in terms of the sufficient statistics:

$$\bar{S}\_{n,i}^{(2)} = \sum\_{d\_n} S\_{n,(i,d\_n)}^{(2)} \tag{29}$$

$$\mathcal{Z}\_{k} = S\_{1,k}^{(2)},\tag{30}$$

$$a\_{n,lk} = S\_{n,lk}^{(1)} \Big/ S\_{n,k}^{(2)},\tag{31}$$

$$c\_{n,ij} = S\_{n,ij}^{(3)} \Big/ \bar{S}\_{n,i}^{(2)},\tag{32}$$

$$\mu\_{n,ij} = S\_{n,ij}^{(4)} \Big/ S\_{n,ij}^{(3)},\tag{33}$$

$$\Sigma\_{n,ij} = S\_{n,ij}^{(5)} \Big/ S\_{n,ij}^{(3)} - \mu\_{n,ij}^{\mathsf{T}} \mu\_{n,ij},\tag{34}$$

in which $a\_{n,lk}$, $c\_{n,ij}$, $\mu\_{n,ij}$, and $\Sigma\_{n,ij}$ are the updated parameters at time $n$.
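The mapping from accumulated statistics to parameters in Equations (29)–(34) can be sketched in the same hypothetical NumPy setting. Here `pair_state`, giving the underlying state of each (state, duration) pair, is an assumption introduced only to express the sum over durations in Equation (29), and the denominator of Equation (31) is taken literally as written.

```python
import numpy as np

def update_parameters(S1, S2, S3, S4, S5, pair_state, n_states):
    """Parameter updates (29)-(34) from the accumulated statistics S_n.

    Hypothetical helper: S1..S5 have the shapes used in the previous sketch;
    `pair_state` is an integer array with the state index of each pair.
    """
    # Eq. (29): marginalize the pair occupancy S^(2) over durations
    S2_bar = np.array([S2[pair_state == i].sum() for i in range(n_states)])
    # Eq. (31): transition weights between (state, duration) pairs
    a = S1 / S2[None, :]
    # Eq. (32): mixture weights c_{ij}
    c = S3 / S2_bar[:, None]
    # Eq. (33): component means mu_{ij}
    mu = S4 / S3[..., None]
    # Eq. (34): component covariances Sigma_{ij} (mu^T mu as an outer product)
    Sigma = S5 / S3[..., None, None] - mu[..., :, None] * mu[..., None, :]
    return a, c, mu, Sigma
```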

**Remark 1.** *Replacing $n$ with $N$ in Equation (28), which means that all the observed data $y\_1^N$ are used, $S\_N$ is called the complete sufficient statistic. Therefore, using $S\_N$ to compute the parameters in Equations (30)–(34) yields exactly the batch-mode parameter learning given in Equations (17)–(21).*
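As a usage note for Remark 1, under the same illustrative assumptions, accumulating the per-step statistics over all $N$ observations and applying the same update function reproduces the batch-mode estimate. The synthetic complete data below are assumptions made solely to exercise the hypothetical helpers sketched above.

```python
import numpy as np

# Synthetic complete data (pairs, labels, observations) for demonstration only.
rng = np.random.default_rng(0)
N, n_pairs, n_states, n_comps, dim = 200, 6, 3, 2, 2
pair_state = np.repeat(np.arange(n_states), n_pairs // n_states)  # pair -> state

pairs = rng.integers(n_pairs, size=N + 1)          # (state, duration) indices
labels = rng.integers(n_comps, size=N)             # mixture labels h_{n'}
ys = rng.normal(size=(N, dim))                     # observations y_{n'}

per_step = [step_statistics(pairs[t], pairs[t + 1], pair_state[pairs[t]],
                            labels[t], ys[t], n_pairs, n_states, n_comps)
            for t in range(N)]
S_N = running_average(per_step)                    # Equation (28) with n = N
a, c, mu, Sigma = update_parameters(*S_N, pair_state, n_states)  # batch estimate
```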
