
#### **Appendix A. Proof of Lemma 2**

From (14), (15), the identity $\operatorname{diag}(a)b \equiv \operatorname{diag}(b)a$, the fact that $J_n(t) \ne J_n(t-)$ at most at finitely many points of any finite interval, and property 4 of the function $K(t)$, the following equalities hold:

$$\begin{split}
C_t^n &= \int_0^t (1 - e_n^\top V_{s-})\, e_n^\top dR_s = \int_0^t (1 - e_n^\top V_{s-})\, e_n^\top J(s)\left(\Lambda^\top(s) X_{s-}\,ds + dM_s^X\right) = \\
&= \int_0^t (1 - J_n(s-) X_{s-})\, J_n(s-)\Lambda^\top(s) X_{s-}\,ds + \int_0^t (1 - e_n^\top V_{s-})\, J_n(s)\, dM_s^X = \\
&= \int_0^t J_n(s)\Lambda^\top(s)\left(I - \operatorname{diag} J_n(s)\right) X_s\,ds + \int_0^t (1 - e_n^\top V_{s-})\, J_n(s)\, dM_s^X = \\
&= \int_0^t \mathbf{1}\Gamma_n(s) X_s\,ds + \int_0^t (1 - e_n^\top V_{s-})\, J_n(s)\, dM_s^X.
\end{split}\tag{A1}$$

Assertion 1 of the Lemma is proved.

The definition of the processes $C_t^n$ ($n = \overline{1,N}$) guarantees their strong orthogonality, i.e., $\mathsf{P}\{\Delta C_t^i \Delta C_t^j \ne 0\} = 0$ for any $i \ne j$ and $t \geqslant 0$, so $[C^i, C^j]_t \equiv 0$.

Let us use (5), (19) and the properties of $X$ and $J_n$ to derive the quadratic characteristic of $C^n$:

$$\begin{split}
\langle C^n, C^n\rangle_t &= \int_0^t (1 - J_n(s) X_{s-})^2\, J_n(s)\, d\langle X, X\rangle_s\, J_n^\top(s) = \\
&= \int_0^t (1 - J_n(s) X_{s-})\, J_n(s)\left(\operatorname{diag}(\Lambda^\top(s) X_{s-}) - \Lambda^\top(s)\operatorname{diag}(X_{s-}) - \operatorname{diag}(X_{s-})\Lambda(s)\right) J_n^\top(s)\,ds = \\
&= \int_0^t (1 - J_n(s) X_{s-})\, J_n(s)\operatorname{diag}(J_n(s))\,\Lambda^\top(s) X_{s-}\,ds = \int_0^t J_n(s)\Lambda^\top(s)\left(I - \operatorname{diag} J_n(s)\right) X_s\,ds = \\
&= \int_0^t \mathbf{1}\Gamma_n(s) X_s\,ds.
\end{split}$$

Assertion 2 of the Lemma is proved.

If $s$ and $t$ are two arbitrary moments such that $s < t$, then

$$\begin{split}
\mathbf{E}\left\{\nu_t^n - \nu_s^n \mid \overline{\mathcal{Y}}_s\right\} &= \mathbf{E}\left\{\int_s^t J_n(u)\Lambda^\top(u)\left(I - \operatorname{diag} J_n(u)\right)\mathbf{E}\left\{(X_u - \widehat{X}_u) \mid \overline{\mathcal{Y}}_u\right\}du \mid \overline{\mathcal{Y}}_s\right\} + \\
&\quad + \mathbf{E}\left\{\mathbf{E}\left\{\int_s^t (1 - J_n(u-) X_{u-})\, J_n(u)\, dM_u^X \mid \overline{\mathcal{F}}_s\right\} \mid \overline{\mathcal{Y}}_s\right\} = 0,
\end{split}$$

i.e., $\nu_t^n$ is a $\mathcal{Y}_t$-adapted martingale. Note that $\nu_t^n$ is purely discontinuous with unit jumps, hence

$$\begin{split}
[\nu^n, \nu^n]_t &= \sum_{\tau\leqslant t}(\Delta\nu_\tau^n)^2 = [C^n, C^n]_t = \sum_{\tau\leqslant t}(\Delta C_\tau^n)^2 = C_t^n = \\
&= \int_0^t J_n(s)\Lambda^\top(s)\left(I - \operatorname{diag} J_n(s)\right) X_s\,ds + \int_0^t (1 - J_n(s-) X_{s-})\, J_n(s)\, dM_s^X = \int_0^t \mathbf{1}\Gamma_n(s)\widehat{X}_s\,ds + \mu_t^0,
\end{split}$$

where $\mu_t^0$ is some $\mathcal{Y}_t$-adapted martingale. From the uniqueness of the special semimartingale representation of $[\nu^n, \nu^n]_t$ it follows that $\langle\nu^n, \nu^n\rangle_t = \int_0^t \mathbf{1}\Gamma_n(s)\widehat{X}_s\,ds$. Lemma 2 is proved.

#### **Appendix B. Proof of Theorem 1**

We use the same approach as in ([6], Part III, Sect. 8.7) to derive the MJP filtering equations. The idea exploits the uniqueness of the representation for a special semimartingale along with the integral representation of a martingale [23].

From the Bayes rule it follows that $\widehat{X}_0 = \mathbf{E}\{X_0 \mid D_0\} = \left(D_0^\top J(0)\pi\right)^{+}\operatorname{diag}(D_0) J(0)\pi$. Let $\varkappa_{n-1}$ be the random instant of the $(n-1)$-th discrete observation $\Delta D_{\varkappa_{n-1}}$. We investigate the evolution of $X_t$ over the interval $[\varkappa_{n-1}, \varkappa_n)$:

$$X\_t = X\_{\varkappa\_{n-1}} + \int\_{\varkappa\_{n-1}}^t \Lambda^\top(s) X\_s ds + M\_t^X - M\_{\varkappa\_{n-1}}^X \quad t \in [\varkappa\_{n-1}, \varkappa\_n).$$

Conditioning the left- and right-hand sides of the latter equality with respect to $\mathcal{Y}_t$, one can show that

$$\widehat{X}_t = \widehat{X}_{\varkappa_{n-1}} + \int_{\varkappa_{n-1}}^t \Lambda^\top(s)\widehat{X}_s\,ds + \mu_t^1, \tag{A2}$$

where $\{\mu_t^1\}_{t\in[\varkappa_{n-1},\varkappa_n)}$ is a $\mathcal{Y}_t$-adapted martingale. For any $t \in [\varkappa_{n-1}, \varkappa_n)$ the equality $\mathcal{Y}_t = \mathcal{Y}_{\varkappa_{n-1}} \vee \sigma\{U_s,\ s\in(\varkappa_{n-1}, t]\} \vee \sigma\{C_s^j,\ s\in(\varkappa_{n-1}, t],\ j=\overline{1,N}\}$ holds. The process $\{\omega_t\}$ (24) is a $\mathcal{Y}_t$-adapted standard Wiener process [10].

The process $U_t$ is a $\mathcal{Y}_t$-adapted semimartingale with $\mathcal{F}^X$-conditionally-independent increments, meanwhile $\{C_t^j\}_{j=\overline{1,N}}$ are $\mathcal{Y}_t$-adapted point processes. Hence, the martingale $\mu_t^1$ admits an integral representation ([23], Chap. 4, §8, Problem 1), i.e.,

$$\widehat{X}_t = \widehat{X}_{\varkappa_{n-1}} + \int_{\varkappa_{n-1}}^t \Lambda^\top(s)\widehat{X}_s\,ds + \int_{\varkappa_{n-1}}^t \alpha_s\,d\omega_s + \int_{\varkappa_{n-1}}^t \sum_{j=1}^N \beta_s^j\,d\nu_s^j, \tag{A3}$$

where $\alpha_t$ and $\{\beta_t^j\}_{j=\overline{1,N}}$ are $\mathcal{Y}_t$-predictable processes of appropriate dimensionality, which should be determined.

Due to the generalized Itô rule

$$X_t U_t^\top = X_{\varkappa_{n-1}} U_{\varkappa_{n-1}}^\top + \int_{\varkappa_{n-1}}^t \left(\Lambda^\top(s) X_s U_s^\top + \operatorname{diag}(X_s)\overline{f}^\top(s)\right)ds + \mu_t^2,$$

where $\mu_t^2$ is an $\mathcal{F}_t$-adapted martingale. Conditioning both sides of the latter equality with respect to $\mathcal{Y}_t$, we can show that

$$\widehat{X}_t U_t^\top = \widehat{X}_{\varkappa_{n-1}} U_{\varkappa_{n-1}}^\top + \int_{\varkappa_{n-1}}^t \left(\Lambda^\top(s)\widehat{X}_s U_s^\top + \operatorname{diag}(\widehat{X}_s)\overline{f}^\top(s)\right)ds + \mu_t^3, \tag{A4}$$

where $\mu_t^3$ is a $\mathcal{Y}_t$-adapted martingale. On the other hand, using the Itô rule, representation (A3) and the fact that $\omega_t$ is a Wiener process, we can obtain

$$\widehat{X}_t U_t^\top = \widehat{X}_{\varkappa_{n-1}} U_{\varkappa_{n-1}}^\top + \int_{\varkappa_{n-1}}^t \left(\Lambda^\top(s)\widehat{X}_s U_s^\top + \widehat{X}_s\widehat{X}_s^\top\overline{f}^\top(s) + \alpha_s\right)ds + \mu_t^4, \tag{A5}$$

where $\mu_t^4$ is a $\mathcal{Y}_t$-adapted martingale. One can see that (A4) and (A5) are two representations of the same special semimartingale $\widehat{X}_t U_t^\top$, hence due to the representation uniqueness the $\mathcal{Y}_t$-predictable process $\alpha_t$ should satisfy the equality

$$\int_{\varkappa_{n-1}}^t \operatorname{diag}(\widehat{X}_s)\overline{f}^\top(s)\,ds = \int_{\varkappa_{n-1}}^t \left(\widehat{X}_s\widehat{X}_s^\top\overline{f}^\top(s) + \alpha_s\right)ds,$$

and $\alpha_t$ may be chosen in the form

$$\alpha_t = \left(\operatorname{diag}\widehat{X}_{t-} - \widehat{X}_{t-}\widehat{X}_{t-}^\top\right)\overline{f}^\top(t). \tag{A6}$$

Due to the generalized Itô rule, formulae (5), (18) and the properties of $X$ and $J_j$, we can obtain that

$$X_t C_t^j = X_{\varkappa_{n-1}} C_{\varkappa_{n-1}}^j + \int_{\varkappa_{n-1}}^t \left(\Lambda^\top(s) X_s C_s^j + \Gamma_j(s) X_s\right)ds + \mu_t^5,$$

where $\mu_t^5$ is an $\mathcal{F}_t$-adapted martingale. Conditioning both sides of this equality with respect to $\mathcal{Y}_t$, we get

$$\widehat{X}_t C_t^j = \widehat{X}_{\varkappa_{n-1}} C_{\varkappa_{n-1}}^j + \int_{\varkappa_{n-1}}^t \left(\Lambda^\top(s)\widehat{X}_s C_s^j + \Gamma_j(s)\widehat{X}_s\right)ds + \mu_t^6, \tag{A7}$$

where $\mu_t^6$ is a $\mathcal{Y}_t$-adapted martingale. On the other hand, using the Itô rule, representation (A3) and the quadratic characteristic (21), we deduce that

$$\widehat{X}_t C_t^j = \widehat{X}_{\varkappa_{n-1}} C_{\varkappa_{n-1}}^j + \int_{\varkappa_{n-1}}^t \left(\Lambda^\top(s)\widehat{X}_s C_s^j + \widehat{X}_s\mathbf{1}\Gamma_j(s)\widehat{X}_s + \beta_s^j\mathbf{1}\Gamma_j(s)\widehat{X}_s\right)ds + \mu_t^7, \tag{A8}$$

where $\mu_t^7$ is a $\mathcal{Y}_t$-adapted martingale. Since the representations (A7) and (A8) correspond to the same special semimartingale $\widehat{X}_t C_t^j$, we conclude that the process $\beta_s^j$ should satisfy the equality

$$\int_{\varkappa_{n-1}}^t \Gamma_j(s)\widehat{X}_s\,ds = \int_{\varkappa_{n-1}}^t \left[\widehat{X}_s\mathbf{1}\Gamma_j(s)\widehat{X}_s + \beta_s^j\mathbf{1}\Gamma_j(s)\widehat{X}_s\right]ds,$$

Acting as with the coefficient $\alpha_t$, we choose the predictable processes $\beta_t^j$ in the form

$$\beta_t^j = \left(\Gamma_j(t) - \mathbf{1}\Gamma_j(t)\widehat{X}_{t-} I\right)\widehat{X}_{t-}\left(\mathbf{1}\Gamma_j(t)\widehat{X}_{t-}\right)^{+}, \quad j = \overline{1,N}. \tag{A9}$$
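The structure of the gain (A9) can be illustrated with a short numerical sketch (pure Python, hypothetical values; `Gamma` and `x_hat` stand in for $\Gamma_j(t)$ and $\widehat{X}_{t-}$): the scalar normalizer $\mathbf{1}\Gamma_j(t)\widehat{X}_{t-}$ is pseudo-inverted, and the resulting correction vector sums to zero.

```python
# A numerical sketch (not code from the paper) of the gain (A9):
# beta_j = (Gamma_j - (1 Gamma_j Xhat) I) Xhat (1 Gamma_j Xhat)^+.
# For the scalar 1 Gamma_j Xhat the Moore-Penrose pseudoinverse is 1/x if x != 0, else 0.

def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def beta_gain(Gamma_j, x_hat):
    g = mat_vec(Gamma_j, x_hat)                  # Gamma_j(t) Xhat_{t-}
    total = sum(g)                               # scalar 1 Gamma_j(t) Xhat_{t-}
    pinv = 1.0 / total if total != 0 else 0.0    # scalar pseudoinverse
    # (Gamma_j - (1 Gamma_j Xhat) I) Xhat (1 Gamma_j Xhat)^+ = (g - total * Xhat) * pinv
    return [(gi - total * xi) * pinv for gi, xi in zip(g, x_hat)]

# Illustrative 2-state example (the numbers are hypothetical):
Gamma = [[0.0, 0.3], [0.1, 0.0]]
x_hat = [0.5, 0.5]
beta = beta_gain(Gamma, x_hat)
print(beta)   # the components sum to zero, so the correction preserves total probability
```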

So, on the interval $[\varkappa_{n-1}, \varkappa_n)$ the optimal filtering estimate $\widehat{X}_t$ is described by the SDS

$$\begin{split}
\widehat{X}_t = \widehat{X}_{\varkappa_{n-1}} &+ \int_{\varkappa_{n-1}}^t \Lambda^\top(s)\widehat{X}_{s-}\,ds + \int_{\varkappa_{n-1}}^t \left(\operatorname{diag}\widehat{X}_{s-} - \widehat{X}_{s-}\widehat{X}_{s-}^\top\right)\overline{f}^\top(s)\,d\omega_s + \\
&+ \sum_{j=1}^N \int_{\varkappa_{n-1}}^t \left(\Gamma_j(s) - \mathbf{1}\Gamma_j(s)\widehat{X}_{s-} I\right)\widehat{X}_{s-}\left(\mathbf{1}\Gamma_j(s)\widehat{X}_{s-}\right)^{+} d\nu_s^j.
\end{split}\tag{A10}$$

Since $\mathsf{P}\{\Delta X_{\varkappa_n} = 0\} = 1$, equation (A10) presumes the $\mathsf{P}$-a.s. fulfilment of the equality

$$\begin{split}
\mathbf{E}\Big\{X_{\varkappa_n} \mid \mathcal{Y}_{\varkappa_{n-1}} &\vee \sigma\{U_s,\ s\in(\varkappa_{n-1},\varkappa_n]\} \vee \sigma\{C_s^j,\ s\in(\varkappa_{n-1},\varkappa_n],\ j=\overline{1,N}\}\Big\} = \\
&= \widehat{X}_{\varkappa_{n-1}} + \int_{\varkappa_{n-1}}^{\varkappa_n} \Lambda^\top(s)\widehat{X}_{s-}\,ds + \int_{\varkappa_{n-1}}^{\varkappa_n} \left(\operatorname{diag}\widehat{X}_{s-} - \widehat{X}_{s-}\widehat{X}_{s-}^\top\right)\overline{f}^\top(s)\,d\omega_s + \\
&\quad + \sum_{j=1}^N \int_{\varkappa_{n-1}}^{\varkappa_n} \left(\Gamma_j(s) - \mathbf{1}\Gamma_j(s)\widehat{X}_{s-} I\right)\widehat{X}_{s-}\left(\mathbf{1}\Gamma_j(s)\widehat{X}_{s-}\right)^{+} d\nu_s^j.
\end{split}$$

Finally,

$$\mathcal{Y}_{\varkappa_n} = \mathcal{Y}_{\varkappa_{n-1}} \vee \sigma\{U_s,\ s\in(\varkappa_{n-1},\varkappa_n]\} \vee \sigma\{C_s^j,\ s\in(\varkappa_{n-1},\varkappa_n],\ j=\overline{1,N}\} \vee \sigma\{\Delta D_{\varkappa_n}\},$$

so, by the Bayes rule we get that

$$\widehat{X}_{\varkappa_n} = \left(\Delta D_{\varkappa_n}^\top \Delta J(\varkappa_n)\widehat{X}_{\varkappa_n-}\right)^{+}\operatorname{diag}(\Delta D_{\varkappa_n})\,\Delta J(\varkappa_n)\widehat{X}_{\varkappa_n-}. \tag{A11}$$

Equation (23) can be obtained by "gluing" the local equations (A10), which describe the evolution of $\widehat{X}_t$ on the intervals $[\varkappa_{n-1}, \varkappa_n)$, with formula (A11), which describes the estimate correction given the observations available at the moments $\varkappa_n$.
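The correction step (A11) is a finite-dimensional Bayes update, which can be sketched in a few lines of pure Python (all numbers and the kernel `dJ` are hypothetical, not values from the paper):

```python
# A sketch of the correction (A11): at a discrete-observation instant the prediction
# Xhat_{kappa_n -} is reweighted by diag(dD) dJ and renormalized by the scalar
# (dD^T dJ Xhat)^+, the Moore-Penrose pseudoinverse of a scalar.

def bayes_correction(dD, dJ, x_pred):
    # likelihood-weighted vector diag(dD) dJ x_pred
    w = [dD[i] * sum(dJ[i][j] * x_pred[j] for j in range(len(x_pred)))
         for i in range(len(dD))]
    c = sum(w)                            # equals dD^T dJ x_pred
    pinv = 1.0 / c if c != 0 else 0.0     # scalar pseudoinverse
    return [wi * pinv for wi in w]

dJ = [[0.9, 0.2], [0.1, 0.8]]             # hypothetical kernel playing the role of Delta J
dD = [1.0, 0.0]                           # hypothetical observation: first alternative seen
x_pred = [0.4, 0.6]
x_corr = bayes_correction(dD, dJ, x_pred)
print(x_corr)   # a probability vector; here all mass goes to the observed alternative
```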

Uniqueness of the strong solution within the class of nonnegative piecewise-continuous $\mathcal{Y}_{t+}$-adapted processes with discontinuity set lying in $V$ can be proved in complete analogy with ([31], Chap. 9, Theorem 9.2). Theorem 1 is proved.

#### **Appendix C. Proof of Corollary 1**

The conditions of the Corollary guarantee that the elements of $K(t)$ (4) satisfy the equality $K_{nm}(t) = \delta_{nm}$ almost everywhere, hence $J(t) \equiv I$. This means that in (23) $D_0 = X_0$ $\mathsf{P}$-a.s., i.e., $\widehat{X}_0 = X_0$. Further, from the properties of the transition intensity matrix $\Lambda(\cdot)$ and the identity $J_n(t) \equiv e_n^\top$ it follows that $\Gamma_n(t) = \operatorname{diag}(e_n)\overline{\Lambda}^\top(t)$, where $\overline{\Lambda}(t) \triangleq \Lambda(t) - \lambda(t)$, $\lambda(t) \triangleq \operatorname{diag}(\Lambda_{11}(t), \ldots, \Lambda_{NN}(t))$. In this case

$$C_t = \int_0^t \overline{\Lambda}^\top(s) X_s\,ds + \int_0^t (I - \operatorname{diag} X_{s-})\,dM_s^X,$$

and the $n$-th component counts the jumps of $X_t$ into the state $e_n$ occurred on the interval $(0, t]$. This means $X_t$ is the unique solution to the "purely discontinuous" equation

$$X_t = D_0 + \int_0^t (I - X_{s-}\mathbf{1})\,dC_s, \tag{A12}$$

i.e., the state $X_t$ is measurable with respect to $\sigma\{D_0, C_s,\ 0 \leqslant s \leqslant t\}$, so $\widehat{X}_t = X_t$ $\mathsf{P}$-a.s.
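The mechanics of the purely discontinuous equation (A12) can be demonstrated with a tiny sketch (the jump destinations below are hypothetical): at a jump into state $n$ the counting increment is $e_n$, and $(I - X_{s-}\mathbf{1})e_n = e_n - X_{s-}$, so the state moves exactly to $e_n$.

```python
# A sketch of (A12) under the Corollary's conditions (J(t) = I): the state is
# recovered from D_0 and the counting vector C alone.  Each jump replaces the
# current one-hot state by the unit vector of the destination state.

def apply_jump(x, n):
    e_n = [1.0 if i == n else 0.0 for i in range(len(x))]
    # x + (I - x 1) e_n = x + (e_n - x) = e_n
    return [xi + (ei - xi) for xi, ei in zip(x, e_n)]

x = [1.0, 0.0, 0.0]          # D_0 = X_0 = e_1 in a 3-state example
for n in [2, 0, 1]:          # hypothetical successive jump destinations
    x = apply_jump(x, n)
print(x)   # the unit vector of the last destination
```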

Further, we substitute $X_t$ into (23) and verify its validity. To do this we simplify the RHS of the equality using the explicit form of $J_n(t)$, $\Gamma_n(t)$ and $C_t$, along with the identities $\operatorname{diag} X_t - X_tX_t^\top \equiv 0$ and $\Delta J(t) \equiv 0$:

$$\begin{split}
X_t &= D_0 + \int_0^t \Lambda^\top(s) X_s\,ds + \\
&+ \sum_{n=1}^N \int_0^t \left[\operatorname{diag}(e_n)\overline{\Lambda}^\top(s) - e_n^\top\overline{\Lambda}^\top(s) X_{s-} I\right] X_{s-}\left(e_n^\top\overline{\Lambda}^\top(s) X_{s-}\right)^{+}\left[dC_s^n - e_n^\top\overline{\Lambda}^\top(s) X_{s-}\,ds\right] = \\
&= D_0 + \sum_{n=1}^N \int_0^t \left[\operatorname{diag}(e_n)\overline{\Lambda}^\top(s) - e_n^\top\overline{\Lambda}^\top(s) X_{s-} I\right] X_{s-}\left(e_n^\top\overline{\Lambda}^\top(s) X_{s-}\right)^{+} dC_s^n.
\end{split}$$

The properties of counting processes also provide the following implication: if for some $\mathcal{T} \subseteq [0, T]$ the equality $\int_{\mathcal{T}} e_n^\top\overline{\Lambda}^\top(s) X_s\,ds = 0$ holds, then $\int_{\mathcal{T}} dC_s^n = 0$. Hence, the latter transformation can be continued:

$$X_t = D_0 + \sum_{n=1}^N \int_0^t \left[e_n - X_{s-}\right] e_n^\top dC_s = D_0 + \int_0^t (I - X_{s-}\mathbf{1})\,dC_s,$$

which leads to (A12). So, we have verified that under the conditions of Corollary 1 the state $X_t$ is a solution to the filtering equation (23). Corollary 1 is proved.

#### **Appendix D. Proof of Lemma 4**

Using the notations $\Xi_r \triangleq \xi_1\xi_2\cdots\xi_r$ and $\Theta_r \triangleq \theta_1\theta_2\cdots\theta_r$ we can rewrite the estimates $\widehat{\mathsf{X}}_r$ and $\overline{\mathsf{X}}_r(s)$ in the explicit form

$$\widehat{\mathsf{X}}_r = \left(\mathbf{1}(\Xi_r + \Theta_r)^\top\pi\right)^{-1}(\Xi_r + \Theta_r)^\top\pi, \qquad \overline{\mathsf{X}}_r(s) = \left(\mathbf{1}\Xi_r^\top\pi\right)^{-1}\Xi_r^\top\pi.$$

To simplify the inferences we omit the index $r$ in $\Xi_r$ and $\Theta_r$. The following relations are valid:

$$\begin{split}
\mathbf{E}\left\{\left\|\widehat{\mathsf{X}}_r - \overline{\mathsf{X}}_r(s)\right\|_1\right\} &= \mathbf{E}\left\{\left\|\frac{1}{\mathbf{1}(\Xi+\Theta)^\top\pi}(\Xi+\Theta)^\top\pi - \frac{1}{\mathbf{1}\Xi^\top\pi}\Xi^\top\pi\right\|_1\right\} = \\
&= \mathbf{E}\left\{\frac{1}{\mathbf{1}(\Xi+\Theta)^\top\pi\,\mathbf{1}\Xi^\top\pi}\left\|\mathbf{1}\Xi^\top\pi\,\Theta^\top\pi - \mathbf{1}\Theta^\top\pi\,\Xi^\top\pi\right\|_1\right\} \leqslant \\
&\leqslant 2\,\mathbf{E}\left\{\frac{\mathbf{1}\Theta^\top\pi}{\mathbf{1}(\Xi+\Theta)^\top\pi}\right\}. 
\end{split}\tag{A13}$$

Let us consider the auxiliary estimate $\breve{\mathsf{X}}_r \triangleq \mathbf{E}\left\{X_{t_r}\mathbf{I}_{A_r^s}(\omega) \mid \mathcal{Y}_r\right\}$. From the Bayes rule it follows that $\breve{\mathsf{X}}_r = \frac{1}{\mathbf{1}(\Xi+\Theta)^\top\pi}\,\Xi^\top\pi$ and

$$\widehat{\mathsf{X}}_r - \breve{\mathsf{X}}_r = \mathbf{E}\left\{X_{t_r}\mathbf{I}_{\overline{A}_r^s}(\omega) \mid \mathcal{Y}_r\right\} = \frac{1}{\mathbf{1}(\Xi+\Theta)^\top\pi}\,\Theta^\top\pi. \tag{A14}$$

From (A13) and (A14) we deduce that for $r = 1$ and $\forall\,\pi \in \Pi$

$$\begin{split}
\mathbf{E}\left\{\|\widehat{\mathsf{X}}_1 - \overline{\mathsf{X}}_1(s)\|_1\right\} &\leqslant 2\,\mathbf{E}\left\{\left\|\mathbf{E}\left\{X_{t_1}\mathbf{I}_{\overline{A}_1^s}(\omega) \mid \mathcal{Y}_1\right\}\right\|_1\right\} = \\
&= 2\,\mathbf{E}\left\{\sum_{n=1}^N \mathbf{E}\left\{X_{t_1}^n\mathbf{I}_{\overline{A}_1^s}(\omega) \mid \mathcal{Y}_1\right\}\right\} = 2\,\mathbf{E}\left\{\mathbf{E}\left\{\mathbf{I}_{\overline{A}_1^s}(\omega) \mid \mathcal{Y}_1\right\}\right\} = 2\,\mathsf{P}\left\{\overline{A}_1^s\right\}.
\end{split}\tag{A15}$$

The counting process $N_t^X$ has the quadratic characteristic $\langle N^X, N^X\rangle_t = -\int_0^t \sum_{n=1}^N \lambda_{nn}(s) X_s^n\,ds$, hence the probability $\mathsf{P}\{\overline{A}_1^s\}$ can be bounded from above as

$$\mathsf{P}\left\{\overline{A}_1^s\right\} \leqslant e^{-\overline{\lambda}h}\sum_{k=s+1}^{\infty}\frac{(\overline{\lambda}h)^k}{k!} \leqslant C_1\frac{(\overline{\lambda}h)^{s+1}}{(s+1)!}. \tag{A16}$$
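The Poisson tail bound of this type is easy to check numerically (the values of $\overline{\lambda}h$ and $s$ below are illustrative; in this regime even the constant $1$ suffices in place of $C_1$):

```python
# Numeric check of the tail bound: the probability that a Poisson variable with
# mean lam_h exceeds s is dominated by (lam_h)^(s+1) / (s+1)!.
import math

lam_h = 0.5          # hypothetical value of lambda_bar * h
s = 3
tail = math.exp(-lam_h) * sum(lam_h**k / math.factorial(k)
                              for k in range(s + 1, 60))   # truncated Poisson tail
bound = lam_h**(s + 1) / math.factorial(s + 1)
print(tail <= bound)   # True
```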


Formulae (A15) and (A16) lead to the fact that $\sup_{\pi\in\Pi}\mathbf{E}\left\{\|\widehat{\mathsf{X}}_1 - \overline{\mathsf{X}}_1(s)\|_1\right\} \leqslant 2C_1\frac{(\overline{\lambda}h)^{s+1}}{(s+1)!}$. The Markovianity of the pair $(X_t, N_t^X)$ and inequality (A16) also allow to bound the probability $\mathsf{P}\{\overline{A}_r^s\}$ from above: $\mathsf{P}\{\overline{A}_r^s\} \leqslant 1 - \left(1 - C_1\frac{(\overline{\lambda}h)^{s+1}}{(s+1)!}\right)^r$, which leads to (34). Lemma 4 is proved.

#### **Appendix E. Proof of Theorem 2**

We have $\widehat{\mathsf{X}}_1 = (\mathbf{1}\psi_1^\top\pi)^{-1}\psi_1^\top\pi$, $\overline{\mathsf{X}}_1 = (\mathbf{1}\xi_1^\top\pi)^{-1}\xi_1^\top\pi$ and $\Delta_1 = \widehat{\mathsf{X}}_1 - \overline{\mathsf{X}}_1(s)$. Using matrix algebra it is easy to verify that $[\gamma^\top\pi\mathbf{1} - \mathbf{1}\gamma^\top\pi I]\gamma^\top\pi \equiv 0$. Both estimates are stable, hence $\|\widehat{\mathsf{X}}_1\|_1 = \|\overline{\mathsf{X}}_1(s)\|_1 = 1$. The following relations are valid:

$$\begin{split}
\|\Delta_1\|_1 &= \frac{1}{\mathbf{1}\psi_1^\top\pi\,\mathbf{1}\xi_1^\top\pi}\left\|\mathbf{1}\xi_1^\top\pi\,\psi_1^\top\pi - \mathbf{1}\psi_1^\top\pi\,\xi_1^\top\pi\right\|_1 = \frac{1}{\mathbf{1}\psi_1^\top\pi\,\mathbf{1}\xi_1^\top\pi}\left\|\mathbf{1}\xi_1^\top\pi\,\gamma_1^\top\pi - \mathbf{1}\gamma_1^\top\pi\,\xi_1^\top\pi\right\|_1 = \\
&= \frac{1}{\mathbf{1}\psi_1^\top\pi\,\mathbf{1}\xi_1^\top\pi}\left\|[\gamma_1^\top\pi\mathbf{1} - \mathbf{1}\gamma_1^\top\pi I]\,\xi_1^\top\pi\right\|_1 = \frac{1}{\mathbf{1}\psi_1^\top\pi\,\mathbf{1}\xi_1^\top\pi}\left\|[\gamma_1^\top\pi\mathbf{1} - \mathbf{1}\gamma_1^\top\pi I]\,[\xi_1^\top\pi + \gamma_1^\top\pi]\right\|_1 \leqslant \\
&\leqslant \frac{2\,\mathbf{1}\gamma_1^\top\pi}{\mathbf{1}\xi_1^\top\pi} = 2\,\frac{\sum_{i=1}^N \pi_i \sum_{j=1}^N \gamma_1^{ij}}{\sum_{k,\ell=1}^N \pi_k\,\xi_1^{k\ell}}.
\end{split}$$

Using the last inequality, (41) and (A20), it can be shown that

$$\mathbf{E}\left\{\mathbf{I}_{A_1^s}(\omega)\,\|\Delta_1\|_1\right\} \leqslant 2\sum_{i=1}^N \pi_i \int_{\mathbb{R}^M}\sum_{j=1}^N \overline{\gamma}^{ij}(y)\,dy \leqslant 2\delta.$$

Since the latter inequality is valid for any *π* ∈ Π, we have an upper bound for the local distance characteristic: 

$$\sup_{\pi\in\Pi}\mathbf{E}\left\{\mathbf{I}_{A_1^s}(\omega)\,\|\widehat{\mathsf{X}}_1 - \overline{\mathsf{X}}_1(s)\|_1\right\} \leqslant 2\delta. \tag{A17}$$

Let us define the following products of the random matrices *ξr* and *ψr*:

$$\Xi_{q,r} \triangleq \begin{cases} \xi_q\xi_{q+1}\cdots\xi_r, & \text{if } q \leqslant r,\\ I, & \text{otherwise,}\end{cases} \qquad \Psi_{q,r} \triangleq \begin{cases} \psi_q\psi_{q+1}\cdots\psi_r, & \text{if } q \leqslant r,\\ I, & \text{otherwise,}\end{cases} \qquad \Gamma_{q,r} \triangleq \Psi_{q,r} - \Xi_{q,r}.$$

To proceed with the proof of Theorem 2 we need the following auxiliary statement.

**Lemma A1.** *If $\varphi_r = \varphi_r(Y_1, \ldots, Y_r)$ is a non-negative $\mathcal{Y}_r$-measurable random value and $\Phi_r \triangleq \dfrac{\varphi_r}{\mathbf{1}\Xi_{1,r}^\top\pi}$, then*

$$\mathbf{E}\left\{\mathbf{I}_{A_r^s}(\omega)\Phi_r\right\} = \int_{\mathbb{R}^M}\cdots\int_{\mathbb{R}^M} \varphi_r(y_1, \ldots, y_r)\,dy_r\cdots dy_1. \tag{A18}$$

**Proof of Lemma A1.** We consider a non-negative integrable function $\varphi_1 = \varphi_1(y)\colon \mathbb{R}^M \to \mathbb{R}_+$ and the $\mathcal{Y}_1$-measurable random value

$$\Phi_1 \triangleq \frac{\varphi_1(Y_1)}{\mathbf{1}\Xi_1^\top(Y_1)\pi} = \frac{\varphi_1(Y_1)}{\sum_{i,j=1}^N\sum_{m=0}^s \int_{\mathcal{D}} \mathcal{N}(Y_1, f u, \sum_{p=1}^N u^p G_p)\,\rho^{i,j,m}(du)\,\pi_i}. \tag{A19}$$

We find $\mathbf{E}\left\{\mathbf{I}_{A_1^s}(\omega)\Phi_1\right\}$:

$$\mathbf{E}\left\{\mathbf{I}_{A_1^s}(\omega)\Phi_1\right\} = \int_{\mathbb{R}^M} \varphi_1(y)\,\frac{\sum_{k,\ell=1}^N\sum_{n=0}^s \int_{\mathcal{D}} \mathcal{N}(y, f v, \sum_{q=1}^N v^q G_q)\,\rho^{k,\ell,n}(dv)\,\pi_k}{\sum_{i,j=1}^N\sum_{m=0}^s \int_{\mathcal{D}} \mathcal{N}(y, f u, \sum_{p=1}^N u^p G_p)\,\rho^{i,j,m}(du)\,\pi_i}\,dy = \int_{\mathbb{R}^M} \varphi_1(y)\,dy.$$
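The cancellation at work here (the observation density in the numerator cancels the normalizer in the denominator) can be illustrated by a Monte Carlo experiment with a hypothetical one-dimensional mixture density standing in for $\mathbf{1}\Xi_1^\top(Y_1)\pi$:

```python
# A Monte Carlo illustration (not from the paper): if Y has mixture density p(y)
# and Phi = phi(Y) / p(Y), then E{Phi} = int phi(y) dy, since the density cancels
# the denominator.  The 1-D mixture below is hypothetical.
import math
import random

random.seed(1)
mus, sigmas, weights = [0.0, 3.0], [1.0, 1.0], [0.7, 0.3]

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def p(y):                                   # mixture density (the analogue of 1 Xi_1^T pi)
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))

phi = lambda y: normal_pdf(y, 0.5, 0.8)     # integrates to 1 over the real line

n = 100000
total = 0.0
for _ in range(n):
    i = 0 if random.random() < weights[0] else 1
    y = random.gauss(mus[i], sigmas[i])
    total += phi(y) / p(y)
est = total / n
print(est)   # close to 1.0 = int phi(y) dy
```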

Let us consider a non-negative integrable function $\varphi_2 = \varphi_2(y_1, y_2)\colon \mathbb{R}^{2M} \to \mathbb{R}_+$ and the $\mathcal{Y}_2$-measurable random value

$$\Phi_2 \triangleq \frac{\varphi_2(Y_1, Y_2)}{\mathbf{1}\Xi_{1,2}^\top(Y_1, Y_2)\pi} = \frac{\varphi_2(Y_1, Y_2)}{\sum_{i,j_1,j_2=1}^N \sum_{m_1,m_2=0}^s \int_{\mathcal{D}}\int_{\mathcal{D}} \mathcal{N}(Y_1, f u_1, \sum_{p_1=1}^N u_1^{p_1} G_{p_1})\,\mathcal{N}(Y_2, f u_2, \sum_{p_2=1}^N u_2^{p_2} G_{p_2})\,\rho^{i,j_1,m_1}(du_1)\,\rho^{j_1,j_2,m_2}(du_2)\,\pi_i}.$$

We find $\mathbf{E}\left\{\mathbf{I}_{A_2^s}(\omega)\Phi_2\right\}$:

$$\begin{split}
\mathbf{E}\left\{\mathbf{I}_{A_2^s}(\omega)\Phi_2\right\} &= \int_{\mathbb{R}^M}\int_{\mathbb{R}^M} \varphi_2(y_1, y_2)\,\frac{\sum_{k,\ell_1,\ell_2=1}^N \sum_{n_1,n_2=0}^s \int_{\mathcal{D}}\int_{\mathcal{D}} \mathcal{N}(y_1, f v_1, \sum_{q_1=1}^N v_1^{q_1} G_{q_1})\,\mathcal{N}(y_2, f v_2, \sum_{q_2=1}^N v_2^{q_2} G_{q_2})\,\rho^{k,\ell_1,n_1}(dv_1)\,\rho^{\ell_1,\ell_2,n_2}(dv_2)\,\pi_k}{\sum_{i,j_1,j_2=1}^N \sum_{m_1,m_2=0}^s \int_{\mathcal{D}}\int_{\mathcal{D}} \mathcal{N}(y_1, f u_1, \sum_{p_1=1}^N u_1^{p_1} G_{p_1})\,\mathcal{N}(y_2, f u_2, \sum_{p_2=1}^N u_2^{p_2} G_{p_2})\,\rho^{i,j_1,m_1}(du_1)\,\rho^{j_1,j_2,m_2}(du_2)\,\pi_i}\,dy_2\,dy_1 = \\
&= \int_{\mathbb{R}^M}\int_{\mathbb{R}^M} \varphi_2(y_1, y_2)\,dy_2\,dy_1.
\end{split}$$

The correctness of the Lemma assertion in the general case of $\mathbf{E}\left\{\mathbf{I}_{A_r^s}(\omega)\Phi_r\right\}$ can be verified similarly. Lemma A1 is proved.

Let us derive an upper estimate for the norm of $\Delta_r = \widehat{\mathsf{X}}_r - \overline{\mathsf{X}}_r$. From the definitions of $\Xi$, $\Psi$ and $\Gamma$ it follows that

$$\Gamma_{1,r} \triangleq \Psi_{1,r} - \Xi_{1,r} = \sum_{t=1}^r \Xi_{1,t-1}\,\gamma_t\,\Psi_{t+1,r}. \tag{A21}$$
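The decomposition behind (A21) is the standard matrix telescoping $\Psi_{1,r} - \Xi_{1,r} = \sum_t \Xi_{1,t-1}\gamma_t\Psi_{t+1,r}$ with $\gamma_t = \psi_t - \xi_t$ (with $\Xi$ to the left of $\gamma_t$, which is the form that makes the identity exact). A scalar sketch with arbitrary illustrative values makes the cancellation visible:

```python
# Numeric check of the telescoping decomposition behind (A21):
# prod(psi) - prod(xi) = sum_t prod(xi[:t]) * gamma[t] * prod(psi[t+1:]),
# where gamma_t = psi_t - xi_t.  Scalars stand in for the matrices Xi, Psi.

xi = [0.5, 0.25, 0.125]
psi = [0.6, 0.3, 0.2]
gamma = [p - x for p, x in zip(psi, xi)]

def prod(seq):
    out = 1.0
    for v in seq:
        out *= v
    return out

lhs = prod(psi) - prod(xi)
rhs = sum(prod(xi[:t]) * gamma[t] * prod(psi[t + 1:]) for t in range(len(xi)))
print(abs(lhs - rhs) < 1e-12)   # True
```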

Making the same inferences as for Δ1, we can deduce that

$$\|\Delta_r\|_1 \leqslant \frac{1}{\mathbf{1}\Xi_{1,r}^\top\pi}\left\|[\Gamma_{1,r}^\top\pi\mathbf{1} - \mathbf{1}\Gamma_{1,r}^\top\pi I]\right\|_1 \leqslant 2\sum_{t=1}^r \frac{\mathbf{1}\Psi_{t+1,r}^\top\overline{\gamma}_t^\top\Psi_{1,t-1}^\top\pi}{\mathbf{1}\Xi_{1,r}^\top\pi}. \tag{A22}$$

To estimate the contribution of each summand in (A22) we use (A18). To simplify the derivation we consider the case $r = 3$, the function $\varphi(y_1, y_2, y_3)\colon \mathbb{R}^{3M} \to \mathbb{R}_+$

$$\varphi(y_1, y_2, y_3) = \mathbf{1}\psi^\top(y_3)\overline{\gamma}^\top(y_2)\psi^\top(y_1)\pi$$

and the $\mathcal{Y}_3$-measurable random value $\Phi \triangleq \dfrac{\varphi(Y_1, Y_2, Y_3)}{\mathbf{1}\Xi_{1,3}^\top(Y_1, Y_2, Y_3)\pi}$. Let us estimate from above the mathematical expectation

$$\begin{split}
\mathbf{E}\left\{\mathbf{I}_{A_3^s}(\omega)\Phi\right\} &= \int_{\mathbb{R}^M}\int_{\mathbb{R}^M}\int_{\mathbb{R}^M}\sum_{i,j,k,m=1}^N \pi_i\,\psi^{ij}(y_1)\,\overline{\gamma}^{jk}(y_2)\,\psi^{km}(y_3)\,dy_3\,dy_2\,dy_1 = \\
&= \sum_{i,j,k=1}^N \pi_i \sum_{\ell=1}^L \rho_\ell^{ij}\int_{\mathbb{R}^M}\overline{\gamma}^{jk}(y_2)\,dy_2 \sum_{m=1}^N\sum_{n=1}^L \rho_n^{km} \leqslant Q\sum_{i,j=1}^N \pi_i \sum_{\ell=1}^L \rho_\ell^{ij}\sum_{k=1}^N\int_{\mathbb{R}^M}\overline{\gamma}^{jk}(y_2)\,dy_2 \leqslant Q\delta\sum_{i=1}^N \pi_i \sum_{j=1}^N\sum_{\ell=1}^L \rho_\ell^{ij} \leqslant Q^2\delta.
\end{split}$$

Acting in the same way, we can prove that for arbitrary $r \geqslant 2$ the inequality

$$\mathbf{E}\left\{\mathbf{I}_{A_r^s}(\omega)\,\frac{\mathbf{1}\Psi_{t+1,r}^\top\overline{\gamma}_t^\top\Psi_{1,t-1}^\top\pi}{\mathbf{1}\Xi_{1,r}^\top\pi}\right\} \leqslant Q^{r-1}\delta$$

is valid for all $r$ summands in the RHS of (A22). Finally, $\mathbf{E}\left\{\mathbf{I}_{A_r^s}(\omega)\|\Delta_r\|_1\right\} \leqslant 2rQ^{r-1}\delta$, and the correctness of (42) follows from the fact that the latter inequality is valid for arbitrary $\pi \in \Pi$. Theorem 2 is proved.
