**1. Introduction**

The Wonham filter [1], as well as the Kalman–Bucy filter [2], is one of the most practically used filtering algorithms for the states of stochastic differential observation systems. It is applied extensively for signal processing in engineering, communications, finance and economics, biology, medicine, etc. [3–6]. The filter provides the optimal in *the Mean Square* (MS) sense on-line estimate of a finite-state *Markov Jump Process* (MJP) given indirect continuous-time observations corrupted by the Wiener noise. The elegant algorithm represents the desired estimate as a solution to *a Stochastic Differential System* (SDS) with continuous random processes on *the Right-Hand Side* (RHS).

The fundamental condition for the solution to the filtering problem is the independence of the observation noise intensity of the estimated state. It provides the continuity from the right for the natural flow of *σ*-algebras induced by the observations, with subsequent utilization of the innovation process framework. The violation of this condition breaks these advantages. In the case of state-dependent observation noise, the author of [7] presents the optimal estimate within the class of linear estimates. Further, the authors of [8,9] use filters of a linear structure for the solution to the H2-optimal state filtering problem. To find the absolutely optimal filtering estimate, one has to make extra efforts. First, for proper utilization of the stochastic analysis framework, one needs to reformulate the optimal filtering problem, "smoothing forward" the flow of *σ*-algebras induced by the observations. Second, in the case of state-dependent noise, the innovation process contains less information than the original observations. One has to supplement the innovation by the observation quadratic characteristic, which represents a continuous-time noiseless function of the estimated MJP state. In general, optimal filtering given partially noiseless observations is a challenging problem. Its solution can be expressed either as a sequence of some regularized estimates [10] or by the additional differentiation of the smooth observation components or their quadratic characteristics [11–14]. In both cases, one needs to realize a limit passage, which is difficult to implement on computers.

Even in the traditional settings, the numerical realization of the MJP state filtering is a complicated problem. For example, the explicit numerical methods based on the Itô–Taylor expansion, applied to the Wonham filter equation, diverge: the produced approximations do not meet the component-wise non-negativity condition. Over time, the approximation components reach arbitrarily large absolute values. Further in the presentation, we refer to the approximations preserving both the component non-negativity and the normalization condition as *the stable ones*.
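To illustrate the divergence issue, consider a minimal Python sketch (all parameters are hypothetical, not from the paper): an explicit Euler–Maruyama discretization of the Wonham filter for a two-state MJP with a scalar observation may push the estimate components outside $[0,1]$, while clipping followed by renormalization keeps the approximation stable in the above sense.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state example: generator, observation drift, noise intensity.
Lam = np.array([[-1.0, 1.0], [2.0, -2.0]])   # MJP generator Lambda
a = np.array([0.0, 1.0])                     # observation drift vector
sigma = 0.3                                  # constant noise intensity
h, T = 1e-3, 5.0
K = int(T / h)

# Simulate the MJP X and the observation increments dY on the mesh.
x = 0
dY = np.zeros(K)
for k in range(K):
    if rng.random() < -Lam[x, x] * h:        # transition with prob ~ |lambda_xx| h
        x = 1 - x
    dY[k] = a[x] * h + sigma * np.sqrt(h) * rng.normal()

def euler_step(p, dy):
    """One explicit Euler-Maruyama step of the Wonham filter SDS."""
    innov = dy - (a @ p) * h
    gain = (np.diag(p) - np.outer(p, p)) @ a / sigma**2
    return p + Lam.T @ p * h + gain * innov

p_raw = np.array([0.5, 0.5])
p_stable = np.array([0.5, 0.5])
neg_hits = 0
for k in range(K):
    p_raw = euler_step(p_raw, dY[k])
    if (p_raw < 0).any():                    # raw scheme may leave the simplex
        neg_hits += 1
    q = euler_step(p_stable, dY[k])
    q = np.clip(q, 0.0, None)                # restore component non-negativity
    p_stable = q / q.sum()                   # restore the normalization condition

print("steps where the raw Euler estimate had a negative component:", neg_hits)
print("stable approximation:", p_stable, "sum =", p_stable.sum())
```

The clipped-and-normalized variant is only a pragmatic stabilization of the explicit scheme; the algorithms proposed below avoid the issue by construction.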

The Wonham filtering equation is a particular case of the nonlinear Kushner–Stratonovich equation. To solve it, one can use various numerical algorithms.


All these algorithms are developed for the case of additive observation noise and are based on Girsanov's measure transform. Hence, they are useless for the estimation of the MJP given the observations with state-dependent noise.

The goal of the paper is two-fold. First, it presents a theoretical solution to the MS-optimal filtering problem given the observations with state-dependent noise. Second, it introduces a new class of stable numerical algorithms for the filter realization and investigates its accuracy. We organize the paper as follows. Section 2 contains a description of the observation system under study, with state-dependent observation noise, along with the MS-optimal filtering problem statement. To solve the problem, one needs to transform the available observations both to preserve the information equivalence and to suit the application of the known results of optimal nonlinear filtering. Section 3 describes both the observation transformation and the SDS defining the optimal filtering estimate. The SDS is discrete–continuous and contains both continuous and counting random processes on the RHS. Previously, the author of the note [21] presented a sketch of the observation transform, but it cannot guarantee the uniqueness of the SDS solution.

Section 4 presents a new class of stable numerical algorithms of nonlinear filtering. The main idea is to discretize the original continuous-time observations and then find the MS-optimal filtering estimate given the sampled observations. The authors of [22] use this idea to solve a particular case of the estimation problem, namely the classification problem of a finite-state random vector given continuous-time observations with multiplicative noise. Section 4.1 contains a general solution to the problem. The corresponding estimate represents a ratio whose numerator and denominator are infinite sums of integrals. They are shift-scale mixtures of Gaussians. The mixing distributions, in turn, describe the occupation time of the system state in each admissible value during the time discretization interval. In Section 4.2, we suggest approximating the estimates by a convergent sequence bounding the number *s* of possible state transitions occurring over the discretization interval. We replace the infinite sums in the formula of the optimal estimate by their finite analogs and also investigate the accuracy of the approximations. We refer to these approximations as *the analytical ones of the s-th order*. One cannot calculate the integrals analytically and has to replace them with some integral sums, and this brings an extra error. Section 4.3 analyzes the value of this error and the total distance between the optimal filtering estimate given the discretized observations and its numerical realization. Section 4.4 presents a numerical example that illustrates the conformity of the theoretical estimates and their numerical realization. Section 5 contains discussion and concluding remarks.

#### **2. Continuous-Time Filtering Problem Statement**

On the probability triplet with filtration $(\Omega, \mathcal{F}, \mathsf{P}, \{\mathcal{F}_t\}_{t \geqslant 0})$ we consider the observation system

$$X_t = X_0 + \int_0^t \Lambda^\top(s) X_s\, ds + M_t^X, \tag{1}$$

$$Y_t = \int_0^t f(s) X_s\, ds + \int_0^t \sum_{n=1}^{N} X_s^n\, G_n^{1/2}(s)\, dW_s. \tag{2}$$

Here


The natural flow of *σ*-algebras generated by the observations *Y* up to the moment *t* is denoted by $\mathcal{Y}_t \triangleq \sigma\{Y_s : s \in [0, t]\}$, $\mathcal{Y}_0 \triangleq \{\emptyset, \Omega\}$.
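As a minimal illustration of model (1), (2), the following Python sketch simulates a two-state MJP on a fine time mesh and accumulates a scalar observation whose noise intensity switches with the state; all numerical values are hypothetical and chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 2
Lam = np.array([[-0.5, 0.5], [1.0, -1.0]])  # MJP generator Lambda (hypothetical)
f = np.array([1.0, -1.0])                   # scalar observation drift row f
G = np.array([0.04, 0.25])                  # state-dependent noise intensities G_1, G_2
h, T = 1e-3, 10.0
K = int(T / h)

idx = 0                                     # X_t = e_{idx+1}, the current unit vector
X_path = np.zeros(K, dtype=int)
Y = np.zeros(K + 1)
for k in range(K):
    if rng.random() < -Lam[idx, idx] * h:   # Markov transition on the fine mesh
        idx = 1 - idx
    X_path[k] = idx
    # dY = f X dt + sum_n X^n G_n^{1/2} dW: the noise intensity follows the state
    Y[k + 1] = Y[k] + f[idx] * h + np.sqrt(G[idx] * h) * rng.normal()

print("fraction of time in state e_1:", (X_path == 0).mean())
```

The simulated pair `(X_path, Y)` can serve as a test bed for the filtering algorithms discussed below.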

*The optimal state filtering given the observations Y* is to find *the Conditional Mathematical Expectation* (CME)

$$
\hat{X}\_t \triangleq \mathbb{E}\left\{X\_t|\mathcal{Y}\_{t+}\right\}.\tag{3}
$$

#### **3. Observation Transform and Optimal Filtering Equation**

Before deriving the optimal filtering equation, we specify the properties of the observation system (1) and (2).


here and after, *I* is a unit matrix of appropriate dimensionality.

4. The processes

$$K_{ij}(t) \triangleq \mathbf{I}_{\{\mathbf{0}\}}\left(G_i(t) - G_j(t)\right), \quad i, j = \overline{1, N} \tag{4}$$

have a finite variation; here and after, $\mathbf{I}_{\mathcal{A}}(x)$ is the indicator function of the set $\mathcal{A}$, and $\mathbf{0}$ is a zero matrix of appropriate dimensionality.

Conditions 1–3 are standard for the filtering problems [10]. They guarantee the proper description of the MJP distribution $\pi(t) \triangleq \mathbf{E}\{X_t\}$ by the Kolmogorov system $\pi(t) = \pi + \int_0^t \Lambda^\top(s)\pi(s)\,ds$. Condition 4 relates to the quadratic characteristic of the observation process as a key information source itself. Below we show that the collection of $G_n(\cdot)$, distinguished for different $n$, allows one to restore the state $X_t$ *precisely* given the available *noisy* observations. Condition 4 guarantees the local regularity of the time subsets where the $G_n(\cdot)$ coincide and/or differ from each other: one can express them as finite unions of intervals. The condition is not too restrictive: for instance, it is valid when the $G_n(\cdot)$ are piecewise continuous with bounded derivatives.

Both the system state and observation are special square-integrable semimartingales [6,23] with the predictable characteristics

$$\langle X, X \rangle_t \triangleq X_t X_t^\top - \int_0^t X_{s-}\, dX_s^\top - \int_0^t dX_s\, X_{s-}^\top = \int_0^t \left( \operatorname{diag}\left(\Lambda^\top(s) X_s\right) - \Lambda^\top(s) \operatorname{diag} X_s - \operatorname{diag}\left(X_s\right) \Lambda(s) \right) ds \tag{5}$$

and

$$
\langle Y, Y \rangle_t \triangleq Y_t Y_t^\top - \int_0^t Y_{s-}\, dY_s^\top - \int_0^t dY_s\, Y_{s-}^\top = \sum_{n=1}^{N} \int_0^t X_s^n\, G_n(s)\, ds. \tag{6}
$$

Conditions 1–3 and the properties of $X_t$ guarantee the P-a.s. fulfilment of the following equalities for the one-sided derivatives of $\langle Y, Y \rangle_t$:

$$\begin{cases} \frac{d(Y,Y)\_s}{ds}|\_{s=t-} = \sum\_{n=1}^N X\_{t-}^n G\_n(t-) = \sum\_{n=1}^N X\_t^n G\_n(t-),\\ \frac{d(Y,Y)\_s}{ds}|\_{s=t+} = \sum\_{n=1}^N X\_{t-}^n \left( G\_n(t-) + \Delta G\_n(t) \right) = \sum\_{n=1}^N X\_t^n G\_n(t), \end{cases} \tag{7}$$

where $\Delta G_n(t) \triangleq G_n(t) - G_n(t-)$ is the jump function of $G_n(t)$. So, if there exists a nonrandom instant $t_* > 0$ such that $\sum_{n=1}^{N} \pi^n(t_*) \Delta G_n(t_*) \ne 0$, then $\mathcal{Y}_{t_*} \subset \mathcal{Y}_{t_*+} = \mathcal{Y}_{t_*} \vee \sigma\{\sum_{n=1}^{N} X^n_{t_*} \Delta G_n(t_*)\}$. The inclusion implies that the flow of *σ*-subalgebras $\{\mathcal{Y}_t\}_{t \geqslant 0}$ is not necessarily continuous from the right for the considered observations [24]. This is the reason to define the filtering estimate as a CME of $X_t$ with respect to the "smoothed" flow $\mathcal{Y}_{t+}$ for subsequent correct usage of the stochastic analysis framework.

Let us transform the available observations in such a way that the optimal filtering estimate can be derived by the standard methods [6,23]. Initially, the idea of this transform was suggested in [11]. As the result, the authors introduce the pair

$$U_t \triangleq \int_0^t \left( \frac{d\langle Y, Y \rangle_u}{du} \Big|_{u=s+} \right)^{-1/2} dY_s, \tag{8}$$

$$
\langle Y, Y \rangle_t = \sum_{n=1}^{N} \int_0^t X_s^n\, G_n(s)\, ds. \tag{9}
$$

The authors of [11] prove the coincidence of the *σ*-algebras $\mathcal{Y}_t = \sigma\{U_s,\ 0 \leqslant s \leqslant t\} \vee \sigma\{\langle Y, Y \rangle_s,\ 0 \leqslant s \leqslant t\}$ for the general diffusion observation systems. However, they do not pay attention to the continuity of $\{\mathcal{Y}_t\}$ from the right. The authors of [12,14] suggest to replace the observations $\langle Y, Y \rangle_t$ by their derivative

$$Q_t \triangleq \frac{d\langle Y, Y \rangle_s}{ds}\Big|_{s=t-} = \sum_{n=1}^{N} X_{t-}^n\, G_n(t-). \tag{10}$$

Then, one can construct the optimal estimate either by using $Q_t$ as a linear constraint or by differentiating (10) to extract the dynamic noises. The papers [12,14] contain a rather pessimistic conclusion: the number of differentiations is unbounded in the general case of a diffusion observation system. In contrast, we estimate a finite-state MJP and can construct the optimal filtering estimate using $Q$ without additional differentiation.
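The claim that $Q$ is a noiseless function of the state can be checked numerically: over a window where the state stays constant, the realized quadratic variation of $Y$ per unit time converges to the active intensity $G_n$, so matching it against the known admissible values recovers the state. A sketch with hypothetical scalar intensities:

```python
import numpy as np

rng = np.random.default_rng(2)

G = np.array([0.04, 0.25])          # distinct noise intensities (hypothetical)
h, win = 1e-4, 2000                 # fine mesh and averaging window

# Observation increments while the state stays fixed at e_2 (index 1),
# with a hypothetical drift of 1.0.
true_idx = 1
dY = 1.0 * h + np.sqrt(G[true_idx] * h) * rng.normal(size=win)

# Realized quadratic variation per unit time approximates d<Y,Y>/dt = G_n.
qv_rate = np.sum(dY**2) / (win * h)
est_idx = int(np.argmin(np.abs(G - qv_rate)))   # nearest admissible intensity

print("estimated noise intensity:", qv_rate, "-> state index", est_idx)
```

The drift contributes only an $O(h)$ bias to the rate, which is why the quadratic characteristic acts as a noiseless channel in the limit.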

So, the transformed observations will contain


The first part of the transformed observations is the process $U_t$ (8), and in view of (2) and (7) it can be rewritten as

$$U_t = \int_0^t \overline{f}(s) X_s\, ds + \overline{W}_t, \tag{11}$$

where $\overline{f}(s) \triangleq \sum_{n=1}^{N} G_n^{-1/2}(s) f(s) \operatorname{diag}(e_n)$ and $\overline{W}_t$ is an $\mathcal{F}_t$-adapted standard Wiener process [10].

The process $Q_t$ could play the role of the second part of the transformed observations since $\mathcal{Y}_t = \sigma\{U_s, Q_s,\ s \in [0, t]\}$ [11]; however, the natural flow of *σ*-algebras generated by the couple $(U, Q)$ is still not continuous from the right. Moreover, the process $Q_t$ is matrix-valued and looks overabundant for the filter derivation. The point is, $Q_t = Q(t, X_{t-})$ (10) is a function of the finite-set argument $X_{t-}$, and it affects the estimate performance through its complete preimage

$$Q_t = Q(t, X_{t-}) \xrightarrow{\;Q^{-1}\;} \{e_n \in \mathbb{S}^N : G_n(t-) = Q_t\}.$$

To pass to the preimage, we introduce the following transformation of $Q_t$:

$$H_t \triangleq \sum_{n=1}^{N} \mathbf{I}_{\{\mathbf{0}\}}\left(Q_t - G_n(t)\right) e_n.$$

$H_t$ is a $\mathcal{Y}_t$-adapted vector process with components 0 or 1, but the trajectories of $H_t$ are not *càdlàg*. Since $X_{t-} = X_t$ P-a.s. for all $t \geqslant 0$, the equalities below are valid

$$H_t = \sum_{n,k=1}^{N} \mathbf{I}_{\{\mathbf{0}\}}\left(G_k(t) - G_n(t)\right) X_t^k\, e_n = K(t) X_t = K(t) X_{t-} \quad \mathsf{P}\text{-a.s.}, \tag{12}$$

where *K*(*t*) is the *N* × *N*-dimensional matrix with the components (4).
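A tiny sketch (with hypothetical, time-invariant intensities) of the matrix $K$ from (4): its entries flag the pairs of states whose noise intensities coincide, so $H = KX$ marks every state indistinguishable from the current one by the quadratic characteristic alone.

```python
import numpy as np

# Hypothetical noise intensities; G_1 = G_3 coincide, G_2 differs.
G = [0.04, 0.25, 0.04]
N = len(G)

# K_{ij} = I_{{0}}(G_i - G_j): 1 where the intensities coincide, 0 otherwise, cf. (4).
K = np.array([[1.0 if G[i] == G[j] else 0.0 for j in range(N)] for i in range(N)])

# For X = e_1 the vector H = K X flags every state with the same intensity G_1.
X = np.eye(N)[0]
H = K @ X
print(K)
print("H =", H)
```

Here `H` equals `[1, 0, 1]`: states $e_1$ and $e_3$ share the intensity and cannot be separated by $Q_t$ alone, which motivates the further transformation of the observations.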

The function *K*(*t*) has the following properties.



6. $\mathsf{P}\left\{T(t) H_t \in \mathbb{S}^N\right\} = 1$ for any $t \geqslant 0$.

Let us define a $\mathcal{Y}_{t+}$-adapted process $V_t = \operatorname{col}(V_t^1, \ldots, V_t^N)$ with the *càdlàg* trajectories:

$$V\_t \triangleq T(t+)H\_{t+}.\tag{13}$$

From (12) and (13) it follows that $V_t = J(t) X_t$ P-a.s., where $J(t) \triangleq T(t+) K(t+)$.

We denote the discontinuity set of the process $V$ by $\mathcal{V}$; $\mathcal{X}$ stands for the discontinuity set of $X$, and $\mathcal{J}$ for the analogous set of the process $J$. The sets $\mathcal{V}$ and $\mathcal{X}$ are random; in contrast, $\mathcal{J}$ is nonrandom. The process $V_t$ is purely discontinuous, and due to property 4 it can be rewritten in the form

$$V_t = J(0) X_0 + \sum_{s \in \mathcal{V}:\, s \leqslant t} \Delta V_s = J(0) X_0 + \sum_{s \in \mathcal{J}:\, s \leqslant t} \Delta J(s) X_s + \sum_{s \in \mathcal{V} \setminus \mathcal{J}:\, s \leqslant t} J(s) \Delta X_s = J(0) X_0 + \sum_{s \in \mathcal{J}:\, s \leqslant t} \Delta J(s) X_s + \underbrace{\int_0^t J(s)\, dX_s}_{\triangleq R_t}. \tag{14}$$

By the definition, $V_t \in \mathbb{S}^N$ for all $t \geqslant 0$. The process $D_t$ characterizes the observable jumps at the nonrandom moments caused by the $J(t)$ changes, and $R_t$ is an observable part of the state $X_t$ jumps, which occur at some random instants.

As the second part of the transformed observations, we choose the *N*-dimensional random process $C_t \triangleq \operatorname{col}(C_t^1, \ldots, C_t^N)$: the components $C_t^n$ count the jumps of the process $V_t$ into the state $e_n$ that occurred *at the random instants* over the interval $[0, t]$:

$$C_t^n = \int_0^t (1 - e_n^\top V_{s-})\, e_n^\top\, dR_s. \tag{15}$$

The third part of the transformed observations is the *N*-dimensional process *Dt* with the jumps at the nonrandom moments.

**Lemma 1.** *If* $\mathcal{Y}_t \triangleq \sigma\{(U_s, C_s, D_s),\ s \in [0, t]\}$*, then the coincidence* $\mathcal{Y}_t = \mathcal{Y}_{t+}$ *is true for any* $t \geqslant 0$*.*

The correctness of the Lemma assertion follows immediately from the fact that the composite process $(U_t, C_t, D_t)$ is constructed to be $\mathcal{Y}_{t+}$-adapted, and from the one-to-one correspondence of the $(U, C, D)$ and $Y$ paths:

$$\begin{cases} U_t = \int_0^t \left( \frac{d\langle Y, Y \rangle_u}{du}\big|_{u=s+} \right)^{-1/2} dY_s, \\ C_t = \int_0^t (I - \operatorname{diag} V_{s-})\, dV_s - \sum_{s \in \mathcal{J}:\, s \leqslant t} (I - \operatorname{diag} V_{s-}) \Delta V_s, \\ D_t = \sum_{s \in \mathcal{J}:\, s \leqslant t} (I - \operatorname{diag} V_{s-}) \Delta V_s, \\ V_t = T(t+) H_{t+}, \\ H_t = \sum_{n=1}^{N} \mathbf{I}_{\{\mathbf{0}\}}\left( \frac{d\langle Y, Y \rangle_s}{ds}\big|_{s=t-} - G_n(t) \right) e_n, \end{cases} \tag{16}$$
 
$$\begin{cases} V_t = D_t + \int_0^t \sum_{(i,j):\, i \ne j} V_{s-}^i (e_j - e_i)\, dC_s^j, \\ Y_t = \int_0^t \sum_{n=1}^{N} V_s^n\, G_n^{1/2}(s)\, dU_s. \end{cases} \tag{17}$$

Below we use the following notations: $\mathbf{1}$ is a row vector of appropriate dimensionality formed by units, $J_n(s) \triangleq e_n^\top J(s)$ is the *n*-th row of the matrix $J(s)$,

$$
\Gamma_n(s) \triangleq \operatorname{diag}(J_n(s))\, \Lambda^\top(s)\, (I - \operatorname{diag} J_n(s)). \tag{18}
$$

**Lemma 2.** *The process* $C_t = \operatorname{col}(C_t^1, \ldots, C_t^N)$ *has the following properties.*

*1. The n-th component* $C_t^n$ *allows the martingale representation*

$$C_t^n = \int_0^t \mathbf{1} \Gamma_n(s) X_s\, ds + \int_0^t (1 - J_n(s) X_{s-})\, J_n(s)\, dM_s^X.$$

*2.* $[C^n, C^m]_t \equiv 0$ *for any* $n \ne m$*;*

$$
\langle C^n, C^n \rangle_t = \int_0^t \mathbf{1} \Gamma_n(s) X_s\, ds. \tag{19}
$$

*3. The innovation processes*

$$\nu_t^n \triangleq \int_0^t \left( dC_s^n - \mathbf{1} \Gamma_n(s) \hat{X}_s\, ds \right), \quad n = \overline{1, N} \tag{20}$$

*are* Y*t-adapted martingales with the quadratic characteristics*

$$
\langle \nu^n, \nu^n \rangle_t = \int_0^t \mathbf{1} \Gamma_n(s) \hat{X}_s\, ds. \tag{21}
$$

Proof of Lemma 2 is given in Appendix A. Finally, the transformed observations (*<sup>U</sup>*, *C*, *D*) take the form

$$\begin{cases} U_t = \int_0^t \overline{f}(s) X_s\, ds + \overline{W}_t, \\ C_t^n = \int_0^t \mathbf{1} \Gamma_n(s) X_s\, ds + \int_0^t (1 - J_n(s) X_{s-})\, J_n(s)\, dM_s^X, \quad n = \overline{1, N}, \\ D_t = J(0) X_0 + \sum_{s \in \mathcal{J}:\, s \leqslant t} \Delta J(s) X_s. \end{cases} \tag{22}$$

**Theorem 1.** *The optimal filtering estimate* $\hat{X}_t$ *is a strong solution to the SDS*

$$\begin{aligned} \hat{X}_t ={}& \left( (D_0)^\top J(0) \pi_0 \right)^{+} \operatorname{diag}(D_0)\, J(0) \pi_0 + \int_0^t \Lambda^\top(s) \hat{X}_s\, ds + \int_0^t \left( \operatorname{diag} \hat{X}_s - \hat{X}_s \hat{X}_s^\top \right) \overline{f}^\top(s)\, d\omega_s + \\ &+ \sum_{n=1}^{N} \int_0^t \left( \Gamma_n(s) - \mathbf{1} \Gamma_n(s) \hat{X}_{s-}\, I \right) \hat{X}_{s-} \left( \mathbf{1} \Gamma_n(s) \hat{X}_{s-} \right)^{+} d\nu_s^n + \\ &+ \sum_{s \in \mathcal{J}:\, s \leqslant t} \left( \left( (\Delta D_s)^\top \Delta J(s) \hat{X}_{s-} \right)^{+} \operatorname{diag}(\Delta D_s)\, \Delta J(s) - I \right) \hat{X}_{s-}, \end{aligned} \tag{23}$$

*where*

$$
\omega_t \triangleq U_t - \int_0^t \overline{f}(s)\, \hat{X}_s\, ds \tag{24}
$$

*and* $A^{+}$ *is the Moore–Penrose pseudoinverse. The solution is unique within the class of nonnegative piecewise-continuous* $\mathcal{Y}_{t+}$*-adapted processes with discontinuity set lying in* $\mathcal{V}$*.*

Proof of Theorem 1 is given in Appendix B.

The transformed observations (22) along with Theorem 1 prompt a condition of the *exact* identifiability of the state $X_t$ given the indirect noisy observations $Y_t$ (2).

**Corollary 1.** *If for any* $n \ne m$ *(*$n, m = \overline{1, N}$*) the inequalities* $G_n(s) \ne G_m(s)$ *are true almost everywhere on* $[0, t]$*, then* $\hat{X}_t = X_t$ P*-a.s., and* $X_t$ *is the solution to SDS (23).*

The proof of Corollary 1 is given in Appendix C.

#### **4. Numerical Algorithms of Optimal Filtering**

#### *4.1. Optimal Filtering Given Discretized Observations*

The previous section contains the stochastic system (23) defining the optimal filtering estimate $\hat{X}_t$. The problem of its numerical realization seems routine: we should apply the corresponding methods of numerical integration of SDSs with jumps on the RHS [26]. However, this simplicity is illusory. The problem is that the "new" counting observation $C_t$ and the discrete-time one $D_t$ are results of a certain transform of the available observation $Y$, and this transform includes a limit passage operation. In fact, to obtain $C_t$ we have to estimate/restore the current value of the derivative $\frac{d\langle Y, Y \rangle_{t+}}{dt}$. First, this leads to some time delay needed to accumulate the observations $Y_t$. Second, any pre-limit variant of $C_t$ either has a.s. continuous trajectories or represents their sampling, which demonstrates an oscillating nature. Third, the considered filtering estimate is the CME of the state $X_t$ given the observations $Y$ up to the moment $t$. The CME has natural properties: its components are a.s. non-negative and satisfy the normalization condition. The estimates and approximations having these properties are referred to in the paper as the stable ones. Mostly, the conventional numerical algorithms do not provide these properties for the calculated approximations. They can preserve the normalization condition only, but the components can have arbitrary signs and absolute values.

In the paper we present another approach to the numerical realization of the filtering algorithm above. We discretize the available observations $Y$ by time with the increment $h$ and then solve the optimal state filtering problem given the discretized observations. The estimate can be considered as an approximation of the one given the initial continuous-time observations. The properties of the CME guarantee the stability of the proposed approximation.

To simplify the derivation of the numerical algorithm and its accuracy analysis, we investigate the time-invariant subset of the observation system (1), (2), i.e., $\Lambda(t) \equiv \Lambda$, $f(t) \equiv f$, $G_n(t) \equiv G_n$, $n = \overline{1, N}$. The observations are discretized with the time increment $h$:

$$Y_r \triangleq \int_{t_{r-1}}^{t_r} f X_s\, ds + \int_{t_{r-1}}^{t_r} \sum_{n=1}^{N} X_s^n\, G_n^{1/2}\, dW_s, \quad r \in \mathbb{N}, \tag{25}$$

where $t_r \triangleq rh$ are the equidistant time instants. We denote by $\mathcal{Y}_r \triangleq \sigma\{Y_s : 1 \leqslant s \leqslant r\}$ the non-decreasing collection of *σ*-algebras generated by the time-discretized observations; $\mathcal{Y}_0 \triangleq \{\emptyset, \Omega\}$.

*The optimal state filtering problem given discretized observations* is to find $\hat{X}_r \triangleq \mathbf{E}\{X_{t_r}|\mathcal{Y}_r\}$.

Let us consider the asymptotics of $\hat{X}$. We fix some $T > 0$ and consider a condensed sequence of binary meshes $\{\frac{rT}{2^n}\}_{r=\overline{1,2^n}}$ with the time increments $h_n \triangleq \frac{T}{2^n}$ and the corresponding increasing sequence of *σ*-subalgebras $\{\mathcal{Y}^n_{2^n}\}$: $\mathcal{Y}^n_{2^n} \triangleq \sigma\{Y_r,\ 1 \leqslant r \leqslant 2^n\}$. The observation process $\{Y_t\}$ is separable, hence $\sigma\{\bigvee_{n=1}^{\infty} \mathcal{Y}^n\} = \mathcal{Y}_T$. Then, by the Lévy theorem, $\hat{X}_{2^n} \triangleq \mathbf{E}\{X_T|\mathcal{Y}^n\} \xrightarrow{n \to \infty} \mathbf{E}\{X_T|\mathcal{Y}_T\} = \mathbf{E}\{X_T|\mathcal{Y}_{T+}\} \triangleq \hat{X}_T$ P-a.s. Moreover, since $\mathbf{E}\{\hat{X}_T\} \equiv \mathbf{E}\{\hat{X}_{2^n}\} = \pi(T)$, the $\mathbb{L}_1$-convergence is also true: $\lim_{n \to \infty} \mathbf{E}\{|\hat{X}_T - \hat{X}_{2^n}|\} = 0$. The convergence also holds if we replace the sequence of the binary meshes by any condensed sequence with a vanishing step. So, we can conclude that the optimal filtering given the discretized observations is a way to design stable convergent approximations without the observation transform $Y \to (U, C, D)$ introduced in the previous section.

To derive the filtering formula, we use the approach of [27] and mathematical induction. In the case $r = 0$ we have

$$
\hat{\mathbf{X}}\_0 = \mathbf{E}\left\{ \mathbf{X}\_0 | \mathfrak{Y}\_0 \right\} = \mathbf{E}\left\{ \mathbf{X}\_0 \right\} = \boldsymbol{\pi}.\tag{26}
$$

Let for some $r \in \mathbb{N}$ the estimate $\hat{X}_{r-1} = \mathbf{E}\{X_{t_{r-1}}|\mathcal{Y}_{r-1}\}$ be known. Now we calculate $\hat{X}_r$ at the next time instant. To do this, we have to specify the mutual conditional distribution of $(X_{t_r}, Y_r)$ with respect to $\mathcal{Y}_{r-1}$. From the observation model and ([10], Lemma 7.5) it follows that the conditional distribution of $Y_r$ given the *σ*-algebra $\mathcal{F}^X_{t_r} \vee \mathcal{Y}_{r-1}$ is Gaussian with the parameters

$$\mathbf{E}\left\{Y_r|\mathcal{F}^X_{t_r}\right\} = f \upsilon_r, \qquad \operatorname{cov}(Y_r, Y_r|\mathcal{F}^X_{t_r}) = \sum_{n=1}^{N} \upsilon_r^n\, G_n. \tag{27}$$

Here, $\upsilon_r = \operatorname{col}(\upsilon_r^1, \ldots, \upsilon_r^N) \triangleq \int_{t_{r-1}}^{t_r} X_s\, ds$ is a random vector composed of the occupation times of the process $X$ in each state $e_n$ during the interval $[t_{r-1}, t_r]$.
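The occupation-time vector $\upsilon_r$ can be sketched directly: simulate the chain on a fine mesh inside one discretization interval and accumulate the time spent in each state (the generator and mesh parameters below are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(3)

N = 2
Lam = np.array([[-2.0, 2.0], [3.0, -3.0]])  # hypothetical generator Lambda
h_fine, r_len = 1e-4, 0.1                   # fine mesh inside one step [t_{r-1}, t_r]
steps = int(r_len / h_fine)

idx = 0
occupation = np.zeros(N)                    # components of upsilon_r
for _ in range(steps):
    occupation[idx] += h_fine               # time spent in the current state e_{idx+1}
    if rng.random() < -Lam[idx, idx] * h_fine:
        idx = 1 - idx

print("upsilon_r =", occupation, "sum =", occupation.sum())
```

By construction the components are non-negative and sum to the interval length $h$, which is exactly the support of the mixing distributions $\rho^{k,\ell,q}$ below.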

Below in the presentation we use the following notations:


$$\mathbf{E}\left\{\mathbf{I}_{\mathcal{G}}(\upsilon_r)\, \mathbf{I}_{\{q\}}(N_r^X)\, X_{t_r}^{\ell}\,|\,X_{t_{r-1}} = e_k\right\} = \int_{\mathcal{G}} \rho^{k,\ell,q}(du),$$

where $N_r^X$ stands for the number of the state $X$ transitions over the interval $(t_{r-1}, t_r]$;

• $\mathcal{N}(y, m, K) \triangleq (2\pi)^{-M/2} \det^{-1/2} K\, \exp\left(-\frac{1}{2}\|y - m\|^2_{K^{-1}}\right)$ is an *M*-dimensional Gaussian *probability density function* (pdf) with the expectation $m$ and nondegenerate covariance matrix $K$;
• $\|\alpha\|^2_K \triangleq \alpha^\top K \alpha$, $\langle \alpha, \beta \rangle_K \triangleq \alpha^\top K \beta$.

The Markovianity of $\{(X_{t_r}, Y_r)\}_{r \geqslant 0}$, the total probability formula and the Fubini theorem provide the equalities below for any set $\mathcal{A} \in \mathcal{B}(\mathbb{R}^M)$:

$$\begin{aligned} \mathbf{E}\left\{X_{t_r} \mathbf{I}_{\mathcal{A}}(Y_r)\,|\,\mathcal{Y}_{r-1}\right\} &= \mathbf{E}\left\{\mathbf{E}\left\{X_{t_r} \mathbf{I}_{\mathcal{A}}(Y_r)\,|\,\mathcal{F}^X_{t_r} \vee \mathcal{Y}_{r-1}\right\}|\,\mathcal{Y}_{r-1}\right\} \\ &= \mathbf{E}\left\{X_{t_r} \int_{\mathcal{A}} \mathcal{N}\Big(y, f\upsilon_r, \sum_{p=1}^{N} \upsilon_r^p G_p\Big)\, dy\, \Big|\, \mathcal{Y}_{r-1}\right\} \\ &= \mathbf{E}\left\{\mathbf{E}\left\{X_{t_r} \int_{\mathcal{A}} \mathcal{N}\Big(y, f\upsilon_r, \sum_{p=1}^{N} \upsilon_r^p G_p\Big)\, dy\, \Big|\, X_{t_{r-1}} \vee \mathcal{Y}_{r-1}\right\}\Big|\, \mathcal{Y}_{r-1}\right\} \\ &= \mathbf{E}\left\{\sum_{\ell=1}^{N} e_\ell \sum_{q=0}^{\infty} \sum_{k=1}^{N} e_k^\top X_{t_{r-1}} \int_{\mathcal{D}} \int_{\mathcal{A}} \mathcal{N}\Big(y, fu, \sum_{p=1}^{N} u^p G_p\Big)\, dy\, \rho^{k,\ell,q}(du)\, \Big|\, \mathcal{Y}_{r-1}\right\} \\ &= \sum_{\ell=1}^{N} e_\ell \int_{\mathcal{A}} \left[\sum_{k=1}^{N} \hat{X}^k_{r-1} \sum_{q=0}^{\infty} \int_{\mathcal{D}} \mathcal{N}\Big(y, fu, \sum_{p=1}^{N} u^p G_p\Big) \rho^{k,\ell,q}(du)\right] dy. \end{aligned}$$

This means that the integrand in the square brackets defines the conditional distribution of $(X_{t_r}, Y_r)$ given $\mathcal{Y}_{r-1}$. Further, the conditional distribution $\hat{X}_r$ is defined component-wise by the generalized Bayes rule [10]:

$$\hat{X}_r^j = \frac{\sum_{k=1}^{N} \hat{X}^k_{r-1} \sum_{q=0}^{\infty} \int_{\mathcal{D}} \mathcal{N}\left(Y_r, fu, \sum_{p=1}^{N} u^p G_p\right) \rho^{k,j,q}(du)}{\sum_{i,\ell=1}^{N} \hat{X}^i_{r-1} \sum_{c=0}^{\infty} \int_{\mathcal{D}} \mathcal{N}\left(Y_r, fv, \sum_{n=1}^{N} v^n G_n\right) \rho^{i,\ell,c}(dv)}, \quad j = \overline{1, N}. \tag{28}$$

So, we have proved the following

**Lemma 3.** *If conditions 1–3 are valid for the observation system (1), (2), then the filtering estimate* $\hat{X}_r$ *given the discretized observations is defined by (26) at* $r = 0$*, and by recursion (28) at the instant* $t_r$ *of the discretized observation* $Y_r$ *reception.*

#### *4.2. Stable Analytic Approximations*

Recursion (28) cannot be realized directly because of the infinite summation both in the numerator and the denominator. We replace them by the finite sums, and the corresponding vector sequence $\overline{X}_r(s)$, calculated by the formula

$$\overline{X}_r^j(s) = \frac{\sum_{k=1}^{N} \overline{X}^k_{r-1}(s) \sum_{q=0}^{s} \int_{\mathcal{D}} \mathcal{N}\left(Y_r, fu, \sum_{p=1}^{N} u^p G_p\right) \rho^{k,j,q}(du)}{\sum_{i,\ell=1}^{N} \overline{X}^i_{r-1}(s) \sum_{c=0}^{s} \int_{\mathcal{D}} \mathcal{N}\left(Y_r, fv, \sum_{n=1}^{N} v^n G_n\right) \rho^{i,\ell,c}(dv)}, \quad j = \overline{1, N}, \tag{29}$$

is called *the analytic approximation of the s-th order* of $\hat{X}_r$. Obviously, $\overline{X}_r(s)$ is stable.

Let us introduce the following positive random numbers and matrices:

$$\begin{aligned} \xi_q^{kj} &\triangleq \sum_{m=0}^{s} \int_{\mathcal{D}} \mathcal{N}\left(Y_q, fu, \sum_{p=1}^{N} u^p G_p\right) \rho^{k,j,m}(du), & \qquad \xi_q &\triangleq \|\xi_q^{kj}\|_{k,j=\overline{1,N}}, \\ \theta_q^{kj} &\triangleq \sum_{m=s+1}^{\infty} \int_{\mathcal{D}} \mathcal{N}\left(Y_q, fu, \sum_{p=1}^{N} u^p G_p\right) \rho^{k,j,m}(du), & \qquad \theta_q &\triangleq \|\theta_q^{kj}\|_{k,j=\overline{1,N}}. \end{aligned} \tag{30}$$

The estimates X*r* (28) and <sup>X</sup>*r*(*s*) (29) can be rewritten in the recurrent form:

$$
\hat{X}_r = \left(\mathbf{1}(\xi_r + \theta_r)^\top \hat{X}_{r-1}\right)^{-1} (\xi_r + \theta_r)^\top \hat{X}_{r-1}, \tag{31}
$$

$$\overline{X}_r(s) = \left(\mathbf{1}\xi_r^\top \overline{X}_{r-1}(s)\right)^{-1} \xi_r^\top \overline{X}_{r-1}(s). \tag{32}$$
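Written as (32), each step is a positive matrix–vector product followed by normalization, so the approximation stays in the simplex by construction. A Python sketch with stand-in random matrices in place of the Gaussian-mixture integrals $\xi_r$:

```python
import numpy as np

def filter_step(x_prev, xi_r):
    """One step of recursion (32): normalized positive matrix-vector product."""
    num = xi_r.T @ x_prev
    return num / num.sum()          # equals num / (1 xi_r^T x_prev)

rng = np.random.default_rng(4)
N = 3
x = np.full(N, 1.0 / N)                     # start from a uniform prior
for r in range(50):
    xi_r = rng.uniform(0.1, 1.0, (N, N))    # stand-in for the integrals xi_r^{kj}
    x = filter_step(x, xi_r)
    # stability by construction: non-negative components, unit sum
    assert (x >= 0).all() and abs(x.sum() - 1.0) < 1e-12

print("final stable estimate:", x)
```

Whatever the (positive) entries of $\xi_r$ are, the output never leaves the simplex, which is the stability property the conventional SDS integrators fail to provide.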

Let us define the global distance [28] between the estimates {<sup>X</sup>*r*(*s*)} and {X*r*} as

$$\Sigma\_r(\mathbf{s}) \triangleq \sup\_{\pi \in \Pi} \mathbb{E}\left\{ \| \mathbf{\hat{X}}\_r - \overline{\mathbf{X}}\_r(\mathbf{s}) \|\_{1} \right\} = \sup\_{\pi \in \Pi} \sum\_{j=1}^N \mathbb{E}\left\{ |\mathbf{\hat{X}}\_r^j - \overline{\mathbf{X}}\_r^j(\mathbf{s})| \right\}.\tag{33}$$

This quite natural characteristic shows the maximal expected divergence of the recursions (28) and (29) at the *r*-th step.

The assertion below defines an upper bound for the characteristic $\Sigma_r(s)$.

**Lemma 4.** *If the conditions of Lemma 3 are valid, then*

$$\Sigma_r(s) \leqslant 2 - 2\left(1 - C_1 \frac{(\overline{\lambda}h)^{s+1}}{(s+1)!}\right)^r, \tag{34}$$

*where* $\overline{\lambda} \triangleq \max_{1 \leqslant n \leqslant N} |\lambda_{nn}|$*, and* $C_1 = C_1(h, \overline{\lambda}) \in (0, 1)$ *is the following parameter:*

$$
C_1 \triangleq e^{-\overline{\lambda}h} \frac{(s+1)!}{(\overline{\lambda}h)^{s+1}} \sum_{k=s+1}^{\infty} \frac{(\overline{\lambda}h)^k}{k!}, \tag{35}
$$

*which is bounded from above:* $C_1 \frac{(\overline{\lambda}h)^{s+1}}{(s+1)!} < 1$*.*

The proof of Lemma 4 is given in Appendix D.
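As a sanity check, $C_1$ from (35) can be evaluated by truncating the series; the parameter values below are illustrative:

```python
import math

def C1(lam_h, s, terms=80):
    """C_1 from (35): e^{-lam h} (s+1)!/(lam h)^{s+1} * sum_{k >= s+1} (lam h)^k / k!,
    with the infinite tail truncated after `terms` summands."""
    tail = sum(lam_h ** k / math.factorial(k) for k in range(s + 1, s + 1 + terms))
    return math.exp(-lam_h) * math.factorial(s + 1) / lam_h ** (s + 1) * tail

for lam_h in (0.05, 0.5, 2.0):
    for s in (0, 1, 3):
        c = C1(lam_h, s)
        assert 0.0 < c < 1.0                                       # C_1 in (0, 1)
        assert c * lam_h ** (s + 1) / math.factorial(s + 1) < 1.0  # product bound
```

The second assertion mirrors the bound stated after (35): the product $C_1 (\overline{\lambda}h)^{s+1}/(s+1)!$ collapses to $e^{-\overline{\lambda}h}\sum_{k \geqslant s+1}(\overline{\lambda}h)^k/k!$, which is strictly below one.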

The assertion of Lemma 4 brings a practical benefit: it contains no asymptotic requirements on either the approximation order *s* or the discretization step *h*; inequality (34) is universal. In most digital control systems, the data acquisition rate is fixed or bounded from above. There are also extra algorithmic limitations on the rate: the "raw" data should be preprocessed, smoothed, averaged, cleared of outliers, etc. For example, utilization of the central limit theorem [29] and the diffusion approximation framework [30] for renewal processes is legitimate only over significant averaging intervals, whose length depends on the process moments.

Now we fix the time instant $T$ and consider the asymptotics $h \to 0$. In this case, $r = T/h \to \infty$ and

$$\Sigma_{T/h}(s) \leqslant 2 - 2\left(1 - C_1 \frac{(\overline{\lambda}h)^{s+1}}{(s+1)!}\right)^{T/h} \sim 2\overline{\lambda}T \frac{(\overline{\lambda}h)^{s}}{(s+1)!}.$$
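This asymptotic equivalence is easy to check numerically; the sketch below (with illustrative values of $\overline{\lambda}$, $T$, and $s$) compares the exact bound with its asymptotics:

```python
import math

lam, T, s = 1.5, 1.0, 2   # illustrative intensity bound, horizon, order
for h in (1e-2, 1e-3, 1e-4):
    x = (lam * h) ** (s + 1) / math.factorial(s + 1)
    tail = sum((lam * h) ** k / math.factorial(k) for k in range(s + 1, s + 60))
    C1 = math.exp(-lam * h) * tail / x          # the parameter from (35)
    exact = 2 - 2 * (1 - C1 * x) ** (T / h)     # right-hand side of (34) at r = T/h
    asympt = 2 * lam * T * (lam * h) ** s / math.factorial(s + 1)
    # the ratio approaches 1 as h -> 0
    assert abs(exact / asympt - 1) < 0.05
```

The agreement improves as $h$ decreases because $C_1 \to 1$ and the Bernoulli-type linearization of the power becomes exact.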

#### *4.3. Stable Numerical Approximations*

In the recursion (32) we use the integrals $\xi_r^{ij}$, which cannot be calculated analytically. The numerical integration brings some extra approximation error. Let us investigate its effect on the total accuracy of the filter numerical realization.

The integrals $\xi^{ij}(y)$ are usually approximated by the sums

$$\xi^{ij}(y) \approx \psi^{ij}(y) \triangleq \sum_{\ell=1}^{L} \mathcal{N}\left(y, \, f w_{\ell}, \, \sum_{p=1}^{N} w_{\ell}^{p} g_{p}\right) \varrho_{\ell}^{ij}, \qquad \Psi(y) \triangleq \|\psi^{ij}(y)\|_{i,j=\overline{1,N}},\tag{36}$$

which are defined by the collection of the pairs $\{(w_{\ell}, \varrho_{\ell}^{ij})\}_{\ell=\overline{1,L}}$. Here, $w_{\ell} \triangleq \mathrm{col}(w_{\ell}^{1}, \ldots, w_{\ell}^{N}) \in \mathcal{D}$ are the points, and $\varrho_{\ell}^{ij} \geqslant 0$ ($\ell = \overline{1,L}$) are the weights: $\sum_{j=1}^{N} \sum_{\ell=1}^{L} \varrho_{\ell}^{ij} \leqslant Q \leqslant 1$.

In complete analogy with $\xi_q$, we define the approximations $\psi_q \triangleq \|\psi^{ij}(\mathsf{Y}_q)\|_{i,j=\overline{1,N}}$. By construction, the elements of $\psi_q$ are positive random values; hence, the approximation $\widetilde{\mathbf{X}}_r$

$$\widetilde{\mathbf{X}}_r \triangleq (\mathbf{1}\psi_r^{\top} \widetilde{\mathbf{X}}_{r-1})^{-1} \psi_r^{\top} \widetilde{\mathbf{X}}_{r-1}, \quad \widetilde{\mathbf{X}}_0 = \pi \tag{37}$$

is stable. Below, we denote the numerical integration errors and their absolute values as follows:

$$
\gamma^{kj} \triangleq \psi^{kj} - \xi^{kj}, \qquad \gamma_r \triangleq \|\gamma^{kj}(\mathsf{Y}_r)\|_{k,j=\overline{1,N}}, \tag{38}
$$

$$\overline{\gamma}^{kj} \triangleq |\gamma^{kj}|, \qquad \overline{\gamma}_r \triangleq \left\| |\gamma^{kj}(\mathsf{Y}_r)| \right\|_{k,j=\overline{1,N}}.\tag{39}$$

So, the recursion (32) is replaced by the scheme (37), with the common initial condition $\pi$.
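To make the construction concrete, the quadrature (36) and the stable recursion (37) can be sketched as follows; the two-state model, the nodes $w_\ell$, and the weights are hypothetical illustrations:

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Scalar Gaussian density N(y; mean, var)."""
    return np.exp(-(y - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def psi_matrix(y, f, g, points, weights):
    """psi^{ij}(y) from (36) for scalar observations:
    sum over nodes w_l of N(y; f.w_l, sum_p w_l^p g_p) * rho_l^{ij}."""
    Psi = np.zeros(weights.shape[1:])
    for w, rho in zip(points, weights):
        Psi += normal_pdf(y, f @ w, w @ g) * rho   # rho is the N x N matrix rho_l
    return Psi

def filter_step(X_prev, Psi):
    """One step of (37): multiply by Psi^T and renormalize."""
    v = Psi.T @ X_prev
    return v / v.sum()

# Hypothetical 2-state illustration.
f = np.array([0.0, 1.0])            # observation drift coefficients
g = np.array([0.5, 1.0])            # state-dependent noise intensities
points = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])   # nodes w_l in D
weights = np.full((3, 2, 2), 0.08)  # rho_l^{ij} >= 0, total weight Q <= 1
X = np.array([0.5, 0.5])            # common initial condition pi
X = filter_step(X, psi_matrix(0.3, f, g, points, weights))
assert np.isclose(X.sum(), 1.0) and (X >= 0).all()
```

Because every summand in (36) is a nonnegative density times a nonnegative weight, the matrix $\psi_q$ is entrywise positive, and the renormalization in (37) cannot divide by zero; this is the stability property claimed above.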

Both (32) and (37) are constructed in light of the event $A_r^s$: the number of state transitions does not exceed the threshold $s$ over any subinterval $[t_{q-1}, t_q]$ belonging to $[0, t_r]$. So, the distance between $\widetilde{\mathbf{X}}_r$ and $\overline{\mathbf{X}}_r(s)$ should be determined taking the event $A_r^s$ into account. In view of this fact, we propose the pseudo-metric

$$\mathcal{E}_r(s) \triangleq \sup_{\pi \in \Pi} \mathbf{E}\left\{ \mathbf{I}_{A_r^s}(\omega) \| \widetilde{\mathbf{X}}_r - \overline{\mathbf{X}}_r(s) \|_{1} \right\} = \sup_{\pi \in \Pi} \sum_{n=1}^{N} \mathbf{E}\left\{ \mathbf{I}_{A_r^s}(\omega) | \widetilde{\mathbf{X}}_r^n - \overline{\mathbf{X}}_r^n(s)| \right\}.\tag{40}$$

This index reflects the maximal divergence of the algorithms (32) and (37) after $r$ steps, started from an arbitrary but common initial condition.

**Theorem 2.** *If the inequality*

$$\max_{i=\overline{1,N}} \sum_{j=1}^{N} \int_{\mathbb{R}^M} |\psi^{ij}(y) - \xi^{ij}(y)| \, dy < \delta \tag{41}$$

*is true for the numerical integration scheme (36), then the distance* $\mathcal{E}_r(s)$ *is bounded from above:*

$$\mathcal{E}_r(s) \leqslant 2rQ^{r-1}\delta.\tag{42}$$

The proof of Theorem 2 is given in Appendix E.

The chance to describe the accuracy of the numerical algorithm for stochastic filtering using only the condition (41), related to calculus, looks remarkable. Furthermore, if the total weight $Q = \sum_{\ell,j} \varrho_{\ell}^{ij}$ is separated from unity, i.e., $Q < 1$, then the index $\mathcal{E}_r(s)$ is a *sublinear* function of $r$, just as the index $\Sigma_r(s)$ of the analytical accuracy is. Notably, in the classic numerical algorithms for the SDS solution, the global error grows *linearly* with respect to the number of steps $r$ [26].
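The contrast between the sublinear bound $2rQ^{r-1}\delta$ of (42) and linear-in-$r$ growth can be seen numerically; the values of $Q$ and $\delta$ below are arbitrary illustrations:

```python
# Sublinear bound 2 r Q^{r-1} delta (Theorem 2, Q < 1) vs. linear growth 2 r delta.
Q, delta = 0.9, 1e-4
bounds = [2 * r * Q ** (r - 1) * delta for r in range(1, 201)]
linear = [2 * r * delta for r in range(1, 201)]
# With Q < 1 the bound peaks at a finite step (near r = 1/ln(1/Q)) and then
# decays, while the linear bound grows without limit.
assert bounds[-1] < max(bounds)
assert linear[-1] == max(linear)
```

For $Q$ strictly below one the factor $Q^{r-1}$ eventually dominates the factor $r$, so the accumulated numerical-integration error stays uniformly bounded over the horizon.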

The precision characteristics of both the analytical approximation and its numerical realization should be aggregated into one. If the conditions of Lemma 4 and Theorem 2 are valid, then the local distance (i.e., the distance after one iteration) between the optimal filtering estimate and its numerical approximation can be bounded from above:

$$\begin{split} \tau(s) \triangleq \sup_{\pi \in \Pi} \mathbf{E} \left\{ \| \widehat{\mathbf{X}}_{1} - \widetilde{\mathbf{X}}_{1} \|_{1} \right\} \leqslant \sup_{\pi \in \Pi} \mathbf{E} \left\{ \mathbf{I}_{A_{1}^{s}}(\omega) \| \widetilde{\mathbf{X}}_{1} - \overline{\mathbf{X}}_{1}(s) + \overline{\mathbf{X}}_{1}(s) - \widehat{\mathbf{X}}_{1} \|_{1} + \mathbf{I}_{\overline{A}_{1}^{s}}(\omega) \| \widetilde{\mathbf{X}}_{1} - \widehat{\mathbf{X}}_{1} \|_{1} \right\} \leqslant \\ \leqslant 2 \mathbf{P} \left\{ \overline{A}_{1}^{s} \right\} + \sup_{\pi \in \Pi} \mathbf{E} \left\{ \| \overline{\mathbf{X}}_{1}(s) - \widehat{\mathbf{X}}_{1} \|_{1} \right\} + \sup_{\pi \in \Pi} \mathbf{E} \left\{ \mathbf{I}_{A_{1}^{s}}(\omega) \| \widetilde{\mathbf{X}}_{1} - \overline{\mathbf{X}}_{1}(s) \|_{1} \right\} = \\ = 2 \mathbf{P} \left\{ \overline{A}_{1}^{s} \right\} + \Sigma_{1}(s) + \mathcal{E}_{1}(s) \leqslant 4 \frac{(\overline{\lambda}h)^{s+1}}{(s+1)!} + 2\delta. \tag{43} \end{split}$$

The global distance between $\widehat{\mathbf{X}}_r \triangleq \mathbf{E}\{X_r \mid \mathcal{Y}_r\}$ and $\widetilde{\mathbf{X}}_r$ can be bounded in a similar way:

$$\mathcal{T}(s) \triangleq \sup_{\pi \in \Pi} \mathbf{E}\left\{ \|\widehat{\mathbf{X}}_r - \widetilde{\mathbf{X}}_r\|_{1} \right\} \leqslant 4\left[1 - \left(1 - \frac{(\overline{\lambda}h)^{s+1}}{(s+1)!} \right)^{r}\right] + 2rQ^{r-1}\delta. \tag{44}$$

We could choose the parameters $(h, s)$ of the analytical approximation and $\delta$ of the numerical integration independently of each other. However, both the limitation of the computational resources and the accuracy requirements lead to the necessity of mutual optimization of $(h, s, \delta)$.

Let us fix some time horizon $T$ along with the order $s$ of the analytical approximation, and consider the asymptotics $r \to \infty$ or, equivalently, $h = T/r \to 0$. Due to the Bernoulli inequality and the condition $0 < Q \leqslant 1$, we have

$$\begin{split} \sup_{\pi \in \Pi} \mathbf{E}\left\{ \| \widetilde{\mathbf{X}}_{T/h} - \widehat{\mathbf{X}}_{T/h} \|_{1} \right\} &\leqslant 4 \left[ 1 - \left( 1 - \frac{(\overline{\lambda}h)^{s+1}}{(s+1)!} \right)^{r} \right] + 2rQ^{r-1}\delta \leqslant 4r \frac{(\overline{\lambda}h)^{s+1}}{(s+1)!} + 2rQ^{r-1}\delta = \\ &= 4\overline{\lambda}T \frac{(\overline{\lambda}h)^{s}}{(s+1)!} + 2rQ^{r-1}\delta \leqslant 2T \left( 2\overline{\lambda} \frac{(\overline{\lambda}h)^{s}}{(s+1)!} + \frac{\delta}{h} \right). \end{split} \tag{45}$$

The first summand in the brackets represents the contribution of the analytical approximation error; the second one reflects the error of the specified numerical integration scheme. Obviously, the optimal choice of the parameters provides an equal infinitesimal order for both summands, and this is possible when $\delta \sim (\overline{\lambda}h)^{s+1}$.
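The calibration $\delta \sim (\overline{\lambda}h)^{s+1}$ can be verified numerically: with this choice, the ratio of the two summands in (45) does not depend on $h$ (the values of $\overline{\lambda}$ and $s$ below are illustrative):

```python
import math

lam, s = 2.0, 1   # illustrative intensity bound and approximation order
ratios = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    analytic = 2 * lam * (lam * h) ** s / math.factorial(s + 1)  # first summand
    delta = (lam * h) ** (s + 1)   # calibrated numerical-integration accuracy
    numeric = delta / h            # second summand
    ratios.append(analytic / numeric)
# the ratio is constant in h: both error sources vanish at the same rate
assert all(abs(r - ratios[0]) < 1e-9 for r in ratios)
```

Any coarser $\delta$ would let the numerical-integration term dominate as $h \to 0$, wasting the accuracy of the analytical approximation; any finer $\delta$ would spend computational effort without improving the total infinitesimal order.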
