*Article* **Particle Filtering: A Priori Estimation of Observational Errors of a State-Space Model with Linear Observation Equation †**

**Rodi Lykou \*,‡ and George Tsaklidis ‡**

Department of Statistics and Operational Research, School of Mathematics, Aristotle University of Thessaloniki, GR54124 Thessaloniki, Greece; tsaklidi@math.auth.gr

**\*** Correspondence: lykourodi@math.auth.gr

† This paper is an extended version of our paper published in the 6th Stochastic Modeling Techniques and Data Analysis International Conference with Demographics Workshop, 2–5 June 2020, turned into an online conference.

‡ These authors contributed equally to this work.

**Abstract:** Observational errors of Particle Filtering are studied for the case of a state-space model with a linear observation equation. In this study, the observational errors are estimated prior to the arrival of the upcoming observations. This estimation is added to the basic algorithm of the filter as a new step for the acquisition of the state estimations. The intervention is useful mainly in the presence of missing data, as well as in sample tracking for impoverishment issues. It applies the theory of Homogeneous and Non-Homogeneous closed Markov Systems to the study of the particle distribution over the state domain and, thus, lays the foundations for the employment of stochastic control against impoverishment. A simulation example is quoted to demonstrate the effectiveness of the proposed method in comparison with existing ones, showing that the proposed method combines satisfactory precision with a low computational cost, and it provides an example of impoverishment prediction and tracking.

**Keywords:** particle filter; missing data; single imputation; impoverishment; Markov Systems

**MSC:** 60G35; 60G20; 60J05; 62M20

#### **1. Introduction**

Particle Filter (PF) methodology deals with the estimation of latent variables of stochastic processes taking into consideration noisy observations generated by the latent variables [1]. This technique mainly consists of Monte-Carlo (MC) simulation [2] of the hidden variables and the weight assignment to the realizations of the random trials during simulation, the particles. This procedure is repeated sequentially, at every time step of a stochastic process. The involvement of sequential MC simulation in the method is accompanied by a heavy computational cost. However, the nature of the MC simulation makes the PF estimation methodology suitable for a wide variety of state-space models, including non-linear models with non-Gaussian noise. The weights are defined according to observations, which are received at every time step. The weight assignment step constitutes an evaluation process of the existing particles, which are created at the simulation step.

**Citation:** Lykou, R.; Tsaklidis, G. Particle Filtering: A Priori Estimation of Observational Errors of a State-Space Model with Linear Observation Equation. *Mathematics* **2021**, *9*, 1445. https://doi.org/10.3390/math9121445

Academic Editors: Panagiotis-Christos Vassiliou and Andreas C. Georgiou

Received: 20 May 2021; Accepted: 17 June 2021; Published: 21 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

As weight assignment according to an observation dataset is a substantial part of PF, missing observations hinder the function of the filter. Wang et al. [3] wrote a review concerning PF in target tracking, wherein they mentioned cases of missing data and measurement uncertainties within multi-target tracking, as well as methods that deal with this problem (see, e.g., [4]). Techniques that face the problem of missing data focus mainly on the substitution of the missing values. In recent decades, the Expectation-Maximization algorithm [5] and Markov-Chain Monte Carlo methods [6] became popular for handling missing data problems. These algorithms were constructed independently of PF. Gopaluni [7] proposed a combination of the Expectation-Maximization algorithm with PF for parameter estimation with missing data. Housfater et al. [8] devised the Multiple Imputations Particle Filter (MIPF), wherein missing data are substituted by multiple imputations from a proposal distribution, and these imputations are evaluated with an additional weight assignment according to their proposal distribution. Xu et al. [9] incorporated uncertainty about data availability into the observations in the form of additional random variables in the subject state-space model. All the aforementioned approaches are powerful, although computationally costly.

This paper focuses on state-space models with linear observation equations and provides an estimation of the errors of missing observations (in cases of missing data), aiming at the approximation of the weights, under a Missing At Random (MAR) assumption [10]. Linearity in the observation equation permits sequential replacements of missing values with equal quantities of known distributions. Although this method is applicable to a smaller set of models than the former ones, it is much faster, as it leads to a single imputation process. A simulation example is provided to compare the suggested method with existing techniques and to highlight the advantages of the proposed algorithm. The contribution of the a priori estimation step to the study of impoverishment phenomena is also exhibited through the Markov System (MS) framework (see, e.g., [11]). The substitution of future weights renders the estimation of the future distribution of particles in the state domain feasible. The significance of this initiative lies in the possible estimation of the sample condition concerning impoverishment, in future steps, based on the suggested theory. Such a practice permits the coordinated application of stochastic control [12] instead of the mostly empirical approaches that have been proposed so far [13].

The present article is based on the work of Lykou and Tsaklidis [14]. Further mathematical propositions are formed from the sparse remarks exhibited in [14], and the procedure for incorporating MS-theory into the study of the particle distribution is explained in detail. The presentation of the initial results of the simulation example is reformulated so that the example is more easily understandable, and a new application of MS-theory for the quantitative prior estimation of the particle distribution one time step ahead is added to the initial example. In Section 2, the PF algorithm is presented analytically. In Section 3, the new weight estimation step is introduced, and its connection with the study of degeneracy and impoverishment is explained. In Section 4, a simulation example is provided, where the results of the current method are compared with those of MIPF and with the results of the basic PF algorithm in the case when all data are available. An example of the estimation of the particle distribution one step ahead of the current one is also presented, in the direction of impoverishment tracking and prediction. In Section 5, the discussion and concluding remarks are quoted.

#### **2. Particle Filter Algorithm**

Let $\{x_t\}_{t\in\mathbb{N}}$ be a stochastic process described by $m$-dimensional latent vectors, $x_t \in \mathbb{R}^m$, and $\{y_t\}_{t\in\mathbb{N}}$ be the $k$-dimensional process of noisy observations, $y_t \in \mathbb{R}^k$. The states and observations are inter-related according to the state-space model,

$$\mathbf{x}\_t \quad = \quad f(\mathbf{x}\_{t-1}) + \mathbf{v}\_t \tag{1}$$

$$\mathbf{y}\_t \quad = \ c + A\mathbf{x}\_t + \mathbf{u}\_t \Leftrightarrow \mathbf{y}\_t - \ c - A\mathbf{x}\_t = \mathbf{u}\_t. \tag{2}$$

In the system of Equations (1) and (2), $f$ is a known deterministic function of $x_t$, $v_t$ stands for the process noise, and $u_t$ symbolizes the observation noise. Each sequence $\{v_t\}_{t\in\mathbb{N}}$ and $\{u_t\}_{t\in\mathbb{N}}$ consists of independent and identically distributed (iid) random vectors, while $c \in \mathbb{R}^k$ is a constant vector and $A \in \mathbb{R}^{k\times m}$ is a constant matrix.

PF methodology employs Bayesian inference for state estimation. The Bayesian approach aims at the construction of the posterior probability distribution function $p(x_t|y_{1:t})$, where $y_{1:t} = (y_1, y_2, \ldots, y_t)$, resorting to the recursive equations,

$$\begin{array}{rcl}p(\mathbf{x}\_{t}|y\_{1:t-1})&=&\int p(\mathbf{x}\_{t}|\mathbf{x}\_{t-1})p(\mathbf{x}\_{t-1}|y\_{1:t-1})d\mathbf{x}\_{t-1}\text{ (prediction)}\\p(\mathbf{x}\_{t}|y\_{1:t})&=&\frac{p(y\_{t}|\mathbf{x}\_{t})p(\mathbf{x}\_{t}|y\_{1:t-1})}{\int p(y\_{t}|\mathbf{x}\_{t}^{\prime})p(\mathbf{x}\_{t}^{\prime}|y\_{1:t-1})d\mathbf{x}\_{t}^{\prime}}\text{ (update)}.\end{array}$$

These equations are analytically solvable in cases of linear state-space models with Gaussian noises. However, for more general models, analytical solutions are usually infeasible. For this reason, PF can be applied by utilizing MC simulation and integration to represent the state posterior probability distribution function, $p(x_t|y_{1:t})$, with the help of a set of $N \in \mathbb{N}$ particles $x_t^i$, $i = 1, 2, \ldots, N$, with corresponding weights $w_t^i$, $i = 1, 2, \ldots, N$. Then, $p(x_t|y_{1:t})$ can be approximated by the discrete probability mass distribution of the weighted particles $\{x_t^i\}_{i=1}^N$ as

$$p(\mathbf{x}\_t|y\_{1:t}) \approx \sum\_{i=1}^N w\_t^i \delta(\mathbf{x}\_t - \mathbf{x}\_t^i),$$

where $\delta$ is the Dirac delta function and the weights are normalized, so that $\sum_i w_t^i = 1$. As $p(x_t|y_{1:t})$ is usually unknown, this MC simulation is based on importance sampling; namely, a known probability (importance) density $q(x_t|y_{1:t})$ is chosen in order for the set of particles to be produced. Then, the state posterior distribution is approximated by

$$p(\mathbf{x}\_t|y\_{1:t}) \approx \sum\_{i=1}^N \tilde{w}\_t^i \delta(\mathbf{x}\_t - \tilde{\mathbf{x}}\_t^i),$$

with $\tilde{x}_t^i \sim q(x_t|y_{1:t})$, while

$$\tilde{w}\_t^i \propto w\_{t-1}^i \frac{p(y\_t|\tilde{\mathbf{x}}\_t^i)\, p(\tilde{\mathbf{x}}\_t^i|\mathbf{x}\_{t-1}^i)}{q(\tilde{\mathbf{x}}\_t^i|\mathbf{x}\_{t-1}^i, y\_t)}\tag{3}$$

are the normalized particle weights for $i = 1, 2, \ldots, N$.

As PF is applied successively over several time steps, increasing weights are assigned to the most probable particles, while the weights of the other particles progressively become negligible. Thus, only a very small proportion of the particles is finally used for the state estimation. This phenomenon is known as PF degeneracy. In order to face this problem, a resampling-with-replacement step according to the calculated weights has been incorporated into the initial algorithm, resulting in the Sampling Importance Resampling (SIR) algorithm. Nevertheless, sequential resampling leads the particles to take values from a very small domain and excludes many other less probable values. This problem is called impoverishment. A criterion over the weight variability has been introduced so that a decision can be made at every time step on whether the existing particles should be resampled or not, to reach the middle ground between degeneracy and impoverishment. In this criterion, the Effective Sample Size measure of degeneracy, defined as

$$N\_{eff}(t) = \frac{N}{1 + Var\_{p(\cdot|y\_{1:t})}(w(x\_t))},$$

is involved (see, e.g., [15], pp. 35–36). As this quantity cannot be calculated directly, it can be estimated as

$$\hat{N}\_{eff}(t) = \frac{1}{\sum\_{i=1}^{N} (w\_t^i)^2}.$$

The decision on resampling is positive whenever $\hat{N}_{eff}(t) < N_T$, where $N_T = c_1 N$, $c_1 \in \mathbb{R}$, is a fixed threshold. A usually selected value for $N_T$ is $75\%N$. Establishing a criterion for resampling slows down sample impoverishment but does not prevent it.
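The resampling criterion above can be sketched in a few lines of Python (a minimal illustration; the function names are ours, not from the original code):

```python
import numpy as np

def n_eff_hat(weights):
    """Estimate the Effective Sample Size from normalized weights:
    N_eff_hat = 1 / sum_i (w_i)^2."""
    w = np.asarray(weights, dtype=float)
    return 1.0 / np.sum(w ** 2)

def should_resample(weights, c1=0.75):
    """Resample whenever N_eff_hat falls below the threshold N_T = c1 * N."""
    return n_eff_hat(weights) < c1 * len(weights)
```

Uniform weights give $\hat{N}_{eff} = N$ (no resampling), while a degenerate sample with one dominant weight gives $\hat{N}_{eff} \approx 1$ and triggers resampling.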

Algorithm 1 summarizes PF steps. The sampling part corresponds to the prior (prediction) step of Bayesian inference, while weight assignment and possible resampling constitute the posterior (update) step.

**Algorithm 1** SIR Particle Filter

**Require:** $N$, $q$, $N_T$, $T$
Initialize $\{x_0^i, w_0^i\}$
**for** $t = 1, 2, \ldots, T$ **do**
&emsp;1. Importance Sampling
&emsp;**for** $i = 1, 2, \ldots, N$ **do**
&emsp;&emsp;Sample $\tilde{x}_t^i \sim q(x_t | x_{0:t-1}^i, y_t)$
&emsp;&emsp;Set $\tilde{x}_{0:t}^i = (x_{0:t-1}^i, \tilde{x}_t^i)$
&emsp;&emsp;Calculate importance weights $\tilde{w}_t^i \propto w_{t-1}^i \dfrac{p(y_t|\tilde{x}_t^i)\, p(\tilde{x}_t^i|x_{t-1}^i)}{q(\tilde{x}_t^i|x_{t-1}^i, y_t)}$
&emsp;**end for**
&emsp;**for** $i = 1, 2, \ldots, N$ **do**
&emsp;&emsp;Normalize weights $w_t^i = \tilde{w}_t^i / \sum_{i=1}^{N} \tilde{w}_t^i$
&emsp;**end for**
&emsp;2. Resampling
&emsp;**if** $\hat{N}_{eff}(t) \ge N_T$ **then**
&emsp;&emsp;**for** $i = 1, 2, \ldots, N$ **do** $x_{0:t}^i = \tilde{x}_{0:t}^i$ **end for**
&emsp;**else**
&emsp;&emsp;**for** $i = 1, 2, \ldots, N$ **do**
&emsp;&emsp;&emsp;Sample with replacement index $j(i)$ according to the discrete weight distribution $P(j(i) = d) = w_t^d$, $d = 1, \ldots, N$
&emsp;&emsp;&emsp;Set $x_{0:t}^i = \tilde{x}_{0:t}^{j(i)}$ and $w_t^i = \frac{1}{N}$
&emsp;&emsp;**end for**
&emsp;**end if**
**end for**
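One iteration of Algorithm 1 can be sketched in Python for the common bootstrap case, where the importance density is the state transition itself (a simplified illustration under that assumption; `f`, `sample_v`, and `likelihood` are caller-supplied placeholders, not part of the original code):

```python
import numpy as np

def sir_step(rng, particles, weights, y, f, sample_v, likelihood, c1=0.75):
    """One SIR iteration with the bootstrap proposal q = p(x_t | x_{t-1}):
    propagate every particle, reweight by the observation likelihood, and
    resample when the estimated Effective Sample Size drops below c1 * N."""
    N = len(particles)
    # 1. Importance Sampling: with the bootstrap proposal the transition and
    # proposal densities cancel in (3), so w~_t^i is prop. to w_{t-1}^i p(y_t | x~_t^i).
    proposed = np.array([f(x) + sample_v(rng) for x in particles])
    w = weights * np.array([likelihood(y, x) for x in proposed])
    w = w / w.sum()                          # normalize the weights
    # 2. Resampling: compare N_eff_hat = 1 / sum_i (w_i)^2 with N_T = c1 * N
    if 1.0 / np.sum(w ** 2) < c1 * N:
        idx = rng.choice(N, size=N, p=w)     # multinomial resampling
        proposed, w = proposed[idx], np.full(N, 1.0 / N)
    return proposed, w
```

For example, a one-dimensional model with $f(x) = 0.9x$ and Gaussian noises can be filtered by calling `sir_step` once per observation.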

#### **3. The Missing Data Case—Estimation of Weights**

We now proceed to the addition of a new step to Algorithm 1 for the case of missing data. For that purpose, some new definitions need to be quoted. As the missing data mechanism is usually unknown, its possible dependence on the missing data themselves could introduce bias into the statistical inference. For this reason, a Missing At Random (MAR) assumption is adopted: let a random indicator variable $R_{t,j}$, $j = 1, \ldots, k$, indicate whether the $j$th component of the $t$th observation is available or not. That is,

$$R\_{t,j} = \begin{cases} 0 & \text{the } j \text{th component is available at time } t\\ 1 & \text{otherwise} \end{cases}$$

Additionally, the sets $\mathcal{Z}_t$ and $\mathcal{W}_t$ are defined as the collections of missing and available components of the observation $y_t$, respectively, for every time step $t \in \mathbb{N}$.

According to the MAR assumption, the missing data mechanism does not depend on the missing observations, given the available ones:

$$P(R\_{t,j}|Z\_t, \mathcal{W}\_t) = P(R\_{t,j}|\mathcal{W}\_t), t \in \mathbb{N}, j = 1, \dots, k.$$

**Proposition 1.** *Let $\{x_{t-1}^i\}_{i=1}^N$ be the set of particles produced for the posterior estimation of the latent vector $x_{t-1}$, while the whole observation $y_t$ is missing. In addition, let $u_t^i$, $i = 1, \ldots, N$, be the observational errors for the corresponding candidate particles $\tilde{x}_t^i$, so that $u_t^i = y_t - c - A\tilde{x}_t^i$, according to (2). Then, the conditional distribution of every observational error $u_t^i$ on the particle set $\{x_{t-1}^i\}_{i=1}^N$ is approximated as*

$$p(u\_t^i | \{\mathbf{x}\_{t-1}^i\}\_{i=1}^N) \approx p(A(f(\mathbf{\hat{x}}\_{t-1}) - f(\mathbf{x}\_{t-1}^i)) + Av\_t - Av\_t^i + u\_t),\tag{4}$$

*where $\hat{x}_{t-1}$ is a point estimation of $x_{t-1}$ and $v_t^i$ represents the process noise for the generation of $\tilde{x}_t^i$.*

**Proof.** Let $x_{t-1}^i$ be a particle for the posterior estimation of the hidden state $x_{t-1}$ of the state-space model (1)–(2). Then, according to Algorithm 1 and Equation (1), the $i$th prior estimation of the hidden state $x_t$ is produced by the equation

$$\tilde{\mathbf{x}}\_t^i = f(\mathbf{x}\_{t-1}^i) + \mathbf{v}\_t^i. \tag{5}$$

According to Equation (2), the observational error of the particle is calculated as

$$u\_t^i = y\_t - c - A\tilde{\mathbf{x}}\_t^i. \tag{6}$$

If the (whole) observation $y_t$ is unavailable, sequential replacements of $u_t^i$ and $y_t$ from Equations (6) and (2), respectively, contribute to the creation of the formula,

$$\begin{aligned} u\_t^i &= c + A\mathbf{x}\_t + u\_t - c - A\tilde{\mathbf{x}}\_t^i \\\\ &= A\left(\mathbf{x}\_t - \tilde{\mathbf{x}}\_t^i\right) + u\_t. \end{aligned}$$

As the observation $y_t$ is considered missing, the particles $\tilde{x}_t^i$ cannot be evaluated. Thus, both $x_t$ and $\tilde{x}_t^i$ are replaced according to Equations (1) and (5),

$$\begin{aligned} u\_t^i &= A f(\mathbf{x}\_{t-1}) + A\mathbf{v}\_t - A f(\mathbf{x}\_{t-1}^i) - A\mathbf{v}\_t^i + u\_t \\\\ &= A(f(\mathbf{x}\_{t-1}) - f(\mathbf{x}\_{t-1}^i)) + A\mathbf{v}\_t - A\mathbf{v}\_t^i + u\_t. \end{aligned}$$

The hidden state $x_{t-1}$ is unknown, but its posterior distribution is available, so that a point estimation of it, $\hat{x}_{t-1}$, can be calculated. Then,

$$u\_t^i \approx A\left(f(\mathbf{\hat{x}}\_{t-1}) - f(\mathbf{x}\_{t-1}^i)\right) + A\mathbf{v}\_t - A\mathbf{v}\_t^i + u\_t. \tag{7}$$

Therefore, since the quantity $A(f(\hat{x}_{t-1}) - f(x_{t-1}^i))$ is a constant at time $t$, the distribution of $u_t^i$ can be approximated as

$$p(u\_t^i | \{\mathbf{x}\_{t-1}^i\}\_{i=1}^N) \approx p(A(f(\mathbf{\hat{x}}\_{t-1}) - f(\mathbf{x}\_{t-1}^i)) + Av\_t - Av\_t^i + u\_t). \tag{8}$$

**Remark 1.** *Given that the distributions of the random vectors $v_t$, $v_t^i$, and $u_t$ are known, the distribution of $Av_t - Av_t^i + u_t$ is also known, as it is the convolution of the distributions of linear transformations of the initial components $v_t$, $v_t^i$, and $u_t$. The calculation of such convolutions is not always an easy task, as analytical solutions may not be feasible, leading to numerical approximation options ([16]). However, given that each noise process consists of iid vectors and the matrix $A$ is constant, the distribution of this sum needs to be calculated only once.*
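For Gaussian noises, the convolution of Remark 1 is available in closed form: if $v_t, v_t^i \sim N(0, S_1)$ and $u_t \sim N(0, S_2)$ independently, then $Av_t - Av_t^i + u_t \sim N(0,\, 2AS_1A^T + S_2)$, computed once for the whole run. A minimal sketch under that Gaussian assumption (function name ours):

```python
import numpy as np

def error_sum_cov(A, S1, S2):
    """Covariance of A v_t - A v_t^i + u_t for independent zero-mean Gaussian
    noises v_t, v_t^i ~ N(0, S1) and u_t ~ N(0, S2): 2 A S1 A^T + S2."""
    A = np.asarray(A, dtype=float)
    return 2.0 * A @ np.asarray(S1, dtype=float) @ A.T + np.asarray(S2, dtype=float)
```

With the parameters of the example in Section 4 ($A = I_2$, $S_1 = 0.05I$, $S_2 = 0.03I$), this gives $0.13I$, i.e., $\sigma_z \approx 0.36$.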

**Remark 2.** *The replacement of $\tilde{x}_t^i$ can be avoided, if MC simulation has been implemented at this time point.*

The weight assigned to $\tilde{x}_t^i$ depends on $u_t^i$, according to Equations (3) and (6), because

$$p(y\_t | \tilde{\mathbf{x}}\_t^i) = p(u\_t^i).$$

Then, as the two variables ($\tilde{w}_t^i$ and $u_t^i$) are closely associated, knowledge of the distribution of $u_t^i$ leads to the derivation of the distribution of $\tilde{w}_t^i$. Even if the distribution of $\tilde{w}_t^i$ cannot be exactly calculated, in cases where $\tilde{w}_t^i$ are complicated functions of $u_t^i$, knowledge of the distribution of $u_t^i$ will suffice to provide information on the weight distribution. Thereby, the calculation of $p(u_t^i \in D)$, $D \subset \mathbb{R}^k$, as suggested in Remark 1, is of interest for the concomitant estimation of the weights.

**Proposition 2.** *If the conditions of Proposition 1 hold, while $y_t$ is partially observed, the conditional distribution of every observational error $u_t^i$ on the particle set $\{x_{t-1}^i\}_{i=1}^N$ and the collection $\mathcal{W}_t$ of available components of $y_t$ is approximated as*

$$p(\boldsymbol{u}\_t^i | \{\mathbf{x}\_{t-1}^i\}\_{i=1}^N, \mathcal{W}\_t) \approx p(\boldsymbol{A}(f(\boldsymbol{\hat{x}}\_{t-1}) - f(\mathbf{x}\_{t-1}^i)) + A\boldsymbol{v}\_t - A\boldsymbol{v}\_t^i + \boldsymbol{u}\_t | \mathcal{W}\_t).$$

**Proof.** The estimation of $u_t^i$ implies the estimation of $y_t$, according to Equation (6). If the observation $y_t$ is partially available, its available components, say the collection $\mathcal{W}_t$, can be placed into the above equations. Thus, some components of the observational error will also be available, while the rest of the components, say the collection $\mathcal{Z}_t$, possibly dependent on the available ones, are estimated following the same rationale. In this case, (4) takes the form

$$p(u\_t^i | \{\mathbf{x}\_{t-1}^i\}\_{i=1}^N, \mathcal{W}\_t) \approx p(A(f(\mathbf{\hat{x}}\_{t-1}) - f(\mathbf{x}\_{t-1}^i)) + A\mathbf{v}\_t - A\mathbf{v}\_t^i + u\_t | \mathcal{W}\_t).$$

In any case, missing (parts of) observational errors $u_t^i$, along with their weights, can be substituted by single values, such as expected values or modes. Consequently, the initial PF algorithm undergoes a slight change, as presented in Algorithm 2. Further to Remark 2, the substitution of the observational errors $u_t^i$ is implemented after the Importance Sampling step in Algorithm 2 for the sake of simplicity.
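As a minimal sketch of this substitution step, assuming zero-mean observation noise with independent components, so that the expected value of a missing error component is zero (the helper name and the NaN encoding of missing components are ours):

```python
import numpy as np

def impute_errors(y, c, A, x_tilde, missing):
    """Observational errors u_t^i = y_t - c - A x~_t^i (Equation (6)), with
    missing components (NaN in y, flagged in `missing`) replaced by their
    expected value: 0 for zero-mean noise with independent components."""
    u = np.where(missing, 0.0, y - c - A @ x_tilde)  # single imputation
    return u
```

For correlated noise components, the zero would be replaced by the conditional Gaussian mean given the available components.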

#### *3.1. Connection to Markov Systems and Contribution to the Study over Impoverishment*

Impoverishment over the particle samples can be studied in connection with certain Markov models, the Homogeneous or Non-homogeneous Markov Systems (denoted as HMSs or NHMSs, respectively), which have their roots in [17]. With the consideration of a grid of $d \in \mathbb{N}$ cells over the state domain, problems of impoverishment reduce to a problem concerning the derivation of the distribution of the particle population over the grid cells. The term "grid" denotes here a single partition over the whole state domain. The cells of this grid represent the states of the MS. At every time step, a particle moves from cell $i$ ($i = 1, \ldots, d$) to cell $j$ ($j = 1, \ldots, d$) with (time-dependent) transition probability $p_{ij,h}(t)$ ($h = 1, \ldots, N$) in the general case. However, the MS consideration is based on the hypothesis that population members situated in the same state move to any cell at the next time step according to a joint transition probability. Thus, for the introduction of MS-theory into the study of the particle distribution over the grid, the probabilities $p_{ij,h}(t)$ are approximated by single quantities $p_{ij}(t)$ for all particles in cell $i$ at time point $t$. The fact that PF is applied to dynamical systems entails that different areas of the state domain become of particular interest at different time steps. Therefore, it is preferable for the grid lines not to remain constant over time. A simple time-varying grid is constructed within the simulation example in Section 4, while a more complex structure is provided in [18]. In the simple case where the PF algorithm comprises constant parameters and excludes the resampling step, the corresponding MS can be considered homogeneous, as the particles only move according to a state equation with constant approximating transition probability values. Resampling causes the redistribution of the particles over the grid. The probability vectors for this redistribution are defined by the observational errors at every time step. Thus, steps of changing probability vectors are introduced into the MS, rendering it non-homogeneous. Moreover, at the end of every time step, the combined effect over the grid of the production of weighted particles on the basis of system (1)–(2) and of the resampling derives from sums of multinomial trials with varying probability vectors (see also [19], p. 28); this procedure corresponds to the transitions of population members between the states of a MS. In general, the sums of multinomial trials can be considered to follow the generalized multinomial distribution [20] or, more generally, the Poisson Binomial distribution (which has its roots in [21], §14). As the number of particles remains constant, according to Algorithm 1, the MS is considered to be closed. The difficulty in making predictions on the MS lies in the fact that the observational errors are not acquired a priori during the filtering procedure.

**Algorithm 2** SIR Particle Filter for missing data with observational error estimation

**Require:** $N$, $q$, $N_T$, $T$
Initialize $\{x_0^i, w_0^i\}$
**for** $t = 1, 2, \ldots, T$ **do**
&emsp;1. Importance Sampling
&emsp;**for** $i = 1, 2, \ldots, N$ **do**
&emsp;&emsp;Sample $\tilde{x}_t^i \sim q(x_t | x_{0:t-1}^i, \mathcal{W}_t)$
&emsp;&emsp;Set $\tilde{x}_{0:t}^i = (x_{0:t-1}^i, \tilde{x}_t^i)$
&emsp;&emsp;Produce observational error estimations $\hat{u}_t^i$ for the missing components $\mathcal{Z}_t$ and calculate importance weights $\tilde{w}_t^i \propto w_{t-1}^i \dfrac{p(y_t|\tilde{x}_t^i)\, p(\tilde{x}_t^i|x_{t-1}^i)}{q(\tilde{x}_t^i|x_{t-1}^i, y_t)}$
&emsp;**end for**
&emsp;**for** $i = 1, 2, \ldots, N$ **do**
&emsp;&emsp;Normalize weights $w_t^i = \tilde{w}_t^i / \sum_{i=1}^{N} \tilde{w}_t^i$
&emsp;**end for**
&emsp;2. Resampling
&emsp;**if** $\hat{N}_{eff}(t) \ge N_T$ **then**
&emsp;&emsp;**for** $i = 1, 2, \ldots, N$ **do** $x_{0:t}^i = \tilde{x}_{0:t}^i$ **end for**
&emsp;**else**
&emsp;&emsp;**for** $i = 1, 2, \ldots, N$ **do**
&emsp;&emsp;&emsp;Sample with replacement index $j(i)$ according to the discrete weight distribution $P(j(i) = d) = w_t^d$, $d = 1, \ldots, N$
&emsp;&emsp;&emsp;Set $x_{0:t}^i = \tilde{x}_{0:t}^{j(i)}$ and $w_t^i = \frac{1}{N}$
&emsp;&emsp;**end for**
&emsp;**end if**
**end for**

In this study, the observational errors are substituted by single values for one time step, so that the weights can be estimated one step ahead. The set of weights configures the probability vectors of the resampling step. Thus, the distribution of the particles over the grid cells can be approximated during the upcoming step of resampling and new Importance Sampling. In this way, the distribution of the particle population can be estimated for the next step on the basis of the estimated weights, so that the dispersion of the future particles over the grid can be assessed and impoverishment phenomena can be predicted. Such a practice paves the way for the involvement of stochastic control theory [12] (leading to control of asymptotic variability [22]) in the matter of the avoidance of impoverishment.
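The one-step-ahead occupancy of the grid cells can then be approximated from the estimated weights: under multinomial resampling, the expected number of particles landing in cell $j$ is $N$ times the total estimated weight of the particles currently assigned to cell $j$. A minimal sketch of this count (the cell-assignment input is assumed precomputed; function name ours):

```python
import numpy as np

def expected_cell_counts(weights, cells, d):
    """Expected particle counts per grid cell after multinomial resampling:
    E[N_j] = N * (sum of the estimated weights of particles in cell j)."""
    w = np.asarray(weights, dtype=float)
    counts = np.zeros(d)
    for wi, ci in zip(w, cells):
        counts[ci] += wi          # accumulate cell-wise weight mass
    return len(w) * counts
```

A heavily concentrated expected count in a single cell flags upcoming impoverishment one step before it materializes.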

#### **4. Simulating Example**

#### *4.1. Contribution to the Missing Data Case*

A simulation example is presented in this section to emphasize the benefits of the proposed algorithm. The proposed method is compared with the typical PF algorithm, when the complete dataset is available, and with the Multiple Imputations Particle Filter (MIPF) for $n = 5$ imputations [8]. The reduction step proposed in [23] is incorporated into the initial MIPF algorithm so that the best possible results are achieved. The data simulation is based on the state-space model of Equations (1) and (2), with two-dimensional vectors

$$\mathbf{x}\_t = \begin{bmatrix} x\_{1,t} \\\\ x\_{2,t} \end{bmatrix} = \begin{bmatrix} \cos(x\_{1,t-1} - x\_{1,t-1}/x\_{2,t-1}) \\\\ \cos(x\_{2,t-1} - x\_{2,t-1}/x\_{1,t-1}) \end{bmatrix} + \begin{bmatrix} v\_{1,t} \\\\ v\_{2,t} \end{bmatrix} \tag{9}$$

$$y\_t = \mathbf{x}\_t + u\_t,$$

where $v_t = \begin{bmatrix} v_{1,t} \\ v_{2,t} \end{bmatrix} \sim N(\mu, S_1)$, $\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $S_1 = \begin{bmatrix} 0.05 & 0 \\ 0 & 0.05 \end{bmatrix}$, $u_t \sim N(\mu, S_2)$, $S_2 = \begin{bmatrix} 0.03 & 0 \\ 0 & 0.03 \end{bmatrix}$, and $N$ symbolizes the Gaussian distribution. Let $x_0 = \begin{bmatrix} 1 \\ 0.5 \end{bmatrix}$ be considered known. Next, concerning missing data, we let $R_{t,j} \sim Bernoulli(0.15)$, $j = 1, 2$; that is, the data are missing completely at random. $N = 100$ particles have been used for every filter. The distributions of the noises are also considered known. The weighted mean is used as a point estimator of a hidden state, and the missing observational errors are substituted by their expected values. All the filters have been repeated 100 times, and their performance concerning precision and consumed time has been recorded. (The code was written in the R project [24]. The packages *mvtnorm* [25], with its corresponding reference book [26], *ggplot2* [27], and *ggforce* [28] were also used. Simulations were performed on an AMD A8-7600 3.10 GHz processor with 8 GB of RAM.)
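The data-generating setup can be reproduced along the following lines (a sketch, not the original R code, assuming model (9) with $A = I$, $c = 0$, the stated noise covariances, and NaN marking the MCAR-masked observation components):

```python
import numpy as np

def simulate(T, x0, rng, q=0.05, r=0.03, p_miss=0.15):
    """Simulate model (9): componentwise cosine dynamics with additive
    Gaussian noise, y_t = x_t + u_t, and Bernoulli(0.15) MCAR masking."""
    x = np.array(x0, dtype=float)
    xs, ys, masks = [], [], []
    for _ in range(T):
        f = np.array([np.cos(x[0] - x[0] / x[1]),
                      np.cos(x[1] - x[1] / x[0])])
        x = f + rng.normal(0.0, np.sqrt(q), size=2)   # state equation (9)
        y = x + rng.normal(0.0, np.sqrt(r), size=2)   # observation equation
        miss = rng.random(2) < p_miss                 # R_{t,j} ~ Bernoulli(0.15)
        xs.append(x)
        ys.append(np.where(miss, np.nan, y))          # hide masked components
        masks.append(miss)
    return np.array(xs), np.array(ys), np.array(masks)
```

Note that the covariance entries 0.05 and 0.03 are variances, so the per-component noise standard deviations are their square roots.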

The results of the three methods are shown in Table 1. In the first two columns, the means over the simulations of the Root-Mean-Square Errors (RMSE) of the estimators (weighted means) for each component of the hidden states are presented. The mean of the two aforementioned columns is also calculated, as well as the mean time consumed by each approach. The table shows that the weight estimation with the suggested method outperforms MIPF concerning both precision and time elapsed: the precision of the suggested method slightly supersedes that of MIPF, while the mean required computational time is about 50% less than the corresponding mean time required for MIPF. The proposed method is also compared with the results of the standard PF algorithm, for which all observations are available; even if the precision is inevitably reduced in the case of missing data, the computational time remains nearly the same, and the small differentiation in the mean elapsed time is probably connected with the resampling decision. That is, in this example, the computational cost of the suggested method reaches the levels of the basic filter (which is practically infeasible in the missing data case). In Figures 1 and 2, the performances of the proposed method and MIPF are depicted for the two components of the state process, respectively, for one iteration of each filter. The estimators (weighted means) of the two approaches are close to each other, tracking the hidden vector satisfactorily. Therefore, in this example, the suggested method appears to provide the best option among the available ones in the missing data case.

**Table 1.** Comparison of the results over three methods: the basic PF algorithm, when all observations are available; the weight estimation method, which is proposed in this study; and MIPF for *n* = 5 imputations. The methods are compared through the mean of RMSE and the time consumed over the 100 repeated implementations.


**Figure 1.** Time-series of the hidden values, the observations, and the corresponding point estimations of the proposed method and MIPF imputations for the first component *x*1,*<sup>t</sup>* of the state process.

**Figure 2.** Time-series of the hidden values, the observations, and the corresponding point estimations of the proposed method and MIPF imputations for the second component *x*2,*<sup>t</sup>* of the state process.

#### *4.2. Contribution to Impoverishment Prediction*

As far as the estimation of the particle distribution one step ahead is concerned, an application for the transition of the particles from time point $t = 9$ to $t = 10$ is presented during one implementation of the suggested PF with single imputation for missing values on the available dataset. In the time interval $(0, 10]$, only the first component of observation $y_6$ is unavailable. At the end of time step $t = 9$, resampling is implemented, and the histograms of the particle sets are exhibited in Figure 3. The sample mean of the particles is $m = [m_1 = 0.947 \;\; m_2 = 1.07]^T$, and the standard deviations of the corresponding components are $s_1 = 0.126$ and $s_2 = 0.127$. According to Equation (4) and the given parameters of the problem, the random factor that needs to be estimated for every particle at the next time step is

$$z_t^i = Av_{t+1} - Av_{t+1}^i + u_{t+1} \sim N(0, S_z), \quad S_z = 2S_1 + S_2 = \begin{pmatrix} \sigma_z^2 & 0\\ 0 & \sigma_z^2 \end{pmatrix},$$

where *σ<sup>z</sup>* = 0.36. Thus, in both dimensions, the following partitions are considered,

$$\Pi_j = \left\{ (-\infty,\, m_j - \sigma_z/2),\; [m_j - \sigma_z/2,\, m_j + \sigma_z/2),\; [m_j + \sigma_z/2,\, +\infty) \right\}, \quad j = 1, 2,$$

so that a grid of nine cells is configured over the two dimensions. The frequency table (Table 2) exhibits the particle distribution over the grid.
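The construction of the 3 × 3 grid and the frequency table can be sketched as follows; the function name is illustrative, and the particle sample is simulated from the quoted sample moments only for demonstration.

```python
import numpy as np

def grid_frequencies(particles, m, sigma_z):
    """Classify 2-D particles into the 3x3 grid induced by the partitions
    Pi_j = {(-inf, m_j - s/2), [m_j - s/2, m_j + s/2), [m_j + s/2, +inf)}.

    particles: array of shape (N, 2); m: component means; sigma_z: noise s.d.
    Returns a 3x3 frequency table (rows: dim 1 cells, cols: dim 2 cells).
    """
    particles = np.asarray(particles, dtype=float)
    idx1 = np.digitize(particles[:, 0], [m[0] - sigma_z / 2, m[0] + sigma_z / 2])
    idx2 = np.digitize(particles[:, 1], [m[1] - sigma_z / 2, m[1] + sigma_z / 2])
    freq = np.zeros((3, 3), dtype=int)
    for i, j in zip(idx1, idx2):  # each index is 0, 1 or 2
        freq[i, j] += 1
    return freq

# demonstration with the example's values m = [0.947, 1.07], sigma_z = 0.36
rng = np.random.default_rng(0)
parts = rng.normal([0.947, 1.07], [0.126, 0.127], size=(100, 2))
table = grid_frequencies(parts, [0.947, 1.07], 0.36)
print(table.sum())  # -> 100
```

With the actual particle set of the implementation, this routine reproduces Table 2.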

(**a**) Histogram of the particle sample (*i* = 1, ... , *N*) for the posterior estimation of the first component *x*1,9.

(**c**) Joint histogram of the particle sample (*i* = 1, ... , *N*) for the posterior estimation of both components of the whole hidden vector *x*9. Red lines delimit the suggested grid cells. The red diamond stands for the hidden state.

**Figure 3.** Histograms of particle samples for the posterior estimation of hidden variable *x*9.

The selected time period was chosen because a considerable number of preceding steps permits a relatively good adaptation of the particle samples to the hidden states, while the samples have not yet collapsed to a tiny neighborhood around a single point (utter impoverishment). This is evident in Figure 3c and Table 2, where the distribution of the particles is presented in connection with the hidden state and the suggested grid. The produced particles are both close to the hidden state, as most of them lie less than one standard deviation *σz* from it, and sparse enough for particles to exist outside the central cell of the grid. Thus, the condition of the sample at these time points configures a typical example of filter implementation before its collapse. Such time points may be suitable starting points for the introduction of control (which exceeds the limits of this study), as the sample is in good condition concerning both impoverishment and accuracy of hidden variable estimation.


**Table 2.** Frequency table of the particle distribution over the suggested grid at the end of *t* = 9.

For the next time step, a prior estimation of hidden vector *x*<sub>10</sub> is implemented. For the formation of the new grid, the existing particles are moved according to the deterministic part of Equation (9), resulting in *x*10(−) = [0.993 0.984]<sup>T</sup>, the mean of the new particle set. This quantity constitutes a prior point estimator of the hidden state. Thus, the grid of *t* = 9 is shifted by *x*10(−) − *m* to a new grid, as shown in Table 3, the central cell of which is

$$\begin{aligned} \left[ x_{10}(-)[1] - 0.18,\, x_{10}(-)[1] + 0.18 \right) \times \left[ x_{10}(-)[2] - 0.18,\, x_{10}(-)[2] + 0.18 \right) = \\ [0.813, 1.17) \times [0.804, 1.16). \end{aligned}$$
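The shift of the grid and the resulting central cell can be reproduced in a few lines, using the numbers quoted above (the half-width 0.18 is *σz*/2):

```python
import numpy as np

# Prior point estimator x10(-): the mean of the deterministically moved
# particles, as quoted in the example.
x10_minus = np.array([0.993, 0.984])
half = 0.36 / 2  # sigma_z / 2 = 0.18, half-width of the central cell

# The central cell of the shifted grid in each dimension: [x - 0.18, x + 0.18)
central_cell = [(x - half, x + half) for x in x10_minus]
print([(round(a, 3), round(b, 3)) for a, b in central_cell])
# -> [(0.813, 1.173), (0.804, 1.164)]
```

Rounded to two decimals these are the intervals [0.813, 1.17) × [0.804, 1.16) of the text.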

The movement of all the particles according to the deterministic part of Equation (9) results in the frequency table in Table 3, where it is shown that all the new particles belong to the central cell. Even though the particles are identically distributed, once the process noise is added to the particles, the probabilities for a particle to move from the central cell to another cell differ from particle to particle, as the particles initially have different distances from the grid lines. This fact is in contrast to the theoretical background of MS, according to which population members share a common transition probability matrix *P* for their movements during a time step. For this reason, the probability of a particle moving to a cell after the addition of the random noise is approximated by the probability of the point estimation *x*10(−) moving to that cell after the addition of noise. These probabilities (rounded values) are provided in Table 4. Thus, the expected numbers of particles over the grid cells are

$$N \ast P = 100 \ast \begin{pmatrix} 0.044 & 0.122 & 0.044 \\ 0.122 & 0.335 & 0.122 \\ 0.044 & 0.122 & 0.044 \end{pmatrix}$$

and the expected distribution of the particles over the grid is presented in Table 5. Concerning the expected posterior distribution of the particles, the expected observational errors are zero, so the particle weights are expected to remain the same. Thus, no further change is expected in their distribution over the cells, even if resampling takes place, as all weights became equal after resampling in the previous time step.

**Remark 3.** *The expected values of the observational errors are zero. Nevertheless, the prior estimation of their distribution according to relation (4) and the model parameters, where the variances of the errors appear, evinces the increased uncertainty about them, as σ*<sub>*u*</sub><sup>2</sup> = 0.03 *while σ*<sub>*z*</sub><sup>2</sup> = 0.13*.*


**Table 3.** Frequency table of the particle distribution over the suggested grid at *t* = 10 when the particles *x*<sup>*i*</sup><sub>9</sub>, *i* = 1, . . . , 100, move only according to the deterministic part of Equation (9).

**Table 4.** Transition probability table for *x*10(−) to move with the addition of process noise *v*<sup>10</sup> .


**Table 5.** Frequency table for the expected numbers of the particles over the grid cells after the addition of process noise realizations at *t* = 10.


The results of the implementation of PF at time step *t* = 10 are also exhibited. Resampling has taken place at this time step as well. The joint histogram of the posterior sample over both dimensions (Figure 4) indicates that the majority of the particles do not belong to the central cell. This fact is reasonable, as the side length of the central cell equals only one standard deviation *σz*, so that the prior probabilities for the particles to be placed outside the central cell at this time point are considerably large, according to the Empirical Rule (68-95-99.7) for the normal distribution. To relate these results to this rule, orange squares are drawn in Figure 4 delimiting the corresponding areas of the rule for each separate dimension, while the orange circles are the corresponding standard deviation circles (and not ellipses, in general, as the two components have the same variance *σ*<sub>*z*</sub><sup>2</sup> = 0.13) of the whole vector. Thus, questions on the suitability of the proposed grid structure are raised for future study. Nevertheless, it should be mentioned that a grid with a central cell of double side length would have classified all particles into the central cell during time step *t* = 9, rendering further study on the issue meaningless. Additionally, a new grid of nine cells is also constructed around the mean of this posterior particle set, the central cell of which also has side length *σz*. The distribution of the particles in the new grid is quoted in Table 6. In comparison with Table 2, the number of particles in the central cell is increased in Table 6.


**Table 6.** Frequency table of the particle distribution over the grid around the mean of the sample in the end of *t* = 10.

**Figure 4.** Joint histogram of the particle sample (*i* = 1, ... , *N*) for the posterior estimation of both components of the whole hidden vector *x*10. Red lines delimit the suggested grid cells of Table 3. The red diamond stands for the hidden state. The red star stands for the observation at this time point. The sides of the two squares are correspondingly one and two standard deviations *σz* from the center of the diagram. The circles are inscribed in the corresponding squares.

**Remark 4.** *In the present example, the transitions of the particles according to the deterministic function led all particles to a single cell (Table 3), so that the result of the addition of process noise was handled as a result of a multinomial trial. In the case that the deterministic function leads the particles to more than one cell, then it is suggested that different means be found for each cell as well as corresponding transition probabilities, so that the final result can be considered the sum of results of multinomial trials for the transitions to every cell.*
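Remark 4 can be illustrated with a short sketch: one multinomial trial is drawn per occupied cell, and the results are summed. The cell counts and transition rows below are hypothetical, chosen only to demonstrate the bookkeeping.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical situation after the deterministic step: two occupied cells,
# each with its own transition probability row (taken at the cell's mean).
counts = {0: 60, 1: 40}
P_rows = {0: [0.2, 0.6, 0.2],
          1: [0.1, 0.3, 0.6]}

expected = np.zeros(3)
realized = np.zeros(3, dtype=int)
for cell, n in counts.items():
    p = np.array(P_rows[cell])
    expected += n * p                  # expected numbers per target cell
    realized += rng.multinomial(n, p)  # one multinomial trial for this cell

print(expected)  # -> [16. 48. 36.]
print(realized.sum())  # -> 100
```

The expected numbers are the sum of the per-cell contributions *n·p*, in line with the remark, while a realization is the sum of independent multinomial draws.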

#### **5. Discussion and Conclusions**

In this study, single substitution (in contrast to MIPF) of observational errors is proposed for missing data cases, when PF is implemented and the MAR assumption is adopted. This method is a single imputation procedure. Acuña et al. [23] argued against single imputation, as it is rather simplistic and cannot attribute to a single value the distributional characteristics that can be approached and described by a sample of multiple imputations. Nevertheless, the primary target of the proposed technique is the minimization of the computational cost that is added to the initial PF algorithm in the case of missing data. For this purpose, the interventions in the PF algorithm are slight. Moreover, in the provided simulation, the suggested method outperforms the multiple imputation approach even for a considerable number of imputations, whereas Acuña et al. [23] noticed that MIPF with *n* = 3, 4, 5 imputations produces very satisfactory results, according to the approximation of the efficiency of the multiple imputation estimator provided by Rubin [29]. As a result, in this example, the estimation of observational errors performs better with respect to both the computational time it requires and the precision it achieves. Besides, knowledge of the distribution of the observational errors makes it possible to quantify the uncertainty of the point estimations. Thus, the suggested method takes advantage of the low computational cost of the single imputation option, while more general distributional characteristics of the observational noise can also be taken into account at the same time (see Remark 3).

The contribution of such a method to coping with impoverishment problems is also worth mentioning. The method permits the estimation of observational errors and their corresponding weights one time step ahead. To the best of the authors' knowledge, the evolution of the weight distribution has not yet been a priori estimated for multiple time steps, but this is feasible at least for one step ahead. As the weights of the next step can be estimated, the probabilities that a particle will be chosen at the resampling step can also be estimated. As explained in Section 3.1, the assessment of the weight distribution for the forthcoming time steps could be very interesting, insofar as it is connected with impoverishment issues. Concerning future perspectives on this issue, the study of impoverishment problems can draw on: input control [12], in order for the impoverishment to be controlled; laws of large numbers [30], as the MC approximation employs large samples; state capacity restrictions [31], for the existence of a population limit at every grid cell; the literature on the evolution of attainable structures [32]; and the evolution of the distribution of particles [33] or of the corresponding moments [34] in the direction of HMSs and NHMSs, for the estimation of the future behaviour of the sample, possibly reaching continuous-time models [35]. Research on automatic optimal control [36] could be combined with the suggested methodology, possibly leading to interesting joint applications of PF [37] along with artificial intelligence [38]. The performance of the method could also be tested when data are missing for longer time periods [39], while more sophisticated grid structures could also be examined [18].
Correspondingly, in a broader sense, the main idea of the proposed method could be implemented in errors-in-variables signal processing for missing data cases [40], or it could be involved in more complex models that require MC simulation for the prior estimation of variables [41].

**Author Contributions:** Conceptualization, R.L. and G.T.; methodology, R.L. and G.T.; software, R.L.; validation, R.L. and G.T.; formal analysis, R.L. and G.T.; writing—original draft preparation, R.L.; writing—review and editing, G.T.; supervision, G.T.; and funding acquisition, R.L. and G.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by A.G. Leventis foundation (Grant for Doctoral Studies).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to express their gratitude to Panagiotis-Christos Vassiliou and Andreas C. Georgiou for the opportunity to submit to this Special Issue for possible publication. The constructive comments and the valuable suggestions of the three anonymous reviewers as well as the editorial assistance of Bella Chen are highly appreciated.

**Conflicts of Interest:** The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

PF&nbsp;&nbsp;&nbsp;&nbsp;Particle Filter
MC&nbsp;&nbsp;&nbsp;&nbsp;Monte Carlo
MIPF&nbsp;&nbsp;Multiple Imputation Particle Filter
MAR&nbsp;&nbsp;&nbsp;Missing At Random


#### **References**

