Article

Information Geometric Approach to Recursive Update in Nonlinear Filtering

School of Electronic Science and Engineering, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Entropy 2017, 19(2), 54; https://doi.org/10.3390/e19020054
Submission received: 25 November 2016 / Revised: 14 January 2017 / Accepted: 20 January 2017 / Published: 26 January 2017
(This article belongs to the Special Issue Information Geometry II)

Abstract

In this paper, the measurement update stage of nonlinear filtering is considered from the viewpoint of information geometry: the filtered state is treated as an optimization estimate in the parameter space that corresponds to an iteration on a statistical manifold, and a recursive update method is proposed. The method is derived from natural gradient descent on the statistical manifold constructed by the posterior probability density function (PDF) of the state conditional on the measurement. The derivation proceeds from the geometric viewpoint and gives a geometric interpretation of the iterative update. Moreover, the proposed method can be seen as an extension of the Kalman filter and its variants: a single step of the proposed method is identical to the Extended Kalman filter (EKF) in the nonlinear case and to the traditional Kalman filter in the linear case. Benefiting from the natural gradient descent used in the update stage, the proposed method performs better than the existing methods, as shown in the numerical experiments.

1. Introduction

Nonlinear filtering is a significant issue in the field of signal processing, with applications such as target tracking, navigation, and audio signal processing. However, there is no closed-form solution for nonlinear filtering analogous to the Kalman filter for linear Gaussian scenarios. How to deal with nonlinear state propagation or nonlinear measurements is therefore crucial in the research on nonlinear filtering. Various nonlinear filtering algorithms have been proposed and are widely used in practical applications, such as the Extended Kalman filter (EKF) [1], the Unscented Kalman filter (UKF) [2], the Cubature Kalman filter (CKF) [3], and the Particle filter (PF) [4]. Among these methods, the EKF, UKF, and CKF can be classified into the same category, often called KF-type methods, which apply a particular nonlinear approximation so that the common framework of the traditional Kalman filter can still be used. Some relations among these KF-type methods are given in [5]. The EKF uses a Taylor series expansion to exploit the analytical structure of the nonlinear functions, which is called analytical linearization. Similarly, the UKF and CKF exploit the statistical properties of Gaussian variables that undergo nonlinear transformations, which is known as statistical linearization. The PF belongs to another category, which uses Monte Carlo techniques to approximate the PDF. It draws a weighted sample from the posterior PDF and provides an asymptotically exact approximation of the posterior PDF as the sample size tends to infinity [6]. In particular, all of these methods can be unified in the framework of Bayesian filtering [7], in which the optimal estimate of the state is induced by computing the posterior PDF, i.e., the density of the state conditional on the corresponding measurements.
In Bayesian filtering, the procedure can be split into two steps: time propagation and measurement update, which are also called the prediction and correction steps in some articles. In the nonlinear and non-Gaussian case, the state propagation step is relatively simple, as it can be carried out by approximating the first two moments of the state variable that undergoes a transformation along the state transition function. The measurement update is more difficult and challenging. Firstly, the measurement itself is not easy to process under nonlinear and/or non-Gaussian conditions. Secondly, the error introduced by the approximation in state propagation becomes more serious under nonlinear measurements. Thirdly, the measurement update step associates the state with the measurement, and the posterior PDF of the state has to be computed conditional on the noisy measurement. Therefore, the posterior PDF is the key to tackling this issue, and it has attracted extensive attention in the research. Usually, a Gaussian approximation to the posterior PDF is used in the measurement update, because the exact posterior is analytically intractable in the non-Gaussian scenario [8]. Two estimators are most commonly used as criteria in estimating the posterior PDF. One is the linear minimum mean square error (LMMSE) estimator, which approximates the posterior mean and covariance matrix by the estimator and its mean square error matrix, respectively. Based on the LMMSE, Zanetti [9,10] proposed a recursive update method for the measurement update, which overcomes some of the limitations of the EKF. In addition, well-known methods such as the EKF, UKF, and CKF can be classified in this class [11]. The other is the maximum a posteriori (MAP) estimator, which estimates the posterior mean and obtains the covariance matrix by linearizing the measurement function around the MAP estimate. The well-known iterated EKF (IEKF), based on Gauss-Newton optimization [12] or the Levenberg-Marquardt (LM) method [13], falls in this class. Usually, these iterative methods for nonlinear filtering have better performance; Lefebvre [14] has shown that the IEKF outperforms the EKF and the unscented Kalman filter (UKF) in implementing the measurement update of the covariance matrix. Besides, other approaches to analyzing and approximating the posterior PDF have appeared recently, such as using the Kullback-Leibler divergence (KLD) as a metric to analyze how far an approximate posterior PDF is from the true joint posterior PDF of the state conditional on the measurement [15]. This metric can be used to devise new algorithms, such as the iterated posterior linearization filter (IPLF) [8], which can be seen as an approximate recursive KLD minimization procedure.
In particular, when we consider the posterior PDF as parameterized by the estimates themselves, some natural structure emerges and provides a new viewpoint on the estimation. As the posterior PDF is approximated recursively by the MAP estimator, the family of posterior PDFs parameterized by the estimates constitutes a statistical manifold, which is a Riemannian manifold of probability distributions. Thus, a better approximation of the true posterior PDF can be viewed as a search for the optimum on the statistical manifold. Usually, searching along the direction of the conventional gradient reaches the optimum quickly and with good convergence in Euclidean space, but not on a statistical manifold. To make use of the gradient on a statistical manifold, Amari [16,17] proposed natural gradient descent as the search direction, motivated from the perspective of information geometry, and it has been proved that a step along the natural gradient direction is the steepest descent in a Riemannian manifold [18]. Based on this steepest descent direction, natural gradient descent can reach the optimum on the manifold with faster speed and better convergence. Because of these properties, it has become a new tool for analyzing nonlinear statistical problems from an optimization perspective. Further, it provides geometric information, such as Riemannian metrics, distances, curvature, and affine connections [19], which may improve the performance of estimation. To characterize this geometric information, two important elements should be considered: one is regarding the set of PDFs as a statistical manifold, and the other is taking the Fisher information matrix as the metric of the statistical manifold. With the metric defined by information geometry, the natural gradient can be constructed as the product of the inverse of the Fisher metric and the conventional gradient. Since it was proposed, natural gradient descent has worked well in many applications as an alternative to stochastic gradient descent, such as neural networks [20,21], blind separation [22], evolution strategies [23,24], and stochastic distribution control systems [25].
Motivated by the recursive estimation for the nonlinear measurement update in Bayesian filtering, we propose a method that uses natural gradient descent on the statistical manifold constructed by the posterior PDF for the measurement update in nonlinear filtering. After the state propagation step, the predicted state serves as the prior information in the Bayesian framework, and the statistical manifold is constructed from the posterior PDF. We consider the geometric structure of this manifold equipped with the Fisher metric and construct an alternate recursive process on the manifold. Given an initial point on the manifold based on the prior information, the iterative search for the optimum proceeds along the steepest descent direction given by the natural gradient. At each iteration, the algorithm moves from the current estimate to a new estimate along the geodesic in the direction of the natural gradient with a given step size. Our method performs better than the update of the EKF, which can overshoot in some nonlinear cases. In addition, it gives a theoretical justification for the nonlinear measurement update in filtering from the information-geometric viewpoint, together with a mathematical interpretation of the method. Two advantages in the estimation procedure come from the natural gradient descent method. Firstly, based on the fact that the Fisher information matrix (FIM) is the inverse of the Cramer-Rao lower bound (CRLB) for the estimation, natural gradient descent, which uses the FIM as the metric on the statistical manifold, may achieve better estimation performance. Secondly, natural gradient descent is asymptotically Fisher efficient and often converges faster than conventional gradient descent, as shown in [16]. With this better performance and fast convergence, the measurement update with natural gradient descent can be carried out better and faster. Furthermore, based on this measurement update stage, we can construct different nonlinear filtering methods with different state prediction methods.
The paper is organized as follows. In Section 2, we give a description of estimation on the statistical manifold and derive the iterative estimation procedure from the viewpoint of information geometry. The measurement update using natural gradient descent is then deduced in Section 3. The differences between our proposed method and other existing methods are discussed in Section 4. In Section 5, numerical experiments are presented to illustrate the performance compared with other existing methods. Finally, conclusions are drawn in Section 6.

2. Information Geometry and Natural Gradient Descent

Information geometry [17] is a relatively new mathematical tool for studying manifolds of probability distributions. It opens a new perspective on the geometric structure of information theory and provides a new way to deal with existing statistical problems. It has a powerful ability to handle non-Gaussian PDFs, such as Weibull distributions [19,26], gamma distributions [27], and the multivariate generalized Gaussian distribution [28], as well as non-Euclidean settings, for example, statistical manifolds [29] and morphogenetic systems [30]. For optimization problems in statistical signal processing, natural gradient descent, motivated from the perspective of information geometry and the gradient descent method, has become an alternative for recursive estimation on a statistical manifold, with better performance and faster convergence.
In Euclidean space, a recursive estimation method can follow the direction of steepest descent, which is relatively straightforward, but this is no longer the straightforward direction in Riemannian geometry because of the curved coordinate system. In this case, the curvature of the manifold should be considered, and the steepest descent should follow the curvature of the manifold. This is the important factor for estimation on a manifold, and it is taken into account by the natural gradient descent method.
From the viewpoint of information geometry, the initial point and the target point on the Riemannian manifold correspond to the initial and final estimates in the parameter space, respectively. Thus, the estimation of parameters can be converted into a recursive procedure that seeks the optimum on the manifold. With the direction provided by natural gradient descent, the recursion on the Riemannian manifold can obtain the optimal estimate of the parameters.
In a Riemannian manifold, a Riemannian metric $g_p$ is defined at every point $p$; it describes the relationship between the tangent space at $p$ and the tangent spaces of its neighborhood. If the tangent vectors at the point $p$ are expressed in the basis $\{v_p^1, \ldots, v_p^n\}$, then the Riemannian metric can be obtained as $g_p^{ij} = \langle v_p^i, v_p^j \rangle$, where $\langle \cdot, \cdot \rangle$ denotes an inner product defined on the tangent space, as in Euclidean space. With these definitions, the inner product on the tangent space at the point $p$ is induced as $\langle w_p, w_q \rangle_g = w_p^{\mathrm{T}} g\, w_q$, where $w_p \in T_p M$ and $w_q \in T_p M$. Given the basis expansions $w_p = \sum_{i=1}^n \alpha_i v_p^i$ and $w_q = \sum_{j=1}^n \beta_j v_p^j$, their inner product is $\langle w_p, w_q \rangle_g = \sum_{i,j=1}^n g_p^{ij} \alpha_i \beta_j$.
Consider a family of probability distributions $S$ on $\mathbb{R}^d$ parameterized by $n$ real-valued variables $\theta = [\theta^1, \ldots, \theta^n]^{\mathrm{T}}$ as

$$ S = \{\, p_\theta = p(y; \theta) \mid \theta \in \Theta \,\} \tag{1} $$
where $y \in \mathbb{R}^d$ is a random variable and $\Theta$ is an open subset of $\mathbb{R}^n$. The mapping $\varphi : S \to \mathbb{R}^n$ defined by $\varphi(p_\theta) = \theta$ can be viewed as a coordinate system of $S$. With the Fisher information matrix (FIM)

$$ F(\theta) = \int \frac{\partial \log p(y;\theta)}{\partial \theta} \left( \frac{\partial \log p(y;\theta)}{\partial \theta} \right)^{\mathrm{T}} p(y;\theta)\, \mathrm{d}y \tag{2} $$

as the Riemannian metric $G(\theta)$, also termed the Fisher metric in information geometry, $S$ can be considered as a Riemannian manifold. Thus $S$ can be called an $n$-dimensional statistical manifold on $\mathbb{R}^d$, in which the parameter $\theta$ plays the role of the coordinate system for $S$. The elements of the Riemannian metric $G(\theta) = [g_{ij}(\theta)]$ can be written in the following form:

$$ g_{ij}(\theta) = \mathbb{E}\!\left[ \frac{\partial \log p(y;\theta)}{\partial \theta^i} \frac{\partial \log p(y;\theta)}{\partial \theta^j} \right] \tag{3} $$

where $\mathbb{E}$ denotes the expectation with respect to $y$. The Fisher metric is the only invariant metric that can be given to the statistical manifold [31]. Because of the fact that

$$ \mathbb{E}\!\left[ \frac{\partial \log p(y;\theta)}{\partial \theta^i} \right] = \int \frac{\partial \log p(y;\theta)}{\partial \theta^i}\, p(y;\theta)\, \mathrm{d}y = \frac{\partial}{\partial \theta^i} \int p(y;\theta)\, \mathrm{d}y = 0 \tag{4} $$

the FIM can also be expressed in terms of the expectation of the Hessian matrix of the log-likelihood through algebraic manipulations, which may be easier to compute for certain problems:

$$ \mathbb{E}\!\left[ \frac{\partial \log p(y;\theta)}{\partial \theta^i} \frac{\partial \log p(y;\theta)}{\partial \theta^j} \right] = -\mathbb{E}\!\left[ \frac{\partial^2 \log p(y;\theta)}{\partial \theta^i \partial \theta^j} \right] \tag{5} $$
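As a quick numerical illustration of the identity (5) (our own sketch, not part of the paper's derivation), the following Python snippet estimates both sides by Monte Carlo for a scalar Gaussian model $p(y;\theta) = \mathcal{N}(y; \theta, \sigma^2)$ with known $\sigma$, whose exact FIM is $1/\sigma^2$; all names and settings here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 1.5, 0.7           # true mean and known std of p(y; theta)
y = rng.normal(theta, sigma, 100_000)

# Score: d/d(theta) log p(y; theta) = (y - theta) / sigma^2
score = (y - theta) / sigma**2
# Hessian: d^2/d(theta)^2 log p(y; theta) = -1 / sigma^2 (constant in y)
hessian = -np.ones_like(y) / sigma**2

fim_outer = np.mean(score**2)     # left-hand side of Eq. (5)
fim_hessian = -np.mean(hessian)   # right-hand side of Eq. (5)
print(fim_outer, fim_hessian, 1 / sigma**2)  # all approximately 2.04
```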
Consider a smooth mapping $\ell : S \to \mathbb{R}$; then a basis of the tangent space of $S$ can be given as $\{ \partial \ell_\theta / \partial \theta^i \}_{i=1,\ldots,n}$. For example, when the usual log-likelihood function $\log : S \to \mathbb{R}$ is taken as the mapping, the basis of the tangent space of $S$ is $\{ \partial \log p(y;\theta) / \partial \theta^i \}_{i=1,\ldots,n}$. This definition establishes the local coordinate systems of the statistical manifold $S$. Let $P$ and $Q$ be two close points on the statistical manifold $S$ corresponding to the coordinates $\theta_p$ and $\theta_q = \theta_p + \Delta\theta$. Assume that the vector $\overrightarrow{PQ} \in T_P S$ has a fixed length, namely

$$ \| \overrightarrow{PQ} \|^2 = \varepsilon^2 \tag{6} $$

where $\varepsilon$ is a sufficiently small positive constant; a neighborhood ball can be defined according to the constant $\varepsilon$. Then one obtains the relationship in the tangent space

$$ \overrightarrow{PQ} = \varepsilon v \tag{7} $$

where $v = \sum_{i=1}^n a^i \left. \frac{\partial \ell_\theta}{\partial \theta^i} \right|_{\theta = \theta_p} \in T_P S$. This relationship can be considered as a tangent vector of $T_P S$ at the point $P$ from the geometric viewpoint. To normalize the tangent vector, it is subjected to the constraint $\| v \|^2 = 1$. On the statistical manifold, the tangent vector depends on the random variable $y$; writing $a = [a^1, \ldots, a^n]^{\mathrm{T}}$ and taking the expectation in the constraint, we get

$$ \| v \|^2 = a^{\mathrm{T}}\, \mathbb{E}\!\begin{bmatrix} \frac{\partial \ell_\theta}{\partial \theta^1} \frac{\partial \ell_\theta}{\partial \theta^1} & \cdots & \frac{\partial \ell_\theta}{\partial \theta^1} \frac{\partial \ell_\theta}{\partial \theta^n} \\ \vdots & \ddots & \vdots \\ \frac{\partial \ell_\theta}{\partial \theta^n} \frac{\partial \ell_\theta}{\partial \theta^1} & \cdots & \frac{\partial \ell_\theta}{\partial \theta^n} \frac{\partial \ell_\theta}{\partial \theta^n} \end{bmatrix} a = a^{\mathrm{T}} G a = 1 \tag{8} $$

where $G$ is the Fisher metric of the statistical manifold $S$.
Considering two points $P$ and $Q$ in $S$, which are mapped into Euclidean space as $\theta_p$ and $\theta_q$, we obtain the first-order approximation

$$ \ell_{\theta_q} - \ell_{\theta_p} = \ell_{\theta_p + \Delta\theta} - \ell_{\theta_p} \approx \Delta\theta^{\mathrm{T}} \nabla \ell_{\theta_p} \tag{9} $$

where

$$ \nabla \ell_{\theta_p} = \left. \left[ \frac{\partial \ell_\theta}{\partial \theta^1}, \ldots, \frac{\partial \ell_\theta}{\partial \theta^n} \right]^{\mathrm{T}} \right|_{\theta = \theta_p} \tag{10} $$

To simplify the description, the tangent vector $v$ can be rewritten as

$$ v = \sum_{i=1}^n a^i \frac{\partial \ell_\theta}{\partial \theta^i} = a^{\mathrm{T}} \nabla \ell_\theta \tag{11} $$

In the tangent space of the manifold $S$ at the point $P$, the two points satisfy the relationship

$$ \overrightarrow{PQ} = \ell_{\theta_q} - \ell_{\theta_p} = \ell_{\theta_p + \Delta\theta} - \ell_{\theta_p} \tag{12} $$

Combining Equations (7), (9), and (12), we get the relationship between the parameter space and the tangent space of the manifold

$$ \Delta\theta = \varepsilon a \tag{13} $$
To obtain the parameter $a$ in the tangent space under the constraint (8), we use the Lagrange multiplier method with

$$ F(a, \lambda) = \ell_{\theta_p} + \varepsilon a^{\mathrm{T}} \nabla \ell_{\theta_p} + \lambda \left( 1 - a^{\mathrm{T}} G a \right) \tag{14} $$

where $\lambda$ is the Lagrange multiplier. Then we have

$$ \frac{\partial}{\partial a^i} \left\{ \varepsilon a^{\mathrm{T}} \nabla \ell_{\theta_p} + \lambda \left( 1 - a^{\mathrm{T}} G a \right) \right\} = 0 \tag{15} $$

Solving this equation, one obtains

$$ a = \frac{\varepsilon}{2\lambda} G^{-1} \nabla \ell_{\theta_p} \tag{16} $$

Substituting Equation (16) into (8), we obtain

$$ 1 = a^{\mathrm{T}} G a = \left( \frac{\varepsilon}{2\lambda} G^{-1} \nabla \ell_{\theta_p} \right)^{\mathrm{T}} G \left( \frac{\varepsilon}{2\lambda} G^{-1} \nabla \ell_{\theta_p} \right) = \frac{\varepsilon^2}{4\lambda^2}\, \nabla \ell_{\theta_p}^{\mathrm{T}} G^{-1} \nabla \ell_{\theta_p} \tag{17} $$

Since $\nabla \ell_{\theta_p}^{\mathrm{T}} G^{-1} \nabla \ell_{\theta_p}$ is positive, the multiplier $\lambda$ can be computed as

$$ \lambda = \frac{\varepsilon}{2} \sqrt{ \nabla \ell_{\theta_p}^{\mathrm{T}} G^{-1} \nabla \ell_{\theta_p} } \tag{18} $$
Then, the relationship between the points $P$ and $Q$ in the tangent space of the manifold $S$ can be obtained:

$$ \ell_{\theta_p + \Delta\theta} = \ell_{\theta_p} + \varepsilon a^{\mathrm{T}} \nabla \ell_{\theta_p} \tag{19} $$

Simultaneously, the corresponding relationship in the parameter space $\Theta \subset \mathbb{R}^n$ is

$$ \theta_q = \theta_p + \Delta\theta = \theta_p + \varepsilon a = \theta_p + \frac{\varepsilon}{\sqrt{ \nabla \ell_{\theta_p}^{\mathrm{T}} G^{-1} \nabla \ell_{\theta_p} }}\, G^{-1} \nabla \ell_{\theta_p} \tag{20} $$

where $G^{-1}$ is the inverse of the Fisher metric $G$. In particular, $G^{-1} \nabla \ell_\theta$ is called the natural gradient of the mapping $\ell$ on the Riemannian manifold. Compared with the steepest direction $\nabla \ell_\theta$ in Euclidean space, the natural gradient introduces the inverse of the Fisher metric $G$ to account for the curvature of the Riemannian manifold, and it is invariant under the choice of coordinate system.
In the simplified form suggested by Amari in [16], Equation (20) can be written as

$$ \theta_q = \theta_p + \eta\, G^{-1} \nabla \ell_{\theta_p} \tag{21} $$

where the parameter satisfies $0 < \eta \le 1$ and controls the convergence speed.
The geometric interpretation of this method is that the process path moves from the point $P$ to the point $Q$ on the manifold $S$ along the steepest descent direction, while the parameter moves from $\theta_p$ to $\theta_q$ in the parameter space. As the path moves on the manifold $S$, the parameter is estimated recursively. The recursion is

$$ \theta_{t+1} = \theta_t + \eta\, G^{-1} \nabla \ell_{\theta_t} \tag{22} $$

where $t$ indexes the iterations, which correspond to the iterated points on the statistical manifold. This iteration procedure is illustrated in Figure 1.
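To make the recursion (22) concrete, here is a minimal Python sketch of natural gradient ascent for fitting the mean of a Gaussian with known covariance, in which case the Fisher metric is $G = \Sigma^{-1}$; the model choice and function names are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

def natural_gradient_ascent(grad, metric, theta0, eta=0.5, iters=50):
    """Iterate theta <- theta + eta * G(theta)^{-1} grad(theta), Eq. (22)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        theta = theta + eta * np.linalg.solve(metric(theta), grad(theta))
    return theta

# Example: maximize the Gaussian log-likelihood of samples y over the mean.
rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])
y = rng.multivariate_normal([1.0, -2.0], Sigma, size=1000)

grad = lambda th: np.linalg.solve(Sigma, (y - th).mean(axis=0))  # score
metric = lambda th: np.linalg.inv(Sigma)                         # Fisher metric
print(natural_gradient_ascent(grad, metric, theta0=[0.0, 0.0]))  # ~ [1, -2]
```

With the Fisher metric in place, each step rescales the raw gradient by $G^{-1}$, so here the iteration moves a fixed fraction $\eta$ of the way toward the sample mean regardless of how ill-conditioned $\Sigma$ is.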
For the recursive process, we can use the KLD, which is often used to quantify the discrepancy between two probability distributions, as the stopping criterion. It is defined by

$$ D[\theta_p : \theta_q] = \int p(y; \theta_p) \log \frac{p(y; \theta_p)}{p(y; \theta_q)}\, \mathrm{d}y \tag{23} $$

where $p(y; \theta_p)$ and $p(y; \theta_q)$ are the distributions specified by the two parameters $\theta_p$ and $\theta_q$, respectively. When the two probability distributions are infinitesimally close, i.e., $\theta_q = \theta_p + \mathrm{d}\theta$, the KLD between the two nearby distributions can be expanded as

$$ D[\theta_p : \theta_p + \mathrm{d}\theta] = \frac{1}{2}\, \mathrm{d}\theta^{\mathrm{T}} G(\theta)\, \mathrm{d}\theta \tag{24} $$
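As a sanity check on the expansion (24) (our own illustrative example), for two scalar Gaussians $\mathcal{N}(\mu, \sigma^2)$ and $\mathcal{N}(\mu + \mathrm{d}\mu, \sigma^2)$ the closed-form KLD is $\mathrm{d}\mu^2 / (2\sigma^2)$, while the Fisher metric for the mean parameter is $1/\sigma^2$, so the two sides of (24) agree exactly:

```python
mu, sigma, dmu = 0.0, 0.8, 1e-3
# Closed-form KLD between N(mu, sigma^2) and N(mu + dmu, sigma^2)
kld_exact = dmu**2 / (2 * sigma**2)
# Second-order expansion, Eq. (24), with Fisher metric G = 1 / sigma^2
kld_approx = 0.5 * dmu * (1 / sigma**2) * dmu
print(kld_exact, kld_approx)  # identical for a mean-only perturbation
```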
Motivated by the relationship between the statistical manifold and the parameter space, the estimation of the state conditional on the measurement in the state parameter space can be converted into an iteration procedure on the manifold constructed by the posterior PDF. This provides a new alternative for the measurement update in nonlinear filtering based on the Bayesian framework.

3. Filtering Based on Information Geometry

In this section, we present the measurement update based on natural gradient descent. In the Bayesian framework, the predicted state acts as the prior information on the state used to deduce the posterior distribution conditional on the measurements; natural gradient descent is then used to estimate the final state recursively. We consider the usual discrete-time state-space model

$$ x_k = f(x_{k-1}) + w_k \tag{25} $$

$$ y_k = h(x_k) + v_k \tag{26} $$

where $x_k \in \mathbb{R}^n$ and $y_k \in \mathbb{R}^m$ denote the system state and the measurement at instant $k$, respectively, and $f$ and $h$ denote the state transition and measurement functions. $w_k$ and $v_k$ are the noises associated with the state and the measurement, respectively; for simplicity, additive white Gaussian noise (AWGN) is generally studied.
In Bayesian filtering, the time propagation is processed according to the state transition function, and the prediction of the state is obtained in this step. Based on the Bayesian principle, the predicted state is considered as the prior information on the state:

$$ p(\hat{x}_k^-) = p(x_k \mid y^{k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y^{k-1})\, \mathrm{d}x_{k-1} = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{k-1})\, \mathrm{d}x_{k-1} \tag{27} $$

where $y^{k-1} = \{ y_{k-1}, y_{k-2}, \ldots, y_1 \}$. The relationship $p(x_{k-1} \mid y^{k-1}) = p(x_{k-1} \mid y_{k-1})$ means that the current state depends only on the current measurement and is conditionally independent of all past measurements. The PDFs $p(x_k \mid x_{k-1})$ and $p(x_{k-1} \mid y_{k-1})$ denote the state transition PDF and the posterior PDF of the state at the previous time, respectively. To obtain the posterior PDF at the current time conditional on the current measurement, we apply the Bayesian principle. The measurement update in the form of Bayesian filtering is

$$ p(\hat{x}_k^+) = p(x_k \mid y^k) = \frac{p(x_k, y_k \mid y^{k-1})\, p(y^{k-1})}{p(y_k \mid y^{k-1})\, p(y^{k-1})} = \frac{p(y_k \mid x_k, y^{k-1})\, p(x_k \mid y^{k-1})}{\int p(y_k \mid x_k, y^{k-1})\, p(x_k \mid y^{k-1})\, \mathrm{d}x_k} = \frac{p(y_k \mid x_k)\, p(x_k \mid y^{k-1})}{\int p(y_k \mid x_k)\, p(x_k \mid y^{k-1})\, \mathrm{d}x_k} \tag{28} $$

since the likelihood satisfies $p(y_k \mid x_k, y^{k-1}) = p(y_k \mid x_k)$, i.e., the measurement $y_k$ is conditionally independent of all past measurements. The denominator of the posterior PDF is a normalizing constant independent of the state. Thus the posterior PDF can be rewritten as

$$ p(\hat{x}_k^+) = p(x_k \mid y^k) \propto p(y_k \mid x_k)\, p(x_k \mid y^{k-1}) = p(y_k \mid x_k)\, p(\hat{x}_k^-) \tag{29} $$

where $\propto$ means "is proportional to", and $p(\hat{x}_k^-)$ and $p(y_k \mid x_k)$ are the prior and the likelihood, respectively. In this paper, we assume both are Gaussian for simplicity; the Gaussian approximation can be applied to non-Gaussian distributions, but that is beyond the scope of this paper. Based on the state propagation procedure, the prior distribution can be described as

$$ p(\hat{x}_k^-) = \frac{ \exp\!\left\{ -\frac{1}{2} (x_k - \hat{x}_k^-)^{\mathrm{T}} P^{-1} (x_k - \hat{x}_k^-) \right\} }{ \sqrt{ (2\pi)^n |P| } } \tag{30} $$

Usually, the likelihood with respect to the measurement is chosen as

$$ p(y_k \mid x_k) = \frac{ \exp\!\left\{ -\frac{1}{2} (y_k - h(x_k))^{\mathrm{T}} R^{-1} (y_k - h(x_k)) \right\} }{ \sqrt{ (2\pi)^m |R| } } \tag{31} $$

where $|\cdot|$ denotes the determinant, and $n$ and $m$ are the dimensions of the state and the measurement, respectively.
In the measurement update stage, the goal is to obtain the optimal estimate of the state conditional on the corresponding measurement and the prior information on the state. From the Bayesian perspective, this is the maximum of the posterior PDF formed from the prior and the likelihood. This MAP method has been used extensively for estimation in statistical signal processing. From an optimization viewpoint, maximizing $p(\hat{x}_k^+)$ is equivalent to minimizing its negative log-likelihood function. The negative logarithm of the posterior distribution is

$$ L(x_k) = -\log p(\hat{x}_k^+) = \frac{1}{2} (x_k - \hat{x}_k^-)^{\mathrm{T}} P^{-1} (x_k - \hat{x}_k^-) + \frac{1}{2} (y_k - h(x_k))^{\mathrm{T}} R^{-1} (y_k - h(x_k)) + C \tag{32} $$

where $C$ is a constant that does not affect the estimation of the state $x_k$. Defining the negative log-likelihood function

$$ \ell(x_k) = \frac{1}{2} (x_k - \hat{x}_k^-)^{\mathrm{T}} P^{-1} (x_k - \hat{x}_k^-) + \frac{1}{2} (h(x_k) - y_k)^{\mathrm{T}} R^{-1} (h(x_k) - y_k) \tag{33} $$

the objective function can be converted into

$$ \arg\max_{x_k} p(\hat{x}_k^+) = \arg\min_{x_k} \ell(x_k) \tag{34} $$

Based on the posterior PDF, the statistical manifold $S$ can be defined as

$$ S = \{\, p(x_k \mid y_k) \,\} \tag{35} $$

Meanwhile, the negative log function serves as the mapping from $S$ to $\mathbb{R}$, i.e., $\ell : S \to \mathbb{R}$, which constructs the coordinate system.
Let $\ell_1 = \frac{1}{2}(h(x_k) - y_k)^{\mathrm{T}} R^{-1} (h(x_k) - y_k)$ and $\ell_2 = \frac{1}{2}(x_k - \hat{x}_k^-)^{\mathrm{T}} P^{-1} (x_k - \hat{x}_k^-)$ denote the two terms of Equation (33), and let $e_y = h(x_k) - y_k$ and $e_x = x_k - \hat{x}_k^-$ denote the measurement error and the state estimation error, respectively. Computing the first derivatives of $\ell_1$ and $\ell_2$ with respect to $x_k$ and $x_k^{\mathrm{T}}$ gives

$$ \frac{\partial \ell_1}{\partial x_k} = \nabla h^{\mathrm{T}} R^{-1} e_y \tag{36} $$

$$ \frac{\partial \ell_1}{\partial x_k^{\mathrm{T}}} = e_y^{\mathrm{T}} R^{-1} \nabla h \tag{37} $$

$$ \frac{\partial \ell_2}{\partial x_k} = P^{-1} e_x \tag{38} $$

$$ \frac{\partial \ell_2}{\partial x_k^{\mathrm{T}}} = e_x^{\mathrm{T}} P^{-1} \tag{39} $$

where $\nabla h = \frac{\partial h(x_k)}{\partial x_k} \in \mathbb{R}^{m \times n}$ denotes the Jacobian of $h(x_k)$ with respect to $x_k$, and the covariance matrices $R^{-1}$ and $P^{-1}$ are symmetric. Thus the gradient of the negative log-likelihood is

$$ \nabla \ell_{x_k} = \nabla h^{\mathrm{T}} R^{-1} e_y + P^{-1} e_x \tag{40} $$

Then the Fisher metric $G$ of the manifold $S$ can be obtained as

$$ G(x_k) = \mathbb{E}\!\left[ \frac{\partial \ell}{\partial x_k} \frac{\partial \ell}{\partial x_k^{\mathrm{T}}} \right] = \mathbb{E}\!\left[ \frac{\partial^2 \ell}{\partial x_k\, \partial x_k^{\mathrm{T}}} \right] = \mathbb{E}\!\left[ \frac{\partial}{\partial x_k} \left( e_y^{\mathrm{T}} R^{-1} \nabla h + e_x^{\mathrm{T}} P^{-1} \right) \right] = \mathbb{E}\!\left[ e_y^{\mathrm{T}} R^{-1} \nabla^2 h + \nabla h^{\mathrm{T}} R^{-1} \nabla h + P^{-1} \right] = \nabla h^{\mathrm{T}} R^{-1} \nabla h + P^{-1} \tag{41} $$

where $\nabla^2 h = \frac{\partial^2 h}{\partial x_k \partial x_k^{\mathrm{T}}} \in \mathbb{R}^{m \times n \times n}$ and $\mathbb{E}$ denotes the expectation with respect to the measurement $y_k$; the last equality holds because $\mathbb{E}[e_y] = 0$. In the above derivation, the following rule has been used:

$$ D_x \left( f(x)^{\mathrm{T}} A\, g(x) \right) = g(x)^{\mathrm{T}} A^{\mathrm{T}} D_x f(x) + f(x)^{\mathrm{T}} A\, D_x g(x) \tag{42} $$
With the computed gradient and Fisher metric, the natural gradient on the manifold is

$$ G(x_k)^{-1} \nabla \ell_{x_k} = \left( \nabla h^{\mathrm{T}} R^{-1} \nabla h + P^{-1} \right)^{-1} \left( \nabla h^{\mathrm{T}} R^{-1} e_y + P^{-1} e_x \right) \tag{43} $$

Meanwhile, the denominator of Equation (20) can be computed as

$$ \begin{aligned} \nabla \ell(x_k)^{\mathrm{T}}\, G(x_k)^{-1}\, \nabla \ell(x_k) &= \operatorname{tr}\!\left( G(x_k)^{-1} \nabla \ell(x_k)\, \nabla \ell(x_k)^{\mathrm{T}} \right) \\ &= \operatorname{tr}\!\left( (\nabla h^{\mathrm{T}} R^{-1} \nabla h + P^{-1})^{-1} (\nabla h^{\mathrm{T}} R^{-1} e_y + P^{-1} e_x)(\nabla h^{\mathrm{T}} R^{-1} e_y + P^{-1} e_x)^{\mathrm{T}} \right) \\ &= \operatorname{tr}\!\left( (\nabla h^{\mathrm{T}} R^{-1} \nabla h + P^{-1})^{-1} ( \nabla h^{\mathrm{T}} R^{-1} e_y e_y^{\mathrm{T}} R^{-1} \nabla h + \nabla h^{\mathrm{T}} R^{-1} e_y e_x^{\mathrm{T}} P^{-1} + P^{-1} e_x e_y^{\mathrm{T}} R^{-1} \nabla h + P^{-1} e_x e_x^{\mathrm{T}} P^{-1} ) \right) \end{aligned} \tag{44} $$

For simplicity, we substitute the expectation for the value computed in each iteration. Since $e_y$ and $e_x$ are independent, with $\mathbb{E}[e_y e_y^{\mathrm{T}}] = R$ and $\mathbb{E}[e_x e_x^{\mathrm{T}}] = P$, we obtain

$$ \mathbb{E}\!\left[ G(x_k)^{-1} \nabla \ell(x_k)\, \nabla \ell(x_k)^{\mathrm{T}} \right] = \left( \nabla h^{\mathrm{T}} R^{-1} \nabla h + P^{-1} \right)^{-1} \left( \nabla h^{\mathrm{T}} R^{-1} R R^{-1} \nabla h + P^{-1} P P^{-1} \right) = \left( \nabla h^{\mathrm{T}} R^{-1} \nabla h + P^{-1} \right)^{-1} \left( \nabla h^{\mathrm{T}} R^{-1} \nabla h + P^{-1} \right) = I \tag{45} $$

where $I$ is the $n \times n$ identity matrix corresponding to the $n$-dimensional state. Based on natural gradient descent, we can construct the recursive procedure that yields the optimal estimate in the measurement update step:

$$ x_k^t = x_k^{t-1} + \frac{\varepsilon}{\sqrt{ \nabla \ell(x_k^{t-1})^{\mathrm{T}} G^{-1}(x_k^{t-1}) \nabla \ell(x_k^{t-1}) }}\, G^{-1}(x_k^{t-1})\, \nabla \ell(x_k^{t-1}) \approx x_k^{t-1} + \frac{\varepsilon}{\sqrt{\operatorname{tr}(I)}}\, G^{-1}(x_k^{t-1})\, \nabla \ell(x_k^{t-1}) = x_k^{t-1} + \frac{\varepsilon}{\sqrt{n}}\, G^{-1}(x_k^{t-1})\, \nabla \ell(x_k^{t-1}) \tag{46} $$

where $n$ is the dimension of the state. Setting $\eta = \varepsilon / \sqrt{n}$, the natural gradient descent takes the simplified form suggested by Amari. Given the above, the iterative procedure for estimating the state can be constructed as

$$ \begin{aligned} x_k^t &= x_k^{t-1} + \eta\, G^{-1}(x_k^{t-1})\, \nabla \ell(x_k^{t-1}) \\ &= x_k^{t-1} + \eta \left. \left( \nabla h^{\mathrm{T}} R^{-1} \nabla h + P_{t-1}^{-1} \right)^{-1} \left[ \nabla h^{\mathrm{T}} R^{-1} \left( y_k - h(x_k) \right) + P_{t-1}^{-1} \left( x_k^{t-1} - x_k \right) \right] \right|_{x_k = x_k^{t-1}} \\ &= x_k^{t-1} + \eta \left( \nabla h_{t-1}^{\mathrm{T}} R^{-1} \nabla h_{t-1} + P_{t-1}^{-1} \right)^{-1} \nabla h_{t-1}^{\mathrm{T}} R^{-1} \left( y_k - h(x_k^{t-1}) \right) \end{aligned} \tag{47} $$

where $\nabla h_{t-1} = \left. \frac{\partial h(x_k)}{\partial x_k} \right|_{x_k = x_k^{t-1}}$ and $P_{t-1}$ is the covariance matrix after the $(t-1)$-th iteration.
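The iteration (47), together with the stopping rules (48) and (49) introduced below, translates directly into code. The following Python sketch is our own illustrative implementation (the function name, interface, and tolerances are assumptions, not the authors' reference code):

```python
import numpy as np

def ngd_measurement_update(x0, P, y, h, jac_h, R, eta=0.5,
                           max_iter=50, tol=1e-8):
    """Natural-gradient measurement update, Eq. (47).

    x0: predicted (prior) state; P: prior covariance; y: measurement;
    h: measurement function; jac_h: its Jacobian; eta: step size.
    """
    x = np.asarray(x0, dtype=float)
    P_inv = np.linalg.inv(P)
    for _ in range(max_iter):
        H = jac_h(x)                              # relinearize at the iterate
        G = H.T @ np.linalg.solve(R, H) + P_inv   # Fisher metric, Eq. (41)
        step = eta * np.linalg.solve(G, H.T @ np.linalg.solve(R, y - h(x)))
        x = x + step
        if 0.5 * step @ G @ step < tol:           # KLD-style stop, cf. Eq. (48)
            break
    return x, np.linalg.inv(G)

# Scalar sanity check on y = x^5 + v (the example of Section 5.1)
h = lambda x: np.array([x[0] ** 5])
jac_h = lambda x: np.array([[5 * x[0] ** 4]])
x_hat, _ = ngd_measurement_update(
    x0=[2.5], P=np.array([[0.25]]), y=np.array([1024.4]),
    h=h, jac_h=jac_h, R=np.array([[0.01]]))
print(x_hat)  # converges near the true state x = 4
```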
For the iterative procedure, a stopping criterion should be set to determine the final estimate. On the statistical manifold, the KLD is a good criterion for measuring the distance between two PDFs. Because the estimated state corresponds to the posterior PDF, convergence on the statistical manifold also implies a better estimate in the parameter space. Moreover, Morelande [15] has shown that selecting the parameter of the approximate posterior PDF to attain a lower KLD value gives better performance. Thus, we choose the KLD as the stopping criterion in the iterative measurement update on the statistical manifold:

$$ D[x_k^{t-1} : x_k^t] = \frac{1}{2} \left( x_k^t - x_k^{t-1} \right)^{\mathrm{T}} G(x_k^{t-1}) \left( x_k^t - x_k^{t-1} \right) \le \delta \tag{48} $$

Alternatively, from the viewpoint of the parameter space, we can choose a stopping criterion of the following form, as used in the IEKF and other iterated estimation problems:

$$ \| \hat{x}_k^t - \hat{x}_k^{t-1} \|_2^2 \le \gamma \tag{49} $$
Based on this measurement update method, we can construct a nonlinear filter by combining it with an existing method for state propagation. Here, we use the Ensemble Kalman filter (EnKF) for state propagation, which uses Monte Carlo techniques for the integral operations in Bayesian filtering. The ensemble from the previous step has $N_s$ elements:

$$ S_{k-1|k-1} = \{\, \hat{x}_{k-1|k-1}^i,\; i = 1, \ldots, N_s \,\} \tag{50} $$

We obtain the prediction ensemble as

$$ S_{k|k-1} = \{\, \hat{x}_{k|k-1}^i = f(\hat{x}_{k-1|k-1}^i) + w_k^i,\; i = 1, \ldots, N_s \,\} \tag{51} $$

where $w_k^i$ is generated according to the state process noise. Then, the mean and the covariance matrix can be computed as

$$ \bar{x}_{k|k-1} = \frac{1}{N_s} \sum_{i=1}^{N_s} \hat{x}_{k|k-1}^i \tag{52} $$

$$ P_{k|k-1} = \frac{1}{N_s - 1} \sum_{i=1}^{N_s} \left( \hat{x}_{k|k-1}^i - \bar{x}_{k|k-1} \right) \left( \hat{x}_{k|k-1}^i - \bar{x}_{k|k-1} \right)^{\mathrm{T}} \tag{53} $$
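A compact Python sketch of this EnKF prediction step, Eqs. (51)-(53), follows; the function name and interface are our own assumptions:

```python
import numpy as np

def enkf_predict(ensemble, f, Q, rng):
    """Propagate an ensemble through the state model, Eqs. (51)-(53).

    ensemble: (N_s, n) array of states; f: transition function;
    Q: process-noise covariance.  Returns the mean, covariance, and
    propagated ensemble.
    """
    n_s, n = ensemble.shape
    noise = rng.multivariate_normal(np.zeros(n), Q, size=n_s)
    pred = np.array([f(x) for x in ensemble]) + noise     # Eq. (51)
    x_bar = pred.mean(axis=0)                             # Eq. (52)
    dev = pred - x_bar
    P = dev.T @ dev / (n_s - 1)                           # Eq. (53)
    return x_bar, P, pred
```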
In the Bayesian framework, this prediction mean and covariance are incorporated into the procedure as the prior information on the state to drive the measurement update by the natural gradient method. With the stopping criterion of the iterative procedure, we obtain the final estimate of the state. Meanwhile, the iterative estimate converges to its final value because of the convergence of natural gradient descent, and the iterative procedure produces a sequence of intermediate state estimates. Here, we take these values as the ensemble of the measurement update for the next period of the nonlinear filter in the EnKF:

$$ S_{k|k} = \{\, \hat{x}_{k|k}^i,\; i = 1, \ldots, N_s \,\} \tag{54} $$

Thus, the complete nonlinear filtering method, based on state propagation by the EnKF and measurement update by natural gradient descent, is constructed as shown in Algorithm 1.
Algorithm 1 Natural Gradient Descent Method for Measurement Update in the EnKF
Input: measurements $y = \{y_1, y_2, \ldots, y_T\}$, ensemble size $N_s$, parameter $\eta \in (0, 1]$.
Initialize: $x_0$, $P_0$, $\hat{x}_0^i = x_0 + \sqrt{P_0}\, \mathcal{N}(0, 1)$, ensemble $S_{0|0} = \{\hat{x}_0^i,\; i = 1, \ldots, N_s\}$.
For $k = 1 : T$
   % compute the prediction ensemble
   $S_{k|k-1} = \{\hat{x}_{k|k-1}^i = f(\hat{x}_{k-1|k-1}^i) + w_k^i,\; i = 1, \ldots, N_s\}$;
   % compute the mean and covariance of the prediction ensemble
   $\bar{x}_{k|k-1} = \frac{1}{N_s} \sum_{i=1}^{N_s} \hat{x}_{k|k-1}^i$;
   $P_{k|k-1} = \frac{1}{N_s-1} \sum_{i=1}^{N_s} (\hat{x}_{k|k-1}^i - \bar{x}_{k|k-1})(\hat{x}_{k|k-1}^i - \bar{x}_{k|k-1})^{\mathrm{T}}$;
   % natural gradient descent for the measurement update
   $x_{k|k}^0 = \bar{x}_{k|k-1}$, $P_{k|k}^0 = P_{k|k-1}$, $t = 1$;
   For $t = 1 : N_s$
      $H_{t-1} = \left. \frac{\partial h(x_k)}{\partial x_k} \right|_{x_k = x_{k|k}^{t-1}}$;
      $G = H_{t-1}^{\mathrm{T}} R^{-1} H_{t-1} + (P_{k|k}^{t-1})^{-1}$;
      $\nabla \ell = H_{t-1}^{\mathrm{T}} R^{-1} \left( y_k - h(x_{k|k}^{t-1}) \right)$;
      $x_{k|k}^t = x_{k|k}^{t-1} + \eta\, G^{-1} \nabla \ell$;
      $P_{k|k}^t = \frac{1}{N_s-1} \sum_{i=1}^{N_s} (\hat{x}_{k|k-1}^i - x_{k|k}^t)(\hat{x}_{k|k-1}^i - x_{k|k}^t)^{\mathrm{T}}$;
      $D[x_k^{t-1} : x_k^t] = \frac{1}{2} (x_{k|k}^t - x_{k|k}^{t-1})^{\mathrm{T}} G\, (x_{k|k}^t - x_{k|k}^{t-1})$;
   End for
   % select the final estimate
   For $t = 1 : N_s$
      If $D[x_k^{t-1} : x_k^t] \le \delta$ and $\| x_{k|k}^t - x_{k|k}^{t-1} \|_2^2 \le \gamma$
         $\hat{x}_{k|k} = x_{k|k}^t$;
         $P_{k|k} = P_{k|k}^t$;
      End if
   End for
   $S_{k|k} = \{x_{k|k}^t,\; t = 1, \ldots, N_s\}$;
End for
Output: state estimates $x = \{\hat{x}_{k|k},\; k = 1, \ldots, T\}$

4. Comparison with Existing Methods

Using the natural gradient descent method for the measurement update, we can draw some conclusions in comparison with the existing methods. When we choose the parameter $\eta = 1$ and perform a single iteration, the iterative procedure (47) simplifies to

$$ x_k^1 = x_k^0 + \left( \nabla h_0^{\mathrm{T}} R^{-1} \nabla h_0 + P^{-1} \right)^{-1} \nabla h_0^{\mathrm{T}} R^{-1} \left( y_k - h(x_k^0) \right) \tag{55} $$

In this case, the update reduces to that of the traditional Kalman filter and the EKF. When the measurement function is linear, i.e., $y_k = h(x_k) = H x_k$, the Jacobian of $h(x_k)$ is obtained directly as $\frac{\partial h(x_k)}{\partial x_k} = H$ and $\frac{\partial h(x_k)}{\partial x_k^{\mathrm{T}}} = H^{\mathrm{T}}$. For a nonlinear measurement function, the first-order Taylor expansion is used in the same way as in the linear case. The procedure becomes

$$ x_k^1 = x_k^0 + \left( H^{\mathrm{T}} R^{-1} H + P^{-1} \right)^{-1} H^{\mathrm{T}} R^{-1} \left( y_k - h(x_k^0) \right) = x_k^0 + P H^{\mathrm{T}} \left( H P H^{\mathrm{T}} + R \right)^{-1} \left( y_k - h(x_k^0) \right) \tag{56} $$

where the matrix inversion lemma

$$ \left( H^{\mathrm{T}} R^{-1} H + P^{-1} \right)^{-1} H^{\mathrm{T}} R^{-1} = P H^{\mathrm{T}} \left( H P H^{\mathrm{T}} + R \right)^{-1} \tag{57} $$

has been used in the simplification. This is the same as the measurement update in the traditional Kalman filter and the EKF.
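The identity (57) is easy to verify numerically; the following short Python check with random symmetric positive definite matrices is our own illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2
H = rng.normal(size=(m, n))
A = rng.normal(size=(n, n)); P = A @ A.T + n * np.eye(n)  # SPD prior covariance
B = rng.normal(size=(m, m)); R = B @ B.T + m * np.eye(m)  # SPD noise covariance

# Information form: (H^T R^{-1} H + P^{-1})^{-1} H^T R^{-1}
lhs = np.linalg.inv(H.T @ np.linalg.inv(R) @ H + np.linalg.inv(P)) \
      @ H.T @ np.linalg.inv(R)
# Kalman-gain form: P H^T (H P H^T + R)^{-1}
rhs = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
print(np.allclose(lhs, rhs))  # True
```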
Besides, two iterative methods already exist. One is the IEKF [12,32], which is also derived from the MAP method. Its update step has the form

$$ x_k^t = \hat{x}_k^- + P H_{t-1}^{\mathrm{T}} \left( H_{t-1} P H_{t-1}^{\mathrm{T}} + R \right)^{-1} \left[ y_k - h(x_k^{t-1}) - H_{t-1} \left( \hat{x}_k^- - x_k^{t-1} \right) \right] \tag{58} $$

where $H_{t-1} = \left. \frac{\partial h(x_k)}{\partial x_k} \right|_{x_k = x_k^{t-1}}$.
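For side-by-side comparison with the natural-gradient sketch given after Eq. (47), here is the IEKF update (58) in the same illustrative Python style (names and stopping tolerance are our own assumptions):

```python
import numpy as np

def iekf_update(x_prior, P, y, h, jac_h, R, max_iter=20, gamma=1e-8):
    """Iterated EKF measurement update, Eq. (58).

    Each iteration relinearizes around the latest iterate but always
    applies the correction to the prior x_prior.
    """
    x = np.asarray(x_prior, dtype=float)
    for _ in range(max_iter):
        H = jac_h(x)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)        # Kalman gain
        x_new = x_prior + K @ (y - h(x) - H @ (x_prior - x))
        if np.sum((x_new - x) ** 2) < gamma:                # cf. Eq. (49)
            x = x_new
            break
        x = x_new
    return x
```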
Comparing the IEKF with our proposed method, there are several differences. Firstly, in the IEKF the correction in each iteration starts from the prior estimate, while our proposed method starts from the last iterate. Secondly, the parameter $\eta$ of natural gradient descent controls the update step of the state, whereas the IEKF has no parameter to control the step size. Thirdly, the natural gradient descent method uses only the innovation, while the IEKF adds the prediction error between the prior estimate and the current iterate. Furthermore, from the formulation of the IEKF update, its first step is identical to the EKF, and the later steps converge to the true state from the direction of overestimation.
The other iterative method is the RUF [9], which is derived from the LMMSE method. It defines the cross-covariance between the measurement noise and the estimation error at each step; the LMMSE method is then used to deduce the update step. It also uses a fraction of the update, selected as the reciprocal of the number of iteration steps. In comparison with our proposed method, the RUF uses a manually set number of iteration steps, which also influences the update step, and has no criterion for evaluating the estimation performance. In some cases, other criteria may provide guarantees of convergence and performance in the procedure. The detailed differences among these methods are discussed in the numerical examples in the next section.

5. Numerical Examples

In this section, we present three numerical examples to demonstrate the advantages of our proposed method and illustrate the differences between the proposed method and the existing methods. The first, scalar, example provides a good geometric interpretation of the convergence speed and the iteration procedure. The second example, a common scalar test platform for nonlinear filtering, is provided to validate the performance. The third example is a multidimensional filtering problem: tracking a vehicle entering the atmosphere.

5.1. Iteration and Convergence for Measurement Update

Consider a simple scalar nonlinear measurement

$$ y = x^5 + v \tag{59} $$

where $x$ denotes the state and $y$ denotes the measurement with noise $v \sim \mathcal{N}(0, 0.1^2)$. Considering only the measurement update step, we assume that the mean and variance of the predicted state, which serve as the prior information in the Bayesian framework, are $\hat{x} = 2.5$ and $P = 0.5^2$, and the true state is set to $x = 4$. With noisy measuring, the measurement is $y = 1024.4$. We compare our proposed method with the EKF, IEKF, and RUF; the results are shown in Figure 2. The EKF takes only one step in estimating the posterior, while the other methods take more than one. For the IEKF, the results show that its first step is the same as the EKF, as can also be deduced from the expressions of the two methods, but further iterations make the estimate converge toward the true posterior state. In detail, the second and later IEKF iterations start the correction from the same prior state, while the resulting estimate after each iteration moves closer to the true state; over the whole procedure, more IEKF steps yield a more accurate estimate than the EKF. For the RUF, the estimation error is incorporated step by step to refine the estimate; after some iterations, the error is approximately reduced to zero, so the final estimate converges to the true state. For our proposed method, the parameter $\eta$ controls the convergence behavior toward the true state. When $\eta = 0.8$, the first estimation step overshoots the true state, as the EKF does, and the later iterations converge to the true state from the direction opposite to that of the RUF. When $\eta = 0.2$, the iteration path converges in the same direction as the RUF, but at a faster speed. Moreover, we consider the convergence of the natural gradient descent method: we generate the initial state randomly and set $\eta = 0.5$. The results over 100 Monte Carlo runs are shown in Figure 3; the number of iterations needed for natural gradient descent to converge to the true state is always fewer than 30.
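The two step-size regimes described above are easy to reproduce; the following sketch (our own illustration, applying the update of Eq. (47) to this scalar model) prints the natural-gradient iteration path for $\eta = 0.8$ and $\eta = 0.2$:

```python
import numpy as np

def ngd_path(x0, P, y, R, eta, iters=8):
    """Natural-gradient iterates for the scalar model y = x^5 + v."""
    x, path = float(x0), [float(x0)]
    for _ in range(iters):
        H = 5 * x ** 4                         # Jacobian of h(x) = x^5
        G = H ** 2 / R + 1 / P                 # scalar Fisher metric, Eq. (41)
        x += eta * (H / R) * (y - x ** 5) / G  # Eq. (47)
        path.append(x)
    return path

# Prior mean 2.5, prior variance 0.25, measurement 1024.4 (true x = 4)
for eta in (0.8, 0.2):
    print(eta, np.round(ngd_path(2.5, 0.25, 1024.4, 0.01, eta), 3))
# eta = 0.8 first overshoots x = 4; eta = 0.2 approaches it monotonically
```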

5.2. Univariate Nonstationary Growth Model (UNGM)

This model has been used extensively to validate the performance of nonlinear filters because of its high nonlinearity. The model can be formulated as [33]

$$ x_k = \frac{x_{k-1}}{2} + \frac{25 x_{k-1}}{1 + x_{k-1}^2} + 8 \cos(1.2 k) + u_k \tag{60} $$

$$ y_k = \frac{x_k^3}{20} + v_k \tag{61} $$

where $u_k$ and $v_k$ denote the additive white Gaussian noises on the state and the measurement, with $u_k \sim \mathcal{N}(0, Q)$ and $v_k \sim \mathcal{N}(0, R)$. In this experiment, we set the variances as $Q = 10$ and $R = 1$. We assume the initial state distribution $p(x_0) = \mathcal{N}(x_0; 0, 1)$ before the filtering procedure. The true state used in the simulations, over 100 time steps, fluctuates severely, as shown in Figure 4. The state varies frequently, and a first-order Taylor series expansion cannot fit it well over the procedure; thus the EKF performs poorly in this experiment. In addition, such a state makes the nonlinear measurement even more difficult.
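For reference, a minimal Python simulator of the model (60) and (61) under these settings is given below (our own sketch, with the cubic measurement taken from the formula above):

```python
import numpy as np

def simulate_ungm(steps=100, q=10.0, r=1.0, seed=3):
    """Generate a UNGM trajectory and measurements, Eqs. (60)-(61)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0)                # x_0 ~ N(0, 1)
    xs, ys = [], []
    for k in range(1, steps + 1):
        x = (x / 2 + 25 * x / (1 + x**2) + 8 * np.cos(1.2 * k)
             + rng.normal(0.0, np.sqrt(q)))
        y = x**3 / 20 + rng.normal(0.0, np.sqrt(r))
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

states, meas = simulate_ungm()
```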
As the performance metric, we use the root mean square error (RMSE) in this example. Let $\hat{x}_k^i$ and $x_k^i$ denote the estimated state and the true state at time $k$ in the $i$-th run, respectively. Then the RMSE is defined as

$$ \mathrm{RMSE}_k = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left( \hat{x}_k^i - x_k^i \right)^2 } \tag{62} $$

where $M$ is the total number of Monte Carlo runs.
In our experiment, the RMSE is obtained using 100 Monte Carlo runs, averaging over different realizations of the measurements and the true states. The comparison is made between our proposed method, here called the natural gradient descent Kalman filter (NgdKF), and four existing methods, i.e., the EnKF, IEKF, EKF, and RUF. In the EnKF and our proposed method, the ensemble size at each time is 200, and the initial ensemble is generated randomly from $p(x_0)$. In the IEKF procedure, the stopping criterion is $\| \hat{x}_{k|k}^t - \hat{x}_{k|k}^{t-1} \|_2^2 \le 10^{-4}$, while our proposed method uses $\| \hat{x}_{k|k}^t - \hat{x}_{k|k}^{t-1} \|_2^2 \le 10^{-4}$ together with $D[x_k^{t-1} : x_k^t] \le 10^{-5}$. The number of iteration steps in the RUF is set to 200. The parameter $\eta = 0.5$ is selected for the natural gradient descent iterations in our proposed method. The comparison results measured by the RMSE are shown in Figure 5: our proposed method performs much better than the EKF, EnKF, RUF, and IEKF. As noted for Figure 4, the EKF becomes poor because the first-order Taylor series expansion does not fit the measurement function. Comparing the iterative methods, i.e., the IEKF, RUF, and NgdKF, with the non-iterative ones, i.e., the EKF and EnKF, the former class performs better than the latter, a conclusion also mentioned by Lefebvre [14]; the latter, however, is computed faster. In fact, from the update formulation of each method, the iterative methods take more than one step, while the non-iterative ones take only one; in nonlinear filtering, we have to spend more time or more steps to improve the filtering precision. Based on the comparison results, our proposed method has the best performance.

5.3. Tracking of a Vehicle Entering the Atmosphere

In the previous example, we showed the performance on a scalar filtering problem. In this example, we study a multidimensional dynamic system: the target tracking problem of a vehicle entering the atmosphere at high altitude and high speed [34]. It has a continuous state, composed of position ($x_1$ and $x_2$), velocity ($x_3$ and $x_4$), and a drag coefficient ($x_5$), and discrete measurements composed of range ($r_k$) and bearing ($b_k$). This experiment is a good platform for assessing filtering algorithms. Firstly, the dimensions of the state and the measurement differ. Secondly, the noise enters only part of the state propagation. Thirdly, the measurement function is nonlinear. Fourthly, the drag coefficient is a constant affected by noise, which gives a good criterion for measuring filtering performance.
With the continuous state written as the vector $X(t) = [x_1(t), x_2(t), x_3(t), x_4(t), x_5(t)]^{\mathrm{T}} \in \mathbb{R}^5$ and the discrete-time measurement as $Y(k) = [r(k), b(k)]^{\mathrm{T}} \in \mathbb{R}^2$, the state equation of the dynamics is

$$ \dot{X}(t) = F(t) X(t) + W(t) \tag{63} $$

where $\dot{X}(t) = [\dot{x}_1(t), \dot{x}_2(t), \dot{x}_3(t), \dot{x}_4(t), \dot{x}_5(t)]^{\mathrm{T}}$ is the derivative of $X(t)$ and $W(t) = [0, 0, w_1, w_2, w_3]^{\mathrm{T}}$ denotes the random noise. In the above equation, the state propagation matrix is

$$ F(t) = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ G(t) & 0 & D(t) & 0 & 0 \\ 0 & G(t) & 0 & D(t) & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{64} $$

where $G(t) = -\frac{G m_0}{R^3(t)}$ and $D(t) = \beta(t) \exp\!\left\{ \frac{R_0 - R(t)}{H_0} \right\} V(t)$ denote the gravity-related and drag-related force terms, respectively. Here $R(t) = \sqrt{x_1^2(t) + x_2^2(t)}$ is the distance from the center of the Earth, $\beta(t) = \beta_0 \exp(x_5(t))$ is the ballistic coefficient, and $V(t) = \sqrt{x_3^2(t) + x_4^2(t)}$ is the speed. The constant parameters in our experiment are set as $G m_0 = 3.9860 \times 10^5$, $\beta_0 = -0.59783$, $H_0 = 13.406$, and $R_0 = 6374$.

The measurement $Y(k) = [r_k, b_k]^{\mathrm{T}}$ is obtained by a radar located at $[x_{1,r}, x_{2,r}]^{\mathrm{T}} = [6374, 0]^{\mathrm{T}}$ with a frequency of 10 Hz. The measurement equation is
$$ Y(k) = h(X(k)) + u(k) \tag{65} $$

where

$$ h(X(k)) = \begin{bmatrix} \sqrt{ (x_1(t_k) - x_{1,r})^2 + (x_2(t_k) - x_{2,r})^2 } \\[4pt] \tan^{-1}\!\left( \dfrac{x_2(t_k) - x_{2,r}}{x_1(t_k) - x_{1,r}} \right) \end{bmatrix} \tag{66} $$

and $u(k) = [u_{1,k}, u_{2,k}]^{\mathrm{T}}$ is the measurement noise. The parameter $t_k$ is the time instant of the $k$-th measurement.
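A small Python sketch of the range-bearing measurement function (66) and its Jacobian, as required by the natural-gradient update, follows; the helper names are our own, and arctan2 is used as a numerically robust stand-in for the $\tan^{-1}$ in (66):

```python
import numpy as np

RADAR = np.array([6374.0, 0.0])  # radar position [x_{1,r}, x_{2,r}]

def h_radar(X):
    """Range and bearing of the state X = [x1, x2, x3, x4, x5], Eq. (66)."""
    dx, dy = X[0] - RADAR[0], X[1] - RADAR[1]
    return np.array([np.hypot(dx, dy), np.arctan2(dy, dx)])

def jac_h_radar(X):
    """Jacobian of h_radar with respect to the five-dimensional state."""
    dx, dy = X[0] - RADAR[0], X[1] - RADAR[1]
    r2 = dx**2 + dy**2
    r = np.sqrt(r2)
    J = np.zeros((2, 5))
    J[0, :2] = [dx / r, dy / r]        # range derivatives
    J[1, :2] = [-dy / r2, dx / r2]     # bearing derivatives
    return J
```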
In the experiment, the state noise is $w(t) = [w_1(t), w_2(t), w_3(t)]^{\mathrm{T}}$ with $\mathbb{E}[w(t+\tau)\, w(t)^{\mathrm{T}}] = \operatorname{diag}([2.4064 \times 10^{-5},\; 2.4064 \times 10^{-5},\; 0])\, \delta(\tau)$, and the measurement noise covariance is $\mathbb{E}[u(k)\, u(k)^{\mathrm{T}}] = \operatorname{diag}([10^{-6},\; 17^2 \times 10^{-6}])$.
For the continuous-time state propagation, we have to discretize the model into a discrete-time state model. Here the Euler approximation is used:

$$ X(k+1) = X(k) + \tau F(k) X(k) + W(k) \tag{67} $$

where $k$ and $\tau$ are the time instant and the time interval corresponding to the measurement times, respectively. With this discrete dynamic model, the usual filtering methods can be applied. The initial condition of the state for filtering is set as $X(0) = [6500.4,\; 349.14,\; -1.8093,\; -6.7967,\; 0]^{\mathrm{T}}$ and $P(0) = \operatorname{diag}([10^{-6}, 10^{-6}, 10^{-6}, 10^{-6}, 1])$.
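An illustrative Python sketch of one Euler propagation step (67), using the force terms of Eq. (64) and the constants above (the noise discretization $\sqrt{q\tau}$ is our own assumption), is:

```python
import numpy as np

GM0, BETA0, H0, R0 = 3.9860e5, -0.59783, 13.406, 6374.0

def euler_step(X, tau, rng, q=2.4064e-5):
    """One Euler step of the reentry dynamics, Eqs. (63), (64), and (67)."""
    R = np.hypot(X[0], X[1])                  # distance from Earth's center
    V = np.hypot(X[2], X[3])                  # speed
    G = -GM0 / R**3                           # gravity-related term
    D = BETA0 * np.exp(X[4]) * np.exp((R0 - R) / H0) * V  # drag-related term
    Xdot = np.array([X[2], X[3],
                     D * X[2] + G * X[0],
                     D * X[3] + G * X[1],
                     0.0])
    noise = np.concatenate([np.zeros(2),
                            rng.normal(0.0, np.sqrt(q * tau), 2), [0.0]])
    return X + tau * Xdot + noise

X = np.array([6500.4, 349.14, -1.8093, -6.7967, 0.0])
X = euler_step(X, tau=0.1, rng=np.random.default_rng(4))
```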
In our experiment, we compare four methods, i.e., the RUF, IEKF, EnKF, and our proposed method (NgdKF), with time interval $\tau = 0.1$, $K = 100$ time steps, and $M = 100$ Monte Carlo runs. For the EnKF, the ensemble size is chosen as $N_s = 200$. In our proposed method, the parameters are $\eta = 0.1$, $\delta = 10^{-2}$, and $\gamma = 10^{-2}$. The position root mean square error (RMSE), velocity RMSE, and drag coefficient RMSE are used to measure the performance. They are defined as
$$ \mathrm{RMSE}_k^{\mathrm{pos}} = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left[ \left( x_1^i(k) - \hat{x}_1^i(k) \right)^2 + \left( x_2^i(k) - \hat{x}_2^i(k) \right)^2 \right] } \tag{68} $$

$$ \mathrm{RMSE}_k^{\mathrm{vel}} = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left[ \left( x_3^i(k) - \hat{x}_3^i(k) \right)^2 + \left( x_4^i(k) - \hat{x}_4^i(k) \right)^2 \right] } \tag{69} $$

$$ \mathrm{RMSE}_k^{\mathrm{dra}} = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left( x_5^i(k) - \hat{x}_5^i(k) \right)^2 } \tag{70} $$

where $x_1^i(k), \ldots, x_5^i(k)$ and $\hat{x}_1^i(k), \ldots, \hat{x}_5^i(k)$ are the true state and the filtered state at time instant $k$ in the $i$-th simulation, respectively.
The results are shown in Figure 6, Figure 7, and Figure 8. Figure 6 compares the position RMSE. The performance can be divided into two levels: the better level consists of the IEKF and our proposed method, and the poorer level contains the RUF and the EnKF. In Figure 7, which compares the velocity RMSE, our proposed method and the IEKF perform better than the RUF and the EnKF, while the EnKF performs better than the RUF. For the estimation of the drag coefficient, our proposed method has the same performance as the IEKF, both better than the other methods, and the RUF performs better than the EnKF.
In this experiment, the positions are affected directly by the measurement but carry no process noise. Thus, the better a method handles the nonlinear measurement, the better its position estimate; comparing the position estimates, we can conclude intuitively that our proposed method deals well with the nonlinear measurement. The estimation of the drag coefficient is affected primarily by the noise, which stays at the same level throughout, so the performance in estimating the drag coefficient reflects the robustness of the filter. Accordingly, the robustness of our proposed method and of the IEKF is demonstrated in the experiment. Comparing the three RMSEs, our proposed method has the same performance as the IEKF. The reason is that both methods converge to a stable estimate, and the iteration procedures bring the estimate close to the true state; this also verifies the convergence of our proposed method. From Figure 8, we can see the poor performance of the EnKF, which may be caused by the limited number of ensemble members used in the filtering. Besides, the moderate performance of the RUF may indicate that convergence was not reached in its procedure. In all, our proposed method converges to a stable estimate close to the true state; its performance equals that of the IEKF in this multidimensional problem and exceeds that of the other methods.

6. Conclusions

In this paper, we have derived a new method for the measurement update in nonlinear filtering. In our proposed method, the measurement update is considered as an optimization estimation problem from the viewpoint of information geometry, and the estimation in the parameter space corresponds to an iteration procedure on a statistical manifold. We constructed the statistical manifold from the posterior PDF of the state conditional on the measurement in the Bayesian framework. On this statistical manifold, the natural gradient descent method is used for the optimization. In the iteration procedure, the Fisher information matrix shapes the iteration direction and convergence speed; thus, the method achieves better performance and faster convergence. In addition, our proposed method gives a theoretical justification for the measurement update in nonlinear filtering from the information-geometric viewpoint, together with a mathematical interpretation. We also showed that under certain conditions our proposed method is identical to the existing Kalman filter and its variants. In the derivation of our method, the Bayesian framework is the bridge connecting traditional filtering problems with the statistical manifold, which is what allows the information-geometric approach to be used in nonlinear filtering. The Bayesian framework also provides a unifying framework for the dynamical model regardless of discrete-time or continuous-time state functions, which means that the information-geometric approach to nonlinear filtering is not limited to either discrete-time or continuous-time dynamical models. Because of the ability of information geometry to handle non-Gaussian PDFs, filtering algorithms can also be induced under non-Gaussian PDFs or noise. Furthermore, our method only considers the measurement update stage of nonlinear filtering, so other nonlinear filters can be constructed by combining it with different state propagation methods for particular filtering problems. Reviewing our method, several points merit further study. Firstly, the step-size parameter η could be selected adaptively. Secondly, the covariance matrix of the prior state estimate may be obtained by other approaches. Apart from the covariance matrix in our method, which is computed from the ensemble, there are two other approaches for estimating the covariance matrix: in some articles, the mean of the estimate is determined by maximum likelihood, while the covariance matrix is approximated by the inverse of the Fisher information matrix [35]; the other approach estimates the mean and the covariance simultaneously, as in evolutionary algorithms [36]. The different approaches have different effects on the filtering; their influence will be analyzed and the suitable approach selected. Thirdly, the number of steps required for the convergence of our method will be considered in future work; a proper number of steps can guarantee convergence while spending less computation in the iteration procedure.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grants No. 61302149 and 61601479. The authors gratefully acknowledge the reviewers for their very valuable and insightful comments and suggestions, which have improved the presentation.

Author Contributions

Yubo Li put forward the original ideas and performed the research. Yongqiang Cheng and Xiang Li conceived and designed the simulations comparing with other existing methods. Xiaoqiang Hua contributed to the discussion of the natural gradient descent method. Yuliang Qin reviewed the paper and provided useful comments. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Haug, A. Bayesian Estimation and Tracking: A Practical Guide; John Wiley & Sons: Hoboken, NJ, USA, 2012. [Google Scholar]
  2. Julier, S.; Uhlmann, J. Unscented filtering and nonlinear estimation. Proc. IEEE 2004, 92, 401–422. [Google Scholar] [CrossRef]
  3. Arasaratnam, I.; Haykin, S. Cubature Kalman filters. IEEE Trans. Autom. Control 2009, 54, 1254–1269. [Google Scholar] [CrossRef]
  4. Arulampalam, M.; Maskell, S.; Gordon, N. A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Trans. Signal Process. 2002, 50, 174–188. [Google Scholar] [CrossRef]
  5. Gustafsson, F.; Hendeby, G. Some relations between extended and unscented Kalman filters. IEEE Trans. Signal Process. 2012, 60, 545–555. [Google Scholar] [CrossRef]
  6. Crisan, D.; Doucet, A. A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Process. 2002, 50, 736–746. [Google Scholar] [CrossRef]
  7. Stano, P.; Lendek, Z.; Braaksma, J.; Babuška, R.; Keizer, C.; den Dekker, A. Parametric Bayesian Filters for Nonlinear Stochastic Dynamical Systems: A Survey. IEEE Trans. Cybern. 2013, 43, 1607–1624. [Google Scholar] [CrossRef] [PubMed]
  8. García-Fernández, Á.; Svensson, L.; Morelande, M.; Sarkka, S. Posterior Linearization Filter: Principles and Implementation Using Sigma Points. IEEE Trans. Signal Process. 2015, 63, 5561–5573. [Google Scholar] [CrossRef]
  9. Zanetti, R. Recursive Update Filtering for Nonlinear Estimation. IEEE Trans. Autom. Control 2012, 57, 1481–1490. [Google Scholar] [CrossRef]
  10. Zanetti, R. Adaptable Recursive Update Filter. J. Guid. Control Dyn. 2015, 38, 1295–1299. [Google Scholar] [CrossRef]
  11. Fatemi, M.; Svensson, L.; Hammarstrand, L.; Morelande, M. A study of MAP estimation techniques for nonlinear filtering. In Proceedings of the 2012 15th International Conference on Information Fusion (FUSION), Singapore, 9–12 July 2012; pp. 1058–1065.
  12. Bell, B.; Cathey, F. The iterated Kalman filter update as a Gauss-Newton method. IEEE Trans. Autom. Control 1993, 38, 294–297. [Google Scholar] [CrossRef]
  13. Bellaire, R.; Kamen, E.; Zabin, S. A new nonlinear iterated filter with applications to target tracking. Proc. SPIE Signal Data Process. Small Targets 1995, 2561, 240–251. [Google Scholar]
  14. Lefebvre, T.; Bruyninckx, H.; Schutter, J. Kalman filters for nonlinear systems: A comparison of performance. Int. J. Control 2004, 77, 639–653. [Google Scholar] [CrossRef]
  15. Morelande, M.; García-Fernández, A. Analysis of Kalman filter approximations for nonlinear measurements. IEEE Trans. Signal Process. 2013, 61, 5477–5484. [Google Scholar] [CrossRef]
  16. Amari, S. Natural gradient works efficiently in learning. Neural Comput. 1998, 10, 251–276. [Google Scholar] [CrossRef]
  17. Amari, S.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2007. [Google Scholar]
  18. Raskutti, G.; Mukherjee, S. The Information Geometry of Mirror Descent. IEEE Trans. Inf. Theory 2015, 61, 1451–1457. [Google Scholar] [CrossRef]
  19. Galanis, G.; Chu, P.C.; Kallos, G.; Kuo, Y.H.; Dodson, C.T.J. Wave Height Characteristics in the North Atlantic Ocean: A new approach based on statistical and geometrical techniques. Stoch. Environ. Res. Risk Assess. 2012, 26, 83–103. [Google Scholar] [CrossRef] [Green Version]
  20. Amari, S.; Park, H.; Fukumizu, K. Adaptive method of realizing natural gradient learning formultilayer perceptrons. Neural Comput. 2000, 12, 1399–1409. [Google Scholar] [CrossRef] [PubMed]
  21. Park, H.; Amari, S.; Fukumizu, K. Adaptive natural gradient learning algorithms for various stochastic models. Neural Netw. 2000, 7, 755–764. [Google Scholar] [CrossRef]
  22. Zhang, L.; Cichocki, A.; Amari, S. Natural Gradient Algorithm for Blind Separation of Overdetermined Mixture with Additive Noise. IEEE Signal Process. Lett. 1999, 6, 293–295. [Google Scholar] [CrossRef]
  23. Yi, S.; Wierstra, D.; Schaul, T.; Schmidhuber, J. Stochastic search using the natural gradient. In Proceedings of the 26th International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 1161–1168.
  24. Akimoto, Y.; Nagata, Y.; Ono, I.; Kobayashi, S. Theoretical Foundation for CMA-ES from Information Geometry Perspective. Algorithmica 2012, 64, 698–716. [Google Scholar] [CrossRef]
  25. Zhang, Z.; Sun, H.; Peng, L.; Jiu, L. A Natural Gradient Algorithm for Stochastic Distribution Systems. Entropy 2014, 16, 4338–4352. [Google Scholar] [CrossRef]
  26. Galanis, G.; Famelis, I.; Liakatas, A. A new Kalman Filter based on Information Geometry techniques for optimizing numerical environmental simulations. Stoch. Environ. Res. Risk Assess. 2016. [Google Scholar] [CrossRef]
  27. Cai, Y.; Dodson, C.T.J.; Doig, A.; Wolkenhauer, O. Information-theoretic analysis of protein sequences shows that amino acids self-cluster. J. Theor. Biol. 2002, 218, 409–418. [Google Scholar] [CrossRef]
  28. Verdoolaege, G.; Scheunders, P. On the Geometry of Multivariate Generalized Gaussian Models. J. Math. Imaging Vis. 2012, 43, 180–193. [Google Scholar] [CrossRef]
  29. Harsha, K.V.; Subrahamanian, M.K.S. F-Geometry and Amari’s α-Geometry on a Statistical Manifold. Entropy 2014, 16, 2472–2487. [Google Scholar]
  30. Resconi, G. Geometry of risk analysis (morphogenetic system). Stoch. Environ. Res. Risk Assess. 2009, 23, 425–432. [Google Scholar] [CrossRef]
  31. Amari, S. Differential-Geometrical Methods in Statistics; Lecture Notes in Statistics 28; Springer: Berlin/Heidelberg, Germany, 1985. [Google Scholar]
  32. Sibley, G.; Sukhatme, G.; Matthies, L. The iterated sigma point Kalman filter with applications to long range stereo. Robot. Sci. Syst. 2006, 8, 235–244. [Google Scholar]
  33. García-Fernández, Á.; Morelande, M.; Grajal, J. Truncated unscented Kalman filtering. IEEE Trans. Signal Process. 2012, 60, 3372–3386. [Google Scholar] [CrossRef]
  34. Dunik, J.; Straka, O.; Simandl, M.; Blasch, E. Random-Point-Based Filters: Analysis and Comparison in Target Tracking. IEEE Trans. Aerosp. Electron. Syst. 2015, 51, 1403–1421. [Google Scholar] [CrossRef]
  35. Iguzquiza, E.; Chica-Olmo, M. Geostatistical simulation when the number of experimental data is small: An alternative paradigm. Stoch. Environ. Res. Risk Assess. 2008, 22, 325–337. [Google Scholar] [CrossRef]
  36. Beyer, H. Convergence Analysis of Evolutionary Algorithms that Are Based on the Paradigm of Information Geometry. Evol. Comput. 2014, 22, 679–709. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The iterative procedure for estimation in parameter space can be converted to the recursive estimation in the statistical manifold from the information geometric viewpoint.
Figure 2. Comparison of the iteration procedure.
Figure 3. The convergence of the natural gradient descent method.
Figure 4. True state in the experiment. It varies over short time spans and brings some difficulty to the measurement update.
Figure 5. RMSE comparison. Our proposed method (NgdKF) performs better than the other existing methods.
Figure 6. The position RMSE.
Figure 6. The position RMSE.
Entropy 19 00054 g006
Figure 7. The velocity RMSE.
Figure 7. The velocity RMSE.
Entropy 19 00054 g007
Figure 8. The drag coefficient RMSE.
Figure 8. The drag coefficient RMSE.
Entropy 19 00054 g008
