Article

A Maximum Likelihood Ensemble Filter via a Modified Cholesky Decomposition for Non-Gaussian Data Assimilation

by Elias David Nino-Ruiz 1,*, Alfonso Mancilla-Herrera 1, Santiago Lopez-Restrepo 2,3 and Olga Quintero-Montoya 2
1 Applied Math and Computer Science Laboratory, Department of Computer Science, Universidad del Norte, Barranquilla 080001, Colombia
2 Mathematical Modelling Research Group, Department of Mathematical Sciences, Universidad EAFIT, Medellín 050001, Colombia
3 Delft Institute of Applied Mathematics, Delft University of Technology, 2625 Delft, The Netherlands
* Author to whom correspondence should be addressed.
Sensors 2020, 20(3), 877; https://doi.org/10.3390/s20030877
Submission received: 19 November 2019 / Revised: 24 January 2020 / Accepted: 27 January 2020 / Published: 6 February 2020
(This article belongs to the Section Remote Sensors)

Abstract

This paper proposes an efficient and practical implementation of the Maximum Likelihood Ensemble Filter via a Modified Cholesky decomposition (MLEF-MC). The method works as follows: via an ensemble of model realizations, a well-conditioned and full-rank square-root approximation of the background error covariance matrix is obtained. This square-root approximation serves as a control space onto which analysis increments can be computed. These are calculated via Line-Search (LS) optimization. We theoretically prove the convergence of the MLEF-MC. Experimental simulations were performed using an Atmospheric General Circulation Model (AT-GCM) and a highly nonlinear observation operator. The results reveal that the proposed method can obtain posterior error estimates within reasonable accuracy in terms of $\ell_2$ error norms. Furthermore, our analysis estimates are similar to those of the MLEF with large ensemble sizes and full observational networks.

1. Introduction

Remotely sensed observations by earth observing satellites are usually spatially and temporally discontinuous as a result of the sensor, satellite, and target view geometries [1]. For instance, polar orbiting satellites/sensors provide greater spatial detail at a reduced temporal resolution, while geostationary orbiting satellites provide a better temporal resolution at a reduced spatial resolution [2]. Data Assimilation (DA) methods can be employed to make these observations more coherent both in time and space [3,4]. In this context, information from observations and an imperfect numerical forecast are optimally combined to estimate the state $\mathbf{x}^* \in \mathbb{R}^n$ of a dynamical system which approximately evolves according to some imperfect numerical model:
$$\mathbf{x}_{\text{next}} = \mathcal{M}_{t_{\text{current}} \to t_{\text{next}}}\left(\mathbf{x}_{\text{current}}\right), \quad \text{for } \mathbf{x} \in \mathbb{R}^n, \tag{1}$$
where $\mathcal{M}: \mathbb{R}^n \to \mathbb{R}^n$ is a numerical model which encapsulates our knowledge about the dynamical system of interest, n is the model resolution, and t stands for assimilation time. In sequential DA, well-known formulations are based on the cost function:
$$\mathcal{J}(\mathbf{x}) = \frac{1}{2}\, \big\|\mathbf{x} - \mathbf{x}^b\big\|_{\mathbf{B}^{-1}}^2 + \frac{1}{2}\, \big\|\mathbf{y} - \mathcal{H}(\mathbf{x})\big\|_{\mathbf{R}^{-1}}^2, \tag{2}$$
where $\mathbf{x}^b \in \mathbb{R}^n$ is the background state, $\mathbf{B} \in \mathbb{R}^{n \times n}$ is the background error covariance matrix, $\mathbf{y} \in \mathbb{R}^m$ is a vector holding the observations, m is the number of observations, $\mathbf{R} \in \mathbb{R}^{m \times m}$ is the (estimated) data-error covariance matrix, and $\mathcal{H}: \mathbb{R}^n \to \mathbb{R}^m$ is the observation operator (which maps model states to observations). Equation (2) is better known as the Three-Dimensional Variational (3D-Var) cost function. The analysis state is estimated via the solution of the 3D-Var optimization problem:
$$\mathbf{x}^a = \arg\min_{\mathbf{x}}\, \mathcal{J}(\mathbf{x}), \tag{3}$$
where $\mathbf{x}^a \in \mathbb{R}^n$ is the analysis state. For linear observation operators, closed-form solutions of Equation (3) can be obtained; these are widely employed by ensemble-based methods. However, for nonlinear observation operators, numerical optimization methods can be employed to solve Equation (3) iteratively. For instance, in the Maximum Likelihood Ensemble Filter (MLEF), vector states are constrained to the space spanned by an ensemble of model realizations, which is nothing but a low-rank square-root approximation of $\mathbf{B}$. This method is widely accepted in the DA community owing to its efficient formulation and relative ease of implementation. Nevertheless, since analysis increments are computed in an ensemble space, convergence is not ensured. We think that it is possible to replace the ensemble square-root approximation with a full-rank, well-conditioned square-root approximation of $\mathbf{B}$ via a modified Cholesky decomposition. In this manner, analysis increments are computed in a space whose dimension equals that of the model. Moreover, convergence can be ensured as long as the classic assumptions of Line-Search (LS) methods are satisfied.
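To fix the notation before moving on, the following minimal sketch (Python/NumPy; the function name, the toy dimensions, and the generic operator passed as `H` are illustrative assumptions, not part of the paper) evaluates the 3D-Var cost function of Equation (2):

```python
import numpy as np

def threedvar_cost(x, xb, B_inv, y, R_inv, H):
    """Evaluate the 3D-Var cost function J(x) of Equation (2).

    x, xb : candidate state and background state, shape (n,)
    B_inv : inverse background error covariance, shape (n, n)
    y     : observations, shape (m,)
    R_inv : inverse data-error covariance, shape (m, m)
    H     : observation operator mapping (n,) -> (m,)
    """
    db = x - xb          # background departure
    dy = y - H(x)        # observation departure
    return 0.5 * db @ B_inv @ db + 0.5 * dy @ R_inv @ dy

# Tiny usage example with n = 3 model components and m = 2 observations
xb = np.zeros(3)
J = threedvar_cost(np.ones(3), xb, np.eye(3), np.array([1.0, 2.0]),
                   np.eye(2), lambda x: x[:2])
```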
The structure of this paper is as follows. In Section 2, we discuss ensemble-based methods for (non)linear data assimilation. Section 3 proposes a Maximum Likelihood Ensemble Filter via a Modified Cholesky decomposition (MLEF-MC); the theoretical aspects of this method as well as its computational cost are analyzed. In Section 4, numerical simulations are performed using the Lorenz-96 model and an Atmospheric General Circulation Model (AT-GCM). Section 5 states the conclusions of this research.

2. Preliminaries

In this section, we briefly discuss ensemble-based data assimilation in linear and nonlinear cases. Line-Search optimization methods are also discussed for the numerical solution of optimization problems.

2.1. Ensemble-Based Data Assimilation

Ensemble-based methods estimate prior error distributions via an ensemble of model realizations [5]:
$$\mathbf{X}^b = \left[\mathbf{x}^{b[1]}, \mathbf{x}^{b[2]}, \ldots, \mathbf{x}^{b[N]}\right] \in \mathbb{R}^{n \times N}, \tag{4}$$
where N is the ensemble size, and $\mathbf{x}^{b[e]} \in \mathbb{R}^n$ stands for the e-th ensemble member, for $1 \le e \le N$. The empirical moments of Ensemble (4) are employed to estimate the moments of the prior error distributions:
$$\mathbf{x}^b \approx \overline{\mathbf{x}^b} = \frac{1}{N} \sum_{e=1}^{N} \mathbf{x}^{b[e]} \in \mathbb{R}^n, \tag{5}$$
and
$$\mathbf{B} \approx \mathbf{P}^b = \frac{1}{N-1}\, \Delta\mathbf{X}\, \Delta\mathbf{X}^T \in \mathbb{R}^{n \times n}, \tag{6}$$
where $\Delta\mathbf{X}$ is the matrix of background anomalies:
$$\Delta\mathbf{X} = \mathbf{X}^b - \overline{\mathbf{x}^b}\, \mathbf{1}^T \in \mathbb{R}^{n \times N}, \tag{7}$$
and $\mathbf{1}$ is a vector whose components are all ones. A well-known method in the ensemble context is the Ensemble Kalman Filter (EnKF) [6]. In the EnKF, a posterior ensemble can be built via the use of synthetic observations [7,8] or by employing an affine transformation on prior members [9,10]. Regardless of which method is employed to estimate the analysis members, sampling errors impact the quality of the analysis members. This stems from the fact that, in practice, ensemble sizes are much smaller than model dimensions [11]. To counteract the effects of sampling noise, localization techniques are commonly employed during the assimilation steps. Localization relies on the idea that, for most geophysical systems, distant observations are weakly correlated [11,12]. Covariance localization and domain localization are frequently employed in operational scenarios. Furthermore, another possible choice is to make use of inverse covariance matrix estimation. In the EnKF based on a modified Cholesky decomposition [13,14,15], the precision matrix $\mathbf{B}^{-1}$ is estimated via the Bickel and Levina covariance matrix estimator [16]. This estimator has the form
$$\widehat{\mathbf{B}}^{-1} = \widehat{\mathbf{L}}^T\, \widehat{\mathbf{D}}^{-1}\, \widehat{\mathbf{L}} \in \mathbb{R}^{n \times n}, \tag{8}$$
where the nonzero components of $\widehat{\mathbf{L}} \in \mathbb{R}^{n \times n}$ are obtained by fitting linear models of the form
$$\mathbf{x}_{[i]} = \sum_{q \in \Pi(i,r)} \{\widehat{\mathbf{L}}\}_{iq}\, \mathbf{x}_{[q]} + \boldsymbol{\eta}_{[i]} \in \mathbb{R}^N, \tag{9}$$
where $\mathbf{x}_{[i]} \in \mathbb{R}^N$ is a vector holding the i-th model component across all ensemble members in Equation (7), for $1 \le i \le n$, and $\Pi(i,r)$ denotes the components within a local box about i of radius r. Note that the $\widehat{\mathbf{L}}$ factor is sparse since local neighborhoods are assumed for each model component. Moreover, it is possible to obtain sparse lower-triangular factors by exploiting the mesh structures of numerical grids; that is, the sparsity pattern of $\widehat{\mathbf{L}}$ relies on the selection of r. Likewise, $\boldsymbol{\eta}_{[i]} \in \mathbb{R}^N$ is a zero-mean Gaussian vector with uncorrelated errors of unknown variance $\sigma^2$. Some structures of $\widehat{\mathbf{L}}$ are shown in Figure 1 for a one-dimensional grid and different values of r; cyclic boundary conditions are assumed for the physics/dynamics.
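To make the estimator concrete, here is a minimal sketch (Python/NumPy; the function and variable names are illustrative) of the regressions in Equation (9) on a one-dimensional grid. It assumes a non-cyclic ordering, so the corner entries produced by the cyclic boundary conditions of Figure 1 are omitted, and it assumes N > r so that the local least-squares fits (computed here with NumPy's SVD-based solver) are overdetermined:

```python
import numpy as np

def modified_cholesky(DX, r):
    """Estimate L-hat and D-hat of Equation (8) from anomalies DX (n x N).

    Each model component is regressed onto its (at most r) predecessors
    in the grid ordering, following Equation (9); the residual variances
    form the diagonal factor D-hat.
    """
    n, N = DX.shape
    L = np.eye(n)                              # unit lower-triangular factor
    d = np.zeros(n)                            # diagonal of D-hat
    d[0] = np.var(DX[0], ddof=1)
    for i in range(1, n):
        q = np.arange(max(0, i - r), i)        # local predecessors Pi(i, r)
        beta, *_ = np.linalg.lstsq(DX[q].T, DX[i], rcond=1e-8)
        L[i, q] = -beta                        # sign convention: L x = eta
        d[i] = np.var(DX[i] - beta @ DX[q], ddof=1)
    return L, d

# B-hat^{-1} of Equation (8) is then L.T @ np.diag(1.0 / d) @ L.
```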
The ordering of model components plays an important role in performing computations efficiently [17,18]. Thus, one can potentially exploit the special structure of the numerical mesh to obtain estimates which can be efficiently applied during the analysis steps [19]. However, the current literature proposes a modified Cholesky implementation which can be applied without a prespecified ordering of model components [20].
EnKF methods commonly linearize observation operators when these are (highly) nonlinear [21], and as a direct consequence, this can induce bias in posterior members [22]. To handle nonlinear observation operators during the assimilation steps, optimization-based methods can be employed to estimate analysis increments. A well-known method in this context is the Maximum Likelihood Ensemble Filter (MLEF) [23]. This square-root filter employs the ensemble space to compute analysis increments [24,25]:
$$\mathbf{x}^a - \mathbf{x}^b \in \operatorname{range}\{\Delta\mathbf{X}\},$$
where $\Delta\mathbf{X}$ is nothing but a pseudo square-root approximation of $\mathbf{B}^{1/2}$. Thus, vector states can be written as follows:
$$\mathbf{x} = \mathbf{x}^b + \Delta\mathbf{X}\, \mathbf{w}, \tag{10}$$
where $\mathbf{w} \in \mathbb{R}^N$ is a vector in redundant coordinates to be computed later. By replacing Equation (10) in Equation (2), one obtains [26,27]
$$\mathcal{J}(\mathbf{x}) = \mathcal{J}(\mathbf{x}^b + \Delta\mathbf{X}\mathbf{w}) = \frac{N-1}{2}\, \|\mathbf{w}\|^2 + \frac{1}{2}\, \big\|\mathbf{y} - \mathcal{H}(\mathbf{x}^b + \Delta\mathbf{X}\mathbf{w})\big\|_{\mathbf{R}^{-1}}^2. \tag{11}$$
The optimization problem to solve reads
$$\mathbf{w}^* = \arg\min_{\mathbf{w}}\, \mathcal{J}(\mathbf{x}^b + \Delta\mathbf{X}\mathbf{w}). \tag{12}$$
This problem can be numerically solved via Line-Search (LS) and/or Trust-Region methods. However, convergence cannot be ensured, since gradient approximations are performed in a reduced space whose dimension is much smaller than that of the model.

2.2. Line-Search Optimization Methods

The solution of optimization problems of the form in Equation (3) can be approximated via Numerical Optimization [28,29]. In this context, solutions are obtained via iterations:
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta\mathbf{x}_k,$$
wherein k denotes the iteration index, and $\Delta\mathbf{x}_k \in \mathbb{R}^n$ is a descent direction, for instance, the gradient-descent direction [30,31,32,33]
$$\Delta\mathbf{x}_k = -\nabla\mathcal{J}(\mathbf{x}_k),$$
the Newton step [34,35,36],
$$\nabla^2\mathcal{J}(\mathbf{x}_k)\, \Delta\mathbf{x}_k = -\nabla\mathcal{J}(\mathbf{x}_k),$$
or a quasi-Newton-based direction [37,38,39],
$$\mathbf{P}_k\, \Delta\mathbf{x}_k = -\nabla\mathcal{J}(\mathbf{x}_k),$$
where $\mathbf{P}_k \in \mathbb{R}^{n \times n}$ is a positive-definite matrix. A concise survey of Newton-based methods can be consulted in [40]. Since step sizes in Equation (14) are based on first- or second-order Taylor polynomials, the step size can be chosen via Line-Search [41,42,43] and/or Trust-Region [44,45,46] methods. Thus, we can ensure global convergence of optimization methods to stationary points of the cost function (2). This holds as long as some assumptions regarding the functions, gradients, and (potentially) Hessians are preserved [47]. In the context of Line-Search, the following assumptions are commonly made:
C-A A lower bound of $\mathcal{J}(\mathbf{x})$ exists on $\Omega_0 = \{\mathbf{x} \in \mathbb{R}^n : \mathcal{J}(\mathbf{x}) \le \mathcal{J}(\mathbf{x}_0)\}$, where $\mathbf{x}_0 \in \mathbb{R}^n$ is available.
C-B There is a constant $L > 0$ such that
$$\big\|\nabla\mathcal{J}(\mathbf{x}) - \nabla\mathcal{J}(\mathbf{z})\big\| \le L\, \|\mathbf{x} - \mathbf{z}\|, \quad \text{for } \mathbf{x}, \mathbf{z} \in \mathcal{B},$$
where $\mathcal{B}$ is an open convex set which contains $\Omega_0$. These conditions, together with iterates of the form
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha\, \Delta\mathbf{x}_k,$$
ensure global convergence [48] as long as $\alpha$ is chosen as an (approximated) minimizer of
$$\alpha^* = \arg\min_{\alpha \ge 0}\, \mathcal{J}(\mathbf{x}_k + \alpha\, \Delta\mathbf{x}_k). \tag{16}$$
In practice, step-size rules such as the Goldstein rule [49], the strong Wolfe rule [50], and the Halving method [51] are employed to partially solve Equation (16).
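As an illustration, the Halving method admits a very compact implementation; the sketch below (Python; names are illustrative) pairs step halving with a standard Armijo sufficient-decrease test, which is one simple way to partially solve Equation (16):

```python
import numpy as np

def halving_line_search(J, grad_J, x, dx, alpha0=1.0, c1=1e-4, max_halvings=30):
    """Approximate the minimizer of J(x + alpha * dx) by step halving.

    Accepts the first alpha satisfying the Armijo (sufficient decrease)
    condition; returns 0.0 if none is found within max_halvings halvings.
    """
    J0 = J(x)
    slope = grad_J(x) @ dx        # negative for a descent direction
    alpha = alpha0
    for _ in range(max_halvings):
        if J(x + alpha * dx) <= J0 + c1 * alpha * slope:
            return alpha
        alpha *= 0.5              # halve the step and try again
    return 0.0
```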

3. A Maximum Likelihood Ensemble Filter via a Modified Cholesky Decomposition

In this section, we develop an efficient and practical implementation of an MLEF-based filter via a modified Cholesky decomposition.

3.1. Filter Derivation

To solve the optimization problem (Equation (3)), we consider the matrix of anomalies (Equation (7)) to estimate $\mathbf{B}^{-1}$ via a modified Cholesky decomposition. We then employ the square-root approximation
$$\widehat{\mathbf{B}}^{1/2} = \left[\widehat{\mathbf{L}}^T\, \widehat{\mathbf{D}}^{-1/2}\right]^{-1}, \tag{17a}$$
as a control space onto which analysis increments can be estimated. Note that
$$\operatorname{rank}\big(\mathbf{B}^{1/2}\big) = \operatorname{rank}\big(\widehat{\mathbf{B}}^{1/2}\big).$$
We constrain vector states to the space spanned by Equation (17a):
$$\mathbf{x} = \mathbf{x}^b + \widehat{\mathbf{B}}^{1/2}\, \boldsymbol{\eta},$$
where $\boldsymbol{\eta} \in \mathbb{R}^n$ is a vector of weights to be computed later. The 3D-Var cost function (Equation (2)) onto the space (Equation (17a)) reads
$$\mathcal{J}(\mathbf{x}) = \mathcal{J}(\mathbf{x}^b + \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}) = \frac{1}{2}\, \|\boldsymbol{\eta}\|^2 + \frac{1}{2}\, \big\|\mathbf{y} - \mathcal{H}(\mathbf{x}^b + \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta})\big\|_{\mathbf{R}^{-1}}^2, \tag{17c}$$
with the corresponding optimization problem:
$$\boldsymbol{\eta}^* = \arg\min_{\boldsymbol{\eta}}\, \mathcal{J}(\mathbf{x}^b + \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}). \tag{17d}$$
To approximate a solution for Equation (17d), we consider iterates of the form
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \widehat{\mathbf{B}}^{1/2}\, \boldsymbol{\eta}_k, \quad \text{for } 0 \le k \le K, \tag{18a}$$
with $\mathbf{x}_0 = \mathbf{x}^b$, where k denotes the iteration index, and K is the maximum number of iterations. The weights $\boldsymbol{\eta}_k$ can be computed as follows: at iteration k, we linearize the observation operator about $\mathbf{x}_k$, that is,
$$\mathcal{H}(\mathbf{x}) \approx \mathcal{H}(\mathbf{x}_k) + \mathbf{H}_k\, \big[\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k\big], \tag{18b}$$
where $\mathbf{H}_k$ is the Jacobian of $\mathcal{H}(\mathbf{x})$ at $\mathbf{x}_k$. By employing this linear Taylor expansion, we obtain the following quadratic approximation of Equation (17c):
$$\mathcal{J}(\mathbf{x}_k + \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k) \approx \widehat{\mathcal{J}}_k(\boldsymbol{\eta}_k) = \frac{1}{2}\, \Big\|\boldsymbol{\eta}_k + \sum_{p=0}^{k-1}\boldsymbol{\eta}_p\Big\|^2 + \frac{1}{2}\, \big\|\boldsymbol{\delta}_k - \widehat{\mathbf{Q}}_k\boldsymbol{\eta}_k\big\|_{\mathbf{R}^{-1}}^2, \tag{18c}$$
where $\boldsymbol{\delta}_k = \mathbf{y} - \mathcal{H}(\mathbf{x}_k) \in \mathbb{R}^m$, and $\widehat{\mathbf{Q}}_k = \mathbf{H}_k\, \widehat{\mathbf{B}}^{1/2} \in \mathbb{R}^{m \times n}$. The gradient of Equation (18c) reads
$$\nabla\widehat{\mathcal{J}}_k(\boldsymbol{\eta}_k) = \boldsymbol{\eta}_k + \sum_{p=0}^{k-1}\boldsymbol{\eta}_p - \widehat{\mathbf{Q}}_k^T\, \mathbf{R}^{-1}\, \big[\boldsymbol{\delta}_k - \widehat{\mathbf{Q}}_k\boldsymbol{\eta}_k\big], \tag{18d}$$
from which an estimate of the optimal weight $\boldsymbol{\eta}_k^*$ is as follows:
$$\boldsymbol{\eta}_k^* = \big[\mathbf{I} + \widehat{\mathbf{Q}}_k^T\, \mathbf{R}^{-1}\, \widehat{\mathbf{Q}}_k\big]^{-1}\, \Big[\widehat{\mathbf{Q}}_k^T\, \mathbf{R}^{-1}\, \boldsymbol{\delta}_k - \sum_{p=0}^{k-1}\boldsymbol{\eta}_p^*\Big]. \tag{18e}$$
Since $\boldsymbol{\eta}_k^*$ is obtained via a quadratic approximation of Equation (2), the step size (Equation (18e)) can be too large. Thus, we employ a Line-Search on Equation (17c) in the direction $\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*$:
$$\rho_k^* = \arg\min_{\rho_k}\, \mathcal{J}\big(\mathbf{x}_k + \rho_k\, [\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*]\big), \tag{18f}$$
and therefore, by letting $\boldsymbol{\eta}_k \leftarrow \rho_k^*\, \boldsymbol{\eta}_k^*$ in Equation (18a), we obtain
$$\mathbf{x}_{k+1} = \mathbf{x}_k + \widehat{\mathbf{B}}^{1/2}\, \big[\rho_k^*\, \boldsymbol{\eta}_k^*\big]. \tag{18g}$$
This process is repeated until a maximum number of iterations K is reached. Hence, an approximation of the optimal weight (Equation (17d)) reads
$$\boldsymbol{\eta}^* \approx \sum_{k=0}^{K} \boldsymbol{\eta}_k \equiv \sum_{k=0}^{K} \rho_k^*\, \boldsymbol{\eta}_k^*, \tag{19a}$$
from which an estimate of the analysis state (Equation (3)) can be computed as follows:
$$\mathbf{x}^a \approx \overline{\mathbf{x}^a} = \mathbf{x}^b + \widehat{\mathbf{B}}^{1/2}\, \Big[\sum_{k=0}^{K} \rho_k^*\, \boldsymbol{\eta}_k^*\Big]. \tag{19b}$$
The posterior covariance can be readily obtained from Equation (18d). Posterior weights can be sampled as follows:
$$\boldsymbol{\eta}^{a[e]} \sim \mathcal{N}\Big(\boldsymbol{\eta}^*,\, \big[\mathbf{I} + \widehat{\mathbf{Q}}_K^T\, \mathbf{R}^{-1}\, \widehat{\mathbf{Q}}_K\big]^{-1}\Big), \quad \text{for } 1 \le e \le N, \tag{19c}$$
and therefore, the analysis ensemble members read
$$\mathbf{x}^{a[e]} = \mathbf{x}^b + \widehat{\mathbf{B}}^{1/2}\, \boldsymbol{\eta}^{a[e]}. \tag{19d}$$
The analysis members (Equation (19d)) are then propagated in time until new observations are available:
$$\mathbf{x}_{\text{next}}^{b[e]} = \mathcal{M}_{t_{\text{current}} \to t_{\text{next}}}\big(\mathbf{x}_{\text{current}}^{a[e]}\big).$$
Putting it all together, the entire assimilation process is condensed in Algorithm 1. We call this filter formulation the Maximum Likelihood Ensemble Filter via a Modified Cholesky decomposition (MLEF-MC). Note that our goal is to obtain a minimizer (local optimum) of the 3D-Var optimization problem (Equation (3)). Other families of methods, such as the Cluster Sampling Filters [52], target entire posterior density functions; that is, their goal is to draw samples from posterior kernels and to use their empirical moments to estimate posterior modes of error distributions.
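The analysis step above can be summarized in a few lines of dense linear algebra. The following sketch (Python/NumPy) is illustrative only: `line_search` stands for any routine approximately solving Equation (18f), `jac_H` returns the Jacobian of the observation operator, and an operational implementation would replace the dense products involving $\widehat{\mathbf{B}}^{1/2}$ with the sparse triangular solves discussed in the next section.

```python
import numpy as np

def mlef_mc_analysis(xb, B_sqrt, y, R_inv, H, jac_H, line_search, K, N, rng):
    """One MLEF-MC analysis cycle (Equations (18a)-(19d)), dense sketch."""
    n = xb.size
    x, eta_sum = xb.copy(), np.zeros(n)
    for k in range(K + 1):
        Q = jac_H(x) @ B_sqrt                    # Q_k = H_k B^{1/2}
        delta = y - H(x)                         # innovation delta_k
        A = np.eye(n) + Q.T @ R_inv @ Q          # Hessian of Equation (18c)
        eta = np.linalg.solve(A, Q.T @ R_inv @ delta - eta_sum)  # Eq. (18e)
        rho = line_search(x, B_sqrt @ eta)       # Equation (18f)
        x = x + rho * (B_sqrt @ eta)             # Equation (18g)
        eta_sum += rho * eta                     # accumulate accepted weights
    # Posterior sampling, Equations (19c)-(19d)
    etas = rng.multivariate_normal(eta_sum, np.linalg.inv(A), size=N)
    Xa = xb[None, :] + etas @ B_sqrt.T           # analysis members, (N, n)
    return x, Xa
```

A caller would supply, for example, `rng = np.random.default_rng(0)` together with the square-root factor estimated from the background ensemble.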

3.2. Computational Cost of the MLEF-MC

We detail the computational cost of each line of Algorithm 1, and in this manner, we can estimate the overall computational cost of the MLEF-MC. We do not consider the computational cost of the Line-Search in Equation (18f), which will depend on the algorithm chosen for computing the optimal steps.
• In Line 1, the direct inversion in $\widehat{\mathbf{B}}^{1/2} = [\widehat{\mathbf{L}}^T \widehat{\mathbf{D}}^{-1/2}]^{-1}$ is not actually needed. Note that the optimization variable in Equation (17d) can be expressed in terms of a new control variable $\boldsymbol{\varsigma}_k$ as follows:
$$\big[\widehat{\mathbf{L}}^T\, \widehat{\mathbf{D}}^{-1/2}\big]\, \boldsymbol{\varsigma}_k = \boldsymbol{\eta}_k^*, \tag{20}$$
and in this manner, we can exploit the special structure of $\widehat{\mathbf{L}}$ and $\widehat{\mathbf{D}}$ to perform forward and backward substitutions on the optimal weights $\boldsymbol{\eta}_k^*$ (see the sketch after this list). Thus, the number of computations to solve Equation (20) reads $\mathcal{O}(\varphi^2\, n)$, where $\varphi$ is the maximum number of nonzero entries across all rows in $\widehat{\mathbf{L}}$, with $\varphi \ll n$; $\varphi$ is commonly some function of the radius of influence r.
• The computation of $\widehat{\mathbf{Q}}_k = \mathbf{H}_k\, [\widehat{\mathbf{L}}^T \widehat{\mathbf{D}}^{-1/2}]^{-1}$ in Line 5 can be performed similarly to Equation (20). On the basis of the dimensions of $\mathbf{H}_k$, a bound for computing $\widehat{\mathbf{Q}}_k$ is $\mathcal{O}(\varphi^2\, n\, m)$.
• In Line 7, the bounds for computations are as follows:
$$\boldsymbol{\eta}_k^* = \underbrace{\big[\mathbf{I} + \widehat{\mathbf{Q}}_k^T\, \mathbf{R}^{-1}\, \widehat{\mathbf{Q}}_k\big]^{-1}}_{\mathcal{O}(m^2)}\, \Big[\underbrace{\widehat{\mathbf{Q}}_k^T\, \mathbf{R}^{-1}\, \boldsymbol{\delta}_k}_{\mathcal{O}(n\, m)} - \sum_{p=0}^{k-1}\boldsymbol{\eta}_p^*\Big],$$
where the implicit linear system involving $\mathbf{I} + \widehat{\mathbf{Q}}_k^T \mathbf{R}^{-1} \widehat{\mathbf{Q}}_k$ can be solved, for instance, via the iterative Sherman–Morrison formula [53] with no more than $\mathcal{O}(\varphi^2\, n\, m)$ computations. Thus, the computational effort of computing Equation (18e) reads
$$\mathcal{O}\big(\varphi^2\, n\, m + n\, m + m^2\big).$$
This bound is valid for Lines 9, 12, and 13. Since Lines 12 and 13 are performed N times, their computational cost reads $\mathcal{O}(\varphi^2\, N\, n\, m + N\, n\, m + N\, m^2)$. Since all computations are performed K times, the overall cost of the MLEF-MC is as follows:
$$\mathcal{O}\big(K\, [\varphi^2\, n + \varphi^2\, n\, m + \varphi^2\, n\, m + n\, m + m^2]\big). \tag{21}$$
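The substitution argument behind Equation (20) maps directly onto standard sparse triangular solvers. A minimal sketch (Python/SciPy; names are illustrative), assuming $\widehat{\mathbf{L}}$ is stored as a sparse matrix and $\widehat{\mathbf{D}}^{-1/2}$ as the vector of its diagonal entries:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve_triangular

def apply_B_sqrt(L_hat, d_inv_sqrt, eta):
    """Compute B^{1/2} eta = [L^T D^{-1/2}]^{-1} eta without forming the
    inverse (Equation (20)): one sparse triangular solve plus a scaling.

    L_hat      : sparse unit lower-triangular factor, shape (n, n)
    d_inv_sqrt : diagonal entries of D^{-1/2}, shape (n,)
    """
    # Solve L^T z = eta (upper-triangular system), where z = D^{-1/2} s;
    # the cost is proportional to the number of nonzeros in L-hat.
    z = spsolve_triangular(csr_matrix(L_hat.T), eta, lower=False)
    return z / d_inv_sqrt      # recover s = D^{1/2} z
```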
Algorithm 1 Forecast and Analysis Steps of the MLEF-MC

Require: Background ensemble members $\{\mathbf{x}^{b[e]}\}_{e=1}^{N}$
Ensure: An estimate of the analysis members $\{\mathbf{x}^{a[e]}\}_{e=1}^{N}$
1: Estimate $\widehat{\mathbf{B}}^{1/2} = [\widehat{\mathbf{L}}^T \widehat{\mathbf{D}}^{-1/2}]^{-1}$ via $\{\mathbf{x}^{b[e]}\}_{e=1}^{N}$    ▹ Control space estimation
2: Set $\mathbf{x}_0 \leftarrow \mathbf{x}^b$    ▹ Best estimate before observations
3: for $k \leftarrow 0$ to $K$ do    ▹ Iterative solution of the optimization problem (Equation (17d))
4:   Compute the Jacobian $\mathbf{H}_k$ of $\mathcal{H}(\mathbf{x})$ at $\mathbf{x}_k$
5:   Set $\widehat{\mathbf{Q}}_k \leftarrow \mathbf{H}_k\, \widehat{\mathbf{B}}^{1/2}$
6:   Let $\boldsymbol{\delta}_k \leftarrow \mathbf{y} - \mathcal{H}(\mathbf{x}_k)$
7:   Compute    ▹ k-th weight estimation
     $\boldsymbol{\eta}_k^* = \big[\mathbf{I} + \widehat{\mathbf{Q}}_k^T \mathbf{R}^{-1} \widehat{\mathbf{Q}}_k\big]^{-1}\, \big[\widehat{\mathbf{Q}}_k^T \mathbf{R}^{-1} \boldsymbol{\delta}_k - \sum_{p=0}^{k-1} \boldsymbol{\eta}_p^*\big]$
8:   Solve    ▹ Line-Search optimization
     $\rho_k^* = \arg\min_{\rho_k}\, \mathcal{J}\big(\mathbf{x}_k + \rho_k\, [\widehat{\mathbf{B}}^{1/2} \boldsymbol{\eta}_k^*]\big)$
9:   Let $\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \widehat{\mathbf{B}}^{1/2}\, [\rho_k^* \boldsymbol{\eta}_k^*]$
10: Set $\boldsymbol{\eta}^* \leftarrow \sum_{k=0}^{K} \rho_k^* \boldsymbol{\eta}_k^*$    ▹ Analysis weight
11: for $e \leftarrow 1$ to $N$ do    ▹ Analysis member computation
12:   Draw $\boldsymbol{\eta}^{a[e]} \sim \mathcal{N}\big(\boldsymbol{\eta}^*, [\mathbf{I} + \widehat{\mathbf{Q}}_K^T \mathbf{R}^{-1} \widehat{\mathbf{Q}}_K]^{-1}\big)$
13:   Let $\mathbf{x}^{a[e]} \leftarrow \mathbf{x}^b + \widehat{\mathbf{B}}^{1/2}\, \boldsymbol{\eta}^{a[e]}$
14: for $e \leftarrow 1$ to $N$ do    ▹ Forecast step
15:   Let $\mathbf{x}_{\text{next}}^{b[e]} \leftarrow \mathcal{M}_{t_{\text{current}} \to t_{\text{next}}}\big(\mathbf{x}_{\text{current}}^{a[e]}\big)$

3.3. Global Convergence of the Analysis Step in the MLEF-MC

To prove the global convergence of the proposed MLEF-MC in the analysis step, we consider the assumptions in Conditions (C-A) and (C-B), together with the descent condition
$$\nabla\mathcal{J}\big(\mathbf{x}_k + \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big)^T\, \nabla\widehat{\mathcal{J}}_k(\boldsymbol{\eta}_k^*) < 0, \quad \text{for } 0 \le k \le K. \tag{22}$$
In the next theorem, we state the necessary conditions to ensure global convergence of the MLEF-MC method.
Theorem 1.
If Conditions (C-A) and (C-B) and Equation (22) hold, and the MLEF-MC with exact Line-Search generates an infinite sequence $\{\mathbf{x}_k\}_{k=0}^{\infty}$, then
$$\lim_{k \to \infty} \left[\frac{\nabla\mathcal{J}(\mathbf{x}_k)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*}{\big\|\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big\|}\right]^2 = 0 \tag{23}$$
holds.
Proof. 
By Taylor series, the cost function (Equation (2)) can be expanded as follows:
$$\mathcal{J}\big(\mathbf{x}_k + \rho_k^*\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big) = \mathcal{J}(\mathbf{x}_k) + \rho_k^* \int_0^1 \nabla\mathcal{J}\big(\mathbf{x}_k + \rho_k^*\, t\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\, dt,$$
and therefore,
$$\mathcal{J}(\mathbf{x}_k) - \mathcal{J}(\mathbf{x}_{k+1}) \ge -\rho_k^* \int_0^1 \nabla\mathcal{J}\big(\mathbf{x}_k + \rho_k^*\, t\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\, dt,$$
since, for any $\mathbf{x}_{k+1}$ on the ray $\mathbf{x}_k + \rho_k\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*$ with $\rho_k \in [0, 1]$, we have
$$\mathcal{J}(\mathbf{x}_k) - \mathcal{J}(\mathbf{x}_{k+1}) \ge \mathcal{J}(\mathbf{x}_k) - \mathcal{J}\big(\mathbf{x}_k + \rho_k^*\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big);$$
hence,
$$\mathcal{J}(\mathbf{x}_k) - \mathcal{J}(\mathbf{x}_{k+1}) \ge -\rho_k^*\, \nabla\mathcal{J}(\mathbf{x}_k)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^* - \rho_k^* \int_0^1 \big[\nabla\mathcal{J}\big(\mathbf{x}_k + \rho_k^*\, t\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big) - \nabla\mathcal{J}(\mathbf{x}_k)\big]^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\, dt.$$
By the Cauchy–Schwarz inequality and Condition (C-B), we have
$$\begin{aligned}
\mathcal{J}(\mathbf{x}_k) - \mathcal{J}(\mathbf{x}_{k+1}) &\ge -\rho_k^*\, \nabla\mathcal{J}(\mathbf{x}_k)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^* - \rho_k^* \int_0^1 \big\|\nabla\mathcal{J}\big(\mathbf{x}_k + \rho_k^*\, t\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big) - \nabla\mathcal{J}(\mathbf{x}_k)\big\|\, \big\|\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big\|\, dt \\
&\ge -\rho_k^*\, \nabla\mathcal{J}(\mathbf{x}_k)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^* - \rho_k^* \int_0^1 L\, \rho_k^*\, t\, \big\|\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big\|^2\, dt \\
&= -\rho_k^*\, \nabla\mathcal{J}(\mathbf{x}_k)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^* - \frac{1}{2}\, {\rho_k^*}^2\, L\, \big\|\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big\|^2.
\end{aligned}$$
Choosing
$$\rho_k^* = -\frac{\nabla\mathcal{J}(\mathbf{x}_k)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*}{L\, \big\|\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big\|^2},$$
we therefore obtain
$$\mathcal{J}(\mathbf{x}_k) - \mathcal{J}(\mathbf{x}_{k+1}) \ge \frac{\big[\nabla\mathcal{J}(\mathbf{x}_k)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big]^2}{L\, \big\|\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big\|^2} - \frac{1}{2}\, \frac{\big[\nabla\mathcal{J}(\mathbf{x}_k)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big]^2}{L\, \big\|\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big\|^2} = \frac{1}{2L} \left[\frac{\nabla\mathcal{J}(\mathbf{x}_k)^T\, \widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*}{\big\|\widehat{\mathbf{B}}^{1/2}\boldsymbol{\eta}_k^*\big\|}\right]^2.$$
By Condition (C-A) and Equation (22), it follows that $\{\mathcal{J}(\mathbf{x}_k)\}_{k=0}^{\infty}$ is a monotonically decreasing sequence that is bounded below; therefore, $\{\mathcal{J}(\mathbf{x}_k)\}_{k=0}^{\infty}$ has a limit, and consequently Equation (23) holds. □

3.4. Further Comments

Note that the proposed filter implementation performs the analysis step onto a control space whose dimension is equal to that of the model. This space is obtained via a modified Cholesky decomposition to mitigate the impact of sampling errors. Furthermore, its computational cost is linear with regard to the model size, which makes the MLEF-MC formulation attractive for operational settings. Moreover, the analysis step globally converges to posterior modes of the error distribution. The next section assesses the accuracy of our proposed filter implementation in several experimental settings.

4. Numerical Simulations

In this section, we test the proposed MLEF-MC implementation and compare our results with those obtained by the well-known MLEF method. We make use of two surrogate models for the experiments: the Lorenz-96 model [54] and an Atmospheric General Circulation Model (AT-GCM). In both cases, we consider the following general settings:
  • Starting with a random solution, we employ the numerical model to obtain an initial condition which is consistent with the model dynamics. In a similar fashion, the background state, the actual state, and the initial ensemble are computed;
• We consider the following nonlinear observation operator [55]:
$$\{\mathcal{H}(\mathbf{x})\}_j = \frac{\{\mathbf{x}\}_j}{2} \left[\left(\frac{|\{\mathbf{x}\}_j|}{2}\right)^{\gamma - 1} + 1\right], \tag{24}$$
where j denotes the j-th observed component of the model state, for $1 \le j \le m$. Likewise, we vary γ in $\gamma \in \{1, 3, 5\}$. Note that we start with a linear observation operator and end up with a highly nonlinear one. Since this observation operator is nondifferentiable, we employ the sign function to approximate its derivative (a small implementation sketch follows this list):
$$\frac{\partial \{\mathcal{H}(\mathbf{x})\}_j}{\partial \{\mathbf{x}\}_j} = \frac{1}{2} \left(\frac{|\{\mathbf{x}\}_j|}{2}\right)^{\gamma - 1} + \{\mathbf{x}\}_j\, \operatorname{sign}\big(\{\mathbf{x}\}_j\big)\, \frac{\gamma - 1}{4} \left(\frac{|\{\mathbf{x}\}_j|}{2}\right)^{\gamma - 2} + \frac{1}{2}; \tag{25}$$
• The $\ell_2$ norm measures the accuracy of analysis states at assimilation stages,
$$\lambda_t = \varepsilon\big(\mathbf{x}_t^*, \mathbf{x}_t^a\big) = \sqrt{\big[\mathbf{x}_t^* - \mathbf{x}_t^a\big]^T\, \big[\mathbf{x}_t^* - \mathbf{x}_t^a\big]}, \quad \text{for } 0 \le t \le M,$$
where $\mathbf{x}_t^*$ and $\mathbf{x}_t^a$ are the reference and the analysis solution at assimilation step t, respectively;
• We employ the Root Mean Square Error (RMSE) as an average measure of accuracy over an entire set of time-spaced observations,
$$\lambda = \sqrt{\frac{1}{M} \sum_{t=0}^{M} \lambda_t^2};$$
  • We employ a Truncated Singular Value Decomposition (T-SVD) to fit the models (Equation (9));
  • All experiments were performed under perfect model assumptions. No model errors were present during the assimilation steps;
• We employ the MLEF formulation proposed by Zupanski in [23].
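For concreteness, here is a small sketch (Python/NumPy; the function names and component indexing are illustrative) of the observation operator of Equation (24), its sign-based derivative of Equation (25), and the two error metrics above:

```python
import numpy as np

def H_obs(x, gamma, idx):
    """Nonlinear observation operator of Equation (24) on components idx."""
    xj = x[idx]
    return (xj / 2.0) * ((np.abs(xj) / 2.0) ** (gamma - 1) + 1.0)

def H_obs_deriv(x, gamma, idx):
    """Sign-based derivative of Equation (25); assumes nonzero components
    whenever gamma < 2 so that the middle term stays finite."""
    a = np.abs(x[idx]) / 2.0
    return (0.5 * a ** (gamma - 1)
            + x[idx] * np.sign(x[idx]) * (gamma - 1) / 4.0 * a ** (gamma - 2)
            + 0.5)

def l2_error(x_ref, x_a):
    """Per-cycle error lambda_t between reference and analysis states."""
    d = x_ref - x_a
    return np.sqrt(d @ d)

def rmse(lambdas):
    """RMSE over an assimilation window (normalizing by the cycle count)."""
    lambdas = np.asarray(lambdas)
    return np.sqrt(np.mean(lambdas ** 2))
```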

4.1. The Lorenz-96 Model

The Lorenz-96 model is described by the following set of ordinary differential equations [56]:
$$\frac{dx_j}{dt} = \begin{cases} (x_2 - x_{n-1})\, x_n - x_1 + F, & \text{for } j = 1, \\ (x_{j+1} - x_{j-2})\, x_{j-1} - x_j + F, & \text{for } 2 \le j \le n-1, \\ (x_1 - x_{n-2})\, x_{n-1} - x_n + F, & \text{for } j = n, \end{cases} \tag{27}$$
where n is the number of model variables, and F is the external force. Periodic boundary conditions are assumed. When $F = 8$ and $n = 40$, the model exhibits chaotic behavior, which makes it a relevant surrogate problem for atmospheric dynamics [57]; one time unit in the Lorenz-96 model represents 7 days in the atmosphere. A minimal implementation sketch of Equation (27) follows the settings list below. Details regarding the construction of the reference solution, background state, initial background ensemble, and experimental settings are as follows:
• We create an initial pool $\widehat{\mathbf{X}}^b$ of $\widehat{N} = 1000$ ensemble members. For each experiment, we sample N = 20 members from $\widehat{\mathbf{X}}^b$ to obtain the initial ensemble $\mathbf{X}^b$. Two-dimensional projections of the initial pool making use of its two leading directions are shown in Figure 2;
• The assimilation window consists of M = 100 time-spaced observations. Two observation frequencies are employed during the experiments: 16 h (time step of 0.1 time units) and 80 h (time step of 0.5 time units). We denote by $\delta t \in \{16, 80\}$ the time between two subsequent observations;
• At assimilation times, observational errors are characterized by Gaussian distributions with parameters
$$\mathbf{y}_t \sim \mathcal{N}\big(\mathcal{H}(\mathbf{x}_t^*),\, \sigma_o^2\, \mathbf{I}\big), \quad \text{for } 0 \le t \le M,$$
where $\mathbf{x}^*$ is the actual state of the system, and $\sigma_o$ is the noise level. We tried three different noise levels for the observations: $\sigma_o \in \{0.01, 0.1, 1\}$;
• We consider two percentages of observations (s): 70% of model components (s = 0.7) and 100% of model components (s = 1). The observed components are randomly chosen at the different assimilation steps;
• The radii of influence used to compute control spaces range over $r \in \{1, 3, 5\}$;
• The ensemble size for the MLEF-MC reads N = 40;
• For reference, we employ an MLEF method with an ensemble size of N = 100 members and a full observational network (s = 1). Note that this ensemble size is more than twice the model resolution n = 40. In this manner, we can get an idea of how errors should evolve for large ensemble sizes and full observational networks. We refer to this as the ideal case.
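The following minimal sketch (Python/NumPy; the RK4 integrator and step size are illustrative choices, as the time-stepping scheme is not specified above) implements Equation (27) with periodic boundaries:

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Right-hand side of Equation (27); np.roll encodes the periodic
    boundary conditions for all three cases at once."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One fourth-order Runge-Kutta step of the Lorenz-96 model."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Propagate n = 40 components over 0.1 time units (one 16 h window)
x = 8.0 + 0.01 * np.random.default_rng(0).standard_normal(40)
for _ in range(20):
    x = rk4_step(x, dt=0.005)
```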
The evolution of errors for the proposed filter implementation is detailed in Figure 3 and Figure 4 for the percentages of observations s = 1 and s = 0.7, respectively. We employ a log-scale of $\ell_2$ error norms for ease of reading. Note that as the noise level $\sigma_o$ increases, the accuracy of the MLEF-MC degrades. This should be expected since more uncertainty is injected into the observations, and as a direct consequence, the expected posterior errors increase. Nevertheless, in all cases, the evolution of errors is visually bounded (they do not blow up), and therefore, filter convergence is evidenced. For full observational networks, increases in the observation frequency do not degrade the quality of the analysis increments; however, for observation coverages of s = 0.7, the initial accuracies (spin-up period) can be impacted slightly as the observation frequency increases. However, this does not prevent errors from becoming stable (and decreasing) in time. Note that the degree γ of the observation operator does not impact the quality of analysis corrections in the MLEF-MC method. One can see that errors are stable in time regardless of the degree of $\mathcal{H}(\mathbf{x})$. On the other hand, the radius of influence plays an important role in the assimilation of observations as the time between observations increases. For instance, as δt increases, the forecast steps are longer, and therefore, more information about background error correlations can be properly captured in our estimate $\widehat{\mathbf{B}}^{-1}$. Recall that background error correlations are driven by the nonlinear dynamics of the model (Equation (27)), and given the special structure of the ODE system (Equation (27)), it is reasonable to think that radius lengths larger than one can provide useful information to unobserved components during the analysis corrections. Thus, as the radius length increases, errors in the MLEF-MC behave similarly to those in the ideal case.
In Figure 5 and Figure 6, we report the gradient norms of the initial assimilation step for s = 1 and s = 0.7, respectively, for the MLEF-MC implementation. Note that for small γ and $\sigma_o$ values, gradient norms decrease similarly for the different values of r across iterations in the MLEF-MC context. As the noise level increases, high accuracies demand more iterations for large r values. Thus, the noise level plays an important role as radius lengths are increased. As should be expected, the rate of convergence can be impacted by the degree of the nonlinear observation operator. Recall that we employ a second-order approximation of $\mathcal{J}(\mathbf{x})$ to estimate its gradient, and therefore, as the degree γ increases, smaller step lengths will be employed by the Line-Search method across iterations.
For the first assimilation cycle, we show two-dimensional projections of the optimization steps using the two leading directions of $\widehat{\mathbf{X}}^b$ in Figure 7 and Figure 8 for observation coverages of s = 1 and s = 0.7, respectively. We report the actual state $\mathbf{x}^*$, some samples from the background error distribution $\widehat{\mathbf{X}}^b$, and the iterates for different r values. The ideal case is also reported. Note that as the degree γ increases, more iterations are needed before we obtain a reasonable estimate of $\mathbf{x}^*$. As we mentioned before, second-order Taylor-based approximations can poorly estimate $\mathcal{J}(\mathbf{x})$ as γ increases. As can be seen, as the noise level increases, the analysis estimate for the different radius lengths can be impacted.
In Figure 9, we report the average elapsed time for computing analysis increments across M = 100 assimilation steps. As can be seen, as the radius of influence increases, the elapsed time of the assimilation steps slightly increases. This agrees with the bound (Equation (21)), wherein the computational cost of the MLEF-MC formulation depends linearly on the model resolution n. Recall that the factor r is strictly related to φ, which in turn is bounded by n; in practice, $\varphi \ll n$.
It is essential to note that by employing a modified Cholesky decomposition (Equation (8)), the degrees of freedom of the control space (Equation (10)) are artificially increased. Thus, we have more directions (which are consistent with the model dynamics) onto which error dynamics can be captured. This is similar to having a localized square-root approximation of B. In this manner, we can decorrelate distant model components based on our prior knowledge about the model dynamics. Moreover, we can also decrease the impact of sampling errors. All these properties are possessed by our set of basis vectors (Equation (17a)), which can explain why our proposed filter implementation can decrease initial background errors by several orders of magnitude. This stems from two important facts: the control-space dimension is equal to that of the model, and more importantly, the MLEF-MC ensures convergence as long as the conditions of Theorem 1 are satisfied.

4.2. An Atmospheric General Circulation Model (AT-GCM)

In this section, we study the performance of the MLEF-MC method by using a highly nonlinear model: the SPEEDY model. This model is an atmospheric general circulation model that mimics the behavior of the atmosphere across different pressure levels [58,59]. The number of numerical layers in this model is seven, and we employ a T-30 spectral model resolution ( 96 × 48 × 7 grid components) for the space discretization of each model layer [60,61]. We employ four model variables. These are detailed in Table 1 with their corresponding units and the number of layers.
Note that the total number of model components to be estimated is n = 133,632. We set the number of model realizations (ensemble size) to N = 30 for all experimental scenarios. In this case, the model resolution is approximately 4454 times larger than the sample size ($n \gg N$), which is very common under operational DA scenarios. Additional details of the experimental settings are described below; some are similar to those detailed in [62]:
  • Starting with a system in equilibrium, the model is integrated over a long time period to obtain an initial condition whose dynamics are consistent with those of the SPEEDY model;
  • The initial condition is perturbed N times and propagated over a long time period from which the initial background ensemble is obtained;
  • We employ the trajectory of the initial condition as the reference. This reference trajectory serves to build synthetic observations;
• We set the standard deviations of errors in the observations as follows:
  - Temperature: 1 K;
  - Zonal Wind Component: 1 m/s;
  - Meridional Wind Component: 1 m/s;
  - Specific Humidity: $10^{-3}$ g/kg;
  • Two percentages of observations are tried during the experiments: s = 0.7 and s = 1 . Figure 10 shows an example of this operator;
  • Observations are available every six hours (6 h);
  • The experiments are performed under perfect model assumptions;
  • The number of assimilation steps is M = 12 . Thus, the total simulation time is 7.5 days.
Table 2 shows the RMSE values for the MLEF-MC method. We vary the nonlinear degree γ and the percentage of observations s. Likewise, the radius of influence is r = 1. As can be seen, the proposed filter implementation can decrease forecast errors for all model variables by, in some cases, several orders of magnitude. As the degree of the observation operator increases, the analysis corrections can be impacted, but all analysis errors remain within the same order of magnitude. Moreover, filter convergence is evident for all synthetic scenarios, which agrees with Theorem 1. Note that as the number of observations increases, the accuracy of posterior estimates improves. This is expected since more information regarding the error dynamics is injected into the numerical forecast.
Figure 11 and Figure 12 show the time evolution of errors for s = 0.7 and s = 1.0, respectively. Clearly, initial errors are drastically decreased by the proposed filter implementation. This behavior is obtained regardless of the degree γ of the nonlinear observation operator (Equation (24)). As we mentioned before, the more observations employed during the assimilation steps, the faster the posterior errors can be decreased. Furthermore, depending on the number of observations, the differences between posterior errors can span orders of magnitude.
Figure 13 shows snapshots of the first assimilation step. The results are reported for the first numerical layer of the SPEEDY model and the model variables u and T. As can be seen, background errors are drastically improved by the MLEF-MC method. Spurious waves near the poles of the T and u variables are quickly dissipated, and the numerical model retains the actual shapes (and magnitudes) of these variables.

5. Conclusions

Satellite remote sensing, with its wide range of sources (for instance, on-board sensors, platforms, and satellite data providing genuine earth observation information), has transformed our view of the Earth and its environment. These sensors offer different types of observations on large scales and over decades. Typically, observations (data) are nonlinearly related to model states. This paper proposes a Maximum Likelihood Ensemble Filter method via a Modified Cholesky decomposition (MLEF-MC) for nonlinear data assimilation. This method works as follows: snapshots of an ensemble of model realizations are taken at observation steps; these ensembles are employed to build control spaces onto which analysis increments can be estimated. The control spaces are obtained via a modified Cholesky decomposition. The control-space dimension is equal to that of the model, which mitigates the impact of sampling errors. Experimental tests were performed by using the Lorenz-96 model and an Atmospheric General Circulation Model (AT-GCM). The well-known Maximum Likelihood Ensemble Filter (MLEF) was employed with an ensemble size of 100 and a full observational network as a reference method against which to compare solutions. The results reveal that the proposed filter implementation performs similarly to the MLEF implementation (ideal case) in terms of $\ell_2$ error norms and Root Mean Square Error values.

Author Contributions

E.D.N.-R. and A.M.-H. derived the MLEF-MC filter; E.D.N.-R., A.M.-H., and S.L.-R. conceived and designed the experiments; S.L.-R. and O.Q.-M. performed the experiments; E.D.N.-R. and O.Q.-M. analyzed the data; E.D.N.-R., A.M.-H., and O.Q.-M. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

This work was supported in part by award UN 2018-38, and by the Applied Math and Computer Science Laboratory at Universidad del Norte (AML-CS).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Asner, G.P.; Warner, A.S. Canopy shadow in IKONOS satellite observations of tropical forests and savannas. Remote Sens. Environ. 2003, 87, 521–533.
  2. Mayr, S.; Kuenzer, C.; Gessner, U.; Klein, I.; Rutzinger, M. Validation of Earth Observation Time-Series: A Review for Large-Area and Temporally Dense Land Surface Products. Remote Sens. 2019, 11, 2616.
  3. Jin, X.; Kumar, L.; Li, Z.; Feng, H.; Xu, X.; Yang, G.; Wang, J. A review of data assimilation of remote sensing and crop models. Eur. J. Agron. 2018, 92, 141–152.
  4. Khaki, M. Data Assimilation and Remote Sensing Data. In Satellite Remote Sensing in Hydrological Data Assimilation; Springer: Basel, Switzerland, 2020; pp. 7–9.
  5. Evensen, G. The Ensemble Kalman Filter: Theoretical formulation and practical implementation. Ocean Dyn. 2003, 53, 343–367.
  6. Evensen, G. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. Oceans 1994, 99, 10143–10162.
  7. Nino-Ruiz, E.D.; Sandu, A. Ensemble Kalman filter implementations based on shrinkage covariance matrix estimation. Ocean Dyn. 2015, 65, 1423–1439.
  8. Nino-Ruiz, E.D.; Sandu, A.; Deng, X. An Ensemble Kalman Filter Implementation Based on Modified Cholesky Decomposition for Inverse Covariance Matrix Estimation. SIAM J. Sci. Comput. 2018, 40, A867–A886.
  9. Bishop, C.H.; Etherton, B.J.; Majumdar, S.J. Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Weather Rev. 2001, 129, 420–436.
  10. Hunt, B.R.; Kostelich, E.J.; Szunyogh, I. Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D 2007, 230, 112–126.
  11. Petrie, R.E. Localization in the Ensemble Kalman Filter. Master's Thesis, University of Reading, Reading, UK, August 2008.
  12. Hamill, T.M.; Whitaker, J.S.; Snyder, C. Distance-Dependent Filtering of Background Error Covariance Estimates in an Ensemble Kalman Filter. Mon. Weather Rev. 2001, 129, 2776–2790.
  13. Nino-Ruiz, E.D.; Sandu, A.; Deng, X. A parallel ensemble Kalman filter implementation based on modified Cholesky decomposition. In Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, USA, 15–20 November 2015; pp. 1–8.
  14. Nino-Ruiz, E.D.; Sandu, A.; Deng, X. A parallel implementation of the ensemble Kalman filter based on modified Cholesky decomposition. J. Comput. Sci. 2019, 36, 100654.
  15. Nino-Ruiz, E.D. A matrix-free posterior ensemble Kalman filter implementation based on a modified Cholesky decomposition. Atmosphere 2017, 8, 125.
  16. Bickel, P.J.; Levina, E. Regularized estimation of large covariance matrices. Ann. Stat. 2008, 36, 199–227.
  17. Dellaportas, P.; Pourahmadi, M. Cholesky-GARCH models with applications to finance. Stat. Comput. 2012, 22, 849–855.
  18. Rajaratnam, B.; Salzman, J. Best permutation analysis. J. Multivar. Anal. 2013, 121, 193–223.
  19. Kang, X.; Deng, X.; Tsui, K.W.; Pourahmadi, M. On variable ordination of modified Cholesky decomposition for estimating time-varying covariance matrices. Int. Stat. Rev. 2019, 1, 1–26.
  20. Zheng, H.; Tsui, K.W.; Kang, X.; Deng, X. Cholesky-based model averaging for covariance matrix estimation. Stat. Theor. Relat. Fields 2017, 1, 48–58.
  21. Bertino, L.; Evensen, G.; Wackernagel, H. Sequential Data Assimilation Techniques in Oceanography. Int. Stat. Rev. 2007, 71, 223–241.
  22. Zupanski, M.; Navon, I.M.; Zupanski, D. The Maximum Likelihood Ensemble Filter as a non-differentiable minimization algorithm. Q. J. R. Meteorol. Soc. 2008, 134, 1039–1050.
  23. Zupanski, M. Maximum Likelihood Ensemble Filter: Theoretical Aspects. Mon. Weather Rev. 2005, 133, 1710–1726.
  24. Fletcher, S.J.; Zupanski, M. A study of ensemble size and shallow water dynamics with the Maximum Likelihood Ensemble Filter. Tellus A 2008, 60, 348–360.
  25. Carrassi, A.; Vannitsem, S.; Zupanski, D.; Zupanski, M. The maximum likelihood ensemble filter performances in chaotic systems. Tellus A 2009, 61, 587–600.
  26. Tran, A.P.; Vanclooster, M.; Zupanski, M.; Lambot, S. Joint estimation of soil moisture profile and hydraulic parameters by ground-penetrating radar data assimilation with maximum likelihood ensemble filter. Water Resour. Res. 2014, 50, 3131–3146.
  27. Zupanski, D.; Zupanski, M. Model Error Estimation Employing an Ensemble Data Assimilation Approach. Mon. Weather Rev. 2006, 134, 1337–1354.
  28. Vanderplaats, G.N. Numerical Optimization Techniques for Engineering Design: With Applications; McGraw-Hill: New York, NY, USA, 1984; Volume 1.
  29. Wright, S.; Nocedal, J. Numerical Optimization; Springer Science: Berlin, Germany, 1999; Volume 35, p. 7.
  30. Savard, G.; Gauvin, J. The steepest descent direction for the nonlinear bilevel programming problem. Oper. Res. Lett. 1994, 15, 265–272.
  31. Hager, W.W.; Zhang, H. A survey of nonlinear conjugate gradient methods. Pac. J. Optim. 2006, 2, 35–58.
  32. Fletcher, R.; Reeves, C.M. Function minimization by conjugate gradients. Comput. J. 1964, 7, 149–154.
  33. Lewis, R.M.; Torczon, V.; Trosset, M.W. Direct search methods: Then and now. J. Comput. Appl. Math. 2000, 124, 191–207.
  34. Battiti, R. First- and second-order methods for learning: Between steepest descent and Newton's method. Neural Comput. 1992, 4, 141–166.
  35. Grippo, L.; Lampariello, F.; Lucidi, S. A truncated Newton method with nonmonotone line search for unconstrained optimization. J. Optim. Theory Appl. 1989, 60, 401–419.
  36. Pan, V.Y.; Branham, S.; Rosholt, R.E.; Zheng, A.L. Newton's iteration for structured matrices. In Fast Reliable Algorithms for Matrices with Structure; SIAM: Philadelphia, PA, USA, 1999; pp. 189–210.
  37. Shanno, D.F. Conditioning of quasi-Newton methods for function minimization. Math. Comput. 1970, 24, 647–656.
  38. Nocedal, J. Updating quasi-Newton matrices with limited storage. Math. Comput. 1980, 35, 773–782.
  39. Loke, M.H.; Barker, R. Rapid least-squares inversion of apparent resistivity pseudosections by a quasi-Newton method. Geophys. Prospect. 1996, 44, 131–152.
  40. Knoll, D.A.; Keyes, D.E. Jacobian-free Newton–Krylov methods: A survey of approaches and applications. J. Comput. Phys. 2004, 193, 357–397.
  41. Grippo, L.; Lampariello, F.; Lucidi, S. A nonmonotone line search technique for Newton's method. SIAM J. Numer. Anal. 1986, 23, 707–716.
  42. Uschmajew, A.; Vandereycken, B. Line-search methods and rank increase on low-rank matrix varieties. In Proceedings of the 2014 International Symposium on Nonlinear Theory and Its Applications (NOLTA2014), Luzern, Switzerland, 14–18 September 2014; pp. 52–55.
  43. Hosseini, S.; Huang, W.; Yousefpour, R. Line search algorithms for locally Lipschitz functions on Riemannian manifolds. SIAM J. Optim. 2018, 28, 596–619.
  44. Conn, A.R.; Gould, N.I.; Toint, P.L. Trust Region Methods; SIAM: Philadelphia, PA, USA, 2000; Volume 1.
  45. Moré, J.J.; Sorensen, D.C. Computing a trust region step. SIAM J. Sci. Comput. 1983, 4, 553–572.
  46. Curtis, F.E.; Robinson, D.P.; Samadi, M. A trust region algorithm with a worst-case iteration complexity of $\mathcal{O}(\epsilon^{-3/2})$ for nonconvex optimization. Math. Program. 2017, 162, 1–32.
  47. Shi, Z.J. Convergence of line search methods for unconstrained optimization. Appl. Math. Comput. 2004, 157, 393–405.
  48. Zhou, W.; Akrotirianakis, I.; Yektamaram, S.; Griffin, J. A matrix-free line-search algorithm for nonconvex optimization. Optim. Methods Softw. 2017, 34, 1–24.
  49. Dunn, J.C. Newton's method and the Goldstein step-length rule for constrained minimization problems. SIAM J. Control Optim. 1980, 18, 659–674.
  50. Dai, Y.H.; Yuan, Y. A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim. 1999, 10, 177–182.
  51. Ravindran, A.; Reklaitis, G.V.; Ragsdell, K.M. Engineering Optimization: Methods and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2006.
  52. Attia, A.; Moosavi, A.; Sandu, A. Cluster sampling filters for non-Gaussian data assimilation. Atmosphere 2018, 9, 213.
  53. Nino-Ruiz, E.D.; Sandu, A.; Anderson, J. An efficient implementation of the ensemble Kalman filter based on an iterative Sherman–Morrison formula. Stat. Comput. 2015, 25, 561–577.
  54. Lorenz, E.N. Designing Chaotic Models. J. Atmos. Sci. 2005, 62, 1574–1587.
  55. Van Leeuwen, P.J. Nonlinear data assimilation in geosciences: An extremely efficient particle filter. Q. J. R. Meteorol. Soc. 2010, 136, 1991–1999.
  56. Gottwald, G.A.; Melbourne, I. Testing for chaos in deterministic systems with noise. Physica D 2005, 212, 100–110.
  57. Karimi, A.; Paul, M. Extensive Chaos in the Lorenz-96 Model. Chaos 2010, 20, 043105.
  58. Bracco, A.; Kucharski, F.; Kallummal, R.; Molteni, F. Internal variability, external forcing and climate trends in multi-decadal AGCM ensembles. Clim. Dyn. 2004, 23, 659–678.
  59. Miyoshi, T. The Gaussian approach to adaptive covariance inflation and its implementation with the local ensemble transform Kalman filter. Mon. Weather Rev. 2011, 139, 1519–1535.
  60. Molteni, F. Atmospheric simulations using a GCM with simplified physical parametrizations. I: Model climatology and variability in multi-decadal experiments. Clim. Dyn. 2003, 20, 175–191.
  61. Kucharski, F.; Molteni, F.; Bracco, A. Decadal interactions between the western tropical Pacific and the North Atlantic Oscillation. Clim. Dyn. 2006, 26, 79–91.
  62. Miyoshi, T.; Kondo, K.; Imamura, T. The 10,240-member ensemble Kalman filtering with an intermediate AGCM. Geophys. Res. Lett. 2014, 41, 5264–5271.
Figure 1. Structure of the Cholesky factor $\widehat{\mathbf{L}}$ as a function of the localization radius r.
Figure 2. 2D projections of the initial pool $\widehat{\mathbf{X}}^b$. Its two leading directions are employed for the projections.
Figure 3. Error evolution in the log-scale for the compared filter implementations. Different time frequencies of observations were employed during the experiments. The percentage of observations from the model state reads s = 100%.
Figure 4. Error evolution in the log-scale for the compared filter implementations. Different time frequencies of observations were employed during the experiments. The percentage of observations from the model state reads s = 70%.
Figure 5. Gradient norms in the log-scale of the Maximum Likelihood Ensemble Filter via a Modified Cholesky decomposition (MLEF-MC) for the initial assimilation step. Different time frequencies of observations were employed during the experiments. The observation coverage from the model state reads s = 100%.
Figure 6. Gradient norms in the log-scale of the MLEF-MC for the initial assimilation step. Different time frequencies of observations were employed during the experiments. The observation coverage from the model state reads s = 70%.
Figure 7. Snapshots of iterates for the initial analysis step. Different time frequencies of observations were employed during the experiments. The percentage of observations from the model state reads s = 100%.
Figure 8. Snapshots of iterates for the initial analysis step. Different time frequencies of observations were employed during the experiments. The percentage of observations from the model state reads s = 70%.
Figure 9. Average of elapsed times (in seconds) of the assimilation step for the MLEF-MC. Different parameters r were tried during the experiments.
Figure 10. Linear observation operator during assimilation steps. Shaded regions denote observed components (observations) from the model state. The operator is replicated across all numerical layers.
Figure 11. Time evolution of analysis errors for different values of the parameters γ and r (MLEF-MC); s = 0.7 (70% observational coverage).
Figure 12. Time evolution of analysis errors for different values of the parameters γ and r (MLEF-MC); s = 1 (full observational network).
Figure 13. Snapshots of the SPEEDY model for the reference solution, the background estimate, and the analysis of the MLEF-MC. Results are shown for the first numerical layer (100 hPa) and the first assimilation step.
Table 1. Physical variables of the SPEEDY model.

Name | Notation | Units | Number of Layers
Temperature | T | K | 7
Zonal Wind Component | u | m/s | 7
Meridional Wind Component | v | m/s | 7
Specific Humidity | Q | g/kg | 7
Table 2. Root Mean Square Error (RMSE) values for different values of s and γ.

Variable | NO DA | s = 0.7, γ = 1 | s = 0.7, γ = 3 | s = 0.7, γ = 5 | s = 1, γ = 1 | s = 1, γ = 3 | s = 1, γ = 5
u (m/s) | 330.7048 | 0.6315 | 0.7113 | 0.7990 | 0.4703 | 0.5447 | 0.6232
v (m/s) | 336.0850 | 0.5974 | 0.6742 | 0.7717 | 0.4708 | 0.5436 | 0.6266
T (K) | 196.0983 | 0.6828 | 0.7416 | 0.8029 | 0.5402 | 0.6048 | 0.6629
Q (g/kg) | 0.1010 | 0.0032 | 0.0070 | 0.0135 | 0.0026 | 0.0058 | 0.0113
