    # solve the weighted least-squares problem for the correction vector
    W1 = sqrtm(W)  # matrix square root (scipy.linalg.sqrtm)
    du0 = np.linalg.lstsq(W1@Q, W1@ef, rcond=None)[0]
    return du0.ravel()
```
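Note that the weighted least-squares problem is reduced to an ordinary one here: premultiplying `Q` and `ef` by the matrix square root `W1` of the weight matrix `W` is equivalent to minimizing the $\mathbf{W}$-weighted residual, since $\|\mathbf{W}^{1/2}(\mathbf{Q}\,\Delta\mathbf{u}^0 - \mathbf{e}_f)\|_2^2 = (\mathbf{Q}\,\Delta\mathbf{u}^0 - \mathbf{e}_f)^T\mathbf{W}(\mathbf{Q}\,\Delta\mathbf{u}^0 - \mathbf{e}_f)$.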
#### *Example: Lorenz 63 System*

We apply the described FSM to estimate the initial conditions for the Lorenz 63 system, using the same parameters and setup as described in Section 4. Sample code is presented in Listing 13. We highlight that, instead of adding the correction vector $\Delta\mathbf{u}^0$ directly to the base value $\mathbf{u}(t_0)$, we multiply it by a learning rate to mitigate the effects of the first-order approximation. We utilize the golden-section search method to update this learning rate at each iteration; a sketch of such a line search is given below.
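Since the internals of `GoldenAlpha` are not reproduced in this section, the following is a minimal sketch of a golden-section line search, assuming the cost functional has been wrapped as a scalar function `J` of the initial condition (the names `golden_alpha`, `lo`, `hi`, and `tol` are illustrative, not part of the original code):

```
import numpy as np

def golden_alpha(J, u0, p, lo=0.0, hi=1.0, tol=1e-4):
    # golden-section search for the learning rate alpha that
    # approximately minimizes J(u0 + alpha*p) over [lo, hi]
    gr = (np.sqrt(5.0) - 1.0) / 2.0   # inverse golden ratio, ~0.618
    a, b = lo, hi
    c = b - gr*(b - a)
    d = a + gr*(b - a)
    while abs(b - a) > tol:
        if J(u0 + c*p) < J(u0 + d*p):
            b = d    # minimum lies in [a, d]
        else:
            a = c    # minimum lies in [c, b]
        c = b - gr*(b - a)
        d = a + gr*(b - a)
    return 0.5*(a + b)
```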


**Listing 13.** Implementation of the FSM for the Lorenz 63 system.

```
#%% Application: Lorenz 63
########################### Data Assimilation #################################
u0b = np.array([2.0,3.0,4.0])
u0a = u0b
J0 = loss(Lorenz63,h,t,ind_m,u0a,w,R,1,sigma,beta,rho)
for iter in range(200):
    # computing the correction vector
    du0 = fsm1st(Lorenz63,JLorenz63,h,Dh,t,ind_m,u0a,w,R,1,sigma,beta,rho)
    # minimization direction
    p = du0  # optionally: du0/np.linalg.norm(du0)
    # golden-section method for line search
    alpha = GoldenAlpha(p,Lorenz63,h,t,ind_m,u0a,w,R,1,sigma,beta,rho)
    # update initial condition with gradient descent
    u0a = u0a + alpha*p
    J = loss(Lorenz63,h,t,ind_m,u0a,w,R,1,sigma,beta,rho)
    if np.abs(J0-J) < 1e-2:
        print('Convergence: loss function')
        break
    J0 = J
    if np.linalg.norm(du0) < 1e-4:
        print('Convergence: correction vector')
        break
##################### Time Integration [Comparison] ###########################
ub = np.zeros([3,nt+1])
ub[:,0] = u0b
ua = np.zeros([3,nt+1])
ua[:,0] = u0a
km = 0
for k in range(nt):
    ub[:,k+1] = RK4(Lorenz63,ub[:,k],dt,sigma,beta,rho)
    ua[:,k+1] = RK4(Lorenz63,ua[:,k],dt,sigma,beta,rho)
```
Prediction results are provided in Figure 3, where we notice a large discrepancy at the estimated initial condition. However, the predicted trajectory matches the true one almost perfectly over the rest of the testing time window. This behavior is largely a consequence of the nature of the Lorenz system itself and the strong attraction of trajectories toward its attractor. It can also be partially attributed to the lack of background information and its contribution to the cost functional. Moreover, the estimate can be substantially improved by adding more observations close to the initial time, since the correction vector is computed from the forecast error evaluated at observation times. Nevertheless, we see that the analysis trajectory is significantly more accurate than the background one, obtained with iterative first-order approximations of the forward sensitivity method, even for long-time predictions.

**Figure 3.** Results of FSM implementation for the Lorenz 63 system.

#### **6. Kalman Filtering**

The idea behind Kalman filtering techniques is to propagate the mean as well as the covariance matrix of the system's state sequentially in time. That is, in addition to providing an improved state estimate (i.e., the analysis), the filter also gives information about the statistical properties of this estimate. This is one main difference between Kalman filtering and variational methods, which often assume a fixed (stationary) background covariance matrix. Kalman filters are also very popular in systems engineering, robotics, navigation, and control; almost all modern control systems use some form of the Kalman filter. It assisted the guidance of the Apollo 11 lunar module to the moon's surface, and it will most probably do the same for the next generations of aircraft.

Although most applications in fluid dynamics involve nonlinear systems, we first describe the standard Kalman filter developed for a linear dynamical system with a linear observation operator, described as

$$\mathbf{u}_t(t_{k+1}) = \mathbf{M}_k\mathbf{u}_t(t_k) + \xi_p(t_{k+1}),\tag{38}$$

$$\mathbf{w}(t_k) = \mathbf{H}_k \mathbf{u}_t(t_k) + \xi_m(t_k),\tag{39}$$

where $\mathbf{M}_k \in \mathbb{R}^{n\times n}$ is a non-singular system matrix defining the underlying governing processes and $\xi_p \in \mathbb{R}^n$ describes the process noise (or model error), while $\mathbf{H}_k \in \mathbb{R}^{m\times n}$ represents the measurement system with a measurement noise of $\xi_m \in \mathbb{R}^m$.

As presented in Section 2, the true state $\mathbf{u}_t(t_k)$ is assumed to be a random variable with known mean $E[\mathbf{u}_t(t_k)] = \mathbf{u}_b(t_k)$ and covariance matrix $E[(\mathbf{u}_t(t_k) - \mathbf{u}_b(t_k))(\mathbf{u}_t(t_k) - \mathbf{u}_b(t_k))^T] = \mathbf{B}_k$. In Kalman filtering, the covariance matrix evolves in time, hence the subscript $k$. We also assume that the process noise is unbiased with zero mean and covariance matrix $\mathbf{Q}_k$; that is, $E[\xi_p(t_k)] = \mathbf{0}$ and $E[\xi_p(t_k)\xi_p(t_k)^T] = \mathbf{Q}_k$.

Thus, the goal of the filtering problem is to find a good estimate (analysis) $\mathbf{u}_a(t_k)$ of the true system state $\mathbf{u}_t(t_k)$, given a dynamical model and a set of noisy observations $\{\mathbf{w}(t_i)\}$ collected at some time instants $t_i \in (0, t_k]$. The optimal estimate $\mathbf{u}_a(t_k)$ is defined as the one that minimizes $E[(\mathbf{u}_t(t_k) - \mathbf{u}_a(t_k))^T(\mathbf{u}_t(t_k) - \mathbf{u}_a(t_k))]$. The filtering process generally consists of two steps: the forecast step and the data assimilation step.

The forecast step is performed using the predictable part of the given dynamical model, starting from the best known information at time $t_k$ (denoted as $\hat{\mathbf{u}}_b(t_k)$), to produce a forecast or background estimate $\mathbf{u}_b(t_{k+1}) = \mathbf{M}_k\hat{\mathbf{u}}_b(t_k)$. The difference between the background forecast and the true state at $t_{k+1}$ can be written as follows,

$$\begin{aligned} \xi_b(t_{k+1}) &= \mathbf{u}_t(t_{k+1}) - \mathbf{u}_b(t_{k+1}) \\ &= \left(\mathbf{M}_k \mathbf{u}_t(t_k) + \xi_p(t_{k+1})\right) - \mathbf{M}_k \hat{\mathbf{u}}_b(t_k) \\ &= \mathbf{M}_k\left(\mathbf{u}_t(t_k) - \hat{\mathbf{u}}_b(t_k)\right) + \xi_p(t_{k+1}) \\ &= \mathbf{M}_k \hat{\xi}_b(t_k) + \xi_p(t_{k+1}), \end{aligned}$$

where $\hat{\xi}_b(t_k) = \mathbf{u}_t(t_k) - \hat{\mathbf{u}}_b(t_k)$ is the error in the estimate at $t_k$, with zero mean and covariance matrix $\hat{\mathbf{B}}_k$.

The covariance matrix of the background estimate at $t_{k+1}$ can be evaluated as $\mathbf{B}_{k+1} = E[\xi_b(t_{k+1})\xi_b(t_{k+1})^T] = E[(\mathbf{M}_k \hat{\xi}_b(t_k) + \xi_p(t_{k+1}))(\mathbf{M}_k \hat{\xi}_b(t_k) + \xi_p(t_{k+1}))^T]$. Since $\hat{\xi}_b(t_k)$ and $\xi_p(t_{k+1})$ are assumed to be uncorrelated (i.e., $E[\hat{\xi}_b(t_k)\xi_p(t_{k+1})^T] = \mathbf{0}$), the background covariance matrix at $t_{k+1}$ can be computed as follows,

$$\mathbf{B}\_{k+1} = \mathbf{M}\_k \hat{\mathbf{B}}\_k \mathbf{M}\_k^T + \mathbf{Q}\_{k+1}.\tag{40}$$

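As a minimal sketch (assuming the state and matrices are available as NumPy arrays; the function name `forecast` and its signature are illustrative, not part of the paper's code), the forecast step consisting of the state propagation and Equation (40) can be implemented as follows:

```
import numpy as np

def forecast(ub, B, M, Q):
    # propagate the state estimate: u_b(t_{k+1}) = M_k u_b(t_k)
    ub_new = M @ ub
    # propagate the covariance: B_{k+1} = M_k B_k M_k^T + Q_{k+1}
    B_new = M @ B @ M.T + Q
    return ub_new, B_new
```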

Now, with the forecast step, we have a background estimate at $t_{k+1}$, defined as $\mathbf{u}_b(t_{k+1})$ with covariance matrix $\mathbf{B}_{k+1}$. Then, measurements $\mathbf{w}(t_{k+1})$ are collected at $t_{k+1}$ with a linear operator $\mathbf{H}_{k+1}$ and measurement noise $\xi_m(t_{k+1})$ with zero mean and covariance matrix $\mathbf{R}_{k+1}$. Thus, we would like to fuse these pieces of information to create an optimal unbiased estimate (analysis) $\mathbf{u}_a(t_{k+1})$ with covariance matrix $\mathbf{P}_{k+1}$. This can be defined as a linear function of $\mathbf{u}_b(t_{k+1})$ and $\mathbf{w}(t_{k+1})$ as follows,

$$\mathbf{u}\_{a}(t\_{k+1}) = \mathbf{u}\_{b}(t\_{k+1}) + \mathbf{K}\_{k+1} \left( \mathbf{w}(t\_{k+1}) - \mathbf{H}\_{k+1} \mathbf{u}\_{b}(t\_{k+1}) \right), \tag{41}$$

where $\mathbf{w}(t_{k+1}) - \mathbf{H}_{k+1}\mathbf{u}_b(t_{k+1})$ is the innovation vector and $\mathbf{K} \in \mathbb{R}^{n\times m}$ is called the Kalman gain matrix. We highlight that the Kalman gain matrix is defined in such a way as to minimize $E[(\mathbf{u}_t(t_{k+1}) - \mathbf{u}_a(t_{k+1}))^T(\mathbf{u}_t(t_{k+1}) - \mathbf{u}_a(t_{k+1}))] = \mathrm{tr}(\mathbf{P}_{k+1})$. This can be written as [27]

$$\mathbf{K}\_{k+1} = \mathbf{B}\_{k+1} \mathbf{H}\_{k+1}^T \left( \mathbf{H}\_{k+1} \mathbf{B}\_{k+1} \mathbf{H}\_{k+1}^T + \mathbf{R}\_{k+1} \right)^{-1} \tag{42}$$

resulting in an analysis covariance matrix defined as

$$\mathbf{P}_{k+1} = (\mathbf{I}_n - \mathbf{K}_{k+1}\mathbf{H}_{k+1})\mathbf{B}_{k+1},\tag{43}$$

where $\mathbf{I}_n$ is the $n \times n$ identity matrix. The resulting analysis $\mathbf{u}_a(t_{k+1})$ is known as the best linear unbiased estimate (BLUE). We highlight that the information at $t_k$ might correspond to the analysis (i.e., $\hat{\mathbf{u}}_b(t_k) = \mathbf{u}_a(t_k)$) obtained from the last data assimilation step, or simply to the previous forecast if no other information is available. Thus, the Kalman filtering process can be summarized as follows,

$$\begin{aligned} \text{Inputs:} \quad & \hat{\mathbf{u}}_b(t_k),\ \hat{\mathbf{B}}_k,\ \mathbf{M}_k,\ \mathbf{Q}_{k+1},\ \mathbf{w}(t_{k+1}),\ \mathbf{R}_{k+1},\ \mathbf{H}_{k+1} \\ \text{Forecast:} \quad & \mathbf{u}_b(t_{k+1}) = \mathbf{M}_k \hat{\mathbf{u}}_b(t_k) \\ & \mathbf{B}_{k+1} = \mathbf{M}_k \hat{\mathbf{B}}_k \mathbf{M}_k^T + \mathbf{Q}_{k+1} \\ \text{Kalman gain:} \quad & \mathbf{K}_{k+1} = \mathbf{B}_{k+1} \mathbf{H}_{k+1}^T \left( \mathbf{H}_{k+1} \mathbf{B}_{k+1} \mathbf{H}_{k+1}^T + \mathbf{R}_{k+1} \right)^{-1} \\ \text{Analysis:} \quad & \mathbf{u}_a(t_{k+1}) = \mathbf{u}_b(t_{k+1}) + \mathbf{K}_{k+1} \left( \mathbf{w}(t_{k+1}) - \mathbf{H}_{k+1} \mathbf{u}_b(t_{k+1}) \right) \\ & \mathbf{P}_{k+1} = \left( \mathbf{I}_n - \mathbf{K}_{k+1} \mathbf{H}_{k+1} \right) \mathbf{B}_{k+1}, \end{aligned}$$

where the inputs at $t_k$ are defined as

$$(\hat{\mathbf{u}}_b(t_k), \hat{\mathbf{B}}_k) = \begin{cases} (\mathbf{u}_a(t_k), \mathbf{P}_k) & \text{if } \mathbf{w}(t_k) \text{ is available,} \\ (\mathbf{u}_b(t_k), \mathbf{B}_k) & \text{otherwise.} \end{cases}$$
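To make one full forecast and analysis cycle concrete, a toy scalar example is sketched below (all numerical values are illustrative and not taken from the paper):

```
import numpy as np

# toy scalar system: random-walk state, directly observed with noise
M = np.array([[1.0]])        # system matrix M_k
H = np.array([[1.0]])        # observation operator H_{k+1}
Q = np.array([[0.01]])       # process noise covariance Q_{k+1}
R = np.array([[0.25]])       # measurement noise covariance R_{k+1}

ub = np.array([0.0])         # best known state at t_k
B = np.array([[1.0]])        # its covariance

# forecast step
ub = M @ ub
B = M @ B @ M.T + Q

# analysis step, Equations (41)-(43)
w = np.array([1.2])          # observation collected at t_{k+1}
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
ua = ub + K @ (w - H @ ub)
P = (np.eye(1) - K @ H) @ B
```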

Listing 14 describes a basic Python implementation of the data assimilation step using the KF algorithm described above. Although efficient matrix inversion routines that exploit specific matrix properties can be utilized, we use the standard built-in NumPy matrix inversion function.

**Listing 14.** Implementation of the KF with linear dynamics and observation operator.

```
import numpy as np
def KF(ub,w,H,R,B):