# parameters
sigma = 10.0
beta = 8.0/3.0
rho = 28.0
dt = 0.01
tm = 10
nt = int(tm/dt)
t = np.linspace(0,tm,nt+1)
u0True = np.array([1,1,1]) # True initial conditions
############################ Twin experiment ##################################
np.random.seed(seed=1)
sig_m= 0.15 # standard deviation for measurement noise
R = sig_m**2*np.eye(3) #covariance matrix for measurement noise
H = np.eye(3) #linear observation operator
dt_m = 0.2 #time period between observations
tm_m=2 #maximum time for observations
nt_m = int(tm_m/dt_m) #number of observation instants
#t_m = np.linspace(dt_m,tm_m,nt_m) #np.where( (t<=2) & (t%0.1==0) )[0]
ind_m = (np.linspace(int(dt_m/dt),int(tm_m/dt),nt_m)).astype(int)
t_m = t[ind_m]
#time integration
uTrue = np.zeros([3,nt+1])
uTrue[:,0] = u0True
km = 0
w = np.zeros([3,nt_m])
for k in range(nt):
    uTrue[:,k+1] = RK4(Lorenz63,uTrue[:,k],dt,sigma,beta,rho)
    if (km<nt_m) and (k+1==ind_m[km]):
        w[:,km] = H@uTrue[:,k+1] + np.random.normal(0,sig_m,[3,])
        km = km+1
plt.plot(t,uTrue[0,:])
plt.plot(t_m,w[0,:],'o')
########################### Data Assimilation #################################
u0b = np.array([2.0,3.0,4.0])
sig_b= 0.1
B = sig_b**2*np.eye(3)
#time integration
ub = np.zeros([3,nt+1])
ub[:,0] = u0b
ua = np.zeros([3,nt+1])
ua[:,0] = u0b
km = 0
for k in range(nt):
    ub[:,k+1] = RK4(Lorenz63,ub[:,k],dt,sigma,beta,rho)
    ua[:,k+1] = RK4(Lorenz63,ua[:,k],dt,sigma,beta,rho)
    if (km<nt_m) and (k+1==ind_m[km]):
        ua[:,k+1] = Lin3dvar(ua[:,k+1],w[:,km],H,R,B,3)
        km = km+1
############################### Plotting ######################################
import matplotlib as mpl
mpl.rc('text', usetex=True)
mpl.rcParams['text.latex.preamble'] = r'\usepackage{amsmath} \boldmath'
font = {'family' : 'normal',
        'weight' : 'bold',
        'size'   : 20}
mpl.rc('font', **font)
fig, ax = plt.subplots(nrows=3,ncols=1, figsize=(10,8))
ax = ax.flat
for k in range(3):
    ax[k].plot(t,uTrue[k,:], label=r'\bf{True}', linewidth = 3)
    ax[k].plot(t,ub[k,:], ':', label=r'\bf{Background}', linewidth = 3)
    ax[k].plot(t[ind_m],w[k,:], 'o', fillstyle='none', \
               label=r'\bf{Observation}', markersize = 8, markeredgewidth = 2)
    ax[k].plot(t,ua[k,:], '--', label=r'\bf{Analysis}', linewidth = 3)
    ax[k].set_xlabel(r'$t$',fontsize=22)
    ax[k].axvspan(0, tm_m, color='y', alpha=0.4, lw=0)
ax[0].legend(loc="center", bbox_to_anchor=(0.5,1.25),ncol =4,fontsize=15)
ax[0].set_ylabel(r'$x(t)$')
ax[1].set_ylabel(r'$y(t)$')
fig.subplots_adjust(hspace=0.5)
```
**Figure 1.** Results of the 3DVAR implementation for the Lorenz 63 system.

#### **4. Four Dimensional Variational Data Assimilation**

We highlighted in Section 3 that the 3DVAR can be referred to as a stationary case since the observations, background, and analysis all correspond to a fixed time instant. In other words, the optimization problem that minimizes Equation (4) takes place in the spatial state-space only. As an extension, the four dimensional variational data assimilation (4DVAR) aims to solve the optimization in both space and time, providing a non-stationary framework. In particular, the model's dynamics are incorporated into the optimization problem to relate different points in time to each other. The cost functional for the 4DVAR can be written as follows,

$$J(\mathbf{u}(t\_0)) = \sum\_{t\_k \in \mathcal{T}} \frac{1}{2} (\mathbf{w}(t\_k) - h(\mathbf{u}(t\_k)))^T \mathbf{R}^{-1}(t\_k) (\mathbf{w}(t\_k) - h(\mathbf{u}(t\_k))),\tag{13}$$

where $\mathbf{w}(t\_k)$ is the measurement at time $t\_k$ and $\mathcal{T}$ defines the set of time instants where observations are available. Note that the argument of this cost functional is the initial condition $\mathbf{u}(t\_0)$. In other words, the purpose of the 4DVAR algorithm is to evaluate an initial state estimate which, if evolved in time, would produce a trajectory that is as close to the collected measurements as possible (weighted by the inverse of the covariance matrix of the interfering noise). This is where the model's dynamics come into play, relating the initial condition to future predictions at the times when measurements are accessible. In other words, the values of $\mathbf{u}(t\_k)$ are constrained by the underlying model. Instead of presenting the linear and nonlinear mappings separately, we focus on the general case of both a nonlinear model mapping and a nonlinear observation operator, where simplification to the linear cases should be straightforward.
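
To make Equation (13) concrete, the short sketch below evaluates the cost functional for the Lorenz 63 twin experiment of Section 3 by a single forward sweep of the model. It assumes the `RK4` and `Lorenz63` helpers from the earlier listings, a linear observation operator $h(\mathbf{u}) = \mathbf{H}\mathbf{u}$, and a time-invariant $\mathbf{R}$; the function name `cost4dvar` is our own choice, not part of the paper's listings.

```
import numpy as np

# A minimal sketch (hypothetical helper): evaluate J(u0) in Equation (13)
# by marching the model forward and accumulating the observation mismatches.
def cost4dvar(u0, w, ind_m, H, R, dt, nt, sigma, beta, rho):
    Ri = np.linalg.inv(R)  # precision of the measurement noise
    u = np.copy(u0)
    J = 0.0
    km = 0
    for k in range(nt):
        u = RK4(Lorenz63, u, dt, sigma, beta, rho)    # one model step
        if (km < len(ind_m)) and (k+1 == ind_m[km]):  # observation instant
            d = w[:,km] - H @ u                       # innovation w - h(u)
            J = J + 0.5 * d @ (Ri @ d)
            km = km + 1
    return J
```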

In Equation (2), we introduced the one-step transition map; here, we can extend it to the *k*-step transition case by applying Equation (2) recursively as

$$\mathbf{u}(t\_k) = M^{(k)}(\mathbf{u}(t\_0); \boldsymbol{\theta}) = M(M^{(k-1)}\left(\mathbf{u}(t\_0); \boldsymbol{\theta}\right); \boldsymbol{\theta}) \,, \tag{14}$$

where $M^{(1)}(\mathbf{u}(t\_0); \boldsymbol{\theta}) = M(\mathbf{u}(t\_0); \boldsymbol{\theta})$. Now, we consider a base trajectory given by $\overline{\mathbf{u}}(t\_k)$ for $k = 1, 2, \dots$, generated from an initial condition of $\overline{\mathbf{u}}(t\_0)$. A perturbed trajectory ($\mathbf{u}(t\_k)$ for $k = 1, 2, \dots$) can be obtained by correcting the initial condition as $\mathbf{u}(t\_0) = \overline{\mathbf{u}}(t\_0) + \Delta \mathbf{u}\_0$, and the difference between the perturbed and base trajectories can be written as

$$\mathbf{u}(t\_k) - \overline{\mathbf{u}}(t\_k) = M^{(k)}(\overline{\mathbf{u}}(t\_0) + \Delta \mathbf{u}\_0; \boldsymbol{\theta}) - M^{(k)}(\overline{\mathbf{u}}(t\_0); \boldsymbol{\theta}). \tag{15}$$
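
As a concrete illustration of the $k$-step map in Equation (14) and the trajectory difference in Equation (15), the sketch below composes the one-step `RK4` map recursively; `Mk` is a hypothetical helper name, and the `RK4`/`Lorenz63` functions are assumed from the earlier listings.

```
# A minimal sketch (hypothetical helper): the k-step transition map of
# Equation (14), obtained by applying the one-step map recursively.
def Mk(u0, k, dt, sigma, beta, rho):
    u = np.copy(u0)
    for _ in range(k):
        u = RK4(Lorenz63, u, dt, sigma, beta, rho)
    return u

# Difference between perturbed and base trajectories, as in Equation (15):
# du_k = Mk(u0b + du0, k, ...) - Mk(u0b, k, ...)
```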

A first-order Taylor expansion of $M(\overline{\mathbf{u}}(t\_0) + \Delta \mathbf{u}\_0; \boldsymbol{\theta})$ around $\overline{\mathbf{u}}(t\_0)$ can be given as follows,

$$M(\overline{\mathbf{u}}(t\_0) + \Delta \mathbf{u}\_0; \boldsymbol{\theta}) \approx M(\overline{\mathbf{u}}(t\_0); \boldsymbol{\theta}) + \mathbf{D}\_M(\overline{\mathbf{u}}(t\_0)) \Delta \mathbf{u}\_0, \tag{16}$$

where $\mathbf{D}\_M(\overline{\mathbf{u}}(t\_k))$ is the Jacobian of the model $M(\mathbf{u}; \boldsymbol{\theta})$ evaluated at $\overline{\mathbf{u}}(t\_k)$, also known as the tangent linear operator. Note that $M(\overline{\mathbf{u}}(t\_0); \boldsymbol{\theta}) = \overline{\mathbf{u}}(t\_1)$ and $M(\overline{\mathbf{u}}(t\_0) + \Delta \mathbf{u}\_0; \boldsymbol{\theta}) = \mathbf{u}(t\_1)$; thus, $\Delta \mathbf{u}\_1 = \mathbf{u}(t\_1) - \overline{\mathbf{u}}(t\_1) \approx \mathbf{D}\_M(\overline{\mathbf{u}}(t\_0)) \Delta \mathbf{u}\_0$. Similarly, we can expand $M(\overline{\mathbf{u}}(t\_1) + \Delta \mathbf{u}\_1; \boldsymbol{\theta})$ around $\overline{\mathbf{u}}(t\_1)$ as follows,

$$M(\overline{\mathbf{u}}(t\_1) + \Delta \mathbf{u}\_1; \boldsymbol{\theta}) \approx M(\overline{\mathbf{u}}(t\_1); \boldsymbol{\theta}) + \mathbf{D}\_M(\overline{\mathbf{u}}(t\_1)) \Delta \mathbf{u}\_1,\tag{17}$$

where $\mathbf{u}(t\_2) = M(\mathbf{u}(t\_1); \boldsymbol{\theta}) \approx M(\overline{\mathbf{u}}(t\_1) + \Delta \mathbf{u}\_1; \boldsymbol{\theta})$ and $\overline{\mathbf{u}}(t\_2) = M(\overline{\mathbf{u}}(t\_1); \boldsymbol{\theta})$. Consequently, $\Delta \mathbf{u}\_2 = \mathbf{u}(t\_2) - \overline{\mathbf{u}}(t\_2) \approx \mathbf{D}\_M(\overline{\mathbf{u}}(t\_1)) \Delta \mathbf{u}\_1$, which can be generalized as

$$\Delta \mathbf{u}\_{k+1} \approx \mathbf{D}\_M(\overline{\mathbf{u}}(t\_k)) \Delta \mathbf{u}\_k, \tag{18}$$

with $\mathbf{u}(t\_k) \approx \Delta \mathbf{u}\_k + \overline{\mathbf{u}}(t\_k)$. It is customary to call Equation (18) the perturbation equation, or the tangent linear system (TLS). Equation (18) can be related to $\Delta \mathbf{u}\_0$ by recursion as follows,

$$\begin{split} \Delta \mathbf{u}\_{k+1} &\approx \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k})) \Delta \mathbf{u}\_{k} \\ &\approx \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k})) \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k-1})) \Delta \mathbf{u}\_{k-1} \\ &\approx \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k})) \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k-1})) \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k-2})) \Delta \mathbf{u}\_{k-2} \\ &\approx \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k})) \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k-1})) \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k-2})) \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{k-3})) \dots \mathbf{D}\_{M}(\overline{\mathbf{u}}(t\_{0})) \Delta \mathbf{u}\_{0}, \end{split}$$

which can be short-handed as $\Delta \mathbf{u}\_{k+1} \approx \mathbf{D}\_M(\overline{\mathbf{u}}(t\_{k:0})) \Delta \mathbf{u}\_0$ (please notice the order of matrix multiplication and the subscript "$k:0$").
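
For the Lorenz 63 system, the tangent linear operator can be formed explicitly from the Jacobian of the model's right-hand side. The sketch below does so under the simplifying assumption of a first-order Euler one-step map, so that $\mathbf{D}\_M \approx \mathbf{I} + \Delta t\, \mathbf{D}\_F$; the listings in this paper use an RK4 map instead, whose tangent linear operator chains the same Jacobian through the four stages. The names `JLorenz63` and `tls_step` are hypothetical helpers.

```
# A minimal sketch, assuming an Euler one-step map u_{k+1} = u_k + dt*F(u_k).
def JLorenz63(u, sigma, beta, rho):
    # Jacobian D_F of the Lorenz 63 right-hand side at state u
    x, y, z = u
    return np.array([[-sigma , sigma, 0.0  ],
                     [rho - z, -1.0 , -x   ],
                     [y      , x    , -beta]])

def tls_step(ub_k, du_k, dt, sigma, beta, rho):
    # One step of the tangent linear system (18): du_{k+1} = D_M du_k
    DM = np.eye(3) + dt * JLorenz63(ub_k, sigma, beta, rho)
    return DM @ du_k
```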

Now, we investigate the first-order variation $\Delta J$ of the cost functional $J(\mathbf{u}(t\_0))$ induced by the perturbation $\Delta \mathbf{u}\_0$ in the initial condition. This can be approximated as below,

$$\Delta J = \Delta \mathbf{u}\_0^T \nabla J(\overline{\mathbf{u}}(t\_0)) \tag{19}$$

$$= -\sum\_{t\_k \in \mathcal{T}} \Delta \mathbf{u}\_k^T \mathbf{D}\_h^T \left(\overline{\mathbf{u}}(t\_k)\right) \mathbf{R}^{-1}(t\_k) \left(\mathbf{w}(t\_k) - h(\overline{\mathbf{u}}(t\_k))\right). \tag{20}$$

Given that $\Delta \mathbf{u}\_k \approx \mathbf{D}\_M(\overline{\mathbf{u}}(t\_{k-1:0})) \Delta \mathbf{u}\_0$, then $\Delta \mathbf{u}\_k^T \approx \Delta \mathbf{u}\_0^T \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_0)) \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_1)) \dots \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_{k-1})) = \Delta \mathbf{u}\_0^T \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_{0:k-1}))$, and Equation (20) can be rewritten as

$$\Delta J = -\sum\_{t\_k \in \mathcal{T}} \Delta \mathbf{u}\_0^T \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_{0:k-1})) \mathbf{D}\_h^T \left(\overline{\mathbf{u}}(t\_k)\right) \mathbf{R}^{-1}(t\_k) \left(\mathbf{w}(t\_k) - h(\overline{\mathbf{u}}(t\_k))\right). \tag{21}$$

By comparing Equations (19) and (21), the gradient of the cost functional can be approximated as

$$\nabla J(\overline{\mathbf{u}}(t\_0)) = -\sum\_{t\_k \in \mathcal{T}} \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_{0:k-1})) \mathbf{D}\_h^T(\overline{\mathbf{u}}(t\_k)) \mathbf{R}^{-1}(t\_k) \left(\mathbf{w}(t\_k) - h(\overline{\mathbf{u}}(t\_k))\right) \tag{22}$$

$$=-\sum\_{t\_k \in \mathcal{T}} \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_{0:k-1})) \mathbf{f}(t\_k),\tag{23}$$

where $\mathbf{f}(t\_k) = \mathbf{D}\_h^T(\overline{\mathbf{u}}(t\_k)) \mathbf{R}^{-1}(t\_k) \left(\mathbf{w}(t\_k) - h(\overline{\mathbf{u}}(t\_k))\right)$. If we denote the time instants at which measurements are available as $\mathcal{T} = \{t\_{O\_1}, t\_{O\_2}, \dots, t\_{O\_N}\}$, Equation (23) can be expanded as

$$\nabla J(\overline{\mathbf{u}}(t\_0)) = -\left\{ \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_{0:O\_1-1})) \mathbf{f}(t\_{O\_1}) + \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_{0:O\_2-1})) \mathbf{f}(t\_{O\_2}) + \dots + \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_{0:O\_N-1})) \mathbf{f}(t\_{O\_N}) \right\}.\tag{24}$$
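
For the twin experiment above, where the observation operator is linear ($h(\mathbf{u}) = \mathbf{H}\mathbf{u}$, so $\mathbf{D}\_h = \mathbf{H}$), the forcing vector $\mathbf{f}(t\_k)$ reduces to a single matrix-vector expression; `forcing` below is a hypothetical helper name.

```
# A minimal sketch: the adjoint forcing f(t_k) of Equation (23) for a linear
# observation operator, f = H^T R^{-1} (w - H u), evaluated on the base trajectory.
def forcing(ub_k, w_k, H, R):
    return H.T @ np.linalg.solve(R, w_k - H @ ub_k)
```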

Now, we define a sequence of $\lambda\_k \in \mathbb{R}^n$ as below:

$$\lambda\_k = \begin{cases} \mathbf{f}\_k, & \text{if } t\_k = t\_{O\_N}, \\ \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_k)) \lambda\_{k+1} + \mathbf{f}\_k, & \text{if } t\_k \in \{t\_{O\_1}, t\_{O\_2}, \dots, t\_{O\_{N-1}}\}, \\ \mathbf{D}\_M^T(\overline{\mathbf{u}}(t\_k)) \lambda\_{k+1}, & \text{otherwise.} \end{cases} \tag{25}$$

It can be verified that $\nabla J(\overline{\mathbf{u}}(t\_0)) = -\lambda\_0$ (assume some numbers and you can see that this relation holds!). Therefore, in order to obtain the gradient of the cost functional, $\lambda\_0$ has to be computed, which depends on the evaluation of $\lambda\_1$. In turn, the computation of $\lambda\_1$ requires $\lambda\_2$, and so on. Equation (25) is known as the first-order adjoint equation, as it implies evaluating the $\lambda\_k$ sequence from $t\_{k+1}$ to $t\_k$ (i.e., in reverse order).
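
The backward recursion in Equation (25) can be sketched as below, under the same Euler tangent-linear assumption used in the `JLorenz63` sketch above. Here, `ub` holds the stored base trajectory, `f` holds the forcing vectors at the observation instants indexed by `ind_m`, and `adjoint_sweep` is a hypothetical helper name, not the paper's `Adj4dvar` of Listing 6.

```
# A minimal sketch of the first-order adjoint sweep (25), returning grad J = -lambda_0.
def adjoint_sweep(ub, f, ind_m, dt, sigma, beta, rho):
    kN = ind_m[-1]          # time index of the last observation, t_{O_N}
    lam = f[:,-1]           # lambda at t_{O_N} equals the forcing there
    km = len(ind_m) - 2     # pointer to the next (earlier) observation
    for k in range(kN-1, -1, -1):
        DM = np.eye(3) + dt * JLorenz63(ub[:,k], sigma, beta, rho)
        lam = DM.T @ lam    # lambda_k = D_M^T(ub(t_k)) lambda_{k+1}
        if (km >= 0) and (k == ind_m[km]):
            lam = lam + f[:,km]  # add the forcing at observation instants
            km = km - 1
    return -lam             # gradient of J with respect to u(t_0)
```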

Therefore, the first-order approximation of the 4DVAR works as follows. Starting from a prior guess of the initial condition, the base trajectory is computed by solving the model forward in time until the final time corresponding to the last observation point (i.e., $k = 0, 1, 2, \dots, O\_N$). Then, the value of $\lambda\_{O\_N} = \mathbf{f}\_{O\_N}$ is evaluated at the final time. After that, $\lambda\_k$ is evolved backward in time using Equation (25) until $\nabla J(\overline{\mathbf{u}}(t\_0)) = -\lambda\_0$ is obtained. This value of the gradient is then utilized to update the initial condition, and a new base trajectory is generated. Thus, the solution of the 4DVAR problem requires solving the model dynamics forward in time and the adjoint problem backward in time in order to compute the gradient of the cost functional and update the initial condition. The process is repeated until convergence takes place, as sketched in the outer loop below. Listing 6 shows a sample function to compute the gradient of the cost functional $\nabla J(\mathbf{u}(t\_0))$ corresponding to a base trajectory generated from a guess of the initial condition $\mathbf{u}(t\_0)$. We highlight that, in practice, storing the base trajectory as well as the $\lambda$ sequence at every time instant might be overwhelming. However, we do not address such issues in these introductory tutorials on data assimilation techniques.
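
The outer iteration just described can be sketched as a plain gradient descent; `fdvar`, `grad_fun`, and the fixed step size `lr` are hypothetical names and choices (the paper's Listing 6 provides the gradient routine itself, and any gradient-based optimizer could replace this loop).

```
# A minimal sketch of the outer 4DVAR iteration: repeatedly compute the
# gradient via the forward/adjoint sweeps and update the initial condition.
def fdvar(u0, grad_fun, lr=0.01, max_iter=100, tol=1e-6):
    u = np.copy(u0)
    for _ in range(max_iter):
        g = grad_fun(u)        # forward sweep + backward adjoint sweep
        if np.linalg.norm(g) < tol:
            break              # converged: gradient is (nearly) zero
        u = u - lr * g         # update the initial-condition estimate
    return u
```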

**Listing 6.** Computation of the gradient of the cost functional with the 4DVAR using the first-order adjoint algorithm.

```
def Adj4dvar(rhs,Jrhs,ObsOp,JObsOp,t,ind_m,u0b,w,R,opt,*args):