*2.4. A Continuous 4D-Var DA System*

Given the forward and adjoint models, a 4D-Var DA framework can be built to minimize the the following scalar cost function [30–32]:

$$\begin{split} \mathcal{J} &= \mathcal{J}\_{b} + \mathcal{J}\_{o} \\ &= \frac{1}{2} (\mathbf{x}\_{0} - \mathbf{x}\_{b})^{T} \mathbf{B}^{-1} (\mathbf{x}\_{0} - \mathbf{x}\_{b}) \\ &+ \frac{1}{2} \sum\_{r=0}^{N} (H\_{r}(\mathbf{x}\_{r}) - \mathbf{y}\_{r})^{T} \mathbf{O}\_{r}^{-1} (H\_{r}(\mathbf{x}\_{r}) - \mathbf{y}\_{r}) \end{split} \tag{13}$$

where **x**<sup>0</sup> is the analysis to be solved for, **x***<sup>b</sup>* the first guess, **y***<sup>r</sup>* observations available within the assimilation window, **x***<sup>r</sup>* the model state advanced by model M from **x**<sup>0</sup> to observation time *tr*, and *Hr* is the observation operator that maps the model state **x** at the observation time *tr* to the observation space. The matrices **B** and **O** are the background and observation error covariance matrices, respectively. Essentially, the term *Jb* measures the discrepency between the analysis and the model background, weighted by the inverse of the background error covariances, and the term *Jo* the discrepency between the analysis and the observation weighted by the inverse of observation errors. The solved analysis **x**<sup>0</sup> with the minimum of *J* yields the minimum variance estimate. The gradient of the scalar *J* with respect to **x**<sup>0</sup> can be obtained following:

$$\nabla\_{\mathbf{x}\_0} J = \mathbf{B}^{-1}(\mathbf{x}\_0 - \mathbf{x}\_b) + \sum\_{r=0}^{N} \mathbf{M}\_r^T \mathbf{H}\_r^T \mathbf{O}\_r^{-1} (H\_r(\mathbf{x}\_r) - \mathbf{y}\_r)\_r \tag{14}$$

where **M***<sup>T</sup>* denotes the adjoint model and **x***<sup>r</sup>* is obtained by advancing analysis **x**<sup>0</sup> forward in time to the observation time with the forward model. In the presence of any observations, the correctness of the overall gradient calculation can be validated with the following:

$$\Phi(\alpha) = \frac{J(\mathbf{x}\_0 + \alpha \Delta \mathbf{x}) - J(\mathbf{x}\_0)}{\alpha \Delta \mathbf{x}^T \nabla J(\mathbf{x}\_0)} = 1 + O(\alpha), \tag{15}$$

#### **3. Experiment Design**

The forward NN and its adjoint in Section 2.3 are applied into the equations in Section 2.4 describing the 4D-Var DA system. The NN-based 4D-Var DA system is then used to analyze the initial condition. As described in Section 2, the NN-based SW model is trained with MPAS-SW simulation results at 1000 km resolutions. MPAS-SW was run at 250 km resolution initialized with the ERA5 500 hPa atmospheric state at 00 UTC on 1 January 2021. The simulation results at 12 and 24 h will be interpolated into the 1000 km mesh and assimilated as observations to help analyze the initial conditions at the NN native resolutions. The matrices **B** and **O** are both kept diagonal and assigned with values of RMSE from the test dataset described in Section 2.2. Taking the entire model state (*h*, *u*, and *v* over the globe) as observations, the correctness of the gradient calculation is first checked following Equation (15), the result of which are shown in Figure 5. As the scaling factor decreases in magnitude, the quantity Φ(*α*) linearly approaches unity, as expected, proving the accuracy of the calculated gradient of the cost function with respect to the model state vector **x**. As the accuracy in gradient calculation is an essential and necessary step to ensure that the 4D-Var DA system will perform as expected, the results in Figure 5 is a reassuring signal for a working DA system.

**Figure 5.** Variations in the gradient-check results *log*(|Φ(*α*) − 1|) as a function of the log of the scaling factor *α*.

#### *3.1. A Single Observation Experiment*

The value of height at a single location of [35.24◦ N, 164.48◦ W] from the highresolution simulations 12 h into the forecast is assimilated. The minimum of the cost function was reached after four iterations as shown in Figure 6a. The norm of the gradient decreased by nearly five orders of magnitude, indicating a local minimum. The analysis increment in both heights and winds are plotted in Figure 6b, with the observation location marked as a white cross. It can be seen that an anticyclonic adjustment was generated near the observation location with some additional adjustments away from the observation location due to the gravity wave mode in the SW dynamics. Notice that only the height at the given location is observed and the background error covariance is kept diagonal in this study, both height and wind adjustments are generated in the analyzed solution, demonstrating the flow dependency in 4D-Var solutions. As simple as this experiment is, a single point observation experiment can be a rather straightforward way of showing the function and feasibility of a DA system.

**Figure 6.** (**a**) Variation of the cost function (solid curve) and the norm of the gradient (dashed curve) with respect to the number of iterations when assimilating a single point observation. (**b**) The analysis increment in height (shaded) and wind (vectors) after assimilating only the height at the location marked with a white cross.

## *3.2. Full Vector Observation Experiment*

The entire atmospheric state (*h*, *u*, and *v* over the global domain) 12 and 24 h after the analysis time simulated by the 250 km resolution MPAS-SW run will be assimilated with the NN-based 4D-Var DA system. The analyzed initial conditions will be used to make forecasts with MPAS-SW at 1000 km resolutions. A control experiment of 1000 km forecast will be made without assimilating any observations, to demonstrate the improvements when observations are assimilated. The predictions with and without DA will be compared against the 250 km simulation results. The evolution of the cost function (solid curve) and the norm of its gradient (dashed curve) are plotted in Figure 7a. After 25 iterations, the value of the gradient norm decreased by more than four orders of magnitude, indicating an extreme point with the solved model state **x**. The corresponding analysis increment calculated after convergence is given in Figure 7b. Most of the adjustments are located in mid to high latitude regions, especially in the Northern Hemisphere.

**Figure 7.** (**a**) Variation of the cost function (solid curve) and the norm of the gradient (dashed curve) with respect to the number of iterations when assimilating the observations of the entire model state. (**b**) The analysis increment in height (shaded) and wind (vectors).

Two MPAS-SW simulations at 1000 km are then initialized with the atmospheric state with and without assimilating observations. As the observations come from the 250 km resolution experiment, the 1000 km forecast results from both simulations are compared against the 250 km forecasts in the form of RMSE. Figure 8 shows the differences in RMSE between the control run and that with DA. It shows that in the first four days, the control outperforms the DA experiment when compared with the 250 km simulation results. However, starting from day four, the DA experiment becomes better than the control and this advantage is maintained over 20 days of forecasts. To compare the forecasts from the two experiments spatially, the differences in forecasts between the control and the 250 km simulation are shown in Figure 9a and those between the DA run and the 250 km simulation are shown in Figure 9b. It can be visually seen that the magnitude of differences in Figure 9b is greater than those in Figure 9a, indicating an inferior performance as also illustrated in Figure 8. The same comparison with the 5-day forecasts are shown in Figure 10. In contrast, the magnitude of differences in both heights and winds in the control run are greater than those in the DA experiment, proving forecasts improvements after assimilating observations with the NN-based DA system.

**Figure 8.** The differences in root mean squared errors (RMSE) between the control and DA experiments with respect to the forecast lead time.

**Figure 9.** Differences in 2-day forecasts (**a**) between the control experiment and referenced high resolution simulations and (**b**) between the DA experiment and the referenced high-resolution simulations.

**Figure 10.** Differences in 5-day forecasts (**a**) between the control experiment and referenced high resolution simulations and (**b**) between the DA experiment and the referenced high-resolution simulations.

#### *3.3. Discussion*

In some of the previous studies, the applications of NN techniques in data assimilation has been explored such as replacing physics parameterization components with a NN [20] or a DA system with Lorenz 96 model [21]. Some studies were even exploring the possibility of scaling ML to the entire NWP system like in [3,14]. This study endeavors to extend the application of NN and make use of the convenience in obtaining the TL/AD of an NN with a global shallow water model to formulate a 4D-Var DA system. The convenience of obtaining the TL/AD from a NN is applicable and can potentially benefit various components in a NWP and DA system demonstrated in the aforementioned studies. One example can be to use NN to approximate certain parts of moist physics parameterizations in the nonlinear forward model, which is a process that often involve nonlinear and/or discontinuous calculations and will make the tangent linear approximation invalid and thus make the 4D-Var technique fail. NN can potentially be useful to emulate the physical process while mitigate the nonlinearity/discontinuity. The tangent linear and adjoint of the physics NN can then be conveniently obtained and more reliably incorporated in a variational DA system. The analyses obtained from the purely NN-based 4D-Var DA prove to improve the forecast performances compared with a control experiment, demonstrating the promising prospect of further applications of NN in DA systems.

The potential next steps for this research are numerous. The NN design in this setup was relatively straight forward since it only had a single hidden layer. Additional hidden layers with more sophisticated NN techniques may be experimented in later studies to produce better emulating results. Similar techniques are readily applicable in substituting moise physics parameterizations or observation operators in a DA system. Furthermore, the analyzed results produced in this study may also be compared with a 4D-Var DA system that needs be developed with the traditional MPAS-SW model adjoint. Finally, applying these techniques to larger models such as the MPAS-Atmosphere model will demonstrate whether these techniques are feasible for NWP operations. The recent advances of ML applications in NWP are especially encouraging in this aspect.

#### **4. Conclusions**

This study proposes a NN-based SW model that emulates SW dynamics and makes predictions given an initial condition. The NN-based SW model was trained with 60 years (1959 to 2019) of hourly ERA5 atmospheric state and the corresponding 12-h predictions made with MPAS-SW at 1000 km resolution. Taking the ERA5 atmospheric state in 2020 and 2021 and their MPAS-SW predictions as an independent evaluation, the predictions made with the trained NN have an RMSE value of 6.32 m in fluid heights and 0.58 m/s in wind field. An example of the NN-based prediction result is shown to well capture the atmospheric evolution simulated by shallow water dynamics.

The tangent linear and adjoint models of a NN can be conveniently developed, the process of which is described in this study. The NN-based SW model and its adjoint are used to formulate a continuous 4D-Var DA system. Synthetic observations are made with a MPAS-SW experiment at 250 km resolution that is four times higher than the trained NN native 1000 km resolution. In the presence of observations, the calculation of the gradient of the cost function is checked for correctness to ensure that the minimum of the cost function can be found in the 4D-Var DA system.

In a single point observation experiment, the height value at a single point is assimilated as observation. A convergence is achieved rapidly in five iterations. The analysis increment by differing the analyzed initial conditions from the first guess show both local and remote impacts propagated by gravity waves, indicating flow dependency in the solution, even with a simple diagonal background error covariance. In the second DA experiment, the entire model state vectors, i.e., both height and wind fields over the global domain, 12 and 24 h into the forecasts are assimilated. A convergence was achieved with 25 iterations, in which the norm of the cost function gradient decreased by nearly five orders of magnitude. The analysis increments show adjustments throughout the global domain with greater magnitudes in mid and high latitude regions. Forecasts are created with MPAS-SW at 1000 km initialized with the first guess and the analyzed initial conditions. These forecasts are shown to be closer to the 250 km simulations that served as observations. These encouraging results demonstrate the feasibility of the tangent linear and adjoint components obtained from neural networks and the potential value of the proposed DA system.

**Funding:** This work has received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data analyzed during the current study are available are available at https://xiaoxutian.com/products/, accessed on 30 November 2022.

**Acknowledgments:** The authors are thank the editor and reviews for their help to make this manuscript better.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Sample Availability:** Samples of the compounds are available from the authors.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
