Entropy
  • Editor’s Choice
  • Article
  • Open Access

4 July 2018

Conditional Gaussian Systems for Multiscale Nonlinear Stochastic Systems: Prediction, State Estimation and Uncertainty Quantification

1 Department of Mathematics and Center for Atmosphere Ocean Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
2 Center for Prototype Climate Modeling, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi 129188, UAE
* Author to whom correspondence should be addressed.

Abstract

A conditional Gaussian framework for understanding and predicting complex multiscale nonlinear stochastic systems is developed. Despite the conditional Gaussianity, such systems are nevertheless highly nonlinear and are able to capture the non-Gaussian features of nature. The special structure of the system allows closed analytical formulae for solving the conditional statistics and is thus computationally efficient. A rich gallery of examples of conditional Gaussian systems are illustrated here, which includes data-driven physics-constrained nonlinear stochastic models, stochastically coupled reaction–diffusion models in neuroscience and ecology, and large-scale dynamical models in turbulence, fluids and geophysical flows. Making use of the conditional Gaussian structure, efficient statistically accurate algorithms involving a novel hybrid strategy for different subspaces, a judicious block decomposition and statistical symmetry are developed for solving the Fokker–Planck equation in large dimensions. The conditional Gaussian framework is also applied to develop extremely cheap multiscale data assimilation schemes, such as the stochastic superparameterization, which use particle filters to capture the non-Gaussian statistics on the large-scale part whose dimension is small whereas the statistics of the small-scale part are conditional Gaussian given the large-scale part. Other topics of the conditional Gaussian systems studied here include designing new parameter estimation schemes and understanding model errors.

1. Introduction

Multiscale nonlinear dynamical systems are ubiquitous in different areas, including geoscience, engineering, neural science and material science [1,2,3,4,5,6,7,8]. They are characterized by a large dimensional phase space and a large dimension of strong instabilities which transfer energy throughout the system. Many non-Gaussian characteristics, such as extreme and rare events, intermittency and fat-tailed probability density functions (PDFs), are often observed in these multiscale nonlinear dynamical systems [1,9,10]. Key mathematical issues are their basic mathematical structural properties and qualitative features [2,3,11,12], short- and long-range forecasting [13,14,15], uncertainty quantification (UQ) [16,17,18], and state estimation, filtering or data assimilation [15,19,20,21,22]. Due to the lack of physical understanding or inadequate model resolution given current computing power, coping with the inevitable model errors that arise in approximating such complex systems becomes a necessary and crucial issue in dealing with these multiscale nonlinear dynamical systems [16,23,24,25,26]. Understanding and predicting complex multiscale turbulent dynamical systems involve the blending of rigorous mathematical theory, qualitative and quantitative modelling, and novel numerical procedures [2,27].
Despite the full nonlinearity of many multiscale turbulent dynamical systems and the non-Gaussian features in both the marginal and joint PDFs, these systems have conditional Gaussian structures [1,15,28,29]. Writing the state variables as u = ( u I , u II ) , the conditional Gaussianity means that once the trajectories of u I are given, the dynamics of u II conditioned on these highly nonlinear observed trajectories become Gaussian processes. In this article, we develop a general conditional Gaussian framework for mathematical modeling, prediction, state estimation and UQ. One of the desirable features of such conditional Gaussian systems is that they allow closed analytical formulae for updating the conditional statistics of u II given the trajectories of u I [28,30]. This facilitates the development of both rigorous mathematical theories and efficient numerical algorithms for these complex multiscale turbulent dynamical systems.
With the great progress in developing and improving the observational networks, a vast amount of observational data is now available. An important scientific issue is to make use of these observational data to build data-driven models that advance the understanding of the underlying physical processes and improve the prediction skill. Physics-constrained nonlinear stochastic models are a recent development in data-driven statistical-dynamical modeling for the time series of a partial subset of observed variables, which arise from observations of nature or from an extremely complex physical model [31,32]. The physics-constrained nonlinear stochastic modeling framework builds on early works for single-layer models without memory effects, which use physical analytic properties to constrain data-driven methods. These models succeed in overcoming both the finite-time blowup and the lack of physical meaning issues in various ad hoc multi-layer regression models [31,33]. Many of the physics-constrained nonlinear stochastic models belong to the conditional Gaussian framework, including the noisy versions of the famous Lorenz 63 and 84 models as well as a two-layer Lorenz 96 model [34,35,36]. In addition, two important conceptual models for turbulent dynamical systems [37,38] and a conceptual model of the coupled atmosphere and ocean [39] also fit perfectly into the physics-constrained nonlinear stochastic modeling framework with conditional Gaussian structures. These models are extremely useful for testing various new multiscale data assimilation and prediction schemes [1,38,40]. Other physics-constrained nonlinear stochastic models with conditional Gaussian structures include a low-order model of Charney–DeVore flows [29] and a paradigm model for topographic mean flow interaction [32].
Notably, the physics-constrained nonlinear stochastic models, combined with the conditional Gaussian framework, have been successfully applied to the real-time prediction and data assimilation of the Madden–Julian oscillation (MJO) and monsoon [41,42,43,44], which are the dominant modes of intraseasonal variability in nature.
In many multiscale turbulent dynamical systems, there is a natural time scale separation between different variables. Therefore, the MTV strategy [45,46,47,48] (named after Majda, Timofeyev and Vanden-Eijnden) can be applied to these systems for stochastic mode reduction. Using the MTV strategy, the equations of motion for the unresolved fast modes are modified by replacing the nonlinear self-interaction terms between unresolved modes with damping and stochastic terms. The resulting systems then naturally belong to the conditional Gaussian framework and they also preserve the physics-constrained properties. Examples include a simple stochastic model with key features of atmospheric low-frequency variability [49] and a simple prototype nonlinear stochastic model with triad interactions that mimics structural features of low-frequency variability of global circulation models (GCMs) [50].
In addition to the data-driven physics-constrained nonlinear stochastic models, many large-scale dynamical models in turbulence, fluids and geophysical flows also have conditional Gaussian structures. These stochastic partial differential equation (SPDE) models include the stochastic skeleton model for the MJO [51] and a coupled El Niño dynamical model capturing the El Niño diversity [52], where nonlinearity interacts with state-dependent (multiplicative) noise and many salient non-Gaussian features such as intermittency and fat-tailed PDFs are observed in these systems. The stochastically forced Boussinesq equation [11] for the Rayleigh–Bénard convection [53,54] and strongly stratified flows in geophysics [55] as well as its generalized version, namely the Darcy–Brinkman–Oberbeck–Boussinesq system [56] for the convection phenomena in porous media, also belong to the conditional Gaussian framework. Likewise, the multiscale shallow water equation [11] with coupled geostrophically balanced (GB) and gravity modes fits into the conditional Gaussian model family. On the other hand, quite a few stochastically coupled reaction–diffusion models in neuroscience and ecology exhibit conditional Gaussian structures as well. Examples include the stochastically coupled FitzHugh–Nagumo (FHN) models [57], the predator–prey models [58], a stochastically coupled Susceptible-Infectious-Recovered (SIR) epidemic model [59] and a nutrient-limited model for avascular cancer growth [60]. Furthermore, the conditional Gaussian framework can be applied to model the coupled observation-filtering systems for the state estimation of turbulent ocean flows using Lagrangian tracers [61,62,63]. The framework can also be utilized to develop cheap and effective stochastic parameterizations for turbulent dynamical systems [64,65].
One of the key issues in studying the complex multiscale nonlinear turbulent dynamical systems is to solve the time evolution of the associated PDFs, which is extremely useful in ensemble prediction, data assimilation as well as understanding the intermittency and extreme events. The time evolution of the PDFs associated with the underlying turbulent dynamical systems is described by the so-called Fokker–Planck equation [66,67]. Due to the complexity of many multiscale turbulent dynamical systems, the dimension of the systems can be quite large. However, solving the Fokker–Planck equation in high dimensions is a notoriously difficult problem. Traditional numerical methods such as finite element and finite difference as well as direct Monte Carlo simulations all suffer from the curse of dimension [68,69]. Nevertheless, for the conditional Gaussian systems, efficient statistically accurate algorithms can be developed for solving the Fokker–Planck equation in high dimensions and thus beat the curse of dimension. The algorithms involve a hybrid strategy that requires only a small number of ensemble members [38]. Specifically, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. The parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture [38]. For even larger dimensional systems, a judicious block decomposition and statistical symmetry are further applied that facilitate an extremely efficient parallel computation and a significant reduction of sample numbers [36].
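To make the hybrid strategy concrete, the following minimal sketch applies it to a hypothetical two-dimensional dyad test model; the model and all parameter values are illustrative choices, not taken from the cited references. The observed variable u1 plays the role of u I and u2 the role of u II; the conditional mean and variance of u2 are advanced with the scalar version of the closed analytical formulae, and the marginal PDF of u2 is then assembled as a Gaussian mixture over a small ensemble.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical conditional Gaussian dyad test model (illustrative only):
#   du1 = (-a*u1 + u1*u2) dt + s1 dW1    (u_I : observed, nonlinear)
#   du2 = (-u1**2 - d*u2) dt + s2 dW2    (u_II: conditionally Gaussian)
a, d, s1, s2 = 1.0, 0.5, 0.5, 0.5
dt, nsteps, L = 1e-3, 20000, 50          # L = small number of mixture members

u1 = np.zeros(L); u2 = np.zeros(L)       # L independent coupled trajectories
m = np.zeros(L); R = 0.25 * np.ones(L)   # conditional mean and variance of u2

for _ in range(nsteps):
    dW1 = np.sqrt(dt) * rng.standard_normal(L)
    dW2 = np.sqrt(dt) * rng.standard_normal(L)
    du1 = (-a*u1 + u1*u2)*dt + s1*dW1
    # scalar version of the closed conditional-statistics formulae:
    # the innovation of the observed path drives the conditional mean
    innov = du1 - (-a*u1 + u1*m)*dt
    m += (-u1**2 - d*m)*dt + (R*u1/s1**2)*innov
    R += (-2*d*R + s2**2 - (R*u1/s1)**2)*dt
    u2 += (-u1**2 - d*u2)*dt + s2*dW2
    u1 += du1

# non-Gaussian marginal PDF of u_II as a mixture of L conditional Gaussians
x = np.linspace(-12.0, 4.0, 800)
pdf = np.mean([np.exp(-(x - mi)**2 / (2*Ri)) / np.sqrt(2*np.pi*Ri)
               for mi, Ri in zip(m, R)], axis=0)
mass = pdf.sum() * (x[1] - x[0])
print("mixture mass:", mass)             # close to 1 if the grid is wide enough
```

Note that only L = 50 samples are needed here, whereas a direct histogram of u2 with comparable smoothness would require far more trajectories; this is the source of the efficiency gain described above.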
These algorithms are applied here to the statistical prediction of a stochastically coupled FHN model with 1500 dimensions and an inhomogeneous two-layer Lorenz 96 model with 240 dimensions. Significant prediction skill shows the advantages of these algorithms in terms of both accuracy and efficiency.
The conditional Gaussian framework also provides a powerful tool for multiscale data assimilation. In fact, data assimilation of turbulent signals is an important and challenging problem because of the large dimension of the signals and the incomplete partial noisy observations that usually mix the large-scale mean flow and small-scale fluctuations. Due to the limited computing power, it is desirable to use multiscale forecast models which are cheap and fast to mitigate the curse of dimensionality in turbulent systems. An extremely cheap multiscale data assimilation scheme, called stochastic superparameterization [70,71,72,73], resolves the large-scale mean flow on a coarse grid and replaces the nonlinear small-scale eddy terms by quasilinear stochastic processes on formally infinite embedded domains where the stochastic processes are Gaussian conditional to the large-scale mean flow. The key ingredient of such a multiscale data assimilation method is the systematic use of conditional Gaussian mixtures which make the method efficient by filtering a subspace whose dimension is smaller than the full state. This conditional Gaussian closure approximation results in a seamless algorithm without using the high-resolution space grid for the small scales and is much cheaper than the conventional superparameterization, with significant success in difficult test problems [71,72,74] including the Majda-McLaughlin-Tabak (MMT) model [71,75] and ocean turbulence [76,77,78]. The method uses particle filters [20,79] or ensemble filters on the large-scale part [75,76] whose dimension is small enough so that the non-Gaussian statistics of the large-scale part can be calculated from a particle filter, whereas the statistics of the small-scale part are conditionally Gaussian given the large-scale part. This framework is not restricted to superparameterization as the forecast model and other cheap forecast models can also be employed.
In fact, another multiscale filtering algorithm with the quasilinear Gaussian dynamically orthogonal method as the forecast method in an adaptively evolving low-dimensional subspace has been developed [80]. The multiscale filtering also provides a mathematical framework for representation errors, which are due to the contribution of unresolved scales [81,82] in the observations. Other theoretic and applied studies of the conditional Gaussian framework include effective parameter estimation, model reduction and the understanding of various model errors using information theory.
The remainder of the article is organized as follows. Section 2 provides an overview of data, model and data-driven modeling framework as well as efficient data assimilation and prediction strategies with solvable conditional statistics. Section 3 summarizes the general mathematical structure of nonlinear conditional Gaussian systems, the physics-constrained nonlinear stochastic models and the application of the MTV strategy to the conditional Gaussian systems. Then, a gallery of examples of conditional Gaussian systems is shown in Section 4. Section 5 involves the effective statistically accurate algorithms that beat the curse of dimension for the Fokker–Planck equation for conditional Gaussian systems together with their applications to statistical prediction. The topics related to parameter estimation, model error and multiscale data assimilation utilizing the conditional Gaussian framework are shown in Section 6. The article is concluded in Section 7.

2. Overview: Data vs. Model, Data-Driven Modeling Framework, and Efficient Data Assimilation and Prediction Strategies with Solvable Conditional Statistics

A central contemporary issue for complex turbulent systems is to use data to improve scientific understanding of the underlying physics, make real-time predictions, and quantify the uncertainty in long-range forecasting [1,27]. Recently, with the great progress in developing and improving the observational networks, a vast amount of observational data has become available. Many purely data-driven statistical methods (regression, clustering, analog forecast, etc.) [83,84,85,86,87] have thus been developed and have become popular approaches in attempting to understand and predict nature. Despite some success in explaining and forecasting a few of the coarse-grained variables at the largest scales, these purely data-driven statistical methods usually cannot discover the nonlinear and intermittent nature of the underlying complex dynamics. Nor do they often reveal the non-trivial interactions between physical processes at different scales. In addition, these purely data-driven statistical methods typically require a large training dataset in order to obtain complete and unbiased information from nature, which is however infeasible in many areas, including climate, atmosphere and ocean science. In fact, satellite observations are only available for a few decades [88,89], which is far from enough for understanding decadal or interannual variabilities containing rare and extreme events. Note that many complex turbulent systems in nature are high dimensional, and therefore most purely data-driven statistical methods are extremely expensive to use.
Therefore, combining models with data becomes necessary in understanding and predicting nature. Suitable models involve important physical mechanisms and they can be used for real-time prediction by incorporating only a small amount of data. In many simple but natural scenarios, some low-dimensional physical variables are observed and low-order nonlinear stochastic models are preferred for describing their behavior. Fitting a quadratic regression model to the data [90,91] is a data-driven modeling strategy which outperforms linear regression models by allowing nonlinearity and memory in time. However, there is no physical information in such quadratic regression models. In fact, it has been shown in [33] via rigorous mathematical theory that such ad hoc nonlinear regression strategies can exhibit finite-time blow-up and pathological invariant measures even though they fit the data with high precision. To include physical information in the models, systematic physics-constrained multi-level quadratic regression models have been developed [31,32]. These models avoid pathological behavior by incorporating physical constraints with nonlinear energy-conserving principles developed in the earlier stochastic climate modeling strategy [45]. Meanwhile, these physics-constrained models allow memory effects. Although the number of parameters in physics-constrained nonlinear models can be large, the models are in general robust with respect to perturbations of the parameters around their optimal values [41]. This is crucial in practice because it requires only a crude estimation of the parameters for the model, which greatly reduces the computational cost of searching a high-dimensional parameter space. A striking real-world application of these physics-constrained nonlinear models is to assess the predictability limits of time series associated with the tropical intraseasonal variability such as the Madden–Julian oscillation (MJO) and monsoon.
It has been shown in [41,42,43] that these physics-constrained nonlinear models are able to reach the predictability limits of the large-scale MJO and monsoon and improve the prediction skill using the low-order models. Notably, the physics-constrained nonlinear stochastic models require only a short training period [42,44] because the model development automatically involves a large portion of the information of nature. This is extremely important since only limited observational data are available in practice. In fact, comparable and even slightly more skillful prediction results have been found using the physics-constrained nonlinear model compared with those using non-parametric methods in predicting the MJO and monsoon intermittent time series [92,93], but the prediction using the physics-constrained nonlinear model adopted a much shorter training period [44].
Another important reason that models are essential in understanding and predicting nature is the fundamental limitation in measuring observations. In fact, only partial observations are available in most applications and observational noise is inevitable. For example, the sea surface temperature is easy to measure, but the temperature in the deep ocean is hard to estimate. In addition, although the surface wind over the entire earth can be measured by satellites, the measured signals usually contain large noise. Therefore, models are combined with the available observations for the state estimation of the unobserved variables as well as reducing the noise in the observed ones. This is known as data assimilation or filtering [15,20,22]. Note that most of the complex multiscale dynamical systems involve strong nonlinearity and extreme events. Thus, the classical Kalman filter that works only for linear models [94] fails to capture the nonlinear and non-Gaussian features in nature. Although the so-called particle filter [95,96] is able to recover all the non-Gaussian statistics, it is computationally expensive and can only be applied to low-dimensional scenarios, which are typically not the case in most applications. In fact, even a global weather model with very coarse grid points every 200 km already has about 10^6 state variables. Other direct strategies for handling large dimensional systems, such as the ensemble Kalman filters [97,98], can also be problematic. For example, they are not immune from “catastrophic filter divergence” (divergence beyond machine infinity) when observations are sparse, even when the true signal is a dissipative system [99,100]. Therefore, designing new efficient strategies for data assimilation and prediction that are accurate and can overcome the curse of dimension is crucial in studying complex multiscale nonlinear dynamical systems with only noisy partial observations [20,27,65,101,102].
Since both the data assimilation and prediction involve running forward the dynamical models (known as the forecast models), the development of new strategies entails the understanding and utilization of model structures.
Due to the complexity of many multiscale turbulent dynamical systems, developing cheap and effective approximate models which nevertheless capture the main characteristics of the underlying dynamics becomes necessary in data assimilation, prediction and the understanding of nature. In particular, novel stochastic parameterization schemes play a significant role in reducing the model complexity while retaining the key features of various multiscale turbulent dynamical systems. These key features include the statistical feedback from small to large scales, accurate dynamical and statistical behavior of the large-scale variables, and the main effect of the large-scale variables on the statistical evolution of the small-scale processes. Then, efficient hybrid strategies can be developed for dealing with the large- and small-scale variables, respectively.
A simple yet practically useful strategy for filtering nonlinear and intermittent signals is via the so-called stochastic parameterized extended Kalman filter (SPEKF) type of forecast model [64,65]. The idea of the SPEKF model is that the small or unresolved scale variables are stochastically parameterized by cheap linear and Gaussian processes, representing stochastic damping, stochastic phase and stochastic forcing. Despite the model error in using such Gaussian approximations for the unresolved nonlinear dynamics, these Gaussian processes succeed in providing accurate statistical feedback from the unresolved scales to the resolved ones and thus the intermittency and non-Gaussian features as observed in the resolved variables can be accurately recovered. The statistics in the SPEKF model can also be solved with exact and analytic formulae, which allow an accurate and efficient estimation of the model states. The SPEKF type of model has been used for filtering multiscale turbulent dynamical systems [20], stochastic superresolution [103] and filtering Navier–Stokes equations with model error [104]. A detailed description of the SPEKF model can be found in Section 4.5.1. Notably, the SPEKF model involves a hybrid approach, where certain cheap and statistically solvable Gaussian approximations are used to describe the statistical features of the unresolved (or small scale) variables in a large dimensional subspace while the highly nonlinear dynamical features of the resolved (or large scale) variables as well as their interactions with the unresolved variables are retained in a relatively low dimensional subspace. Such a strategy can be used as a guideline for designing suitable approximate forecast models for various complex multiscale turbulent dynamical systems.
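A minimal sketch of an SPEKF-type model is given below; the parameter values are illustrative assumptions rather than those from [64,65]. A resolved complex scalar mode u is damped and forced by hidden Gaussian processes representing stochastic damping γ and stochastic forcing b. Whenever γ transiently dips below zero, the resolved mode bursts intermittently, producing non-Gaussian statistics even though γ and b themselves are Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative SPEKF-type model (parameter values are assumptions):
#   du     = ((-gamma + 1j*omega)*u + b) dt + s_u dW_u   (resolved mode)
#   dgamma = -d_g*(gamma - g_bar) dt + s_g dW_g          (stochastic damping)
#   db     = -d_b*b dt + s_b dW_b                        (stochastic forcing)
dt, n = 1e-3, 100000
omega, g_bar = 2.0, 1.0
d_g, s_g = 0.5, 0.8
d_b, s_b = 0.4, 0.4
s_u = 0.3

u, gamma, b = 0.0 + 0.0j, g_bar, 0.0 + 0.0j
traj = np.empty(n, dtype=complex)
for k in range(n):
    dWu = np.sqrt(dt/2) * (rng.standard_normal() + 1j*rng.standard_normal())
    dWb = np.sqrt(dt/2) * (rng.standard_normal() + 1j*rng.standard_normal())
    u += ((-gamma + 1j*omega)*u + b)*dt + s_u*dWu
    gamma += -d_g*(gamma - g_bar)*dt + s_g*np.sqrt(dt)*rng.standard_normal()
    b += -d_b*b*dt + s_b*dWb
    traj[k] = u

# episodes with gamma < 0 act as transient instabilities: the resolved mode
# exhibits intermittent bursts and fat-tailed (non-Gaussian) statistics
x = traj.real
kurt = np.mean((x - x.mean())**4) / np.var(x)**2
print("kurtosis of Re(u):", kurt)        # values above 3 indicate fat tails
```

Note that the hidden processes (γ, b) are linear and Gaussian given the path of u, so this model also fits the conditional Gaussian framework of Section 3.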
For multiscale data assimilation of many realistic and complex turbulent dynamical systems, an extremely cheap scheme, called stochastic superparameterization [70,71,72,73], has been developed, which makes use of a similar hybrid strategy as discussed above but with a more refined setup. In the stochastic superparameterization, the large-scale mean flow is resolved on a coarse grid and the nonlinear small-scale eddy terms are replaced by quasilinear stochastic processes on formally infinite embedded domains where these stochastic processes are Gaussian conditional to the large scale mean flow. The key ingredient of such a multiscale data assimilation method is the systematic use of conditional Gaussian mixtures which make the methods efficient by filtering a subspace whose dimension is smaller than the full state. This conditional Gaussian closure approximation results in a seamless algorithm without using the high resolution space grid for the small scales and is thus extremely efficient. The method uses particle filters [20,79] or ensemble filters on the large scale part [75,76] whose dimension is small enough so that the non-Gaussian statistics of the large scale part can be calculated from a particle filter, whereas the statistics of the small scale part are conditionally Gaussian given the large scale part. See Section 6.3 for more details. Note that such ideas can also be applied to other cheap forecast models in addition to the superparameterization. For example, another multiscale filtering algorithm with the quasilinear Gaussian dynamically orthogonal method as the forecast method in an adaptively evolving low dimensional subspace has been developed in [80].
Due to its significance in model approximation, data assimilation and prediction, it is necessary to develop a general framework for the complex multiscale nonlinear turbulent dynamical systems with conditional Gaussian structures. The key idea of such conditional Gaussian systems is that a hybrid strategy can be applied to deal with the state variables u = ( u I , u II ) , where particle methods are used to solve the non-Gaussian statistics in the relatively low dimensional subspace associated with u I and extremely cheap algorithms with closed analytical formulae are adopted to solve the conditional Gaussian statistics of u II conditioned on u I in the remaining high dimensional subspace. The Gaussian approximation of the conditional statistics is usually quite accurate, since the small or unresolved scales, with the large-scale variables given (or fixed), usually represent fluctuations in multiscale systems and their statistics are close to Gaussian. Nevertheless, the marginal statistics of the small-scale variables u II themselves can be highly non-Gaussian, which is one of the salient features of nature [37]. Notably, developing such computationally efficient models that exploit conditional statistics also involves the mutual feedback between the large-scale and small-scale variables. Thus, the full system is completely nonlinear and allows physics-constrained nonlinear interactions. The general conditional Gaussian nonlinear modeling framework provides a powerful tool for multiscale data assimilation, statistical prediction, solving high-dimensional PDFs as well as parameter estimation, causality analysis and understanding model errors.

3. A Summary of the General Mathematical Structure of Nonlinear Conditional Gaussian Systems

3.1. Conditional Gaussian Nonlinear Dynamical Systems

The conditional Gaussian systems have the following abstract form [28]:
d u I = [ A 0 ( t , u I ) + A 1 ( t , u I ) u II ] d t + Σ I ( t , u I ) d W I ( t ) ,   (1)
d u II = [ a 0 ( t , u I ) + a 1 ( t , u I ) u II ] d t + Σ II ( t , u I ) d W II ( t ) .   (2)
Once u I ( s ) for s ≤ t is given, u II ( t ) conditioned on u I ( s ≤ t ) becomes a Gaussian process:
p ( u II ( t ) | u I ( s ≤ t ) ) ∼ N ( u ¯ II ( t ) , R II ( t ) ) .   (3)
Despite the conditional Gaussianity, the coupled system (1) and (2) remains highly nonlinear and is able to capture the non-Gaussian features as in nature. The conditional Gaussian distribution p ( u II ( t ) | u I ( s ≤ t ) ) in (3) has the closed analytic form [30]
d u ¯ II ( t ) = [ a 0 ( t , u I ) + a 1 ( t , u I ) u ¯ II ] d t + ( R II A 1 * ( t , u I ) ) ( Σ I Σ I * )^{−1} ( t , u I ) [ d u I − ( A 0 ( t , u I ) + A 1 ( t , u I ) u ¯ II ) d t ] ,   (4)
d R II ( t ) = { a 1 ( t , u I ) R II + R II a 1 * ( t , u I ) + ( Σ II Σ II * ) ( t , u I ) − ( R II A 1 * ( t , u I ) ) ( Σ I Σ I * )^{−1} ( t , u I ) ( R II A 1 * ( t , u I ) ) * } d t ,   (5)
which can be solved in an exact and efficient way. The conditional Gaussian framework (1)–(2), together with the closed formulae (4)–(5), is useful in the parameter estimation, data assimilation, prediction and uncertainty quantification of complex turbulent dynamical systems, as will be discussed throughout this article.
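As a sketch of how the closed formulae can be advanced numerically, the function below performs one Euler step of the conditional mean and covariance; the function name, argument layout and the plain Euler discretization are illustrative assumptions, not a prescription from the references.

```python
import numpy as np

def conditional_stats_step(duI, mII, RII, A0, A1, a0, a1, SI, SII, dt):
    """One Euler step of the closed equations for the conditional mean mII
    and covariance RII of u_II, given the observed increment duI of u_I.
    A0, a0 are vectors and A1, a1, SI, SII matrices, all evaluated at the
    current (t, u_I)."""
    SI_inv = np.linalg.inv(SI @ SI.T)          # (Sigma_I Sigma_I*)^{-1}
    gain = RII @ A1.T @ SI_inv                 # Kalman-type gain
    innov = duI - (A0 + A1 @ mII) * dt         # innovation of the u_I path
    mII_new = mII + (a0 + a1 @ mII) * dt + gain @ innov
    RII_new = RII + (a1 @ RII + RII @ a1.T + SII @ SII.T
                     - gain @ (A1 @ RII)) * dt
    return mII_new, RII_new

# quick sanity check in the linear time-invariant case A1 = SI = SII = I,
# a1 = -I, vanishing forcing and a zero observed increment: the covariance
# relaxes to the Riccati fixed point (sqrt(2) - 1) * I
I2, z = np.eye(2), np.zeros(2)
m, R = np.zeros(2), np.eye(2)
for _ in range(20000):
    m, R = conditional_stats_step(z, m, R, z, I2, z, -I2, I2, I2, 1e-3)
print(np.round(R, 3))
```

In this constant-coefficient test the covariance equation decouples into identical scalar Riccati equations with fixed point R solving R^2 + 2R − 1 = 0, which provides an analytic check of the update.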
In this article, many conditional Gaussian systems with complex turbulent dynamical structures will be studied. For the convenience of the reader, we will always use a blue color to represent the variables that belong to u I and use a magenta color to denote those for u II .

3.2. Kalman–Bucy Filter: The Simplest and Special Data Assimilation Example within Conditional Gaussian Framework

A special case of the general conditional Gaussian framework (1) and (2) is the so-called Kalman–Bucy filter [105,106,107,108]. The Kalman–Bucy filter is a continuous time version of the Kalman filter [21,94] and it deals with the linear coupled systems:
d u I = [ A 0 ( t ) + A 1 ( t ) u II ] d t + Σ I ( t ) d W I ( t ) ,   (6)
d u II = [ a 0 ( t ) + a 1 ( t ) u II ] d t + Σ II ( t ) d W II ( t ) .   (7)
In (6) and (7), all the vectors and matrices A 0 , A 1 , a 0 , a 1 , Σ I and Σ II are functions of only t and they have no dependence on u I in order to guarantee the linearity in the coupled system. In the Kalman–Bucy filter, (7) is the underlying dynamics and (6) is the observational process. The observation cannot change the underlying dynamics and therefore no u I appears in (7).
The filter estimate (also known as the posterior distribution) is the conditional distribution of u II ( t ) given the observation u I ( s ≤ t ) , i.e., p ( u II ( t ) | u I ( s ≤ t ) ) . In light of (4) and (5), the mean and variance of p ( u II | u I ) have the following explicit expressions:
d u ¯ II ( t ) = [ a 0 ( t ) + a 1 ( t ) u ¯ II ] d t + ( R II A 1 * ( t ) ) ( Σ I Σ I * )^{−1} ( t ) [ d u I − ( A 0 ( t ) + A 1 ( t ) u ¯ II ) d t ] ,
d R II ( t ) = { a 1 ( t ) R II + R II a 1 * ( t ) + ( Σ II Σ II * ) ( t ) − ( R II A 1 * ( t ) ) ( Σ I Σ I * )^{−1} ( t ) ( R II A 1 * ( t ) ) * } d t .
Corresponding to (4) and (5), Chapter 6 of Bensoussan’s book [109] includes rigorous mathematical derivations of the exact solutions of the Kalman–Bucy filter and some other more general conditional Gaussian filters. It is also pointed out in [109] that all these filters belong to the general conditional Gaussian filtering framework in (1)-(2)-(4)-(5) introduced in [110], which is an early version of the book authored by Liptser and Shiryaev [30].
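A scalar example makes the structure transparent. The sketch below (all parameter values are illustrative choices) runs the Kalman–Bucy filter for a linear signal observed through noisy continuous-time measurements and verifies that the filter variance relaxes to the steady-state solution of the associated algebraic Riccati equation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Scalar Kalman-Bucy filter (illustrative parameters):
#   signal:      du2 = -a*u2 dt + s2 dW2
#   observation: du1 =  u2 dt + s1 dW1
a, s1, s2 = 0.5, 0.5, 1.0
dt, n = 1e-3, 40000

u2_true, m, R = 1.0, 0.0, 1.0        # truth, filter mean, filter variance
for _ in range(n):
    du1 = u2_true*dt + s1*np.sqrt(dt)*rng.standard_normal()
    u2_true += -a*u2_true*dt + s2*np.sqrt(dt)*rng.standard_normal()
    m += -a*m*dt + (R/s1**2)*(du1 - m*dt)          # conditional mean update
    R += (-2*a*R + s2**2 - R**2/s1**2)*dt          # Riccati equation for R

# steady-state variance from the algebraic Riccati equation
# 0 = -2*a*R + s2**2 - R**2/s1**2
R_inf = s1**2*(-a + np.sqrt(a**2 + (s2/s1)**2))
print(R, R_inf)
```

Since the coefficients do not depend on u I, the variance equation here is deterministic; this is exactly the simplification that distinguishes the Kalman–Bucy filter from the general conditional Gaussian filter (4)–(5), whose gain depends on the random observed path.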

3.3. Physics-Constrained Nonlinear Models with Conditional Gaussian Statistics

Physics-constrained nonlinear stochastic models are a recent development in data-driven statistical-dynamical modeling for the time series of a partial subset of observed variables, which arise from observations of nature or from an extremely complex physical model [31,32]. The physics-constrained nonlinear stochastic modeling framework builds on early works for single-layer models without memory effects, which use physical analytic properties to constrain data-driven methods. These physics-constrained nonlinear stochastic models succeed in overcoming both the finite-time blowup and the lack of physical meaning issues in various ad hoc multi-layer regression models [31,33].
The physics-constrained nonlinear stochastic models contain energy-conserving quadratic nonlinear interactions [1,31,32], namely
$$ du = \big[(L + D)\,u + B(u, u) + F(t)\big]\,dt + \Sigma(t, u)\,dW(t), \qquad \text{with} \quad u \cdot B(u, u) = 0, $$
where u = (u_I, u_II) and the dimensions of u_I and u_II are N_I and N_II, respectively. In (8), L + D is a linear operator representing dissipation and dispersion: L is skew-symmetric and represents dispersion, while D is a negative definite symmetric operator representing dissipative processes such as surface drag, radiative damping, viscosity, etc. B(u, u) is a bilinear term that satisfies the energy-conserving property u · B(u, u) = 0.
Many of the physics-constrained nonlinear stochastic models belong to the conditional Gaussian framework. The goal here is to derive a general class of conditional Gaussian physics-constrained nonlinear stochastic models. To this end, we rewrite the Equation (8) in the following way:
$$
\begin{aligned}
du_I &= \big[L_{I,1}\,u_I + L_{I,2}\,u_{II} + B_I(u, u) + F_I\big]\,dt + \Sigma_I(u_I)\,dW_I,\\
du_{II} &= \big[L_{II,1}\,u_I + L_{II,2}\,u_{II} + B_{II}(u, u) + F_{II}\big]\,dt + \Sigma_{II}(u_I)\,dW_{II},
\end{aligned}
$$
where the explicit dependence of the coefficients on time t has been omitted for notational simplicity. In (9), L_{I,1} u_I, L_{I,2} u_II, L_{II,1} u_I and L_{II,2} u_II correspond to the linear term (L + D) u in (8), while B_I(u, u) and B_II(u, u) represent the nonlinear terms in the processes associated with the variables in (1) and (2), respectively. Since conditional Gaussian systems do not allow quadratic interactions of u_II with itself, both B_I(u, u) and B_II(u, u) can be written as follows:
$$ B_I(u, u) = B_{I,1}(u_I, u_I) + B_{I,2}(u_I, u_{II}), \qquad B_{II}(u, u) = B_{II,1}(u_I, u_I) + B_{II,2}(u_I, u_{II}), $$
where $B_{\cdot,1}(u_I, u_I)$ stands for the quadratic terms involving only u_I and $B_{\cdot,2}(u_I, u_{II})$ represents the quadratic interactions between u_I and u_II. Given the nonlinear terms in (10), the energy-conserving quadratic nonlinearity in (8) implies
$$ u_I \cdot \big[B_{I,1}(u_I, u_I) + B_{I,2}(u_I, u_{II})\big] + u_{II} \cdot \big[B_{II,1}(u_I, u_I) + B_{II,2}(u_I, u_{II})\big] = 0. $$
Inserting (10) into (9) yields the conditional Gaussian systems with energy-conserving quadratic nonlinear interactions,
$$ du_I = \big[B_{I,1}(u_I, u_I) + B_{I,2}(u_I, u_{II}) + L_{I,1}\,u_I + L_{I,2}\,u_{II} + F_I\big]\,dt + \Sigma_I(u_I)\,dW_I, $$
$$ du_{II} = \big[B_{II,1}(u_I, u_I) + B_{II,2}(u_I, u_{II}) + L_{II,1}\,u_I + L_{II,2}\,u_{II} + F_{II}\big]\,dt + \Sigma_{II}(u_I)\,dW_{II}. $$
Now, we explore the detailed forms of the energy-conserving nonlinear terms in (12) and (13).
We start with B II , 2 ( u I , u II ) , which can be written as
$$ B_{II,2}(u_I, u_{II}) = S_{II}(u_I)\,u_{II}, \qquad \text{with} \quad S_{II}(u_I) = \sum_{j=1}^{N_I} S_{II,j}\,u_{I,j}, $$
where each $S_{II,j}$ is a skew-symmetric matrix with $S_{II,j}^T = -S_{II,j}$ and $u_{I,j}$ is the j-th entry of u_I. The energy-conserving property is easily seen by taking the inner product of u_II with $B_{II,2}(u_I, u_{II})$ in (14),
$$ u_{II} \cdot B_{II,2}(u_I, u_{II}) = u_{II} \cdot S_{II}(u_I)\,u_{II} = \sum_{j=1}^{N_I} u_{I,j}\,\big(u_{II} \cdot S_{II,j}\,u_{II}\big) = 0, $$
due to the skew-symmetric property of each $S_{II,j}$. In fact, $B_{II,2}(u_I, u_{II})$ usually represents an internal oscillation with a non-constant frequency that depends on u_I.
Next, B I , 2 ( u I , u II ) contains three components,
$$ B_{I,2}(u_I, u_{II}) = B_{I,2}^1(u_I, u_{II}) + B_{I,2}^2(u_I, u_{II}) + B_{I,2}^3(u_I, u_{II}). $$
One of the components in (15), say B I , 2 1 ( u I , u II ) , has its own energy conservation, i.e.,
$$ u_I \cdot B_{I,2}^1(u_I, u_{II}) = 0. $$
Here, $B_{I,2}^1(u_I, u_{II}) = S_I(u_I)\,u_{II}$, and therefore
$$ u_I \cdot S_I(u_I)\,u_{II} = 0, $$
where the j-th column of $S_I(u_I)$ is given by
$$ S_{I,j}(u_I) = S_{I,j}\,u_I, $$
with each $S_{I,j}$ a skew-symmetric matrix. Thus, with (17) in hand, (16) becomes
$$ \sum_{j=1}^{N_{II}} \big(u_I \cdot S_{I,j}\,u_I\big)\,u_{II,j} = 0, $$
where $u_{II,j}$ is the j-th entry of u_II; each term $u_I \cdot S_{I,j}\,u_I$ vanishes identically by the skew symmetry of $S_{I,j}$.
The other two components of $B_{I,2}(u_I, u_{II})$ in (12) involve interactions with $B_{II,1}(u_I, u_I) = B_{II,1}^2(u_I, u_I) + B_{II,1}^3(u_I, u_I)$ in (13). On the one hand, the energy-conserving property of the following two terms is straightforward,
$$ B_{I,2}^2(u_I, u_{II}) = \sum_{j=1}^{N_I} \Gamma_j\,u_{I,j}\,u_{II}, $$
$$ B_{II,1}^2(u_I, u_I) = -\sum_{j=1}^{N_I} \Gamma_j^T\,u_I^2, $$
where each $\Gamma_j$ is an $N_I \times N_{II}$ matrix, $u_{I,j}$ is the j-th entry of u_I, and $u_I^2$ is an $N_I \times 1$ vector whose j-th entry is $u_{I,j}^2$. On the other hand, the remaining two terms $B_{I,2}^3(u_I, u_{II})$ and $B_{II,1}^3(u_I, u_I)$ are similar to those in (18)–(20) but deal with the cross-interactions between different components of u_I, such as replacing $u_I^2$ by $u_{I,j_1} u_{I,j_2}$ in (20). To this end, we define the following
$$ G(u_I) = \sum_{j=1}^{N_I} G_j\,u_{I,j}, $$
which satisfies
$$ u_I \cdot G(u_I)\,u_{II} - u_{II} \cdot G^T(u_I)\,u_I = 0. $$
In fact, (18)–(21) are important for generating the intermittent instability, where u II plays the role of both damping and anti-damping for the dynamics of u I .
Finally, $B_{I,1}(u_I, u_I)$ involves any interactions between u_I and itself that satisfy
$$ u_I \cdot B_{I,1}(u_I, u_I) = 0. $$
Therefore, with (14)–(22) in hand, the conditional Gaussian system with energy-conserving quadratic nonlinear interactions (12) and (13) has the following form:
$$
\begin{aligned}
du_I &= \Big[B_{I,1}(u_I, u_I) + \sum_{j=1}^{N_I}\Gamma_j\,u_{I,j}\,u_{II} + S_I(u_I)\,u_{II} + G(u_I)\,u_{II} + L_{I,1}\,u_I + L_{I,2}\,u_{II} + F_I\Big]\,dt + \Sigma_I(u_I)\,dW_I,\\
du_{II} &= \Big[S_{II}(u_I)\,u_{II} - \sum_{j=1}^{N_I}\Gamma_j^T\,u_I^2 + L_{II,1}\,u_I + L_{II,2}\,u_{II} - G^T(u_I)\,u_I + F_{II}\Big]\,dt + \Sigma_{II}(u_I)\,dW_{II}.
\end{aligned}
$$
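As a quick numerical sanity check on the construction above (with randomly generated matrices, purely for illustration), the skew-symmetric term (14) and the G-pair identity used above can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(0)
N_I, N_II = 3, 4

# Random skew-symmetric blocks S_{II,j} (M - M^T is exactly skew-symmetric)
S_list = [(lambda M: M - M.T)(rng.standard_normal((N_II, N_II))) for _ in range(N_I)]

uI = rng.standard_normal(N_I)
uII = rng.standard_normal(N_II)

# B_{II,2}(u_I, u_II) = S_II(u_I) u_II with S_II(u_I) = sum_j S_{II,j} u_{I,j}
S_II = sum(uI[j] * S_list[j] for j in range(N_I))
energy_S = uII @ (S_II @ uII)          # vanishes by skew symmetry

# G-pair: u_I . G(u_I) u_II - u_II . G^T(u_I) u_I = 0 for any matrix G
G = rng.standard_normal((N_I, N_II))
energy_G = uI @ (G @ uII) - uII @ (G.T @ uI)

print(abs(energy_S) < 1e-12, abs(energy_G) < 1e-12)   # True True
```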

3.4. Multiscale Conditional Gaussian Systems with the MTV Stochastic Modeling Strategy

Let’s start with a general nonlinear deterministic model with quadratic nonlinearity,
$$ du = \big[(L + D)\,u + B(u, u) + F(t)\big]\,dt, $$
where the notation for the vectors and matrices is the same as in Section 3.3. In (25), the state variables are again decomposed into u = (u_I, u_II). Here, (u_I, u_II) has multiscale features, where u_I denotes the resolved variables that evolve slowly in time (e.g., climate variables) while u_II denotes the unresolved or unobserved fast variables (e.g., weather variables). The system (25) can be written in the more detailed form:
$$
\begin{aligned}
du_I &= \big[(L_{11} + D_{11})\,u_I + (L_{12} + D_{12})\,u_{II} + B_{11}^1(u_I, u_I) + B_{12}^1(u_I, u_{II}) + B_{22}^1(u_{II}, u_{II}) + F_1(t)\big]\,dt,\\
du_{II} &= \big[(L_{22} + D_{22})\,u_{II} + (L_{21} + D_{21})\,u_I + B_{11}^2(u_I, u_I) + B_{12}^2(u_I, u_{II}) + B_{22}^2(u_{II}, u_{II}) + F_2(t)\big]\,dt.
\end{aligned}
$$
Different from (12) and (13), the nonlinear interactions between the fast-scale variables themselves, $B_{22}^1(u_{II}, u_{II})$ and $B_{22}^2(u_{II}, u_{II})$, are also included in (26) and (27). To make the multiscale system (26) and (27) fit into the conditional Gaussian framework, two modifications are needed. First, when linking (26) and (27) with the general conditional Gaussian framework (1) and (2), the quadratic terms involving the interactions of u_II with itself, namely $B_{22}^1(u_{II}, u_{II})$ and $B_{22}^2(u_{II}, u_{II})$, are not allowed there. Second, stochastic noise must be added, at least to the equations for u_I.
To fill in these gaps, the most natural way is to apply ideas from the MTV strategy to the multiscale system (26) and (27). The MTV strategy [45,46,47,48], named after Majda, Timofeyev and Vanden-Eijnden, is a systematic mathematical strategy for stochastic mode reduction. It is a two-step procedure:
  • The equations of motion for the unresolved fast modes are modified by representing the nonlinear self-interaction terms between unresolved modes by stochastic terms.
  • The equations of motion for the unresolved fast modes are eliminated using the standard projection technique for stochastic differential equations.
In fact, we only need the first step of the MTV strategy here. Using ε to represent the time-scale separation between u_I and u_II, the terms that are quadratic in u_II itself are approximated by
$$ B_{22}^1(u_{II}, u_{II}) \approx -\frac{\Gamma_1}{\epsilon}\,u_{II} + \frac{\Sigma_I}{\sqrt{\epsilon}}\,\dot{W}_I, \qquad B_{22}^2(u_{II}, u_{II}) \approx -\frac{\Gamma_2}{\epsilon}\,u_{II} + \frac{\Sigma_{II}}{\sqrt{\epsilon}}\,\dot{W}_{II}. $$
That is, we replace the quadratic nonlinearity of u_II with a linear damping and stochastic noise, where Γ_1, Γ_2, Σ_I and Σ_II are positive definite symmetric matrices. The motivation of (28) is that the nonlinear self-interacting terms of the fast variables u_II are responsible for their chaotic, sensitive dependence on small perturbations and do not require a more detailed description if their effect on the coarse-grained dynamics of the climate variables alone is the main objective. On the other hand, the quadratic nonlinear interactions between u_I and u_II are retained.
Note that in the second step of the MTV strategy, standard techniques of averaging and adiabatic elimination of fast modes in stochastic equations are applied under the assumption ε ≪ 1. Here, we assume ε is small but finite and thus do not apply the second step of the MTV strategy.
Now, inserting (28) into (26) and (27) yields
$$
\begin{aligned}
du_I &= \big[(L_{11} + D_{11})\,u_I + (\tilde{L}_{12} + D_{12})\,u_{II} + B_{11}^1(u_I, u_I) + B_{12}^1(u_I, u_{II}) + F_1(t)\big]\,dt + \tilde{\Sigma}_I\,dW_I(t),\\
du_{II} &= \big[(\tilde{L}_{22} + D_{22})\,u_{II} + (L_{21} + D_{21})\,u_I + B_{11}^2(u_I, u_I) + B_{12}^2(u_I, u_{II}) + F_2(t)\big]\,dt + \tilde{\Sigma}_{II}\,dW_{II}(t),
\end{aligned}
$$
where $\tilde{L}_{12} = L_{12} - \Gamma_1/\epsilon$, $\tilde{L}_{22} = L_{22} - \Gamma_2/\epsilon$, $\tilde{\Sigma}_I = \Sigma_I/\sqrt{\epsilon}$ and $\tilde{\Sigma}_{II} = \Sigma_{II}/\sqrt{\epsilon}$. Clearly, (29) and (30) belong to the conditional Gaussian framework. Notably, if the nonlinear terms in (29) and (30) satisfy the conditions in Section 3.3, then (29) and (30) become a physics-constrained nonlinear model.
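To see the effect of the substitution (28) in the simplest setting, the sketch below simulates a toy two-variable analogue of (29) and (30); the coefficients and the coupling term are invented for illustration. The fast variable becomes an Ornstein–Uhlenbeck-type process with damping Γ/ε and noise Σ/√ε, whose stationary variance without coupling is Σ²/(2Γ).

```python
import numpy as np

def simulate(coupling, eps=0.1, Gamma=1.0, Sigma=1.0, dt=1e-3, T=200.0, seed=1):
    """Euler-Maruyama for a toy MTV-reduced pair (illustrative coefficients):
       du_I  = (-u_I + coupling * u_I * u_II) dt + dW_I
       du_II = (-(Gamma/eps) u_II - coupling * u_I**2) dt + (Sigma/sqrt(eps)) dW_II
    The quadratic self-interaction of u_II is replaced by linear damping plus
    white noise, as in (28); the u_I <-> u_II coupling is energy-conserving
    since u_I*(u_I*u_II) + u_II*(-u_I**2) = 0."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    s = np.sqrt(dt)
    uI, uII = 0.0, 0.0
    traj = np.empty(n)
    for k in range(n):
        duI = (-uI + coupling * uI * uII) * dt + s * rng.standard_normal()
        duII = (-(Gamma / eps) * uII - coupling * uI**2) * dt \
               + (Sigma / np.sqrt(eps)) * s * rng.standard_normal()
        uI, uII = uI + duI, uII + duII
        traj[k] = uII
    return traj

# Without coupling, u_II is OU with stationary variance Sigma^2/(2 Gamma) = 0.5
var_ou = simulate(0.0).var()
print(0.35 < var_ou < 0.65)   # True
```

With the coupling switched on (e.g., `simulate(1.0)`), the energy-conserving interaction can intermittently act as anti-damping on u_I, which is the mechanism behind the intermittent instabilities discussed in Section 3.3.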

5. Algorithms Which Beat the Curse of Dimension for Fokker–Planck Equation for Conditional Gaussian Systems: Application to Statistical Prediction

The Fokker–Planck equation is a partial differential equation (PDE) that governs the time evolution of the probability density function (PDF) of a complex system with noise [66,67]. For a general nonlinear dynamical system,
$$ du = F(u, t)\,dt + \Sigma(u, t)\,dW, $$
with state variables $u \in \mathbb{R}^N$, noise matrix $\Sigma \in \mathbb{R}^{N \times K}$ and white noise $W \in \mathbb{R}^K$, the associated Fokker–Planck equation is given by
$$ \partial_t p(u, t) = -\nabla_u \cdot \big(F(u, t)\,p(u, t)\big) + \frac{1}{2}\,\nabla_u \cdot \nabla_u \big(Q(u, t)\,p(u, t)\big), \qquad p(u, t)\big|_{t = t_0} = p_0(u), $$
with $Q = \Sigma\Sigma^T$. In many complex dynamical systems, including geophysical and engineering turbulence, neuroscience and excitable media, the solution of the Fokker–Planck equation in (123) involves strong non-Gaussian features with intermittency and extreme events [1,57,191]. In addition, the dimension of u in these complex systems is typically very large, representing variability across a wide range of temporal and spatial scales [1,121] (see also the examples in Section 4). Therefore, solving the high-dimensional Fokker–Planck equation for both the steady state and the transient phases with non-Gaussian features is an important topic. However, traditional numerical methods, such as finite element and finite difference schemes, as well as direct Monte Carlo simulations of (122), all suffer from the curse of dimension [68,69]. Nevertheless, for the conditional Gaussian systems (1) and (2), efficient statistically accurate algorithms can be developed for solving the Fokker–Planck equation in high dimensions and thus beat the curse of dimension. Since conditional Gaussian systems are able to capture many salient features of turbulent behavior, such algorithms are quite useful in uncertainty quantification, data assimilation and statistical prediction of turbulent phenomena in nature.

5.1. The Basic Algorithm with a Hybrid Strategy

Here, we state the basic efficient statistically accurate algorithms developed in [38]. The only underlying assumption here is that the dimension of u I is low while the dimension of u II can be arbitrarily high.
First, we generate L independent trajectories of the variables u I , namely u I 1 ( s t ) , , u I L ( s t ) , where L is a small number. This first step can be done by running a Monte Carlo simulation for the full system, which is computationally affordable with a small L. Then, different strategies are adopted to deal with the parts associated with u I and u II , respectively. The PDF of u II is estimated via a parametric method that exploits the closed form of the conditional Gaussian statistics in (4) and (5),
$$ p(u_{II}(t)) = \lim_{L\to\infty} \frac{1}{L}\sum_{i=1}^{L} p\big(u_{II}(t)\,|\,u_I^i(s \le t)\big). $$
See [38] for the details of the derivation of (124). Note that the limit L → ∞ in (124) (as well as in (125) and (126) below) is taken to illustrate the statistical intuition, while the practical estimator is the non-asymptotic version. On the other hand, due to the underlying assumption of the low dimensionality of u_I, a Gaussian kernel density estimation method is used for the PDF of the observed variables u_I,
$$ p(u_I(t)) = \lim_{L\to\infty} \frac{1}{L}\sum_{i=1}^{L} K_H\big(u_I(t) - u_I^i(t)\big), $$
where K H ( · ) is a Gaussian kernel centered at each sample point u I i ( t ) with covariance given by the bandwidth matrix H ( t ) . The kernel density estimation algorithm here involves a “solve-the-equation plug-in” approach for optimizing the bandwidth [192] that works for any non-Gaussian PDFs. Finally, combining (124) and (125), a hybrid method is applied to solve the joint PDF of u I and u II through a Gaussian mixture,
$$ p(u_I(t), u_{II}(t)) = \lim_{L\to\infty} \frac{1}{L}\sum_{i=1}^{L} K_H\big(u_I(t) - u_I^i(t)\big) \cdot p\big(u_{II}(t)\,|\,u_I^i(s \le t)\big). $$
Practically, L ~ O(100) is sufficient for the hybrid method to solve the joint PDF with N_I ≤ 3 and N_II ~ O(10). See [38] for illustrations on various concrete examples. Note that the closed forms of the L conditional distributions in (124) can be solved in parallel due to their independence [38], which further reduces the computational cost. Rigorous analysis [193] shows that the hybrid algorithm (126) requires far fewer samples than a traditional Monte Carlo method, especially when the dimension of u_II is large.
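The hybrid estimator (126) is straightforward to implement once the L conditional means and variances are available from (4) and (5). Below is a minimal sketch for scalar u_I and u_II; the samples, conditional statistics, fixed scalar bandwidth H and the function name are all illustrative (the actual algorithm uses a solve-the-equation plug-in bandwidth matrix [192]).

```python
import numpy as np

def hybrid_joint_density(uI_grid, uII_grid, uI_samples, cond_means, cond_vars, H):
    """Gaussian-mixture estimate of p(u_I, u_II): kernel density in u_I
    (bandwidth H), conditional Gaussians N(cond_means[i], cond_vars[i]) in u_II."""
    def normal(x, m, v):
        return np.exp(-(x - m) ** 2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)
    p = np.zeros((uI_grid.size, uII_grid.size))
    for s, m, v in zip(uI_samples, cond_means, cond_vars):
        p += normal(uI_grid, s, H)[:, None] * normal(uII_grid, m, v)[None, :]
    return p / len(uI_samples)

# Illustrative inputs: L = 50 samples of u_I and toy conditional statistics
rng = np.random.default_rng(0)
L = 50
uI_samples = rng.standard_normal(L)
cond_means = 0.5 * uI_samples          # toy conditional means mu_i
cond_vars = np.full(L, 0.8)            # toy conditional variances R_i
grid = np.linspace(-8.0, 8.0, 321)
p = hybrid_joint_density(grid, grid, uI_samples, cond_means, cond_vars, H=0.2)

# The mixture integrates to one over a sufficiently wide grid
mass = p.sum() * (grid[1] - grid[0]) ** 2
print(abs(mass - 1.0) < 1e-3)   # True
```

Because each mixture component factorizes into a kernel in u_I and a Gaussian in u_II, the loop over samples parallelizes trivially, as noted above.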

5.2. Beating the Curse of Dimension with Block Decomposition

The basic algorithm in (126) succeeds in solving the Fokker–Planck equation with O(10) state variables. However, many complex turbulent dynamical systems in nature involve variables of much higher dimension (for example, those in Section 4.3). In such a situation, the update of the conditional covariance matrix becomes extremely expensive, since the number of entries in the covariance matrix is N_II², where N_II is the dimension of u_II. Therefore, new strategies must be incorporated into the basic algorithm in (126) in order to beat the curse of dimension. In this subsection, we develop an effective strategy with block decomposition [36], which combines with the basic algorithm and can efficiently solve the Fokker–Planck equation in much higher dimensions, even when the number of covariance entries reaches the order of millions.
Consider the following decomposition of state variables:
$$ u_k = (u_{I,k}, u_{II,k}) \qquad \text{with} \quad u_{I,k} \in \mathbb{R}^{N_{I,k}} \ \text{and} \ u_{II,k} \in \mathbb{R}^{N_{II,k}}, $$
where $1 \le k \le K$, $N_I = \sum_{k=1}^{K} N_{I,k}$ and $N_{II} = \sum_{k=1}^{K} N_{II,k}$. Correspondingly, the full dynamics in (1) and (2) are also decomposed into K groups, where the variables on the left-hand side of the k-th group are u_k. In addition, for notational simplicity, we assume both Σ_I and Σ_II are diagonal, so that the noise coefficient matrices associated with the equations of u_{I,k} and u_{II,k} are Σ_{I,k} and Σ_{II,k}, respectively.
To develop efficient statistically accurate algorithms that beat the curse of dimension, the following two conditions are imposed on the coupled system.
Condition 1: In the dynamics of each u k in (1) and (2), the terms A 0 , k and a 0 , k can depend on all the components of u I while the terms A 1 , k and a 1 , k are only functions of u I , k , namely,
$$ A_{0,k} = A_{0,k}(t, u_I), \qquad a_{0,k} = a_{0,k}(t, u_I), \qquad A_{1,k} = A_{1,k}(t, u_{I,k}), \qquad a_{1,k} = a_{1,k}(t, u_{I,k}). $$
In addition, only u_{II,k} interacts with A_{1,k} and a_{1,k} on the right-hand side of the dynamics of u_k. Therefore, the equation of each u_k = (u_{I,k}, u_{II,k}) becomes
$$
\begin{aligned}
du_{I,k} &= \big[A_0(t, u_I) + A_1(t, u_{I,k})\,u_{II,k}\big]\,dt + \Sigma_I(t, u_{I,k})\,dW_I(t),\\
du_{II,k} &= \big[a_0(t, u_I) + a_1(t, u_{I,k})\,u_{II,k}\big]\,dt + \Sigma_{II}(t, u_{I,k})\,dW_{II}(t).
\end{aligned}
$$
Note that in (128) and (129) each u_k is fully coupled with the other u_{k′} for all k′ ≠ k through A_0(t, u_I) and a_0(t, u_I). There is no trivial decoupling between different state variables.
Condition 2: The initial values of (u_{I,k}, u_{II,k}) and (u_{I,k′}, u_{II,k′}) with k ≠ k′ are independent of each other.
These two conditions are not artificial and they are actually the salient features of many complex systems with multiscale structures [73], multilevel dynamics [119] or state-dependent parameterizations [103]. Under these two conditions, the conditional covariance matrix becomes block diagonal, which can be easily verified according to (5). The evolution of the conditional covariance of u II , k conditioned on u I is given by:
$$ dR_{II,k}(t) = \Big[a_{1,k}R_{II,k} + R_{II,k}a_{1,k}^* + (\Sigma_{II,k}\Sigma_{II,k}^*) - (R_{II,k}A_{1,k}^*)(\Sigma_{I,k}\Sigma_{I,k}^*)^{-1}(R_{II,k}A_{1,k}^*)^*\Big]\,dt, $$
which has no interaction with R_{II,k′} for any k′ ≠ k, since A_0 and a_0 do not enter the evolution of the conditional covariance. Notably, the evolutions of the different R_{II,k} with k = 1, …, K can be solved in parallel, and the computation is extremely efficient due to the small size of each individual block. This allows the algorithms to efficiently solve the Fokker–Planck equation in large dimensions.
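One Euler step of (130) for a collection of small blocks can be sketched as follows (matrix sizes and coefficients are illustrative); since no cross-block terms appear, the loop over k could be run in parallel exactly as described above.

```python
import numpy as np

def block_covariance_step(R_blocks, a1_blocks, A1_blocks, SigI_blocks, SigII_blocks, dt):
    """One Euler step of the conditional covariance equation for each block.
    Each R_k is a small N_{II,k} x N_{II,k} matrix; no cross-block terms appear."""
    new_blocks = []
    for R, a1, A1, SI, SII in zip(R_blocks, a1_blocks, A1_blocks, SigI_blocks, SigII_blocks):
        SI_inv = np.linalg.inv(SI @ SI.T)
        RA = R @ A1.T
        dR = a1 @ R + R @ a1.T + SII @ SII.T - RA @ SI_inv @ RA.T
        new_blocks.append(R + dR * dt)
    return new_blocks

# Two scalar (1 x 1) blocks with a1 = -1, A1 = 1, Sigma_I = Sigma_II = 1:
# each block converges to the steady Riccati solution sqrt(2) - 1,
# independently of its initial value.
eye = np.eye(1)
R_blocks = [eye.copy(), 2.0 * eye]
for _ in range(5000):
    R_blocks = block_covariance_step(R_blocks, [-eye, -eye], [eye, eye],
                                     [eye, eye], [eye, eye], 0.01)
print(round(float(R_blocks[0][0, 0]), 4), round(float(R_blocks[1][0, 0]), 4))
```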
Next, the structures of A 0 , k and a 0 , k in (127) allow the coupling among all the K groups of variables in the conditional mean according to (4). The evolution of u ¯ II , k , namely the conditional mean of u II , k conditioned on u I , is given by
$$ d\bar{u}_{II,k}(t) = \big[a_{0,k} + a_{1,k}\,\bar{u}_{II,k}\big]\,dt + (R_{II,k}A_{1,k}^*)(\Sigma_{I,k}\Sigma_{I,k}^*)^{-1}\big[du_{I,k} - (A_{0,k}(t, u_I) + A_{1,k}\,\bar{u}_{II,k})\,dt\big]. $$
Finally, let’s use concrete examples to illustrate the reduction in the computational cost with the block decomposition.
The first example is the two-layer L-96 model in (35). A typical choice of the number of grid points for the large-scale variables u_i (i = 1, …, I) is I = 40, with J = 5 small-scale variables v_{i,j} (j = 1, …, J) attached to each u_i. Thus, the total number of small-scale variables v_{i,j} is I × J = 200, and the number of entries in the full conditional covariance matrix is n_total = (I × J)² = 40,000. Note that the two-layer L-96 model in (35) can be decomposed into the form in (128) and (129). Each block contains one element u_k and the five elements v_{k,j} with j = 1, …, 5 and k fixed, and the number of blocks is K = I = 40. Therefore, the conditional covariance matrix associated with each block is of size 5 × 5 = 25, and the total number of elements that need to be updated is n_reduced = 25 × 40 = 1000, which is only 2.5% of the number of elements in the full conditional covariance matrix!
Another example is the stochastically coupled FHN models. In fact, the block decomposition can be applied to all the versions (67)–(69) of the stochastically coupled FHN models discussed in Section 4.2.1. Here, we illustrate the one in (69) with multiplicative noise. Let's take n = 500. Since the conditional statistics involve the information of both v_i and γ_i with i = 1, …, 500, the total number of entries in the conditional covariance matrix is n_total = (2 × 500)² = 1,000,000. On the other hand, with the block decomposition that involves 500 blocks, where each block is of size 2 × 2 = 4 and contains the information of only one v_i and one γ_i, the total number of elements that need to be updated is n_reduced = 500 × 4 = 2000. Clearly, the computational cost of the conditional covariance update with the block decomposition is only n_reduced/n_total = 0.2% of a full update! In Section 5.5, the statistical prediction of both the two-layer L-96 model and the stochastically coupled FHN model with multiplicative noise using the efficient statistically accurate algorithms will be demonstrated.
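The bookkeeping in the two examples above is simple enough to verify directly:

```python
# Two-layer L-96: I = 40 large-scale variables, J = 5 small-scale per site.
I, J = 40, 5
n_total = (I * J) ** 2        # full conditional covariance: 200 x 200 entries
n_reduced = I * (J * J)       # 40 blocks, each 5 x 5
print(n_total, n_reduced, n_reduced / n_total)        # 40000 1000 0.025

# Stochastically coupled FHN: n = 500 sites, one (v_i, gamma_i) pair per site.
n = 500
n_total_fhn = (2 * n) ** 2    # full conditional covariance: 1000 x 1000 entries
n_reduced_fhn = n * (2 * 2)   # 500 blocks, each 2 x 2
print(n_total_fhn, n_reduced_fhn, n_reduced_fhn / n_total_fhn)  # 1000000 2000 0.002
```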

5.3. Statistical Symmetry

As was discussed in the previous two subsections, the hybrid strategy and the block decomposition provide an extremely efficient way to solve the high-dimensional Fokker–Planck equation associated with the conditional Gaussian systems. In fact, the computational cost in the algorithms developed above can be further reduced if the coupled system (1) and (2) has statistical symmetry [36]:
$$ p\big(u_{I,k}(t), u_{II,k}(t)\big) = p\big(u_{I,k'}(t), u_{II,k'}(t)\big), \qquad \text{for all } k \text{ and } k', $$
Namely, the statistical features for variables with different k are the same. The statistical symmetry is often satisfied when the underlying dynamical system represents a discrete approximation of some PDEs in a periodic domain with nonlinear advection, diffusion and homogeneous external forcing [2,20].
With the statistical symmetry, collecting the conditional Gaussian ensembles N(ū_{II,k}(t), R_{II,k}(t)) for a specific k in K different simulations is equivalent to collecting those for all k with 1 ≤ k ≤ K in a single simulation. This also applies to the ensembles N(u_I^i(t), H(t)) associated with u_I. Therefore, the statistical symmetry implies that the effective sample size is L′ = KL, where K is the number of groups of statistically symmetric variables and L is the number of independent Monte Carlo simulations of the coupled system. If K is large, then a much smaller L is needed to reach the same accuracy as in the situation without statistical symmetry, which greatly reduces the computational cost.
Below, we discuss the details of constructing the joint PDFs obtained from the algorithms with statistical symmetry. First, we consider the one-dimensional case for convenience of illustration; the method extends easily to systems with multiple variables in higher dimensions. Denote by ū_1, ū_2, …, ū_K the mean values of the Gaussian ensembles at different grid points and by R_1, …, R_K the associated variances. For simplicity, we take only one full run of the system, so the total number of Gaussian ensembles is K. Clearly, the 1D PDFs p(u_i) at different grid points are the same and are given by
$$ p(u_i) = \lim_{K\to\infty} \frac{1}{K}\sum_{k=1}^{K} \mathcal{N}(\bar{u}_k, R_k), \qquad i = 1,\ldots,K. $$
The limit is taken for statistical intuition while a finite and small K is adopted in practice.
Now, we discuss the construction of the joint PDFs. We use the 2D joint PDF p ( u 1 , u 2 ) as an illustration. The joint PDF is a Gaussian mixture with K Gaussian ensembles, where the mean of each 2D Gaussian ensemble is
$$ \mu_1 = \begin{pmatrix}\bar{u}_1\\ \bar{u}_2\end{pmatrix}, \quad \mu_2 = \begin{pmatrix}\bar{u}_2\\ \bar{u}_3\end{pmatrix}, \quad \ldots, \quad \mu_{K-1} = \begin{pmatrix}\bar{u}_{K-1}\\ \bar{u}_K\end{pmatrix}, \quad \mu_K = \begin{pmatrix}\bar{u}_K\\ \bar{u}_1\end{pmatrix}, $$
and the covariance matrix is given by
$$ \mathbf{R}_1 = \begin{pmatrix}R_1 & 0\\ 0 & R_2\end{pmatrix}, \quad \mathbf{R}_2 = \begin{pmatrix}R_2 & 0\\ 0 & R_3\end{pmatrix}, \quad \ldots, \quad \mathbf{R}_{K-1} = \begin{pmatrix}R_{K-1} & 0\\ 0 & R_K\end{pmatrix}, \quad \mathbf{R}_K = \begin{pmatrix}R_K & 0\\ 0 & R_1\end{pmatrix}. $$
It is clear from the construction of the ensemble means in (132) that the subscript of the second component equals that of the first component plus one. That is, the first component of each μ_k, k = 1, …, K, is treated as u_1 due to the statistical symmetry, and the second component is treated as the corresponding u_2 associated with each k. The diagonal covariance matrix in (133) comes from the fact that the sample points are independent of each other. This is also true, and more obvious, for the block-diagonal conditional covariance. Notably, the diagonal covariance matrix of each ensemble member does not mean that the correlation between u_1 and u_2 is ignored: the correlation is reflected in how the ensemble means μ_k, k = 1, …, K, are distributed.
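For scalar ū_k and R_k, the construction in (132) and (133) can be sketched as follows (the helper name and the toy numbers are ours, for illustration); the mixture integrates to one, as a quick check confirms.

```python
import numpy as np

def symmetric_joint_pdf(ubar, Rvar, grid1, grid2):
    """2D Gaussian mixture p(u_1, u_2) built from the K shifted pairs
    (ubar_k, ubar_{k+1 mod K}) with diagonal covariance diag(R_k, R_{k+1 mod K})."""
    K = len(ubar)
    U1, U2 = np.meshgrid(grid1, grid2, indexing="ij")
    p = np.zeros_like(U1)
    for k in range(K):
        kp = (k + 1) % K
        g1 = np.exp(-(U1 - ubar[k]) ** 2 / (2 * Rvar[k])) / np.sqrt(2 * np.pi * Rvar[k])
        g2 = np.exp(-(U2 - ubar[kp]) ** 2 / (2 * Rvar[kp])) / np.sqrt(2 * np.pi * Rvar[kp])
        p += g1 * g2
    return p / K

# Toy ensemble from a single run at K = 6 statistically symmetric sites
ubar = np.array([0.3, -0.2, 0.5, 0.0, -0.4, 0.1])
Rvar = np.array([0.8, 1.0, 0.9, 1.1, 0.7, 1.0])
grid = np.linspace(-8.0, 8.0, 201)
p = symmetric_joint_pdf(ubar, Rvar, grid, grid)
mass = p.sum() * (grid[1] - grid[0]) ** 2
print(abs(mass - 1.0) < 1e-3)   # True
```

Although each component has diagonal covariance, the scatter of the means (ū_k, ū_{k+1}) is what encodes the correlation between u_1 and u_2, as explained above.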

5.4. Quantifying the Model Error Using Information Theory

Before we apply the efficient statistically accurate algorithms to statistical prediction, we introduce a measurement for quantifying errors. The natural way to quantify the error in the predicted PDF related to the truth is through an information measure, namely the relative entropy (or Kullback–Leibler divergence) [2,187,194,195,196,197]. The relative entropy is defined as
$$ \mathcal{P}\big(p(u), p^M(u)\big) = \int p(u)\,\ln\frac{p(u)}{p^M(u)}\,du, $$
where p(u) is the true PDF and p^M(u) is the predicted one from the efficient statistically accurate algorithms. This asymmetric functional on probability densities, P(p, p^M) ≥ 0, measures the lack of information in p^M compared with p and has many attractive features. First, P(p, p^M) ≥ 0, with equality if and only if p = p^M. Second, P(p, p^M) is invariant under general nonlinear changes of variables. Notably, the relative entropy is a good indicator of the difference in the tails of two PDFs, which is particularly crucial in turbulent dynamical systems with intermittency and extreme events. On the other hand, traditional ways of quantifying the error, such as the relative error ‖p − p^M‖/‖p‖, usually underestimate the lack of information in the PDF tails.
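A minimal grid-based sketch of (134) in one dimension is given below (the discretization and the Gaussian test case are ours). For two Gaussians N(0,1) and N(μ,1), the relative entropy is μ²/2, which the quadrature reproduces.

```python
import numpy as np

def relative_entropy(p, pM, du):
    """Discrete approximation of P(p, p^M) = integral of p ln(p / p^M) du."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / pM[mask])) * du)

def gauss(x, m):
    return np.exp(-(x - m) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

u = np.arange(-10.0, 10.0, 1e-3)
p_true = gauss(u, 0.0)     # true PDF: N(0, 1)
p_model = gauss(u, 1.0)    # model PDF: N(1, 1); exact relative entropy = 0.5
print(round(relative_entropy(p_true, p_model, 1e-3), 4))   # 0.5
```

Note the asymmetry: P(p, p^M) and P(p^M, p) generally differ, which is why p must be the truth and p^M the model in (134).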

5.5. Applications to Statistical Prediction

Now, we apply the efficient statistically accurate algorithms developed in the previous subsections to statistical prediction. Two examples of nonlinear complex turbulent dynamical systems in high dimensions are shown below.
The first example is the stochastically coupled FHN model in (69) with multiplicative noise (Section 4.2.1) with n = 500 ,
$$
\begin{aligned}
\epsilon\,du_i &= \Big[d_u\,(u_{i+1} + u_{i-1} - 2u_i) + u_i - \frac{1}{3}u_i^3 - v_i\Big]\,dt + \sqrt{\epsilon}\,\delta_1\,dW_{u_i},\\
dv_i &= (\gamma_i u_i + a)\,dt + \delta_2\,dW_{v_i},\\
d\gamma_i &= -d_{\gamma_i}(\gamma_i - \hat{\gamma}_i)\,dt + \sigma_{\gamma_i}\,dW_{\gamma_i}, \qquad i = 1,\ldots,N.
\end{aligned}
$$
The parameters are given by (71).
The second example is the two-layer L-96 model in (35) (Section 4.1.1) with I = 40 and J = 5 . The parameters are given by (36) and (37):
$$
\begin{aligned}
du_i &= \Big[u_{i-1}(u_{i+1} - u_{i-2}) + \sum_{j=1}^{J}\gamma_{i,j}\,u_i\,v_{i,j} - \bar{d}_i\,u_i + F\Big]\,dt + \sigma_u\,dW_{u_i}, \qquad i = 1,\ldots,I,\\
dv_{i,j} &= \big[-d_{v_{i,j}}\,v_{i,j} - \gamma_{i,j}\,u_i^2\big]\,dt + \sigma_{i,j}\,dW_{v_{i,j}}, \qquad j = 1,\ldots,J.
\end{aligned}
$$
Both the stochastically coupled FHN model (135) and the two-layer L-96 model (136) satisfy the model structure as described in Section 5.2 such that the block decomposition applies. Figure 19 provides a schematic illustration of the coupling between different variables in these models (for illustration purposes, I = n = 6 is used in the figure). The multiscale and layered structures can be easily observed. It is also clear that all the variables are coupled to each other with no trivial decoupling. Note that the stochastically coupled FHN model (135) with the given parameters here also satisfies the statistical symmetry while the two-layer L-96 model (136) is inhomogeneous.
Figure 19. Schematic illustration of the coupling between different variables in the stochastically coupled FHN model (135) (left) and the two-layer L-96 model (136) (right). For illustration purposes, I = n = 6 is used in this figure while I = 40 and n = 500 in the model simulations. All the variables are coupled to each other with no trivial decoupling.

5.5.1. Application to the Stochastically Coupled FHN Model with Multiplicative Noise Using Statistical Symmetry

Here, we illustrate the statistical prediction of the stochastically coupled FHN model with multiplicative noise (135), where the number of grid points in space is n = 500 . The noise coefficient in the γ i process is σ γ i = 0.6 for all i. The other parameters are the same as those in (71) and the noise coefficients are the same at different grid points i, namely
$$ \epsilon = 0.01, \quad \delta_1 = 0.2, \quad \delta_2 = 0.4, \quad a = 1.05, \quad d_{\gamma_i} = 1, \quad \hat{\gamma}_i = \hat{\gamma} = 1, \quad \sigma_{\gamma_i} = \sigma_\gamma = 0.6. $$
We also adopt homogeneous initial conditions u i ( 0 ) = 2 and v i ( 0 ) = 0.5 for all i = 1 , , N . Therefore, the model satisfies the statistical symmetry. The model simulations were shown in Figure 12 and Figure 13, where intermittent events are observed in both the fields of u and v.
Figure 20 shows the time evolution of the first four moments associated with u_1 and v_1. Note that due to the statistical symmetry, the evolutions of these moments for different u_i and v_i are the same as those of u_1 and v_1. The dot at t = 0.68 indicates the time instant at which u arrives at its most non-Gaussian phase, while t = 4.2 is a non-Gaussian phase where u is nearly at statistical equilibrium while v is still transient. In Figure 21 and Figure 22, the statistical prediction of the 1D marginal and 2D joint PDFs is shown at these two time instants. At t = 0.68, most of the probability density of u_1 is concentrated around u_1 = 1.8. However, there is a small but non-zero probability around u_1 = 2 (see the subpanel with the logarithmic scale), which leads to the large skewness and kurtosis of the statistics of u_1 observed in Figure 20. The efficient statistically accurate algorithms succeed in predicting the PDFs at this transient phase and accurately recover the statistics associated with such extreme events. A similar performance is found in Figure 22, where the bimodal PDF of u_1, the highly skewed PDF of v_1 and the Gaussian PDF of γ_1 are all predicted with high accuracy. The fully non-Gaussian joint PDFs p(u_1, v_1) and p(v_1, γ_1) are also predicted with almost no bias. These results indicate the skillful behavior of the algorithms developed above.
Figure 20. Time evolution of the first four moments ((a) mean; (b) variance; (c) skewness and (d) kurtosis) associated with u_1 and v_1 of the stochastically coupled FHN model with multiplicative noise (135), where n = 500. Due to the statistical symmetry, the evolutions of these moments for different u_i and v_i are the same as those of u_1 and v_1. The dots at t = 0.68 and t = 4.2 indicate the time instants at which the statistical prediction using the efficient statistically accurate algorithms is tested. Note that t = 0.68 corresponds to the time instant at which u arrives at its most non-Gaussian phase, while t = 4.2 is a non-Gaussian phase where u is nearly at statistical equilibrium while v is still transient.
Figure 21. Statistical prediction of the 1D and 2D PDFs of the stochastically coupled FHN model with multiplicative noise (135) at a transient phase t = 0.68, where n = 500. Note that there is a small but non-zero probability around u_1 = 2 (see the subpanel with the logarithmic scale).
Figure 22. Statistical prediction of the 1D and 2D PDFs of the stochastically coupled FHN model with multiplicative noise (135) at a transient phase t = 4.2, where n = 500.
Note that the total dimension of the stochastically coupled FHN model with multiplicative noise (135) is 3N = 1500. Due to the statistical symmetry, the effective sample size in the statistical prediction shown in Figure 21 and Figure 22 is L′ = NL = 500L, where L is the number of repeated simulations of the system. In fact, the results in these figures are achieved with only L = 1; that is, we run the model (135) only once and apply the efficient statistically accurate algorithms. Therefore, the statistical prediction here is extremely cheap! On the other hand, we take L_C = 300 in the Monte Carlo simulations used to generate the true PDFs, again exploiting the statistical symmetry, so the effective Monte Carlo sample size is L_C′ = N L_C = 150,000.

5.5.2. Application to the Two-Layer L-96 Model with Inhomogeneous Spatial Structures

Now we apply the efficient statistically accurate algorithms to predict the two-layer L-96 model in (136). Here, I = 40 and J = 5 . The parameters are given by (36) and (37). The model behavior with different forcing F is shown in Figure 2, Figure 3 and Figure 4.
Although the two-layer inhomogeneous L-96 model in (136) has no statistical symmetry, the model structure nevertheless allows the effective block decomposition. Below, L = 500 trajectories of each variable u i are simulated from (136) to implement the efficient statistically accurate algorithms. As a comparison, a direct Monte Carlo method requires L C = 150,000 samples for each of the 240 variables for an accurate estimation of at least the one-point statistics. This means the total number of samples is around 4 × 10 7 ! For an efficient calculation of the truth, we focus only on the statistical equilibrium state here, but the algorithms are not restricted to the equilibrium statistics. The true PDFs are calculated using the Monte Carlo samples over a long time series in light of the ergodicity while the recovered PDFs from the efficient statistically accurate algorithms are computed at t = 25 .
In Figure 23, Figure 24 and Figure 25, we show the statistical prediction in all three regimes with F = 5, 8 and 16, respectively, using the efficient statistically accurate algorithm and compare it with the truth. Here, we only show the results at i = 11; qualitatively similar results are found at the other grid points. In these figures, the diagonal subpanels show the 1D marginal PDFs of u_11 and v_11,1, …, v_11,5, where the blue curve is the prediction and the red curve is the truth. The (k_i, k_j)-subpanel with k_i > k_j (below the diagonal) shows the true 2D PDF computed from a large number of Monte Carlo samples (red colormap) while the one with k_i < k_j (above the diagonal) shows the predicted one using the efficient statistically accurate algorithm (blue colormap); the (k_i, k_j)-panel is compared with the (k_j, k_i)-panel. Note that, for simplicity of comparison, the labels u_11 and v_11,1, …, v_11,5 on the bottom and left of the panels correspond to the x-axis and y-axis of the truth and to the y-axis and x-axis of the predicted PDFs. On the other hand, Figure 26 shows several 2D joint PDFs of the large scale p(u_i1, u_i2) with different i_1 and i_2. It is clear that all the strong non-Gaussian features, including highly skewed, multimodal and fat-tailed distributions, are accurately predicted by the statistically accurate algorithm. Finally, Figure 27 shows the error in the predicted PDF as a function of the number of sample points L, where the error relative to the truth is computed via the relative entropy (134). In fact, with L ∼ O(100) samples, the statistical prediction of the efficient statistically accurate algorithms is already sufficiently accurate. Note that the small nonzero asymptotic values in the error curves as a function of L are due to the numerical error in computing the integral of the relative entropy (134) as well as the sampling in the Monte Carlo simulations.
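The small nonzero floor of the error curves can be reproduced in isolation. In the sketch below (hypothetical grid and sample size, not from the text), the samples are drawn from the true distribution itself, yet the numerically evaluated relative entropy between the binned PDF and the truth remains slightly positive because of binning and finite sampling alone.

```python
import numpy as np

np.random.seed(4)
# Samples drawn from the true PDF itself, compared with the exact PDF on a grid,
# mimicking the numerical evaluation of the relative entropy (134).
samples = np.random.randn(100_000)                  # hypothetical sample size
edges = np.linspace(-5.0, 5.0, 51)                  # hypothetical grid
p, _ = np.histogram(samples, bins=edges, density=True)
x = 0.5*(edges[:-1] + edges[1:])                    # bin centers
q = np.exp(-x**2/2)/np.sqrt(2*np.pi)                # the true PDF on the same grid
dx = edges[1] - edges[0]
mask = p > 0
err = np.sum(p[mask]*np.log(p[mask]/q[mask]))*dx    # discretized relative entropy
print(err)                                          # small but strictly positive
```

Even with a perfect prediction, this discretized relative entropy stays slightly above zero, which is the asymptote seen in the error curves.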
Figure 23. Inhomogeneous two-layer L-96 model (35) with F = 5. The other parameters are given in (36) and (37). Comparison of the 1D and 2D true and recovered PDFs at i = 11. The diagonal subpanels show the 1D marginal PDFs of u_11 and v_11,1, …, v_11,5, where the blue curve is the prediction and the red curve is the truth. In this figure with 6 × 6 subpanels, the (k_i, k_j)-subpanel with k_i > k_j (below the diagonal) shows the true 2D PDF using a large number of Monte Carlo samples (red colormap) while the one with k_i < k_j (above the diagonal) shows the predicted one using the efficient statistically accurate algorithm (blue colormap); the (k_i, k_j)-panel is compared with the (k_j, k_i)-panel. Note that, for simplicity of comparison, the labels u_11 and v_11,1, …, v_11,5 on the bottom and left of the panels correspond to the x-axis and y-axis of the truth and to the y-axis and x-axis of the predicted PDFs.
Figure 24. Inhomogeneous two-layer L-96 model (35), similar to Figure 23 but with F = 8.
Figure 25. Inhomogeneous two-layer L-96 model (35), similar to Figure 23 but with F = 16.
Figure 26. Inhomogeneous two-layer L-96 model (35) with F = 5 (top); F = 8 (middle) and F = 16 (bottom). The other parameters are given in (36) and (37). Comparison of the joint PDFs of the large scale p ( u i 1 , u i 2 ) with different i 1 and i 2 .
Figure 27. Inhomogeneous two-layer L-96 model (35) with F = 5 . Error in the predicted PDF as a function of the sample points L compared with the truth. The error is computed via the relative entropy (134).
It is worthwhile pointing out that, although only the predicted one-point and two-point statistics are shown in both examples here, the algorithms can actually provide an accurate estimation of the full joint PDF of u_II using a small number of samples. This is because the sample size in these algorithms does not grow exponentially with the dimension of u_II, which is fundamentally different from Monte Carlo methods. See [193] for a theoretical justification.

6. Multiscale Data Assimilation, Particle Filters, Conditional Gaussian Systems and Information Theory for Model Errors

6.1. Parameter Estimation

The conditional Gaussian system (1) and (2) and the closed analytic formulae (4) and (5) for its conditional statistics also provide a framework for parameter estimation. In fact, u_II can be written as
$$ u_{II} = (\tilde{u}_{II}, \Lambda), $$
where $\tilde{u}_{II} \in \mathbb{R}^{\tilde{N}_2}$ contains the physical process variables and $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p) \in \mathbb{R}^{N_{2,p}}$ collects the model parameters, with $N_2 = \tilde{N}_2 + N_{2,p}$. Rewriting the conditional Gaussian system (1) and (2) in terms of $u_{II} = (\tilde{u}_{II}, \Lambda)$ yields
$$ du_I = \big[A_0(t, u_I) + A_1(t, u_I)\,\tilde{u}_{II} + A_{1,\lambda}(t, u_I)\,\Lambda\big]\,dt + \Sigma_I(t, u_I)\,dW_I(t), $$
$$ d\tilde{u}_{II} = \big[a_0(t, u_I) + a_1(t, u_I)\,\tilde{u}_{II} + a_{1,\lambda}(t, u_I)\,\Lambda\big]\,dt + \Sigma_{II}(t, u_I)\,dW_{II}(t). $$
Equations (138) and (139) contain both the physical process variables ũ_II and the parameters Λ, and both can be coupled with highly nonlinear functions of u_I. Nevertheless, monomials involving both ũ_II and Λ are not allowed, since they would break the conditional Gaussian structure. Notably, although ũ_II and Λ are referred to as model states and parameters, respectively, they can also be understood as stochastic parameterizations and bulk-averaged simplifications of complex physical processes. Therefore, ũ_II and Λ often play the same role of providing extra information about u_I that leads to various non-Gaussian characteristics. This is a typical feature in multiscale complex turbulent dynamical systems.
Below, we present different frameworks for parameter estimation. Let us temporarily ignore ũ_II for notational simplicity, so the dynamics are given by
$$ du_I = \big[A_0(t, u_I) + A_{1,\lambda}(t, u_I)\,\Lambda\big]\,dt + \Sigma_I(t, u_I)\,dW_I(t), $$
where Λ is the true parameter vector.

6.1.1. Direct Parameter Estimation Algorithm

Since Λ consists of constant parameters, it is natural to augment the dynamics with the following trivial relation,
$$ du_I = \big[A_0(t, u_I) + A_{1,\lambda}(t, u_I)\,\Lambda\big]\,dt + \Sigma_I(t, u_I)\,dW_I(t), $$
$$ d\Lambda = 0, $$
where an initial uncertainty for the parameter Λ is assigned. According to (4) and (5), the time evolutions of the mean u ¯ II and covariance R II of the estimate of Λ are given by
$$ d\bar{u}_{II}(t) = \big(R_{II} A_1^*(t, u_I)\big)\big(\Sigma_I \Sigma_I^*\big)^{-1}(t, u_I)\,\Big[du_I - \big(A_0(t, u_I) + A_1(t, u_I)\,\bar{u}_{II}\big)\,dt\Big], $$
$$ dR_{II}(t) = -\big(R_{II} A_1^*(t, u_I)\big)\big(\Sigma_I \Sigma_I^*\big)^{-1}(t, u_I)\,\big(R_{II} A_1^*(t, u_I)\big)^*\,dt. $$
The formula in (144) indicates that R_II = 0 is a solution, and plugging it into (143) yields ū_II = Λ. This means that, given the perfect model, the estimated parameters from (141)–(144) will, under suitable conditions, converge to the truth.
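As a minimal illustration of the direct approach, consider a scalar model with a single unknown damping parameter; the model du = (−λu + f)dt + σ dW and all numerical values below are hypothetical choices for this sketch, with u_I = u, u_II = λ, A_0 = f and A_1 = −u in (141)–(144).

```python
import numpy as np

np.random.seed(0)
# Hypothetical scalar model du = (-lam*u + f)dt + sig dW with one unknown parameter lam;
# in the notation of (141)-(144): u_I = u, u_II = lam, A0 = f, A1 = -u, Sigma_I = sig.
lam_true, f, sig = 2.0, 2.0, 0.5
dt, n = 0.001, 50_000
xi = np.random.randn(n)

# one observed trajectory generated with the true parameter
u = np.empty(n); u[0] = 1.0
for i in range(n - 1):
    u[i+1] = u[i] + (-lam_true*u[i] + f)*dt + sig*np.sqrt(dt)*xi[i]

# direct parameter estimation: augment with d(lam) = 0 and run (143)-(144)
lam_bar, R = 0.0, 1.0                 # initial guess and initial uncertainty
for i in range(n - 1):
    A1 = -u[i]
    innov = (u[i+1] - u[i]) - (f + A1*lam_bar)*dt   # observed minus predicted increment
    lam_bar += (R*A1/sig**2)*innov                  # mean update (143)
    R += -(R*A1)**2/sig**2*dt                       # covariance update (144); R = 0 is attracting
print(lam_bar, R)
```

With these hypothetical values the posterior mean settles near the truth λ = 2 and the posterior variance decays toward zero, mirroring the convergence behavior seen in Figure 28.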
As a simple test example, we consider estimating the three parameters σ, ρ and β in the noisy L-63 model (31) with ρ = 28, σ = 10, β = 8/3. Put into the framework (141) and (142), the augmented system becomes
$$ \begin{aligned} dx &= \sigma(y - x)\,dt + \sigma_x\,dW_x, \qquad & d\sigma &= 0,\\ dy &= \big[x(\rho - z) - y\big]\,dt + \sigma_y\,dW_y, \qquad & d\rho &= 0,\\ dz &= (xy - \beta z)\,dt + \sigma_z\,dW_z, \qquad & d\beta &= 0, \end{aligned} $$
where u_I = (x, y, z)^T and Λ = (σ, ρ, β)^T. Figure 28 shows the parameter estimation skill with σ_x = σ_y = σ_z = 5. Here, the initial guesses of all three parameters are ρ = σ = β = 0 and the initial uncertainty is a 3 × 3 identity matrix. It is clear that the estimated parameters converge quickly to the truth and the corresponding uncertainty goes to zero. In Figure 29, the parameter estimation skill with different noise levels is compared. When the noise levels σ_x, σ_y and σ_z in the observed signals become larger, the convergence rate becomes slower.
Figure 28. Parameter estimation of the noisy L-63 system using the direct approach (145) with σ_x = σ_y = σ_z = 5. (a–c) sample trajectories of x, y, z which are used as observations; (d–f) time evolution of the estimated parameters and the truth (red); (g–i) the uncertainty evolution of each estimated parameter.
Figure 29. Parameter estimation of the noisy L-63 system using the direct approach (145) with σ_x = σ_y = σ_z = 1, 5 and 15 for the three rows. For illustration purposes, only the trajectory of x and the estimation of σ are shown in all three cases. (a) sample trajectory of x; (b,c) mean estimation of σ and the associated uncertainty.
In [28], systematic studies of both one-dimensional linear and nonlinear systems have revealed that the convergence rate depends on several factors, such as the observability, the signal-to-noise ratio and the initial uncertainty. In particular, both theoretical analysis and numerical simulations in [28] showed that the convergence rate can be extremely slow and that, with an undesirable initial guess, the estimation can even converge to a wrong solution for some highly nonlinear systems. Thus, an alternative parameter estimation scheme is required.

6.1.2. Parameter Estimation Using Stochastic Parameterized Equations

Instead of augmenting the equations for the parameters in the trivial but natural way shown in (142), the augmented system can be formed in the following way [28]:
d u I = [ A 0 ( t , u I ) + A 1 , λ ( t , u I ) Λ ] d t + Σ I ( t , u I ) d W I ( t )
d Λ = [ c 1 Λ + c 2 ] d t + σ Λ d W Λ ( t ) .
Here, c_1 is a negative-definite diagonal matrix, c_2 is a constant vector and σ_Λ is a diagonal noise matrix. Different from (142), a stochastic parameterized equation is used in (147) to describe the time evolution of each component of the parameter vector Λ. This approach is motivated by the stochastic parameterized extended Kalman filter model [64,65] discussed in Section 4.5.1. The stochastic parameterized equations in (147) serve as prior information for the parameter estimation. Although certain model error is introduced by the appearance of c_1, c_2 and σ_Λ, it has been shown in [28] that the convergence rate is greatly accelerated. In fact, for linear models, rigorous analysis reveals that the convergence rate using the stochastic parameterized Equations (146) and (147) is exponential while that of the direct method (141) and (142) is only algebraic.
Now, we apply the parameter estimation using the stochastic parameterized Equations (146) and (147) to the noisy L-63 model. The augmented system reads
$$ \begin{aligned} dx &= \sigma(y - x)\,dt + \sigma_x\,dW_x, \qquad & d\sigma &= -d_\sigma(\sigma - \hat{\sigma})\,dt + \sigma_\sigma\,dW_\sigma,\\ dy &= \big[x(\rho - z) - y\big]\,dt + \sigma_y\,dW_y, \qquad & d\rho &= -d_\rho(\rho - \hat{\rho})\,dt + \sigma_\rho\,dW_\rho,\\ dz &= (xy - \beta z)\,dt + \sigma_z\,dW_z, \qquad & d\beta &= -d_\beta(\beta - \hat{\beta})\,dt + \sigma_\beta\,dW_\beta. \end{aligned} $$
The same initial values are taken as in the direct method. Recall from Figure 29 that a large observational noise σ_x = σ_y = σ_z = 15 leads to a slow convergence rate. Below, we focus on this case and use the parameter estimation scheme (148) to improve the result. The parameters in the stochastic parameterized equations are chosen as follows:
$$ \hat{\sigma} = 10 + 2 = 12, \quad \hat{\rho} = 28 + 5.6 = 33.6, \quad \hat{\beta} = 8/3 + 1.6/3 = 3.2, \quad d_\sigma = d_\rho = d_\beta = 0.5, \quad \sigma_\sigma = 2, \quad \sigma_\rho = 5.6, \quad \sigma_\beta = 1.6/3. $$
Here, we introduce 20% errors in all the mean states σ̂, ρ̂ and β̂. The equilibrium variances of the stochastic parameter processes are σ_σ²/(2d_σ), σ_ρ²/(2d_ρ) and σ_β²/(2d_β), respectively, so the truth is located exactly one standard deviation (square root of the variance) away from the mean state. The decorrelation time of all the processes is 1/d_σ = 1/d_ρ = 1/d_β = 2 time units.
Figure 30 shows the results of parameter estimation using the method with the stochastic parameterized Equations (146) and (147). Despite the errors in σ̂, ρ̂ and β̂ used as prior information, the estimated parameters are quite accurate. In fact, using the averaged value over the time interval t ∈ [10, 50] as the estimate, we compare the estimated parameters with the truth:
Estimation : σ = 10.9 , ρ = 29.1 , β = 2.88 , Truth : σ = 10.0 , ρ = 28.0 , β = 2.67 .
Figure 30. Parameter estimation of the noisy L-63 system using the method with stochastic parameterized Equations (146) and (147) with σ_x = σ_y = σ_z = 15. (a–c) sample trajectories of x, y, z which are used as observations; (d–f) time evolution of the estimated parameters and the truth (red); the thin black line shows the average of the estimation from t = 10 to t = 50; (g–i) the uncertainty evolution of each estimated parameter.
The relative errors in the three estimated parameters are 9.0%, 3.9% and 7.8%, respectively, all of which are much smaller than the 20% errors in σ̂, ρ̂ and β̂. This is because the conditional Gaussian framework (1) and (2) automatically makes use of the information in both the observations and the stochastic parameterized equations. The combination of these two components reduces the error even when the prior knowledge in the stochastic parameterized equations is biased (as in most real-world applications). Notably, the convergence of the estimated parameters shown in Figure 30 is much faster than that of the direct method. This is one of the key features of the approach using stochastic parameterized equations and is practically quite useful since only limited observational data are available in many applications.
It is also important to note that the perfect model is unknown in most realistic situations; dealing with model error in parameter estimation is therefore a crucial topic. The judicious model error in the approach using stochastic parameterized equations accommodates parameter uncertainty and provides more robust results than the direct method. Thus, this approach has wider applicability in real-world problems. More comparisons between the two methods can be found in [28].
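The same hypothetical scalar example used for the direct method above also illustrates the approach (146) and (147): the trivial relation dλ = 0 is replaced by a mean-reverting equation whose mean is deliberately biased, mimicking the biased priors in (149). All numerical values are again hypothetical.

```python
import numpy as np

np.random.seed(0)
# Same hypothetical scalar model as before: du = (-lam*u + f)dt + sig dW.
lam_true, f, sig = 2.0, 2.0, 0.5
c, lam_prior, sig_lam = 0.2, 2.5, 0.4   # biased OU prior: d(lam) = -c(lam - lam_prior)dt + sig_lam dW
dt, n = 0.001, 200_000
xi = np.random.randn(n)

u = np.empty(n); u[0] = 1.0
for i in range(n - 1):
    u[i+1] = u[i] + (-lam_true*u[i] + f)*dt + sig*np.sqrt(dt)*xi[i]

# conditional Gaussian filter with a0 = c*lam_prior, a1 = -c, Sigma_II = sig_lam
lam_bar, R = lam_prior, sig_lam**2/(2*c)
trace = np.empty(n); trace[0] = lam_bar
for i in range(n - 1):
    A1 = -u[i]
    innov = (u[i+1] - u[i]) - (f + A1*lam_bar)*dt
    lam_bar += c*(lam_prior - lam_bar)*dt + (R*A1/sig**2)*innov
    R += (-2*c*R + sig_lam**2 - (R*A1)**2/sig**2)*dt
    trace[i+1] = lam_bar

est = trace[n//4:].mean()               # time-averaged estimate, as done for Figure 30
print(est)
```

Although the prior mean carries a 25% bias in this sketch, the observations pull the time-averaged estimate toward the truth, in the same spirit as the results in Figure 30.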

6.1.3. Estimating Parameters in the Unresolved Processes

As discussed in Section 4.5.1 and Section 6.1.2, stochastic parameterizations are widely used in describing turbulent signals as observed in nature. One important step is to estimate the parameters in the equations associated with these stochastic parameterizations (e.g., σ_γ, γ̂, d_γ, σ_ω, ω̂, d_ω, σ_b, b̂ and d_b in (119)). Note that there are typically no direct observations of these stochastic parameterized processes, since they usually represent unresolved variables or may not have a clear physical correspondence. However, they play important roles in describing the underlying complex turbulent dynamical system, so an accurate estimation of these parameters is crucial. See examples in Section 4.5.1 and Section 6.1.2 and those in [65,101].
Traditional methods for estimating these parameters include Markov Chain Monte Carlo (MCMC) [198,199,200,201], maximum likelihood estimation [202,203] and the ensemble Kalman filter [204,205]. Note that, if both the state variables γ, ω, b and the associated nine parameters in these three equations are treated as unobserved variables u_II, then the augmented system does not belong to the conditional Gaussian model family. Thus, the closed formulae in (4) and (5) cannot be applied in the two methods discussed in Section 6.1.1 and Section 6.1.2, and an ensemble Kalman filter has to be adopted instead, which is computationally expensive.
Nevertheless, a judicious application of the conditional Gaussian framework still allows for an efficient algorithm for estimating the parameters in the stochastic parameterized equations. Below, we illustrate the idea using the so-called SPEKF-M model, a simplified version of the SPEKF model (119) that contains only the multiplicative noise process γ (“M” stands for multiplicative). The model reads
$$ du = \big(-\gamma u + F(t)\big)\,dt + \sigma_u\,dW_u, \qquad d\gamma = -d_\gamma(\gamma - \hat{\gamma})\,dt + \sigma_\gamma\,dW_\gamma. $$
The goal here is to estimate the three parameters σ_γ, d_γ and γ̂ by observing only one sample trajectory of u. First, choose an arbitrary initial guess of the three parameters, σ_γ^(0), d_γ^(0) and γ̂^(0). Then, run the conditional Gaussian filter (4) and (5), which gives a time series of the conditional mean γ̄^(0)(t) and one of the conditional variance R^(0)(t) of γ(t) conditioned on the trajectory of u. With γ̄^(0)(t) and R^(0)(t), applying the same argument as in (124) but replacing the L Gaussian distributions associated with L different trajectories by those at L different time instants, the distribution p(γ^(0)) can be obtained via a Gaussian mixture. Here, p(γ^(0)) is also a Gaussian distribution due to the dynamical structure in (151). This Gaussian distribution provides two quantities: the mean μ_γ,eq^(0) and the variance R_γ,eq^(0). In addition, the decorrelation time τ_γ,eq^(0) of the conditional mean γ̄^(0)(t) can easily be computed from its sample autocorrelation function. These three quantities are then used to update the three parameters σ_γ, d_γ and γ̂ via the relations
$$ \frac{\big(\sigma_\gamma^{(1)}\big)^2}{2 d_\gamma^{(1)}} = R_{\gamma,eq}^{(0)}, \qquad \hat{\gamma}^{(1)} = \mu_{\gamma,eq}^{(0)}, \qquad d_\gamma^{(1)} = \frac{1}{\tau_{\gamma,eq}^{(0)}}. $$
Then, run the conditional Gaussian filter (4) and (5) again with the same observed time series u(t) but the updated parameters σ_γ^(1), d_γ^(1) and γ̂^(1). Repeating this procedure yields further updates σ_γ^(2), d_γ^(2) and γ̂^(2), and the process continues until the parameter updates converge. This fixed-point iteration results in an efficient parameter estimation algorithm.
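The fixed-point iteration described above can be sketched directly. The implementation below is a minimal version under stated assumptions (not the authors' code): the true parameters are those in (153), but the series length, time step and estimator details are hypothetical; the Gaussian-mixture mean and variance are computed from the time series of the conditional mean and variance, and the decorrelation time is estimated as the integral of the sample autocorrelation function up to its first zero crossing.

```python
import numpy as np

np.random.seed(1)
# True parameters of the SPEKF-M model (151), as in (153); the series length,
# time step and estimator details below are hypothetical choices.
sig_g, d_g, g_hat, sig_u, F = 0.5, 0.5, 1.0, 0.5, 2.0
dt, n = 0.005, 100_000                    # 500 time units of synthetic observations
xi_u, xi_g = np.random.randn(2, n)

# one observed trajectory of u (the multiplicative process gamma stays hidden)
u = np.empty(n); g = np.empty(n); u[0], g[0] = 2.0, 1.0
for i in range(n - 1):
    u[i+1] = u[i] + (-g[i]*u[i] + F)*dt + sig_u*np.sqrt(dt)*xi_u[i]
    g[i+1] = g[i] - d_g*(g[i] - g_hat)*dt + sig_g*np.sqrt(dt)*xi_g[i]

def cg_filter(d, ghat, s):
    """Conditional Gaussian filter (4)-(5) for gamma given the trajectory of u."""
    m = np.empty(n); R = np.empty(n)
    m[0], R[0] = ghat, s**2/(2*d)
    for i in range(n - 1):
        A1 = -u[i]                                     # coefficient of gamma in du
        innov = (u[i+1] - u[i]) - (F + A1*m[i])*dt
        m[i+1] = m[i] + d*(ghat - m[i])*dt + (R[i]*A1/sig_u**2)*innov
        R[i+1] = max(R[i] + (-2*d*R[i] + s**2 - (R[i]*A1)**2/sig_u**2)*dt, 1e-10)
    return m, R

def decorr_time(x):
    """Integral of the sample ACF up to its first zero crossing (FFT-based)."""
    x = x - x.mean()
    spec = np.fft.rfft(x, 2*len(x))
    acf = np.fft.irfft(spec*np.conj(spec))[:len(x)//10]
    acf = acf/acf[0]
    stop = np.argmax(acf < 0) if (acf < 0).any() else len(acf)
    return acf[:stop].sum()*dt

# fixed-point iteration on (sig_gamma, d_gamma, gamma_hat) via the update rule (152)
d_k, ghat_k, sig_k = 1.0, 2.5, 0.1        # initial guesses, far from the truth
for _ in range(10):
    m, R = cg_filter(d_k, ghat_k, sig_k)
    ghat_k = m.mean()                     # mean of the Gaussian mixture
    R_eq = m.var() + R.mean()             # variance of the Gaussian mixture
    d_k = 1.0/decorr_time(m)
    sig_k = np.sqrt(2*d_k*R_eq)
print(ghat_k, d_k, sig_k)
```

As in Figure 32, the iteration is driven by the intermittent phases of u, where the conditional mean locks onto the hidden process; the estimated parameters approach the truth after a handful of updates.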
As a simple test example, the following parameters are adopted to generate the true signal in the SPEKF-M model (151),
σ γ = 0.5 , d γ = 0.5 , γ ^ = 1 , σ u = 0.5 , F = 2 .
A pair of sample trajectories (u(t), γ(t)) is shown in Panel (a) of Figure 31 and the corresponding PDFs are shown in Panel (b). Clearly, these parameters result in a highly intermittent time series of u(t) with a non-Gaussian PDF that has a one-sided fat tail. Panel (c) shows the conditional mean and conditional variance of γ conditioned on the trajectory of u using the perfect parameters. The conditional mean μ(t) does not equal the truth everywhere: at the intermittent phases of u it is close to the truth of γ with a small conditional variance, whereas at quiescent phases it differs from the truth and the uncertainty (conditional variance) in the estimation is large.
Figure 31. (a) a pair of sample trajectories (u(t), γ(t)) of the SPEKF-M model (151) with parameters in (153); (b) the corresponding PDFs. Note that the PDF of u is highly non-Gaussian with a one-sided fat tail; (c) the conditional mean μ(t) and conditional variance R(t) using the model with perfect parameters.
Now, we apply the algorithm above to estimate the three parameters σ_γ, d_γ and γ̂, assuming the other two parameters σ_u = 0.5 and F = 2 are known. The trajectory of u (with a length of 1000 time units) in Panel (a) of Figure 31 is used as the input of the algorithm. The iteration curves are shown in Figure 32, where the initial values σ_γ^(0) = 0.1, d_γ^(0) = 1 and γ̂^(0) = 2.5 are far from the truth. It is clear that, after five iterations, the parameter updates converge to fixed points. The parameter γ̂ converges exactly to the truth while both d_γ and σ_γ are slightly larger than the true values. Nevertheless, the variance of γ using the estimated parameters, namely σ_γ²/(2d_γ) = 0.61²/(2 · 0.66) = 0.28, is close to that of the truth, 0.5²/(2 · 0.5) = 0.25. The slight overestimation of the variance compensates for the slight underestimation of the damping time 1/d_γ, and therefore the model with the estimated parameters is able to generate essentially the same intermittent signals as in the observed period of u in Figure 31. Note that, since the input time series of u has only finite length (and is actually quite short here), the estimated parameters reflect the features of the observed signal, which may differ slightly from the true underlying dynamics unless an infinitely long time series is used as the input. Next, the skill of the estimated parameters is examined via the model statistics. In Figure 33, both the trajectories and the associated PDFs of u and γ using the true parameters and the estimated parameters are shown. Due to the high skill in estimating both the mean and variance of γ, the PDFs of γ_M and γ are quite close to each other, with only a slight overestimation of the variance in γ_M. The decorrelation time of the trajectory of γ_M is slightly shorter than that of γ, as discussed above. Nevertheless, γ_M and γ provide quite similar feedback to u_M and u. Therefore, the statistical features of u_M, including the decorrelation time, mean, variance and highly non-Gaussian PDF, are almost the same as those of u.
Figure 32. Iteration of the updates of the parameters (blue curve). The initial values σ γ ( 0 ) = 0.1 , d γ ( 0 ) = 1 and γ ^ ( 0 ) = 2.5 are far from the truth (black dashed lines).
Figure 33. Sample trajectories and the associated PDFs using the true parameters (blue) and the estimated ones (red). (a) u ( t ) ; (b) u M ( t ) ; (c) γ ( t ) ; (d) γ M ( t ) . The PDFs are calculated based on a long trajectory with 10,000 time units but only 1000 time units are shown here. In the panels of the PDFs of u M and γ M , those of u and γ are also illustrated (thin dashed blue curves) for comparison.
A more detailed study of this algorithm, including the application to the full SPEKF model (119) and the convergence rate dependence on different model ingredients, will be included in a follow-up work.

6.2. Data Assimilation with Physics-Constrained Forecast Models and Information Theory for Quantifying Model Errors

6.2.1. An Information Theoretical Framework for Data Assimilation and Prediction

The following two traditional path-wise measurements are widely used to assess the skill in data assimilation and prediction [15,206,207,208,209,210]. Denote by u_i, i = 1, …, n, the true signal and by û_i the filtering/prediction estimate. These measurements are given by
  • The root-mean-square error (RMSE):
    $$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(\hat{u}_i - u_i\big)^2}. $$
  • The pattern correlation (PC):
    $$ \mathrm{PC} = \frac{\sum_{i=1}^{n}\big(\hat{u}_i - \bar{\hat{u}}\big)\big(u_i - \bar{u}\big)}{\sqrt{\sum_{i=1}^{n}\big(\hat{u}_i - \bar{\hat{u}}\big)^2}\,\sqrt{\sum_{i=1}^{n}\big(u_i - \bar{u}\big)^2}}, $$
where $\bar{\hat{u}}$ and $\bar{u}$ denote the means of $\hat{u}_i$ and $u_i$, respectively.
While these two path-wise measurements are easy to implement and are able to quantify the data assimilation and prediction skill to some extent, they have fundamental limitations. It has been shown in [42,211] that these two measurements fail to quantify the skill of capturing extreme events and other non-Gaussian features, which leads to misleading results. In fact, concrete examples, even for Gaussian models [42,211], show that two different predictions can have the same RMSE and PC while one is far more skillful than the other in capturing the extreme events.
Due to the fundamental limitations of assessing the data assimilation and prediction skill based only on these two traditional path-wise measurements, various information measurements have been proposed to improve the quantification of data assimilation and prediction [16,194,195,212,213,214,215,216,217,218]. In [218,219,220,221,222], an information measurement called the Shannon entropy difference was introduced and used to assess the data assimilation and prediction skill. The Shannon entropy difference takes into account the estimation of the extreme events and improves on the insufficiency of the traditional path-wise measurements. However, relying solely on the Shannon entropy difference is also misleading: it fails to capture the model error in the mean state, and it computes the uncertainties of the two distributions separately rather than considering the pointwise difference between the two PDFs.
Due to the fundamental limitations of the two classical path-wise measurements, RMSE and PC, as well as those of the Shannon entropy difference, a new information-theoretic framework [186] has been developed to assess the data assimilation/prediction skill. Denote by π ≡ π(u) and π^M ≡ π(u^M) the PDFs associated with the truth u and the data assimilation/prediction estimate u^M, respectively, and by p(u, u^M) their joint PDF. Let U = u − u^M be the residual between the truth and the estimate. This information-theoretic framework involves three information measurements:
  • The Shannon entropy residual,
    $$ S(U) = -\int p(U)\,\log p(U)\,dU. $$
  • The mutual information,
    $$ M(\pi, \pi^M) = \iint p(u, u^M)\,\log\frac{p(u, u^M)}{\pi(u)\,\pi(u^M)}\,du\,du^M. $$
  • The relative entropy,
    $$ R(\pi, \pi^M) = \int \pi\,\log\frac{\pi}{\pi^M}\,du. $$
The Shannon entropy residual quantifies the uncertainty in the point-wise difference between u and u M . It is an information surrogate of the RMSE in the Gaussian framework. The mutual information quantifies the dependence between the two processes. It measures the lack of information in the factorized density π ( u ) π ( u M ) relative to the joint density p ( u , u M ) , which follows the identity,
$$ M(\pi, \pi^M) = P\big(p(u, u^M),\, \pi(u)\,\pi(u^M)\big), $$
where P(·,·) denotes the relative entropy of the first density relative to the second.
The mutual information is an information surrogate of the PC in the Gaussian framework. On the other hand, the relative entropy quantifies the lack of information in π^M relative to π and is a good indicator of the skill of u^M in capturing the peaks and extreme events of u. It also takes into account the pointwise discrepancy between π and π^M rather than only comparing the uncertainties of the two individual PDFs (as the Shannon entropy difference does). Therefore, the combination of these three information measurements captures all the relevant features in assessing the data assimilation/prediction skill and overcomes the shortcomings discussed above. Note that, when π ∼ N(ū, R) and π^M ∼ N(ū^M, R^M) are both Gaussian, the three information measurements have explicit expressions [215].
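As a hedged illustration of how the path-wise and information measurements complement each other, the sketch below computes the RMSE (154), the PC (155) and the relative entropy (158) (via its explicit one-dimensional Gaussian expression) on synthetic data; all numbers are hypothetical. The estimate has a sizable path-wise error, yet its climatology matches the truth, so the relative entropy is nearly zero.

```python
import numpy as np

np.random.seed(3)
# Synthetic truth u and an estimate uM with the same climatology (~N(0,1))
# but only partial pathwise correlation; all numbers are hypothetical.
n = 100_000
u = np.random.randn(n)
uM = 0.8*u + 0.6*np.random.randn(n)     # correlation 0.8, unit variance

rmse = np.sqrt(np.mean((uM - u)**2))    # (154)
pc = np.corrcoef(uM, u)[0, 1]           # (155)

def gauss_rel_ent(m1, v1, m2, v2):
    """Relative entropy (158) between 1D Gaussians N(m1,v1) and N(m2,v2)."""
    return 0.5*(np.log(v2/v1) + (v1 + (m1 - m2)**2)/v2 - 1.0)

rel_ent = gauss_rel_ent(u.mean(), u.var(), uM.mean(), uM.var())
print(rmse, pc, rel_ent)
```

Here the RMSE is far from zero while the relative entropy nearly vanishes, showing that the two measurements probe different aspects of skill; assessing extreme events therefore requires the information measurements in addition to the path-wise ones.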
The information-theoretic framework (156)–(158) is usually defined in the super-ensemble sense [215]. In many practical situations, however, only one realization (trajectory) is available. Nevertheless, the framework can also be used in a path-wise way, where the statistics are computed by collecting all the sample points of the given realization. Some realistic applications of the information-theoretic framework to filtering and prediction can be found in [42,63,215].

6.2.2. Important Roles of Physics-Constrained Forecast Models in Data Assimilation

It has been shown in Section 3.3 that physics-constrained nonlinear stochastic models are a recent development for data-driven statistical modeling with partial observations. They overcome the finite-time blow-up and the lack of physical meaning of various ad hoc multi-layer regression models [31,32]. Here, our goal is to show that physics-constrained nonlinear stochastic models also play an important role in data assimilation (or filtering): ignoring the energy-conserving nonlinear interactions in the forecast models results in large model errors.
The test model below is a simple dyad model, which mimics the interactions between a resolved large-scale mean flow and unresolved turbulent fluctuations with intermittent instability [37,50,223]. The dyad model reads
$$ du = (-d_u u + \gamma u v + F_u)\,dt + \sigma_u\,dW_u, $$
$$ dv = (-d_v v - \gamma u^2)\,dt + \sigma_v\,dW_v. $$
In (160) and (161), u represents one of the resolved modes in a turbulent signal, which interacts with the unresolved mode v through quadratic nonlinearities. The quadratic nonlinear terms in (160) and (161) are easily seen to conserve energy. Below, the physics-constrained dyad model (160) and (161) is utilized to generate true signals of nature. The goal here is to filter the unobserved process v given one single realization of the observed process u. In addition to adopting the perfect filter (160) and (161), an imperfect filter with no energy-conserving nonlinear interactions is studied for comparison. In this imperfect filter, the nonlinear feedback −γu² in the equation for v is dropped and the result is a stochastic parameterized filter [20],
$$ du = (-d_u u + \gamma u v + F_u)\,dt + \sigma_u\,dW_u, $$
$$ dv = -d_v^M (v - \bar{v}^M)\,dt + \sigma_v^M\,dW_v. $$
In the stochastic parameterized filter (162) and (163), the parameters in the resolved variable u are assumed to be the same as in nature (160) and (161). We further assume the statistics of the unobserved variable v in nature are available. Thus, the parameters d_v^M, v̄^M and σ_v^M in the unresolved process are calibrated [187,188,224] by matching the mean, variance and decorrelation time of those in (160) and (161). Note that both (160)–(161) and (162)–(163) belong to the conditional Gaussian framework (1) and (2) with u_I = u and u_II = v, and (4) and (5) are used to efficiently compute the filter estimates.
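As a minimal sketch (not the authors' code), the conditional Gaussian filter (4) and (5) for the perfect dyad forecast model can be implemented directly. The parameters below follow Regime I with the noise levels of Panel (a) of Figure 35; the time step, integration length and initial values are hypothetical choices.

```python
import numpy as np

np.random.seed(2)
# Regime I parameters of the dyad model (160)-(161); noise levels as in Figure 35(a).
d_u, d_v, gam, F_u, sig_u, sig_v = 0.8, 0.8, 1.2, 1.0, 0.2, 2.0
dt, n = 0.001, 100_000                   # hypothetical time step and length
xi_u, xi_v = np.random.randn(2, n)

# one realization of the truth
u = np.empty(n); v = np.empty(n); u[0] = v[0] = 0.0
for i in range(n - 1):
    u[i+1] = u[i] + (-d_u*u[i] + gam*u[i]*v[i] + F_u)*dt + sig_u*np.sqrt(dt)*xi_u[i]
    v[i+1] = v[i] + (-d_v*v[i] - gam*u[i]**2)*dt + sig_v*np.sqrt(dt)*xi_v[i]

# conditional Gaussian filter (4)-(5) for v given u: A0 = -d_u*u + F_u, A1 = gam*u,
# a0 = -gam*u**2, a1 = -d_v, Sigma_I = sig_u, Sigma_II = sig_v
vm = np.empty(n); R = np.empty(n); vm[0], R[0] = 0.0, 1.0
for i in range(n - 1):
    A1 = gam*u[i]
    innov = (u[i+1] - u[i]) - (-d_u*u[i] + F_u + A1*vm[i])*dt
    vm[i+1] = vm[i] + (-d_v*vm[i] - gam*u[i]**2)*dt + (R[i]*A1/sig_u**2)*innov
    R[i+1] = max(R[i] + (-2*d_v*R[i] + sig_v**2 - (R[i]*A1)**2/sig_u**2)*dt, 1e-10)

pc = np.corrcoef(vm[n//10:], v[n//10:])[0, 1]   # pattern correlation after spin-up
print(pc)
```

The same loop with the feedback term −γu² replaced by the calibrated relaxation −d_v^M(v − v̄^M) gives the stochastic parameterized filter (162) and (163) for comparison.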
Note that in (160) and (161), if F_u = 0, the fixed point associated with the deterministic part is u_c = v_c = 0. This leads to an important issue in the state estimation of v, namely observability [207,225]. The coupled system (160) and (161) loses its observability when the observed process u provides no information for determining the unobserved variable v. Intuitively, this corresponds to u = 0 in (160) and (161), in which case v disappears from the observed process. Therefore, if F_u = 0, the filtering skill for v is expected to deteriorate, especially for small σ_u. Below, we consider two different dynamical regimes:
$$ \text{Regime I:}\quad d_u = 0.8,\ d_v = 0.8,\ \gamma = 1.2,\ \text{and}\ F_u = 1; \qquad \text{Regime II:}\quad d_u = 0.8,\ d_v = 0.8,\ \gamma = 1.2,\ \text{and}\ F_u = 0. $$
In Regime II, there is no practical observability in the quiescent phase (near the fixed point of the deterministic dynamics), while in Regime I the forcing drives the signal away from u_c = 0.
Figure 34 shows the model error in terms of RMSE, PC and relative entropy as a function of σ_u and σ_v. Both the perfect physics-constrained forecast model (160) and (161) and the stochastic parameterized filter (162) and (163) are used. Here, instead of the Shannon entropy residual and the mutual information, we still show the RMSE and PC since most readers are familiar with these two measurements; the reader should keep in mind, however, that the Shannon entropy residual and the mutual information are the more suitable measurements in the non-Gaussian framework. On the other hand, the relative entropy is shown here to assess the model error.
Figure 34. Model error in terms of RMSE, PC and relative entropy as a function of σ u and σ v in the true signal. (a,b) Regime I with F u = 1 ; (c,d) Regime I with F u = 0 ; (a,c) show the model error using the perfect physics-constrained forecast model (160) and (161) while (b,d) uses the stochastic parameterized filter (162) and (163).
First, in Columns (a,b) of Figure 34, the filtering skill in Regime I with F_u = 1 is illustrated. With a small σ_u, both filters produce skillful estimates. This is because, when σ_u is small, the filters put more weight on the observational process, which has a large signal-to-noise ratio and therefore provides accurate estimates. However, when σ_u is large but σ_v is small in generating the true signal, the stochastic parameterized filter becomes much worse than the perfect filter with the physics-constrained forecast model. In fact, a large σ_u leads to large signals in u and tells the filter to put more weight on the underlying process of v, which implies that the filter estimate of v is essentially driven by the feedback term −γu². Since the stochastic parameterized filter has no such feedback mechanism, the error becomes large. See Panel (b) of Figure 35 for an example with path-wise trajectories. It is also important to note that in such a situation the PDF of the filter estimate is completely different from the truth, and thus a large model error is found.
Figure 35. Regime I with $F_u = 1$. Path-wise trajectories and the PDFs of the true signal (blue) and the filter estimation of the physics-constrained forecast model (160) and (161) (red) and the stochastic parameterized filter (162) and (163) (green). (a) $\sigma_u = 0.2$ and $\sigma_v = 2$; (b) $\sigma_u = 2$ and $\sigma_v = 0.2$.
Columns (c,d) of Figure 34 show the filter estimates in Regime II with $F_u = 0$. Compared with the results in Regime I, it is clear that when $\sigma_u$ is small in generating the true signal, the filter estimates become inaccurate. This is in fact due to an observability issue, since a small $\sigma_u$ means the signal of $u$ stays near the origin. This is clearly seen in Panel (a) of Figure 36, where the filter estimate is accurate only at intermittent phases. One interesting phenomenon is that, although the filter estimate using the stochastic parameterized filter in Panel (b) of Figure 35 has a smaller RMSE than that in Panel (a) of Figure 36, the relative entropy clearly indicates a much worse filter estimate in the former case, since it fails to capture any of the amplitudes. These facts all indicate the importance of including the physics-constrained structure in designing filters, especially in regimes dominated by the energy-conserving nonlinear interactions.
Figure 36. Regime II with $F_u = 0$. Path-wise trajectories and the PDFs of the true signal (blue) and the filter estimation of the physics-constrained forecast model (160) and (161) (red) and the stochastic parameterized filter (162) and (163) (green). (a) $\sigma_u = 0.2$ and $\sigma_v = 2$; (b) $\sigma_u = 2$ and $\sigma_v = 0.2$.
Finally, it is important to note that, although the stochastic parameterized filter (162) and (163) is not able to recover the signal due to the strong feedback from the physics-constrained nonlinear interactions, it is still quite useful in detecting the intermittent phases in turbulent signals. In fact, in Panel (a) of both Figure 35 and Figure 36, the intermittent phases are all accurately recovered by the stochastic parameterized filter. Other works showing the skillful behavior of the stochastic parameterized filter and its advantages over mean stochastic models can be found in [16,64,65].
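For reference, the three skill measures used above can be sketched as follows. This is a minimal illustration with hypothetical function names; the relative entropy is written in the standard signal-dispersion form for one-dimensional Gaussians.

```python
import numpy as np

def rmse(truth, estimate):
    """Root-mean-square error between a true signal and a filter estimate."""
    diff = np.asarray(truth) - np.asarray(estimate)
    return float(np.sqrt(np.mean(diff ** 2)))

def pattern_correlation(truth, estimate):
    """Pattern correlation (PC): correlation of the two series about their means."""
    t = np.asarray(truth, dtype=float) - np.mean(truth)
    e = np.asarray(estimate, dtype=float) - np.mean(estimate)
    return float(np.dot(t, e) / (np.linalg.norm(t) * np.linalg.norm(e)))

def gaussian_relative_entropy(m_truth, v_truth, m_model, v_model):
    """Relative entropy P(truth) || P(model) for one-dimensional Gaussians,
    decomposed into a signal part (mean mismatch) and a dispersion part
    (variance mismatch)."""
    signal = 0.5 * (m_truth - m_model) ** 2 / v_model
    dispersion = 0.5 * (v_truth / v_model - 1.0 - np.log(v_truth / v_model))
    return float(signal + dispersion)
```

A perfect estimate gives zero RMSE, unit PC and zero relative entropy; a filter that tracks the phases but misses the amplitudes can still score a moderate RMSE while the relative entropy exposes the mismatched PDF.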

6.3. Multiscale Data Assimilation with Particles Interacting with Conditional Gaussian Statistics

6.3.1. A General Description

Data assimilation of turbulent signals is an important and challenging problem because of the extremely large dimension of the signals and the incomplete, partial and noisy observations, which usually mix the large scale mean flow and small scale fluctuations. See Chapter 7 of [20] for examples of new phenomena due to this multiscale coupling through the observations, even for linear systems. Due to limited computing power, it is desirable to use multi-scale forecast models, which are cheap and fast, to mitigate the curse of dimensionality in turbulent systems. Thus, model errors from imperfect forecast models are unavoidable in the development of a data assimilation method in turbulence.
Among different methods, conventional superparameterization is a multi-scale algorithm that was originally developed for the purpose of parameterizing unresolved cloud processes in tropical atmospheric convection [73,226,227]. Conventional superparameterization resolves the large scale mean flow on a coarse grid in a physical domain, while the fluctuating parts are resolved using a fine-grid high-resolution simulation on periodic domains embedded in the coarse grid. A much cheaper version of superparameterization, called stochastic superparameterization [70,71,72,73], replaces the nonlinear eddy terms by quasilinear stochastic processes on formally infinite embedded domains, where the stochastic processes are Gaussian conditional to the large scale mean flow. The key ingredient of these multiscale data assimilation methods is the systematic use of conditional Gaussian mixtures, which make the methods efficient by filtering a subspace whose dimension is smaller than that of the full state. This conditional Gaussian closure approximation results in a seamless algorithm without using the high resolution space grid for the small scales and is much cheaper than conventional superparameterization, with significant success in difficult test problems [71,72,74] including the MMT model [71,75] and ocean turbulence [76,77,78].
The key idea of the multiscale data assimilation method is to use conditional Gaussian mixtures [80,228] whose distributions are compatible with superparameterization. The method uses particle filters (see [79] and Chapter 15 of [20]) or ensemble filters on the large scale part [75,76], whose dimension is small enough that the non-Gaussian statistics of the large scale part can be calculated from a particle filter, whereas the statistics of the small scale part are conditionally Gaussian given the large scale part. This framework is not restricted to superparameterization as the forecast model, and other cheap forecast models can also be employed. See [80] for another multiscale filter that uses the quasilinear Gaussian dynamical orthogonality method as the forecast method in an adaptively evolving low dimensional subspace without using superparameterization. Note that data assimilation using superparameterization has already been discussed in [229] with noisy observations of the large scale part of the signal alone. There it was shown that, even in this restricted setting, ignoring the small scale fluctuations, even when they are rapidly decaying, can completely degrade the filter performance compared with the high skill using superparameterization. Here, in contrast to [229], we consider multiscale data assimilation methods with noisy observations with contributions from both the large and small scale parts of the signal, which is a more difficult problem than observing only the large scales because it requires accurate estimation of the statistical information of the small scales [75,76,230]. In addition, mixed observations of the large and small scale parts typically occur in real applications. For example, in geophysical fluid applications, observed quantities such as temperature, moisture and the velocity field necessarily mix both the large and small scale parts of the signal [20,231].
Thus, the multiscale filtering also provides a mathematical framework for representation errors, which are due to the contribution of unresolved scales [81,82] in the observations.

6.3.2. Particle Filters with Superparameterization

Superparameterization retains the large scale variables by resolving them on a coarse grid, while the effect of the small scales on the large scales is parameterized by approximating the small scales on local or reduced spaces. Stochastic superparameterization discussed in the previous section uses a Gaussian closure for the small scales $u'$ conditional to the large scale variable $\bar{u}$ with $\bar{u} \in \mathbb{R}^N$ [70,71,72,73]. Thus, we consider a multi-scale filtering algorithm with forecast prior distributions given by the conditional distribution
$$
p^f(u) = p^f(\bar{u}, u') = p^f(\bar{u})\, p^f_G(u' \mid \bar{u}),
$$
where $p^f_G(u' \mid \bar{u})$ is a Gaussian distribution conditional to $\bar{u}$,
$$
p^f_G(u' \mid \bar{u}) = \mathcal{N}\big(\hat{u}'(\bar{u}), R'(\bar{u})\big).
$$
Here, we assume that $N$ is sufficiently small that particle filters (see Chapter 15 of [20]) can be applied to the large scales. For the low dimensional variable $\bar{u}$, the marginal distribution of $\bar{u}$ can be approximated by $Q$ particles,
$$
p^f(\bar{u}) = \sum_{j=1}^{Q} p^f_j\, \delta(\bar{u} - \bar{u}_j),
$$
where $p^f_j \geq 0$ are particle weights such that $\sum_j p^f_j = 1$. After the forecast step, where superparameterization is applied to each particle member, we have the following general form for the prior distribution $p^f(u)$:
$$
p^f(u) = p^f(\bar{u}, u') = \sum_{j=1}^{Q} p^f_j\, \delta(\bar{u} - \bar{u}_j)\, p^f_G(u' \mid \bar{u}_j)
= \sum_{j=1}^{Q} p^f_j\, \delta(\bar{u} - \bar{u}_j)\, \mathcal{N}\big(\hat{u}'(\bar{u}_j)^f, R'(\bar{u}_j)^f\big), \tag{168}
$$
which is a conditional Gaussian mixture distribution, where each summand is a Gaussian distribution conditional to $\bar{u}_j$. Gaussian mixtures have already been used in data assimilation [232,233,234], but the multi-scale method developed here is different in that the conditional Gaussian distributions are applied in the reduced subspace $u'$ with particle approximations only in the lower dimensional subspace $\bar{u}$. Thus, the proposed multi-scale data assimilation method can be highly efficient and fast in comparison with conventional data assimilation methods, which use the whole space for the filter.
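To make the mixture structure concrete, the following sketch represents the prior (168) by $Q$ particles with per-particle Gaussian statistics for the small scales and draws joint samples from it. The names and shapes are illustrative and this is not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior(weights, u_bar, means, covs, n_samples):
    """Draw (u_bar, u_prime) samples from a conditional Gaussian mixture:
    pick a particle j with probability p_j, then draw the small scales
    from the Gaussian N(mu_j, R_j) conditional on that particle."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                         # enforce sum_j p_j = 1
    idx = rng.choice(len(p), size=n_samples, p=p)
    return [(u_bar[j], rng.multivariate_normal(means[j], covs[j])) for j in idx]
```

Because the large scale part is represented by a few particles while the small scales are handled analytically, sampling cost scales with the small number $Q$ rather than with the full state dimension.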
Assume that the prior distribution from the forecast is of the form (168) and that the observations have the following structure:
$$
v = G(\bar{u}, u') + \sigma^{\theta} = \bar{G}\,\bar{u} + G'(\bar{u})\,u' + \sigma^{\theta}, \tag{169}
$$
where $G'(\bar{u}_j)$ has rank $M$ and $\sigma^{\theta}$ is the observation noise error, which is Gaussian. Then, the posterior distribution in the analysis step, taking into account the observations (169), is of the form (168):
$$
p^a(u) = p^a(\bar{u}, u') = \sum_{j=1}^{Q} p^a_j\, \delta(\bar{u} - \bar{u}_j)\, \mathcal{N}\big(\hat{u}'(\bar{u}_j)^a, R'(\bar{u}_j)^a\big).
$$
The new mixture weights are
$$
p^a_j = \frac{p^f_j\, I_j}{\sum_{k=1}^{Q} p^f_k\, I_k},
$$
where $I_j = \int p(v \mid \bar{u}_j, u')\, p(u' \mid \bar{u}_j)\, \mathrm{d}u'$ and, for each particle $\bar{u}_j$, the posterior mean and variance of $u'$, namely $\hat{u}'(\bar{u}_j)^a$ and $R'(\bar{u}_j)^a$, respectively, are
$$
\hat{u}'(\bar{u}_j)^a = \hat{u}'^f + K\big(v - \bar{G}\,\bar{u}^f_j - G'(\bar{u}^f_j)\,\hat{u}'^f\big), \qquad
R'(\bar{u}_j)^a = \big(I - K\, G'(\bar{u}^f_j)\big)\, R'(\bar{u}_j)^f,
$$
where the Kalman gain matrix $K$ is given by
$$
K = R'^f\, G'(\bar{u}^f_j)^{T} \big(G'(\bar{u}^f_j)\, R'^f\, G'(\bar{u}^f_j)^{T} + r^{\theta}\big)^{-1}.
$$
See the supplementary material of [80] for more details.
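The analysis step above can be sketched as follows: each particle's conditional Gaussian receives a Kalman update, and the particle weights are rescaled by the Gaussian observation likelihoods $I_j$. For simplicity this sketch takes $G'$ independent of $\bar{u}_j$; all names are illustrative rather than the paper's code.

```python
import numpy as np

def analysis_step(v, weights, u_bar, means, covs, G_bar, G_prime, r_theta):
    """Analysis step for a conditional Gaussian mixture prior: per-particle
    Kalman update of the small-scale statistics plus reweighting of the
    particles by their Gaussian observation likelihoods."""
    Q = len(weights)
    new_means, new_covs = [], []
    likelihoods = np.empty(Q)
    for j in range(Q):
        mu, R = means[j], covs[j]
        S = G_prime @ R @ G_prime.T + r_theta     # innovation covariance
        K = R @ G_prime.T @ np.linalg.inv(S)      # Kalman gain
        d = v - G_bar @ u_bar[j] - G_prime @ mu   # innovation
        new_means.append(mu + K @ d)
        new_covs.append((np.eye(R.shape[0]) - K @ G_prime) @ R)
        # Gaussian likelihood I_j of the observation under particle j
        _, logdet = np.linalg.slogdet(2.0 * np.pi * S)
        likelihoods[j] = np.exp(-0.5 * (d @ np.linalg.solve(S, d) + logdet))
    w = np.asarray(weights, dtype=float) * likelihoods
    return w / w.sum(), new_means, new_covs
```

Each particle thus costs one low-dimensional Kalman update, and only the scalar weights couple the particles.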

6.3.3. Clustered Particle Filters and Multiscale Data Assimilation

Clustered particle filters (CPFs) [40,235] are a new class of particle filters, introduced for high-dimensional dynamical systems such as geophysical systems. The clustered particle filters use relatively few particles compared with the standard particle filter and capture the non-Gaussian features of the true signal, which are typical in complex nonlinear systems. The method is also robust for the difficult regime of high-quality sparse and infrequent observations and does not show any filter divergence in our tests. In the clustered particle filter, coarse-grained localization is implemented through the clustering of state variables and particles are adjusted to stabilize the filter.
One of the key features of the CPF is particle adjustment. Particle adjustment moves the prior particles closer to the observation, instead of reweighing the particles, when the prior is too far from the observation likelihood. When observations are sparse, unobserved adjacent state variables must have the same particle weights as the observed variable since they are updated using cross-correlations. For this purpose, the CPF partitions the state variables into nonoverlapping clusters $\{C_l,\ l = 1, 2, \dots, N_{obs}\}$, where each cluster boundary is chosen as the midpoint of two adjacent observations, which is easily applicable to irregularly spaced observations. This yields $N_{obs}$ clusters corresponding to the $N_{obs}$ observation locations. Instead of using different weights for each state variable as in the localized particle filter, the CPF uses scalar particle weights $\{\omega_{l,k}\}$ for the state variables in the same cluster $C_l$. For the substate vector $x_{C_l} = \{x_i \mid x_i \in C_l\}$ corresponding to cluster $C_l$, the CPF considers the marginalized PDF
$$
p(x_{C_l}) = \sum_{k=1}^{K} \omega_{l,k}\, \delta\big(x_{C_l} - x_{C_l,k}\big),
$$
and each observation $y_j$ updates only the marginalized PDF of the corresponding cluster, which implements coarse-grained localization. Thus, the assimilation of the full state vector is decomposed into $N_{obs}$ independent assimilation problems for each cluster, of a dimension smaller than the full state dimension $N_{state}$. Note that, in contrast to localization using a smoothly varying correlation function with a localization radius parameter, the CPF has no adjustable parameter to tune the localization. See [235] for more details.
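The clustering rule described above (boundaries at midpoints of adjacent observation locations) can be sketched in a few lines; the function name and the use of one-dimensional positions are illustrative assumptions.

```python
import numpy as np

def build_clusters(state_positions, obs_positions):
    """Assign each state variable to the cluster of its nearest observation,
    with cluster boundaries at midpoints of adjacent observation locations.
    Returns the cluster index of each state variable; the number of clusters
    equals the number of observations, and irregular spacing is handled."""
    obs = np.sort(np.asarray(obs_positions, dtype=float))
    bounds = 0.5 * (obs[:-1] + obs[1:])     # midpoints between adjacent observations
    # cluster index = number of boundaries to the left of the state position
    return np.searchsorted(bounds, np.asarray(state_positions, dtype=float),
                           side="right")
```

Each cluster then carries its own scalar weights, so the full assimilation factorizes into independent low-dimensional problems.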
The CPF can also be applied to multiscale particle filtering [40,235]. As a test example, we show below the skill of the multiscale clustered particle filter for the wave turbulence model introduced by Majda, McLaughlin and Tabak (MMT) [236,237] as a computationally tractable model of wave turbulence. The model is described by the following one-dimensional partial differential equation for a complex scalar $\psi$:
$$
i\,\partial_t \psi = |\partial_x|^{1/2}\psi - |\psi|^2\psi + iF + iD\psi,
$$
in a periodic domain of length $L$, with large-scale forcing set to $F = 0.0163\,\sin(4\pi x/L)$ and dissipation $D$ for both the large and small scales. The model has several features of wave turbulence that make it a difficult test problem for data assimilation. It has a shallow energy spectrum proportional to $k^{-5/6}$ for wavenumber $k$ and an inverse cascade of energy from small to large scales. It also has non-Gaussian extreme event statistics caused by intermittent instability and the breaking of solitons. Because the unresolved small scales carry more than two-thirds of the total variance, it is a difficult filtering problem to estimate the resolved large scales using mixed observations of the large- and small-scale components.
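The linear dispersive term $|\partial_x|^{1/2}\psi$ of the MMT model is most naturally applied pseudo-spectrally, since its Fourier symbol is $|k|^{1/2}$. A minimal sketch follows; the grid and normalization choices are illustrative assumptions, and the forcing and dissipation terms are omitted.

```python
import numpy as np

def dispersion_half_derivative(psi, L):
    """Apply the MMT linear operator |d/dx|^{1/2} pseudo-spectrally on a
    periodic domain of length L: multiply each Fourier mode by |k|^{1/2}."""
    n = psi.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # physical wavenumbers
    return np.fft.ifft(np.sqrt(np.abs(k)) * np.fft.fft(psi))
```

On a plane wave $e^{ik_0 x}$ the operator simply multiplies by $\sqrt{|k_0|}$, which gives a quick sanity check of the implementation.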
Here, we compare the filtering results of the ensemble-based multiscale data assimilation method [75] and the multiscale CPF for the MMT model. As the forecast model for both filtering methods, we use the stochastic superparameterization multiscale method [73] discussed in the previous subsection. The forecast model uses only 128 grid points, whereas the full resolution uses 8192 grid points, which makes the forecast about 250 times cheaper computationally. Because the forecast model has a low computational cost compared with the full-resolution model, it has significant model errors. Observations of the full-scale variables are available at 64 uniformly distributed grid points (extremely sparse compared with the full-resolution 8192 grid points) with an observation error variance corresponding to $3\%$ of the total climatological variance at every time interval of $0.25$. The ensemble-based method uses the tuned parameters in [75] (i.e., a short localization radius of 1 and $2\%$ covariance inflation). For the hard-threshold version of the CPF, the particle adjustment is triggered if either the real or the imaginary part is not in the convex hull of the corresponding predicted observations, as we observe both parts of the true signal. Both the particle and ensemble-based methods use 129 samples.
The time series of the large-scale estimation RMSE of the ensemble-based filter and the clustered multiscale particle filter are shown in Figure 37. The dashed and dotted line is the effective observation error $0.34$, which is defined as $\sqrt{\text{observation error variance} + \text{small-scale variance}}$ by treating the small-scale contribution as an additional error (i.e., a representation error). The dashed line is the climatological error $0.20$, which is the standard deviation of the large-scale variables. The ensemble-based method has an RMSE smaller than the effective observation error but larger than the climatological error. The CPF, however, shows skillful filter performance with an RMSE smaller than the climatological error. The forecast PDFs and forecast error PDFs, which show the prediction skill of the methods, are also shown in Figure 37. The CPF has a better forecast PDF fit to the true signal and a narrower peak in the forecast error PDF than the ensemble-based method.
Figure 37. (a,b) time series of the large-scale estimation RMSE of the ensemble-based multiscale data assimilation method [75] and the CPF with 64 observations for the MMT model. The dashed line is the climatological error 0.20 . The dashed and dotted line is the effective observation error 0.34 ; (c,d) large-scale forecast PDF and forecast error PDF using 64 observations.
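The effective observation error quoted above treats the unresolved small-scale variance as an additional representation error on top of the observation noise variance; as a one-line sketch (hypothetical function name):

```python
import numpy as np

def effective_obs_error(obs_error_var, small_scale_var):
    """Effective observation error when unresolved small scales are treated
    as an additional (representation) error: the square root of the summed
    variances."""
    return float(np.sqrt(obs_error_var + small_scale_var))
```

This is the natural benchmark for a multiscale filter: beating it means the filter extracts more large-scale information than a naive use of the mixed observations would allow.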

6.3.4. Blended Particle Methods with Adaptive Subspaces for Filtering Turbulent Dynamical Systems

In the multi-scale data assimilation algorithms discussed above based on superparameterization, the subspace of particles defined by $\bar{u}$ is fixed. An attractive idea is to change the subspace of particles adaptively in time to capture the non-Gaussian features as they evolve. Very accurate multi-scale filtering algorithms utilizing this adaptive strategy have been developed [80,238]. Nonlinear statistical forecast models, such as the modified quasilinear Gaussian models [17,239], are implemented in the adaptive algorithm. In particular, the paper [238] also contains many detailed numerical experiments and interesting counterexamples to more naive strategies for multi-scale data assimilation.

6.3.5. Extremely Efficient Multi-Scale Filtering Algorithms: SPEKF and Dynamic Stochastic Superresolution (DSS)

The SPEKF models discussed in Section 4.5.1 are a class of nonlinear filters with exact statistical equations for the mean and covariance of nonlinear forecast models that learn hidden parameters "on the fly" from the observed data. The parameters represent adaptive additive and multiplicative bias corrections for model error. These filters explicitly introduce judicious model errors and utilize the conditional Gaussian structure developed in Section 4.5.1 above. The book [20] contains many examples and successful applications of this method.
Dynamic Stochastic Superresolution (DSS) uses the same idea but, in addition, exploits the aliased information in the observations to super-resolve a multi-scale turbulent signal [103,240]. Nontrivial applications of DSS include recovering geophysical turbulence from surface satellite observations [240] and filtering "black swans" and dispersive wave turbulence [103] with severe judicious model errors. An interesting mathematical problem is to understand the reasons for the skill of these radical methods. Recent progress in the conceptual understanding of these methods for the two-dimensional Navier–Stokes equations can be found in [104].

7. Conclusions

Multiscale nonlinear dynamical systems are ubiquitous in different areas, including geoscience, engineering, neural science and material science. In this article, a conditional Gaussian framework is developed and applied to the prediction, the state estimation and the uncertainty quantification of multiscale nonlinear stochastic systems. Despite the conditional Gaussianity, such systems are nevertheless highly nonlinear and are able to capture the non-Gaussian features of nature. The special structure of the system allows closed analytical formulae for solving the conditional statistics and is thus computationally efficient.
In Section 2, an overview of data, models and the data-driven modeling framework is presented. Data and models are combined with each other to improve the understanding of nature and to improve filtering and prediction skill. However, solving high-dimensional complex multiscale nonlinear dynamical systems directly is computationally unaffordable and sometimes even unnecessary. Therefore, cheap and effective approaches are required to efficiently solve such systems. Hybrid strategies can greatly reduce the computational cost while preserving the key features of the complex systems. In the hybrid strategies, particle methods are combined with analytically solvable conditional Gaussian statistics to deal with the highly non-Gaussian characteristics in a relatively low dimensional subspace and the conditional Gaussian features in the remaining subspace, respectively. This indicates the importance of a systematic study of conditional Gaussian systems. Section 3 summarizes the general mathematical structure of nonlinear conditional Gaussian systems, the physics-constrained nonlinear stochastic models and the application of the MTV strategy to conditional Gaussian systems. To show the wide application of the conditional Gaussian framework, a rich gallery of examples of conditional Gaussian systems is illustrated in Section 4, which includes data-driven physics-constrained nonlinear stochastic models, stochastically coupled reaction–diffusion models in neuroscience and ecology, large-scale dynamical models in turbulence, fluids and geophysical flows, and other models for filtering and predicting complex multiscale turbulent dynamical systems. Section 5 presents the effective statistically accurate algorithms that beat the curse of dimension for the Fokker–Planck equation for conditional Gaussian systems.
A hybrid strategy is developed where a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. For even larger dimensional systems, a judicious block decomposition and statistical symmetry are further applied that facilitate an extremely efficient parallel computation and a significant reduction of sample numbers. These algorithms are applied to the statistical prediction of a stochastically coupled FHN model with 1500 dimensions and an inhomogeneous two-layer Lorenz 96 model with 240 dimensions. Significant prediction skill shows the advantages of these algorithms in terms of both accuracy and efficiency. In Section 6, the conditional Gaussian framework is applied to develop extremely cheap multiscale data assimilation schemes, such as the stochastic superparameterization, which use particle filters to capture the non-Gaussian statistics on the large-scale part whose dimension is small whereas the statistics of the small-scale part are conditional Gaussian given the large-scale part. Other topics of the conditional Gaussian systems studied here include designing new parameter estimation schemes and understanding model errors using information theory.
The conditional Gaussian framework can also be used to study many other important topics. For example, the closed analytic formulae for the conditional statistics provide an efficient way to understand the causality between different processes in light of information theory. The representation error is another important issue that requires a comprehensive study, and the conditional Gaussian framework has great potential to provide both theoretical and applied insights. In addition, model selection, model reduction and further studies of parameter estimation are all important directions for future work within the conditional Gaussian framework.

Author Contributions

Conceptualization, N.C. and A.J.M.; Data curation, N.C. and A.J.M.; Formal analysis, N.C. and A.J.M.; Funding acquisition, A.J.M.; Methodology, N.C. and A.J.M.; Visualization, N.C.; Writing—Original draft, N.C.; Writing—Review and editing, A.J.M.

Funding

The research of A.J.M. is partially supported by the Office of Naval Research (ONR) Multidisciplinary University Research Initiative (MURI) Grant N0001416-1-2161 and the New York University Abu Dhabi Research Institute. N.C. is supported as a postdoctoral fellow through A.J.M.’s ONR MURI Grant.

Acknowledgments

The authors thank Yoonsang Lee, Di Qi and Sulian Thual for useful discussion.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

References

  1. Majda, A.J. Introduction to Turbulent Dynamical Systems in Complex Systems; Springer: New York, NY, USA, 2016. [Google Scholar]
  2. Majda, A.; Wang, X. Nonlinear Dynamics and Statistical Theories for Basic Geophysical Flows; Cambridge University Press: Cambridge, UK, 2006. [Google Scholar]
  3. Strogatz, S.H. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
  4. Baleanu, D.; Machado, J.A.T.; Luo, A.C. Fractional Dynamics and Control; Springer: New York, NY, USA, 2011. [Google Scholar]
  5. Deisboeck, T.; Kresh, J.Y. Complex Systems Science in Biomedicine; Springer: Boston, MA, USA, 2007. [Google Scholar]
  6. Stelling, J.; Kremling, A.; Ginkel, M.; Bettenbrock, K.; Gilles, E. Foundations of Systems Biology; MIT Press: Cambridge, MA, USA, 2001. [Google Scholar]
  7. Sheard, S.A.; Mostashari, A. Principles of complex systems for systems engineering. Syst. Eng. 2009, 12, 295–311. [Google Scholar] [CrossRef]
  8. Wilcox, D.C. Multiscale model for turbulent flows. AIAA J. 1988, 26, 1311–1320. [Google Scholar] [CrossRef]
  9. Mohamad, M.A.; Sapsis, T.P. Probabilistic description of extreme events in intermittently unstable dynamical systems excited by correlated stochastic processes. SIAM/ASA J. Uncertain. Quantif. 2015, 3, 709–736. [Google Scholar] [CrossRef]
  10. Sornette, D. Probability distributions in complex systems. In Encyclopedia of Complexity and Systems Science; Springer: New York, NY, USA, 2009; pp. 7009–7024. [Google Scholar]
  11. Majda, A. Introduction to PDEs and Waves for the Atmosphere and Ocean; American Mathematical Society: Providence, RI, USA, 2003; Volume 9. [Google Scholar]
  12. Wiggins, S. Introduction to Applied Nonlinear Dynamical Systems and Chaos; Springer: New York, NY, USA, 2003; Volume 2. [Google Scholar]
  13. Vespignani, A. Predicting the behavior of techno-social systems. Science 2009, 325, 425–428. [Google Scholar] [CrossRef] [PubMed]
  14. Latif, M.; Anderson, D.; Barnett, T.; Cane, M.; Kleeman, R.; Leetmaa, A.; O’Brien, J.; Rosati, A.; Schneider, E. A review of the predictability and prediction of ENSO. J. Geophys. Res. Oceans 1998, 103, 14375–14393. [Google Scholar] [CrossRef]
  15. Kalnay, E. Atmospheric Modeling, Data Assimilation and Predictability; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  16. Majda, A.J.; Branicki, M. Lessons in uncertainty quantification for turbulent dynamical systems. Discret. Contin. Dyn. Syst. A 2012, 32, 3133–3221. [Google Scholar]
  17. Sapsis, T.P.; Majda, A.J. Statistically accurate low-order models for uncertainty quantification in turbulent dynamical systems. Proc. Natl. Acad. Sci. USA 2013, 110, 13705–13710. [Google Scholar] [CrossRef] [PubMed]
  18. Mignolet, M.P.; Soize, C. Stochastic reduced order models for uncertain geometrically nonlinear dynamical systems. Comput. Methods Appl. Mech. Eng. 2008, 197, 3951–3963. [Google Scholar] [CrossRef]
  19. Lahoz, W.; Khattatov, B.; Ménard, R. Data assimilation and information. In Data Assimilation; Springer: New York, NY, USA, 2010; pp. 3–12. [Google Scholar]
  20. Majda, A.J.; Harlim, J. Filtering Complex Turbulent Systems; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
  21. Evensen, G. Data Assimilation: The Ensemble Kalman Filter; Springer: New York, NY, USA, 2009. [Google Scholar]
  22. Law, K.; Stuart, A.; Zygalakis, K. Data Assimilation: A Mathematical Introduction; Springer: New York, NY, USA, 2015; Volume 62. [Google Scholar]
  23. Palmer, T.N. A nonlinear dynamical perspective on model error: A proposal for non-local stochastic-dynamic parametrization in weather and climate prediction models. Q. J. R. Meteorol. Soc. 2001, 127, 279–304. [Google Scholar] [CrossRef]
  24. Orrell, D.; Smith, L.; Barkmeijer, J.; Palmer, T. Model error in weather forecasting. Nonlinear Process. Geophys. 2001, 8, 357–371. [Google Scholar] [CrossRef]
  25. Hu, X.M.; Zhang, F.; Nielsen-Gammon, J.W. Ensemble-based simultaneous state and parameter estimation for treatment of mesoscale model error: A real-data study. Geophys. Res. Lett. 2010, 37, L08802. [Google Scholar] [CrossRef]
  26. Benner, P.; Gugercin, S.; Willcox, K. A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Rev. 2015, 57, 483–531. [Google Scholar] [CrossRef]
  27. Majda, A.J. Challenges in climate science and contemporary applied mathematics. Commun. Pure Appl. Math. 2012, 65, 920–948. [Google Scholar] [CrossRef]
  28. Chen, N.; Majda, A.J. Filtering nonlinear turbulent dynamical systems through conditional Gaussian statistics. Mon. Weather Rev. 2016, 144, 4885–4917. [Google Scholar] [CrossRef]
  29. Olbers, D. A gallery of simple models from climate physics. In Stochastic Climate Models; Springer: Berlin, Germany, 2001; pp. 3–63. [Google Scholar]
  30. Liptser, R.S.; Shiryaev, A.N. Statistics of Random Processes II: Applications. In Applied Mathematics; Springer: Berlin/Heidelberg, Germany, 2001; Volume 6. [Google Scholar]
  31. Majda, A.J.; Harlim, J. Physics constrained nonlinear regression models for time series. Nonlinearity 2012, 26, 201. [Google Scholar] [CrossRef]
  32. Harlim, J.; Mahdi, A.; Majda, A.J. An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models. J. Comput. Phys. 2014, 257, 782–812. [Google Scholar] [CrossRef]
  33. Majda, A.J.; Yuan, Y. Fundamental limitations of ad hoc linear and quadratic multi-level regression models for physical systems. Discret. Contin. Dyn. Syst. B 2012, 17, 1333–1363. [Google Scholar] [CrossRef]
  34. Lorenz, E.N. Deterministic nonperiodic flow. J. Atmos. Sci. 1963, 20, 130–141. [Google Scholar] [CrossRef]
  35. Lorenz, E.N. Formulation of a low-order model of a moist general circulation. J. Atmos. Sci. 1984, 41, 1933–1945. [Google Scholar] [CrossRef]
  36. Chen, N.; Majda, A.J. Beating the curse of dimension with accurate statistics for the Fokker–Planck equation in complex turbulent systems. Proc. Natl. Acad. Sci. USA 2017, 114, 12864–12869. [Google Scholar] [CrossRef] [PubMed]
  37. Majda, A.J.; Lee, Y. Conceptual dynamical models for turbulence. Proc. Natl. Acad. Sci. USA 2014, 111, 6548–6553. [Google Scholar] [CrossRef] [PubMed]
  38. Chen, N.; Majda, A.J. Efficient statistically accurate algorithms for the Fokker–Planck equation in large dimensions. J. Comput. Phys. 2018, 354, 242–268. [Google Scholar] [CrossRef]
  39. Ferrari, R.; Cessi, P. Seasonal synchronization in a chaotic ocean–atmosphere model. J. Clim. 2003, 16, 875–881. [Google Scholar] [CrossRef]