Abstract
We present a class of efficient parametric closure models for 1D stochastic Burgers equations. Casting closure modeling as statistical learning of the flow map, we derive the parametric form by representing the unresolved high wavenumber Fourier modes as functionals of the resolved variable’s trajectory. The reduced models are nonlinear autoregression (NAR) time series models, with coefficients estimated from data by least squares. The NAR models can accurately reproduce the energy spectrum, the invariant densities, and the autocorrelations. Taking advantage of the simplicity of the NAR models, we investigate maximal space-time reduction. Reduction in space dimension is unlimited, and NAR models with two Fourier modes can perform well. The NAR model’s stability limits time reduction, with a maximal time step smaller than that of the K-mode Galerkin system. We report a potential criterion for optimal space-time reduction: the NAR models achieve minimal relative error in the energy spectrum at the time step where the K-mode Galerkin system’s mean Courant–Friedrichs–Lewy (CFL) number agrees with that of the full model.
1. Introduction
Closure modeling aims for computationally efficient reduced models for tasks requiring repeated simulations, such as Bayesian uncertainty quantification [1,2] and data assimilation [3,4]. Consisting of low-dimensional resolved variables, the closure model must take into account the non-negligible effects of the unresolved variables, so as to capture both the short-time dynamics and the large-time statistics. As suggested by the Mori–Zwanzig formalism [5,6,7], trajectory-wise approximation is no longer appropriate; rather, the approximation must be made in a statistical sense. That is, the reduced model aims to generate a process that approximates the target process in distribution, or at least reproduces the key statistics and dynamics of the quantities of interest. For general nonlinear systems, such a reduced closure model is beyond the reach of direct derivation from first principles.
Data-driven approaches, which are based on statistical learning methods, provide useful and practical tools for model reduction. The past decades have witnessed revolutionary developments of data-driven strategies, ranging from parametric models (see, e.g., [8,9,10,11,12,13,14] and the references therein) to nonparametric and machine learning methods (see, e.g., [15,16,17,18]). These developments demand a systematic understanding of model reduction from the perspectives of dynamical systems (see, e.g., [7,19,20]), numerical approximation [21,22], and statistical learning [17,23].
With the 1D stochastic Burgers equation as a prototype model, we aim to further the understanding of model reduction from an interpretable statistical inference perspective. More specifically, we consider a stochastic Burgers equation with a periodic solution on $[0, 2\pi]$:
$$ \partial_t u = \nu\, \partial_{xx} u - \tfrac{1}{2}\, \partial_x (u^2) + f(x,t), \qquad x \in [0, 2\pi],\ t > 0, \tag{1} $$
from an initial condition $u(x,0) = u_0(x)$. We consider a stochastic force that is smooth in space, residing on low wavenumber Fourier modes, and white in time, given by
$$ f(x,t) = \sigma \sum_{1 \le |k| \le K_f} \dot{W}_k(t)\, e^{ikx}, \tag{2} $$
where $\{W_k\}$ are independent Brownian motions with $W_{-k} = \overline{W_k}$, so that $f$ is real-valued. Here $\nu > 0$ is the viscosity constant and $\sigma$ represents the strength of the stochastic force.
Our goal is to find a discrete-time closure model for the first K Fourier modes, so as to efficiently reproduce the energy spectrum and other statistics of these modes.
We present a class of efficient parametric reduced closure models for 1D stochastic Burgers equations. The key idea is to approximate the discrete-in-time flow map statistically, in particular, to represent the unresolved high wavenumber Fourier modes as functionals of the resolved variable’s trajectory. The reduced models are nonlinear autoregression (NAR) time series models, with coefficients estimated from data simply by least squares. We test the NAR models in four settings: reduction of deterministic responses ($K \ge K_f$) vs. reduction involving unresolved stochastic force ($K < K_f$), and small vs. large scales of stochastic force, where $K$ is the number of Fourier modes of the reduced model, $K_f$ is the number of Fourier modes of the white-in-time stochastic force, and $\sigma$ is the scale of the force. In all these settings, the NAR models can accurately reproduce the energy spectrum, invariant densities, and autocorrelation functions (ACF). We also discuss model selection, consistency of estimators, and memory length of the reduced models.
Taking advantage of our NAR models’ simplicity, we further investigate a critical issue in the model reduction of (stochastic) partial differential equations: maximal space-time reduction. The space dimension can be reduced arbitrarily in our parametric inference approach: NAR models with two Fourier modes perform well. The time reduction is another story. The maximal time step is limited by the NAR model’s stability and is smaller than that of the K-mode Galerkin system. Numerical tests indicate that the NAR models achieve the minimal relative error at the time step where the K-mode Galerkin system’s mean CFL (Courant–Friedrichs–Lewy) number agrees with the full model’s, suggesting a potential criterion for optimal space-time reduction.
One can readily extend our parametric closure modeling strategy to general nonlinear dissipative systems beyond quadratic nonlinearities. Along with [14], we may view it as a parametric inference extension of the nonlinear Galerkin methods [24,25,26,27]. However, it does not require the existence of an inertial manifold (and the stochastic Burgers equation does not satisfy the spectral gap condition that is sufficient for the existence of an inertial manifold [28]), and it applies to resolved variables of any dimension (e.g., lower than the dimension of the inertial manifold if it exists [14]). Notably, one may use NAR models that are linear in parameters and estimate them by least squares. Therefore, the algorithm is computationally efficient and is scalable for large systems.
The limitation of the parametric modeling approach is its reliance on the derivation of a parametric form using the Picard iteration, which depends on the nonlinearity of the unresolved variables (see Section 3.1). When the nonlinearity is complicated, a linear-in-parameter ansatz may be out of reach. One can overcome this limitation by nonparametric techniques [23,29] and machine learning methods (see, e.g., [16,17,30]).
The stochastic Burgers equation is a prototype model for developing closure modeling techniques for turbulence (see, e.g., [31,32,33,34,35,36,37]). In particular, Dolaptchiev et al. [37] propose a closure model for the stochastic Burgers equation in a similar setting, based on local averages of a finite-difference discretization, which reproduces the energy spectrum with accuracy similar to this study. We directly construct a simple yet effective NAR model for the Fourier modes, providing the ground for a statistical inference examination of model reduction.
We note that the closure reduced models based on parametric inference are different from the widely studied proper orthogonal decomposition (POD)-based reduced order models (ROM) for parametric full models [38,39]. These POD-ROMs seek new effective bases to capture the effective dynamics by a linear system for the whole family of parametric full models. The inference-based closure models focus on nonlinear dynamics in a given basis and aim to capture both short-time dynamics and large-time statistics. From a probabilistic perspective, both approaches approximate the target stochastic process: the POD-ROMs are based on the Karhunen–Loève expansion, while the inference-based closure models aim to learn the nonlinear flow map. One may potentially combine the two and find nonlinear closure models for the nonlinear dynamics in the POD basis.
The exposition of our study proceeds as follows. We first summarize the notations in Table 1. Following a brief review of the basic properties of the stochastic Burgers equation and its numerical integration, we introduce in Section 2 the inference approach to closure modeling and compare it with the nonlinear Galerkin methods. Section 3 presents the inference of NAR models: derivation of the parametric form, parameter estimation, and model selection. Examining the NAR models’ performance in four settings in Section 4, we investigate the space-time reduction. Section 5 summarizes our main findings and discusses possible future research.
2. Space-Time Reduction for the Stochastic Burgers Equation
In this section, we first review basic properties of the stochastic Burgers equation and its numerical integration. Then, we introduce inference-based model reduction and compare it with the nonlinear Galerkin methods.
2.1. The Stochastic Burgers Equation
A Fourier transform of Equation (1) leads to
$$ \frac{d}{dt}\hat u_k = -\nu k^2 \hat u_k - \frac{ik}{2} \sum_{l+m=k} \hat u_l\, \hat u_m + \hat f_k, \tag{3} $$
with $k \in \mathbb{Z}$, where $\{\hat u_k\}$ are the Fourier modes:
$$ \hat u_k(t) = \frac{1}{2\pi} \int_0^{2\pi} u(x,t)\, e^{-ikx}\, dx, \qquad \hat f_k = \sigma\, \dot W_k\, \mathbf{1}_{\{1 \le |k| \le K_f\}}. $$
The system has the following properties. First, it is Galilean invariant: if $u(x,t)$ is a solution, then $u^c(x,t) := u(x - ct, t) + c$, with $c$ an arbitrary constant speed, is also a solution (with the force shifted accordingly, which leaves its distribution unchanged). To see this, let $\tilde u(x,t) = u(x - ct, t) + c$. Then, $\partial_t \tilde u = (\partial_t u - c\, \partial_x u)(x - ct, t)$, $\partial_x \tilde u = \partial_x u(x - ct, t)$, and
$$ \partial_t \tilde u + \tilde u\, \partial_x \tilde u - \nu\, \partial_{xx} \tilde u = \big( \partial_t u + u\, \partial_x u - \nu\, \partial_{xx} u \big)(x - ct, t) = f(x - ct, t). $$
Without loss of generality, we use the Galilean invariance to set the spatial mean of the initial condition to zero; this implies that $\hat u_0(0) = 0$. In this study, we only consider forces with mean zero, i.e., $\hat f_0 \equiv 0$; therefore, from Equation (3), we see that $\frac{d}{dt}\hat u_0 = 0$, or equivalently, $\hat u_0(t) \equiv \hat u_0(0) = 0$. Second, the system has an invariant measure [31,40,41], due to a balance between the diffusion term, which dissipates energy, and the stochastic force, which injects energy. In particular, the initial condition does not affect the large-time statistical properties of the solution. Third, since $u$ is real, the Fourier modes satisfy $\hat u_{-k} = \overline{\hat u_k}$, where $\overline{\hat u_k}$ is the complex conjugate of $\hat u_k$.
2.2. Galerkin Spectral Method
We consider the Galerkin spectral method for numerical solutions of the Burgers equation. The system is approximated as follows: the function $u(\cdot, t)$ is represented at the grid points $x_j = jh$, with $h = 2\pi/N$ and $j = 0, \dots, N-1$. The Fourier transform is replaced by the discrete Fourier transform
$$ \hat u_k(t) = \frac{1}{N} \sum_{j=0}^{N-1} u(x_j, t)\, e^{-ikx_j}, \qquad |k| \le N/2. $$
For simplicity of notation, we abuse the notation $u$ so that it denotes either the true solution or its high-resolution $N$-mode approximation. Since $u$ is real, we have $\hat u_{-k} = \overline{\hat u_k}$. Noticing further that $\hat u_0 \equiv 0$ due to Galilean invariance, and setting $\hat u_k = 0$ for $|k| > N/2$, we obtain a truncated system
$$ d\hat u_k = \Big( -\nu k^2 \hat u_k - \frac{ik}{2} \sum_{\substack{l+m=k \\ |l|, |m| \le N/2}} \hat u_l\, \hat u_m \Big)\, dt + \sigma\, \mathbf{1}_{\{k \le K_f\}}\, dW_k, \qquad 1 \le k \le N/2. \tag{4} $$
We solve Equation (4) using the exponential time differencing fourth-order Runge–Kutta method (ETDRK4) (see [42,43]) with standard zero-padding for dealiasing (see, e.g., [44]), with the force term treated as a constant in each time step. Such a mixed scheme is of strong order 1, but it has the advantage of preserving both the numerical stability of ETDRK4 and the simplicity of Euler–Maruyama.
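To make the time-stepping concrete, the following sketch integrates the Fourier-mode system (4) with the force held constant over each step. It is a simplified exponential Euler–Maruyama scheme rather than full ETDRK4 (whose coefficients follow [42,43]); all function and variable names are illustrative, and the convention for the complex noise increments is an assumption.

```python
import numpy as np

def burgers_nonlinear(u_hat):
    """Nonlinear term -(1/2) d_x(u^2) in Fourier space, with 2/3-rule dealiasing."""
    N = u_hat.size
    k = np.fft.fftfreq(N, d=1.0 / N)              # integer wavenumbers
    u = np.fft.ifft(u_hat)                        # physical-space field
    u2_hat = np.fft.fft(u * u)
    u2_hat[np.abs(k) > N // 3] = 0.0              # dealias the quadratic term
    return -0.5j * k * u2_hat

def step_exp_euler_maruyama(u_hat, dt, nu, sigma, Kf, rng):
    """One step with exact diffusion (integrating factor exp(-nu k^2 dt)) and
    the stochastic force treated as constant within the step; only first
    order in the deterministic part, unlike the ETDRK4 scheme used above."""
    N = u_hat.size
    k = np.fft.fftfreq(N, d=1.0 / N)
    E = np.exp(-nu * k**2 * dt)
    dW = (rng.standard_normal(Kf) + 1j * rng.standard_normal(Kf)) * np.sqrt(dt / 2)
    f_hat = np.zeros(N, dtype=complex)
    f_hat[1:Kf + 1] = sigma * dW                  # force on modes k = 1..Kf
    f_hat[-Kf:] = np.conj(f_hat[1:Kf + 1][::-1])  # conjugate modes keep u real
    return E * (u_hat + dt * burgers_nonlinear(u_hat) + f_hat)
```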
We will consider a relatively small viscosity $\nu$, so that random shocks are about to emerge in the solution. In general, a smaller viscosity constant demands a higher resolution in space-time to resolve the solution, particularly the emerging shocks as $\nu$ vanishes. To sufficiently resolve the solution, we use the space-time resolution specified in Table 3. The solution is accurately resolved, with mean Courant–Friedrichs–Lewy (CFL) numbers being 0.139 and 0.045 for the two scales of the stochastic force, respectively. Here the mean CFL number is computed as the average along a trajectory of $L$ steps,
$$ \overline{\mathrm{CFL}} = \frac{1}{L} \sum_{n=1}^{L} \frac{\max_x |u(x, t_n)|\, \Delta t}{\Delta x}, $$
where $\Delta t$ and $\Delta x$ are the time step and space step, respectively. Furthermore, numerical tests show that the marginal densities converge as the trajectory length increases.
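For instance, with the solution stored on a space-time grid, the mean CFL number above can be computed as follows (a minimal sketch; the array layout is an assumption):

```python
import numpy as np

def mean_cfl(u_traj, dt, dx):
    """Mean CFL number: time average of max_x |u(x, t_n)| * dt / dx.
    u_traj has shape (n_steps, n_gridpoints), in physical space."""
    return np.mean(np.max(np.abs(u_traj), axis=1)) * dt / dx
```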
2.3. Nonlinear Galerkin and Inferential Model Reduction
For simplicity of notation, we write the Burgers equation in an operator form as
$$ \partial_t u = Au + B(u) + f, \tag{5} $$
with a linear operator $A$ and a nonlinear operator $B$:
$$ Au = \nu\, \partial_{xx} u, \qquad B(u) = -\tfrac{1}{2}\, \partial_x (u^2). $$
We first decompose the Fourier modes of $u$ into resolved and unresolved variables. Recall that our goal of model reduction is to derive a closed system that can faithfully describe the dynamics of the coefficients $\{\hat u_k, 1 \le |k| \le K\}$, or equivalently, the low-dimensional process $v$ below.
Denote by $P$ the projection operator onto the Fourier modes with wavenumber $|k| \le K$, and let $Q = I - P$ (for simplicity of notation, we will also denote them as projections on the corresponding vector spaces of Fourier modes). With $v = Pu$ and $w = Qu$, we can write the system (5) as
$$ \partial_t v = Av + PB(v + w) + Pf, \tag{6} $$
$$ \partial_t w = Aw + QB(v + w) + Qf. \tag{7} $$
To find a closed system for $v$, we quantify the truncation error $PB(v+w) - PB(v)$ in (6), which represents the nonlinear interaction between the low and high wavenumber modes, by either a function of $v$ or a functional of the trajectory of $v$. In particular, in the nonlinear Galerkin methods based on inertial manifold theory (see, e.g., [24,25,26,27]), one aims to represent the high modes $w$ as a function of the low modes $v$ (and hence obtain an approximate inertial manifold). In the simplest implementation, one neglects the time derivative in Equation (7) and solves for $w$ from
$$ 0 = Aw + QB(v + w) $$
by fixed point iterations: $w^{(j+1)} = -A^{-1} QB(v + w^{(j)})$. This leads to an approximation of $w$ as a function of $v$, which exists if $K$ is large enough and if the system satisfies a gap condition (so that an inertial manifold exists). However, among the many dissipative systems with a global attractor, only a few have been proven to satisfy the gap condition (see [28] for a recent review). More importantly, we cannot always expect $K$ to be larger than the dimension of an inertial manifold, which is unknown in general. Therefore, such a nonlinear Galerkin approach works neither for a system without an inertial manifold nor for a $K$ smaller than the dimension of the inertial manifold.
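For reference, the fixed-point construction above can be sketched in a few lines for the Fourier discretization of the Burgers equation; the iteration count and the names are illustrative, and the stochastic force is ignored, as in the deterministic nonlinear Galerkin setting.

```python
import numpy as np

def approximate_inertial_manifold(v_hat, K, nu, n_iter=3):
    """Fixed-point iteration w = -A^{-1} Q B(v + w), the simplest approximate
    inertial manifold (time derivative of w neglected). v_hat is a full-length
    spectral array supported on the resolved modes |k| <= K."""
    N = v_hat.size
    k = np.fft.fftfreq(N, d=1.0 / N)
    high = np.abs(k) > K                          # Q-projection: modes |k| > K
    w_hat = np.zeros(N, dtype=complex)
    for _ in range(n_iter):
        u = np.fft.ifft(v_hat + w_hat)
        B_hat = -0.5j * k * np.fft.fft(u * u)     # B(u) = -(1/2) d_x(u^2)
        w_next = np.zeros(N, dtype=complex)
        w_next[high] = B_hat[high] / (nu * k[high] ** 2)  # since (Aw)_k = -nu k^2 w_k
        w_hat = w_next
    return w_hat
```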
We take a different perspective on the reduction. Unlike the nonlinear Galerkin methods, which aim for a trajectory-wise approximation, we aim for a probabilistic approximation of the distribution of the stochastic process $v$. The randomness of the process $v$ can come from random initial conditions and/or from the stochastic force. We emphasize that a key is to represent the dependence of the model error on the process $v$, not simply to construct a stochastic process with the same distribution as the model error, which could be independent of the process $v$.
In a data-driven approach, such a probabilistic approximation leads naturally to the statistical inference of the underlying process, aiming to represent the model error as a functional of the past trajectory of $v$. This inferential reduction approach works flexibly in general settings: there is no need for an inertial manifold, and the dimension $K$ can be arbitrary (e.g., less than the dimension of the inertial manifold, as shown in [14]).
Space-time reduction. To achieve a space-time reduction for practical computation, the reduced model should be a time series model with a time step $\delta$ for time reduction, instead of a differential system. It approximates the flow map (with $t_n = n\delta$)
$$ v(t_{n+1}) = F\big( u(t_n), \{ f(t) \}_{t \in [t_n, t_{n+1}]} \big), \tag{8} $$
where $u(t_n)$ is the vector of all Fourier modes, and thus the above map is not a closed system for the low modes. Recall that for $t \in [t_n, t_{n+1}]$,
$$ v(t_{n+1}) = v(t_n) + \int_{t_n}^{t_{n+1}} \big[ Av + PB(v + w) \big](s)\, ds + \int_{t_n}^{t_{n+1}} Pf(s)\, ds. \tag{9} $$
Clearly, the K-mode truncated Galerkin system can provide an immediate approximation to $F$ in (8). Making use of it, we propose a time series model for $\{v(t_n)\}$ in the form of
$$ v^{\,n+1} = R^\delta(v^{\,n}) + f^{\,n} + \Phi^{\,n} + g^{\,n+1}, \tag{10} $$
where $R^\delta$ is a one-step forward integrator, with time step-size $\delta$, of the deterministic K-mode Galerkin system, and $f^{\,n}_k$ is the white-noise increment of the $k$th Fourier mode of the stochastic force at time $t_n$. Here, the term $\Phi^{\,n}$ and the noise $g^{\,n+1}$ aim to represent the truncation error, as well as the discretization error. Together with the other terms in (10), they provide a statistical approximation to the flow map $F$ in (8). In particular, the term $\Phi^{\,n}$ approximates $F$ based on information up to time $t_n$ (e.g., the conditional expectation), and the noise $g^{\,n+1}$ aims to statistically represent the residual of the approximation. Since the truncation error depends on the past history of the low wavenumber modes, and as suggested by the Mori–Zwanzig formalism [6,7], we make $\Phi^{\,n}$ depend on the trajectory of the state process, as well as the trajectories of the force and the noise:
$$ \Phi^{\,n} = \Phi\big( v^{\,1:n}, f^{\,1:n}, g^{\,1:n} \big). \tag{11} $$
For simplicity, we assume the noise $\{g^{\,n}\}$ to be iid Gaussian, and the resulting time series model in (10) is a nonlinear autoregression moving average (NARMA) model [13,45,46].
The right-hand side of Equation (10), together with $\Phi^{\,n}$ defined in Equation (11), aims for a statistical approximation of the discrete-time map (8). However, the general form in Equation (11) leads to a high-dimensional function to be learned from data, which is intractable by regression methods using either global or local polynomial bases, due to the well-known curse of dimensionality. Fortunately, the physical model provides informative structures to reduce the dimension, and we can obtain effective approximations based on only a few basis functions with finite memory. In the next section, we derive from the physical model a parametric form for the reduced model, whose coefficients can be efficiently estimated from data.
To avoid confusion between notations, we summarize the correspondence of the variables between the full and reduced models in Table 2.
3. Inference of Reduced Models
We present here the parametric inference of NAR models: derivation of parametric forms, estimation of the parameters, and model selection.
3.1. Derivation of Parametric Reduced Models
We derive parametric reduced models by extracting basis functions from a numerical integration of Equation (6). The combination of these basis functions will give us $\Phi^{\,n}$ in (11), which approximates the flow map in (8) in a statistical sense.
We first write a closed integro-differential system for the low-mode process $v$. In view of Equation (6), this can be done simply by integrating the equation of the high modes $w$ in Equation (7):
$$ w(t) = e^{A(t - t_0)} w(t_0) + \int_{t_0}^{t} e^{A(t-s)}\, QB\big( (v + w)(s) \big)\, ds + \int_{t_0}^{t} e^{A(t-s)}\, Qf(s)\, ds. \tag{12} $$
Note that in addition to the trajectories of $v$ and $Qf$, which we can assume to be known, the state $w(t)$ also depends on the initial condition $w(t_0)$. Therefore, this equation is not strictly closed. However, as $t - t_0$ increases, the effect of the initial condition decays exponentially, allowing for a possible finite-time approximate closure. Given $v$ and $Qf$, the Picard iteration can provide us an approximation of $w$ as a functional of the trajectory of $v$. That is, the sequence of functions $\{ w^{(j)} \}$, defined by
$$ w^{(j+1)}(t) = e^{A(t - t_0)} w(t_0) + \int_{t_0}^{t} e^{A(t-s)}\, QB\big( (v + w^{(j)})(s) \big)\, ds + \int_{t_0}^{t} e^{A(t-s)}\, Qf(s)\, ds, $$
with $w^{(0)}(t) \equiv 0$ for $t \ge t_0$, will converge to $w$ as $j \to \infty$. In particular, the first Picard iteration
$$ w^{(1)}(t) = e^{A(t - t_0)} w(t_0) + \int_{t_0}^{t} e^{A(t-s)}\, QB\big( v(s) \big)\, ds + \int_{t_0}^{t} e^{A(t-s)}\, Qf(s)\, ds \tag{13} $$
provides us a closed representation: from its numerical integrator, we can derive parametric terms for the reduced model. We emphasize that the goal is to derive parametric terms for statistical inference, not to obtain a trajectory-wise approximation. Thus, high-order numerical integrators or high-order Picard iterations are helpful but may complicate the parametrization. For simplicity, we shall consider only the first Picard iteration and a Riemann sum approximation of its time integral.
We can now propose parametric numerical reduced models from the above integro-differential equation. In a simple form, we parametrize both the Riemann sum approximation of the first Picard iteration and a numerical scheme of the differential equation of $v$ to obtain a discrete-time system at the times $t_n = n\delta$. Here $\delta$ denotes the time step-size, the nonlinear function $R^\delta$ comes from a numerical integration of the deterministic truncated Galerkin equation at time $t_n$ with time step-size $\delta$, and the coefficients $\{c_j, \hat c_j\}$ are to be estimated by fitting to data in a statistical sense. To distinguish the approximate process in the reduced model from the original process, we denote it by $(\tilde v^{\,n}, \tilde w^{\,n})$, and write the reduced model as
$$ \tilde v^{\,n+1} = R^\delta(\tilde v^{\,n}) + f^{\,n} + \delta \sum_{j=1}^{p} c_j\, PB\big( \tilde v^{\,n-j+1} + \tilde w^{\,n-j+1} \big) + \epsilon^{\,n+1}, \tag{14a} $$
$$ \tilde w^{\,n} = \delta \sum_{j=1}^{p} \hat c_j\, e^{A(j-1)\delta}\, QB\big( \tilde v^{\,n-j+1} \big), \tag{14b} $$
where $\epsilon^{\,n+1}$ is a process representing the residual; it can be assumed to be a stochastic force for simplicity, but it can also be assigned a moving average part to better capture the time correlation, as in [13,46]. The second Equation (14b) does not have a residual term, as its goal is to provide a set of basis functions for the approximation of the forward map as in Equation (8), not to model the high modes.
Note that the time step-size $\delta$ can be relatively large, as long as the truncated Galerkin equation of the slow variable $v$ can be reasonably resolved. In general, such a step-size can be much larger than the time step-size needed to resolve the fast process $w$, because the effect of the unresolved fast process is “averaged” statistically when fitting the coefficients to data. Furthermore, the numerical error in the discretization is taken into account statistically.
Theoretically, the right-hand side of Equation (14a) is an approximation of the conditional expectation $\mathbb{E}\big[ v(t_{n+1}) \mid v(t_{n-p+1}), \dots, v(t_n) \big]$, which is the optimal estimator of the forward map conditional on the information up to time $t_n$. Here, the expectation is with respect to the joint measure of the vector $\big( v(t_{n-p+1}), \dots, v(t_{n+1}) \big)$, which is approximated by its joint empirical measure when fitting to data.
To avoid nonlinear optimization, the parametric form may be further simplified to be linearly dependent on the coefficients by dropping the terms that are nonlinear (quadratic) in the parameters. In fact, recall that in the Burgers equation $B(u) = -\frac{1}{2}\partial_x(u^2)$, and denote by $B(u_1, u_2) = -\frac{1}{2}\partial_x(u_1 u_2)$ the associated bilinear form. By dropping the interaction between the high modes and approximating
$$ B(\tilde v + \tilde w) \approx B(\tilde v) + 2 B(\tilde v, \tilde w) $$
in (14a), we obtain a reduced model that depends linearly on the coefficients $\{c_j, \hat c_j\}$.
3.2. The Numerical Reduced Model in Fourier Modes
We now write the reduced model in terms of the Fourier modes as in Equation (10).
As discussed in the above section, the major task is to parametrize the truncation error $\big[ B(v + w) - B(v) \big]_k$ for $1 \le k \le K$. Recall that the operator $P$ projects $u$ to the modes with wavenumber $|k| \le K$, and that $B(u_1, u_2) = -\frac{1}{2}\partial_x(u_1 u_2)$ denotes the bilinear function associated with $B$ (hereafter, to simplify notation, we also denote $P$ and $Q$ on the corresponding vector spaces of Fourier modes).
Since the quadratic term can only propagate energy from the modes with wavenumbers at most $K$ to modes with wavenumbers less than or equal to $2K$, we get only the high modes with wavenumbers $K < |k| \le 2K$ when we compute $w$ by a single iteration of $QB$. (We use a single iteration for simplicity, but one can reach higher wavenumbers by multiple iterations at the price of more complicated parametric forms.) Therefore, in a single-iteration approximation, the truncation error will involve the first $2K$ Fourier modes:
$$ \big[ B(v + w) - B(v) \big]_k = \big[ 2B(v, w) + B(w, w) \big]_k, \qquad 1 \le k \le K. \tag{15} $$
Dropping the interaction between the high modes to avoid nonlinear optimization in parameter estimation, we have
$$ \big[ B(v + w) - B(v) \big]_k \approx \big[ 2B(v, \tilde w) \big]_k. $$
We approximate the high modes by a functional of the low modes as in (14b),
$$ \tilde w^{\,n}_k = \sum_{j=1}^{p} \hat c_{k,j}\, R^{\delta, w}_k\big( \tilde v^{\,n-j+1} \big), \qquad K < k \le 2K, \tag{16} $$
where $R^{\delta, w}_k$ is the high modes of the nonlinear function in the numerical integrator:
$$ R^{\delta, w}_k(v) = -\frac{ik\,\delta}{2} \sum_{\substack{l+m=k \\ |l|, |m| \le K}} v_l\, v_m, \qquad K < k \le 2K. $$
Here $\tilde w^{\,n}$ only represents the modes up to wavenumber $2K$, due to the fact that the quadratic nonlinearity only involves interactions that at most double the wavenumber. One can reach higher wavenumbers by iterations of the quadratic interaction; a sketch of this single-iteration map follows.
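As a concrete illustration of the single-iteration map $v \mapsto \tilde w$, the sketch below computes the modes $K < k \le 2K$ of the quadratic term by direct convolution; the constant factors and the coefficients $\hat c_{k,j}$ of (16) are omitted, and the names are illustrative.

```python
import numpy as np

def high_mode_estimate(v, K):
    """Modes k = K+1..2K of B(u_v) = -(1/2) d_x(u_v^2), where u_v has the
    resolved Fourier modes v = (v_1, ..., v_K) and their conjugates."""
    vk = np.zeros(4 * K + 1, dtype=complex)       # entries index modes -2K..2K
    vk[2 * K + 1:3 * K + 1] = v                   # modes k = 1..K
    vk[K:2 * K] = np.conj(v[::-1])                # modes k = -K..-1
    conv = np.convolve(vk, vk)                    # Fourier modes -4K..4K of u_v^2
    ks = np.arange(K + 1, 2 * K + 1)
    return -0.5j * ks * conv[4 * K + ks]          # keep modes k = K+1..2K
```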
The truncation error term can now be linearly parametrized as
$$ z^{\,n}_k := \big[ 2B(\tilde v^{\,n}, \tilde w^{\,n}) \big]_k = -\frac{ik}{2} \sum_{\substack{l+m=k \\ |l| \le K,\ K < |m| \le 2K}} 2\, \tilde v^{\,n}_l\, \tilde w^{\,n}_m, \qquad 1 \le k \le K, $$
where we also denote $\tilde w^{\,n}_{-k} = \overline{\tilde w^{\,n}_k}$ for $K < k \le 2K$ for simplicity of notation.
We have now reached a parametric numerical reduced model for the Fourier modes. Denote by $\tilde v^{\,n} = (\tilde v^{\,n}_1, \dots, \tilde v^{\,n}_K)$ the low modes in the reduced model that approximate the original low modes $\big( \hat u_1(t_n), \dots, \hat u_K(t_n) \big)$. The reduced model is
$$ \tilde v^{\,n+1}_k = R^\delta_k(\tilde v^{\,n}) + f^{\,n}_k + \Phi^{\,n}_k + g^{\,n+1}_k, \qquad k = 1, \dots, K, \tag{17a} $$
$$ \Phi^{\,n}_k = \sum_{j=1}^{p} c^v_{k,j}\, \tilde v^{\,n-j+1}_k + z^{\,n}_k + \sum_{j=1}^{p} c^f_{k,j}\, f^{\,n-j+1}_k + \delta \sum_{j=1}^{p} c^h_{k,j}\, h_k\big( \tilde v^{\,n-j+1} \big), \tag{17b} $$
with the convention that $\tilde v^{\,n}_{-k} = (\tilde v^{\,n}_k)^*$ (with the superscript $*$ denoting the complex conjugate), where $h_k$ denotes a high-order term in the resolved modes, and where the notation $\tilde w^{\,n}$ represents the high modes and is defined in (16).
The reduced model is in the form of a nonlinear auto-regression moving average (NARMA) model:
- The map $R^\delta$ is the 1-step forward map of the deterministic K-mode Galerkin truncated equation using a numerical integration scheme with time step-size $\delta$. We use the ETDRK4 scheme.
- The term $f^{\,n}_k$ denotes the (suitably scaled) increment of the $k$-th Fourier mode of the stochastic force in the time interval $[t_n, t_{n+1}]$, and it is separated from the other terms so that the reduced model can linearly quantify the response of the low modes to the stochastic force.
- The function $\Phi^{\,n}_k$ in (17b) is parametrized by the coefficients $\theta_k = \big( c^v_{k,1:p}, \hat c_{k,1:p}, c^f_{k,1:p}, c^h_{k,1:p} \big)$, which are to be estimated from data. In particular, the coefficients $c^v_{k,j}$ and $\hat c_{k,j}$ act as a correction to the integration of the truncated equation.
- The new noise terms $\{g^{\,n}\}$ are assumed, for simplicity, to be a white noise independent of the original stochastic force $f$. That is, we assume that $\{g^{\,n}\}$ is a sequence of independent identically distributed (iid) Gaussian random vectors, with independent real and imaginary parts distributed as $\mathcal{N}(0, \sigma_k^2)$, with $\sigma_k$ to be estimated from data. Under such a white noise assumption, the parameters can be estimated simply by least squares (see the next section). In general, one can also assume other distributions for $g^{\,n}$, or other structures such as a moving average with $\{g^{\,n}\}$ being a white noise sequence [13,46]. (A schematic one-step update of the model is sketched after this list.)
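The sketch below assembles one step of (17); `galerkin_step` (one ETDRK4 step of the deterministic K-mode Galerkin system), `features` (the parametric terms of (17b) evaluated on the recent history), and the array shapes are all placeholders for the constructions above.

```python
import numpy as np

def nar_step(v_hist, f_n, theta, galerkin_step, features, sigma_g, rng):
    """One step of the NAR model: v^{n+1} = R_delta(v^n) + f^n + Phi^n + g^{n+1}.
    v_hist: list of the last p states, each of shape (K,); theta: (K, n_terms)
    per-mode coefficients; sigma_g: (K,) noise scales."""
    v_n = v_hist[-1]
    K = v_n.size
    Phi = np.sum(features(v_hist) * theta, axis=1)  # per-mode linear combination
    g = sigma_g * (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
    return galerkin_step(v_n) + f_n + Phi + g
```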
3.3. Data Generation and Parameter Estimation
We estimate the parameters of the NAR model by maximizing the likelihood of the data.
Data for the NAR model. To infer a reduced model in the form of Equation (17), we generate relevant data from a numerical scheme that sufficiently resolves the system in space and time, as introduced in Section 2.2. The relevant data are trajectories of the low modes of the state and of the stochastic force, i.e., $\{\hat u_k(t_n)\}$ and the force increments for $1 \le k \le K$, which are taken as $\{\tilde v^{\,n}_k\}$ and $\{f^{\,n}_k\}$ in the reduced model. Here, the time instants are $t_n = n\delta$, where $\delta$ can be much larger than the time step-size needed to resolve the system. Furthermore, the data do not include the high modes. In short, the data are generated by downsampling, in both space and time, the high-resolution solutions of the system.
The data can be either a long trajectory or many independent short trajectories. We denote the data consisting of $M$ independent trajectories by
$$ \Big\{ \big( v^{\,n,(m)}_{1:K},\ f^{\,n,(m)}_{1:K} \big) : n = 1, \dots, L \Big\}_{m=1}^{M}, \tag{19} $$
where $m$ indexes the trajectories, $\delta$ is the time interval between two observations, and $L$ denotes the number of steps in each trajectory.
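Schematically, the downsampling that produces the data (19) from one resolved trajectory looks as follows; `full_step`, which advances the full model by one fine step and returns the resolved force increment, is a placeholder for the solver of Section 2.2.

```python
import numpy as np

def generate_data(full_step, u0_hat, K, L, gap):
    """Down-sample a resolved trajectory in space (keep modes k = 1..K) and
    in time (keep every gap-th fine step, so delta = gap * dt)."""
    u_hat = u0_hat.copy()
    v_obs, f_obs = [], []
    f_acc = np.zeros(K, dtype=complex)
    for n in range(L * gap):
        u_hat, f_incr = full_step(u_hat)
        f_acc += f_incr[1:K + 1]                  # accumulate force over the coarse step
        if (n + 1) % gap == 0:
            v_obs.append(u_hat[1:K + 1].copy())   # modes k = 1..K (k < 0 by conjugacy)
            f_obs.append(f_acc.copy())
            f_acc = np.zeros(K, dtype=complex)
    return np.array(v_obs), np.array(f_obs)       # each of shape (L, K)
```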
Parameter estimation. The parameters in the discrete-time reduced model Equation (17) are estimated by maximum likelihood methods. Our discrete-time reduced model has a few attractive features: (i) the likelihood function can be computed exactly, avoiding possible approximation errors that could lead to biases in the estimators; (ii) the maximum likelihood estimator (MLE) may be computed by least squares under the assumption that the process $\{g^{\,n}\}$ is white noise, avoiding time-consuming nonlinear optimizations.
Under the assumption that $\{g^{\,n}\}$ is white noise, the parameters can be estimated simply by least squares, because the reduced model in Equation (17) depends linearly on the parameters. More precisely, the log-likelihood of the data in (19) can be written as
$$ \ell\big( \theta, \sigma_{1:K} \big) = -\sum_{k=1}^{K} \sum_{m=1}^{M} \sum_{n=p}^{L-1} \left( \frac{ \big| v^{\,n+1,(m)}_k - R^\delta_k\big( v^{\,n,(m)} \big) - f^{\,n,(m)}_k - \Phi^{\,n,(m)}_k \big|^2 }{ 2 \sigma_k^2 } + \log\big( 2\pi \sigma_k^2 \big) \right), \tag{20} $$
where $|\cdot|$ denotes the absolute value of a complex number, and $\theta = (\theta_1, \dots, \theta_K)$. To compute the maximum likelihood estimator (MLE) of the parameter $\theta_k$, we note that $\Phi^{\,n}_k$ in (17b) depends linearly on the parameter $\theta_k$. Therefore, the estimators of $\theta_k$ and $\sigma_k$ can be analytically computed by finding a zero of the gradient of the likelihood function. More precisely, denoting
$$ \Phi^{\,n,(m)}_k = \theta_k^\top \phi^{\,n,(m)}_k, $$
with $\phi^{\,n,(m)}_k$ denoting the vector of parametrized terms in (17b), we compute the MLE as
$$ \widehat{\theta}_k = A_k^{-1} b_k, \tag{21} $$
where the normal matrix $A_k$ and the vector $b_k$ are defined by
$$ A_k = \sum_{m=1}^{M} \sum_{n=p}^{L-1} \phi^{\,n,(m)}_k \big( \phi^{\,n,(m)}_k \big)^{*\top}, \qquad b_k = \sum_{m=1}^{M} \sum_{n=p}^{L-1} \Big( v^{\,n+1,(m)}_k - R^\delta_k\big( v^{\,n,(m)} \big) - f^{\,n,(m)}_k \Big)\, \overline{\phi^{\,n,(m)}_k}. \tag{22} $$
In practice, $A_k$ may be singular, and this can be dealt with by a pseudo-inverse or by regularization. We assume for simplicity that the stochastic force $g$ has independent components, so that the coefficients can be estimated by simple least squares regression. One may further improve the NAR model by considering spatial correlation between the components of $g$ or by using moving average models [13,46,47].
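In code, the estimator (21)–(22) for a single mode reduces to a complex least-squares solve; the feature matrix stacks the parametrized terms of (17b) over all samples, and the pseudo-inverse guards against a singular normal matrix as noted above (a sketch with illustrative names):

```python
import numpy as np

def fit_mode_lse(Phi_terms, residual):
    """Least-squares MLE for one Fourier mode k.
    Phi_terms: (n_samples, n_params) complex feature matrix from (17b);
    residual: (n_samples,) array of v^{n+1}_k - R^delta_k(v^n) - f^n_k."""
    A = Phi_terms.conj().T @ Phi_terms            # normal matrix (22)
    b = Phi_terms.conj().T @ residual
    theta = np.linalg.pinv(A) @ b                 # pseudo-inverse if A is singular
    sigma2 = np.mean(np.abs(residual - Phi_terms @ theta) ** 2)
    return theta, sigma2                          # coefficients and noise variance
```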
3.4. Model Selection
The parametric form in Equation (17b) leaves underdetermined a family of reduced models with many degrees of freedom, such as the time lag p and possibly redundant terms. To avoid overfitting and redundancy, we propose to select the reduced model by the following criteria.
- Cross validation: the reduced model should be stable and should reproduce the distribution of the resolved process, particularly the main dynamical-statistical properties. We will consider the energy spectrum, the marginal invariant densities, and the temporal correlations; in particular, the energy spectrum is estimated by the time average
$$ E_k = \lim_{T\to\infty} \frac{1}{T} \int_0^T \mathbb{E}\big[ |\hat u_k(t)|^2 \big]\, dt \approx \frac{1}{L} \sum_{n=1}^{L} \big| v^{\,n}_k \big|^2. \tag{23} $$
(Empirical estimators of these statistics are sketched after this list.)
- Consistency of the estimators. If the model is perfect and the data are either independent trajectories or a long trajectory from an ergodic measure, the estimators should converge as the data size increases (see e.g., [45,48]). While our parametric model may not be perfect, the estimators should also become less oscillatory as the data size increases, so that the algorithm is robust and can yield similar reduced models from different data sets.
- Simplicity and sparsity. When there are multiple reduced models performing similarly, we prefer the simplest model. We remove the redundant terms and enforce sparsity by LASSO (least absolute shrinkage and selection operator) regression [49]. Particularly, a singular normal matrix (22) indicates the redundancy of the terms and the need to remove strongly correlated terms.
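The statistics used in the cross validation can be estimated empirically as follows (minimal sketches; the trajectory layouts are assumptions):

```python
import numpy as np

def energy_spectrum(v_traj):
    """Time-averaged energy E|v_k|^2 per mode, cf. (23); v_traj: (L, K)."""
    return np.mean(np.abs(v_traj) ** 2, axis=0)

def acf(x, max_lag):
    """Empirical autocorrelation of a real scalar series (e.g., Re v_k)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.mean(x * x)
    return np.array([np.mean(x[:x.size - l] * x[l:]) for l in range(max_lag)]) / c0

def ks_statistic(x, y, grid):
    """Kolmogorov-Smirnov statistic: max difference of empirical CDFs,
    evaluated on a given grid (a 1D numpy array)."""
    cdf = lambda z: np.mean(np.asarray(z)[:, None] <= grid[None, :], axis=0)
    return np.max(np.abs(cdf(x) - cdf(y)))
```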
These criteria are by no means exhaustive. Other methods, such as the Bayesian information criterion (BIC, see, e.g., [50]) and the error reduction ratio [51], may be applied, but in our experience, they provide limited help for the selection of reduced models [7,14,46].
In view of statistical learning of the high-dimensional nonlinear flow map in (8), each linear-in-parameter reduced model provides an optimal approximation to the flow map in the hypothesis space spanned by the proposed terms. A possible future direction is to select adaptive-to-data hypothesis spaces in a nonparametric fashion [23] and analyze the distance between the flow map and the hypothesis space [52,53].
4. Numerical Study on Space-Time Reduction
We examine the inference and performance of NAR models for the stochastic Burgers equation in (1) and (2). We will consider two settings of the full model: the scale $\sigma$ of the stochastic force is either large or small, so that the stochastic force either dominates or is subordinate to the dynamics, respectively. We will also consider two settings for reduction: the number $K$ of Fourier modes of the reduced model satisfies either $K \ge K_f$ or $K < K_f$, representing a reduction of the deterministic response and a reduction involving unresolved stochastic force, respectively.
4.1. Settings
As reviewed in Section 2.2, we integrate the Equation (4) of the Fourier modes by ETDRK4 with a time-stepping such that the solution is accurately resolved. We call this discretized system the full model; its configuration is specified in Table 3. We will consider two different scales for the stochastic force: a larger standard deviation $\sigma$ leads to dynamics dominated by the stochastic force, and a smaller one represents dynamics dominated by the deterministic drift.
We generate the data in (19) from the full model as described in Section 3.3. We generate an ensemble of initial conditions by first integrating the system for a long time from a fixed initial condition and drawing samples uniformly from this long trajectory. Then, we generate either a long trajectory or an ensemble of trajectories starting from the randomly picked initial conditions, and we save the data with the time-stepping $\delta$. Numerical tests show that the invariant densities and the correlation functions vary little when the data are generated from different initial conditions.
We then infer NAR models for the first $K$ Fourier modes with a time step $\delta$. We will consider two regimes for $K$ (recall that $K_f$ is the number of Fourier modes in the stochastic force):
- $K \ge K_f$. In this case, $Qf = 0$, i.e., the stochastic force does not act on the unresolved Fourier modes $w$ in (7), so $w$ is a deterministic functional of the history of the resolved Fourier modes. In view of (14b), the reduced model mainly quantifies this deterministic map. We call this case “reduction of the deterministic response” and present the results in Section 4.3.
- $K < K_f$. In this case, $Qf \ne 0$, and $w$ in (7) depends on the unobserved Fourier modes of the stochastic force. Thus, the reduced model has to quantify the effects of the unresolved Fourier modes of both the solution and the stochastic force. We call this case “reduction involving unresolved stochastic force” and present the results in Section 4.4.
In either case, we explore the maximal time step that NAR models can reach by testing a sequence of increasing time steps $\delta$.
We summarize the configurations and notations in Table 3.
4.2. Model Selection and Memory Length
We demonstrate model selection and the effect of memory length for reduced models with time step $\delta$. We aim to select a universal parametric form of the NAR model for the different settings of $(K, \sigma)$, where $K$ is the number of Fourier modes in the NAR model and $\sigma$ is the standard deviation of the full model’s stochastic force. Such a parametric form will be used later for the exploration of maximal time reduction by NAR models in the next sections.
We select the model according to Section 3.4: for each pair $(K, \sigma)$, we test a pool of NAR models and select the simplest model that best reproduces the statistics and has consistent estimators. The statistics are computed along a long trajectory. We say that an NAR model is numerically unstable if it blows up (e.g., the solution magnitude exceeds a large threshold) before reaching the end of the simulation.
We estimate the coefficients in (17b) for a few time lags $p$. Numerical tests show that the normal matrix in the regression is almost singular, either when the stochastic force term is present or when the lag of the $\tilde v$ or $z$ terms is larger than two. Thus, for simplicity, we remove them by setting
$$ c^f_{k,j} = 0, \qquad c^v_{k,j} = \hat c_{k,j} = 0 \quad \text{for } j > 1, $$
and estimate the remaining coefficients.
That is, in (17b), the terms $\tilde v$ and $z$ have a time lag 1, the stochastic force term is removed, and only the high-order (the fourth) term has a time lag $p$. The memory length is $p\delta$.
Memory length. To select a memory length, we test NAR models with a range of time lags $p$ and consider their reproduction of the energy spectrum in (23). Figure 1 shows the relative error in the energy spectrum of these NAR models as $p$ increases: (1) when the scale of the stochastic force is large, the error oscillates without a clear pattern; (2) when the scale is small, the error first decreases and then increases. Thus, a longer memory does not necessarily lead to a better reduced model when the stochastic force dominates the dynamics; but when the deterministic flow dominates the dynamics, a proper memory can be helpful.
Figure 1.
Relative error in the energy spectrum reproduced by the NAR models with different memory lengths $p$, in four settings of $(K, \sigma)$. As the time lag $p$ increases, the relative error tends to first decrease and then increase, particularly in (b,d), where the force scale is small.
In all four settings, the simplest NAR models with $p = 1$ can consistently reproduce the energy spectrum with relative errors within 5%. Remarkably, the accuracy remains even for the highest resolved modes, where the true energy spectrum is orders of magnitude smaller (Figure 2a,b,d). Figure 2 also shows that the truncated K-mode Galerkin systems cannot reproduce the true energy spectrum in any of these settings, exhibiting upward tails due to the lack of fast energy dissipation from the high modes. Thus, the NAR model has introduced additional energy dissipation through its fitted correction terms.
Figure 2.
Energy spectrum of the NAR models with $p = 1$ and of the K-mode Galerkin systems in four settings of $(K, \sigma)$. The time step is $\delta$ for the NAR models and a finer step for the Galerkin models. The NAR models accurately reproduce the true energy spectrum in all settings.
Consistency of estimators. The estimators of the NAR models tend to converge as the data size increases. Figure 3 shows the estimated coefficients of the NAR models with $p = 1$ from data consisting of $M$ trajectories, each of length $T$, for increasing $(T, M)$. As the data size increases, all the estimators tend to converge. In particular, they converge faster when the stochastic force is large than when it is small: the estimators in (a)–(c) oscillate little beyond moderate data sizes, indicating that different data sets lead to similar estimators, while the estimators in (b)–(d) keep oscillating until the data size is much larger. This agrees with the fact that a larger stochastic force makes the system mix faster, so each trajectory provides more effective samples, driving the estimators to converge faster.
Figure 3.
Estimated coefficients of the NAR models with $p = 1$ in four settings of $(K, \sigma)$. The estimators tend to converge fast as the trajectory length $T$ and the number of trajectories $M$ increase.
Numerical tests also show that an NAR model can be numerically unstable even when its coefficient estimators are consistent (i.e., tending to converge as above). Thus, consistency is not sufficient for the selection of an NAR model.
In our tests, sparse regression algorithms such as LASSO (see, e.g., [49]) or sequential thresholding (see, e.g., [54,55]) have difficulty in proper thresholding, because the coefficients of the high-order terms are small and can vary in scale in different settings, yet these high-order terms are important for the NAR model.
Since the NAR models with $p = 1$ perform well in all four settings, and since they are the simplest, we use them in the next sections to explore the maximal time reduction.
4.3. Reduction of the Deterministic Response
We explore in this and the next section the maximal time step that the NAR models can reach. We consider only the simplest models, with time lag $p = 1$.
We consider first the models with $K \ge K_f$ Fourier modes. Since the stochastic force acts directly only on the first $K_f$ Fourier modes, the unresolved variable $w$ in (12) is a deterministic functional of the path of the $K$ resolved modes, and so is the truncation error represented through (14b). Thus, the NAR model mainly reduces the deterministic response of the resolved variables to the unresolved variables. In particular, the term $\Phi^{\,n}$ in the NAR model (17a) optimally approximates this deterministic response on the function space linearly spanned by the terms in (17b).
We consider a sequence of increasing time steps $\delta$. For each $\delta$, we first estimate the coefficients of the NAR model from data with the same time step. We then validate the estimated NAR model by its statistics.
Numerical tests show that the NAR models become numerically unstable beyond a maximal time step, which differs between the two settings of $\sigma$. Figure 4a,b shows the relative error in the energy spectrum reproduced by the NAR models with the stable time steps. The relative errors increase as $\delta$ increases. Note that the relative errors for the low modes change little, but those for the higher modes increase significantly. In particular, in (b), the relative errors are moderate at small $\delta$ but increase sharply to form a peak at the higher modes as $\delta$ grows. We will discuss connections with CFL numbers in Section 4.5.
Figure 4.
Relative error in the energy spectrum reproduced by the NAR models with different time steps $\delta$ in four settings of $(K, \sigma)$. All NAR models have time lag $p = 1$. The missing $\delta$’s in (a,b) correspond to numerically unstable NAR models; thus, the maximal $\delta$ that an NAR model can reach is smaller in (a,b) than in (c,d).
These NAR models reproduce the PDFs and ACFs relatively accurately. Figure 5 shows the marginal PDFs of the real parts of the modes. The top row shows the marginal PDFs for the NAR models with Gap = 5, in comparison with those of the full model and of the Galerkin truncated system. For the low wavenumber modes, the NAR model captures the shape and spread of the PDFs almost perfectly, improving on those of the Galerkin truncated system. For the higher modes, the NAR model still performs well, significantly improving on those of the truncated Galerkin system. The discrepancy between the PDFs becomes larger as the wavenumber increases, because these modes are affected more by the unresolved modes. The bottom row shows that the Kolmogorov–Smirnov statistics (the maximal difference between the cumulative distribution functions) increase slightly as $\delta$ increases. Figure 6 shows the ACFs. The top row shows that both the NAR model (with Gap = 5) and the Galerkin system can reproduce the ACFs accurately. The bottom row shows that the relative error of the ACF, in $\ell^2$-norm, increases as $\delta$ increases. Recall that the truncated Galerkin system produces PDFs with support much wider than the truth for the high modes (see Figure 5), and that $R^\delta$ becomes less accurate as $\delta$ increases. Thus, the terms $\tilde v$ and $R^\delta$ in the NAR model (17) preserve the temporal correlation, and the high-order term helps to dissipate energy and preserve the invariant measure.
Figure 5.
Marginal PDFs and K-S statistics (Kolmogorov–Smirnov statistics, i.e., the maximum difference between the cumulative distribution functions). In each of (a–d), the top panels are plots of the empirical marginal PDFs of the real parts of the Fourier modes, from the data (True), the K-mode Galerkin system (Galerkin), and the NAR models with Gap = 5; the bottom panels are the K-S statistics of the NAR models with different time steps $\delta$, up to the largest Gap such that the NAR model is numerically stable.
Figure 6.
ACFs (autocorrelation functions). In each of (a–d), the top panels are the ACFs of the real parts of the Fourier modes when Gap = 5; the bottom panels are the relative errors (in $\ell^2$-norm) of the NAR models with different time steps $\delta$, up to the largest Gap such that the NAR model is numerically stable.
In summary, when $K \ge K_f$, the NAR models with $p = 1$ are stable up to a maximal time step that depends on the scale $\sigma$ of the stochastic force. All these NAR models can accurately reproduce the energy spectrum, the invariant measure, and the temporal autocorrelation.
4.4. Reduction Involving Unresolved Stochastic Force
We consider next NAR models with $K < K_f$. In this case, the unresolved variable $w$ in (12) is a functional of both the path of the $K$ resolved modes and the unresolved stochastic force. Thus, in view of (14b), (16) and (17), the NAR model quantifies the response of the $K$ modes to both the unresolved Fourier modes and the unresolved stochastic force.
Note first that such a small $K$ prevents the K-mode Galerkin system from meaningfully reproducing any of the statistical or dynamical properties; see Figure 2c,d for the energy spectrum, Figure 5c,d for the marginal PDFs, and Figure 6c,d for the ACFs. On the contrary, the NAR models, whose term $R^\delta$ comes from the K-mode Galerkin system, reproduce these statistics accurately. Remarkably, the NAR models remain accurate even when the time step is large, with small K-S statistics in Figure 5c,d and with the relative error in the ACFs less than 6% in Figure 6c,d.
To explore the maximal time step that NAR models can reach, we consider a wider sequence of time steps $\delta$. Numerical tests show that the NAR models are numerically stable for all of them in both settings of $\sigma$. Figure 4c,d shows the relative error in the energy spectrum reproduced by the NAR models with these time steps. The relative error first decreases and then increases as $\delta$ increases, reaching its lowest at an intermediate $\delta$ in each setting. In particular, all of these relative errors remain less than 9%, except at the largest time step in one setting.
In summary, when $K < K_f$, NAR models can tolerate large time steps, reproducing the energy spectrum with relative errors less than 9% in both settings of $\sigma$.
4.5. Discussion on Space-Time Reduction
Since model reduction aims for space-time reduction, it is natural to consider the maximal reduction in space-time; in other words, the minimum “spatial” dimension $K$ and the maximum time step $\delta$. We have the following observations from the previous sections:
- Space dimension reduction, the memory length of the reduced model, and the stochastic force are closely related. As suggested by the discrete Mori–Zwanzig formalism for random dynamics (see, e.g., [7]), space dimension reduction leads to non-Markovian closure models. Figure 1 suggests that a proper medium memory length leads to the best NAR model. It also suggests that the scale of the white-in-time stochastic force can affect the memory length: a larger scale of the stochastic force leads to a shorter memory. We leave it as future work to investigate the relations between memory length (for colored or white-in-time forces), the stochastic force, and energy dissipation.
- The maximal time step depends on the space dimension and the scale of the stochastic force, and is mainly limited by the stability of the nonlinear reduced model. Figure 4 shows that the maximal time step when $K < K_f$ is much larger than in the case $K \ge K_f$. It also shows that as the scale of the stochastic force increases, the NAR models’ maximal time step decreases (because the NAR models either become unstable or have larger errors in the energy spectrum). It is noteworthy that these maximal time steps of the NAR models are smaller than those the K-mode Galerkin system can tolerate. Figure 7 shows that the K-mode Galerkin system can be stable for time steps much larger than those of the NAR models: the maximal time step for the K-mode Galerkin system is reached when the mean CFL number (which increases linearly in $\delta$) reaches 1, but the maximal time step for the NAR models to be stable is smaller; for example, the maximal time gap for the Galerkin system (the end of the red diamond line in Figure 7) is larger than that of the NAR model. The increased numerical instability of the NAR model is likely due to its fitted nonlinear terms, which are important for the NAR model to preserve energy dissipation and the energy spectrum (see Figure 2 and the coefficients in Figure 3).
Figure 7.
The mean CFL numbers of the full models and of the K-mode Galerkin systems, computed along a long trajectory. The time step is the fine step for the full model and the coarse step $\delta$ for the K-mode Galerkin system. Beyond a maximal $\delta$, the K-mode Galerkin system blows up, so its CFL number is missing afterwards. The stars (✩) mark the largest time Gap such that the NAR model is numerically stable. The red and blue squares mark where the full model’s mean CFL numbers agree with those of the K-mode Galerkin systems. The relative errors in the energy spectrum in Figure 4c,d are the smallest when the $\delta$’s are the closest to these squares.
Beyond maximal reduction, an intriguing question arises: when does the reduced model perform best (i.e., attain the least relative error in the energy spectrum)? We call this the optimality of space-time reduction. It is more interesting and relevant to model reduction than maximal space-time reduction, because one may achieve a large time step or a small space dimension at the price of a large error in the NAR model, as we have seen in Figure 4. We note that the relative errors in the energy spectrum in Figure 4c,d are the smallest when the $\delta$’s are the closest to the squares in Figure 7, where the full model’s mean CFL numbers agree with those of the K-mode Galerkin system. We conjecture that optimal space-time reduction is achieved by an NAR model when the K-mode Galerkin system preserves the CFL number of the full model; a selection rule based on this criterion is sketched below.
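Stated as a selection rule, the conjectured criterion is simply: among the candidate time gaps, choose the one whose K-mode Galerkin mean CFL number best matches the full model’s (a sketch; the inputs are the diagnostics plotted in Figure 7):

```python
import numpy as np

def optimal_gap_by_cfl(gaps, cfl_galerkin, cfl_full):
    """Pick the time gap at which the K-mode Galerkin system's mean CFL
    number is closest to the full model's mean CFL number.
    cfl_galerkin[i] is the mean CFL of the K-mode Galerkin system run
    with time step gaps[i] * dt."""
    i = int(np.argmin(np.abs(np.asarray(cfl_galerkin) - cfl_full)))
    return gaps[i]
```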
5. Conclusions
We consider a data-driven model reduction for stochastic Burgers equations, casting it as a statistical learning problem on approximating the flow map of low-wavenumber Fourier modes. We derive a class of efficient parametric reduced closure models, based on representing the high modes as functionals of the resolved variables’ trajectory. The reduced models are nonlinear autoregression (NAR) time series models, with coefficients estimated from data by least squares. In various settings, the NAR models can accurately reproduce the statistics such as the energy spectrum, the invariant densities, and the autocorrelations.
Using the simplest NAR model, we investigate the maximal space-time reduction in four settings: reduction of deterministic responses ($K \ge K_f$) vs. reduction involving unresolved stochastic force ($K < K_f$), and small vs. large scales of stochastic force, where $K_f$ is the number of Fourier modes of the white-in-time stochastic force and $\sigma$ is the scale of the force. Reduction in space dimension is unlimited, and NAR models with two Fourier modes can reproduce the energy spectrum with relative errors less than 5%. The time reduction is another story. Maximal time reduction depends on both the dimension reduction and the stochastic force’s scale, as they affect the stability of the NAR model. The NAR model’s stability limits the maximal time step to be smaller than that of the K-mode Galerkin system. Numerical tests indicate that the NAR models achieve the minimal relative error at the time step where the K-mode Galerkin system’s mean CFL number agrees with the full model’s. This is a potential criterion for optimal space-time reduction.
The simplicity of our NAR model structure opens various fronts for a further understanding of data-driven model reduction. Future directions include: (1) studying the connection between optimal space-time reduction, the CFL number, and quantification of the accuracy of reduced models; (2) investigating the relation between memory length, dimension reduction, the stochastic force, and the energy dissipation of the system; (3) developing post-processing techniques to efficiently recover information of the high Fourier modes, so as to predict the shocks using the reduced models.
Funding
This research was funded by NSF-1913243 and NSF-1821211.
Acknowledgments
The author would like to thank the anonymous reviewers for valuable feedback that helped significantly improve the manuscript. He is grateful to Alexandre Chorin for introducing this problem. This study is part of our joint project on renormalization group methods. The author would like to thank Kevin Lin, Panos Stinis, John Harlim, Xiantao Li, Mauro Maggioni, and Felix Ye for helpful discussions.
Conflicts of Interest
The author declares no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
ETDRK4 | exponential time differencing fourth-order Runge–Kutta method |
CFL number | Courant–Friedrichs–Lewy number |
NAR | nonlinear autoregression |
PDF | probability density function |
ACF | autocorrelation function |
References
- Stinis, P. Mori-Zwanzig Reduced Models for Uncertainty Quantification II: Initial Condition Uncertainty. arXiv 2012, arXiv:1212.6360. [Google Scholar]
- Li, Z.; Bian, X.; Li, X.; Karniadakis, G.E. Incorporation of Memory Effects in Coarse-Grained Modeling via the Mori-Zwanzig Formalism. J. Chem. Phys. 2015, 143, 243128. [Google Scholar] [CrossRef] [PubMed]
- Lu, F.; Tu, X.; Chorin, A.J. Accounting for Model Error from Unresolved Scales in Ensemble Kalman Filters by Stochastic Parameterization. Mon. Weather. Rev. 2017, 145, 3709–3723. [Google Scholar] [CrossRef]
- Lu, F.; Weitzel, N.; Monahan, A. Joint state-parameter estimation of a nonlinear stochastic energy balance model from sparse noisy data. Nonlinear Process. Geophys. 2019, 26, 227–250. [Google Scholar] [CrossRef]
- Zwanzig, R. Nonequilibrium Statistical Mechanics; Oxford University Press: New York, NY, USA, 2001. [Google Scholar]
- Chorin, A.J.; Hald, O.H. Stochastic Tools in Mathematics and Science, 3rd ed.; Springer: New York, NY, USA, 2013. [Google Scholar]
- Lin, K.K.; Lu, F. Data-driven model reduction, Wiener projections, and the Koopman-Mori-Zwanzig formalism. J. Comput. Phys. 2020, 424, 109864. [Google Scholar] [CrossRef]
- Kondrashov, D.; Chekroun, M.D.; Ghil, M. Data-Driven Non-Markovian Closure Models. Physica D 2015, 297, 33–55. [Google Scholar] [CrossRef]
- Harlim, J.; Li, X. Parametric Reduced Models for the Nonlinear Schrödinger Equation. Phys. Rev. E 2015, 91, 053306. [Google Scholar] [CrossRef]
- Lei, H.; Baker, N.A.; Li, X. Data-Driven Parameterization of the Generalized Langevin Equation. Proc. Natl. Acad. Sci. USA 2016, 113, 14183–14188. [Google Scholar] [CrossRef]
- Xie, X.; Mohebujjaman, M.; Rebholz, L.G.; Iliescu, T. Data-Driven Filtered Reduced Order Modeling of Fluid Flows. SIAM J. Sci. Comput. 2018, 40, B834–B857. [Google Scholar] [CrossRef]
- Chekroun, M.D.; Kondrashov, D. Data-Adaptive Harmonic Spectra and Multilayer Stuart-Landau Models. Chaos Interdiscip. J. Nonlinear Sci. 2017, 27, 093110. [Google Scholar] [CrossRef]
- Chorin, A.J.; Lu, F. Discrete approach to stochastic parametrization and dimension reduction in nonlinear dynamics. Proc. Natl. Acad. Sci. USA 2015, 112, 9804–9809. [Google Scholar] [CrossRef]
- Lu, F.; Lin, K.K.; Chorin, A.J. Data-based stochastic model reduction for the Kuramoto–Sivashinsky equation. Physica D 2017, 340, 46–57. [Google Scholar] [CrossRef]
- Pathak, J.; Hunt, B.; Girvan, M.; Lu, Z.; Ott, E. Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach. Phys. Rev. Lett. 2018, 120, 024102. [Google Scholar] [CrossRef] [PubMed]
- Ma, C.; Wang, J.; E, W. Model Reduction with Memory and the Machine Learning of Dynamical Systems. Commun. Comput. Phys. 2018, 25, 947–962. [Google Scholar] [CrossRef]
- Harlim, J.; Jiang, S.W.; Liang, S.; Yang, H. Machine learning for prediction with missing dynamics. J. Comput. Phys. 2020. [Google Scholar] [CrossRef]
- Parish, E.J.; Duraisamy, K. A Paradigm for Data-Driven Predictive Modeling Using Field Inversion and Machine Learning. J. Comput. Phys. 2016, 305, 758–774. [Google Scholar] [CrossRef]
- Duan, J.; Wei, W. Effective Dynamics of Stochastic Partial Differential Equations; Elsevier: Amsterdam, The Netherlands, 2014. [Google Scholar]
- Stinis, P. Renormalized Mori-Zwanzig-Reduced Models for Systems without Scale Separation. Proc. R. Soc. A 2015, 471, 20140446. [Google Scholar] [CrossRef] [PubMed]
- Hudson, T.; Li, X.H. Coarse-Graining of Overdamped Langevin Dynamics via the Mori-Zwanzig Formalism. Multiscale Model. Simul. 2020, 18, 1113–1135. [Google Scholar] [CrossRef]
- Choi, Y.; Carlberg, K. Space–Time Least-Squares Petrov-Galerkin Projection for Nonlinear Model Reduction. SIAM J. Sci. Comput. 2019, 41, A26–A58. [Google Scholar] [CrossRef]
- Jiang, S.W.; Harlim, J. Modeling of missing dynamical systems: Deriving parametric models using a nonparametric framework. Res. Math. Sci. 2020, 7, 1–25. [Google Scholar] [CrossRef]
- Marion, M.; Temam, R. Nonlinear Galerkin methods. SIAM J. Numer. Anal. 1989, 26, 1139–1157. [Google Scholar] [CrossRef]
- Jolly, M.S.; Kevrekidis, I.G.; Titi, E.S. Approximate inertial manifolds for the Kuramoto-Sivashinsky equation: Analysis and computations. Physica D 1990, 44, 38–60. [Google Scholar] [CrossRef]
- Rosa, R. Approximate inertial manifolds of exponential order. Discrete Contin. Dynam. Syst. 1995, 3, 421–448. [Google Scholar] [CrossRef]
- Novo, J.; Titi, E.S.; Wynne, S. Efficient methods using high accuracy approximate inertial manifolds. Numer. Math. 2001, 87, 523–554. [Google Scholar] [CrossRef]
- Zelik, S. Inertial manifolds and finite-dimensional reduction for dissipative PDEs. Proc. R. Soc. Edinb. A 2014, 144, 1245–1327. [Google Scholar] [CrossRef]
- Zhang, H.; Harlim, J.; Li, X. Computing linear response statistics using orthogonal polynomial based estimators: An RKHS formulation. arXiv 2019, arXiv:1912.11110. [Google Scholar]
- Pan, S.; Duraisamy, K. Data-driven discovery of closure models. SIAM J. Appl. Dyn. Syst. 2018, 17, 2381–2413. [Google Scholar] [CrossRef]
- E, W.; Khanin, K.; Mazel, A.; Sinai, Y.G. Invariant Measures for Burgers Equation with Stochastic Forcing. Ann. Math. 2000, 151, 877–960. [Google Scholar] [CrossRef]
- Chorin, A.J. Averaging and Renormalization for the Korteveg-deVries-Burgers Equation. Proc. Natl. Acad. Sci. USA 2003, 100, 9674–9679. [Google Scholar] [CrossRef]
- Chorin, A.J.; Hald, O.H. Viscosity-Dependent Inertial Spectra of the Burgers and Korteweg-deVries-Burgers Equations. Proc. Natl. Acad. Sci. USA 2005, 102, 3921–3923. [Google Scholar] [CrossRef]
- Bec, J.; Khanin, K. Burgers Turbulence. Phys. Rep. 2007, 447, 1–66. [Google Scholar] [CrossRef]
- Beck, M.; Wayne, C.E. Using Global Invariant Manifolds to Understand Metastability in the Burgers Equation With Small Viscosity. SIAM J. Appl. Dyn. Syst. 2009, 8, 1043–1065. [Google Scholar] [CrossRef]
- Wang, Z.; Akhtar, I.; Borggaard, J.; Iliescu, T. Two-Level Discretizations of Nonlinear Closure Models for Proper Orthogonal Decomposition. J. Comput. Phys. 2011, 230, 126–146. [Google Scholar] [CrossRef]
- Dolaptchiev, S.; Achatz, U.; Timofeyev, I. Stochastic closure for local averages in the finite-difference discretization of the forced Burgers equation. Theor. Comput. Fluid Dyn. 2013, 27, 297–317. [Google Scholar] [CrossRef]
- Benner, P.; Gugercin, S.; Willcox, K. A Survey of Projection-Based Model Reduction Methods for Parametric Dynamical Systems. SIAM Rev. 2015, 57, 483–531. [Google Scholar] [CrossRef]
- Quarteroni, A.; Manzoni, A.; Negri, F. Reduced Basis Methods for Partial Differential Equations: An Introduction; Springer: Berlin/Heidelberg, Germany, 2015; Volume 92. [Google Scholar]
- Sinai, Y.G. Two results concerning asymptotic behavior of solutions of the Burgers equation with force. J. Stat. Phys. 1991, 64, 1–12. [Google Scholar] [CrossRef]
- Da Prato, G. An Introduction to Infinite-Dimensional Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Cox, S.M.; Matthews, P.C. Exponential time differencing for stiff systems. J. Comput. Phys. 2002, 176, 430–455. [Google Scholar] [CrossRef]
- Kassam, A.K.; Trefethen, L.N. Fourth-order time stepping for stiff PDEs. SIAM J. Sci. Comput. 2005, 26, 1214–1233. [Google Scholar] [CrossRef]
- Gottlieb, D.; Orszag, S. Numerical Analysis of Spectral Methods: Theory and Applications; SIAM: Philadelphia, PA, USA, 1977. [Google Scholar]
- Fan, J.; Yao, Q. Nonlinear Time Series: Nonparametric and Parametric Methods; Springer: New York, NY, USA, 2003. [Google Scholar]
- Lu, F.; Lin, K.K.; Chorin, A.J. Comparison of continuous and discrete-time data-based modeling for hypoelliptic systems. Commun. Appl. Math. Comput. Sci. 2016, 11, 187–216. [Google Scholar] [CrossRef]
- Verheul, N.; Crommelin, D. Stochastic parameterization with VARX processes. arXiv 2020, arXiv:stat.ME/2010.03293. [Google Scholar]
- Kutoyants, Y.A. Statistical Inference for Ergodic Diffusion Processes; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
- Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. (Methodol.) 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Brockwell, P.; Davis, R. Introduction to Time Series and Forecasting; Springer: New York, NY, USA, 2002. [Google Scholar]
- Billings, S.A. Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatiotemporal Domains; John Wiley and Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
- Györfi, L.; Kohler, M.; Krzyzak, A.; Walk, H. A Distribution-Free Theory of Nonparametric Regression; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
- Lu, F.; Zhong, M.; Tang, S.; Maggioni, M. Nonparametric inference of interaction laws in systems of agents from trajectory data. Proc. Natl. Acad. Sci. USA 2019, 116, 14424–14433. [Google Scholar] [CrossRef] [PubMed]
- She, Y. Thresholding-Based Iterative Selection Procedures for Model Selection and Shrinkage. Electron. J. Statist. 2009, 3, 384–415. [Google Scholar] [CrossRef]
- Quade, M.; Abel, M.; Kutz, N.J.; Brunton, S.L. Sparse Identification of Nonlinear Dynamics for Rapid Model Recovery. Chaos 2018, 28, 063116. [Google Scholar] [CrossRef] [PubMed]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).