Article

Investigating the Surrogate Modeling Capabilities of Continuous Time Echo State Networks

by Saakaar Bhatnagar 1,2
1 Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305, USA
2 Altair Engineering Inc., Sunnyvale, CA 94086, USA
Math. Comput. Appl. 2024, 29(1), 9; https://doi.org/10.3390/mca29010009
Submission received: 25 November 2023 / Revised: 12 January 2024 / Accepted: 22 January 2024 / Published: 24 January 2024
(This article belongs to the Topic Mathematical Modeling)

Abstract
Continuous Time Echo State Networks (CTESNs) are a promising yet under-explored surrogate modeling technique for dynamical systems, particularly those governed by stiff Ordinary Differential Equations (ODEs). A key determinant of the generalization accuracy of a CTESN surrogate is the method of projecting the reservoir state to the output. This paper shows that, of the two common projection methods (linear and nonlinear), surrogates developed via the nonlinear projection consistently outperform those developed via the linear method. CTESN surrogates are developed for several challenging benchmark cases governed by stiff ODEs, and for each case, the performance of the two projections is compared. The results demonstrate the applicability of CTESNs to a variety of problems while serving as a reference for important algorithmic and hyper-parameter choices for CTESNs.

1. Introduction

Modeling dynamic systems using scientific machine learning (SciML) techniques is a rapidly growing field with advanced ML architectures being applied to model complex problems across a diverse range of applications. Some examples are rapid design optimization [1], real-time health monitoring [2], turbulent flow modeling [3], and materials discovery [4]. Many of these applications utilize a “surrogate model” that makes real-time predictions of the system behavior in place of full-order models that would be too slow or expensive for the application.
Recently, Echo State Networks (ESNs) have seen an increase in popularity for modeling highly nonlinear and chaotic phenomena in domains such as optimal control planning [5], chaotic time series prediction [6,7], signals analysis [8], and even turbulent fluid flow [9]. These applications leverage the ability of ESNs to capture highly nonlinear transient behavior accurately, as well as the extremely low cost of training them, with the empirical success of ESNs on a wide range of approximation tasks discussed and explained in several works [10,11]. However, one of the biggest limitations of the standard ESN implementation for surrogate modeling of nonlinear dynamical systems is that the available training data may not be uniformly sampled in time. A particular example is the numerical solution of stiff systems of Ordinary Differential Equations (ODEs). An ODE system is said to be stiff if, for some initial conditions and in certain intervals, the numerical method is forced to use a timestep that is very small relative to the smoothness of the exact solution [12]. Numerical ODE solvers overcome the instability due to stiffness by adapting the timestep size during the solve, leading to uneven sampling of the solution in time.
There have been several attempts to apply reduced-order modeling to stiff ODEs. Ji et al. [13] used Physics-Informed Neural Networks [14], Kim et al. [15] used Neural ODEs [16], and Goswami et al. [17] used Deep Operator Networks [18] to solve several stiff systems such as the ROBER [19] and POLLU [20] problems. However, these methodologies require assumptions and scalings that may not generalize, or require large amounts of training data and computational resources to train deep neural networks. Work by Anantharaman et al. [21] also showed the failure of popular architectures such as Long Short-Term Memory (LSTM) networks and standard ESNs in modeling stiff systems.
To use the attractive properties of ESNs (i.e., capacity to model highly nonlinear signals and ease of training) for modeling stiff systems and to address this issue, a variant of ESNs called Continuous Time Echo State Networks or CTESNs has been proposed [21]. CTESNs have been successfully employed in various applications from accelerating predictions of power system dynamics [22] to accelerating solutions of pharmacology models [23].
In the recent literature, two ways of using CTESNs for surrogate modeling have emerged, the Linear Projection CTESN (LPCTESN) [21] and the Nonlinear Projection CTESN (NLPCTESN) [22,23], and these have been applied to a range of problems. However, there is currently a lack of work critically examining the accuracy of both methods and how they compare to one another. Further, both projection methods use a radial basis function (RBF) for interpolation, which also comes with several algorithmic choices that need to be explored. This study aims to fill this gap by investigating the effects of these algorithmic choices on surrogates created to solve several stiff ODE systems such as Robertson’s equations, the Sliding Basepoint model of automobile collision, and the POLLU air pollution model. The findings of the study show that for the same hyper-parameter settings of the CTESN, the NLPCTESN outperforms the LPCTESN on all benchmarks shown. Further, it is shown that for the interpolating RBFs used within CTESNs, k-Nearest Neighbor (k-NN) polynomial-augmented RBFs outperform standard RBFs in predictive accuracy.
This paper is organized as follows: Section 2 introduces the concepts of the ESN and CTESN and discusses the two projection methods, LPCTESN and NLPCTESN, in detail. Section 3 demonstrates the application of the methods to several challenging stiff ODE problems, with a qualitative and quantitative discussion of the results. Section 4 summarizes the work and outlines possible future directions.

2. Methods

2.1. Standard Echo State Networks

An Echo State Network [24], depicted in Figure 1, is a form of reservoir computing that is very similar in architecture to the more popular Recurrent Neural Network (RNN). It makes predictions by following a recurrence relation (Equation (1)) to update a latent state vector and using that vector to map to a given output. Unlike in RNNs, the parameters of the recurrent core (the “reservoir”) are fixed and are not updated during training; only the mapping from the latent space to the output space is learned. Depending on the implementation, this makes the training of ESNs very fast (sometimes as fast as a simple least squares fit) and computationally cheap.
The governing equation for an Echo State Network reads:
$$r_{n+1} = \sigma(W_{in} x_n + W r_n), \qquad (1)$$
where $r_n \in \mathbb{R}^{N_r}$ is the latent state at timestep $n$, $\sigma$ is an activation function (most commonly tanh), $W_{in} \in \mathbb{R}^{N_r \times N_x}$ and $W \in \mathbb{R}^{N_r \times N_r}$ are the fixed, randomly initialized reservoir matrices, and $x_n$ is the system state at timestep $n$. The matrix $W$ is sparse, typically with about 1% nonzero entries.
The output projection reads:
$$x_{n+1} = \Phi(r_{n+1}), \qquad (2)$$
where Φ is decided by the projection method. The most popular projection method is the linear projection, resulting in
$$x_{n+1} = W_{out}\, r_{n+1}, \qquad (3)$$
where $W_{out} \in \mathbb{R}^{N_x \times N_r}$ is the linear projection matrix that is fitted to the training data. Like most machine learning algorithms, ESNs have a set of hyper-parameters that need to be tuned, and several works [25,26] can be used as guides for selecting them. A key hyper-parameter in ESNs is the spectral radius of $W$. In this study, the spectral radius is fixed at a value of 0.01 for all models created. This value was obtained via a hyper-parameter search, the details of which are given in Appendix C. Although slightly smaller than the values conventionally used in standard ESNs, this value was found in this study on CTESNs to maximize predictive accuracy while also ensuring the stability of the reservoir solution $r(t)$.
To fit the trainable matrix, one only has to solve an ordinary linear least squares problem: if $X = [x_1; x_2; \dots; x_N]$ and $R = [r_1; r_2; \dots; r_N]$, then
$$W_{out} = (R R^T)^{-1} R X. \qquad (4)$$
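For concreteness, the following is a minimal NumPy sketch of this procedure on synthetic data; the reservoir density, the spectral radius of 0.01, and the placeholder trajectory mirror the settings described above, but the exact construction is an illustrative assumption rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N_r, N_x, N_t = 200, 3, 500   # reservoir size, state dimension, timesteps

# Sparse random reservoir (~1% nonzero entries), rescaled so that the
# spectral radius of W is 0.01, as used throughout this paper.
W = rng.standard_normal((N_r, N_r)) * (rng.random((N_r, N_r)) < 0.01)
W *= 0.01 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.standard_normal((N_r, N_x))

X = rng.standard_normal((N_t, N_x))   # placeholder for a training trajectory x_n

# Drive the reservoir with the recurrence of Equation (1).
R = np.zeros((N_t, N_r))
for n in range(N_t - 1):
    R[n + 1] = np.tanh(W_in @ X[n] + W @ R[n])

# Fit the linear readout of Equation (3) by least squares (cf. Equation (4)).
# lstsq solves R @ W_out ~= X, so W_out here is the transpose of the paper's.
W_out, *_ = np.linalg.lstsq(R[1:], X[1:], rcond=None)
X_pred = R[1:] @ W_out   # one-step reconstructions of x_{n+1}
```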

2.2. Continuous Time Echo State Networks (CTESNs)

Continuous Time Echo State Networks are a variant of ESNs that model time as a continuous rather than discrete quantity. The model equation for a CTESN is given by:
$$\dot{r} = \sigma(W_{in} x + W r), \qquad (5)$$
with all variables having the same definition as in the previous section. The projection equation reads:
$$x = \Phi(r). \qquad (6)$$
As per the literature, there are two ways of modeling the relation between the latent space and the outputs [27]; one is the linear method (called Linear Projection CTESN or LPCTESN), described by:
$$x = W_{out}\, r. \qquad (7)$$
The second method is the nonlinear method (called NonLinear Projection CTESN or NLPCTESN), where the projection Φ is a nonlinear mapping. Many possible functions can be used, but the literature on NLPCTESNs [22,27] currently uses standard radial basis functions (RBFs) to write the projection:
$$x = \mathrm{RBF}(r). \qquad (8)$$
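Concretely, the reservoir ODE of Equation (5) can be integrated with any off-the-shelf solver once the training input x(t) is made continuous by interpolation. Below is a hedged SciPy sketch, with synthetic, deliberately non-uniform timestamps standing in for the output of an adaptive stiff solver:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

rng = np.random.default_rng(1)
N_r, N_x = 200, 3

W = rng.standard_normal((N_r, N_r)) * (rng.random((N_r, N_r)) < 0.01)
W *= 0.01 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.standard_normal((N_r, N_x))

# x(t): a continuous interpolant of the training solution at p*. The synthetic
# timestamps below are non-uniform, mimicking an adaptive solver's output.
t_train = np.linspace(0.0, 1.0, 40) ** 2
x_train = rng.standard_normal((40, N_x))
x_of_t = interp1d(t_train, x_train, axis=0, fill_value="extrapolate")

def reservoir_rhs(t, r):
    return np.tanh(W_in @ x_of_t(t) + W @ r)   # Equation (5)

sol = solve_ivp(reservoir_rhs, (0.0, 1.0), np.zeros(N_r), dense_output=True)
r_of_t = sol.sol   # r(t), evaluable at every training run's timestamps
```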

Surrogate Modeling via CTESNs

To create and use a surrogate model via the CTESN method, the following steps are followed. First, a Design of Experiments (DoE) space is created and $N$ sample combinations of query parameters are drawn; call this set $P = \{p_1, p_2, \dots, p_N\}$. Each $p_i$ represents a set of conditions at which the ODE is solved, and the surrogate is expected to capture the change in the solution due to changes in the value of $p_i$. Examples include the initial conditions (see Section 3.1.2 and Section 3.2), the rate constants of the ODE (see Section 3.1.1 and Section 3.3), etc. The ODE is solved numerically at each $p_i \in P$ to return the solution set $Y = \{y_1, y_2, \dots, y_N\}$, $y_i \in \mathbb{R}^{N_x \times N_{ts_i}}$. A single parameter combination $p^* \in P$ is drawn at random, and the reservoir ODE (Equation (5)) is solved using a numerical ODE solver. This returns $r(t) \in \mathbb{R}^{N_r \times N_{ts}}$, where $N_{ts}$ is the number of timesteps in the solution at $p^*$. Then, depending on whether the linear or nonlinear projection is used, either Algorithm 1 or Algorithm 2 is followed to fit and query the surrogate.
Algorithm 1 Linear Projection CTESN Surrogate Fitting
  • for $y_i$ in $Y$ do
  •    Fit $W_{out}^i$ from $y_i = W_{out}^i \cdot r$ using Equation (4)
  • end for
  • Fit RBF mapping $W_{out}^i = \mathrm{RBF}(p_i)$, $\forall\, (W_{out}^i, p_i)$ pairs
  • To query a new parameter $p_{test}$:
  •  Step 1: $W_{out}^{test} = \mathrm{RBF}(p_{test})$
  •  Step 2: $y_{test}(t) = W_{out}^{test} \cdot r(t)$
Algorithm 2 NonLinear Projection CTESN Surrogate Fitting
  • for $y_i$ in $Y$ do
  •    Fit $\mathrm{RBF}_1^i$ as per $y_i = \mathrm{RBF}_1^i(r(t), W_{RBF_1}^i)$
  • end for
  • Fit $\mathrm{RBF}_2$ as per $W_{RBF_1}^i = \mathrm{RBF}_2(p_i)$, $\forall\, (W_{RBF_1}^i, p_i)$ pairs
  • To query a new parameter $p_{test}$:
  •  Step 1: $W_{RBF_1}^{test} = \mathrm{RBF}_2(p_{test})$
  •  Step 2: $y_{test}(t) = \mathrm{RBF}_1(W_{RBF_1}^{test}, r(t))$
Algorithms 1 and 2 both use an interpolating RBF in their procedure, and in this article, it is demonstrated that polynomial-augmented k-Nearest Neighbor (k-NN) RBFs deliver superior results in terms of generalization to new test problems for CTESN-based surrogate models, compared to standard RBFs. The reader is encouraged to read Appendix A for a detailed explanation of k-NN polynomial-augmented RBFs.
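To make the procedure concrete, the following Python sketch implements Algorithm 1, assuming SciPy's RBFInterpolator as the parameter-space interpolant; its neighbors and degree arguments provide exactly the k-NN restriction and polynomial augmentation discussed here (the function names and the choice of 4 neighbors are illustrative, not the paper's code). Algorithm 2 differs only in that the per-run least squares fit is replaced by fitting an RBF readout from r(t) to y_i, whose coefficients are then interpolated over the parameters in the same way.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def fit_lpctesn(params, trajectories, r_samples):
    """Algorithm 1: fit one linear readout per run, then interpolate over p.

    params:       (N, d) array of parameter sets p_i
    trajectories: length-N list, y_i of shape (N_ts_i, N_x)
    r_samples:    length-N list, r(t) evaluated at run i's timestamps,
                  each of shape (N_ts_i, N_r)
    """
    W_flat = []
    for y_i, r_i in zip(trajectories, r_samples):
        W_i, *_ = np.linalg.lstsq(r_i, y_i, rcond=None)   # y_i ~= r W_out^i
        W_flat.append(W_i.ravel())
    # k-NN, polynomial-augmented RBF map p -> W_out (see Appendix A);
    # `neighbors` must be at least the number of polynomial terms.
    return RBFInterpolator(params, np.asarray(W_flat), neighbors=4,
                           kernel="thin_plate_spline", degree=1)

def query_lpctesn(rbf, p_test, r_test, N_x):
    W_test = rbf(p_test[None, :]).reshape(-1, N_x)   # Step 1: W_out = RBF(p)
    return r_test @ W_test                           # Step 2: y(t) = W_out r(t)
```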

3. Applications and Results

CTESN surrogates are created for three problems. First, Robertson’s equations are solved, parametrizing the rates of reaction and initial condition separately. Then, the Sliding Basepoint system is modeled, comparing explicitly the difference due to the projection method and parametrizing the initial conditions of the problem. Finally, the POLLU system is solved, again parametrizing the initial conditions of the problem and comparing the results obtained using differing projection methods. Unless mentioned otherwise, the training data were sampled from the DoE space randomly.
In all results in this section, the Mean Absolute Error (MAE) is computed as
$$\mathrm{MAE} = \frac{1}{N_{test}} \sum_{j=1}^{N_{test}} \frac{1}{N_{ts_j}} \sum_{i=1}^{N_{ts_j}} \left| y_{j,pred}^{\,i} - y_{j,true}^{\,i} \right|, \qquad (9)$$
where $N_{test}$ is the number of test cases, and $y_{j,pred}^{\,i}$ and $y_{j,true}^{\,i}$ are the prediction and the true solution, respectively, at the $i$th timestep of the $j$th test case.
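A direct transcription of Equation (9) for one output variable might look as follows (a hypothetical helper, not code from the paper); note that it naturally handles the ragged timestep counts produced by adaptive stiff solvers:

```python
import numpy as np

def mean_absolute_error(preds, trues):
    """Equation (9) for one output variable.

    preds and trues are lists of 1-D arrays; case j has length N_ts_j, so
    ragged timestep counts across test cases are handled per case.
    """
    per_case = [np.mean(np.abs(p - t)) for p, t in zip(preds, trues)]
    return float(np.mean(per_case))
```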

3.1. Robertson’s Equations

The first demonstrated application is the surrogate modeling of the Robertson Equations [19]. They are written as:
$$\dot{y}_1 = -r_1 y_1 + r_3 y_2 y_3, \quad \dot{y}_2 = r_1 y_1 - r_3 y_2 y_3 - r_2 y_2^2, \quad \dot{y}_3 = r_2 y_2^2, \qquad (10)$$
where $y_1$, $y_2$, and $y_3$ represent the concentrations of the reactant species, and $r_1$, $r_2$, and $r_3$ represent the fixed rates of reaction. The system is usually subject to the initial conditions $[y_1, y_2, y_3] = [1, 0, 0]$.
Robertson's equations describe a standard rate process and are often used to benchmark numerical ODE solvers due to the stiffness of the system. More specifically, the long-time integration of Robertson's equations is known to be a challenging problem for numerical ODE solvers, and the system hence serves as a good test for surrogate models. In this work, models are created by parametrizing the system in two ways: first, the rates of reaction are parametrized, which has been the focus of several other papers on CTESNs [21,27]; second, the initial conditions of the system are parametrized.
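For reference, Equation (10) can be solved with any implicit stiff integrator; a short SciPy sketch at the nominal rates (taken here as the centres of the DoE ranges in Equation (11), an illustrative choice) is:

```python
import numpy as np
from scipy.integrate import solve_ivp

r1, r2, r3 = 0.04, 3e7, 1e4   # nominal rates: centres of the DoE in Eq. (11)

def robertson(t, y):
    y1, y2, y3 = y
    return [-r1 * y1 + r3 * y2 * y3,
            r1 * y1 - r3 * y2 * y3 - r2 * y2 ** 2,
            r2 * y2 ** 2]

# An implicit (stiff) method is essential; an explicit method such as RK45
# would be forced into prohibitively small timesteps.
sol = solve_ivp(robertson, (0.0, 1e5), [1.0, 0.0, 0.0],
                method="Radau", rtol=1e-8, atol=1e-10, dense_output=True)
print(len(sol.t), np.diff(sol.t)[:3])   # adaptive, unevenly spaced timesteps
```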

3.1.1. Parametrizing Rates of Reaction

In this section, the Design of Experiment (DoE) space for the rates is given as:
$$r_1 \in [0.032, 0.048], \quad r_2 \in [2.4, 3.6] \times 10^7, \quad r_3 \in [0.8, 1.2] \times 10^4. \qquad (11)$$
The focus is limited to presenting the results of the prediction of the variable $y_2$. This variable has a sharp transient that occurs on a time scale much shorter than those of the other two variables, and it is hence the cause of the system's stiffness.
The average MAE for y 2 , averaged across several test cases listed in Table 1, is shown in Table 2 sorted in descending order of generalization MAE. Predictions for y 2 for a particular test parameter are shown in Figure 2. For the same hyper-parameters, it can be observed that the absence of either the augmenting polynomial or the k-NN interpolation (i.e., using all collocation points to predict the solution in the RBF) significantly increases the error of prediction. It can also be seen that the nonlinear projection performs better on average than the linear projection.
In the next section, the initial condition of Robertson’s ODE is parametrized, and a similar error analysis is performed.

3.1.2. Parametrizing Initial Conditions

The initial condition of the system is parametrized. This problem can be challenging for lower initial values of y 1 (0) as it leads to a smaller and sharper transient in y 2 , which becomes more difficult to capture accurately by the surrogate. The DoE space of the initial condition is given as
$$y_1(0) \in [0.5, 1], \quad y_2(0) = 0, \quad y_3(0) = 1 - y_1(0). \qquad (12)$$
The condition for $y_3(0)$ follows from the requirement that the sum of the three quantities must always equal 1. Once more, the focus is on comparing the predicted results for $y_2$.
Table 3 tabulates the average MAE across all test cases for the problem, listed in Table 4, sorted in descending order. Figure 3 shows the comparison for a test parameter, across five different configurations of the CTESN surrogate model. A similar trend of hyper-parameter performance as seen in Table 2 is noted, in that the absence of the k-NN interpolation increases the error of the prediction. For the same hyper-parameters, the nonlinear projection once again achieves lower generalization MAE than the linear projection.

3.2. Sliding Basepoint Model

Next, the discussed methods are applied to the surrogate modeling of a realistic crash safety design problem. A system of ODEs proposed by Horváth et al. [28], which approximately models an automobile collision, is solved via a created surrogate model. This system, called the Sliding Basepoint model, has parameters that were fitted to realistic crash data and is assumed to model a collision problem accurately. Figure 4 depicts the system at its initial state and at a later time. The system is given as:
$$\dot{x}_1 = v_1, \quad \dot{v}_1 = \frac{1}{m_1} F_1(F_s, c_1, v_1), \quad \dot{x}_2 = v_2, \quad \dot{v}_2 = \frac{1}{m_2} F_2(F_s, c_2, v_2), \quad \dot{k} = -D_k \cdot P, \quad \dot{c}_1 = D_c \cdot P, \qquad (13)$$
where $m_1, m_2$, $F_1, F_2$, $x_1, x_2$, and $v_1, v_2$ refer to the masses of, total forces on, positions of, and velocities of the two bodies, respectively. In reality, $m_1$ reflects the mass of the chassis of the car, and $x_1$ behaves similarly to the deformation of the bumper. $F_s$ represents the spring force
$$F_s = k \cdot (x_2 - x_1) \qquad (14)$$
between the two masses, and $P$ represents the power of dissipation
$$P = |m_1 \cdot v_1^2|. \qquad (15)$$
Finally, the forces are computed based on the best-fit models described in the paper:
$$F_1(F_s, c_1, v_1) = F_s - c_1 \cdot |v_1| \cdot \mathrm{sign}(v_1), \qquad F_2(F_s, c_2, v_2) = \begin{cases} 0, & \text{if } F_s < c_2 \text{ or } |v_2| < v_f, \\ F_s - c_2 \cdot \mathrm{sign}(v_2), & \text{otherwise.} \end{cases} \qquad (16)$$
The values of the fixed parameters are given in Table 5, and the parameter values varied while testing the surrogate are listed in Table 6. The reader is referred to the paper [28] for further details. The system is subject to the initial conditions $[x_1, v_1, x_2, v_2, k, c_1] = [0, v_0, 0, 0, k_0, 0]$.
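A sketch of the right-hand side of Equations (13)-(16) in Python, using the constants as read from Table 5 and the P1 test point from Table 6; the integration window and tolerances are illustrative guesses, and the sign conventions follow the reconstruction above:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Fixed parameters as read from Table 5 (SI units).
m1, m2 = 1916.0, 24.9
D_k, D_c = 3.38, 0.459
c2, v_f = 145_000.0, 0.1

def sliding_basepoint(t, s):
    x1, v1, x2, v2, k, c1 = s
    F_s = k * (x2 - x1)                        # Equation (14)
    P = abs(m1 * v1 ** 2)                      # Equation (15)
    F1 = F_s - c1 * abs(v1) * np.sign(v1)      # Equation (16)
    F2 = 0.0 if (F_s < c2 or abs(v2) < v_f) else F_s - c2 * np.sign(v2)
    return [v1, F1 / m1, v2, F2 / m2, -D_k * P, D_c * P]

v0, k0 = 0.496 * 25, 4.46e6                    # test point P1 from Table 6
sol = solve_ivp(sliding_basepoint, (0.0, 0.2), [0.0, v0, 0.0, 0.0, k0, 0.0],
                method="Radau", rtol=1e-6, atol=1e-8)
```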
The mechanics of crash and impact problems are known to involve sharp transients and highly oscillatory behavior, making their numerical solutions slow and costly to compute across a wide range of parameters. Hence, CTESNs are a good candidate surrogate modeling tool for this problem.
In this work, the initial conditions of the problem ( v 0 and k 0 ) are parametrized to simulate different impact velocities and directions (the spring constant of the bumper can be assumed to be different in different directions). The state space for the problem is the vector [ x 1 , v 1 , x 2 , v 2 , k , c 1 ] .
The DoE space is the range:
$$v_0 \in [5, 25]\ \mathrm{m/s}, \quad k_0 \in [1, 10] \times 10^6\ \mathrm{kg/s^2}, \qquad (17)$$
chosen to represent a wide range of speeds and stiffness constants. The DoE space was sampled using a space-filling strategy [29], the surrogate models were trained on 100 data points, and the data were generated using a stiff ODE solver with the solver parameters given in [28].
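As one possible realization of such a strategy (the paper cites [29] but does not spell out the exact design; Latin hypercube sampling via SciPy is assumed here purely for illustration):

```python
from scipy.stats import qmc

# 100 space-filling samples of the DoE in Equation (17):
# v0 in [5, 25] m/s and k0 in [1e6, 1e7] kg/s^2.
sampler = qmc.LatinHypercube(d=2, seed=0)
doe = qmc.scale(sampler.random(n=100), l_bounds=[5.0, 1e6], u_bounds=[25.0, 1e7])
v0_samples, k0_samples = doe[:, 0], doe[:, 1]
```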
Table 7 shows the average MAE of the state variable $v_2$ for the linear and nonlinear projection methods, all other hyper-parameters being kept the same, for the test parameters listed in Table 6. $v_2$ was chosen to demonstrate the accuracy of the surrogates because, as is visible in Figure 5, it has sharp, nonlinear, oscillatory transients starting from the moment of collision, which are difficult for surrogate models to capture. It can be seen from Table 7 and Figure 5 that the LPCTESN performs more poorly on test cases than the NLPCTESN.
Of particular interest is the speedup obtained by using the surrogate: it provides up to a 200× speedup in predicting the solution, with the ODE solver taking roughly 0.02 s per solve of the ODE system. This is important because the transients in Figure 5 occur on the same time scale, $O(10^{-2}\ \mathrm{s})$. With a 200× speedup, a surrogate deployed on board a vehicle could judge the severity of a collision much more quickly by computing the impact forces from the system's output, allowing closed-loop control and safety measures to be deployed. In this case, it would effectively function as a digital twin for collision monitoring.

3.3. The POLLU Model

The CTESN approach is used to model the POLLU air pollution model developed at the Dutch National Institute of Public Health and Environmental Protection [20]. It consists of 20 species and 25 reactions, modeled by nonlinear ODEs of the form
$$\frac{dy}{dt} = P(t, y) - L(t, y)\, y, \qquad (18)$$
where y is the concentration vector (in ppm) of the reacting species, P is the production term, and L is a diagonal matrix representing the loss term for every species in the system. Table 8 shows the production and loss rates for each species of the system. The values of the rate constants r are given in Table A1. The reader is referred to the paper [20] for a complete description of the reaction system.
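To illustrate how the right-hand side of Equation (18) is assembled from Table 8, the following sketch shows the terms for two of the simpler species, using 0-based indexing (y[0] is $y_1$) and the Table A1 rate constants; the full system repeats this pattern for all 20 species:

```python
# Rate constants from Table A1 (note the negative exponents).
r10, r11, r12, r13 = 0.900e4, 0.220e-1, 0.120e5, 0.188e1

def f13(y):   # species 13: dy13/dt = P13 - L13 * y13
    return r10 * y[10] * y[0] - r11 * y[12]

def f14(y):   # species 14: dy14/dt = P14 - L14 * y14
    return r12 * y[9] * y[1] - r13 * y[13]
```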
The system is a common benchmark for stiff ODE solvers and represents a difficult problem due to the large number of species and ODEs involved. When such systems have to be solved at many grid points, say, in a computational mesh, they represent a very expensive computation; this example is hence an ideal application for testing surrogates of stiff ODEs. In this work, the initial conditions of the system are simultaneously parametrized as follows:
$$y_{2,0} \in [0.14, 0.26], \quad y_{4,0} \in [0.028, 0.052], \quad y_{7,0} \in [0.07, 0.13]. \qquad (19)$$
Here, $y_{2,0}$, $y_{4,0}$, and $y_{7,0}$ refer to the initial concentrations of the respective species. The initial conditions of the remaining species take their default values from the paper [20].
The training data consisted of 100 points sampled randomly within this DoE space. The reservoir size $N_r$ was set to 100, and the number of neighbors $N_{neigh}$ queried by the k-NN RBF was set to 10. Table 9 shows the mean absolute errors computed across several test cases (listed in Table 10) for a few species in the reaction, for the linear and nonlinear projection methods. It can be observed that the nonlinear projection CTESN outperforms the linear projection CTESN when all other hyper-parameters are kept the same. Figure 6 compares the predictions graphically; the LPCTESN prediction is much noisier than the NLPCTESN prediction at later times. This was also observed with Robertson's equations, where the predictions at larger time scales by the LPCTESN tended to become noisier.

4. Discussion and Conclusions

From the numerical experiments conducted, it was observed that polynomial-augmented k-Nearest Neighbor RBFs outperform standard RBFs in accuracy when used as part of the CTESN surrogate modeling algorithm. It was also observed that the nonlinear projection (NLPCTESN) method consistently outperformed the linear projection (LPCTESN) method, achieving superior accuracy for the same hyper-parameters on a variety of problems. The NLPCTESN method demonstrated accuracy across several problems with sharp transients, varying time scales, and long horizons of integration, whereas the LPCTESN method had a higher generalization error on all test examples. The surrogate was also shown to achieve a speedup of several orders of magnitude compared to an ODE solver on a realistic collision problem and could be used for several purposes such as design optimization and online collision severity monitoring.
There are several directions in which this work could be built upon. The CTESN is a black-box data-driven method, and future works need to investigate how well the model learns the physics of the problem, or apply physics-constrained modeling approaches to the outputs of the CTESN. The speedup of the surrogate model becomes very apparent when solving many instances of the ODE; this happens either when the ODE system is very large, or the small ODE system has to be solved repeatedly many times, such as in coupled ODE–PDE systems. Examples include chemically reacting flows similar to the POLLU system, in which a large stiff system of ODEs has to be solved at every compute node, or FEM-based crash solvers which require accurate modeling of sharp transients similar to those demonstrated in this work.

Funding

This study received no external funding.

Data Availability Statement

All data are synthetic and were generated by randomized trials; details of the generation are provided in the text.

Conflicts of Interest

The author declares no conflicts of interest. The author declares that the research was conducted in the absence of any commercial or financial relationships with Altair Engineering Inc. that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ESN: Echo State Network
CTESN: Continuous Time Echo State Network
LPCTESN: Linear Projection Continuous Time Echo State Network
NLPCTESN: NonLinear Projection Continuous Time Echo State Network
MAE: Mean Absolute Error
RBF: Radial Basis Function
k-NN: k-Nearest Neighbor
DoE: Design of Experiments
ODE: Ordinary Differential Equation
FEM: Finite Element Method

Appendix A. k-Nearest Neighbor Polynomial-Augmented Radial Basis Function Interpolation

Radial basis function (RBF) interpolation is a high-order accurate method that uses radial basis functions to create interpolants of unstructured data, which can be in an arbitrary number of dimensions. The scalar form of the interpolant is
$$s(x) = \sum_{i=1}^{N} w_i\, \phi(\lVert x - x_i \rVert_2), \qquad (A1)$$
where x i represent the points at which the solution is known, w i are the associated coefficients which are fitted, and ϕ is a kernel function. One of the most common kernel functions, also used in this work, is
$$\phi(\epsilon r) = (\epsilon r)^2 \log(\epsilon r), \qquad (A2)$$
where $\epsilon$ is the shape factor, an important hyper-parameter; in this work, it is set to 1. It can, however, greatly affect the generalization capability of the RBF. Cao et al. [30] discussed how high-order polynomial-augmented RBFs outperform standalone RBFs and reduce the dependency of the RBF on the shape factor $\epsilon$, and several works [31,32] have shown that adding high-order polynomials to the RBF greatly enhances the accuracy of the model. In this work, the polynomial-augmented RBF takes the form:
$$f(x) = K(x, y) A + P(x) B, \qquad (A3)$$
where $A$ and $B$ are fitted coefficient matrices, and $K$ and $P$ are constructed for a query point $x$ given data points $y$. If $N_d$, $N_O$, and $N_p$ are determined by the number of data points, the output dimension, and the polynomial order, respectively, then $A \in \mathbb{R}^{N_d \times N_O}$ and $B \in \mathbb{R}^{N_p \times N_O}$. $K \in \mathbb{R}^{N_d}$ can be constructed as
$$K_i(x) = \phi(\lVert x - y_i \rVert), \quad i = 1, \dots, N_d, \qquad (A4)$$
and $P \in \mathbb{R}^{N_p}$ can be constructed from the monomial basis
$$P(x) = 1 + x + \cdots \qquad (A5)$$
up to the chosen polynomial order.
A further improvement to RBF interpolation is adding a k-Nearest Neighbor constraint to the prediction process. This means that during prediction, only the k nearest collocation points (i.e., the nearest $y_i$) contribute to the prediction at the test point; in practice, these points are usually found using a tree-based spatial search. Intuitively, this method is useful when the interpolation spaces are large and the hyper-parameters of the RBF may not be optimally tuned, in which case collocation points far from the test point would otherwise contribute to the interpolant evaluation and corrupt the solution.
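Both ingredients are available off the shelf; for example, SciPy's RBFInterpolator exposes the polynomial degree and a neighbors argument that restricts each evaluation to the k nearest collocation points. A small sketch on synthetic scattered data (the data and parameter choices are illustrative only):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(2)
pts = rng.random((200, 2))                    # scattered collocation points y_i
vals = np.sin(4.0 * pts[:, 0]) * pts[:, 1]    # known values at those points

# Thin-plate-spline kernel (Equation (A2)) with a degree-1 polynomial tail;
# neighbors=10 restricts each evaluation to the 10 nearest collocation points
# and must be at least the number of polynomial terms for a well-posed fit.
interp = RBFInterpolator(pts, vals, kernel="thin_plate_spline",
                         degree=1, neighbors=10)
print(interp(rng.random((5, 2))))             # predictions at 5 test points
```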

Appendix B. POLLU Reaction Rates

The rates of reaction of the POLLU system are shown in Table A1.
Table A1. Reaction rates for the POLLU problem.
Reaction Rate | Value | Reaction Rate | Value
r1 | 0.350 × 10^0 | r14 | 0.163 × 10^5
r2 | 0.266 × 10^2 | r15 | 0.480 × 10^7
r3 | 0.120 × 10^5 | r16 | 0.350 × 10^-3
r4 | 0.860 × 10^-3 | r17 | 0.175 × 10^-1
r5 | 0.820 × 10^-3 | r18 | 0.100 × 10^9
r6 | 0.150 × 10^5 | r19 | 0.444 × 10^12
r7 | 0.130 × 10^-3 | r20 | 0.124 × 10^4
r8 | 0.240 × 10^5 | r21 | 0.210 × 10^1
r9 | 0.165 × 10^5 | r22 | 0.578 × 10^1
r10 | 0.900 × 10^4 | r23 | 0.474 × 10^-1
r11 | 0.220 × 10^-1 | r24 | 0.178 × 10^4
r12 | 0.120 × 10^5 | r25 | 0.312 × 10^1
r13 | 0.188 × 10^1 | |

Appendix C. Hyper-Parameter Search for Spectral Radius

Table A2 shows the average test error for different values of the spectral radius when parametrizing the initial conditions of Robertson's equations, as described in Section 3.1.2. The optimal value of the spectral radius was found to lie in a similar range for all experiments, and it was fixed at 0.01 for all results shown, for consistency.
Table A2. Different spectral radii used and the MAE in $y_2$, averaged across all test cases. Using a value of 0.1-0.01 was found to be optimal across all experiments.
Spectral Radius | Average Error ($y_2$) × 10^-7
0.0001 | 0.94
0.001 | 0.51
0.01 | 0.33
0.1 | 0.27
1.0 | 0.95
10.0 | 2.24

References

  1. Nascimento, R.G.; Viana, F.A.; Corbetta, M.; Kulkarni, C.S. A framework for Li-ion battery prognosis based on hybrid Bayesian physics-informed neural networks. Sci. Rep. 2023, 13, 13856. [Google Scholar] [CrossRef] [PubMed]
  2. Malekloo, A.; Ozer, E.; AlHamaydeh, M.; Girolami, M. Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights. Struct. Health Monit. 2022, 21, 1906–1955. [Google Scholar] [CrossRef]
  3. Yousif, M.Z.; Yu, L.; Hoyas, S.; Vinuesa, R.; Lim, H. A deep-learning approach for reconstructing 3D turbulent flows from 2D observation data. Sci. Rep. 2023, 13, 2529. [Google Scholar] [CrossRef] [PubMed]
  4. Liu, Y.; Zhao, T.; Ju, W.; Shi, S. Materials discovery and design using machine learning. J. Mater. 2017, 3, 159–177. [Google Scholar] [CrossRef]
  5. Liang, J.; Ding, Z.; Han, Q.; Wu, H.; Ji, J. Online learning compensation control of an electro-hydraulic shaking table using Echo State Networks. Eng. Appl. Artif. Intell. 2023, 123, 106274. [Google Scholar] [CrossRef]
  6. Li, X.; Bi, F.; Yang, X.; Bi, X. An echo state network with improved topology for time series prediction. IEEE Sensors J. 2022, 22, 5869–5878. [Google Scholar] [CrossRef]
  7. Xu, M.; Han, M.; Qiu, T.; Lin, H. Hybrid regularized echo state network for multivariate chaotic time series prediction. IEEE Trans. Cybern. 2018, 49, 2305–2315. [Google Scholar] [CrossRef] [PubMed]
  8. Duarte, A.L.; Eisencraft, M. Denoising of discrete-time chaotic signals using echo state networks. Signal Process. 2024, 214, 109252. [Google Scholar] [CrossRef]
  9. Ghazijahani, M.S.; Heyder, F.; Schumacher, J.; Cierpka, C. On the benefits and limitations of echo state networks for turbulent flow prediction. Meas. Sci. Technol. 2022, 34, 014002. [Google Scholar] [CrossRef]
  10. Gonon, L.; Grigoryeva, L.; Ortega, J.P. Approximation bounds for random neural networks and reservoir systems. Ann. Appl. Probab. 2023, 33, 28–69. [Google Scholar] [CrossRef]
  11. Hart, A.; Hook, J.; Dawes, J. Embedding and approximation theorems for echo state networks. Neural Netw. 2020, 128, 234–247. [Google Scholar] [CrossRef]
  12. Lambert, J.D. Numerical Methods for Ordinary Differential Systems; Wiley: New York, NY, USA, 1991; Volume 146. [Google Scholar]
  13. Ji, W.; Qiu, W.; Shi, Z.; Pan, S.; Deng, S. Stiff-pinn: Physics-informed neural network for stiff chemical kinetics. J. Phys. Chem. A 2021, 125, 8098–8106. [Google Scholar] [CrossRef]
  14. Cuomo, S.; Di Cola, V.S.; Giampaolo, F.; Rozza, G.; Raissi, M.; Piccialli, F. Scientific machine learning through physics-informed neural networks: Where we are and what’s next. J. Sci. Comput. 2022, 92, 88. [Google Scholar] [CrossRef]
  15. Kim, S.; Ji, W.; Deng, S.; Ma, Y.; Rackauckas, C. Stiff neural ordinary differential equations. Chaos Interdiscip. J. Nonlinear Sci. 2021, 31, 093122. [Google Scholar] [CrossRef]
  16. Chen, R.T.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural ordinary differential equations. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, QC, Canada, 2–8 December 2018. [Google Scholar]
  17. Goswami, S.; Jagtap, A.D.; Babaee, H.; Susi, B.T.; Karniadakis, G.E. Learning stiff chemical kinetics using extended deep neural operators. Comput. Methods Appl. Mech. Eng. 2024, 419, 116674. [Google Scholar] [CrossRef]
  18. Lu, L.; Jin, P.; Pang, G.; Zhang, Z.; Karniadakis, G.E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 2021, 3, 218–229. [Google Scholar] [CrossRef]
  19. Gobbert, M.K. Robertson’s Example for Stiff Differential Equations; Technical Report; Arizona State University: Tempe, AZ, USA, 1996. [Google Scholar]
  20. Verwer, J.G. Gauss–Seidel iteration for stiff ODEs from chemical kinetics. SIAM J. Sci. Comput. 1994, 15, 1243–1250. [Google Scholar] [CrossRef]
  21. Anantharaman, R.; Ma, Y.; Gowda, S.; Laughman, C.; Shah, V.; Edelman, A.; Rackauckas, C. Accelerating simulation of stiff nonlinear systems using continuous-time echo state networks. arXiv 2020, arXiv:2010.04004. [Google Scholar]
  22. Roberts, C.; Lara, J.D.; Henriquez-Auba, R.; Bossart, M.; Anantharaman, R.; Rackauckas, C.; Hodge, B.M.; Callaway, D.S. Continuous-time echo state networks for predicting power system dynamics. Electr. Power Syst. Res. 2022, 212, 108562. [Google Scholar] [CrossRef]
  23. Anantharaman, R.; Abdelrehim, A.; Jain, A.; Pal, A.; Sharp, D.; Edelman, A.; Rackauckas, C. Stably Accelerating Stiff Quantitative Systems Pharmacology Models: Continuous-Time Echo State Networks as Implicit Machine Learning. IFAC-PapersOnLine 2022, 55, 1–6. [Google Scholar] [CrossRef]
  24. Jaeger, H. Echo state network. Scholarpedia 2007, 2, 2330. [Google Scholar] [CrossRef]
  25. Viehweg, J.; Worthmann, K.; Mäder, P. Parameterizing echo state networks for multi-step time series prediction. Neurocomputing 2023, 522, 214–228. [Google Scholar] [CrossRef]
  26. Lukoševičius, M. A practical guide to applying echo state networks. In Neural Networks: Tricks of the Trade, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 659–686. [Google Scholar]
  27. Rackauckas, C.; Gwozdz, M.; Jain, A.; Ma, Y.; Martinuzzi, F.; Rajput, U.; Saba, E.; Shah, V.B.; Anantharaman, R.; Edelman, A.; et al. Composing modeling and simulation with machine learning in Julia. In Proceedings of the 2022 Annual Modeling and Simulation Conference (ANNSIM), San Diego, CA, USA, 18–20 July 2022; pp. 1–17. [Google Scholar]
  28. Horváth, A.; Hatwágner, M.F.; Harmati, I.Á. Searching for a nonlinear ODE model of vehicle crash with genetic optimization. In Proceedings of the 2012 7th IEEE International Symposium on Applied Computational Intelligence and Informatics (SACI), Timisoara, Romania, 24–26 May 2012; pp. 131–136. [Google Scholar]
  29. Morris, M.D.; Mitchell, T.J. Exploratory designs for computational experiments. J. Stat. Plan. Inference 1995, 43, 381–402. [Google Scholar] [CrossRef]
  30. Cao, D.; Li, X.; Zhu, H. A polynomial-augmented RBF collocation method using fictitious centres for solving the Cahn–Hilliard equation. Eng. Anal. Bound. Elem. 2022, 137, 41–55. [Google Scholar] [CrossRef]
  31. Jankowska, M.A.; Karageorghis, A.; Chen, C.S. Improved Kansa RBF method for the solution of nonlinear boundary value problems. Eng. Anal. Bound. Elem. 2018, 87, 173–183. [Google Scholar] [CrossRef]
  32. Yao, G.; Chen, C.S.; Zheng, H. A modified method of approximate particular solutions for solving linear and nonlinear PDEs. Numer. Methods Partial Differ. Equ. 2017, 33, 1839–1858. [Google Scholar] [CrossRef]
Figure 1. Depiction of a standard Echo State Network.
Figure 2. Figures show the time history for $y_2$ for a test parameter set (P2 from Table 1). Each trial mentioned below is referenced from Table 2. The NLPCTESN prediction is the best out of all models. (a) Trial 1 (b) Trial 2 (c) Trial 3 (d) Trial 4 (e) Trial 5.
Figure 3. Figures show time history for $y_2$ for a test parameter set (P4 from Table 4). Each trial mentioned below is referenced from Table 3. The NLPCTESN performs the best out of all models. (a) Trial 1 (b) Trial 2 (c) Trial 3 (d) Trial 4 (e) Trial 5.
Figure 4. Sliding Basepoint system for collision modeling.
Figure 5. Solution $v_2$ for several tests with different test parameters, the average error of which is shown in Table 7. The left and right columns show results from the linear and nonlinear projections, respectively. Table 6 defines the values P3-P5 mentioned below. (a) LPCTESN—P3 (b) NLPCTESN—P3 (c) LPCTESN—P4 (d) NLPCTESN—P4 (e) LPCTESN—P5 (f) NLPCTESN—P5.
Figure 6. Solutions $y_1$, $y_2$, $y_{14}$, and $y_{20}$ for a test parameter (P2 from Table 10). The left and right columns contain results from the linear and nonlinear projections, respectively. The NLPCTESN outperforms the LPCTESN in this example. (a) LPCTESN—$y_1$ (b) NLPCTESN—$y_1$ (c) LPCTESN—$y_2$ (d) NLPCTESN—$y_2$ (e) LPCTESN—$y_{14}$ (f) NLPCTESN—$y_{14}$ (g) LPCTESN—$y_{20}$ (h) NLPCTESN—$y_{20}$.
Table 1. Test parameter values for rate parametrization of Robertson's ODE system.
Parameter Set | r1/0.04 | r2/(3 × 10^7) | r3/(1 × 10^4)
P1 | 0.95 | 1.05 | 0.95
P2 | 0.9 | 1.1 | 0.9
P3 | 1.1 | 0.9 | 1.1
P4 | 1.03 | 0.99 | 1.04
Table 2. Average Mean Absolute Error (MAE) in $y_2$, averaged across all test cases.
Trial No. | N_r | N_data | N_neigh | Polynomial Augmentation | Projection Method | Avg. MAE (× 10^-7)
1 | 50 | 50 | 4 | N | Linear | 17.6
2 | 50 | 50 | All | Y | Linear | 1.40
3 | 50 | 500 | 4 | Y | Linear | 1.29
4 | 50 | 50 | 4 | Y | Linear | 0.76
5 | 50 | 50 | 4 | Y | Nonlinear | 0.32
Table 3. Average Mean Absolute Error (MAE) in $y_2$, averaged across all test cases when parametrizing the initial condition.
Trial No. | N_r | N_data | N_neigh | Projection Method | Avg. MAE (× 10^-7)
1 | 500 | 50 | All | Linear | 158
2 | 50 | 50 | All | Linear | 28.6
3 | 500 | 50 | 4 | Linear | 25.2
4 | 50 | 50 | 4 | Linear | 6.23
5 | 50 | 50 | 4 | Nonlinear | 0.99
Table 4. Test parameter values for initial condition parametrization of Robertson's ODE system.
Parameter Set | y1 | y2 | y3
P1 | 0.6 | 0 | 0.4
P2 | 0.7 | 0 | 0.3
P3 | 0.85 | 0 | 0.15
P4 | 0.9 | 0 | 0.1
Table 5. Parameter values taken from [28] with their SI units.
m1 (kg) | m2 (kg) | D_k | D_c | c2 (N) | v_f (m/s)
1916 | 24.9 | 3.38 | 0.459 | 145,000 | 0.1
Table 6. Test parameter values for the collision modeling problem.
Parameter Set | v0/25 | k0/10^6
P1 | 0.496 | 4.46
P2 | 0.884 | 8.45
P3 | 0.424 | 1.43
P4 | 0.283 | 3.75
P5 | 0.9 | 9.95
Table 7. Average Mean Absolute Error (MAE) in $v_2$, averaged across all test cases in Table 6.
Trial No. | N_r | N_neigh | Projection Method | Avg. MAE (v2)
1 | 50 | 4 | Linear | 1.02
2 | 50 | 4 | Nonlinear | 0.51
Table 8. Species involved in the POLLU reaction system, with their production and loss rates, derived from [20].
Species (y) | Production Rate (P) | Loss Rate (L)
1 | r2 y2 y4 + r3 y5 y2 + r9 y11 y2 + r11 y13 + r12 y10 y2 + r22 y19 + r25 y20 | r1 + r10 y11 + r14 y6 + r23 y4 + r24 y19
2 | r1 y1 + r21 y19 | r2 y4 + r3 y5 + r9 y11 + r12 y10
3 | r1 y1 + r17 y4 + r19 y16 + r22 y19 | r15
4 | r15 y3 | r2 y2 + r16 + r17 + r23 y1
5 | r4 y7 + r4 y7 + r6 y7 y6 + r7 y9 + r13 y14 + r20 y17 y6 | r3 y2
6 | r3 y5 y2 + r18 y16 + r18 y16 | r6 y7 + r8 y9 + r14 y1 + r20 y17
7 | r13 y14 | r4 + r5 + r6 y6
8 | r4 y7 + r5 y7 + r6 y7 y6 + r7 y9 | 0.0
9 | 0.0 | r7 + r8 y6
10 | r7 y9 + r9 y11 y2 | r12 y2
11 | r8 y9 y6 + r11 y13 | r9 y2 + r10 y1
12 | r9 y11 y2 | 0.0
13 | r10 y11 y1 | r11
14 | r12 y10 y2 | r13
15 | r14 y1 y6 | 0.0
16 | r16 y4 | r18 + r19
17 | 0.0 | r20 y6
18 | r20 y17 y6 | 0.0
19 | r23 y1 y4 + r25 y20 | r21 + r22 + r24 y1
20 | r24 y19 y1 | r25
Table 9. Mean Absolute Error (MAE) for the POLLU problem, averaged across all test cases in Table 10.
Projection Method | MAE y1 (× 10^-5) | MAE y2 (× 10^-5) | MAE y4 (× 10^-4) | MAE y14 (× 10^-8) | MAE y20 (× 10^-6)
Linear | 75.1 | 76.3 | 7.68 | 51.5 | 22.1
Nonlinear | 2.60 | 2.78 | 3.01 | 1.76 | 2.59
Table 10. Test parameter values for the POLLU air pollution modeling problem.
Parameter Set | y_{2,0} | y_{4,0} | y_{7,0}
P1 | 0.192 | 0.0464 | 0.114
P2 | 0.258 | 0.04012 | 0.103
P3 | 0.188 | 0.0332 | 0.0938
P4 | 0.208 | 0.0508 | 0.115
P5 | 0.228 | 0.0488 | 0.0714
