1. Introduction
Partial differential equations (PDEs) are ubiquitous in all branches of science and engineering. These mathematical models are generally derived from conservation laws, sound physical arguments, and empirical heuristics drawn from experiments by an insightful researcher. However, many complex systems (for example, in neuroscience, weather forecasting, disease control, finance, and ecology) still await quantitative models built on physical arguments. The opposite approach is to build models by observing complex patterns in enormous amounts of readily available data, which can loosely be termed reverse-engineering nature. This approach has been used before, for example by Kepler, who approximated elliptic orbits solely from planetary positions. Such reverse-engineering is particularly appropriate today, as we can leverage computers to identify patterns in observed data that might not be comprehensible to humans. Recent progress in machine learning and data science [1,2], combined with an increase in computational power, has generated innovative algorithms to distill physical models from data. These algorithms are termed data-driven tools, as they can infer correlations and extract patterns from high-dimensional data or measurements. Some popular data-driven tools for extracting dynamical systems from data are artificial neural networks, compressive sensing, sparse optimization, and symbolic regression approaches based on evolutionary algorithms.
An artificial neural network (ANN) is a machine-learning technique, also referred to as deep learning when multiple hidden layers are used, that transforms input features through nonlinear interactions and maps them to output target features [3,4]. ANNs emulate the functioning of the human brain through several hidden layers consisting of so-called neurons, with specific weights assigned to each of them. ANN models have gained popularity in recent times due to their superior performance in modeling complex nonlinear interactions across a wide range of applications, including image processing [5], video classification [6], and autonomous driving [7]. Extreme learning machines (ELMs) have been used as auto-encoders to extract the bases for the reconstruction of fine-scale information from limited data [8]. The major drawback of the deep-learning approach is that the obtained model is not readily open to physical inference or interpretability and is generally considered a black-box model.
Compressive sensing (CS) [9,10] has been applied in signal processing to seek the sparsest solution, i.e., the solution with the fewest non-zero basis functions explaining the data. Sparsity-promoting optimization techniques [11,12] play a fundamental role in CS and are generally categorized under basis pursuit algorithms [13]. In sparse optimization, the least squares (LS) objective function is regularized by adding constraints to the cost function that shrink large coefficients and avoid overfitting, thereby promoting sparsity in feature selection. The least absolute shrinkage and selection operator (LASSO) [11,14] is one of the most popular regularized LS regression methods. LASSO performs feature selection through an $L_1$ penalty added to the LS objective function to recover sparse solutions [15]. Ridge regression [16] is another regularized variant, in which an $L_2$ penalty is added to the LS objective function. The $L_2$ penalty helps in grouping multiple correlated basis functions and increases robustness and convergence stability for ill-conditioned systems. The elastic net approach [17,18] is a hybrid of LASSO and ridge regression, combining the strengths of both algorithms (the three objectives are written out below).
Building on these advances, the sequentially thresholded least squares (STLS) algorithm [19] was developed, in which a hard threshold on the non-zero coefficients is applied recursively to find sparse solutions (sketched below). This algorithm was leveraged to form a framework called sparse identification of nonlinear dynamics (SINDy) [19] to extract ordinary differential equations representing the underlying phenomena. This seminal work re-envisioned model discovery from the perspective of sparse optimization and compressive sensing, recovering various benchmark dynamical systems such as the chaotic Lorenz system and vortex shedding behind a cylinder. However, this framework faces challenges with spatio-temporal data, high-dimensional measurements, and highly correlated basis functions. This limitation was addressed by the sequential threshold ridge regression (STRidge) algorithm, which forms a framework called PDE functional identification of nonlinear dynamics (PDE-FIND) [20]. PDE-FIND was applied to high-dimensional spatio-temporal measurements representing various nonlinear dynamics, and it also performs reasonably well when noise is added to the data or measurements. The sparse optimization methods discussed above generally have a free parameter associated with the penalty term, which the user tunes to recover multiple candidate models of varying parsimony.
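The hard-thresholding idea behind STLS can be illustrated with a minimal sketch (our own illustration, not the reference implementation of [19]; the threshold `tol` and iteration count are assumed free parameters):

```python
import numpy as np

def stls(Theta, dudt, tol=0.1, n_iter=10):
    """Sequentially thresholded least squares: repeatedly solve an
    ordinary LS problem and zero out coefficients below a hard threshold."""
    xi = np.linalg.lstsq(Theta, dudt, rcond=None)[0]  # initial LS fit
    for _ in range(n_iter):
        small = np.abs(xi) < tol      # coefficients deemed negligible
        xi[small] = 0.0               # enforce sparsity by hard thresholding
        big = ~small
        if not big.any():
            break
        # refit using only the surviving basis functions
        xi[big] = np.linalg.lstsq(Theta[:, big], dudt, rcond=None)[0]
    return xi
```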
Other works exploiting $L_1$-regularized LS minimization recovered various nonlinear PDEs [21,22] using both high-fidelity and distorted (noisy) data. Additionally, limited and distorted data samples were used to recover chaotic and high-dimensional systems [23,24]. The Bayes information criterion was used to rank recovered models of differing complexity for completely new systems, thereby providing confidence in the recovered model [25]. The frameworks discussed above assume that the structure of the model to be recovered is sparse in nature; that is, only a small number of terms govern the dynamics of the system. This assumption holds for many physical systems in science and engineering.
Symbolic regression (SR) approaches based on evolutionary computation [26,27] are another class of frameworks that can find analytically tractable functions and have been applied to system identification problems. Traditional linear and nonlinear regression algorithms assume a mathematical form and only find the parameters that best fit the data. SR approaches, on the other hand, aim to simultaneously find the parameters and learn the functional form of the model from observed data. This is generally achieved by searching over mathematical expressions built from a preselected set of arithmetic operators while minimizing an error metric. The optimal model is finally selected from a Pareto front analysis trading off accuracy against model complexity (see the sketch after this paragraph). Genetic programming (GP) [26] is a popular choice leveraged by most SR frameworks. GP is an evolutionary computation technique inspired by Darwin's theory of natural evolution. A seminal work extracted nonlinear dynamics from input–output response using the GP approach [28]. GP-based SR approaches have since appeared in various system identification problems [29,30,31]. Furthermore, GP has been applied to identify closed-loop feedback control for turbulent separated flows [32,33]. Several improved versions of GP have been proposed recently, for instance, gene expression programming (GEP) [27], parse matrix evolution (PME) [34], and linear genetic programming (LGP) [35]. GEP has recently been exploited to recover functional models approximating the nonlinear behavior of stress tensors in Reynolds-averaged Navier–Stokes (RANS) and large eddy simulation (LES) turbulence models [36,37]. Generally, GP-based SR approaches can identify models with complex nonlinear compositions given enough computational time.
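The Pareto front selection step common to these SR approaches amounts to a non-dominated filter over (complexity, error) pairs; a minimal sketch is given below (our own illustration; `models` is an assumed list of `(complexity, test_error, expression)` tuples):

```python
def pareto_filter(models):
    """Keep only non-dominated models: discard any model for which another
    model is at least as simple and strictly more accurate."""
    front = []
    for c, e, expr in sorted(models):          # ascending complexity, then error
        # keep only if strictly better error than every simpler kept model
        if all(e < kept_e for _, kept_e, _ in front):
            front.append((c, e, expr))
    return front
```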
Recently, researchers have proposed fast and deterministic SR approaches that build on results from CS and sparse optimization. The first non-evolutionary SR algorithm was fast function extraction (FFX) [38], in which feasible models are confined to generalized linear models (GLMs) [39] and the best bases and their corresponding coefficients are found by pathwise regularized learning, also called the elastic net algorithm [17] (a sketch of this step is given below). The final models are selected through non-dominated filtering with respect to accuracy and model complexity. FFX draws influence from both GP and CS to distill better models from data. FFX solves quadratic optimization problems; thus, the computational cost increases quadratically with the number of bases in the search space. FFX and GP have been applied to various problems, such as dynamical system recovery and solar power prediction based on energy production data [40]. Elite base regression (EBR) [41] is a recent advancement in non-evolutionary computation in which only elite bases, selected by measuring the correlation coefficient of each basis function with respect to the target, span the search space; a parse matrix encoding scheme is then used to propagate the algorithm further and recover the mathematical model. Prioritized grammar enumeration (PGE) [42] is another deterministic approach, in which the genetic operators and random draws of GP are replaced with grammar production rules and systematic choices. PGE also aims for a substantial reduction of the search space.
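As a rough illustration of pathwise regularized learning, one can sweep the elastic net regularization path and record a model at each step. The sketch below uses scikit-learn rather than the FFX package itself, and the mixing parameter `l1_ratio=0.95` is our assumption, not the value used in [38]:

```python
import numpy as np
from sklearn.linear_model import enet_path

def pathwise_models(Theta, dudt, l1_ratio=0.95):
    """Sweep the elastic net path from strong to weak regularization;
    each step typically activates more non-zero bases.
    Theta: (n_samples, n_bases) candidate library; dudt: (n_samples,) target."""
    alphas, coefs, _ = enet_path(Theta, dudt, l1_ratio=l1_ratio)
    models = []
    for alpha, xi in zip(alphas, coefs.T):   # coefs has shape (n_bases, n_alphas)
        nnz = int(np.count_nonzero(xi))      # model complexity = active bases
        models.append((nnz, alpha, xi))
    return models
```

These candidate models of increasing complexity are exactly what the non-dominated filtering step then prunes against test accuracy.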
In this paper, we demonstrate the use of FFX, a deterministic symbolic regression algorithm, to identify and recover target PDEs representing both linear and nonlinear dynamical systems. We build candidate basis functions consisting of partial derivative terms of varying orders approximated by central finite-difference formulas (a sketch of this step follows below). For testing, we use exact analytical solutions of the PDEs as input data and use the FFX Python package [38] to demonstrate its feasibility. First, we recover simple linear PDEs such as the wave and heat equations. We then recover higher-order nonlinear PDEs such as the Burgers, KdV, and Kawahara equations. We further add noise to the input data originating from the same PDEs to test the robustness of FFX.
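The construction of the candidate library from a space-time solution field might look as follows (a minimal sketch under our own choice of second-order central stencils and a small illustrative set of bases; the actual library used in the experiments is larger):

```python
import numpy as np

def build_library(u, dx, dt):
    """Build a candidate library for u_t = f(u, u_x, u_xx, ...) from a
    solution field u[ix, it], using second-order central differences
    at interior grid points."""
    u_t  = (u[1:-1, 2:] - u[1:-1, :-2]) / (2 * dt)        # target: time derivative
    ui   = u[1:-1, 1:-1]                                  # u itself
    u_x  = (u[2:, 1:-1] - u[:-2, 1:-1]) / (2 * dx)        # first spatial derivative
    u_xx = (u[2:, 1:-1] - 2 * ui + u[:-2, 1:-1]) / dx**2  # second spatial derivative
    # flatten each field into one column of the feature matrix
    Theta = np.column_stack([f.ravel() for f in (ui, u_x, u_xx, ui * u_x)])
    names = ["u", "u_x", "u_xx", "u*u_x"]
    return Theta, u_t.ravel(), names
```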
The rest of the paper is organized as follows. Section 2 gives a brief description of the FFX algorithm. In Section 3, FFX is tested on different canonical PDEs; we demonstrate the performance and robustness of FFX by inferring dynamics from both clean and noisy input data. Section 4 summarizes our findings and the limitations that need further investigation.
4. Summary and Conclusions
Machine-learning methods can be extremely useful for researchers in inferring complex models from data or measurements. FFX mainly leverages recent advances in compressive sensing to learn analytically tractable mathematical models using only data, which makes FFX a deterministic symbolic regression approach. The core of FFX is pathwise regularized learning, also called the elastic net algorithm, a popular sparse regression approach. Sparse regression methods regularize the ordinary least squares regression problem by adding $L_1$ and $L_2$ penalty terms to discourage overfitting and tame the coefficients. This regularization of the LS problem promotes sparsity, thereby recovering only a subset of the candidate bases to explain the data. This property is what makes sparse regression approaches such as LASSO, ridge, and their variants fundamental to compressive sensing. FFX enumerates the given basis functions to add nonlinear terms and uses pathwise regularized learning to extract several models of varying complexity (number of bases) and prediction accuracy. These models are then filtered using non-dominated sorting, favoring lower-complexity models with the best test accuracy. The enumeration of basis functions to enrich the library and the non-dominated sorting of learned models are inspired by GP, while the pathwise regularized learning at the core, a regularized LS problem, is inspired by CS.
In this paper, we demonstrate the use of FFX to extract different linear and nonlinear PDEs by exploring patterns in data. In particular, we build and enumerate a large set of candidate features from the input data to leverage the sparse optimization algorithm of FFX and discover parsimonious equations. Numerical experiments on several canonical PDEs show that FFX is a promising machine-learning technique for capturing the true features and their associated coefficients accurately. Additionally, the input sensor data is slightly distorted by adding noise before computing the candidate basis functions to test the robustness of the algorithm. FFX works reasonably well up to 1% noise but fails for higher noise levels, although the wave equation is recovered at noise levels up to 25%, higher than for the other PDEs. This aspect of the algorithm needs further investigation, as real-world data sets might have higher noise levels. We note, however, that adding noise to the input data, which is later used to enumerate the features, might be more challenging than adding noise directly to the candidate basis functions (not shown in this study). Additionally, there is a challenge in identifying the desired model when the solution satisfies multiple equations. This is due to the sparsity-promoting behavior of sparse regression methods, which converge to the optimal model containing the fewest bases needed to explain the data; e.g., FFX finds the wave equation even though the solution satisfies both the wave and Kawahara equations. Overall, FFX is a deterministic and scalable algorithm that can be exploited as a potential data-driven tool for recovering hidden physical structures or parameterizations representing high-dimensional systems, such as turbulent geophysical flows, traffic models, and weather forecasting, using only data.