Article

Identification of Nonlinear State-Space Systems via Sparse Bayesian and Stein Approximation Approach

1 College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
2 Department of Mathematics and Computer Science, Hengshui University, Hengshui 053000, China
3 Institute of Electrical Engineering, Yanshan University, Qinhuangdao 066004, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(19), 3667; https://doi.org/10.3390/math10193667
Submission received: 20 August 2022 / Revised: 18 September 2022 / Accepted: 30 September 2022 / Published: 6 October 2022
(This article belongs to the Special Issue Mathematics-Based Methods in Graph Machine Learning)

Abstract: This paper is concerned with the parameter estimation of non-linear discrete-time systems in state-space form from noisy state measurements. A novel sparse Bayesian convex optimisation algorithm is proposed for parameter estimation and prediction. The method fully considers the approximation method, the parameter prior and posterior, and adds Bayesian sparse learning and optimization for explicit modeling. Different from previous identification methods, the main identification challenge resides in two aspects: first, a new objective function is obtained by our improved Stein approximation method in the convex optimization problem, so as to capture more information on particle approximation and convergence; second, another objective function is developed with $L_1$-regularization, which is a sparse method based on recursive least squares estimation. Compared with previous studies, the new objective function contains more information and can easily mine more important information from the raw data. Three simulation examples are given to demonstrate the proposed algorithm's effectiveness. Furthermore, the performances of these approaches are analyzed, including the root mean squared error (RMSE) of the parameter estimates, parameter sparsity, and the prediction of states and outputs.

1. Introduction

In the real world, non-linear systems are commonplace, appearing in social networks, industrial systems, biological systems, finance, and chemical engineering. The identification of non-linear systems is widely acknowledged for its importance and difficulty [1,2]; examples include fractional-order systems [3,4,5], neural networks [6], non-linear ARMAX (NARMAX) [7], and Hammerstein–Wiener [8] models. The non-linear state-space model is a common expression for all of these non-linear systems. A common method for identifying non-linear state-space models is to look for a concise description that is consistent with some non-linear terms (kernel functions) based on data [9,10]. Classic functional decomposition methods, such as Volterra expansion, Taylor polynomial expansion, or Fourier series [9,11], provide a few options for kernel functions. These methods are founded on the idea that there is a finite set of fundamental kernel functions whose linear combination can be utilized to describe the dynamics of a non-linear state-space system. However, when more kernel functions are involved, the efficiency of this kind of method decreases rapidly. A promising way to identify non-linear state-space systems is the probabilistic approach [12,13,14,15], which has received a lot of attention over the past few years. Earlier probabilistic identification methods for non-linear state-space systems, such as regression methods, maximum likelihood (ML) [16], and expectation-maximization (EM) [17], mainly utilize gradient descent while ignoring the information in the parameter identification and approximation process.
To further improve the efficiency of parameter identification, many identification techniques combining gradient descent and Bayesian approximation have been presented recently. In particular, those based on variational inference have attracted more attention due to their superior performance, such as variational inference in the Gaussian process (GP) [12], the Gaussian-process state-space model (GP-SSM) [13], deep variational Bayes filters (DVBF) [14], and optimistic inference control (OIC) [15]. Various approximate expectation-maximization (EM)-based techniques have also been studied. In [18], the authors applied the EM method to estimate the parameters of a non-linear state-space model of disease dynamics. In [19,20], both Bayesian and ML estimation strategies are employed in addition to a competitive GP approximation method. For learning, a Monte Carlo technique and the EM method are utilized in [21], which also includes a variational method using the same GP approximation.
Notably, none of these methods take the prediction's resilience into account, and all of them presuppose that the structure of the non-linear state-space model is unknown. However, in reality, many parameters lack precision for reasons such as a slow convergence rate or falling into a local optimum. In this article, we concentrate on the optimisation problem, since it is more difficult and has numerous real-world implications. For example, in a pilot-plant pH process, the Hammerstein–Wiener model is used to predict the pH value of the neutral liquid [22]. In addition, the problem of non-linear state-space systems with missing precision is related to the complexity of the parameters. Sparse representation can be used to deal with sparse solutions of linear regression equations, which can effectively reduce the complexity of the parametric solution. There are just a few papers that deal with the sparse representation identification issue and multiple constraints on the system's parameters [23,24]. In [25], a convex constraint is added to the sparse parameter representation in the non-linear identification algorithm. These works only consider the selection of sparse methods and do not consider conditional restrictions in the process of parameter approximation.
Unlike the previous work, our suggested framework allows us to include more constraints on the corresponding model parameters, e.g., inequality constraints, prior information, and Stein discrepancy constraints. The dynamical system discovery issue is reimagined in this paper from the perspectives of sparse regression [26,27,28], compressed sensing [29,30,31], Stein approximation theory [32], and convex optimization [23]. The use of sparsity approaches is relatively new [33,34,35] in dynamical systems. Most non-linear state-space systems have only a few important dynamical elements, resulting in a sparse parameter set in a high-dimensional non-linear function space.
Although these efforts are focused on non-linear system identification using state-space models, they have some significant flaws. First, these methods carry out system identification in a two-stage way, that is, they compute the posterior objective function with the parameters and then learn the system parameters under noise. The computation of the posterior objective function and the learning of the system parameters are two separate processes in these two-stage approaches, and their parameters cannot be adjusted together. System identification performs worse as a result. The ideal relationship between these two processes would be one of complementarity: learning the system parameters should contribute to computing the posterior objective function, and the updated posterior objective function should contribute to learning the system parameters under noise. For non-linear systems in state-space form, ref. [36] addresses the recursive joint inference and learning problem, and a reduced-rank formulation of GP-SSMs is used to model the system as a Gaussian-process state-space model (GP-SSM). In [37], a two-stage Bayesian optimization framework is introduced, which consists of a representation of the objective function in a low-dimensional parameter space and surrogate model selection in the reduced space. In these studies, only the posterior objective function is considered, which cannot achieve effective interactive learning and may also compromise the optimization performance for the system parameters.
To address the problems of existing identification methods for non-linear state-space systems, we propose the non-linear state-space identification algorithm with Sparse Bayesian and Stein Approach (NSSI-SBSA), an optimization approach that improves the accuracy of system parameter identification and posterior distribution computation simultaneously in an integrated structure, as opposed to the conventional two-stage method. In our new method, we select the least absolute shrinkage and selection operator (LASSO) [26] as the parameter sparsity algorithm. The sparse parameter is taken into the posterior distribution, which reduces the complexity of the posterior distribution. Compared with sparse methods such as least angle regression (LARS) [38], sequentially thresholded least squares (STLS) [39], and basis pursuit denoising (BPDN) [40], LASSO is more suitable for high-dimensional data. In our article, the sparse model identification result strikes a natural balance between model sparsity and precision and prevents the model from being overfit to the data.
From a statistical perspective, we discuss how Bayesian technology, an optimization method, and a Stein approximation strategy can mitigate the difficulties of large correlations in the state matrix. The most important contributions of this work are as follows:
(1)
The NSSI-SBSA algorithm is proposed. In the algorithm, the sparse method is used for parameter estimation and prediction. The parameter prior, Bayesian sparse learning, and optimisation are used in an integrated computing framework instead of the classical two-stage method. The sparse model identification result strikes a natural balance between model sparsity and precision, preventing the model from being overfit to the data.
(2)
A non-convex optimisation problem is constructed for the non-linear state-space system identification issue with additive noise. Compared with other related methods, we not only take evidence maximisation as an objective function, but also consider the Stein discrepancy of the parameters as another objective function in the non-convex optimization problem. The two functions are integrated into one objective function containing more information. It can capture more important information from the raw data and reduce the complexity of the parametric solution.
The rest of this paper is organized as follows. Section 2 describes the problem statement and background. Section 3 constructs the model in a Bayesian framework. In Section 4, non-convex optimisation with a Stein constraint for identification is introduced. Three numerical illustrations, including the Narendra-Li model, a NARX model, and kernel state-space models (KSSM), are presented in Section 5. Finally, we give some closing remarks in Section 6.

2. Problem Statement and Background

We consider the following non-linear state-space model [25]
$$\pi(x_{ik}) = f_i(x_k, u_k) + \nu_{ik} = \sum_{r=1}^{N} \theta_{ir} f_{ir}(x_k, u_k) + \nu_{ik} \qquad (1)$$
$$o_{ik} = g_i(x_k, u_k) + e_{ik},$$
where $x_k = [x_{1k}, \ldots, x_{n_x,k}]$ is the state variable at time step $k$, and $u_k$ is the external control input. When the system is time-continuous, $\pi(x_{ik}) = \dot{x}_{ik}$; when the system is time-discrete, $\pi(x_{ik}) = x_{ik}$ or $x_{ik} - x_{i,k-1}$; $\nu_{ik} \sim \mathcal{N}(0, \sigma_{ik}^2)$ is noise (when $i \neq k$, $\sigma_{ik} = 0$), which is set to be i.i.d. Gaussian. $f_{ir}(x_k, u_k): \mathbb{R}^{n_x+n_u} \to \mathbb{R}$ and $g_i(x_k, u_k): \mathbb{R}^{n_x+n_u} \to \mathbb{R}$ are Lipschitz continuous functions, and $n_x$ and $n_u$ are the dimensions of $x_k$ and $u_k$, respectively. $\theta_{ir}$ is the weight of the basis functions. Together, $\theta_{ir}$ and $f_{ir}$ determine the dynamics. It is worth emphasizing that we make no assumptions about the non-linear functions on the right-hand side of (1).
If the system can provide M data samples that meet (1), the system in (1) can be represented as
$$Y_i = F_i \theta_i + \nu_i, \quad i = 1, \ldots, n_x, \qquad (2)$$
where $Y_i = [\pi(x_{i1}), \ldots, \pi(x_{iM})]^{\top} \in \mathbb{R}^{M \times 1}$, $\theta_i \triangleq [\theta_{i1}, \ldots, \theta_{iN_i}]^{\top} \in \mathbb{R}^{N_i \times 1}$, $\nu_i \triangleq [\nu_{i0}, \ldots, \nu_{i,M-1}]^{\top} \in \mathbb{R}^{M \times 1}$, and $F_i \in \mathbb{R}^{M \times N_i}$ is called the dictionary matrix. The $j$-th column of $F_i$ is
$$[f_{ij}(x_0, u_0), \ldots, f_{ij}(x_{M-1}, u_{M-1})]^{\top}.$$
In this article, the identification task is to estimate $\theta_i$ from the measured data $Y_i$. This leads to a linear regression problem, in which the least squares (LS) method can be used if some of the model's non-linear part is known, i.e., $F_i$ is known. We merely discuss the identification problem for (1); the case of $g_i(x_k, u_k)$ is treated in the same way. Because of the potentially non-relevant or independent columns in $F_i$, the solution $\theta_i$ in (2) is often sparse, and only a few frequently used non-linear dynamical models need to be considered. For convenience of expression, we rewrite the linear regression problem in (2) in the following form
$$Y = F\theta + \nu \qquad (3)$$
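As a concrete illustration of the regression form (3), the sketch below (a minimal example assumed for this presentation, not code from the paper; the basis functions and data are hypothetical) builds a dictionary matrix from candidate basis functions evaluated on sampled states and inputs:

```python
import numpy as np

# Hypothetical basis (kernel) functions f_ir(x, u); any library of candidates works.
basis = [
    lambda x, u: x[0],
    lambda x, u: np.sin(x[1]),
    lambda x, u: x[0] * x[1],
    lambda x, u: u[0] ** 2,
]

def build_dictionary(X, U):
    """Stack f_j(x_k, u_k) column-wise: F has shape (M, N_basis)."""
    return np.column_stack([[f(x, u) for x, u in zip(X, U)] for f in basis])

# Toy data: M samples of states (n_x = 2) and inputs (n_u = 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
U = rng.normal(size=(50, 1))
F = build_dictionary(X, U)                       # dictionary matrix F
theta_true = np.array([0.7, 0.0, -0.5, 0.0])     # sparse weight vector
Y = F @ theta_true + 0.05 * rng.normal(size=50)  # targets pi(x_ik) with noise
```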

3. Constructing the Model in Bayesian Framework

All unknowns in Bayesian modeling are treated as random variables with specific distributions [41]. For $Y = F\theta + \nu$ in (3), the noise $\nu$ is independently identically distributed (i.i.d.) Gaussian, i.e., $\nu \sim \mathcal{N}(0, \beta I)$ with $\beta = \sigma^2$ and identity matrix $I$. We can obtain the likelihood of the data $P(Y \mid \theta) = \mathcal{N}(Y \mid F\theta, \beta I) \propto \exp\{-\frac{1}{2\beta}\|Y - F\theta\|_2^2\}$. $P(\theta)$ is a prior distribution defined as $P(\theta) \propto \exp\{-\frac{1}{2\alpha}\theta^{\top}\theta\} = \prod_j \exp\{-\frac{1}{2\alpha}\theta_j^{\top}\theta_j\} = \prod_j P(\theta_j)$, where $\alpha$ is a hyperparameter. For convenience of calculating the sparse parameters $\theta$, $-\frac{1}{2\alpha}\theta^{\top}\theta$ is selected as a concave non-decreasing function of $\theta_j$. The priors of $\theta$ include the Gaussian and t-distributions (see [42] for details).
The posterior $P(\theta \mid Y)$ is heavily coupled and non-Gaussian, so computing the posterior mean $\mathbb{E}(\theta \mid Y)$ is often difficult. To solve this problem, $P(\theta \mid Y)$ is approximated by a Gaussian distribution. In [41], effective posterior computation algorithms are used for this computation.
Another method is to use super-Gaussian priors, in which the priors $P(\theta_j)$ are computed by variational EM algorithms [42]. We define hyperparameters $\alpha \triangleq [\alpha_1, \ldots, \alpha_N] \in \mathbb{R}_+^N$. The priors of $\theta$ can be written as $P(\theta) = \prod_{j=1}^{n} P(\theta_j)$, $P(\theta_j) = \mathcal{N}(\theta_j \mid 0, \alpha_j)\varphi(\alpha_j)$, where $\varphi(\alpha_j)$ is a probability density function and $\varphi(\alpha_j) \geq 0$. Then $P(\theta, \alpha) = \prod_j \mathcal{N}(\theta_j \mid 0, \alpha_j)\varphi(\alpha_j) = P(\theta \mid \alpha)P(\alpha)$, where $P(\theta \mid \alpha) \triangleq \prod_j \mathcal{N}(\theta_j \mid 0, \alpha_j)$ and $P(\alpha) \triangleq \prod_j \varphi(\alpha_j)$. Considering the data $Y$, the posterior probability of $\theta$ can be represented as $P(\theta \mid Y, \alpha) = \frac{P(Y \mid \theta)P(\theta; \alpha)}{\int P(Y \mid \theta)P(\theta; \alpha)\,d\theta} = \mathcal{N}(m_\theta, \Sigma_\theta)$. From [43], the posterior mean $m_\theta$ and covariance $\Sigma_\theta$ are given by:
$$m_\theta = \Lambda F^{\top}\left(\lambda I + F \Lambda F^{\top}\right)^{-1} Y, \qquad \Sigma_\theta = \Lambda - \Lambda F^{\top}\left(\lambda I + F \Lambda F^{\top}\right)^{-1} F \Lambda, \qquad (4)$$
where $\Lambda$ is the diagonal matrix $\operatorname{diag}[\alpha]$. To maximize $P(\theta, \alpha \mid Y)$, the most important question is how to select the best $\hat{\alpha}$. $P(Y \mid \theta)$ and $P(\theta; \alpha)$ are taken as prior information, so we need only consider $\int P(Y \mid \theta)P(\theta; \alpha)\,d\theta$. Using type-II ML [43], the marginal likelihood $\int P(Y \mid \theta)P(\theta; \alpha)\,d\theta$ can be maximised, and the selected $\hat{\alpha}$ is written as
$$\hat{\alpha} = \underset{\alpha \geq 0}{\operatorname{argmax}} \int P(Y \mid \theta) \prod_{j=1}^{n} \mathcal{N}(\theta_j \mid 0, \alpha_j)\varphi(\alpha_j)\,d\theta. \qquad (5)$$
After $\hat{\alpha}$ is computed from (5), the estimate of $\theta$ can be obtained as $\hat{\theta} = \mathbb{E}(\theta \mid Y; \hat{\alpha}) = \hat{\Lambda} F^{\top}(\lambda I + F \hat{\Lambda} F^{\top})^{-1} Y$, with $\hat{\Lambda} \triangleq \operatorname{diag}[\hat{\alpha}]$. It indicates that picking the most likely hyperparameters $\hat{\alpha}$ is capable of explaining the data $Y$.
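The posterior moments in (4) are straightforward to evaluate numerically. The following is a minimal sketch (assumed here for illustration, using NumPy) of that computation:

```python
import numpy as np

def posterior_moments(F, Y, alpha, lam):
    """Posterior mean and covariance of theta as in (4),
    with Lambda = diag(alpha) and noise level lam."""
    Lam = np.diag(alpha)
    S = lam * np.eye(F.shape[0]) + F @ Lam @ F.T   # lambda*I + F*Lambda*F^T
    K = np.linalg.solve(S, F @ Lam).T              # Lambda*F^T*S^{-1} (S symmetric)
    m_theta = K @ Y
    Sigma_theta = Lam - K @ F @ Lam
    return m_theta, Sigma_theta
```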

4. Non-Convex Optimisation with Stein Method for Identification

4.1. Stein Operator Selection and Stein Constraint Design

The approach can be sketched as follows for a target distribution $P$ with support $Z$. Find a suitable operator $\mathcal{B} := \mathcal{B}_P$ (referred to as the Stein operator) and a large class of functions $\mathcal{F}_{\mathcal{B}} = \mathcal{F}(\mathcal{B}_P)$ (referred to as the Stein class) such that $Z$ has distribution $P$, denoted $Z \sim P$, if, and only if,
$$\mathbb{E}[\mathcal{B}f(Z)] = 0 \qquad (6)$$
holds for all functions $f \in \mathcal{F}_{\mathcal{B}}$.
A Stein operator can be designed in a variety of ways [32]. In our framework, Stein's identity and the kernelized Stein discrepancy are crucial. $P(\theta)$ is a probability density function that is continuous and differentiable on $\theta \in \mathbb{R}^d$. According to Stein's theory, suitably smooth and differentiable functions $\phi(\theta)$ and $Q(\theta)$ are selected in (6), expressed as $\phi(\theta) = [\phi_1(\theta), \ldots, \phi_d(\theta)]$ and $Q(\theta) = [Q_1(\theta), \ldots, Q_d(\theta)]$.
$$\mathbb{E}_{\theta \sim P}\left[\mathcal{B}_P\big(\phi(\theta)Q(\theta)\big)\right] = 0,$$
where
$$\mathcal{B}_P\big(\phi(\theta)Q(\theta)\big) = Q(\theta)\phi(\theta)\nabla_\theta \log P(\theta) + \nabla_\theta \phi(\theta)\,Q(\theta) + \nabla_\theta Q(\theta)\,\phi(\theta).$$
The Stein operator $\mathcal{B}_P$ acts on the function $\phi(\theta)Q(\theta)$ and produces a zero-mean function $\mathcal{B}_P(\phi(\theta)Q(\theta))$ under $\theta \sim P$.
Assume mild zero boundary conditions on $\phi(\theta)Q(\theta)$: for $\theta \in \mathbb{R}^d$ with compact support, $P(\theta)\phi(\theta)Q(\theta) \to 0$ on the boundary. The expectation of $\mathcal{B}_P(\phi(\theta)Q(\theta))$ under $\theta \sim Q$ is no longer equal to zero, and the magnitude of $\mathbb{E}_{\theta \sim Q}[\mathcal{B}_P(\phi(\theta)Q(\theta))]$ is related to the difference between $P$ and $Q$. The probability distance $\mathbb{S}(Q, P)$ between $P(\theta)$ and $Q(\theta)$ over some proper function set $\mathcal{F}_{\mathcal{B}}$ is defined as
$$\mathbb{S}(Q, P) = \max_{\phi \in \mathcal{F}_{\mathcal{B}}}\left\{\mathbb{E}_{\theta \sim Q}\left[\operatorname{trace}\big(\mathcal{B}_P(\phi(\theta)Q(\theta))\big)\right]\right\}^2.$$
The discriminative power and computational tractability of the Stein discrepancy are determined by the set $\mathcal{F}_{\mathcal{B}}$. When $\mathcal{F}_{\mathcal{B}}$ consists of sets of functions with bounded Lipschitz norms, the maximisation becomes a difficult and intractable functional optimization problem requiring special consideration. To tackle this computational difficulty, $Q(\theta)$ and $\phi(\theta)$ are selected in the unit sphere of a reproducing kernel Hilbert space (RKHS) [32]. The kernelized Stein discrepancy (KSD) between $P(\theta)$ and $Q(\theta)$ is described as
$$\mathbb{S}(Q, P) = \max_{\phi \in \mathcal{H}^d}\left[\mathbb{E}_{x \sim Q}\left(\operatorname{trace}\big(\mathcal{B}_P(\phi(\theta)Q(\theta))\big)\right)\right]^2, \qquad (8)$$
$$\text{s.t.}\ \ \|\phi(\theta)Q(\theta)\|_{\mathcal{H}^d} \leq 1.$$
The optimal solution of (8) is $\psi(\theta) = [\psi_{Q,P}(\theta)Q(\theta)]/\|\psi_{Q,P}Q(\theta)\|_{\mathcal{H}^d}$, where
$$\psi_{Q,P}(\cdot) = \mathbb{E}_{\theta \sim Q}\left[\mathcal{B}_P\big(k(\theta, \cdot)Q(\theta)\big)\right].$$
A direct calculation shows that
$$\mathbb{S}(Q, P) = \|\psi_{Q,P}\|^2_{\mathcal{H}^d}.$$
For any fixed $\theta$, the kernel function $k(\theta, \cdot)$ belongs to the RKHS. $\mathbb{S}(Q, P) = 0$, that is to say $\psi_{Q,P}(\theta) \equiv 0$, only if $P(\theta) = Q(\theta)$. The radial basis function (RBF) kernel $k(\theta, \theta') = \exp\left(-\frac{1}{h}\|\theta - \theta'\|_2^2\right)$ is strictly positive definite. When $\theta$ approaches $\theta'$, the RBF kernel converges to 1. Then, $\mathbb{S}(Q, P)$ contains the information of the parameter approximation, which is an important factor affecting the accuracy of the parameters $\theta$. For convenience, we define $T_j = \psi_{Q,P}(\cdot)$ and $K(\theta_j, \cdot) = k(\theta_j, \cdot)Q(\theta_j)$. $Q(\theta_j)$ and $P(\theta_j)$ are Gaussian distributions with different hyperparameters. $Q(\theta_j)$ is defined as follows
$$Q(\theta_j \mid \tau_j) \propto \exp\left(-\frac{1}{2\tau_j}\theta_j^{\top}\theta_j\right).$$
By substituting $Q(\theta_j \mid \tau_j)$ and $k(\theta_j, \cdot)$ into $K(\theta_j, \cdot)$, we have
$$K(\theta_j, \cdot) \propto \exp\left(-\frac{1}{h}(\theta_j - \theta_0)^{\top}(\theta_j - \theta_0)\right).$$
Based on $K(\theta_j, \cdot)$, we have
$$\nabla_\theta K(\theta_j, \cdot) \propto \frac{2}{h}(\theta_j - \theta_0)\exp\left(-\frac{1}{h}(\theta_j - \theta_0)^{\top}(\theta_j - \theta_0)\right).$$
According to these results, $T_j$ is written as
$$\begin{aligned} T_j &= K(\theta_j, \cdot)\cdot\nabla_{\theta_j}\log P(\theta_j \mid \alpha_j) + \nabla_\theta K(\theta_j, \cdot) \\ &= \frac{1}{\alpha_j}\theta_j \exp\left(-\frac{1}{h}(\theta_j - \theta_0)^{\top}(\theta_j - \theta_0)\right) + \frac{2}{h}(\theta_j - \theta_0)\exp\left(-\frac{1}{h}(\theta_j - \theta_0)^{\top}(\theta_j - \theta_0)\right) \\ &\approx \frac{1}{\alpha_j}\theta_j \exp\left(-\frac{1}{h}\theta_j^{\top}\theta_j\right) + \frac{2}{h}\theta_j \exp\left(-\frac{1}{h}\theta_j^{\top}\theta_j\right) \\ &= \left(\frac{1}{\alpha_j} + \frac{2}{h}\right)\theta_j \exp\left(-\frac{1}{h}\theta_j^{\top}\theta_j\right). \end{aligned}$$
It is easy to derive the expectation of $T_j$:
$$\begin{aligned} \mathbb{E}[T_j] &= \int \left(\frac{1}{\alpha_j} + \frac{2}{h}\right)\theta_j \exp\left(-\frac{1}{h}\theta_j^{\top}\theta_j\right)\exp\left(-\frac{1}{2\tau_j}\theta_j^{\top}\theta_j\right) d\theta_j \\ &= \int \left(\frac{1}{\alpha_j} + \frac{2}{h}\right)\theta_j \exp\left(-\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j\right) d\theta_j \\ &= \frac{\frac{1}{\alpha_j} + \frac{2}{h}}{\frac{1}{\tau_j} + \frac{2}{h}}\exp\left(-\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j\right). \end{aligned}$$
For convenience, let $\xi(\alpha_j) = \frac{\frac{1}{\alpha_j} + \frac{2}{h}}{\frac{1}{\tau_j} + \frac{2}{h}}$; we then have
$$\mathbb{E}[T_j] = \xi(\alpha_j)\exp\left(-\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j\right). \qquad (10)$$
Putting every $\mathbb{E}[T_j]$ together, we obtain the following result:
$$\mathbb{E}[T] = \prod_j \mathbb{E}[T_j]. \qquad (11)$$
Based on (11), we see that $\mathbb{E}[T]$ is a non-convex objective function in the $\alpha$-space. The optimisation problem is described as
$$\hat{\alpha} = \underset{\alpha \geq 0}{\operatorname{argmin}}\ \mathbb{E}[T]. \qquad (12)$$
Remark 1.
The Stein method is improved here: the kernel function $k(\theta, \theta')$ is also taken from the Stein class $\mathcal{F}_{\mathcal{B}}$, but the dynamic characteristics of the proposed function $Q(\theta)$ are considered in the design of the $\mathcal{B}_P$ operator for the approximation of $P(\theta)$ in the unit sphere of the RKHS. The new $\mathcal{B}_P$ operator increases the chance of jumping out of a local non-convex optimum. In the optimization problem, $\mathbb{E}[T]$ is a new objective function, which can accelerate the convergence of $Q(\theta)$ towards $P(\theta)$.
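For reference, the quantities $\xi(\alpha_j)$ and $\mathbb{E}[T]$ from (10)–(11) can be evaluated numerically. The following is a minimal sketch assumed for illustration (using NumPy and the sign convention of the reconstruction above):

```python
import numpy as np

def xi(alpha, tau, h):
    """xi(alpha_j) = (1/alpha_j + 2/h) / (1/tau_j + 2/h), as in (10)."""
    return (1.0 / alpha + 2.0 / h) / (1.0 / tau + 2.0 / h)

def expected_T(theta, alpha, tau, h):
    """E[T] = prod_j E[T_j], with E[T_j] = xi(alpha_j) * exp(-(1/h + 1/(2 tau_j)) theta_j^2),
    following (10)-(11); all operations are elementwise over the parameter vector."""
    ETj = xi(alpha, tau, h) * np.exp(-(1.0 / h + 1.0 / (2.0 * tau)) * theta ** 2)
    return np.prod(ETj)
```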

4.2. Parameter Sparse Identification of Constraints from Data

In (12), $\mathbb{E}[T]$ is another objective function, which makes the parameter $\alpha$ less sensitive to noisy data and helps it converge to the true value. The problem of system identification with convex constraints is given a sparse Bayesian formulation, which is then handled as a non-convex optimisation problem in this section. To obtain a better parameter $\alpha$, the new objective function is constructed as
$$\hat{\alpha} = \underset{\alpha \geq 0}{\operatorname{argmax}}\ \big(1/\mathbb{E}[T]\big)\int P(Y \mid \theta)\prod_{j=1}^{N}\mathcal{N}(\theta_j \mid 0, \alpha_j)\varphi(\alpha_j)\,d\theta. \qquad (13)$$

4.2.1. Objective Function in Parameter Identification

Theorem 1.
Use the notation $J_\alpha(\alpha)$ for the objective function
$$J_\alpha(\alpha) = \log\left|\lambda I + F\Lambda F^{\top}\right| + Y^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}Y + \sum_{j=1}^{N}p(\alpha_j) - 2\sum_{j=1}^{N}\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j - 2\sum_{j=1}^{N}\log\xi(\alpha_j). \qquad (14)$$
By minimising $J_\alpha(\alpha)$, the optimal hyperparameters $\hat{\alpha}$ in (13) are derived, where $p(\alpha_j) \triangleq -2\log\varphi(\alpha_j)$. The mean of $\theta$ is then calculated as $\hat{\theta} = \hat{\Lambda}F^{\top}(\lambda I + F\hat{\Lambda}F^{\top})^{-1}Y$.
Proof. 
Using the Woodbury inversion identity, re-express m θ and Σ θ in (4):
$$m_\theta = \Lambda F^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}Y = \frac{1}{\lambda}\Sigma_\theta F^{\top}Y \qquad (15)$$
$$\Sigma_\theta = \Lambda - \Lambda F^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}F\Lambda = \left(\Lambda^{-1} + \frac{1}{\lambda}F^{\top}F\right)^{-1}. \qquad (16)$$
Since the data likelihood $P(Y \mid \theta)$ is Gaussian, we can express the integral in (13) as follows:
$$\big(1/\mathbb{E}[T]\big)\int \mathcal{N}(Y \mid F\theta, \lambda I)\prod_{j=1}^{N}\mathcal{N}(\theta_j \mid 0, \alpha_j)\varphi(\alpha_j)\,d\theta = \big(1/\mathbb{E}[T]\big)\left(\frac{1}{2\pi\lambda}\right)^{M/2}\left(\frac{1}{2\pi}\right)^{N/2}\int\exp\{-E(\theta)\}\,d\theta\,\prod_{j=1}^{N}\frac{\varphi(\alpha_j)}{\sqrt{\alpha_j}}, \qquad (17)$$
where
$$E(\theta) = \frac{1}{2\lambda}\|Y - F\theta\|^2 + \frac{1}{2}\theta^{\top}\Lambda^{-1}\theta = \frac{1}{2}(\theta - m_\theta)^{\top}\Sigma_\theta^{-1}(\theta - m_\theta) + E(Y).$$
We obtain $E(Y)$ using the Woodbury inversion identity:
$$E(Y) = \frac{1}{2}\left(\frac{1}{\lambda}Y^{\top}Y - \frac{1}{\lambda^2}Y^{\top}F\Sigma_\theta\Sigma_\theta^{-1}\Sigma_\theta F^{\top}Y\right) = \frac{1}{2}Y^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}Y.$$
For the sake of calculation, we evaluate the integral of $\exp\{-E(\theta)\}$ as follows:
$$\int\exp\{-E(\theta)\}\,d\theta = \exp\{-E(Y)\}(2\pi)^{N/2}\left|\Sigma_\theta\right|^{1/2}.$$
Then, applying a $-2\log(\cdot)$ transformation to (17), we have
$$\begin{aligned} &-2\log\left[\big(1/\mathbb{E}[T]\big)\int P(Y \mid \theta)\prod_{j=1}^{N}\mathcal{N}(\theta_j \mid 0, \alpha_j)\varphi(\alpha_j)\,d\theta\right] \\ &\quad= -\log\left|\Sigma_\theta\right| + M\log 2\pi\lambda + \log\left|\Lambda\right| + Y^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}Y + \sum_{j=1}^{N}p(\alpha_j) - 2\sum_{j=1}^{N}\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j - 2\sum_{j=1}^{N}\log\xi(\alpha_j) \\ &\quad= \log\left|\lambda I + F\Lambda F^{\top}\right| + M\log 2\pi + Y^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}Y + \sum_{j=1}^{N}p(\alpha_j) - 2\sum_{j=1}^{N}\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j - 2\sum_{j=1}^{N}\log\xi(\alpha_j), \end{aligned}$$
where $\left|\Lambda\right|^{-1}\left|\lambda I + F\Lambda F^{\top}\right| = \left|\lambda I\right|\left|\Lambda^{-1} + \frac{1}{\lambda}F^{\top}F\right|$, so that $\log\left|\lambda I + F\Lambda F^{\top}\right| = -\log\left|\Sigma_\theta\right| + M\log\lambda + \log\left|\Lambda\right|$. From (13), we then obtain
$$\hat{\alpha} = \underset{\alpha \geq 0}{\operatorname{argmin}}\left(\log\left|\lambda I + F\Lambda F^{\top}\right| + Y^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}Y + \sum_{j=1}^{N}p(\alpha_j) - 2\sum_{j=1}^{N}\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j - 2\sum_{j=1}^{N}\log\xi_j(\alpha_j)\right).$$
To acquire an approximation of $\theta$, we compute the posterior mean of $\theta$: $\hat{\theta} = \mathbb{E}(\theta \mid Y; \hat{\alpha}) = \hat{\Lambda}F^{\top}(\lambda I + F\hat{\Lambda}F^{\top})^{-1}Y$.
Remark 2.
In (13), an objective function of recursive least squares estimation with $L_1$-regularization is developed and integrated into the objective function of the Stein approximation. The new objective contains more information and can capture more important information from the raw data, so we can obtain a relatively good parameter probability distribution.
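For concreteness, the objective (14) can be evaluated with a few lines of linear algebra. The sketch below is an assumed illustration (NumPy), using the sign convention of the reconstruction above and a flat hyperprior (so $p(\alpha_j) = 0$) by default:

```python
import numpy as np

def J_alpha(alpha, theta, F, Y, tau, h, lam, p=lambda a: np.zeros_like(a)):
    """Evaluate the objective (14). The hyperprior penalty p(alpha_j) = -2*log(phi(alpha_j))
    is passed in as a function; a flat hyperprior (p = 0) is assumed by default."""
    S = lam * np.eye(F.shape[0]) + F @ np.diag(alpha) @ F.T
    sign, logdet = np.linalg.slogdet(S)                 # log|lambda*I + F*Lambda*F^T|
    data_term = Y @ np.linalg.solve(S, Y)               # Y^T (lambda*I + F*Lambda*F^T)^{-1} Y
    xi_j = (1.0 / alpha + 2.0 / h) / (1.0 / tau + 2.0 / h)
    stein_term = -2.0 * np.sum((1.0 / h + 1.0 / (2.0 * tau)) * theta ** 2) \
                 - 2.0 * np.sum(np.log(xi_j))
    return logdet + data_term + np.sum(p(alpha)) + stein_term
```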
Lemma 1.
$J_\alpha(\alpha)$ in (14) is a non-convex function.
Proof of Lemma 1.
The data-dependent term $Y^{\top}(\lambda I + F\Lambda F^{\top})^{-1}Y$ in (14) is studied. By (15) and (16), it can be transformed as
$$Y^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}Y = \frac{1}{\lambda}Y^{\top}Y - \frac{1}{\lambda}Y^{\top}F\Sigma_\theta F^{\top}\frac{1}{\lambda}Y = \frac{1}{\lambda}\|Y - Fm_\theta\|_2^2 + m_\theta^{\top}\Lambda^{-1}m_\theta = \min_{x}\left\{\frac{1}{\lambda}\|Y - Fx\|_2^2 + x^{\top}\Lambda^{-1}x\right\}. \qquad (21)$$
This minimisation issue is easily shown to be convex in the $\theta$ and $\alpha$ dimensions. Define $\rho(\alpha) \triangleq \log\left|\lambda I + F\Lambda F^{\top}\right| + \sum_{j=1}^{N}p(\alpha_j) - 2\sum_{j=1}^{N}\log\xi(\alpha_j)$. $\log|X|$ is a concave function, and $\lambda I + F\Lambda F^{\top}$ is an affine function of $\alpha$ that is positive semi-definite when $\alpha \geq 0$. This means $\log\left|\lambda I + F\Lambda F^{\top}\right|$ is a concave non-decreasing function of $\alpha$, so $\rho(\alpha)$ is a concave function with respect to $\alpha$. Since $J_\alpha(\alpha)$ is the sum of the convex data-dependent term in (21) and the concave function $\rho(\alpha)$, together with the $\theta$-dependent terms, it is in general non-convex.

4.2.2. Modified Objective Function in θ Estimation

We use a modified objective function of $\theta$ with a penalty function. By analyzing the objective function corresponding to (14) in the $\alpha$-space, the analogous objective function is subsequently shown to be non-convex as well.
Theorem 2.
Solving the optimisation problem below yields the estimated value for θ given restrictions.
$$\min_\theta \|Y - F\theta\|_2^2 + \lambda r(\theta), \qquad (22)$$
where the penalty function is $r(\theta) = \min_{\alpha \geq 0}\left\{\theta^{\top}\Lambda^{-1}\theta + \log\left|\lambda I + F\Lambda F^{\top}\right| + \sum_{j=1}^{N}p(\alpha_j) - 2\sum_{j=1}^{N}\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j - 2\sum_{j=1}^{N}\log\xi(\alpha_j)\right\}$.
Proof of Theorem 2.
Using the data-dependent term in (21) and $J_\alpha(\alpha)$ in (14), a strict upper-bounding auxiliary function of $J_\alpha(\alpha)$ can be created as
$$J_{\alpha,\theta}(\alpha, \theta) = \frac{1}{\lambda}\|Y - F\theta\|_2^2 + \theta^{\top}\Lambda^{-1}\theta + \log\left|\lambda I + F\Lambda F^{\top}\right| + \sum_{j=1}^{N}p(\alpha_j) - 2\sum_{j=1}^{N}\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j - 2\sum_{j=1}^{N}\log\xi(\alpha_j). \qquad (23)$$
Minimising $J_{\alpha,\theta}(\alpha, \theta)$ over $\alpha$, we obtain
$$J_\theta(\theta) \triangleq \min_{\alpha \geq 0} J_{\alpha,\theta}(\alpha, \theta) = \frac{1}{\lambda}\|Y - F\theta\|_2^2 + \min_{\alpha \geq 0}\left\{\theta^{\top}\Lambda^{-1}\theta + \log\left|\lambda I + F\Lambda F^{\top}\right| + \sum_{j=1}^{N}p(\alpha_j) - 2\sum_{j=1}^{N}\left(\frac{1}{h} + \frac{1}{2\tau_j}\right)\theta_j^{\top}\theta_j - 2\sum_{j=1}^{N}\log\xi(\alpha_j)\right\}.$$
From the derivations in (21), we can see that the posterior mean $m_\theta$ is the estimate of the parameter $\theta$.
Lemma 2.
In Theorem 2, the penalty function $r(\theta)$ promotes sparsity on the weights by being a non-decreasing concave function of $\theta$.
Proof of Lemma 2.
It is obvious that $\rho(\alpha)$ in Lemma 1 is concave. Using the duality lemma (see Section 4.2 in [35]), $\rho(\alpha)$ can be written as $\rho(\alpha) = \min_{\alpha^* \geq 0}\langle\alpha^*, \alpha\rangle - \rho^*(\alpha^*)$, where $\rho^*(\alpha^*)$ is the concave conjugate of $\rho(\alpha)$, $\rho^*(\alpha^*) = \min_{\alpha \geq 0}\langle\alpha^*, \alpha\rangle - \rho(\alpha)$. By (21), the function $J_{\alpha,\theta}(\alpha, \theta)$ can be re-written as
$$J_{\alpha,\theta}(\alpha, \theta) \triangleq \langle\alpha^*, \alpha\rangle - \rho^*(\alpha^*) + Y^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}Y = \frac{1}{\lambda}\|Y - F\theta\|_2^2 + \sum_j\left[\theta_j^2\left(\frac{1}{\alpha_j} - \frac{2}{h} - \frac{1}{\tau_j}\right) + \alpha_j^*\alpha_j\right] - \rho^*(\alpha^*).$$
r ( θ ) is re-expressed as
$$r(\theta) = \min_{\alpha, \alpha^* \geq 0}\sum_j\left[\theta_j^2\left(\frac{1}{\alpha_j} - \frac{2}{h} - \frac{1}{\tau_j}\right) + \alpha_j^*\alpha_j\right] - \rho^*(\alpha^*).$$
It is easy to see that $r(\theta)$ reaches its minimum over $\alpha$ when $\alpha_j = \frac{1}{2}\left[\sqrt{\left(\frac{2}{h} + \frac{1}{\tau_j}\right)^2 + 4\theta_j^2\alpha_j^*} + \left(\frac{1}{h} + \frac{2}{\tau_j}\right)\right]$. Substituting this $\alpha_j$ into $r(\theta)$ gives
$$r(\theta) = \min_{\alpha^* \geq 0}\sum_j 2\sqrt{\frac{\alpha_j^*}{\alpha_j} - \frac{2}{h} - \frac{1}{\tau_j}}\;\alpha_j^*\,|\theta_j| - \rho^*(\alpha^*). \qquad (26)$$
When $r(\theta)$ is minimised, $\theta$ is much sparser. From (26), $r(\theta)$ is a non-decreasing concave function of $\theta$.

4.2.3. Parameter Estimation with Sparse Method

From (23), we can see that $\rho^*(\alpha^*)$ does not affect the estimation of the parameters $\alpha$, so $J_{\alpha,\theta}(\alpha, \theta)$ is redefined as
$$J_\alpha(\alpha, \theta) \triangleq \frac{1}{\lambda}\|Y - F\theta\|_2^2 + \sum_j\left[\theta_j^2\left(\frac{1}{\alpha_j} - \frac{2}{h} - \frac{1}{\tau_j}\right) + \alpha_j^*\alpha_j\right]. \qquad (27)$$
For a fixed $\alpha^*$, we notice that $J_\alpha(\alpha, \theta)$ is jointly convex in $\theta$ and $\alpha$. In (27), we can obtain $\theta_j^2\left(\frac{1}{\alpha_j} - \frac{2}{h} - \frac{1}{\tau_j}\right) + \alpha_j^*\alpha_j \geq 2\sqrt{\frac{\alpha_j^*}{\alpha_j} - \frac{2}{h} - \frac{1}{\tau_j}}\;\alpha_j^*\,|\theta_j|$. When $\alpha_j = \frac{1}{2}\left[\sqrt{\left(\frac{2}{h} + \frac{1}{\tau_j}\right)^2 + 4\theta_j^2\alpha_j^*} + \left(\frac{1}{h} + \frac{2}{\tau_j}\right)\right]$, $J_\alpha(\alpha, \theta)$ is minimized for any $\theta$. Substituting this $\alpha_j$ into $J_\alpha(\alpha, \theta)$, $\hat{\theta}$ can be obtained as follows:
$$\hat{\theta} = \underset{\theta}{\operatorname{argmin}}\ \|Y - F\theta\|_2^2 + 2\lambda\sum_{j=1}^{N}\sqrt{\frac{\alpha_j^*}{\alpha_j} - \frac{2}{h} - \frac{1}{\tau_j}}\;\alpha_j^*\,|\theta_j|. \qquad (28)$$
Due to the concavity of $r(\theta)$, the objective function in Theorem 2 can be optimised using re-weighted $L_1$-minimisation in a similar iterative way as was considered in (27).
In order to obtain a more stable and accurate parameter $\hat{\theta}$, a re-estimation method is put forward (Algorithm 1). At the $k$-th iteration, the modified weight is given by:
$$u_j^{(k)} \triangleq \frac{\partial r(\theta)}{2\,\partial\theta_j}\bigg|_{\theta = \theta^{(k)}} = \eta_j\cdot\alpha_j^*, \qquad (29)$$
where $\eta_j = \sqrt{\frac{\alpha_j^*}{\alpha_j} - \frac{2}{h} - \frac{1}{\tau_j}}$. On the basis of the aforementioned, we can now describe how the parameters are updated. To begin, we set the iteration count $k$ to zero, set $u_j^{(0)} = 1$, and initialise
$$\eta_j^{(k+1)} = \sqrt{\frac{\alpha_j^{*(k)}}{\alpha_j^{(k)}} - \frac{2}{h} - \frac{1}{\tau_j^{(k)}}}. \qquad (30)$$
Algorithm 1: Non-linear state-space identification algorithm with sparse Bayesian and Stein approach (NSSI-SBSA)
Input: $u_j^{(0)} = 1$, $\alpha_j^{(k)}$, $\alpha_j^{*(k)}$, $h$, $\tau_j^{(k)}$, stopping threshold $\varepsilon$
1: Generate time series data from the system of discrete-time dynamics characterized by (1);
2: Choose the dictionary functions that will be used to build the dictionary matrix mentioned in Section 2;
3: for $k = 0, 1, \ldots$ do
4:   Solve the minimisation problem with $L_1$-regularization and the optimization method on $\theta$:
     $\underset{\theta}{\operatorname{argmin}}\ \|Y - F\theta\|_2^2 + 2\lambda\sum_{j=1}^{N}\sqrt{\frac{\alpha_j^{*(k)}}{\alpha_j^{(k)}} - \frac{2}{h} - \frac{1}{\tau_j^{(k)}}}\;\alpha_j^{*(k)}\,|\theta_j|$
5:   Update the parameters $\eta_j^{(k+1)}$ and $u_j^{(k+1)}$ via (29) and (30);
6:   $u_j^{(k+1)} \leftarrow \eta_j^{(k+1)}\cdot\alpha_j^{*(k+1)}$
7:   Update the parameter $\alpha_j^{(k+1)} = \frac{1}{2}\left[\sqrt{\left(\frac{2}{h} + \frac{1}{\tau_j^{(k)}}\right)^2 + 4\left(\theta_j^{(k)}\right)^2\alpha_j^{*(k)}} + \left(\frac{1}{h} + \frac{2}{\tau_j^{(k)}}\right)\right]$
8:   if $\|\theta - \hat{\theta}\| < \varepsilon$ then
9:     break
10:   end if
11: end for
Output: The sparse weight set $\hat{\theta}$.
We obtain $u_j^{(k)} = \eta_j^{(k)}\cdot\alpha_j^{*(k)}$. $J_{\alpha,\theta}(\alpha, \theta)$ is considered again. For any fixed $\alpha$ and $\theta$, the tightest bound can be obtained by minimising over $\alpha^*$; the optimal $\alpha^*$ equals the gradient of the function $\rho(\alpha)$ in Lemma 1. The estimate of $\alpha^*$ is computed as
$$\hat{\alpha}^* = \nabla_\alpha\rho(\alpha) = \operatorname{diag}\left[F^{\top}\left(\lambda I + F\Lambda F^{\top}\right)^{-1}F\right] + p'(\alpha) - 2\zeta(\alpha),$$
where $p'(\alpha) = \left[p'(\alpha_1), \ldots, p'(\alpha_N)\right]$ and $\zeta(\alpha) = \left[\frac{\xi'(\alpha_1)}{\xi(\alpha_1)}, \ldots, \frac{\xi'(\alpha_N)}{\xi(\alpha_N)}\right]$. The optimal $\alpha^{*(k+1)}$ can then be replaced by $\hat{\alpha}^{*(k+1)} = \operatorname{diag}\left[F^{\top}\left(\lambda I + F\Lambda^{(k)}F^{\top}\right)^{-1}F\right] + p'(\alpha^{(k)}) - 2\zeta(\alpha^{(k)})$. After computing the estimate of
$$\alpha_j = \frac{1}{2}\left[\sqrt{\left(\frac{2}{h} + \frac{1}{\tau_j}\right)^2 + 4\theta_j^2\alpha_j^*} + \left(\frac{1}{h} + \frac{2}{\tau_j}\right)\right],$$
we can compute $\alpha_j^{*(k+1)}$, which gives
$$\alpha_j^{*(k+1)} = F_j^{\top}\left(\lambda I + F_j\Lambda^{(k)}F_j^{\top}\right)^{-1}F_j + p'(\alpha_j^{(k)}) - 2\zeta(\alpha_j^{(k)}),$$
and $\alpha_j^{(k+1)}$ can be defined as
$$\alpha_j^{(k+1)} = \frac{1}{2}\left[\sqrt{\left(\frac{2}{h} + \frac{1}{\tau_j^{(k)}}\right)^2 + 4\left(\theta_j^{(k)}\right)^2\alpha_j^{*(k)}} + \left(\frac{1}{h} + \frac{2}{\tau_j^{(k)}}\right)\right].$$
Substitute $\alpha_j^{(k+1)}$ and $\alpha_j^{*(k+1)}$ into (27). The weights $\eta_j$ are re-estimated at each iteration $k$ until $\|\theta - \hat{\theta}\| < \varepsilon$, where $\varepsilon$ is the stopping threshold. Algorithm 1 summarizes the above-mentioned procedure.
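As a rough illustration of this iterative loop, the sketch below alternates a weighted $L_1$ solve with the hyperparameter updates. It assumes NumPy and CVXPY (the Python counterpart of the CVX package used in Section 5), uses the update formulas as reconstructed above with a flat hyperprior, and should not be read as the authors' exact implementation:

```python
import numpy as np
import cvxpy as cp

def nssi_sbsa(F, Y, lam=0.1, h=1.0, tau=1.0, n_iter=20, eps=1e-4):
    """Sketch of Algorithm 1: weighted L1 solve for theta, then alpha/alpha* updates."""
    M, N = F.shape
    alpha = np.ones(N)            # alpha_j
    alpha_star = np.ones(N)       # alpha_j^* (conjugate variables)
    theta_old = np.zeros(N)
    for _ in range(n_iter):
        # Step 4: weighted L1-regularised least squares on theta.
        eta = np.sqrt(np.maximum(alpha_star / alpha - 2.0 / h - 1.0 / tau, 1e-12))
        w = eta * alpha_star
        th = cp.Variable(N)
        cp.Problem(cp.Minimize(cp.sum_squares(Y - F @ th)
                               + 2 * lam * cp.sum(cp.multiply(w, cp.abs(th))))).solve()
        theta = th.value
        # Step 7 (as reconstructed): closed-form alpha update.
        alpha = 0.5 * (np.sqrt((2.0 / h + 1.0 / tau) ** 2 + 4.0 * theta ** 2 * alpha_star)
                       + (1.0 / h + 2.0 / tau))
        # alpha^* update from the gradient of rho(alpha), assuming a flat hyperprior.
        S = lam * np.eye(M) + F @ np.diag(alpha) @ F.T
        xi = (1.0 / alpha + 2.0 / h) / (1.0 / tau + 2.0 / h)
        dxi = -1.0 / (alpha ** 2 * (1.0 / tau + 2.0 / h))
        alpha_star = np.diag(F.T @ np.linalg.solve(S, F)) - 2.0 * dxi / xi
        if np.linalg.norm(theta - theta_old) < eps:
            break
        theta_old = theta
    return theta, alpha
```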

5. Numerical Example

All examples are conducted on a computer with an Intel Core i7-6500U CPU @ 2.50 GHz and 16 GB of RAM. The CVX package is used to solve the convex programs on the MATLAB 2016 platform. We give three numerical examples: the Narendra-Li model [20], a NARX model [25], and kernel state-space models [44]. The utility and performance of Algorithm 1 are demonstrated on these three simulation cases in this section, including a well-studied and challenging non-linear system. The root mean squared error (RMSE) criterion is utilized to demonstrate the performance of the suggested identification approach against noise perturbation. The estimate of parameter $\theta$ in the $i$-th Monte Carlo experiment is denoted by $\hat{\theta}_i$. The RMSE is defined as
$$\mathrm{RMSE} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_i - \theta_i\right)^2\right]^{1/2},$$
where $\theta_i$ represents the true system parameter vector and $n$ is the number of trials. To validate the theoretical results, the identification of the structured state-space models in the following cases is simulated in this part.
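For reference, one plausible reading of this definition can be computed as follows (an assumed NumPy sketch, with the Monte Carlo estimates stacked row-wise):

```python
import numpy as np

def rmse(theta_hat, theta_true):
    """RMSE over n Monte Carlo estimates of the parameter vector, stacked row-wise."""
    diff = np.asarray(theta_hat) - np.asarray(theta_true)
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
```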

5.1. Example 1: Parameter Identification of a General Narendra-Li Model

Consider the state space representation of a non-linear system:
$$x_{t+1}^{1} = \alpha_1\frac{x_t^{1}}{1 + \left(x_t^{1}\right)^2}\sin\left(x_t^{2}\right) + \alpha_2\sin\left(x_t^{2}\right) + \xi_{1t} \qquad (32)$$
$$x_{t+1}^{2} = \beta_1 x_t^{2}\cos\left(x_t^{2}\right) + \beta_2 x_t^{1}\exp\left(-\frac{\left(x_t^{1}\right)^2 + \left(x_t^{2}\right)^2}{8}\right) + \beta_3\frac{u_t^3}{1 + u_t^2 + 0.5\cos\left(x_t^{1} + x_t^{2}\right)} + \xi_{2t}, \qquad y_t = \frac{x_t^{1}}{1 + 0.5\sin\left(x_t^{2}\right)} + \frac{x_t^{2}}{1 + 0.5\sin\left(x_t^{1}\right)}, \qquad (33)$$
where the state variable is $x_t = \left[x_t^{1}, x_t^{2}\right]$ and $\xi_{it}$ is Gaussian white noise. To generate the estimation data, the system is excited with a uniformly distributed random input signal $u(t) \in [-2.5, 2.5]$ for $1 \leq t \leq 1000$. The validation dataset is generated with the input
$$u(t) = \sin\left(\frac{2\pi t}{10}\right) + \sin\left(\frac{2\pi t}{25}\right), \quad t = 1, \ldots, 1000.$$
Let $\Phi_{1t} = \frac{x_t^{1}}{1 + (x_t^{1})^2}\sin\left(x_t^{2}\right)$, $\Phi_{2t} = \sin\left(x_t^{2}\right)$, $\Phi_{3t} = x_t^{2}\cos\left(x_t^{2}\right)$, $\Phi_{4t} = x_t^{1}\exp\left(-\frac{(x_t^{1})^2 + (x_t^{2})^2}{8}\right)$, and $\Phi_{5t} = \frac{u_t^3}{1 + u_t^2 + 0.5\cos\left(x_t^{1} + x_t^{2}\right)}$. Because there are two state variables, the dictionary matrix $\Phi$ can be built as follows:
$$\Phi = \begin{bmatrix} \Phi_{1t} & \Phi_{2t} & \Phi_{3t} & \Phi_{4t} & \Phi_{5t} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ \Phi_{1,t+M-1} & \Phi_{2,t+M-1} & \Phi_{3,t+M-1} & \Phi_{4,t+M-1} & \Phi_{5,t+M-1} \end{bmatrix} \qquad (34)$$
Then, the state set can be defined as
$$x_i \triangleq \left[x_{t+1}^{i}, \ldots, x_{t+M}^{i}\right]^{\top} \in \mathbb{R}^{M \times 1}, \quad i = 1, 2. \qquad (35)$$
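As an illustration, the dictionary matrix $\Phi$ can be assembled from simulated trajectories as in the sketch below (an assumed NumPy example using the candidate functions $\Phi_1$–$\Phi_5$ as reconstructed above; $\Phi_5$ is taken without the unknown coefficient $\beta_3$):

```python
import numpy as np

def narendra_li_dictionary(x1, x2, u):
    """Stack the five candidate functions Phi_1..Phi_5 of Example 1 column-wise."""
    return np.column_stack([
        x1 / (1.0 + x1 ** 2) * np.sin(x2),                 # Phi_1
        np.sin(x2),                                        # Phi_2
        x2 * np.cos(x2),                                   # Phi_3
        x1 * np.exp(-(x1 ** 2 + x2 ** 2) / 8.0),           # Phi_4
        u ** 3 / (1.0 + u ** 2 + 0.5 * np.cos(x1 + x2)),   # Phi_5
    ])
```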
Using the dictionary matrix Φ in (34), the true value of parameter θ for the model in (32), (33) should be as follows:
$$\theta_{\mathrm{true}} = \left[\theta^{(1)}, \theta^{(2)}\right] = \begin{bmatrix} \alpha_1\,(=4) & 0 \\ \alpha_2\,(=3) & 0 \\ 0 & \beta_1\,(=1.4) \\ 0 & \beta_2\,(=1.5) \\ 0 & \beta_3\,(=1.6) \end{bmatrix}.$$
The parameters in brackets are the true values. In our study, we use $T = 1000$ samples for learning and add white Gaussian measurement noise with $\xi_{it} = 0.1$ to the training data. Algorithm 1 is used to identify the parameters in (32) and (33). The coefficients $\theta^{(j)}$ are learned from the data in (35). The RMSE of $\theta$ computed in the simulation is 0.039. When the noise level $\xi_{it}$ is 0.2, 0.3, 0.4, and 0.5, there is little change in the RMSE of $\theta$, as shown in Figure 1. Although [20] uses 2000 data points, our method is substantially better than [20], as shown in Table 1. Table 1 also compares some previous results reported in the literature [45,46,47] with our method; our method performs the best. In this experiment, we also examine how the method performs for $\xi_{it} = 0.1$ and $0.3$. The last 60 points of the generated data sequence are selected for testing, and the prediction is compared with the true values. When Algorithm 1 is executed 8 times, the average RMSE of the output $y_t$ is 0.06 and does not increase quickly, as shown in Figure 2.
In general, it is clear that the proposed model is capable enough to well describe the system behavior.

5.2. Example 2: Application to a NARX Model

In this example, we analyze the following polynomial model of a single-input single-output (SISO) non-linear autoregressive system with exogenous input (NARX) [25].
$$y_{t_{k+1}} = 0.7\,y_{t_{k-1}} - 0.5\,y_{t_{k-2}} + 0.6\,u^2_{t_{k-2}} - 0.7\,y_{t_{k-2}}u_{t_{k-1}} + \xi_{t_k} \qquad (36)$$
with $y, u, \xi \in \mathbb{R}$. In expanded form, we may write (36) as:
$$y_{t_{k+1}} = w_1 + w_2 y_{t_k} + \cdots + w_{m_y+2}\,y_{t_{k-m_y}} + \cdots + w_N\,y^{d_y}_{t_{k-m_y}}u^{d_u}_{t_{k-m_u}} + \xi_{t_k} = w^{\top}f\left(y_{t_k}, \ldots, y_{t_{k-m_y}}, u_{t_k}, \ldots, u_{t_{k-m_u}}\right) + \xi_{t_k}. \qquad (37)$$
Model (37) is the general form of (36). $d_y$ and $d_u$ are the degrees of the output and input; $m_y$ and $m_u$ are the given memory orders of the output and input; $w = [w_1, \ldots, w_N]^{\top} \in \mathbb{R}^N$ is the weight vector; and $f\left(y_{t_k}, \ldots, y_{t_{k-m_y}}, u_{t_k}, \ldots, u_{t_{k-m_u}}\right) = [f_1(\cdot), \ldots, f_N(\cdot)]^{\top} \in \mathbb{R}^N$ is the function vector. Taking the NARX model (36) as an example, we set $d_y = 1$, $d_u = 2$, $m_y = 2$, $m_u = 2$. This yields $f(\cdot) \in \mathbb{R}^{28}$ and, thus, $w \in \mathbb{R}^{28}$. Since only four terms appear in (36), only 4 of the 28 linked weights $w_i$ are non-zero. In our study, we use $k = 1000$ samples for learning with white Gaussian noise. The last 60 points of the generated data sequence are selected for testing, and the prediction is compared with the true values. The estimated parameter $w$ agrees with the true value, as shown in Figure 3. The prediction performance of Algorithm 1 is shown in Figure 4, where the predicted and exact trajectories match well for $\xi_t = 0.1$ and $0.3$. When Algorithm 1 is executed 8 times, the average RMSEs of the output are 0.021 and 0.074, which are tolerable in applications.
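A short sketch of how training data for this example might be generated is given below. It is an assumed illustration: the uniform input distribution is borrowed from Example 1 for simplicity, and the sign pattern follows the reconstruction of (36) above:

```python
import numpy as np

def simulate_narx(T=1000, noise_std=0.1, seed=0):
    """Simulate the polynomial NARX system (36) under a random input."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-2.5, 2.5, size=T)
    y = np.zeros(T)
    for k in range(2, T - 1):
        y[k + 1] = (0.7 * y[k - 1] - 0.5 * y[k - 2] + 0.6 * u[k - 2] ** 2
                    - 0.7 * y[k - 2] * u[k - 1] + noise_std * rng.normal())
    return y, u
```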

5.3. Example 3: Kernel State-Space Models (KSSM) for Autoregressive Modeling

The kernel state-space model (KSSM) is an autoregressive model that satisfies a $\tau$-order difference equation. As seen below, the model may be described as a first-order multivariate process.
$$\bar{x}_{t+1} = F_t\left(\bar{x}_t\right) + V_t, \qquad (38)$$
where $\bar{x}_t = \left[x_t, \ldots, x_{t-\tau+1}\right]^{\top}$, $F_t\left(\bar{x}_t\right) = \left[f_t\left(x_t, \ldots, x_{t-\tau+1}\right), x_t, \ldots, x_{t-\tau+2}\right]^{\top}$, and $V_t = \left[\xi_t, 0, \ldots, 0\right]^{\top}$.
The hidden state of an SSM can then be viewed as the process $\bar{x}_t$, producing an SSM formulation of a complex autoregressive model with noise $V_t$. By using non-linear autoregressive modeling with a fixed number of delayed samples, the model can be utilized to predict time series. In addition, if the state-transition function $f_t$ is defined using kernels as in (39), we obtain the suggested KSSM suited for autoregressive time series.
$$\begin{bmatrix} x_{t+1} \\ x_t \\ \vdots \\ x_{t-\tau+2} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{N}w_i k_i\left(\bar{x}_t\right) \\ x_t \\ \vdots \\ x_{t-\tau+2} \end{bmatrix} + \begin{bmatrix} \xi_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad Y_t = h\left(\bar{x}_t\right) + V_t, \qquad (39)$$
where $Y_t$ is the observed process, $h(\bar{x}_t)$ is the observation function, $V_t$ is observation noise, and $w = [w_1, \ldots, w_N]$ is the weight vector. Periodic time series are widespread in physics, engineering, and biology, so we take Fourier kernel functions in the KSSM. Consider 5 candidate kernel functions for $k_i(\cdot)$: $\sin x$, $\cos x$, $x$, $\sin 2x$, and $\cos 2x$.
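As an illustration, this kernel dictionary and a one-step prediction can be written as in the sketch below (an assumed Python example; whether each $k_i$ acts on the most recent sample or on the whole delayed vector is not fully specified in the text, so here it is applied to the most recent sample):

```python
import numpy as np

# Candidate kernel functions k_i(.) for the KSSM of Example 3.
kernels = [np.sin, np.cos, lambda x: x, lambda x: np.sin(2 * x), lambda x: np.cos(2 * x)]

def kssm_step(x, w, noise=0.0):
    """One-step prediction x_{t+1} = sum_i w_i k_i(x_t) + xi_t."""
    return sum(wi * k(x) for wi, k in zip(w, kernels)) + noise
```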
Algorithm 1 is applied to the identification of the parameter $w$ in (39). The RMSE of $w$ is 0.16, which is a satisfactory result. The estimation data in the experiment consist of 500 sample points, and Figure 5 shows the simulated outputs of the two processes evaluated on the validation set. In Figure 5, we compare the true and estimated values of $w$ using the probability distribution and the dispersoid distribution. It can be seen that the sparsity effect of the algorithm proposed in this paper is obvious.

6. Conclusions

The parameter estimation of non-linear discrete-time state-space systems with noisy state data is the subject of this work. For parameter estimation and prediction, a novel sparse Bayesian convex optimisation method (NSSI-SBSA) is presented, which considers the approximation method, the parameter prior, and the posterior. The identification problem is divided into two parts: in the first step, the improved Stein approach is used to create a new optimisation objective function; in the second step, a re-weighted $L_1$-regularized least squares solver is created, with the regularization value chosen from the optimization point. The new objective function is more information-rich and can extract more critical information from the raw data than in previous studies. From the three examples, the NSSI-SBSA algorithm usually captures more information about the dependencies in the data than the methods discussed in the Introduction.

Author Contributions

Methodology, L.Z. and J.L.; Formal analysis, W.Z.; Writing—review and editing, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation (NNSF) of China under Grant (61703149) and the Natural Science Foundation of Hebei Province of China (F2019111009).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ljung, L. Perspectives on system identification. Annu. Rev. Control 2010, 34, 1–12.
2. Luo, G.; Yang, Z.; Zhan, C.; Zhang, Q. Identification of nonlinear dynamical system based on raised-cosine radial basis function neural networks. Neural Process. Lett. 2021, 53, 355–374.
3. Yakoub, Z.; Naifar, O.; Ivanov, D. Unbiased Identification of Fractional Order System with Unknown Time-Delay Using Bias Compensation Method. Mathematics 2022, 10, 3028.
4. Yakoub, Z.; Amairi, M.; Aoun, M.; Chetoui, M. On the fractional closed-loop linear parameter varying system identification under noise corrupted scheduling and output signal measurements. Trans. Inst. Meas. Control 2019, 41, 2909–2921.
5. Yakoub, Z.; Aoun, M.; Amairi, M.; Chetoui, M. Identification of continuous-time fractional models from noisy input and output signals. In Fractional Order Systems—Control Theory and Applications; Springer: Cham, Switzerland, 2022; pp. 181–216.
6. Kumpati, S.N.; Kannan, P. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1990, 1, 4–27.
7. Leontaritis, I.J.; Billings, S.A. Input-output parametric models for non-linear systems part II: Stochastic non-linear systems. Int. J. Control 1985, 41, 329–344.
8. Rangan, S.; Wolodkin, G.; Poolla, K. New results for Hammerstein system identification. In Proceedings of the 34th IEEE Conference on Decision and Control, New Orleans, LA, USA, 13–15 December 1995; IEEE: Piscataway, NJ, USA, 1995; Volume 1, pp. 697–702.
9. Billings, S.A. Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains; John Wiley and Sons: Hoboken, NJ, USA, 2013.
10. Haber, R.; Unbehauen, H. Structure identification of nonlinear dynamic systems—A survey on input-output approaches. Automatica 1990, 26, 651–677.
11. Barahona, M.; Poon, C.S. Detection of nonlinear dynamics in short, noisy time series. Nature 1996, 381, 215–217.
12. Frigola, R.; Lindsten, F.; Schon, T.B.; Rasmussen, C.E. Bayesian inference and learning in Gaussian process state-space models with particle MCMC. Adv. Neural Inf. Process. Syst. 2013, 26.
13. Frigola, R.; Chen, Y.; Rasmussen, C.E. Variational Gaussian process state-space models. Adv. Neural Inf. Process. Syst. 2014, 27.
14. Karl, M.; Soelch, M.; Bayer, J.; Van der Smagt, P. Deep variational Bayes filters: Unsupervised learning of state space models from raw data. arXiv 2016, arXiv:1605.06432.
15. Raiko, T.; Tornio, M. Variational Bayesian learning of nonlinear hidden state-space models for model predictive control. Neurocomputing 2009, 72, 3704–3712.
16. Ljung, L. System Identification: Theory for the User; Prentice Hall: Hoboken, NJ, USA, 1987.
17. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–22.
18. Duncan, S.; Gyongy, M. Using the EM algorithm to estimate the disease parameters for smallpox in 17th century London. In Proceedings of the 2006 IEEE International Symposium on Intelligent Control, Munich, Germany, 4–6 October 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 3312–3317.
19. Solin, A.; Sarkka, S. Hilbert space methods for reduced-rank Gaussian process regression. Stat. Comput. 2020, 30, 419–446.
20. Svensson, A.; Schon, T.B. A flexible state–space model for learning nonlinear dynamical systems. Automatica 2017, 80, 189–199.
21. Frigola, R. Bayesian Time Series Learning with Gaussian Processes; University of Cambridge: Cambridge, UK, 2015.
22. Wilson, A.G.; Hu, Z.; Salakhutdinov, R.R.; Xing, E.P. Stochastic variational deep kernel learning. Adv. Neural Inf. Process. Syst. 2016, 2586–2594.
23. Cerone, V.; Piga, D.; Regruto, D. Enforcing stability constraints in set-membership identification of linear dynamic systems. Automatica 2011, 47, 2488–2494.
24. Zavlanos, M.M.; Julius, A.A.; Boyd, S.P.; Pappas, G.J. Inferring stable genetic networks from steady-state data. Automatica 2011, 47, 1113–1122.
25. Pan, W.; Yuan, Y.; Goncalves, J.; Stan, G.B. A sparse Bayesian approach to the identification of nonlinear state space systems. IEEE Trans. Autom. Control 2015, 61, 182–187.
26. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
27. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009.
28. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: New York, NY, USA, 2013.
29. Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
30. Candes, E.J.; Romberg, J.; Tao, T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 2006, 52, 489–509.
31. Tropp, J.A.; Gilbert, A.C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 2007, 53, 4655–4666.
32. Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory, University of California, Berkeley, CA, USA; The Regents of the University of California: Berkeley, CA, USA, 1972.
33. Brunton, S.L.; Tu, J.H.; Bright, I.; Kutz, J.N. Compressive sensing and low-rank libraries for classification of bifurcation regimes in nonlinear dynamical systems. SIAM J. Appl. Dyn. Syst. 2014, 13, 1716–1732.
34. Bai, Z.; Wimalajeewa, T.; Berger, Z.; Wang, G.; Glauser, M.; Varshney, P.K. Low-dimensional approach for reconstruction of airfoil data via compressive sensing. AIAA J. 2015, 53, 920–933.
35. Arnaldo, I.; O'Reilly, U.M.; Veeramachaneni, K. Building predictive models via feature synthesis. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, Madrid, Spain, 11–15 July 2015; pp. 983–990.
36. Berntorp, K. Online Bayesian inference and learning of Gaussian-process state–space models. Automatica 2021, 129, 109613.
37. Imani, M.; Ghoreishi, S.F. Two-stage Bayesian optimization for scalable inference in state-space models. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–12.
38. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least angle regression. Ann. Stat. 2004, 32, 407–499.
39. Brunton, S.L.; Proctor, J.L.; Kutz, J.N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 2016, 113, 3932–3937.
40. Chen, S.S.; Donoho, D.L.; Saunders, M.A. Atomic decomposition by basis pursuit. SIAM Rev. 2001, 43, 129–159.
41. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
42. Ma, Z.; Lai, Y.; Kleijn, W.B.; Song, Y.Z.; Wang, L.; Guo, J. Variational Bayesian learning for Dirichlet process mixture of inverted Dirichlet distributions in non-Gaussian image feature modeling. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 449–463.
43. Tipping, M.E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 2001, 1, 211–244.
44. Tobar, F.; Djuric, P.M.; Mandic, D.P. Unsupervised state-space modeling using reproducing kernels. IEEE Trans. Signal Process. 2015, 63, 5210–5221.
45. Roll, J.; Nazin, A.; Ljung, L. Nonlinear system identification via direct weight optimization. Automatica 2005, 41, 475–490.
46. Stenman, A. Model on Demand: Algorithms, Analysis and Applications; Department of Electrical Engineering, Linköping University: Linköping, Sweden, 1999.
47. Xu, J.; Huang, X.; Wang, S. Adaptive hinging hyperplanes and its applications in dynamic system identification. Automatica 2009, 45, 2325–2332.
Figure 1. RMSE of states $x^1$ and $x^2$.
Figure 2. Output of the state model for 60 testing data points in Example 1: (a) $\xi_i = 0.1$ and (b) $\xi_i = 0.3$.
Figure 3. The distribution of $w$: the top panel is the true $w$; the bottom panel is the estimated $w$.
Figure 4. Output of the state model for 60 testing data points in Example 2: (a) $\xi = 0.1$ and (b) $\xi = 0.3$.
Figure 5. Comparison of the distribution of $w$ with sparsity: (a) sparse value and (b) true value.
Table 1. Accuracy comparison of different methods ($\xi_i = 0.1$).

Method                     RMSE    Data
Our paper                  0.039   1000
Bayesian Learning [20]     0.06    2000
DWO [45]                   0.43    50,000
MOD [46]                   0.46    50,000
AHH [47]                   0.31    2000
MARS [47]                  0.49    2000